The team’s algorithm, called Dreamer, uses past experiences to build up a model of the surrounding world. Dreamer also lets the robot conduct trial-and-error calculations inside that learned model rather than in the real world, by predicting the likely outcomes of candidate actions. This allows it to learn faster than it could purely by doing. Once the robot had learned to walk, it kept learning to adapt to unexpected situations, such as resisting attempts to topple it with a stick.
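To make the idea of learning “in imagination” concrete, the sketch below shows, in plain Python with NumPy, a policy being evaluated on trajectories predicted by a stand-in world model instead of on real robot interactions. It is a minimal conceptual illustration, not the team’s Dreamer implementation: the functions world_model_step, policy, and imagine_rollout and their simple linear dynamics are hypothetical placeholders for components that Dreamer would learn from real experience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for components learned from real experience.
# In Dreamer these would be trained neural networks, not fixed formulas.
def world_model_step(state, action):
    """Predict the next latent state and a reward for a candidate action."""
    next_state = 0.9 * state + 0.1 * action + rng.normal(0, 0.01, size=state.shape)
    reward = -np.sum(next_state ** 2)  # e.g. reward for staying near balance
    return next_state, reward

def policy(state):
    """Propose an action from the current latent state."""
    return -0.5 * state + rng.normal(0, 0.1, size=state.shape)

def imagine_rollout(start_state, horizon=15):
    """Roll the learned model forward without touching the real robot."""
    state, total_reward = start_state, 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = world_model_step(state, action)
        total_reward += reward
    return total_reward

# Many imagined trajectories can be evaluated cheaply in the model,
# which is why the policy can improve far faster than by real-world trials alone.
start = rng.normal(size=4)
returns = [imagine_rollout(start) for _ in range(100)]
print(f"mean imagined return: {np.mean(returns):.3f}")
```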
“Teaching robots through trial and error is a difficult problem, made even harder by the long training times such teaching requires,” says Lerrel Pinto, an assistant professor of computer science at New York University, who specializes in robotics and machine learning. Dreamer shows that deep reinforcement learning and world models can teach robots new skills in a very short amount of time, he says.
Jonathan Hurst, a professor of robotics at Oregon State University, says the findings, which have not yet been peer-reviewed, make it clear that “reinforcement learning will be a cornerstone tool in the future of robot control.”
Removing the simulator from robot training has many perks. The algorithm could be useful for teaching robots how to learn skills in the real world and adapt to situations such as hardware failures, Hafner says. For example, a robot could learn to walk with a malfunctioning motor in one leg.
The approach could also have huge potential for more complicated tasks such as autonomous driving, which require complex and expensive simulators, says Stefano Albrecht, an assistant professor of artificial intelligence at the University of Edinburgh. A new generation of reinforcement-learning algorithms could “super quickly pick up in the real world how the environment works,” Albrecht says.