OpenAI Universe - New Way of Training AIs

Written by Nikos Vaggalis

Wednesday, 14 December 2016

Until now, the usual way to train a neural network has been to feed it millions of pre-classified examples, in the so-called supervised learning scheme, with the result that neural networks only learn what we have instructed them to learn.

But there is also another technique, reinforcement learning, in which you let the AI discover by itself what it is supposed to do, without prior knowledge of its surroundings or any pre-classified data fed to it.

Microsoft was one of the first to employ this technique in a gaming environment, trying to make a Minecraft character climb a virtual hill in the so-called AIX Minecraft Project. There, you drop the algorithm into a Minecraft world, let it move freely and interact with its surroundings, and make it learn by rewarding it when it does something right, so that it works out the goal of the game, the goal it should be aiming for. For us humans it's easy to see that we must climb that hill, or that Super Mario loses a life when he touches an enemy; not so for an algorithm. Its strength instead lies in being able to try vast numbers of combinations far faster than any human could, in order to discover what humans already grasp intuitively.
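To make the reward-driven idea concrete, here is a minimal sketch of tabular Q-learning, one classic reinforcement learning algorithm, on a toy one-dimensional "hill". The article doesn't say which algorithm the AIX project used, so Q-learning is our choice for illustration; the hill, the reward of 1 at the summit and the parameter values (`alpha`, `gamma`, `epsilon`) are all assumptions. The agent is never told where the summit is; it only learns from the reward.

```python
import random

# Hypothetical toy "hill": positions 0..HILL_TOP on a line.
# The agent starts at the bottom (0) and is rewarded only on reaching the top.
HILL_TOP = 5
ACTIONS = (-1, +1)  # step down the hill, step up the hill


def step(position, action):
    """Apply an action; return (new_position, reward, done)."""
    new_position = min(max(position + action, 0), HILL_TOP)
    if new_position == HILL_TOP:
        return new_position, 1.0, True  # reward only at the summit
    return new_position, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[state][i] estimates the long-term reward of taking ACTIONS[i] in state.
    q = [[0.0, 0.0] for _ in range(HILL_TOP + 1)]
    for _ in range(episodes):
        position, done = 0, False
        while not done:
            # Explore with probability epsilon (or when estimates tie),
            # otherwise exploit the best action found so far.
            if rng.random() < epsilon or q[position][0] == q[position][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[position][0] > q[position][1] else 1
            new_position, reward, done = step(position, ACTIONS[a])
            # Q-learning update: nudge the estimate towards
            # reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[new_position])
            q[position][a] += alpha * (target - q[position][a])
            position = new_position
    return q


def greedy_policy(q):
    """The action the trained agent picks at each non-summit position."""
    return [ACTIONS[0] if q[s][0] > q[s][1] else ACTIONS[1]
            for s in range(HILL_TOP)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))  # [1, 1, 1, 1, 1]: always climb
```

The learned policy steps uphill from every position, a behaviour that emerged purely from the single reward at the top.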

Thus, for the algorithm, this innocent hill climb is roughly the equivalent of dropping Discovery Channel's Bear Grylls on a remote, isolated island with only a compass and a rope, where he has to interact with the environment and find his way out relying on nothing but his experience and intuition.

OpenAI's Universe not only builds on this idea but expands it, providing a 'Universe' filled with computer games and other applications, exposed through the interface of the Gym toolkit, which the AI interacts with in a way familiar to humans: by means of a mouse and keyboard.

(Editor: Universe has now been deprecated and replaced by Gym Retro, which lets you turn classic video games into Gym environments for reinforcement learning and comes with integrations for ~1000 games - https://github.com/openai/retro)

Thanks to this simple interface, an agent needs neither an emulator nor access to a program's internals, so it can play any computer game, interact with a terminal or browse the web.

The ultimate goal is to:

"train AI systems on the full range of tasks or any task a human can complete with a computer, therefore allowing generally intelligent agents to experience a wide repertoire of tasks so they can develop world knowledge and problem solving strategies that can be efficiently reused in a new task."

It's another attempt, in line with the current trend, to push the boundaries of general AI, since advances in this field will have a much wider impact on society and industry as a whole than narrow AI, which confines itself to one specific kind of task.

That is also what IBM's Watson has lately set out to do with Project Intu, which has the aim of:

"transforming the transaction that takes place between the human operator and the machine, be it a device, robot, or anything else capable of carrying an intelligent software agent, into a state of conversation or deeper interaction",

or, in short, to behave like a human. But, again, to do that the AI must be able to act in a general way, learn by itself and transfer knowledge gained from one experience to the next.

The next step would be for general AI to acquire common sense, but to do that it first needs the ability to predict, and that is only going to happen through unsupervised learning, not the currently employed supervised learning, which relies on humans to annotate the data that machines work with. This is something that Yann LeCun and his team at Facebook are working on with their video prediction software.
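Why prediction needs no human annotation can be shown with a toy example: when the task is to predict the next value of a sequence, the sequence supplies its own targets. The sketch below is not Facebook's video prediction software; it fits a hypothetical one-parameter linear predictor, x[t+1] ≈ a·x[t], by least squares, with a made-up numeric "signal" standing in for video frames.

```python
# Toy illustration of learning to predict without human labels:
# the training targets are simply the next values of the sequence itself.
# This is NOT a video prediction model, just a one-parameter linear
# predictor x[t+1] ~ a * x[t] fitted by least squares.


def make_pairs(sequence):
    """Turn an unlabelled sequence into (input, target) pairs (x[t], x[t+1])."""
    return list(zip(sequence[:-1], sequence[1:]))


def fit_next_step(sequence):
    """Closed-form least-squares estimate of a in x[t+1] = a * x[t]."""
    pairs = make_pairs(sequence)
    numerator = sum(x * y for x, y in pairs)
    denominator = sum(x * x for x, _ in pairs)
    return numerator / denominator


if __name__ == "__main__":
    # A made-up unlabelled "signal" that halves at every step.
    frames = [64.0, 32.0, 16.0, 8.0, 4.0]
    a = fit_next_step(frames)
    print(a)               # 0.5
    print(a * frames[-1])  # predicted next value: 2.0
```

No one labelled anything here: the "answers" the model learns from are just later parts of the same data, which is the property that makes prediction attractive for unsupervised learning.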

In the case of Universe and Gym, the potential uses are broad, with the AI able to complete complex tasks such as looking up things it doesn't know on the internet, managing your email or calendar, completing online classes, or even taking an instruction and performing a sequence of actions on a website, for example being fed flight details and then manipulating a user interface to search for the flight.

Of course, another use case would be military training, since an agent trained with Gym would have no trouble quickly getting acquainted with a flight simulator's controls, as outlined in Achieving Autonomous AI Is Closer Than We Think, where AIs fly alongside human pilots.

Both the OpenAI Gym toolkit and Universe are released as open source on GitHub, and using them is straightforward: import Gym and Universe (which registers the Universe environments), create an environment for one of the available games, and run your agent in a loop, all in just a few lines of Python:

import gym
import universe  # registers the Universe environments with gym

env = gym.make('flashgames.DuskDrive-v0')
env.configure(remotes=1)  # automatically creates a local Docker container
observation_n = env.reset()
while True:
    # your agent generates action_n at 60 frames per second;
    # this trivial agent just holds the up arrow in every environment
    action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
    env.render()
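Since the universe package is deprecated and needs Docker, the structure of this loop can be explored with a stand-in. `FakeVectorEnv` below is a hypothetical class, not part of gym or universe: it only mimics the call shapes used above (`reset()` returns one observation per remote; `step(action_n)` takes one list of events per remote and returns per-remote observations, rewards and done flags plus an info value), and its fake screens and reward rule are invented for illustration.

```python
import random


class FakeVectorEnv:
    """Hypothetical stand-in mimicking the vectorized call shapes above.

    NOT part of gym or universe: screens are random pixel grids and the
    reward rule is invented.
    """

    def __init__(self, remotes=1, width=4, height=3, seed=0):
        self.remotes = remotes
        self.width, self.height = width, height
        self.rng = random.Random(seed)

    def _frame(self):
        # A fake "screen": a height x width grid of greyscale pixel values.
        return [[self.rng.randrange(256) for _ in range(self.width)]
                for _ in range(self.height)]

    def reset(self):
        # One observation per remote environment.
        return [self._frame() for _ in range(self.remotes)]

    def step(self, action_n):
        assert len(action_n) == self.remotes, "one action list per remote"
        observation_n, reward_n, done_n = [], [], []
        for events in action_n:
            # Invented reward rule: +1 whenever ArrowUp is held this step.
            pressed = ('KeyEvent', 'ArrowUp', True) in events
            observation_n.append(self._frame())
            reward_n.append(1.0 if pressed else 0.0)
            done_n.append(False)
        return observation_n, reward_n, done_n, {}


def run(env, steps):
    """Same loop structure as the example above, but bounded and headless."""
    total = 0.0
    observation_n = env.reset()
    for _ in range(steps):
        # The same trivial agent: hold the up arrow in every environment.
        action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
        observation_n, reward_n, done_n, info = env.step(action_n)
        total += sum(reward_n)
    return total


if __name__ == "__main__":
    print(run(FakeVectorEnv(remotes=2), steps=10))  # 2 remotes x 10 steps: 20.0
```

The bounded `for` loop replaces `while True` and rendering is dropped, so the sketch terminates and runs without a display; the point is only to show that every list in the loop has one entry per remote.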

More Information

Test Open AI Universe in 3 commands

The AIX Minecraft Project Makes Thinking Software Possible

IBM Watson and Project Intu for Embodied Cognition

Facebook's Yann LeCun On Everything AI

Achieving Autonomous AI Is Closer Than We Think


Last Updated ( Sunday, 26 May 2019 )
