DeepMind's AlphaZero Triumphs At Chess
Written by Sue Gee   
Thursday, 07 December 2017

DeepMind's latest program, AlphaZero, has used reinforcement learning from playing against itself to master the game of chess. Given the important role that chess has occupied in computer science, this is a big breakthrough for neural networks.


Back in March 2016, Mike James explained Why AlphaGo Changes Everything, and now we can repeat the sentiment with respect to AlphaZero, the latest in the line of programs from the DeepMind team.

Information about AlphaZero comes in a paper that appeared this week authored by David Silver, Thomas Hubert, Julian Schrittwieser and other members of DeepMind including its founder and CEO Demis Hassabis. The most important detail is that AlphaZero is a fully generic algorithm which:

replaces the handcrafted knowledge and domain-specific augmentations used in traditional game-playing programs with deep neural networks and a tabula rasa reinforcement learning algorithm. 

The AlphaZero algorithm was applied to chess, shogi (a Japanese version of chess played on a bigger board) and Go. As far as possible, the same algorithm settings, network architecture and hyper-parameters were used for all three games, each of which used a separate instance of AlphaZero. Starting from random play, and given no domain knowledge except the game rules, AlphaZero used self-play reinforcement learning to convincingly defeat a world-champion program in each game within 24 hours: Stockfish for chess, Elmo for shogi and two versions of AlphaGo for Go.
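To give a feel for what "starting from random play, and given no domain knowledge except the game rules" means, here is a minimal sketch of self-play learning on a toy game. It is not AlphaZero: there is no neural network and no tree search, only a lookup table updated from game outcomes. The game (a Nim variant in which players alternately take 1 to 3 stones and whoever takes the last stone wins), the function names and all parameter values are assumptions chosen purely for illustration.

```python
import random


def legal_moves(stones):
    """The only 'domain knowledge' supplied: the rules of the toy game."""
    return [n for n in (1, 2, 3) if n <= stones]


def train_by_self_play(episodes=30000, max_stones=12, epsilon=0.1, seed=0):
    """Learn move values from scratch by playing both sides of every game."""
    rng = random.Random(seed)
    value = {}   # (stones, move) -> running mean outcome for the player moving
    visits = {}  # (stones, move) -> number of times that move was played
    for _ in range(episodes):
        stones = rng.randint(1, max_stones)
        history = []  # moves in order; the two players alternate
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # occasional exploratory move
            else:
                # Otherwise play the move currently believed to be best.
                move = max(moves, key=lambda m: value.get((stones, m), 0.0))
            history.append((stones, move))
            stones -= move
        # Whoever made the last move took the last stone and wins (+1).
        # Walking backwards, the outcome flips sign at every move.
        outcome = 1.0
        for key in reversed(history):
            visits[key] = visits.get(key, 0) + 1
            old = value.get(key, 0.0)
            value[key] = old + (outcome - old) / visits[key]
            outcome = -outcome
    return value


def greedy_move(value, stones):
    """The move the learned table prefers from a given pile size."""
    return max(legal_moves(stones), key=lambda m: value.get((stones, m), 0.0))


if __name__ == "__main__":
    table = train_by_self_play()
    for pile in (5, 6, 7):
        print(pile, "stones -> take", greedy_move(table, pile))
```

In this toy game the known winning strategy is to leave the opponent a multiple of four stones, and the table should settle on that without ever being told it. AlphaZero's games are far too large for a table, which is where the deep neural networks come in.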

[Charts: Elo rating against number of training steps for chess, shogi and Go]

The charts above show the Elo rating (i.e. the relative skill level of players in competitor-versus-competitor games) achieved against the number of training steps. They reveal the typical learning curve for reinforcement learning: rapid improvement followed by a plateau.

The paper clarifies:

In chess, AlphaZero outperformed Stockfish after just 4 hours (300k steps); in shogi, AlphaZero outperformed Elmo after less than 2 hours (110k steps); and in Go, AlphaZero outperformed AlphaGo Lee after 8 hours (165k steps).

Against Stockfish, in 50 games of chess playing White, AlphaZero won 25 and drew the other 25; in 50 games playing Black it chalked up 3 wins and 47 draws, so it did not lose a single game.
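As a back-of-the-envelope check, the match result can be turned into an approximate rating gap using the standard logistic Elo model, in which a player rated D points above an opponent is expected to score 1 / (1 + 10^(-D/400)). The sketch below simply inverts that formula; the function names are hypothetical and the resulting figure is our own estimate, not a number taken from the paper.

```python
import math


def match_score(wins, draws, losses):
    """Fraction of available points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def implied_elo_difference(score):
    """Invert the Elo expectation E = 1 / (1 + 10 ** (-D / 400)) to get D."""
    return 400.0 * math.log10(score / (1.0 - score))


# Totals from the two 50-game halves reported above.
wins = 25 + 3
draws = 25 + 47
losses = 0

score = match_score(wins, draws, losses)   # 0.64
gap = implied_elo_difference(score)        # about 100 Elo points
print(f"score {score:.2f}, implied gap of about {gap:.0f} Elo points")
```

A 64% score therefore corresponds to a lead of roughly 100 Elo points over Stockfish under the match conditions used, with the usual caveat that 100 games is a small sample.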


Are we happy about this?

What about all that time invested in alpha/beta pruning algorithms and heuristics for evaluation functions? We can now throw them out, and not only for chess. As this is a generic algorithm it can be applied to other scenarios. As long as the compute power is available for the domain we are interested in, we can just start tabula rasa and let reinforcement learning take over.
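For readers who have not met it, this is the kind of handcrafted search being referred to. The sketch below is a minimal alpha/beta pruning routine over a tiny game tree written as nested lists, where each leaf number stands in for the output of a heuristic evaluation function. The tree and its values are made up for illustration and the code is not taken from Stockfish or any other engine.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, stats=None):
    """Minimax value of a nested-list game tree, skipping hopeless branches."""
    if not isinstance(node, list):
        # A leaf: in a real engine this would be a heuristic evaluation.
        if stats is not None:
            stats["leaves"] += 1
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, stats))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing player will never allow this line
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, stats))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing player already has a better option
    return best


# Three candidate moves for the maximizer, each with three replies.
tree = [[3, 5, 10], [2, 9, 7], [8, 1, 4]]
stats = {"leaves": 0}
print(alphabeta(tree, True, stats=stats))  # 3
print(stats["leaves"], "of 9 leaves evaluated")  # 6 of 9 leaves evaluated
```

The pruning saves work without changing the answer, but the quality of play still rests on the handwritten evaluation at the leaves, which is exactly the part AlphaZero learns for itself.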

But yes, we are certainly happy. Artificial Intelligence has made amazing strides in just a few years and there is the promise of it being able to solve really difficult problems. It isn't yet everything we need and there are still things to work out and invent, but the ability of neural networks to capture hierarchical structure given only rewards to guide them goes well beyond what we might have expected only a short time ago. We don't yet have an AI brain, but we are assembling the components and seeing the basic tools emerging at long last.

 

More Information

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

 

Related Articles

Why AlphaGo Changes Everything

AlphaGo Zero - From Nought To Top In 40 Days

AlphaGo Triumphs In China

AlphaGo Defeats World's Top Ranking Go Player

AlphaGo To Play World Number One Go Player

World Champion Go Player Challenges AlphaGo

AlphaGo Revealed As Mystery Player On Winning Streak

DeepMind's Differentiable Neural Network Thinks Deeply

Deep Learning Chess

 


Last Updated ( Monday, 11 December 2017 )
 
 

   
Copyright © 2017 i-programmer.info. All Rights Reserved.