Learning To Be A Computer
Written by Mike James   
Wednesday, 29 October 2014

We have recently seen how much neural networks can learn, but the latest example is very strange. Google has succeeded in teaching a neural network to be a computer simply by showing it programs and what results they produce.

Google's AI team has given us something else to think about. Most neural networks are of the feed forward type and these are characterised by having connections between layers that push the data forward to the output. That is there are no back connections and hence no feedback. Without feedback neural networks have no "memory" and always give the same output when presented with the same input - they are function learning machines. 

Networks that do have back connections are called Recurrent Neural Networks (RNN) and in the past these have been considered as more powerful but difficult to train.

Some types of RNN have been invented that perform particular tasks and are easier to train. On example is the strangely named Long-Short Term Memory (LSTM) which acts as a gated memory element. However LSTMs aren't well understood in the roles that they can play in conjunction with other network components and learning rules. 

 

learninglstm

 

As an experiment to investigate LSTMs, Wojciech Zaremba and Ilya Sutskever decided to see how well an LSTM could learn to execute a Python program. Of course they had to restrict the type of program to give the LSTM a chance of working. The programs used can be evaluated in linear time and use addition, subtraction, multiplication, variable assignment, if statements and for loops - but not double for loops. Every program ends with a print that outputs a single number.

The programs were characterized in difficulty by two numbers: their length, i.e. the number of characters used in the numeric values; and nesting the depth of a parse tree.

Typical programs are

Input:
 j=8584
 for x in range(8):
   j+=920
 b=(1500+j)
 print((b+7567))
Target: 25011

and

Input:
 i=8827
 c=(i-5347)
 print((c+8704) if 2641<8500 else 5308)
Target: 1218

The neural network is given the programs one character at a time and produces one character at a time as output. The LSTM used has 400 units per layer.

At first the networks were difficult to train and to try to make the task easier curriculum training was tried. This is based on the observation that biological learners do so faster if they are presented with simple examples and then the difficulty ramps up. The idea is that it is easier to master the complex cases when you know how the simple ones work. However, in the case of learning to execute programs, simple curriculum learning was actually worse then a random presentation.

What worked best was a mixed strategy of graded exercises with some hard problems thrown in. The suggested reason for this is that when learning simple cases, the LSTM uses its entire memory capacity, even though this is not necessary. When the harder examples are presented the change needed to the structure to accommodate them is too great. The combined approach allows the network to learn simple cases while keeping it under stress by showing cases that are harder. 

What is surprising is that from being shown the training set the network could give the result of a program it had never seen. You present it with the text of a program and it outputs what the program would when run.

Of course, the accuracty dropped as length and nesting increased - the fact that it was able to extract the structure from the character by character presentation sufficiently to predict the output of an unseen program is remarkable enough.

 

learningresults1

 

The researchers admit that they are unsure how the network manages the feat. You might think that this is just a matter of memory. The network could just store the inputs and outputs and when it sees the input again produce the same output. Simple memory doesn't account for the generalization that was observed. The network predicted outputs for programs it hadn't been shown. It is the ability to generalize which suggests that some part of the structure of the program as indicated by its text is being used to predict the output.

What is particularly strange is that they report "interesting" errors such as being off by one.  They speculate that the network might be learning approximate algorithms for things like addition. Instead of learning digit-wise addition with full carry, perhaps learning an addition table for just pairs of digits and using this to add multi-digit numbers without carry would give good, but not perfect, accuracy. 

I don't think that a neural network is going to form the basis of your next CPU, but looking at the ways in which algorithmic information can be extracted from text is a good way to explore how networks might come to "understand".

 

learninglstm

Banner


Raspberry Pi CM5 - Expensive And Undocumented
27/11/2024

So the unexpected has happened - the Compute Module 5 has been launched. But it simply emphasises some problems with adopting the Pi as an IoT device.



RAG from Scratch
10/12/2024

The "RAG from Scratch" tutorial by Langchain coupled with the "RAG playground" are two great educational resources that will help you kickstart your journey with RAG.


More News

 

espbook

 

Comments




or email your comment to: comments@i-programmer.info

Last Updated ( Friday, 31 October 2014 )