Google's computational package aimed at making AI easier, TensorFlow, is a little over a year old. Even so, at the TensorFlow Developer Summit, it has been deemed grown up enough to be called 1.0. It also has some new toys.
There is no doubt that TensorFlow has changed the overall feel of AI. Neural networks were something you only got into if you were prepared to commit very large resources. You still have to commit fairly large chunks of time and computing power using TensorFlow, but it seems so much more accessible. Of course there are other frameworks that will let you implement a neural network and some have specific advantages, but TensorFlow is the generalist and the one that non-specialist programmers try first.
The big new feature in version 1.0 is XLA - Accelerated Linear Algebra. This makes things go faster. The blog announcement says:
We'll soon publish updated implementations of several popular models to show how to take full advantage of TensorFlow 1.0 - including a 7.3x speedup on 8 GPUs for Inception v3 and 58x speedup for distributed Inception v3 training on 64 GPUs!
If you know anything about number crunching you might be wondering what magic XLA is. Reading the description doesn't give you much idea of what is specifically going on. XLA is a compiler: it takes in a description of the computation expressed in its own intermediate representation, HLO (High Level Optimizer), and first performs target-independent optimizations and then optimizations that take account of the specific hardware.
At the moment there are two backend modules. One generates code for CPUs and the other for GPUs. The supported CPUs are x86-64 and ARM, and the supported GPUs are NVIDIA's.
The documentation states that the main objectives are:
Improve execution speed. Compile subgraphs to reduce the execution time of short-lived Ops to eliminate overhead from the TensorFlow runtime, fuse pipelined operations to reduce memory overhead, and specialize to known tensor shapes to allow for more aggressive constant propagation.
Improve memory usage. Analyze and schedule memory usage, in principle eliminating many intermediate storage buffers.
Reduce reliance on custom Ops. Remove the need for many custom Ops by improving the performance of automatically fused low-level Ops to match the performance of custom Ops that were fused by hand.
Reduce mobile footprint. Eliminate the TensorFlow runtime by ahead-of-time compiling the subgraph and emitting an object/header file pair that can be linked directly into another application. The results can reduce the footprint for mobile inference by several orders of magnitude.
Improve portability. Make it relatively easy to write a new backend for novel hardware, at which point a large fraction of TensorFlow programs will run unmodified on that hardware. This is in contrast with the approach of specializing individual monolithic Ops for new hardware, which requires TensorFlow programs to be rewritten to make use of those Ops.
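To give a feel for what "fuse pipelined operations to reduce memory overhead" means, here is a minimal sketch in plain Python. It is not XLA and it does not use the TensorFlow API; the function names and the buffer counts are made up for illustration. It only shows the general idea that computing x * y + z as two separate operations needs an intermediate buffer, whereas a fused version does the same arithmetic in one pass.

```python
# Illustration only: plain Python, not XLA and not the TensorFlow API.
# The function names and the "buffers" counters are hypothetical.

def unfused(xs, ys, zs):
    """Compute x * y + z one operation at a time."""
    # First operation: elementwise multiply, stored in an intermediate buffer.
    products = [x * y for x, y in zip(xs, ys)]
    # Second operation: elementwise add, reading the intermediate buffer.
    sums = [p + z for p, z in zip(products, zs)]
    # Two lists were allocated: the intermediate and the output.
    return sums, 2


def fused(xs, ys, zs):
    """Compute the same result in a single pass, with no intermediate list."""
    out = [x * y + z for x, y, z in zip(xs, ys, zs)]
    # Only the output list was allocated.
    return out, 1


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [0.5, 0.5, 0.5]

unfused_result, unfused_buffers = unfused(a, b, c)
fused_result, fused_buffers = fused(a, b, c)

print(unfused_result == fused_result)   # same values either way
print(unfused_buffers, fused_buffers)   # buffers allocated by each version
```

A real compiler makes this kind of decision automatically over a whole subgraph and for the target hardware, which is where the quoted speed and memory gains are meant to come from.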
At the other end of the scale we have a set of new high-level APIs - tf.layers, tf.metrics and tf.losses - which make it easier to create networks. For example, the layers API has a conv2d function which sets up a convolution layer in one call. Similarly the metrics API has lots of measures of how accurate the network is - false positives, cosine distance, RMS error and so on - and the losses API has more complex measures used in training, such as cross entropy. There is also a new tf.keras module which provides compatibility with Keras, another well-known neural network library.
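If the measures named above are unfamiliar, the following sketch shows what they compute, using only the Python standard library. These are textbook definitions written for illustration; the function names and signatures are assumptions and are not the TensorFlow API, whose versions work on tensors and may differ in detail (for example in how inputs are normalized or averaged).

```python
import math

# Illustration only: textbook definitions in plain Python.
# Names and signatures are hypothetical, not those of tf.metrics or tf.losses.

def false_positives(labels, predictions):
    """Count cases predicted positive whose true label is negative."""
    return sum(1 for y, p in zip(labels, predictions) if p and not y)


def rms_error(labels, predictions):
    """Root mean squared error between two equal-length sequences."""
    n = len(labels)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(labels, predictions)) / n)


def cosine_distance(u, v):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def softmax_cross_entropy(one_hot_labels, logits):
    """Cross entropy between one-hot labels and softmax(logits)."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum) for y, z in zip(one_hot_labels, logits))


print(false_positives([0, 1, 0, 1], [1, 1, 0, 0]))
print(rms_error([0.0, 0.0], [3.0, 4.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(softmax_cross_entropy([0.0, 1.0], [0.0, 0.0]))
```

The first three are the sort of thing you report after training to judge accuracy, while cross entropy is the kind of quantity the optimizer actually minimizes during training.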
The new version also promises Python API stability, making it more suitable for production. Note that the other language APIs - C, and the new Go and Java APIs - are not covered by this guarantee and are liable to change.
If you would like to see the videos from the 2017 Summit, there is a playlist that you can spend many happy hours watching: