Google gets into AI with an API
Thursday, 16 September 2010

The new Google Prediction API is classification in a black box. You can send it your data, train a model to recognise patterns in that data and then use it to classify new data.

Is the web service the future of AI?


Google announced the Prediction API at this year's Google I/O conference, but it went mostly unnoticed among the many higher-profile announcements. It is essentially a REST-style API to a general-purpose pattern recognition engine. You upload some data, which can be unstructured text or numeric, to Google Storage for Developers. You then perform a supervised learning stage, and finally you can use the trained model to classify new data.

During the preview period the dataset is limited to 100MB and has to be in CSV format. Of course, the training data has to be correctly classified, as this is supervised learning. The categories can also be continuous, in which case the system performs a generalised regression-type analysis.

The example given in the documentation uses sample sentences in different languages. The first entry in each row of the training data set is the language, followed by a sentence in that language. Training occurs asynchronously, and there is an API call to determine when training is complete and the classifier has converged. Once the model is trained you can submit new data to be categorised. The results come back as a classification or, in the case of a continuous categorisation variable, a prediction.
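To make the training format concrete, here is a minimal Python sketch that writes rows with the label first and the sentence second, as described above. The sample sentences, the quoting style and the function names are illustrative assumptions, not taken from Google's documentation. The polling helper takes a caller-supplied function, `is_training_complete`, which is a hypothetical stand-in for the API's status call rather than a real client method.

```python
import csv
import io
import time

# Hypothetical sample rows: the label (language) first, then the sentence.
SAMPLES = [
    ("English", "The dog sleeps in the house."),
    ("French", "Le chien dort dans la maison."),
    ("Spanish", "El perro duerme en la casa."),
]


def build_training_csv(rows):
    """Return CSV text with one training example per line: label, then sentence."""
    buffer = io.StringIO()
    # Quote every text field so commas inside a sentence do not split the row.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for label, sentence in rows:
        writer.writerow([label, sentence])
    return buffer.getvalue()


def wait_until_trained(is_training_complete, poll_seconds=5.0, max_polls=60):
    """Poll a status check until it reports completion or the poll limit is hit.

    is_training_complete is a hypothetical callable standing in for the
    API's "is training finished?" call; it is not a real client function.
    """
    for _ in range(max_polls):
        if is_training_complete():
            return True
        time.sleep(poll_seconds)
    return False


training_csv = build_training_csv(SAMPLES)
```

The resulting text would be what you upload to storage before starting training. Because training is asynchronous, a client would then call something like `wait_until_trained` with its own status check before submitting data for prediction.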


Apparently the API has no support for returning the details of the trained model, and it gives no clue as to the actual classification algorithm used. The suggestion is that the service picks from a range of possible algorithms, so it's a sort of expert system for classification algorithms. This is a black box approach to classification. Not troubling the user with the nature of the algorithm makes things simple, but it means you have no idea of the characteristics of the learning or prediction process.

As long as you are willing to take the technique on trust you can just build it into your own application. Google currently claims that developers participating in the Prediction API preview are already using it to identify spam, categorize news, and more.

The API has just had an update and now supports:

Multi-category prediction: Imagine you’re writing a news aggregator that suggests articles based on the kinds of stories the user has read before. Previously, using the Prediction API, each article could only be tagged with one label - the most pertinent one. For example, an article about a new truck might be labeled as “truck,” but not “roomy” or “quiet.” Now articles can be tagged with all of those labels, with the labels ranked by pertinence, enabling your app to make better recommendations.

Continuous Output: You’d like to create a wine recommendation app. Matching a wine to personal preferences is a tricky task, dependent on many factors, including origin, grape, age, growing environment, and flavor presence. Previously, your app could only label wine as “good,” “decent,” “bad,” or some other set of pre-defined values. Using the new continuous output option, your app can provide a fine-grained ranking of wines based on how well they fit the user’s preferences.

Mixed Inputs: You’re creating an automatic moderator for your blog. You could already classify incoming posts automatically based on comment text and the username of the poster (text inputs), but not the number of times they’ve posted before or the number of users that have liked their posts (numeric inputs). We’ve now added support for mixed inputs, so both numeric and text data can be incorporated in your moderation helper, greatly improving accuracy and letting you get back to making content rather than managing it.

Combining Continuous Output with Mixed Inputs: To further enhance your automatic moderator, you can use continuous output to set thresholds for automatic posting, automatic rejection and manual moderation, further reducing your workload.
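The thresholding idea in the last item can be sketched in a few lines of Python. This assumes the model returns a single continuous score where higher means more acceptable. The function name, the 0 to 1 score range and the two threshold values are illustrative assumptions, not part of the Prediction API.

```python
def moderation_decision(score, reject_below=0.3, publish_above=0.8):
    """Map a continuous model score to one of three moderation actions.

    The thresholds are hypothetical; in practice they would be tuned
    against comments that have already been moderated by hand.
    """
    if reject_below > publish_above:
        raise ValueError("reject_below must not exceed publish_above")
    if score >= publish_above:
        return "publish"
    if score < reject_below:
        return "reject"
    # Anything between the two thresholds is left for a human to decide.
    return "manual review"


decisions = [moderation_decision(s) for s in (0.95, 0.5, 0.1)]
```

Widening the gap between the two thresholds sends more comments to manual review, while narrowing it hands more of the decisions to the model.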

You can get all the details about these and other new features on the Prediction API website.

The only other bad news is that this is a closed trial. The Prediction API is being offered as a preview to a limited number of developers. There is no charge for using the service during the preview, but it is difficult to see why Google would provide such a service for free in the future. A more worrying point is that, as the details of the model are never released to the end user, Google in principle has more access to your results than you do. Without detailed performance metrics and graphics showing how the algorithm is working, it can be difficult to know why a classification isn't working well. Often a small tweak, like a transform on an input variable, can turn a poor model into a good one. The black box approach may simplify things, but it also means you have to put your trust in Google.
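As an illustration of the kind of input transform mentioned above, the following Python sketch applies a log transform to a heavily skewed numeric column, such as a count of previous posts, before the rows are written out as training data. This is a generic preprocessing step done on your side, not something the Prediction API is known to do, and the row layout and function name are assumptions made for the example.

```python
import math


def log_transform_column(rows, column):
    """Return copies of the rows with log(1 + x) applied to one numeric column.

    log1p is used so that a count of zero maps to 0.0 instead of failing.
    """
    transformed = []
    for row in rows:
        new_row = list(row)
        new_row[column] = math.log1p(new_row[column])
        transformed.append(tuple(new_row))
    return transformed


# Hypothetical rows: label first, then the number of previous posts.
raw_rows = [("spam", 0), ("ok", 999)]
scaled_rows = log_transform_column(raw_rows, column=1)
```

With a black box service you can only judge whether such a tweak helped by retraining and comparing predictions, which is exactly the limitation described above.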

 


Perhaps this is the future of commercial AI - black box services provided by big companies.

To learn more and sign up for an invitation, please join the waitlist.

 




Last Updated ( Thursday, 16 September 2010 )