Taming Text

Authors: Grant S. Ingersoll, Thomas S. Morton and Andrew L. Farris
Publisher: Manning
Pages: 320
ISBN: 978-1933988382
Audience: Java programmers interested in processing text
Rating: 4.5
Reviewer: Alex Armstrong

What do you think a book called "Taming Text" is all about? 

It could be about Unicode or advanced regular expressions or ...

It is important to note that these core text technologies are not what this book is about. It is about the task of working with text in a semi-intelligent way.

It is about searching and organizing text in a way that makes sense to a human. This is a big task and goes well beyond explaining how text is represented in a given programming language. It heads in the direction of Artificial Intelligence (AI) but without needing the complete understanding that such text processing might seem to need. It more or less fits into the category of Natural Language Processing (NLP). In general the methods used in current NLP are statistical and not based on any understanding of what the text means.




Chapter 1 starts off by setting the scene - why you might need this sort of text processing. If you are already an NLP enthusiast you probably don't need to read it, but it gets you started nice and easy.

Chapter 2 is where things really get into gear. It explains the workings of language, or at least of the English language, by working its way through the useful levels of looking at text and providing labels for the different parts of speech. Rather than just being a theory lesson, it also points you in the direction of resources that you can use, for example, to identify parts of speech. It also discusses the problem of actually reading in the text from files in different formats using the first of the many open source programs discussed in the book, Apache Tika.




Chapter 3 deals with the problems of intelligent search using Apache Solr. It is a basic introduction to Solr: how to get it set up and how to customize and optimize it. Chapter 4 moves on to the problems of fuzzy string matching, starting with some of the measures of similarity that you can work out. The ideas are implemented with reference to Solr in particular.
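
To give a flavor of what a string similarity measure looks like, here is a minimal Java sketch of Levenshtein edit distance, one common measure of this kind. It is not taken from the book and does not use Solr; the class name `EditDistanceSketch` and the length-normalized `similarity` helper are assumptions made purely for illustration.

```java
// Illustrative sketch only: not the book's code and not Solr's implementation.
class EditDistanceSketch {

    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i chars of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: scales the distance into [0, 1], where 1 means identical.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
        System.out.println(similarity("taming", "timing"));
    }
}
```

A fuzzy matcher built on this idea would typically accept a candidate when the similarity exceeds some threshold; a sensible threshold depends on the data and is something you would have to tune.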

Chapter 5 is called "Identifying people, places and things" and it discusses the named entity recognition problem. This is our first introduction to OpenNLP. Next, in Chapter 6, we find out about clustering text using a range of methods and tools, including Carrot2 and Mahout to implement k-means. Chapter 7 extends this to classification using Lucene.

In Chapter 8 we discover what the object of the entire exercise has been: it details the implementation of an example question-answering system. To find out much about it you are going to have to run the code provided at the book's website.

The final chapter considers the future of the technology including a quick look at working with other languages, sentiment analysis and the long term goal of semantic analysis. 

This is not a textbook nor is it a research monograph. It is aimed at programmers who need to understand enough about NLP to build an intelligent question-answering system or similar. You will learn the theory as you go along, but it is all explained in fairly plain language and via programming examples. You will need to program in Java, and the tools are in the main Java oriented. If you are not a Java programmer you can understand the ideas presented but you will probably struggle to get the examples working. The book is also based on open source tools that are part of the Java ecosystem - for example Solr, Lucene, Tika, Mahout and so on. If you plan to use other tools or another language then the book will be of less use.

Don't expect the book to show you how to implement complete text understanding, or to show you how to build a system like IBM's Watson question-answering machine. It gives you a very good and very practical overview of what you can achieve fairly easily and with moderate resources.

It is a good Java-oriented introduction to NLP and as such recommended. 




Last Updated ( Wednesday, 04 December 2013 )
