Handbook of Natural Language Processing (2e)
Author: Nitin Indurkhya & Fred J. Damerau (Editors)
In looking for a Best Book selection in Artificial Intelligence, Mike James selected a highly readable collection of papers on Natural Language Processing.
This is an academic "handbook" on natural language processing. You might guess that it is going to be another boring collection of difficult-to-read papers - only you would be wrong! If you need a readable introduction to this important subject - this is it.
Natural Language Processing (NLP) is something of a hot topic at the moment because of the success of the statistical approach. Old-style NLP concentrated on syntax and parsing, with a bit of modeling for semantic content thrown in. One of the problems of working in the field was simply finding enough examples of language in machine-readable form. What existed was usually not natural language as it is spoken, written or used, but a specialist dialect used for technical documents and the like. Now, of course, we have the Internet, and just about everyone types long and short chunks of naturalistic text into various chat and messaging applications - not to mention the huge database that is the web proper. All of this data makes it possible to try new approaches based on the statistical properties of language and to test methods using the large amount of material available.
So should you jump in, find out about the new statistical approach and simply forget the syntax and parsing? No - it's essential that you don't throw away all that has been developed. You do need to know the basics of the earlier approaches, and this handbook covers both the old and the new. It is a collection of essays which are mostly self-contained. As they are by different authors, the voices and quality of the contributions vary, but in the main it is all high quality and very readable.
You will, of course, need to be happy with maths - production rules, finite state machines and stats - but mostly the maths is explained as it is needed.
Part I deals with classical approaches. After an overview we have essays on each of the standard steps in NLP - preprocessing, lexical analysis, parsing, semantic analysis and language generation. If you read all of the chapters you will be ready to move on to the statistical approach described in Part II.
Part II lacks an overview, which ought to be added in any third edition. It launches into the topic with an essay on corpus creation - corpus is the jargon word for a database of natural language that can be used for statistical and empirical NLP. From here we move into sometimes specialist territory - treebank annotation, part-of-speech tagging, web distance and word similarity, alignment, disambiguation and so on. There are also some good overviews of particular topics - fundamental statistical techniques, statistical parsing, speech recognition and statistical machine translation.
Part III is all about applications. For many it will be the least useful part of the book, but for some it will provide evidence that the techniques work. The topics covered include: Chinese machine translation, information retrieval, question answering, information extraction, report generation, ontology, health care, text mining and sentiment analysis.
This is a good way to get into NLP. You will probably need additional, more specialized, texts to guide your next steps, but this does provide a basic course on the subject suitable for both academic and practical development. Highly recommended.
Last Updated (Thursday, 30 December 2010)