NLUlite – An NLP Database
Written by Kay Ewbank   
Thursday, 11 September 2014

A new natural language parsing database that reads English texts and can then answer questions about them has been released as a public alpha.

NLUlite has been created to be developer friendly, and consists of a server and a Python client. You use it by passing texts to it. The text is tagged using the tag frequencies provided in the Open American National Corpus (OANC). Sentences are then parsed using parsing frequencies extracted from the OANC. A “distance” between words is obtained using WordNet (3.1). The parsing is then improved by choosing the sentences that make more sense according to the FrameNet dataset.
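To make the idea of tagging by tag frequencies concrete, here is a minimal sketch in Python of a most-frequent-tag tagger. It is not NLUlite's implementation: the tiny `TAGGED_CORPUS`, the tag names, and the functions `build_tag_frequencies` and `tag_sentence` are all hypothetical, standing in for frequencies that a real system would extract from a corpus such as the OANC.

```python
from collections import Counter, defaultdict

# Hypothetical toy data standing in for a real tagged corpus such as the OANC.
TAGGED_CORPUS = [
    ("snakes", "NOUN"), ("live", "VERB"), ("in", "ADP"), ("trees", "NOUN"),
    ("snakes", "NOUN"), ("can", "AUX"), ("swim", "VERB"),
    ("live", "VERB"), ("live", "ADJ"),
    ("most", "ADJ"), ("snakes", "NOUN"),
]


def build_tag_frequencies(tagged_words):
    """Count how often each word appears with each tag."""
    freqs = defaultdict(Counter)
    for word, tag in tagged_words:
        freqs[word.lower()][tag] += 1
    return freqs


def tag_sentence(sentence, freqs, default_tag="NOUN"):
    """Give each token its most frequent tag; unseen words get default_tag."""
    tagged = []
    for token in sentence.lower().split():
        counts = freqs.get(token)
        tag = counts.most_common(1)[0][0] if counts else default_tag
        tagged.append((token, tag))
    return tagged


FREQS = build_tag_frequencies(TAGGED_CORPUS)
print(tag_sentence("Snakes live in trees", FREQS))
```

In this toy corpus "live" is seen twice as a verb and once as an adjective, so the tagger picks the verb reading. A real tagger would also take the surrounding words into account.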

As an example of the way it works, if you pass it the text from Wikipedia about snakes, it would then be able to answer questions such as:

what are the snakes able to do?

where do most of the snakes live?

what animal has no limbs?

 

Texts can include simple inference rules such as “If an animal has no limbs it cannot walk”, after which you (or a subsequent user) could ask “what does not walk”, and get an answer given in terms of the text submitted and the inference rules you’ve given.
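The way a rule of this kind can combine with stored facts to answer a question can be sketched in a few lines of Python. This is an illustration of simple forward chaining under assumed data structures, not a description of how NLUlite works internally: the `(subject, predicate)` fact format and the names `forward_chain`, `ask`, `FACTS` and `RULES` are all hypothetical.

```python
def forward_chain(facts, rules):
    """Apply (premise, conclusion) rules repeatedly until no new facts appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # Iterate over a snapshot because derived grows inside the loop.
            for subject, predicate in list(derived):
                if predicate == premise and (subject, conclusion) not in derived:
                    derived.add((subject, conclusion))
                    changed = True
    return derived


def ask(derived, predicate):
    """Return every subject for which the predicate holds, in sorted order."""
    return sorted(subject for subject, pred in derived if pred == predicate)


# Hypothetical facts, as if extracted from a text, plus one inference rule:
# "If an animal has no limbs it cannot walk".
FACTS = {("snake", "has no limbs"), ("lizard", "has limbs")}
RULES = [("has no limbs", "cannot walk")]

KNOWLEDGE = forward_chain(FACTS, RULES)
print(ask(KNOWLEDGE, "cannot walk"))
```

Here the question "what does not walk" corresponds to looking up the derived predicate "cannot walk", which holds only for the subject that matched the rule's premise.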

 

Data sources can include web pages and RSS feeds. The data is kept as objects of the Wisdom class. Your code can set up many Wisdom objects, and each one is a separate knowledge base. Currently, you can only use NLUlite to parse texts that are smaller than a megabyte, though the developer plans to raise this limit in future versions. Once the text is parsed, the information is stored as XML.

NLUlite is available in a single-threaded free version, or in a commercial multi-threaded version that parses pages much faster.

While there are a number of natural language projects, such as those of the Stanford Natural Language Processing Group and the Natural Language Toolkit (NLTK), this field is still developing.

More Information

NLUlite

Related Articles

Handbook of Natural Language Processing, 2nd Ed (book review)

Taming Text (book review)

 


 

Last Updated ( Thursday, 11 September 2014 )