NLUlite – An NLP Database
Written by Kay Ewbank   
Thursday, 11 September 2014

A new natural language parsing database that reads English texts and can then answer questions about them has been released as a public alpha.

NLUlite has been created to be developer friendly, and consists of a server and a Python client. You use it by passing texts to it. Each text is first tagged using the tag frequencies provided in the Open American National Corpus (OANC). Sentences are then parsed using parsing frequencies extracted from the OANC. A “distance” between words is obtained from the WordNet corpus (3.1). The parsing is then improved by preferring the readings of each sentence that make more sense according to the FrameNet dataset.
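To give a feel for the first step, here is a minimal sketch of frequency-based tagging: each word receives the tag it was seen with most often in a tagged corpus. This is not NLUlite's code. The tiny corpus, the tag names and the function names are all made up for illustration, and the real system draws its counts from the OANC.

```python
from collections import Counter, defaultdict

# Toy tagged corpus standing in for OANC-derived counts (invented for this sketch).
TAGGED_SENTENCES = [
    [("snakes", "NOUN"), ("live", "VERB"), ("in", "ADP"), ("deserts", "NOUN")],
    [("the", "DET"), ("live", "ADJ"), ("snake", "NOUN"), ("moves", "VERB")],
    [("most", "ADJ"), ("snakes", "NOUN"), ("live", "VERB"), ("alone", "ADV")],
]


def build_tag_frequencies(sentences):
    """Count how often each (lower-cased) word appears with each tag."""
    freqs = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            freqs[word.lower()][tag] += 1
    return freqs


def tag_words(words, freqs, default_tag="NOUN"):
    """Give each word its most frequent tag; unseen words get default_tag."""
    tagged = []
    for word in words:
        counts = freqs.get(word.lower())
        best = counts.most_common(1)[0][0] if counts else default_tag
        tagged.append((word, best))
    return tagged


FREQS = build_tag_frequencies(TAGGED_SENTENCES)
result = tag_words(["Snakes", "live", "underground"], FREQS)
print(result)
```

In the toy corpus "live" appears twice as a verb and once as an adjective, so the verb tag wins; "underground" was never seen and falls back to the default tag.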

As an example of the way it works, if you pass it the text of the Wikipedia article about snakes, it can then answer questions such as:

what are the snakes able to do?

where do most of the snakes live?

what animal has no limbs?

 

Texts can include simple inference rules such as “If an animal has no limbs it cannot walk”, after which you (or a subsequent user) could ask “what does not walk” and get an answer based on both the submitted text and the inference rules you’ve supplied.
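The effect of such a rule can be sketched in a few lines of Python. This is only an illustration of the idea, under the assumption that facts are held as simple (subject, relation, object) triples; NLUlite's internal representation and rule syntax are not described in the announcement, and every name below is hypothetical.

```python
# Facts as (subject, relation, object) triples; purely illustrative.
facts = {
    ("snake", "is_a", "animal"),
    ("snake", "has_no", "limbs"),
    ("dog", "is_a", "animal"),
}


def apply_no_limbs_rule(known):
    """If X is an animal and X has no limbs, infer that X cannot walk."""
    derived = set(known)
    for subject, relation, obj in known:
        if (
            relation == "has_no"
            and obj == "limbs"
            and (subject, "is_a", "animal") in known
        ):
            derived.add((subject, "cannot", "walk"))
    return derived


def what_does_not_walk(known):
    """Answer the question 'what does not walk' from the derived facts."""
    return sorted(s for s, r, o in known if r == "cannot" and o == "walk")


knowledge = apply_no_limbs_rule(facts)
answer = what_does_not_walk(knowledge)
print(answer)
```

The dog is an animal but has no "has_no limbs" fact, so only the snake ends up in the answer.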

 

Data sources can include web pages and RSS feeds. The data is kept as objects of the Wisdom class. Your code can set up many Wisdom objects, and each one is a separate knowledge base. Currently, you can only use NLUlite to parse texts that are smaller than a megabyte, though the developer plans to raise this limit in future versions. Once the text is parsed, the information is stored as XML.
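The sketch below illustrates the two points above: independent knowledge bases, and a size check before a text is submitted. It does not use the NLUlite client. `ToyKnowledgeBase` is a hypothetical stand-in, and reading "a megabyte" as 1,000,000 bytes of UTF-8 is an assumption.

```python
MAX_TEXT_BYTES = 1_000_000  # assumption: "a megabyte" taken as 10**6 bytes


class ToyKnowledgeBase:
    """Hypothetical stand-in for one knowledge base; not the NLUlite API."""

    def __init__(self, name):
        self.name = name
        self.texts = []

    def add_text(self, text):
        """Store a text only if its UTF-8 size is under the assumed limit."""
        size = len(text.encode("utf-8"))
        if size >= MAX_TEXT_BYTES:
            raise ValueError(
                f"text is {size} bytes; it must be smaller than {MAX_TEXT_BYTES}"
            )
        self.texts.append(text)


# Two separate knowledge bases: adding to one leaves the other untouched.
snakes = ToyKnowledgeBase("snakes")
birds = ToyKnowledgeBase("birds")
snakes.add_text("Snakes are elongated, legless reptiles.")

try:
    birds.add_text("x" * MAX_TEXT_BYTES)  # exactly at the limit, so rejected
    oversized_rejected = False
except ValueError:
    oversized_rejected = True

print(len(snakes.texts), len(birds.texts), oversized_rejected)
```

Checking the size on the client side avoids sending a text that would be refused anyway; a longer document would need to be split into smaller pieces first.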

NLUlite is available in a single-threaded free version, or in a commercial multi-threaded version that parses pages much faster.

While there are a number of natural language projects, such as the work of the Stanford Natural Language Processing Group and the Natural Language Toolkit (NLTK), this field is still developing.

More Information

NLUlite

Related Articles

Handbook of Natural Language Processing, 2nd Ed (book review)

Taming Text (book review)

 


Last Updated ( Thursday, 11 September 2014 )