Natural Language Processing Demystified

Written by Nikos Vaggalis

Friday, 30 December 2022

...is the title of a free, self-paced and comprehensive course that will take you from beginner to expert in this topic. With 15 modules, it provides a solid grounding in NLP covering everything from the very basics to today's advanced models and techniques.

Natural Language Processing, or NLP, is a subfield of Artificial Intelligence that aims to make computers understand natural languages such as English. So why invest in learning NLP in the first place?

NLP tries to make sense of textual data, which is much more difficult than doing the same with numerical data. Its applications are everywhere because people communicate almost everything in language: web search, advertising, emails, customer service, language translation, virtual agents, medical reports and so on. Many organizations are looking to integrate NLP into their workflows and into the products they provide, such as translation, speech recognition and chatbots. Sounds like a good career move.

Specifically, NLP is used today in products like:

voice-driven assistants
natural-language search
question answering
sentiment analysis for automated trading
business intelligence
social media analytics
content summarization

This includes Amazon Alexa, Google Home Assistant, Cortana and Siri, to name just a few implementations.

Knowledge of this science can help build strong and long-lasting careers, so courses like this one can prove very valuable when starting out in the field.

In this course you'll learn:

The fundamental concepts and algorithms of NLP.
The fundamentals of machine learning.
How neural networks work, by building one from scratch.
The popular deep learning methods used in production today.
How to use a variety of popular tools and libraries to go from raw data to a practical, working model.
How to leverage sophisticated pre-trained models for your own projects.
How to accomplish common NLP tasks including extracting key information, document search, text similarity, text classification, finding topics in documents, summarization, translation, generating text, and question answering.
Enough advanced knowledge to keep up with new developments.

In detail, the chapters are:

1. Introduction

2. Tokenization
The usual first step in NLP is to chop our documents into smaller pieces in a process called Tokenization. We'll look at the challenges involved and how to get it done.
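
To give a feel for what this step involves, here is a minimal regex-based tokenizer in Python. It is only a sketch and is not taken from the course material: the `tokenize` function and the pattern are assumptions made for illustration, and real tokenizers handle many more edge cases (URLs, hyphenation, non-English scripts).

```python
import re

# Hypothetical pattern: a word (optionally with one internal apostrophe part)
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Don't panic, it's only NLP!"))
# ["Don't", 'panic', ',', "it's", 'only', 'NLP', '!']
```

Even this small example shows one of the challenges: whether "Don't" should stay as one token or be split is a design decision, not a given.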

3. Basic Preprocessing
Depending on our goal, we may preprocess text further. We'll cover case-folding, stop word removal, stemming, and lemmatization. We'll go over their use cases and their tradeoffs.
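
As a rough illustration of two of these steps, case-folding and stop word removal, consider the following Python sketch. The `preprocess` function and the tiny stop word list are hypothetical; stemming and lemmatization are left out because in practice they usually rely on a dedicated library.

```python
# Hypothetical, deliberately tiny stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def preprocess(tokens):
    """Case-fold every token, then drop the ones that are stop words."""
    folded = [token.casefold() for token in tokens]
    return [token for token in folded if token not in STOP_WORDS]


print(preprocess(["The", "Course", "is", "FREE", "and", "self-paced"]))
# ['course', 'free', 'self-paced']
```

The tradeoff is visible even here: the text becomes more uniform, but information such as capitalization (useful, for instance, when spotting names) is lost.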

4. Advanced Preprocessing
We'll look at tagging our tokens with useful information including part-of-speech tags and named entity tags. We'll also explore different types of sentence parsing to help extract the meaning of a sentence.

5. Measuring Document Similarity With Basic Bag-of-Words
To perform calculations or use machine learning algorithms, we need to first turn our text into numbers. We'll take our first step here by looking at the simplest representation possible, then look at how to perform document similarity.
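
The idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the course's code: each document becomes a bag of word counts, and cosine similarity is assumed here as the similarity measure. The function names are made up for the example.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Represent a tokenized document as a mapping of word -> count."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


doc1 = bag_of_words("the cat sat on the mat".split())
doc2 = bag_of_words("the cat lay on the rug".split())
doc3 = bag_of_words("stock prices fell sharply".split())

print(cosine_similarity(doc1, doc2))  # 0.75
print(cosine_similarity(doc1, doc3))  # 0.0
```

Note that word order plays no part: the representation only records which words occur and how often.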

6. Simple Document Search With TF-IDF
We'll consider the shortcomings of the basic bag-of-words approach, then improve our vectors with TF-IDF and use it for document search.
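
To make the weighting concrete, here is a small Python sketch of one common, unsmoothed TF-IDF variant: term frequency is the share of the document taken up by the term, and inverse document frequency is the logarithm of the number of documents divided by the number of documents containing the term. The `tf_idf` function is hypothetical, and libraries often use smoothed or normalized variants that produce different numbers.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the cat purred".split(),
]
for vector in tf_idf(docs):
    print(vector)
```

In this toy corpus "the" appears in every document, so its weight drops to zero, while a word unique to one document, such as "sat", receives the highest weight there.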

7. Building Models: Finding Patterns for Fun and Profit
Through a high-level overview of modelling, we'll look at the different types of machine learning, how to evaluate model performance, and what to do when things go wrong.

8. Naive Bayes: Fast and Simple Text Classification
We'll learn how the Naive Bayes classifier works under the hood, see how accuracy can go wrong and how to use precision and recall instead, and then build a text classifier while working through problems along the way.
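
Why accuracy can mislead is easiest to see on imbalanced data. The Python sketch below uses made-up labels and a hypothetical `evaluate` helper, not anything from the course: a "classifier" that never predicts the rare class still scores 95% accuracy, while precision and recall expose that it finds nothing.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # By convention here, an undefined ratio (0/0) is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up imbalanced data: 1 = spam, 0 = not spam.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # always predicts "not spam"

print(evaluate(y_true, y_pred))  # (0.95, 0.0, 0.0)
```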

9. Topic Modelling: Automatically Discovering Topics in Documents
What do you do when you need to make sense of a pile of documents and have no other information? We'll learn one approach to this problem using Latent Dirichlet Allocation. We'll cover how it works, then build a model to discover topics present in a document and to search for similar documents.

10. Neural Networks I: Core Mechanisms and Coding One From Scratch
In this module, we'll cover how neural networks work and how they "learn" to get better over time. To further ground our knowledge, we'll build a neural network from scratch. By the end of this module, you'll have a clear understanding of their core mechanisms.
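
As a taste of what "from scratch" can mean, the following Python sketch trains a single sigmoid neuron, the smallest possible network, to reproduce a logical OR using plain gradient descent. It is an assumption-laden illustration rather than the course's implementation: the function names, learning rate and number of epochs are arbitrary choices, and a real network would stack many such units in layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=2000, learning_rate=0.5):
    """Fit two weights and a bias with per-sample gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            prediction = sigmoid(w[0] * x1 + w[1] * x2 + b)
            # For cross-entropy loss, the gradient with respect to the
            # pre-activation is simply (prediction - target).
            error = prediction - target
            w[0] -= learning_rate * error * x1
            w[1] -= learning_rate * error * x2
            b -= learning_rate * error
    return w, b


OR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_neuron(OR_GATE)
predictions = [
    round(sigmoid(w[0] * x1 + w[1] * x2 + b)) for (x1, x2), _ in OR_GATE
]
print(predictions)  # [0, 1, 1, 1]
```

The "learning" is nothing more than repeatedly nudging the weights in the direction that reduces the error on each example.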

11. Neural Networks II: Effective Training Techniques
In this module, we'll learn how to set learning rates, how to deal with complex loss surfaces, regularization techniques to fight overfitting, and more. We'll also return to NLP by building a basic deep learning model for text classification. This will conclude our two-part deep dive into neural networks and set us up for the rest of the course.

12. Word Vectors
We'll learn how to vectorize words such that words with similar meanings have closer vectors (aka "embeddings"). This was a breakthrough in NLP and boosted performance on a variety of NLP problems while addressing the shortcomings of previous approaches. We'll look at how to create these word embeddings and how to use them in our models.

13. Recurrent Neural Networks and Language Models
How do you get a computer to generate coherent text? In this module, we'll learn how to do this using a technique called recurrence. We'll go beyond the bag-of-words approaches we've seen so far, and start capturing the information in word order. We'll then learn how to build two new types of models: a part-of-speech tagger, and a language model to generate text.

14. Sequence-to-Sequence and Attention
Whether it's translation, summarization, or even answering questions, a lot of NLP tasks come down to transforming one type of sequence into another. In this module, we'll learn to do that using encoders and decoders. We'll then look at the weaknesses of the standard approach, and enhance our model with Attention. In the demo, we'll build a model to translate languages for us.
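
The core of attention fits in a short Python sketch. The version below assumes plain dot-product scoring between a decoder state and each encoder state; the course may well use a different scoring function, and the names `attend`, `encoder_states` and `decoder_state` are invented for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(query, encoder_states):
    """Weight each encoder state by its dot product with the query."""
    scores = [
        sum(q * h for q, h in zip(query, state)) for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted sum of the encoder states.
    context = [
        sum(weight * state[i] for weight, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

Instead of squeezing the whole input into one fixed vector, the decoder gets a fresh context vector at every step, dominated by the encoder states most relevant to what it is currently producing.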

15. Transformers From Scratch, Pre-Training, and Transfer Learning
Transformers have revolutionized deep learning. In this module, we'll learn how they work in detail and build one from scratch. We'll then explore how to leverage state-of-the-art models for our projects through pre-training and transfer learning. We'll learn how to fine-tune models from Hugging Face and explore the capabilities of GPT from OpenAI. Along the way, we'll tackle a new task for this course: question answering.

Coding-wise, it involves performing practical NLP tasks, covering data preparation, model training and testing, and using various popular tools. As usual, everything is hosted as Jupyter notebooks on Google Colab.

The pre-requisite barrier is low; you should be comfortable with Python and a bit of high school math. No previous knowledge of NLP or machine learning is assumed.

You can take the course on its official site, but it can also be consumed as a YouTube playlist. My recommendation is to follow the official site, as it's better organized and links to the Colab notebooks when they are available. The work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

More Information

Natural Language Processing Demystified
YouTube playlist

Take Stanford's Natural Language Understanding For Free
Take Stanford's Natural Language Processing with Deep Learning For Free

Last Updated ( Friday, 30 December 2022 )