NodeJS and NLP

I recently read about an NLP (Natural Language Processing) library for NodeJS. The library is called natural, and it is a general natural language facility for NodeJS. Tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, and some inflections are currently supported.
Several posts describe this library, among them Chris Umbel’s blog and webdesignerdepot. In this post, I will summarize what I learned from them.
Let’s start.


Installation

The installation is very simple. You can use npm, the NodeJS package manager, as follows:

npm install natural

If you prefer the GitHub version, you can clone the repository and install from it:

git clone git://
cd natural
npm install .

Let’s look at some simple functions. First of all, you have to include the library:

var nlp = require('natural');


Tokenizers

Let’s start with the tokenizer. What is a tokenizer? A sentence is made up of words (also known as tokens), and a tokenizer splits a string into those words. The simplest tokenizer is the WordTokenizer:

var tokenizer = new nlp.WordTokenizer();
console.log(tokenizer.tokenize("This sentence is very short. It is ok."));

It splits on anything except alphabetic characters, digits, and underscores. The result is the following:

[ 'This', 'sentence', 'is', 'very', 'short', 'It', 'is', 'ok' ]
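
The splitting rule described above can be approximated in plain JavaScript with a regular expression. This is only a sketch of the rule as stated in this post, not the library's own source, and the helper name simpleWordTokenize is hypothetical:

```javascript
// Hypothetical helper: approximates the rule "split on anything except
// alphabetic characters, digits, and underscores".
// This is an illustration, not the library's implementation.
function simpleWordTokenize(text) {
  return text
    .split(/[^A-Za-z0-9_]+/) // every run of other characters acts as a separator
    .filter(Boolean);        // drop empty strings left at the start or end
}

console.log(simpleWordTokenize("This sentence is very short. It is ok."));
```

On the example sentence the sketch yields the same eight tokens as the array shown above, because the spaces and the dots are all treated as separators.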

Another is the WordPunctTokenizer, which splits on anything except alphabetic characters, digits, punctuation, and underscores. Applied to the previous example:

var wordPunctTokenizer = new nlp.WordPunctTokenizer();
console.log(wordPunctTokenizer.tokenize("This sentence is very short. It is ok."));

and the result is:

[ 'This', 'sentence', 'is', 'very', 'short', '.', 'It', 'is', 'ok', '.' ]

As you can see, the second tokenizer also keeps the dot ‘.’ as a token in the array.
There are other tokenizers, some of them for specific languages. For example, for Italian, it is possible to use the AggressiveTokenizerIt.
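
The difference between the two tokenizers can be sketched in plain JavaScript by matching tokens instead of splitting on separators. The helper name simpleWordPunctTokenize is hypothetical, and the pattern is an assumption that only approximates the behavior described here (for instance, it groups consecutive punctuation marks into one token):

```javascript
// Hypothetical helper: keeps punctuation as tokens instead of discarding it.
// A token is either a run of letters/digits/underscores, or a run of
// characters that are neither word characters nor whitespace.
function simpleWordPunctTokenize(text) {
  return text.match(/[A-Za-z0-9_]+|[^A-Za-z0-9_\s]+/g) || [];
}

console.log(simpleWordPunctTokenize("This sentence is very short. It is ok."));
```

Whitespace is the only thing thrown away in this sketch, which is why the two dots survive as separate entries.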

String Distance

Another interesting feature is string distance. The library supports several kinds of distance, such as Hamming distance, Jaro-Winkler, Levenshtein distance, and the Dice coefficient.
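
To give an idea of what two of these measures compute, here is a minimal plain JavaScript sketch of the textbook definitions of Hamming distance and the Dice coefficient. The function names are hypothetical, and the library's own functions may differ in details such as case handling or repeated bigrams:

```javascript
// Hamming distance: number of positions at which two strings of equal
// length differ.
function hammingDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error("Hamming distance needs strings of equal length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Set of adjacent character pairs (bigrams) in a string.
function bigrams(s) {
  const pairs = new Set();
  for (let i = 0; i < s.length - 1; i++) {
    pairs.add(s.slice(i, i + 2));
  }
  return pairs;
}

// Dice coefficient: 2 * |shared bigrams| / (|bigrams of a| + |bigrams of b|).
// Returns a value between 0 (nothing shared) and 1 (same bigram sets).
function diceCoefficient(a, b) {
  const x = bigrams(a.toLowerCase());
  const y = bigrams(b.toLowerCase());
  if (x.size + y.size === 0) return 0; // sketch choice: no bigrams to compare
  let shared = 0;
  for (const pair of x) {
    if (y.has(pair)) shared++;
  }
  return (2 * shared) / (x.size + y.size);
}

console.log(hammingDistance("Davide", "Divide")); // 1
console.log(diceCoefficient("night", "nacht"));   // 0.25
```

Note that a distance grows as strings become more different, while the Dice coefficient is a similarity and grows as they become more alike.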

For example, we can compute the LevenshteinDistance between “Davide” and “Divide”:


The result is 1, because replacing the “a” with an “i” is the only edit needed to turn one string into the other.
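
With the library, this is presumably a call such as nlp.LevenshteinDistance("Davide", "Divide") (an assumption based on the function name mentioned above). To see where the value comes from, here is a plain JavaScript sketch of the classic dynamic-programming algorithm, with every insertion, deletion, and substitution costing 1. The function name levenshtein is hypothetical and this is not the library's code:

```javascript
// Classic Levenshtein distance, computed row by row.
// Assumption: insertion, deletion, and substitution all cost 1.
function levenshtein(source, target) {
  // previous[j] = distance between the empty prefix of source
  // and the first j characters of target
  let previous = Array.from({ length: target.length + 1 }, (_, j) => j);

  for (let i = 1; i <= source.length; i++) {
    // current[0] = cost of deleting the first i characters of source
    const current = [i];
    for (let j = 1; j <= target.length; j++) {
      const sameChar = source[i - 1] === target[j - 1];
      const substitution = previous[j - 1] + (sameChar ? 0 : 1);
      const deletion = previous[j] + 1;
      const insertion = current[j - 1] + 1;
      current[j] = Math.min(substitution, deletion, insertion);
    }
    previous = current;
  }
  return previous[target.length];
}

console.log(levenshtein("Davide", "Divide")); // 1
```

Only two rows of the table are kept in memory at a time, which is enough because each cell depends only on the current and the previous row.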


Classification

Another interesting feature is classification. The two classifiers covered here are Naive Bayes and logistic regression. We can start with the BayesClassifier:

var classifier = new nlp.BayesClassifier();

Training the model is very simple: add the annotated documents, then call the train method. Here I add documents for two categories, television and radio:

classifier.addDocument('I like television', 'television');
classifier.addDocument('I hate tv-series', 'television');
classifier.addDocument('Listen to the radio', 'radio');
classifier.addDocument('Change the radio program', 'radio');
// Train on the added documents before classifying anything
classifier.train();

To predict the label of a new sentence, use the classify method:

console.log(classifier.classify('See television'));

The output is the label television.
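
To make the idea behind a Naive Bayes classifier more concrete, here is a small self-contained sketch in plain JavaScript. It is a simplified multinomial Naive Bayes with add-one smoothing, not the library's implementation (which, for example, may stem words and weigh features differently). The class name TinyNaiveBayes and all of its methods are hypothetical:

```javascript
// Hypothetical, simplified multinomial Naive Bayes with add-one smoothing.
class TinyNaiveBayes {
  constructor() {
    this.docCounts = new Map();  // label -> number of training documents
    this.wordCounts = new Map(); // label -> Map(word -> count)
    this.totalWords = new Map(); // label -> total number of words seen
    this.vocabulary = new Set(); // every distinct word seen in training
    this.totalDocs = 0;
  }

  // Lowercase the text and split on anything that is not a word character.
  static tokenize(text) {
    return text.toLowerCase().split(/[^a-z0-9_]+/).filter(Boolean);
  }

  addDocument(text, label) {
    if (!this.docCounts.has(label)) {
      this.docCounts.set(label, 0);
      this.wordCounts.set(label, new Map());
      this.totalWords.set(label, 0);
    }
    this.docCounts.set(label, this.docCounts.get(label) + 1);
    this.totalDocs++;
    const counts = this.wordCounts.get(label);
    for (const word of TinyNaiveBayes.tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.totalWords.set(label, this.totalWords.get(label) + 1);
      this.vocabulary.add(word);
    }
  }

  // log P(label) + sum of log P(word | label), with add-one smoothing so
  // that unseen words do not force the probability to zero.
  score(text, label) {
    let logProb = Math.log(this.docCounts.get(label) / this.totalDocs);
    const counts = this.wordCounts.get(label);
    const denominator = this.totalWords.get(label) + this.vocabulary.size;
    for (const word of TinyNaiveBayes.tokenize(text)) {
      logProb += Math.log(((counts.get(word) || 0) + 1) / denominator);
    }
    return logProb;
  }

  // Return the label with the highest score, or null if nothing was added.
  classify(text) {
    let best = null;
    let bestScore = -Infinity;
    for (const label of this.docCounts.keys()) {
      const s = this.score(text, label);
      if (s > bestScore) {
        best = label;
        bestScore = s;
      }
    }
    return best;
  }
}

const tiny = new TinyNaiveBayes();
tiny.addDocument('I like television', 'television');
tiny.addDocument('I hate tv-series', 'television');
tiny.addDocument('Listen to the radio', 'radio');
tiny.addDocument('Change the radio program', 'radio');
console.log(tiny.classify('See television'));
```

In this sketch the word "television" appears only in the television documents, so that label gets the higher score for 'See television', while the unseen word "see" is handled by the smoothing term.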

To use the LogisticRegressionClassifier, replace BayesClassifier with LogisticRegressionClassifier:

var classifier2 = new nlp.LogisticRegressionClassifier();
classifier2.addDocument('I like television', 'television');
classifier2.addDocument('I hate tv-series', 'television');
classifier2.addDocument('Listen to the radio', 'radio');
classifier2.addDocument('Change the radio program', 'radio');
// Train on the added documents before classifying anything
classifier2.train();
console.log(classifier2.classify('See television'));

Moreover, the library also offers a Maximum Entropy classifier.
The library has many other features, such as stemmers, sentiment analysis, and phonetics. Just one hint: try it.
