
Stemmers

Currently, stemming is supported via the Porter and Lancaster (Paice/Husk) algorithms. The Indonesian and Japanese stemmers do not follow a known algorithm.

var natural = require('natural');

This example uses a Porter stemmer. “word” is returned.

console.log(natural.PorterStemmer.stem("words")); // stem a single word

In Russian:

console.log(natural.PorterStemmerRu.stem("падший"));

In Spanish:

console.log(natural.PorterStemmerEs.stem("jugaría"));

The following stemmers are available:

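| Language            | Porter | Lancaster | Other | Module           |
|---------------------|--------|-----------|-------|------------------|
| Dutch               | X      |           |       | PorterStemmerNl  |
| English             | X      |           |       | PorterStemmer    |
| English             |        | X         |       | LancasterStemmer |
| Farsi (in progress) | X      |           |       | PorterStemmerFa  |
| French              | X      |           |       | PorterStemmerFr  |
| French              |        |           | X     | CarryStemmerFr   |
| German              | X      |           |       | PorterStemmerDe  |
| Indonesian          |        |           | X     | StemmerId        |
| Italian             | X      |           |       | PorterStemmerIt  |
| Japanese            |        |           | X     | StemmerJa        |
| Norwegian           | X      |           |       | PorterStemmerNo  |
| Portuguese          | X      |           |       | PorterStemmerPt  |
| Russian             | X      |           |       | PorterStemmerRu  |
| Spanish             | X      |           |       | PorterStemmerEs  |
| Swedish             | X      |           |       | PorterStemmerSv  |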

attach() adds stem() and tokenizeAndStem() to String, so that token.stem() becomes a shortcut for PorterStemmer.stem(token). tokenizeAndStem() splits text into single words and returns an array of stemmed tokens.

natural.PorterStemmer.attach();
console.log("i am waking up to the sounds of chainsaws".tokenizeAndStem());
console.log("chainsaws".stem());

The same thing can be done with a Lancaster stemmer:

natural.LancasterStemmer.attach();
console.log("i am waking up to the sounds of chainsaws".tokenizeAndStem());
console.log("chainsaws".stem());
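To make the patching mechanism concrete, here is a rough, self-contained sketch of how an attach()-style method could add stem() and tokenizeAndStem() to String. ToyStemmer is a hypothetical stand-in that only strips a trailing "s"; it is not the Porter or Lancaster algorithm, it does not filter any words, and it is not the library's actual implementation.

```javascript
// Toy stemmer for illustration only: lowercases the token and strips
// one trailing "s". It is NOT the Porter or Lancaster algorithm.
const ToyStemmer = {
  stem(token) {
    const lower = token.toLowerCase();
    return lower.length > 3 && lower.endsWith('s') ? lower.slice(0, -1) : lower;
  },

  // Split on runs of non-word characters, drop empty tokens, stem the rest.
  tokenizeAndStem(text) {
    return text
      .split(/\W+/)
      .filter((token) => token.length > 0)
      .map((token) => this.stem(token));
  },

  // Patch String.prototype so every string gains stem() and tokenizeAndStem().
  attach() {
    const stemmer = this;
    String.prototype.stem = function () {
      return stemmer.stem(String(this));
    };
    String.prototype.tokenizeAndStem = function () {
      return stemmer.tokenizeAndStem(String(this));
    };
  },
};

ToyStemmer.attach();
console.log('chainsaws'.stem()); // prints "chainsaw"
console.log('the sounds of chainsaws'.tokenizeAndStem()); // prints [ 'the', 'sound', 'of', 'chainsaw' ]
```

In this sketch the patched methods live on String.prototype, so they affect every string in the process, and a later attach() call overwrites an earlier one.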

Carry stemmer

For French, an additional stemmer called the Carry stemmer is available. It is based on the Galileo Carry algorithm described in http://www.otlet-institute.org/docs/Carry.pdf

Note: The algorithm described in the PDF differs from the official C++ implementation. This implementation follows the rules of the C++ implementation, which solves some problems of the algorithm as described in the article.

References

  • Carry stemmer is a contribution by Johan Maupetit.
  • PEGjs: Parser Generator for JavaScript, https://pegjs.org/