Stemmers

Currently, stemming is supported via the Porter and Lancaster (Paice/Husk) algorithms. The Indonesian and Japanese stemmers do not follow a known algorithm.

var natural = require('natural');

This example uses the Porter stemmer and returns “word”.

console.log(natural.PorterStemmer.stem("words")); // stem a single word

In Russian:

console.log(natural.PorterStemmerRu.stem("падший"));

In Spanish:

console.log(natural.PorterStemmerEs.stem("jugaría"));
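Each stemmer in the examples above is called through the same stem(word) method, so stemming a whole phrase can be sketched as a small helper. This is a minimal sketch and not part of natural: the name stemPhrase is hypothetical, the whitespace split is a naive stand-in for a real tokenizer, and the stemmer is passed in as an argument so that any of the modules can be used.

```javascript
// Hypothetical helper (not part of natural): stem every word of a phrase
// with any object that exposes a stem(word) method.
function stemPhrase(stemmer, phrase) {
  return phrase
    .split(/\s+/)      // naive whitespace tokenization (an assumption)
    .filter(Boolean)   // drop empty strings left by leading/trailing spaces
    .map(function (word) {
      return stemmer.stem(word);
    });
}
```

With the library loaded as shown earlier, a call such as stemPhrase(natural.PorterStemmer, "some words") would return an array with one stem per word.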

The following stemmers are available:

Language             Porter  Lancaster  Other  Module
Dutch                X                         PorterStemmerNl
English              X                         PorterStemmer
English                      X                 LancasterStemmer
Farsi (in progress)  X                         PorterStemmerFa
French               X                         PorterStemmerFr
French                                  X      CarryStemmerFr
German               X                         PorterStemmerDe
Indonesian                              X      StemmerId
Italian              X                         PorterStemmerIt
Japanese                                X      StemmerJa
Norwegian            X                         PorterStemmerNo
Portuguese           X                         PorterStemmerPt
Russian              X                         PorterStemmerRu
Spanish              X                         PorterStemmerEs
Swedish              X                         PorterStemmerSv
Ukrainian            X                         PorterStemmerUk
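When the language is only known at run time, the module names from the table can be put behind a small lookup. The following is a sketch under stated assumptions: the two-letter language codes, the STEMMER_MODULES map, and the stemmerFor function are all hypothetical names chosen for illustration. Only the module names come from the table. The library object is passed in as an argument, so the helper does not assume how natural was loaded.

```javascript
// Hypothetical map from language code to module name.
// The codes are illustrative; the module names are taken from the table above.
// One default per language is listed, so LancasterStemmer and CarryStemmerFr are omitted.
const STEMMER_MODULES = {
  nl: 'PorterStemmerNl',
  en: 'PorterStemmer',
  fa: 'PorterStemmerFa',
  fr: 'PorterStemmerFr',
  de: 'PorterStemmerDe',
  id: 'StemmerId',
  it: 'PorterStemmerIt',
  ja: 'StemmerJa',
  no: 'PorterStemmerNo',
  pt: 'PorterStemmerPt',
  ru: 'PorterStemmerRu',
  es: 'PorterStemmerEs',
  sv: 'PorterStemmerSv',
  uk: 'PorterStemmerUk'
};

// Hypothetical helper: return the stemmer for a language code, or throw
// if the code is not in the map or the library does not expose the module.
function stemmerFor(lib, languageCode) {
  if (!Object.prototype.hasOwnProperty.call(STEMMER_MODULES, languageCode)) {
    throw new Error('No stemmer listed for language: ' + languageCode);
  }
  const moduleName = STEMMER_MODULES[languageCode];
  const stemmer = lib[moduleName];
  if (!stemmer || typeof stemmer.stem !== 'function') {
    throw new Error('Module not available in this library: ' + moduleName);
  }
  return stemmer;
}
```

For example, stemmerFor(natural, 'es').stem("jugaría") would be equivalent to calling natural.PorterStemmerEs.stem("jugaría") directly.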

Carry stemmer

For French, an additional stemmer called the Carry stemmer is available as CarryStemmerFr. It implements the Galileo Carry algorithm, based on http://www.otlet-institute.org/docs/Carry.pdf

Note: The implementation described in the PDF differs from the official C++ implementation. This implementation follows the rules of the C++ implementation, which solve some problems of the algorithm described in the article.

References

  • Carry stemmer is a contribution by Johan Maupetit.
  • Ukrainian stemmer is a contribution by Pluto Rotegott (@rotegott).
  • PEGjs: Parser Generator for JavaScript, https://pegjs.org/