Stemmers
Currently, stemming is supported via the Porter and Lancaster (Paice/Husk) algorithms, plus the Carry algorithm for French. The Indonesian and Japanese stemmers do not follow a known algorithm.
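To illustrate what a suffix-stripping stemmer does, here is a minimal sketch of only Step 1a of Martin Porter's algorithm, which handles plural suffixes. It is written from the published rules, not taken from natural's `PorterStemmer`. The function name `porterStep1a` is hypothetical, and the full algorithm applies several further steps after this one.

```javascript
// Minimal sketch of Step 1a of the Porter algorithm (plural suffixes only).
// Hypothetical helper, not part of natural's API; the full algorithm has more steps.
function porterStep1a(word) {
  // Rules are tried in order, so the longest matching suffix is the one used.
  if (word.endsWith("sses")) return word.slice(0, -2); // caresses -> caress
  if (word.endsWith("ies")) return word.slice(0, -2);  // ponies -> poni
  if (word.endsWith("ss")) return word;                // caress -> caress
  if (word.endsWith("s")) return word.slice(0, -1);    // cats -> cat
  return word;                                         // no rule applies
}

console.log(["caresses", "ponies", "caress", "cats", "words"].map(porterStep1a));
// -> [ 'caress', 'poni', 'caress', 'cat', 'word' ]
```

Note that "ponies" becomes "poni": stems are not required to be dictionary words, only consistent across inflected forms.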
This example uses a Porter stemmer; "word" is returned.

```javascript
var natural = require('natural');
console.log(natural.PorterStemmer.stem("words")); // stem a single word
```

In Russian:

```javascript
console.log(natural.PorterStemmerRu.stem("падший"));
```

In Spanish:

```javascript
console.log(natural.PorterStemmerEs.stem("jugaría"));
```
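Stemming is mostly useful for conflating inflected forms of the same word. The sketch below groups a word list by stem. `groupByStem` is a hypothetical helper, and the toy function that strips a trailing "s" only stands in for a real stemmer such as `natural.PorterStemmer.stem`, so that the snippet runs without the library installed.

```javascript
// Hypothetical helper: group words that reduce to the same stem.
// `stem` is any function mapping a word to its stem, for example
//   (w) => natural.PorterStemmer.stem(w)   // requires the natural package
function groupByStem(words, stem) {
  const groups = new Map(); // stem -> original words, in input order
  for (const word of words) {
    const key = stem(word);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(word);
  }
  return groups;
}

// Toy stand-in stemmer for this demo only: strips one trailing "s".
const toyStem = (w) => w.replace(/s$/, "");

console.log(groupByStem(["words", "word", "cats", "cat", "dog"], toyStem));
```

Passing the stemmer in as a function keeps the grouping logic independent of which language module is used.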
The following stemmers are available:
Language | Porter | Lancaster | Other | Module |
---|---|---|---|---|
Dutch | X | | | PorterStemmerNl |
English | X | | | PorterStemmer |
English | | X | | LancasterStemmer |
Farsi (in progress) | X | | | PorterStemmerFa |
French | X | | | PorterStemmerFr |
French | | | X | CarryStemmerFr |
German | X | | | PorterStemmerDe |
Indonesian | | | X | StemmerId |
Italian | X | | | PorterStemmerIt |
Japanese | | | X | StemmerJa |
Norwegian | X | | | PorterStemmerNo |
Portuguese | X | | | PorterStemmerPt |
Russian | X | | | PorterStemmerRu |
Spanish | X | | | PorterStemmerEs |
Swedish | X | | | PorterStemmerSv |
Ukrainian | X | | | PorterStemmerUk |
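If the language is only known at runtime, a small lookup table over the module names listed above can select the stemmer. The ISO 639-1 codes, the `stemmerFor` name, and the choice of the Porter variant as the default for English and French are assumptions of this sketch. The demo passes a stand-in object so it runs without natural; real code would pass `require('natural')`.

```javascript
// Hypothetical helper: map an ISO 639-1 language code to a module name from
// the table. Where a language has two stemmers (English, French), this sketch
// picks the Porter variant by default.
const STEMMER_MODULES = {
  nl: "PorterStemmerNl",
  en: "PorterStemmer",
  fa: "PorterStemmerFa", // listed as "in progress"
  fr: "PorterStemmerFr",
  de: "PorterStemmerDe",
  id: "StemmerId",
  it: "PorterStemmerIt",
  ja: "StemmerJa",
  no: "PorterStemmerNo",
  pt: "PorterStemmerPt",
  ru: "PorterStemmerRu",
  es: "PorterStemmerEs",
  sv: "PorterStemmerSv",
  uk: "PorterStemmerUk",
};

// `lib` is the object returned by require('natural') in real code.
function stemmerFor(lib, lang) {
  const name = Object.prototype.hasOwnProperty.call(STEMMER_MODULES, lang)
    ? STEMMER_MODULES[lang]
    : undefined;
  if (name === undefined || lib[name] === undefined) {
    throw new Error(`No stemmer available for language "${lang}"`);
  }
  return lib[name];
}

// Demo with a stand-in object instead of the real library.
const fakeLib = { PorterStemmerEs: { stem: (w) => `stem(${w})` } };
console.log(stemmerFor(fakeLib, "es").stem("jugaría")); // -> stem(jugaría)
```

Failing loudly on an unknown code avoids silently stemming text with the wrong language's rules.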
Carry stemmer
For French, an additional stemmer called the Carry stemmer is available. It implements the Galileo Carry algorithm, based on http://www.otlet-institute.org/docs/Carry.pdf
Note :bangbang: The algorithm described in the PDF differs from the one in the official C++ implementation. This implementation follows the rules of the C++ implementation, which solve some problems of the algorithm as described in the article.
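Because French has two stemmers, it can help to check where they disagree on your own vocabulary before choosing one. `compareStemmers` is a hypothetical helper. In real code the two functions would wrap `natural.PorterStemmerFr.stem` and `natural.CarryStemmerFr.stem`; the demo uses two toy functions so the snippet runs on its own, and its output says nothing about how the real stemmers behave.

```javascript
// Hypothetical helper: list the words on which two stemmers disagree.
// In real code, for example:
//   const porter = (w) => natural.PorterStemmerFr.stem(w);
//   const carry = (w) => natural.CarryStemmerFr.stem(w);
function compareStemmers(words, stemA, stemB) {
  const differences = [];
  for (const word of words) {
    const a = stemA(word);
    const b = stemB(word);
    if (a !== b) differences.push({ word, a, b });
  }
  return differences;
}

// Toy stand-ins for this demo only (not the real French stemmers).
const stripS = (w) => w.replace(/s$/, "");   // strips a trailing "s"
const stripEs = (w) => w.replace(/e?s$/, ""); // strips a trailing "es" or "s"

console.log(compareStemmers(["chats", "maisons", "grandes"], stripS, stripEs));
```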
References
- The Carry stemmer was contributed by Johan Maupetit.
- The Ukrainian stemmer was contributed by Pluto Rotegott (@rotegott).
- PEGjs: Parser Generator for JavaScript, https://pegjs.org/