Stemmers
Stemming is currently supported via the Porter and Lancaster (Paice/Husk) algorithms, and French additionally has the Carry stemmer. The Indonesian and Japanese stemmers do not follow a known algorithm.
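The following is a minimal sketch of what a suffix-stripping stemmer does. It covers only Step 1a of Porter's algorithm, which handles plural endings. The function name `porterStep1a` is hypothetical and this is not natural's implementation. The full algorithm continues with Steps 1b through 5, and their rules depend on the word's "measure".

```javascript
// Hypothetical sketch of Porter Step 1a only (plural suffixes).
// Rules are tried in order and the first matching suffix wins.
function porterStep1a(word) {
  if (word.endsWith("sses")) return word.slice(0, -2); // caresses -> caress
  if (word.endsWith("ies")) return word.slice(0, -2); // ponies -> poni
  if (word.endsWith("ss")) return word; // caress -> caress (unchanged)
  if (word.endsWith("s")) return word.slice(0, -1); // words -> word
  return word; // no plural suffix: leave the word alone
}

console.log(porterStep1a("words"));
```

In this sketch, "ponies" becomes "poni". A stem does not have to be a dictionary word. It only has to be shared by related forms of the word.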
```javascript
var natural = require('natural');
```

This example uses the Porter stemmer, which returns "word":

```javascript
console.log(natural.PorterStemmer.stem("words")); // stem a single word
```

In Russian:

```javascript
console.log(natural.PorterStemmerRu.stem("падший")); // stem "падший" ("fallen")
```

In Spanish:

```javascript
console.log(natural.PorterStemmerEs.stem("jugaría")); // stem "jugaría" ("would play")
```
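Each of the examples above calls `stem(word)` on a stemmer object, so you can write helper code against that method alone. The sketch below groups words that reduce to the same stem. `groupByStem` is a hypothetical helper. The stand-in stemmer only strips a trailing "s", which lets the snippet run without natural installed.

```javascript
// Hypothetical helper: group words by the stem a stemmer assigns them.
// Assumes only that `stemmer.stem(word)` returns a string, as in the
// natural examples above.
function groupByStem(stemmer, words) {
  const groups = new Map(); // stem -> words that reduce to it
  for (const word of words) {
    const stem = stemmer.stem(word);
    if (!groups.has(stem)) groups.set(stem, []);
    groups.get(stem).push(word);
  }
  return groups;
}

// Stand-in stemmer (strips one trailing "s") so the sketch runs on its own.
const toyStemmer = { stem: (word) => word.replace(/s$/, "") };

console.log(groupByStem(toyStemmer, ["cat", "cats", "dog"]));
```

If natural is installed, you can pass a real stemmer instead, for example `groupByStem(natural.PorterStemmer, ["connect", "connected", "connection"])`.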
The following stemmers are available:
| Language | Porter | Lancaster | Other | Module |
|---|---|---|---|---|
| Dutch | X | | | `PorterStemmerNl` |
| English | X | | | `PorterStemmer` |
| English | | X | | `LancasterStemmer` |
| Farsi (in progress) | X | | | `PorterStemmerFa` |
| French | X | | | `PorterStemmerFr` |
| French | | | X | `CarryStemmerFr` |
| German | X | | | `PorterStemmerDe` |
| Indonesian | | | X | `StemmerId` |
| Italian | X | | | `PorterStemmerIt` |
| Japanese | | | X | `StemmerJa` |
| Norwegian | X | | | `PorterStemmerNo` |
| Portuguese | X | | | `PorterStemmerPt` |
| Russian | X | | | `PorterStemmerRu` |
| Spanish | X | | | `PorterStemmerEs` |
| Swedish | X | | | `PorterStemmerSv` |
| Ukrainian | X | | | `PorterStemmerUk` |
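If the language is only known at run time, you can look up a module name from the table by language code. The sketch below is hypothetical. Both the ISO 639-1 codes used as keys and the helper `stemmerModuleFor` are assumptions. For English and French it picks the Porter module, even though the table also lists Lancaster and Carry alternatives.

```javascript
// Module names copied from the table above; the ISO 639-1 keys are an
// assumption made for this sketch. English and French default to Porter.
const STEMMER_MODULES = new Map([
  ["nl", "PorterStemmerNl"],
  ["en", "PorterStemmer"],
  ["fa", "PorterStemmerFa"],
  ["fr", "PorterStemmerFr"],
  ["de", "PorterStemmerDe"],
  ["id", "StemmerId"],
  ["it", "PorterStemmerIt"],
  ["ja", "StemmerJa"],
  ["no", "PorterStemmerNo"],
  ["pt", "PorterStemmerPt"],
  ["ru", "PorterStemmerRu"],
  ["es", "PorterStemmerEs"],
  ["sv", "PorterStemmerSv"],
  ["uk", "PorterStemmerUk"],
]);

// Hypothetical helper: return the module name for a language code,
// or throw if the table lists no stemmer for it.
function stemmerModuleFor(lang) {
  const name = STEMMER_MODULES.get(lang.toLowerCase());
  if (name === undefined) {
    throw new Error(`No stemmer listed for language "${lang}"`);
  }
  return name;
}

console.log(stemmerModuleFor("es"));
```

The sketch uses a `Map` rather than a plain object so that inherited keys such as `"constructor"` cannot be mistaken for a stemmer name. With natural loaded, `natural[stemmerModuleFor("es")].stem("jugaría")` is equivalent to the Spanish example.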
Carry stemmer
For French, an additional stemmer called the Carry stemmer is available. It is based on the Galileo Carry algorithm described in http://www.otlet-institute.org/docs/Carry.pdf
Note :bangbang:: The algorithm described in the PDF differs from the official C++ implementation. This implementation follows the rules of the C++ implementation, which fix some problems in the algorithm as described in the article.
References
- The Carry stemmer was contributed by Johan Maupetit.
- The Ukrainian stemmer was contributed by Pluto Rotegott (@rotegott).
- PEGjs: Parser Generator for JavaScript, https://pegjs.org/