How to stem non-English words?
As discussed earlier, stemming reduces words to their root form. We have seen stemming for English words, but what about words in other languages? Stemmers are available for non-English languages as well. Let's understand this with a practical implementation.
# Import the Snowball stemmer for German and create an instance
from nltk.stem.snowball import GermanStemmer
german_st = GermanStemmer()
# Sample German tokens to stem
token_sample = ["Schreiben", "geschrieben"]
Here we have taken some sample words in German, whose English translations are:
Schreiben - writing
geschrieben - written
stem_words = [german_st.stem(word) for word in token_sample]
print("Print the output after stemming:", stem_words)
Print the output after stemming: ['schreib', 'geschrieb']
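Here the output is 'schreib' and 'geschrieb'. A stem is not necessarily a dictionary word, so these do not translate exactly; they correspond to:
schreib - root of schreiben (to write)
geschrieb - geschrieben (written) with the -en ending removed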
So we can see the difference between the sample tokens and the results after applying stemming to them.
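The same idea extends to other languages. As a minimal sketch, NLTK's generic SnowballStemmer class takes a language name and exposes the supported names through its languages attribute. The helper name stem_tokens below is hypothetical and used only for illustration.

```python
from nltk.stem import SnowballStemmer

# Language names supported by the installed NLTK version
print(SnowballStemmer.languages)


def stem_tokens(tokens, language):
    """Stem a list of tokens with the Snowball stemmer for `language`.

    `stem_tokens` is a hypothetical helper, not part of NLTK.
    """
    if language not in SnowballStemmer.languages:
        raise ValueError(f"No Snowball stemmer available for {language!r}")
    stemmer = SnowballStemmer(language)
    return [stemmer.stem(token) for token in tokens]


# Same German sample as above, selected by language name
german_stems = stem_tokens(["Schreiben", "geschrieben"], "german")
print(german_stems)
```

Selecting the stemmer by name is convenient when the language of the text is only known at run time; for a fixed language, a class such as GermanStemmer works just as well.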