Get all model words

Hello! I have a question about the methods applied to models.

Is there a method that returns a list of all the words in a model? (Suppose I want to get all the words from “ru_syntagrus_joint_parsing”.) Or is there a method that shows directly whether the word being processed is present in the dictionary?

Thanks in advance!

I do not understand what you mean by all the words in the model. The lemmatization part is based on the pymorphy analyzer, which can process out-of-vocabulary words as well. The tagging and parsing components do not use dictionaries in any form.

I mean, I need to find out whether a word is out-of-vocabulary or not. For example, pymorphy marks out-of-vocabulary words with FakeDictionary.

The OOV words for the lemmatization column are exactly the OOV words for pymorphy. For the other parts of the output, there is no such notion.
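Since the lemmatization column follows pymorphy's vocabulary, one way to check a word is to ask pymorphy directly. Below is a minimal sketch, assuming the pymorphy2 package and its `MorphAnalyzer.word_is_known` method. The helper names `split_by_vocabulary` and `make_analyzer` are made up for illustration, and it is an assumption that the model's lemmatizer uses the same dictionary version and passes words to pymorphy unchanged.

```python
from typing import Iterable, List, Tuple


def split_by_vocabulary(words: Iterable[str], analyzer) -> Tuple[List[str], List[str]]:
    """Split words into (known, out-of-vocabulary) lists.

    `analyzer` is any object with a `word_is_known(word) -> bool` method,
    for example a pymorphy2.MorphAnalyzer instance.
    """
    known: List[str] = []
    oov: List[str] = []
    for word in words:
        # Append to whichever list matches the dictionary lookup result.
        (known if analyzer.word_is_known(word) else oov).append(word)
    return known, oov


def make_analyzer():
    """Hypothetical helper: build a pymorphy2 analyzer (needs `pip install pymorphy2`)."""
    # Imported lazily so split_by_vocabulary stays usable without pymorphy2 installed.
    import pymorphy2
    return pymorphy2.MorphAnalyzer()
```

Usage would look like `known, oov = split_by_vocabulary(tokens, make_analyzer())`, where `tokens` is your own list of word forms. This only tells you about the lemmatization side; as noted above, the tagger and parser have no vocabulary to check against.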
