I’m attempting to get ODQA running in DeepPavlov, launching it from PyCharm. The purpose is to have it answer a scripted [...]. I’m using the following guide as the basis for my efforts, and almost all of my code is derived from it: Open-domain question answering with DeepPavlov, by Vasily Konovalov (DeepPavlov, Medium). I’m training the ranker model and building the reader model.
Attempting to run the reader on the entirety of my dataset (approximately 1.8 GB of txt files) results in a memory error. Using a small fraction of my dataset, I was able to get the following logs:
The machine I’m using has 32 GB of RAM, of which approximately 24 GB is available for operations. The TF-IDF ranker keeps looping through tokenization and hash counting. Here are my model config settings:
I’m hoping for answers to the following questions:
How large a dataset is the ranker supposed to handle? If it can only handle small datasets, am I doing something wrong? Is the loop seen in the first image something to be expected (and waited out), or an error condition indicating that I made a mistake?
Given that DeepPavlov itself and the guide have already taken me a long way, I suspect that if there’s a problem, it’s in the configuration file.
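To make the loop question concrete, this is how I currently picture it: a hashing TF-IDF vectorizer walks the corpus in batches, so the tokenizing and hash-counting messages would repeat once per batch rather than signal a stuck process. The sketch below is only my mental model, not DeepPavlov’s actual code. The names `batches`, `hashed_counts`, `count_corpus`, `batch_size` and `hash_size` are all hypothetical, and the whitespace tokenizer and CRC32 hash are stand-ins I picked for illustration.

```python
import zlib
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def batches(docs: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` documents."""
    batch: List[str] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # leftover documents that did not fill a whole batch
        yield batch


def hashed_counts(doc: str, hash_size: int) -> Counter:
    """Tokenize on whitespace and count tokens by hash bucket (assumed scheme)."""
    tokens = doc.lower().split()
    return Counter(zlib.crc32(t.encode("utf-8")) % hash_size for t in tokens)


def count_corpus(
    docs: Iterable[str], batch_size: int = 2, hash_size: int = 2 ** 10
) -> Tuple[List[Counter], List[str]]:
    """Process the corpus batch by batch, recording one log line per batch."""
    per_doc: List[Counter] = []
    log: List[str] = []
    for i, batch in enumerate(batches(docs, batch_size)):
        log.append(f"batch {i}: tokenizing and counting hashes")
        per_doc.extend(hashed_counts(d, hash_size) for d in batch)
    return per_doc, log


docs = ["the cat sat", "the dog sat", "a bird flew", "the cat flew", "a dog ran"]
per_doc, log = count_corpus(docs)
for line in log:
    print(line)
```

If this picture is right, the number of repeated log lines should grow with corpus size divided by batch size, and the memory pressure would come from whatever is accumulated across batches rather than from any single batch.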
Please let me know if there’s anything I should elaborate upon.