Hi everyone!
First off, I really like everything about DeepPavlov: its clear and concise website, its documentation, and its presence on Medium.
However, I can’t seem to find a complete walk-through type of tutorial for M-BERT based multilingual QA model.
Basically, here’s what I want to achieve:
I want a model I can access either via an API or (even better!) through Telegram. I want this model to be additionally trained on my set of documentation and legislation, if that is even required. Maybe it isn't; I'm not sure.
The overall idea is that I can ask questions about these policies and/or legislation in English, Russian, and Azerbaijani and get the answers from them.
And here’s where I’m struggling:
- What format do my own documents need to be in for the model to work? Currently I have .txt files that I read into variables. Should I apply some additional sanitation, e.g. replace "\n" with ""? I can't find any info on this.
- Should I train this model on my data first? I tried to find the right place by running:
model_config = json.load(open(configs.squad.squad_bert_multilingual_freezed_emb))
pprint(model_config['dataset_reader'])
but that didn't work: there is no "dataset_reader" key there.
- How does one combine different models? I would like a bot that opens with chit-chat and then answers questions using the model above, if that is at all possible.
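To make the first point concrete, here is roughly how I'm loading the files at the moment (a minimal sketch; the folder layout and the whitespace cleanup are just my own guesses, not anything I found in the DeepPavlov docs):

```python
from pathlib import Path

def load_documents(folder):
    """Read every .txt file in `folder` into a list of strings.
    The whitespace collapsing is my guess at "sanitation";
    I don't know whether the model expects something different."""
    docs = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Collapse newlines and runs of spaces into single spaces,
        # so each document becomes one long string.
        docs.append(" ".join(text.split()))
    return docs
```

Is this kind of preprocessing on the right track, or should the documents stay line-structured?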
And, just in general, I would appreciate links to articles and tutorials that go through a similar process from start to finish, so I can learn best practices, so to speak. I am absolutely new to this, and I'm looking for something more comprehensive than the Colab notebooks available in the documentation.
P.S. I speak English and Russian, so feel free to answer in whatever language you are most comfortable with.
P.P.S. If this is the wrong subforum for this kind of question, I'm sorry; please forward this post to a more appropriate forum.
P.P.P.S. If no such tutorial exists, I’d be happy to write one for you once I get it working.