I have successfully installed the multi-language NER model (ner_ontonotes_bert_mult).
I want to retrain this model with new data in Albanian (in the same format as suggested in the documentation). Is it possible to retrain the multi-language NER model from DeepPavlov with data in a different language, or does retraining only work with English data?
Retrain the multi-language NER model (ner_ontonotes_bert_mult) with a dataset in a different language
Yes, you can fine-tune the model on any language that was used for Multilingual BERT training (see bert/multilingual.md in the google-research/bert repository on GitHub).
Great, thanks for the response! Also, is it a problem if my dataset doesn't contain all the tags?
I am facing the same error as in this case. My dataset will contain only a subset of the 18 available tags listed here:
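As a rough sanity check before training, one could collect the tags a dataset actually uses and flag any that fall outside the 18 OntoNotes entity types. This sketch assumes the token-per-line format with the tag in the last column and blank lines between sentences, and BIO tags (B-/I- prefixes plus O); the helpers collect_tags and unknown_tags are hypothetical and not part of DeepPavlov.

```python
from typing import Iterable, Set

# The 18 OntoNotes 5.0 entity types (assumed to be the model's tag set, in BIO form).
ONTONOTES_TYPES = {
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT",
    "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
}


def collect_tags(lines: Iterable[str]) -> Set[str]:
    """Collect tags from 'token tag' lines; blank lines mark sentence boundaries."""
    tags = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # sentence boundary, no tag on this line
        tags.add(line.split()[-1])  # tag is assumed to be the last column
    return tags


def unknown_tags(tags: Set[str]) -> Set[str]:
    """Return tags that are neither 'O' nor a B-/I- prefixed OntoNotes type."""
    bad = set()
    for tag in tags:
        if tag == "O":
            continue
        prefix, _, entity = tag.partition("-")  # split on the first hyphen only
        if prefix not in ("B", "I") or entity not in ONTONOTES_TYPES:
            bad.add(tag)
    return bad
```

An empty result from unknown_tags means the dataset uses only a subset of the existing tag set, which is the situation described in the question.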
UPDATE: I successfully retrained the ner_ontonotes_bert_mult model with a dataset in Albanian. Since my dataset didn't contain all the tags, I removed the "fit_on": ["y"], line from the config description of the tag_vocab component.
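For anyone scripting the same change, here is a minimal sketch using only the standard json module. It assumes the DeepPavlov JSON layout in which pipeline components are listed under config["chainer"]["pipe"]; the fragment below is a trimmed, illustrative example rather than the real ner_ontonotes_bert_mult config, and drop_fit_on is a hypothetical helper.

```python
import copy
import json
from typing import Any, Dict


def drop_fit_on(config: Dict[str, Any], component_id: str = "tag_vocab") -> Dict[str, Any]:
    """Return a copy of config with "fit_on" removed from the component with this id.

    Assumes components are listed under config["chainer"]["pipe"].
    """
    new_config = copy.deepcopy(config)  # leave the original config untouched
    for component in new_config["chainer"]["pipe"]:
        if component.get("id") == component_id:
            component.pop("fit_on", None)  # no-op if the key is already absent
            return new_config
    raise KeyError(f"no component with id {component_id!r}")


# Trimmed, illustrative fragment (assumed layout, not the full model config).
example = json.loads("""
{
  "chainer": {
    "pipe": [
      {
        "id": "tag_vocab",
        "class_name": "simple_vocab",
        "unk_token": ["O"],
        "fit_on": ["y"],
        "save_path": "{MODEL_PATH}/tag.dict",
        "load_path": "{MODEL_PATH}/tag.dict",
        "in": ["y"],
        "out": ["y_ind"]
      }
    ]
  }
}
""")

patched = drop_fit_on(example)
print(json.dumps(patched["chainer"]["pipe"][0], indent=2))
```

Presumably, without "fit_on" the tag vocabulary is no longer rebuilt from the new training labels, so the model keeps the tag set it was originally trained with (loaded from load_path) even when the new data uses only some of those tags.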
Great! Glad you succeeded.