Retrain the multilingual NER model (ner_ontonotes_bert_mult) with a dataset in a different language

Xhensila · May 24, 2021, 8:08am

I have successfully installed the multilingual NER model (ner_ontonotes_bert_mult).
I want to retrain this model with new data in Albanian, in the same format that the documentation suggests. Is it possible to retrain DeepPavlov's multilingual NER model with data in a different language, or does retraining only work with English data?

yurakuratov · May 24, 2021, 1:20pm

Yes, you can fine-tune the model on any language that was used for Multilingual BERT training (bert/multilingual.md at master · google-research/bert · GitHub).

Xhensila · May 24, 2021, 1:48pm

Great! Thanks for the response. Also, is it a problem if my dataset doesn't contain all the tags?
I am facing the same error as in this case. My dataset will contain only a subset of the 18 available tags listed here.
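
A small sketch of how one might check which of the 18 OntoNotes entity types a dataset actually uses before retraining. It assumes a CoNLL-style file (one whitespace-separated token and tag per line, tag last, blank line between sentences) and BIO tags such as B-PERSON / I-PERSON; both are assumptions to verify against the DeepPavlov NER documentation. `tag_report` is a hypothetical helper, not part of DeepPavlov.

```python
from collections import Counter

# The 18 OntoNotes entity types. Assumption: the data uses BIO tags
# ("B-PERSON", "I-PERSON", ...) plus "O" for tokens outside any entity.
ONTONOTES_TYPES = {
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT",
    "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
}


def tag_report(lines):
    """Count entity types in CoNLL-style lines ("token tag" per line).

    Returns (Counter of known entity types, set of unrecognised tags).
    """
    counts = Counter()
    unknown = set()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # blank line marks a sentence boundary
        tag = parts[-1]  # assumption: the tag is the last column
        if tag == "O":
            continue  # not part of an entity
        prefix, _, entity_type = tag.partition("-")
        if prefix in ("B", "I") and entity_type in ONTONOTES_TYPES:
            counts[entity_type] += 1
        else:
            unknown.add(tag)
    return counts, unknown


# Usage on a tiny hypothetical Albanian sample.
sample = ["Tirana B-GPE", "është O", "kryeqyteti O", "",
          "Edi B-PERSON", "Rama I-PERSON"]
counts, unknown = tag_report(sample)
missing = ONTONOTES_TYPES - set(counts)
# counts: GPE 1, PERSON 2; unknown is empty; 16 types are missing
```

Types that end up in `missing` are simply absent from the data, which is the situation described in this thread. Anything in `unknown` is a tag outside the pretrained tag set and probably needs to be renamed or mapped before training.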

UPDATE: I successfully retrained the ner_ontonotes_bert_mult model with a dataset in Albanian. Since my dataset didn't contain all the tags, I removed the "fit_on": ["y"], line from the tag_vocab component's description in the config.
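
A minimal Python sketch of the same config change, assuming the usual DeepPavlov config layout where components are listed under chainer.pipe and the training data folder is set by dataset_reader.data_path. As far as I understand DeepPavlov's fitting step, a component that has "fit_on" is rebuilt from the training data, whereas without it the tag vocabulary is loaded from its saved file, so the pretrained tag set is kept even when the new data uses only some of the tags. `prepare_config` and any data path you pass to it are hypothetical.

```python
import copy


def prepare_config(config, data_path=None):
    """Return a copy of a DeepPavlov-style config dict adapted for retraining.

    - Removes "fit_on" from the component with id "tag_vocab" so the tag
      vocabulary is not rebuilt from the (partial) set of tags in new data.
    - Optionally points dataset_reader.data_path at a new data folder.
    The input dict is left unchanged.
    """
    config = copy.deepcopy(config)
    tag_vocab = next(
        (c for c in config["chainer"]["pipe"] if c.get("id") == "tag_vocab"),
        None,
    )
    if tag_vocab is None:
        raise KeyError("no component with id 'tag_vocab' in chainer.pipe")
    tag_vocab.pop("fit_on", None)  # no-op if the key is already absent
    if data_path is not None:
        config["dataset_reader"]["data_path"] = data_path
    return config
```

The config can be loaded with json.load, passed through this function, and the resulting dict handed to DeepPavlov's train_model; it is worth checking that your installed version accepts a dict there rather than only a config path.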

yurakuratov · May 24, 2021, 10:44pm

Great! Glad you succeeded.