I’m trying to use your model ner_ontonotes_bert_mult to implement few-shot transfer to Russian (literature domain), and I’m having trouble understanding how the model was originally built.
If I’m not mistaken, it was built by fine-tuning the multilingual BERT model on the English portion of the OntoNotes dataset. But how does it generalize to cover 104 languages?
I would be grateful if you could share the paper you relied on; it would be very helpful for my research.