Strange fine-tuned NER model behavior

I fine-tuned a NER model on some data with this config:

config_for_train['dataset_reader']['data_path'] = TRAIN_FILES_DIR #train, val, test
config_for_train['chainer']['pipe'][1]['save_path'] = tag_save_path #new model dir
config_for_train['chainer']['pipe'][2]['return_probas'] = False
config_for_train['chainer']['pipe'][2]['save_path'] = model_save_path #new model dir
config_for_train['chainer']['pipe'][2]['out'] = ['y_pred_ind']
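For context, the overrides above are plain dictionary surgery on a loaded DeepPavlov config. Here is a minimal self-contained sketch of that pattern; the pipe layout, paths, and values below are hypothetical stand-ins, not the real OntoNotes config:

```python
# Hypothetical skeleton of a DeepPavlov-style config: an ordinary nested dict,
# so fine-tuning overrides are plain key assignments.
config_for_train = {
    "dataset_reader": {"data_path": "original/data"},
    "chainer": {"pipe": [
        {"id": "tokenizer"},
        {"id": "tag_vocab", "save_path": "old/tag.dict"},
        {"id": "ner_model", "save_path": "old/model",
         "return_probas": True, "out": ["y_pred"]},
    ]},
}

TRAIN_FILES_DIR = "my_data"          # dir holding train/valid/test files
tag_save_path = "new_model/tag.dict"  # new model dir
model_save_path = "new_model/model"   # new model dir

# The same five overrides as in the snippet above:
config_for_train["dataset_reader"]["data_path"] = TRAIN_FILES_DIR
config_for_train["chainer"]["pipe"][1]["save_path"] = tag_save_path
config_for_train["chainer"]["pipe"][2]["return_probas"] = False
config_for_train["chainer"]["pipe"][2]["save_path"] = model_save_path
config_for_train["chainer"]["pipe"][2]["out"] = ["y_pred_ind"]

print(config_for_train["chainer"]["pipe"][2]["out"])  # ['y_pred_ind']
```

In the real workflow the dict would come from the stock ontonotes BERT config file and then be passed to DeepPavlov's training entry point; only the override pattern is shown here.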

new model folder:

So I changed the standard OntoNotes BERT config with the code below and built a model with the resulting config:

self.config_dict["chainer"]["pipe"][1]["load_path"] = str(MODELS_DIR) + \
self.config_dict["chainer"]["pipe"][2]["load_path"] = str(MODELS_DIR) + \

When I try to find entities in "United States, country in North America that is a federal republic of 50 states and was founded in 1776", it returns something strange:

[[['United', 'States', ',', 'country', 'in', 'North', 'America', 'that', 'is', 'a', 'federal', 'republic', 'of', '50', 'states', 'and', 'was', 'founded', 'in', '1776']], [['I-EVENT', 'I-CARDINAL', 'I-CARDINAL', 'I-LAW', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'I-MONEY', 'I-CARDINAL', 'I-EVENT', 'I-EVENT', 'I-MONEY', 'I-EVENT', 'I-CARDINAL', 'I-MONEY', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'I-EVENT']]]

The standard OntoNotes NER BERT model returns:

[[['United', 'States', ',', 'country', 'in', 'North', 'America', 'that', 'is', 'a', 'federal', 'republic', 'of', '50', 'states', 'and', 'was', 'founded', 'in', '1776']], [['B-GPE', 'I-GPE', 'O', 'O', 'O', 'B-LOC', 'I-LOC', 'O', 'O', 'O', 'O', 'O', 'O', 'B-CARDINAL', 'O', 'O', 'O', 'O', 'O', 'B-DATE']]]

I think the problem may be related to tag.dict.

Are the tag sets different for the pre-trained model and the fine-tuned one?

tag.dict in the fine-tuned model is different from tag.dict in the pre-trained model: the tags are in a different order, and the numbers after the tags differ (these seem to be counts of each tag in the training data).
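A quick way to check this is to compare just the tag order of the two files, ignoring the counts. A minimal sketch, assuming each tag.dict line has the form `TAG<TAB>count` (the sample lines below are made up):

```python
def read_tag_order(lines):
    """Parse tag.dict-style lines ('TAG\tcount') into an ordered tag list."""
    return [line.split("\t")[0] for line in lines if line.strip()]

# Made-up examples of pre-trained vs. fine-tuned vocab files:
pretrained = ["O\t100500", "B-GPE\t2310", "I-GPE\t1412"]
finetuned  = ["I-GPE\t12",  "O\t9000",    "B-GPE\t37"]

p, f = read_tag_order(pretrained), read_tag_order(finetuned)
print(set(p) == set(f))  # True: same tag set...
print(p == f)            # False: ...but different order, so indices shift
```

If the tag sets match but the order does not, the model's output indices no longer line up with the labels they were trained against.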

Currently, this functionality is not officially supported.

The problem is a mismatched number of units in the top layer between the pre-trained model and the new one. This should raise an error during model loading if the number of tags differs between the two tag sets. If the number of tags is the same, you should see `loading model from your_folder` in the logs. However, even with the same number of tags the situation is still bad: assigning different tags to the pre-trained output-layer embeddings may put training in a bad local minimum. Imagine assigning the O tag's weights to I-VERY-NEW-TAG; it would be hard to move from O to the new tag.
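This also explains the scrambled predictions above: the network emits indices that are valid under the old tag order but get decoded through the new vocabulary. A toy illustration (the tag lists and indices here are invented for the example):

```python
old_tags = ["O", "B-GPE", "I-GPE", "B-DATE"]
new_tags = ["B-DATE", "I-GPE", "O", "B-GPE"]  # same tags, shuffled order

# Indices the restored network would emit under the OLD ordering:
pred_indices = [1, 2, 0, 3]  # i.e. B-GPE, I-GPE, O, B-DATE

decoded_old = [old_tags[i] for i in pred_indices]
decoded_new = [new_tags[i] for i in pred_indices]
print(decoded_old)  # ['B-GPE', 'I-GPE', 'O', 'B-DATE']
print(decoded_new)  # ['I-GPE', 'O', 'B-DATE', 'B-GPE'] -- scrambled
```

The same correct logits, decoded through a reordered vocabulary, produce exactly the kind of nonsense tags shown in the question.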

I think you misunderstood me :)
I'm trying to fine-tune the OntoNotes BERT model on part of the OntoNotes dataset plus some of my own sentences, all with OntoNotes tags, so there are no new tags.
Or did you mean that model fine-tuning, or model saving, is not supported?

Oh, that makes sense. Thanks for clarification.

Fine-tuning with different tag sets is not supported. Model saving is completely fine.

I think the problem might be the different order of the tags. Just remove `"fit_on": ["y"]` from the tag vocabulary part of the config. Also make sure the model is actually loaded by finding `loading model from your_folder` in the logs.
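In config terms, that fix is one key deletion. A minimal sketch with a hypothetical config fragment (the vocab component's index in the pipe may differ in the real config):

```python
# Hypothetical config fragment; only the "fit_on" removal matters here.
config = {"chainer": {"pipe": [
    {"id": "tokenizer"},
    {"id": "tag_vocab", "fit_on": ["y"], "load_path": "old/tag.dict"},
]}}

# Dropping "fit_on" keeps the tag vocabulary from being re-fitted on the new
# data, so the pre-trained tag order (and its index->tag mapping) is preserved.
config["chainer"]["pipe"][1].pop("fit_on", None)
print("fit_on" in config["chainer"]["pipe"][1])  # False
```

With the vocabulary no longer re-fitted, the loaded tag.dict keeps the pre-trained ordering, and the restored output layer's indices decode to the right tags.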

Thanks, it works nicely.
I hope in the near future I'll have enough time to write an article on Medium or Habr, something like "How to fine-tune a DeepPavlov BERT model".