Fine-Tuning ner_ontonotes_bert with Custom Data

I am trying to fine-tune the pretrained model with my custom data. Sometimes the code works perfectly, meaning I am able to create a new model with good accuracy. But sometimes I get weird training and validation F1 scores with the same code:

2021-07-17 19:13:37.480 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /Users/v.supriya/.deeppavlov/models/ner_ontonotes_bert/tag.dict]
2021-07-17 19:13:37.712 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 101: [saving vocabulary to /Users/v.supriya/.deeppavlov/models/ner_ontonotes_bert/tag.dict]
2021-07-17 19:13:53.506 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /Users/v.supriya/.deeppavlov/models/ner_ontonotes_bert/model]
2021-07-17 19:14:50.141 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 199: Initial best ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0.7467}, "time_spent": "0:00:54", "epochs_done": 0, "batches_seen": 0, "train_examples_seen": 0, "impatience": 0, "patience_limit": 100}}
WARNING:tensorflow:From /Users/v.supriya/opt/anaconda3/envs/deeppavlov_15/lib/python3.7/site-packages/deeppavlov/core/trainers/nn_trainer.py:250: The name tf.Summary is deprecated. Please use tf.compat.v1.Summary instead.

{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:04:08", "epochs_done": 0, "batches_seen": 40, "train_examples_seen": 640, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 7.962039543688297}}
2021-07-17 19:18:56.89 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:05:00", "epochs_done": 0, "batches_seen": 40, "train_examples_seen": 640, "impatience": 1, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:08:14", "epochs_done": 0, "batches_seen": 80, "train_examples_seen": 1280, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 4.009204307198525}}
2021-07-17 19:23:01.418 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:09:05", "epochs_done": 0, "batches_seen": 80, "train_examples_seen": 1280, "impatience": 2, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:12:54", "epochs_done": 0, "batches_seen": 120, "train_examples_seen": 1920, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 2.8950684279203416}}
2021-07-17 19:27:46.542 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:13:50", "epochs_done": 0, "batches_seen": 120, "train_examples_seen": 1920, "impatience": 3, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:17:01", "epochs_done": 0, "batches_seen": 160, "train_examples_seen": 2560, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 2.902792666107416}}
2021-07-17 19:31:55.664 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:17:59", "epochs_done": 0, "batches_seen": 160, "train_examples_seen": 2560, "impatience": 4, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:21:18", "epochs_done": 0, "batches_seen": 200, "train_examples_seen": 3200, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 2.4636255234479902}}
2021-07-17 19:36:12.793 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:22:16", "epochs_done": 0, "batches_seen": 200, "train_examples_seen": 3200, "impatience": 5, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:25:22", "epochs_done": 0, "batches_seen": 240, "train_examples_seen": 3840, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 2.249296957999468}}
2021-07-17 19:40:18.927 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:26:23", "epochs_done": 0, "batches_seen": 240, "train_examples_seen": 3840, "impatience": 6, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:29:30", "epochs_done": 0, "batches_seen": 280, "train_examples_seen": 4480, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 2.758265073597431}}
2021-07-17 19:44:27.746 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:30:31", "epochs_done": 0, "batches_seen": 280, "train_examples_seen": 4480, "impatience": 7, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:34:03", "epochs_done": 0, "batches_seen": 320, "train_examples_seen": 5120, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 2.457407708466053}}
2021-07-17 19:48:55.990 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:35:00", "epochs_done": 0, "batches_seen": 320, "train_examples_seen": 5120, "impatience": 8, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:38:25", "epochs_done": 0, "batches_seen": 360, "train_examples_seen": 5760, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 3.3798248738050463}}
2021-07-17 19:53:16.189 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:39:20", "epochs_done": 0, "batches_seen": 360, "train_examples_seen": 5760, "impatience": 9, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:43:12", "epochs_done": 1, "batches_seen": 400, "train_examples_seen": 6389, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 3.8420878514647483}}
2021-07-17 19:58:06.655 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:44:10", "epochs_done": 1, "batches_seen": 400, "train_examples_seen": 6389, "impatience": 10, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:47:38", "epochs_done": 1, "batches_seen": 440, "train_examples_seen": 7029, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 2.6904850624501706}}
2021-07-17 20:02:32.435 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:48:36", "epochs_done": 1, "batches_seen": 440, "train_examples_seen": 7029, "impatience": 11, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:51:57", "epochs_done": 1, "batches_seen": 480, "train_examples_seen": 7669, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 3.107879289984703}}
2021-07-17 20:06:47.835 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:52:51", "epochs_done": 1, "batches_seen": 480, "train_examples_seen": 7669, "impatience": 12, "patience_limit": 100}}
{"train": {"eval_examples_count": 16, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:55:52", "epochs_done": 1, "batches_seen": 520, "train_examples_seen": 8309, "head_learning_rate": 0.009999999776482582, "bert_learning_rate": 1.9999999552965164e-05, "loss": 2.2718755930662153}}
2021-07-17 20:10:40.54 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 212: Did not improve on the ner_f1 of 0
{"valid": {"eval_examples_count": 753, "metrics": {"ner_f1": 0, "ner_token_f1": 0}, "time_spent": "0:56:44", "epochs_done": 1, "batches_seen": 520, "train_examples_seen": 8309, "impatience": 13, "patience_limit": 100}}

The code used for training is given below; I am using DeepPavlov 0.15:

from deeppavlov import configs, train_model
from deeppavlov.core.commands.utils import parse_config

config_dict = parse_config(configs.ner.ner_ontonotes_bert)
config_dict['dataset_reader']['data_path'] = '<path to data folder>'  # folder with the custom train/valid/test files
train_model(config_dict, download=True)

Hi!

As far as I can see, the loss value is decreasing over time, so the model is training. On the other hand, the train and validation metrics are still zero. Do they increase if you let the training continue until it stops?

Is it possible that you are overwriting the good pre-trained checkpoint (trained on OntoNotes) with a model that has low validation scores (on your data)?
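One quick way to rule this out is to fine-tune a copy of the checkpoint rather than the original. A minimal sketch, assuming the config exposes the checkpoint location through the MODEL_PATH variable under metadata.variables (as the standard DeepPavlov configs do); the copy's folder name is just an example:

from pathlib import Path
import shutil
from deeppavlov import configs, train_model
from deeppavlov.core.commands.utils import parse_config

# Work on a copy of the downloaded checkpoint so the original
# ner_ontonotes_bert model can never be overwritten by a bad run.
src = Path('~/.deeppavlov/models/ner_ontonotes_bert').expanduser()
dst = Path('~/.deeppavlov/models/ner_ontonotes_bert_custom').expanduser()
if not dst.exists():
    shutil.copytree(src, dst)

config_dict = parse_config(configs.ner.ner_ontonotes_bert)
config_dict['metadata']['variables']['MODEL_PATH'] = str(dst)  # load from and save to the copy
config_dict['dataset_reader']['data_path'] = '<path to data folder>'
train_model(config_dict, download=False)  # checkpoint is already on disk, so no re-download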

Hi!

Sometimes I am able to train the model with the same data and the same code. In those runs it reports the train and validation F1 scores correctly, and the model gives the expected output on unseen data.

But sometimes I get the behaviour shown above. When the F1 scores look like this, the model does not behave as expected; it does not learn anything from the given data.

I am not able to figure out the issue, since the same code and data behave differently from run to run.

Also, my understanding is that a new checkpoint is saved only if the validation F1 score is higher than that of the existing checkpoint.

Yes, at the start of training the initial checkpoint is evaluated, and a new model is saved only if the metric improves.
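For reference, this check is driven by the train section of the config; the "patience_limit" and the 40-batch validation interval visible in your log map onto trainer parameters. A sketch, assuming the usual DeepPavlov trainer option names:

config_dict['train']['validation_patience'] = 100  # "patience_limit" in the log: validations without improvement before training stops
config_dict['train']['val_every_n_batches'] = 40   # how often the valid set is evaluated (matches your log)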

> But sometimes I get the behaviour shown above. When the F1 scores look like this, the model does not behave as expected; it does not learn anything from the given data.
>
> I am not able to figure out the issue, since the same code and data behave differently from run to run.

It is hard to guess, but the reason could be the random initialization of the classification head. You can check this by setting random seeds for the model, and, to make runs more reproducible, a random seed for the dataset iterator as well; see the sketch below. Removing all sources of randomness would help with the investigation.
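For example, something along these lines for DeepPavlov 0.15, which runs on TF1-style graphs (the seed value is arbitrary, and the iterator seed assumes the standard data_learning_iterator, which accepts a seed parameter):

import random
import numpy as np
import tensorflow as tf
from deeppavlov import configs
from deeppavlov.core.commands.utils import parse_config

SEED = 42
random.seed(SEED)                   # Python-level randomness
np.random.seed(SEED)                # NumPy randomness (shuffling, init helpers)
tf.compat.v1.set_random_seed(SEED)  # TF graph-level ops, incl. the head initialization

config_dict = parse_config(configs.ner.ner_ontonotes_bert)
config_dict['dataset_iterator']['seed'] = SEED  # fixes the train/valid batch order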

I am not sure how to help you more in this case.