Repeat: NER OntoNotes BERT model training with the OntoNotes dataset doesn't finish even after 4 days

Please look at the topic with this title under the DeepPavlov Framework category, last updated April 20th. It was handled by Yura Kuratov but now lies unattended.

Hi Yura Kuratov, thank you. Can you give me an ideal termination condition? How many epochs and what validation patience value for CPU training? What F1 score is ideal? I'm not sure if I'm asking the questions correctly. I assume training stops either when it reaches a 100% F1 score or when 30 iterations are completed. Neither condition was met, so training did not finish even after 4 days. Please correct me if I'm wrong and suggest a reasonable termination condition.

If the F1 score does not increase over 4,000 batches (for each of those 40 batches), training would stop. The 30 iterations were not completed in those four days, and the F1 score kept increasing with a validation patience of 100. I understand. Now, please suggest good values for these two training parameters; I'm expecting a good F1 score. I'm using a CPU.

You can always stop training that is running and DeepPavlov will save the best current checkpoint.

If it takes too long for training to stop on CPU, you can modify the total number of batches (or epochs) the model should train for.

For example:
Say you are ready to wait for X hours. Check how long it takes the model to make one iteration (the current iteration count and elapsed time are logged both to stdout and to TensorBoard); call it Y iterations/second. Then you can set max_batches to X * 60 * 60 * Y in the configuration file.
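The arithmetic above can be sketched in a few lines of Python. The function name and the example numbers are assumptions for illustration; the 40 batches per 1.5 hours figure is the CPU speed reported later in this thread.

```python
# Back-of-the-envelope budget for max_batches: multiply the wall-clock
# budget in seconds by the measured training speed in iterations/second.
def max_batches_for_budget(hours: float, iters_per_second: float) -> int:
    """Number of batches the model can fit in the given time budget."""
    return round(hours * 60 * 60 * iters_per_second)

# 40 batches per 1.5 hours (as reported below) over a 24-hour budget:
speed = 40 / (1.5 * 60 * 60)           # iterations per second
print(max_batches_for_budget(24, speed))  # 640
```

At that CPU speed a 24-hour budget allows only around 640 batches, which is one reason a GPU is suggested later in the thread.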

Thank you, Yura! For 40 batches it takes nearly 1.5 hours. The train section has no attribute called max_batches. Can you take a look at my config file and suggest changes? When you say iteration, do you mean epoch? I'm willing to wait 24 hours on weekdays and 48 hours on weekends. Do you suggest a GPU? Are there any free model-training platforms like Kaggle?

max_batches can be added like this:

...
},
"train": {
  "max_batches": 2000,
  ...
},
...
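If editing the JSON by hand is error-prone, the same change can be made programmatically. This is a minimal stdlib-only sketch; the file name `config.json` and the helper name are assumptions, so point it at your own ner_ontonotes_bert config file.

```python
import json
from pathlib import Path

def set_max_batches(config_path: str, max_batches: int) -> dict:
    """Load a DeepPavlov-style JSON config, set train.max_batches,
    write it back, and return the patched config."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Create the "train" section if missing, then set the limit.
    config.setdefault("train", {})["max_batches"] = max_batches
    path.write_text(json.dumps(config, indent=2))
    return config

# Example usage with a throwaway config file (hypothetical contents):
Path("config.json").write_text('{"train": {"epochs": 30}}')
patched = set_max_batches("config.json", 2000)
print(patched["train"]["max_batches"])  # 2000
```

Existing keys in the "train" section (like epochs or validation patience) are left untouched; only max_batches is added or overwritten.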

when you say iteration do you mean epoch?

I mean single batch.

do you suggest gpu? are there any guys who provide free model training platforms like kaggle?

Yeah, I definitely suggest you use a GPU. Google Colab is a good option.