Repeat: NER OntoNotes BERT model training with the OntoNotes dataset doesn't finish even after 4 days

Please look at the topic with this title under the DeepPavlov Framework category, last updated April 20th. It was handled by Yura Kuratov but now lies unattended.

Hi Yura Kuratov, thank you. Can you give me an ideal termination condition? How many epochs and what validation patience value for CPU training? What F1 score is ideal? I'm not sure if I'm asking the questions correctly. I assume training stops either when it reaches a 100% F1 score or when 30 iterations are completed. Neither condition was met, so training did not finish even after 4 days. Please correct me if I'm wrong and suggest a reasonable termination condition.

If the F1 score does not increase over 4,000 batches (for each of those 40 batches), training would stop. The 30 iterations were not completed in those four days, and the F1 score kept increasing with a validation patience of 100. I understand. Now, please suggest good values for these two training parameters; I'm expecting a good F1 score. I'm using a CPU.

You can always stop training that is running and DeepPavlov will save the best current checkpoint.

If it takes too long for training to stop on CPU, you can modify the total number of batches (or epochs) the model should train for.

For example:
Say you are ready to wait for X hours. Check how long it takes the model to make one iteration (the current iteration count and elapsed time are logged both to stdout and to TensorBoard); call it Y iterations/second. Then you can set max_batches to X * 60 * 60 * Y in the configuration file.
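The arithmetic above can be sketched in a few lines of Python. The function name and the example numbers are assumptions for illustration; the 40 batches per 1.5 hours figure is the CPU speed reported later in this thread.

```python
# Back-of-the-envelope budget for max_batches: multiply the wall-clock
# budget in seconds by the measured training speed in iterations/second.
def max_batches_for_budget(hours: float, iters_per_second: float) -> int:
    """Number of batches the model can fit in the given time budget."""
    return round(hours * 60 * 60 * iters_per_second)

# 40 batches per 1.5 hours (as reported below) over a 24-hour budget:
speed = 40 / (1.5 * 60 * 60)           # iterations per second
print(max_batches_for_budget(24, speed))  # 640
```

At that CPU speed a 24-hour budget allows only around 640 batches, which is one reason a GPU is suggested later in the thread.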

Thank you, Yura! For 40 batches it takes nearly 1.5 hours. The train section has no attribute called max_batches. Can you take a look at my config file and suggest changes? When you say iteration, do you mean epoch? I'm willing to wait 24 hours on weekdays and 48 hours on weekends. Do you suggest a GPU? Are there any free model-training platforms like Kaggle?

max_batches can be added like this:

...
},
"train": {
  "max_batches": 2000,
  ...
},
...
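If editing the JSON by hand is error-prone, the same change can be made programmatically. This is a minimal stdlib-only sketch; the file name `config.json` and the helper name are assumptions, so point it at your own ner_ontonotes_bert config file.

```python
import json
from pathlib import Path

def set_max_batches(config_path: str, max_batches: int) -> dict:
    """Load a DeepPavlov-style JSON config, set train.max_batches,
    write it back, and return the patched config."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Create the "train" section if missing, then set the limit.
    config.setdefault("train", {})["max_batches"] = max_batches
    path.write_text(json.dumps(config, indent=2))
    return config

# Example usage with a throwaway config file (hypothetical contents):
Path("config.json").write_text('{"train": {"epochs": 30}}')
patched = set_max_batches("config.json", 2000)
print(patched["train"]["max_batches"])  # 2000
```

Existing keys in the "train" section (like epochs or validation patience) are left untouched; only max_batches is added or overwritten.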

when you say iteration do you mean epoch?

I mean single batch.

do you suggest gpu? are there any guys who provide free model training platforms like kaggle?

Yeah, I definitely suggest you use a GPU. Google Colab is a good option.