NER OntoNotes BERT model training on the OntoNotes dataset doesn't finish even after 4 days

I noticed something abnormal. I ran train_model with the ner_ontonotes_bert_probas config, and it did not stop even after 4 days. I’m using an i5 processor (two of them) on Windows 10, with no GPU in my machine. The number of iterations is set to 1, and validation patience has its default value. I’m using DeepPavlov 0.12.1. Am I doing something incorrect, or is this expected on my machine?


Could you provide your configuration file and logs output?

Before training starts, we run evaluation on the validation set, so it might be that the validation step takes too long on a CPU-only machine. Also, do you see high CPU usage while training is running?

Hi Mr.YuraKoratov,
The logs and the config file are here sir. The training is done without the model but with tag.dict.

Best regards,

(Attachment is missing)

(Attachment ner_ontonotes_bert_probas.json is missing)

Looks like attachment is missing.

Hi Mr.YuraKoratov, I’m attaching the config file. I set the number of epochs to 1 before train_model(config).
Please see the logs in this mail. I had attached the logs as a zip file along with the config file, but zip isn’t allowed by Yahoo. I thought only the logs needed to be attached, since I presumed the config file had already been sent to you. Sorry about that. Please take a look. By the way, I didn’t check the CPU usage during those four days.
Best regards,

(Attachment ner_ontonotes_bert_probas.json is missing)

Still no attachment.
I think the problem is that files attached via mail are not available on the DeepPavlov forum.
Could you try to add them via forum, not via mail?

Hi Mr.YuraKuratov, Attaching the logs and config file.

I don’t know how to attach files directly to the topic, so I’m sharing the logs and the config file, in that order, from my Google Drive.

Hi Mr.YuraKuratov, Please help.

Hi mr.yurakuratov, please help.

Hi mr.yurakuratov, please help. I need to finish training before the end of this week. Please tell me roughly how many iterations I should train for. I may be annotating and adding at most two or three thousand sentences to the base dataset before training.

Hello Mr.Yurakuratov, urgent help is needed. Please find time for this issue.

As I see from the logs, training is still running because neither stopping criterion has been satisfied.
In your config file, two stopping criteria are set:

  1. 30 epochs
  2. the validation metric does not improve for 100 validations in a row (patience), where validation runs every 40 batches.

On the plot, the ner_token_f1 metric is still increasing, so the patience criterion has not been triggered.
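To make the two criteria concrete, here is a minimal sketch that works on a plain dictionary standing in for the "train" section of the config. The key names `epochs`, `validation_patience` and `val_every_n_batches` are assumed to match your JSON file, and the helper functions are hypothetical illustrations, not part of DeepPavlov. It shows how many batches must pass without improvement before patience stops training, and how the limits could be lowered on a copy of the config for a quick run.

```python
from copy import deepcopy


def override_train_settings(config, **overrides):
    """Return a copy of config with keys in its "train" section replaced."""
    updated = deepcopy(config)
    updated.setdefault("train", {}).update(overrides)
    return updated


def batches_before_patience_stop(train_section):
    """Batches that must pass with no improvement before patience stops training."""
    return train_section["validation_patience"] * train_section["val_every_n_batches"]


# Stand-in for the "train" section, using the values quoted in this thread.
# The key names are an assumption; verify them against your own JSON file.
config = {"train": {"epochs": 30, "validation_patience": 100, "val_every_n_batches": 40}}

# With patience 100 and validation every 40 batches: 100 * 40 batches.
print(batches_before_patience_stop(config["train"]))

# Hypothetical quick-run settings: one epoch and a much shorter patience.
quick = override_train_settings(config, epochs=1, validation_patience=5)
print(quick["train"])
```

The original dictionary is left unchanged, so it is easy to compare the two. Whichever dictionary is actually handed to training is the one whose limits apply, so it is worth printing its "train" section right before starting.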