Memory use when training and evaluating NER model

Hello,

We are currently using the DeepPavlov framework to extract custom named entities. We have trained a model on a labeled dataset but have run into unexpected memory usage issues that cause the training / evaluation process to be terminated.

For some reason, memory usage shoots up during both the training and evaluation stages:


The maximum memory available is 8 GB. We have a GPU available, and both the training and prediction / evaluation scripts contain the line os.environ['CUDA_VISIBLE_DEVICES'] = '0'. The training dataset is only 700 sentences.
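
For context, our training script looks roughly like this (a simplified sketch; the config filename is a placeholder for our custom NER config):

    import os

    # Pin the process to the first GPU before DeepPavlov / TensorFlow is imported,
    # otherwise the variable may not be picked up.
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'

    from deeppavlov import train_model

    train_model('ner_custom_config.json')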

Initially it crashed during training, but after I reduced the batch size from 16 to 4 it manages to get through a few epochs (it still crashes at the end).
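
The only change I made was the batch size in the train section of the config, roughly like this (the filename is again a placeholder):

    import json

    # Load the custom NER config, lower the batch size from 16 to 4 and save it back.
    with open('ner_custom_config.json') as f:
        config = json.load(f)

    config['train']['batch_size'] = 4

    with open('ner_custom_config.json', 'w') as f:
        json.dump(config, f, indent=2, ensure_ascii=False)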

It is even more bizarre with the evaluate_model function or the python -m deeppavlov evaluate command. The process gets killed (I assume due to memory limitations) right after loading the vocabulary from the trained NER model, i.e. the last INFO log is from 'deeppavlov.core.data.simple_vocab' at line 115.
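
For reference, the evaluation is started roughly like this (a sketch with the same placeholder config as above), or with the CLI command mentioned above:

    import os

    os.environ['CUDA_VISIBLE_DEVICES'] = '0'

    from deeppavlov import evaluate_model

    # The process is killed shortly after the vocabulary of the trained model is loaded.
    evaluate_model('ner_custom_config.json')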

Could you kindly point to a possible source of the high memory usage and suggest how to remedy it?

Thank you,

Nikita

Hi! Could you provide the configuration file that you are using? Does this also happen with the pre-trained models from the library?

High memory usage might be caused by long sequences. You can try reducing the max_seq_length parameter. Still, it is quite strange that the model fails with OOM at the evaluation stage.
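
In case it helps, one way to lower it without editing the file by hand is to patch every pipe component that defines max_seq_length, roughly like this (a sketch; 'your_ner_config.json' is a placeholder for your config):

    import json

    with open('your_ner_config.json') as f:
        config = json.load(f)

    # Reduce max_seq_length in every chainer pipe component that defines it.
    for component in config['chainer']['pipe']:
        if 'max_seq_length' in component:
            component['max_seq_length'] = 128  # try a smaller value than the default

    with open('your_ner_config.json', 'w') as f:
        json.dump(config, f, indent=2, ensure_ascii=False)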

I have checked GPU_MEM and RAM usage for the evaluation stage (python -m deeppavlov evaluate) for several configs:

config              GPU_MEM       RAM
ner_rus_bert         785MiB       6300MiB
ner_rus_bert_torch  1793MiB       2472MiB
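
For reference, a rough way to track the peak RAM of the evaluation process (a sketch using psutil, not the exact script used here; GPU memory can be read with nvidia-smi):

    import subprocess
    import time

    import psutil

    # Launch the evaluation and poll its resident set size until it exits.
    proc = subprocess.Popen(['python', '-m', 'deeppavlov', 'evaluate', 'ner_rus_bert'])
    peak_rss = 0
    p = psutil.Process(proc.pid)
    while proc.poll() is None:
        try:
            peak_rss = max(peak_rss, p.memory_info().rss)
        except psutil.NoSuchProcess:
            break
        time.sleep(0.5)

    print(f'peak RSS: {peak_rss / 1024 ** 2:.0f} MiB')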