Memory use when training and evaluating NER model

Hello,

We are currently using the DeepPavlov framework to extract custom named entities. We have trained a model on a labeled dataset but have run into unexpected memory usage issues that cause the training / evaluation process to be terminated.

For some reason, memory usage shoots up during both the training and evaluation stages:


The maximum memory available is 8 GB. We have a GPU available, and both the training and prediction / evaluation scripts contain the line os.environ['CUDA_VISIBLE_DEVICES'] = '0'. The training dataset is only 700 sentences.
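
For context, our training script looks roughly like this (a simplified sketch; the config filename is a placeholder for our custom NER config):

    import os

    # Pin the process to the first GPU before DeepPavlov / TensorFlow is imported,
    # otherwise the variable may not be picked up.
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'

    from deeppavlov import train_model

    train_model('ner_custom_config.json')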

Initially it crashed during training, but after I reduced the batch size from 16 to 4 it manages to get through a few epochs (it still crashes at the end).
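
The only change I made was the batch size in the train section of the config, roughly like this (the filename is again a placeholder):

    import json

    # Load the custom NER config, lower the batch size from 16 to 4 and save it back.
    with open('ner_custom_config.json') as f:
        config = json.load(f)

    config['train']['batch_size'] = 4

    with open('ner_custom_config.json', 'w') as f:
        json.dump(config, f, indent=2, ensure_ascii=False)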

It is even more bizarre with the evaluate_model function or the python -m deeppavlov evaluate command. The process gets killed (I assume due to memory limitations) right after loading the vocabulary from the trained NER model, i.e. the last INFO log is from 'deeppavlov.core.data.simple_vocab' at line 115.
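
For reference, the evaluation is started roughly like this (a sketch with the same placeholder config as above), or with the CLI command mentioned above:

    import os

    os.environ['CUDA_VISIBLE_DEVICES'] = '0'

    from deeppavlov import evaluate_model

    # The process is killed shortly after the vocabulary of the trained model is loaded.
    evaluate_model('ner_custom_config.json')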

Could you kindly point to a possible source of the high memory usage and suggest how to remedy it?

Thank you,

Nikita

Hi! Could you provide the configuration file that you are using? Does this also happen with the pre-trained models from the library?

High memory usage might be caused by long sequences. You can try reducing the max_seq_length parameter. Still, it is quite strange that the model fails with OOM at the evaluation stage.
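
In case it helps, one way to lower it without editing the file by hand is to patch every pipe component that defines max_seq_length, roughly like this (a sketch; 'your_ner_config.json' is a placeholder for your config):

    import json

    with open('your_ner_config.json') as f:
        config = json.load(f)

    # Reduce max_seq_length in every chainer pipe component that defines it.
    for component in config['chainer']['pipe']:
        if 'max_seq_length' in component:
            component['max_seq_length'] = 128  # try a smaller value than the default

    with open('your_ner_config.json', 'w') as f:
        json.dump(config, f, indent=2, ensure_ascii=False)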

I have checked GPU_MEM and RAM usage for the evaluation stage (python -m deeppavlov evaluate) for several configs:

config              GPU_MEM       RAM
ner_rus_bert         785MiB       6300MiB
ner_rus_bert_torch  1793MiB       2472MiB
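
For reference, a rough way to track the peak RAM of the evaluation process (a sketch using psutil, not the exact script used here; GPU memory can be read with nvidia-smi):

    import subprocess
    import time

    import psutil

    # Launch the evaluation and poll its resident set size until it exits.
    proc = subprocess.Popen(['python', '-m', 'deeppavlov', 'evaluate', 'ner_rus_bert'])
    peak_rss = 0
    p = psutil.Process(proc.pid)
    while proc.poll() is None:
        try:
            peak_rss = max(peak_rss, p.memory_info().rss)
        except psutil.NoSuchProcess:
            break
        time.sleep(0.5)

    print(f'peak RSS: {peak_rss / 1024 ** 2:.0f} MiB')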