Hi!
Thanks a lot for your amazing library!
I have several questions about how computer resources are used during training and prediction.
1) Is it possible to manage GPU memory usage when we train and predict with a model on GPU? When I train a model, or load it and predict after training, these tasks take almost all of the GPU memory. This is very bad for my chatbot application: when I run the application, my models are loaded, all GPU memory is occupied, and other tasks can't run. Maybe it is possible to train a model on GPU and predict on CPU? Or maybe there is a way to limit the memory usage of the training and prediction processes?
2) Is it possible to manage CPU usage when we train and predict with a model on CPU? I have a problem similar to the GPU one: I want to control how many cores are used during training. When I train models on CPU, all 16 cores are used, so I can't work on anything else while training runs. Is it possible to limit the number of cores for the training process?
Hi @ostreech1997,
Thank you for your praise =)
- You can limit which GPU devices are visible to a process with the `CUDA_VISIBLE_DEVICES` environment variable. You can set it to an empty string to disable GPU usage entirely. But you would have to build your model for inference in a separate process from training, as TensorFlow will not release GPU memory until its process exits.
- I don't think there's a simple way of limiting CPU usage for model training. Docker could help with that, I think.
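For the first point, here is a minimal sketch of CPU-only inference: setting `CUDA_VISIBLE_DEVICES` to an empty string before TensorFlow is imported hides all GPUs from that process. The model path and the Keras-style loading call are just placeholders; adapt them to however your library restores a trained model.

```python
import os

# Hide all GPUs from this process. This must happen BEFORE TensorFlow
# is imported, because TensorFlow reads CUDA_VISIBLE_DEVICES once,
# at initialization time.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import tensorflow as tf

# Hypothetical model path; everything below now runs on CPU only,
# so the GPU stays free for training in another process.
model = tf.keras.models.load_model("my_chatbot_model")
# predictions = model.predict(inputs)
```

Run this in a separate process from training, as noted above: the training process keeps its GPU memory until it exits.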
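For the second point, one OS-level option on Linux (the same idea a Docker CPU quota enforces) is to pin the process to a subset of cores before training starts; threads created afterwards inherit the affinity mask. This is a hedged sketch, not a feature of the library, and `os.sched_setaffinity` is Linux-only.

```python
import os

# Restrict the current process (pid 0 = this process) to cores 0-3,
# leaving the remaining 12 cores free for other tasks. TensorFlow's
# thread pools are created later and inherit this affinity mask.
os.sched_setaffinity(0, {0, 1, 2, 3})

# ... build and train the model as usual after this point ...
```

In a container, `docker run --cpus=4 ...` enforces a similar limit without touching the code.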
Thanks a lot for your answer!