How to add the Gradient Accumulation to my model?

I am following this tutorial of yours dp_tutorials/Tutorial_3_RU_Fine_tuning_BERT_classifier.ipynb at master · deepmipt/dp_tutorials · GitHub

However, I am constantly running out of memory, so I decided to use gradient accumulation to reduce memory usage. How can I add it to my model? I can't find a tutorial that explains it well.

Thank you


We do not have built-in gradient accumulation for our models, but it is straightforward to add.
You should pass the full batch to the model's train_on_batch / __call__ methods and split it into sub-batches inside those methods. I would suggest inheriting from TorchTransformersClassifierModel and providing your own implementations of train_on_batch and __call__.
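A minimal sketch of what such a train_on_batch could look like in plain PyTorch. This is not DeepPavlov's actual API: the model, loss, optimizer, and the sub_batch_size parameter below are illustrative stand-ins, and you would adapt the same loop inside your TorchTransformersClassifierModel subclass.

```python
import torch
import torch.nn as nn

# Stand-ins for your real classifier and optimizer (illustrative only).
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def train_on_batch(features, labels, sub_batch_size=8):
    """Process a full batch in smaller sub-batches, accumulating
    gradients, then apply a single optimizer step."""
    model.train()
    optimizer.zero_grad()
    n = features.size(0)
    n_sub = (n + sub_batch_size - 1) // sub_batch_size
    total_loss = 0.0
    for start in range(0, n, sub_batch_size):
        sub_x = features[start:start + sub_batch_size]
        sub_y = labels[start:start + sub_batch_size]
        loss = loss_fn(model(sub_x), sub_y)
        # Scale the loss so the accumulated gradient approximates
        # what a single full-batch backward pass would produce.
        (loss / n_sub).backward()
        total_loss += loss.item()
    optimizer.step()  # one parameter update for the whole batch
    return total_loss / n_sub

# Usage: a "full" batch of 32 examples, processed 8 at a time.
x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))
avg_loss = train_on_batch(x, y)
```

Only one sub-batch's activations are held in memory at a time, which is what reduces peak memory compared to a single forward pass over the whole batch.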