Found BERT extremely hard to train

I am following the tutorial here: dp_tutorials/Tutorial_3_RU_Fine_tuning_BERT_classifier.ipynb at master · deepmipt/dp_tutorials · GitHub, and I am finding it extremely hard to train because of RAM issues.
I tried reducing the batch size to 8, but even that does not help; I keep running out of memory.
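One thing I have been considering (in case it helps others with the same question): gradient accumulation, i.e. summing gradients over several micro-batches that do fit in memory and taking one optimizer step per logical batch. This is a generic toy sketch with a made-up linear model, not DeepPavlov-specific code:

```python
import numpy as np

# Sketch of gradient accumulation: simulate a logical batch of 8
# by accumulating gradients over 4 micro-batches of 2, then stepping once.
# Toy linear-regression model; nothing here is DeepPavlov API.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))           # one "logical" batch of 8 examples
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.1
micro = 2                              # micro-batch size that fits in memory
for step in range(200):
    grad = np.zeros_like(w)
    for i in range(0, len(X), micro):
        xb, yb = X[i:i+micro], y[i:i+micro]
        err = xb @ w - yb
        grad += xb.T @ err             # accumulate; no optimizer step yet
    w -= lr * grad / len(X)            # one step per logical batch

print(np.round(w, 2))                  # should approach true_w
```

The same idea applies to BERT fine-tuning: keep the per-step memory footprint at the micro-batch size while preserving the effective batch size of the original recipe.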

I am wondering what machine and how much RAM were used when training this model?

I ran out of memory on Google Colab and tried a Kaggle kernel as well, with no luck.
It would be super helpful if someone could give a hint on how to approach this problem.