Recommended preprocessing for ruBERT

Hello. I’m trying to use an embedder based on ruBERT and am wondering whether I need to preprocess the data in any specific way, such as normalizing it, removing punctuation, and so on.


No special preprocessing is needed for this model, but it is better to remove URLs, HTML code, and similar noise before embedding.
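
As a rough illustration of that kind of cleanup, here is a minimal sketch using only the Python standard library. The function name `clean_text` and the regular expressions are my own assumptions, not anything shipped with ruBERT, and the tag pattern is deliberately simple (a proper HTML parser would be safer for messy pages). Punctuation and letter case are left untouched, in line with the advice above.

```python
import html
import re

# Hypothetical helper patterns (assumptions, not part of any ruBERT tooling).
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)  # bare URLs
TAG_RE = re.compile(r"<[^>]+>")                                # simple HTML tags
WS_RE = re.compile(r"\s+")                                     # runs of whitespace


def clean_text(text: str) -> str:
    """Strip HTML tags and URLs, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)   # drop tags such as <p> or <a href="...">
    text = html.unescape(text)     # turn &amp; into &, &quot; into ", etc.
    text = URL_RE.sub(" ", text)   # drop http(s):// and www. links
    return WS_RE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "<p>Привет, мир!</p> Смотри https://example.com/page?x=1 тут."
    print(clean_text(raw))
```

The cleaned string can then be passed to the model's tokenizer as is.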