All necessary files for a specific config can be downloaded with the command:
python -m deeppavlov download config_name
This command downloads everything listed in the download section of the configuration.
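If you prefer to trigger the download from Python instead of the command line, something like this should work (a minimal sketch; as far as I remember the deep_download helper in deeppavlov.download is what the CLI command calls under the hood, and rusentiment_bert is just an example config name):

from deeppavlov.download import deep_download

# Fetches everything listed in the download section of the given config,
# same as `python -m deeppavlov download rusentiment_bert`.
deep_download('rusentiment_bert')  # example config name; pass your own name or path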
Requirements (such as deepmipt/bert) can be installed with:
python -m deeppavlov install config_name
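For the “BERT on RuSentiment” example discussed below, assuming rusentiment_bert is the config name, that would be:
python -m deeppavlov install rusentiment_bert
python -m deeppavlov download rusentiment_bert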
In the case of “BERT on RuSentiment”, the download will fetch the pre-trained Multilingual BERT model (the first link) and the parameters of that model fine-tuned on RuSentiment data (the second link).
The Multilingual BERT model was pre-trained by Google, and we simply re-use it.
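Once both pieces are downloaded, the fine-tuned pipeline can be built and queried from Python, for example (a minimal sketch; the config attribute and the example texts are illustrative):

from deeppavlov import build_model, configs

# download=True fetches the files from the config's download section
# (the pre-trained Multilingual BERT and the RuSentiment fine-tuned weights)
# before building the pipeline, so the separate download step is optional.
model = build_model(configs.classifiers.rusentiment_bert, download=True)

# The pipeline takes a batch of texts and returns sentiment labels.
labels = model(['всё отлично!', 'это ужасно'])
print(labels)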
If you want to train your own BERT model from scratch on the MLM and NSP tasks, I would recommend using the original BERT repo, or our fork with multi-GPU support, and following the instructions in the README file.
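For reference, pre-training in the original repo boils down to two scripts: create_pretraining_data.py converts a raw text corpus into TFRecords with masked-LM and next-sentence examples, and run_pretraining.py runs the actual MLM+NSP training. Roughly (file names and hyperparameter values below are placeholders; see the repo’s README for the authoritative instructions):

python create_pretraining_data.py \
  --input_file=corpus.txt \
  --output_file=tf_examples.tfrecord \
  --vocab_file=vocab.txt \
  --do_lower_case=False \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --dupe_factor=5

python run_pretraining.py \
  --input_file=tf_examples.tfrecord \
  --output_dir=pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=bert_config.json \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=100000 \
  --num_warmup_steps=10000 \
  --learning_rate=1e-4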
Are there plans to make these steps programmatic/reproducible for such models?
Most of the available BERT models (English, Multilingual, …) were not pre-trained by DeepPavlov; we just use them as-is, so we cannot provide a way to reproduce them.
Also, we don’t have a one-command solution for BERT pre-training, and I’m not sure it is on our roadmap. But all the necessary steps for pre-training are known and described: