How to pretrain a new transformer model from scratch?

ap.neustroev · July 3, 2020, 6:39am

Greetings!

I have a question about pretraining transformers. I am a student studying ML and DL, and I am trying to train my own BART (https://arxiv.org/abs/1910.13461) model for abstractive summarization in Russian, but I have not found any guide online on how to train this model, or any other transformer, from scratch. As far as I understand, you need to build your own token vocabulary (apparently with BPE) and then train the model on news articles paired with their headlines. Can this training be done with the Trainer from the transformers library? And is it unnecessary to modify the data yourself (adding noise, shuffling, etc., as described in the paper)?

I decided to ask you for advice since you have already pretrained RuBERT. Please share your opinion on this issue.


mu-arkhipov · July 4, 2020, 5:26am

The training was performed using the original BERT repo code. Only slight modifications were made to support the multi-GPU setting. All modifications are in our fork: https://github.com/deepmipt/bert
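
On the noising part of the question: the BART paper trains the model to reconstruct text that has been corrupted, and its text infilling scheme replaces spans whose lengths are drawn from a Poisson distribution (λ = 3) with a single mask token. Below is a minimal pure-Python sketch of that idea, not the code from the paper or from any library. The names `text_infilling` and `sample_poisson` and the `<mask>` string are hypothetical, the defaults (ratio 0.3, λ = 3) follow the values reported in the paper, and zero-length spans (pure mask insertions) are skipped for simplicity.

```python
import math
import random

MASK = "<mask>"  # assumed mask token; the real string depends on your tokenizer


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def text_infilling(tokens, mask_ratio=0.3, lam=3.0, seed=None):
    """Replace random token spans with a single MASK each.

    At most round(len(tokens) * mask_ratio) original tokens are hidden.
    Simplification (an assumption of this sketch): zero-length spans,
    which would insert a MASK without removing anything, are skipped.
    """
    rng = random.Random(seed)
    out = list(tokens)
    budget = int(round(len(out) * mask_ratio))  # tokens we may hide
    masked = 0
    max_attempts = 10 * len(out) + 10  # guard against endless retries
    for _ in range(max_attempts):
        if masked >= budget:
            break
        length = min(sample_poisson(lam, rng), budget - masked)
        if length == 0 or length > len(out):
            continue
        start = rng.randrange(0, len(out) - length + 1)
        if MASK in out[start:start + length]:
            continue  # do not mask over an existing mask
        out[start:start + length] = [MASK]  # whole span becomes one token
        masked += length
    return out


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog today".split()
    noisy = text_infilling(source, seed=0)
    # The model input would be `noisy`; the training target is `source`.
    print(noisy)
```

In a real pipeline this would run on token IDs from your own tokenizer rather than on whitespace-split words, and the corrupted sequence would go to the encoder while the uncorrupted one serves as the decoder target.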
