Training NER model on my own tags

I don't understand how to train an NER model on my own tags. It seems I'm missing some steps. I followed the recommendations from here: ner_few_shot_ru | Fine-tuning the model · Issue #1071 · deeppavlov/DeepPavlov · GitHub

This is my code:

import json
from deeppavlov import configs, build_model, train_model

with configs.ner.ner_rus_bert.open(encoding='utf8') as f:
    ner_config = json.load(f)

ner_config['dataset_reader']['data_path'] = 'contents/my_data/'  # directory with train.txt, valid.txt and test.txt files
ner_config['metadata']['variables']['NER_PATH'] = 'contents/'
ner_config['metadata']['download'] = [ner_config['metadata']['download'][-1]]  # do not download the pretrained ontonotes model

ner_model = train_model(ner_config, download=True)

Nevertheless, I obtain this error:

2024-05-14 11:17:30.315 INFO in 'deeppavlov.core.data.utils'['utils'] at line 97: Downloading from http://files.deeppavlov.ai/v1/ner/ner_rus_bert_torch_new.tar.gz to /root/.deeppavlov/models/ner_rus_bert_torch_new.tar.gz
100%|██████████| 1.44G/1.44G [01:13<00:00, 19.6MB/s]
2024-05-14 11:18:44.831 INFO in 'deeppavlov.core.data.utils'['utils'] at line 284: Extracting /root/.deeppavlov/models/ner_rus_bert_torch_new.tar.gz archive into /root/.deeppavlov/models/ner_rus_bert_torch
2024-05-14 11:19:22.229 WARNING in 'deeppavlov.core.trainers.fit_trainer'['fit_trainer'] at line 66: TorchTrainer got additional init parameters ['pytest_max_batches', 'pytest_batch_size'] that will be ignored:
2024-05-14 11:19:23.721 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 104: [saving vocabulary to /root/.deeppavlov/models/ner_rus_bert_torch/tag.dict]
Some weights of the model checkpoint at DeepPavlov/rubert-base-cased were not used when initializing BertForTokenClassification: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias']

- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of BertForTokenClassification were not initialized from the model checkpoint at DeepPavlov/rubert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2024-05-14 11:19:29.348 WARNING in 'deeppavlov.core.models.torch_model'['torch_model'] at line 96: Unable to place component TorchTransformersSequenceTagger on GPU, since no CUDA GPUs are available. Using CPU.
2024-05-14 11:19:30.838 ERROR in 'deeppavlov.core.common.params'['params'] at line 108: Exception in <class 'deeppavlov.models.torch_bert.torch_transformers_sequence_tagger.TorchTransformersSequenceTagger'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/deeppavlov/core/common/params.py", line 102, in from_params
    component = obj(**dict(config_params, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 173, in __init__
    super().__init__(model, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deeppavlov/core/models/torch_model.py", line 84, in __init__
    self.load()
  File "/usr/local/lib/python3.10/dist-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 253, in load
    super().load(fname)
  File "/usr/local/lib/python3.10/dist-packages/deeppavlov/core/models/torch_model.py", line 144, in load
    self.model.load_state_dict(model_state)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for BertForTokenClassification:
    size mismatch for classifier.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
    size mismatch for classifier.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([2]).

RuntimeError                              Traceback (most recent call last)
in <cell line: 11>()
      9 ner_config['metadata']['download'] = [ner_config['metadata']['download'][-1]]  # do not download the pretrained ontonotes model
     10
---> 11 ner_model = train_model(ner_config, download=True)

9 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
   1669
   1670         if len(error_msgs) > 0:
-> 1671             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   1672                 self.__class__.__name__, "\n\t".join(error_msgs)))
   1673         return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for BertForTokenClassification:
    size mismatch for classifier.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
    size mismatch for classifier.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([2]).

Dear @egorhowtocode,

Thank you for your interest in DeepPavlov models!

You seem to be doing everything right.

We have a special method in our library for config parsing. Please try using it to modify the default config for your needs; I also suggest deleting the download key from the config. An example is in the code snippet below.

from deeppavlov import train_model, build_model
from deeppavlov.core.commands.utils import parse_config

ner_config = parse_config('ner_rus_bert')
ner_config['dataset_reader']['data_path'] = 'contents/my_data/'
ner_config['metadata']['variables']['NER_PATH'] = 'contents/'

del ner_config['metadata']['download']

ner_model = train_model(ner_config, download=True)

Make sure to set the MODEL_PATH variable to the path where you want your model to be saved, and if that directory contains a tag.dict file after a run that ended with the mismatch error, delete it before the next run.
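That cleanup step can be sketched as a small stdlib helper (a minimal sketch, assuming your model directory is contents/ as in the config above; remove_stale_tag_dict is a hypothetical name, not a DeepPavlov function):

```python
from pathlib import Path

def remove_stale_tag_dict(model_dir: str) -> bool:
    """Delete a leftover tag.dict (built for the old label set) so the
    next training run rebuilds the tag vocabulary from your own data.
    Returns True if a stale file was found and removed."""
    tag_dict = Path(model_dir) / 'tag.dict'
    if tag_dict.exists():
        tag_dict.unlink()
        return True
    return False
```

Run it once before restarting training, e.g. `remove_stale_tag_dict('contents/')`.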

The mismatch happens when the model downloads weights trained on our default dataset and their number of labels doesn't match yours: here the downloaded classifier head was trained with 7 labels while your data produced 2, hence the [7, 768] vs [2, 768] size mismatch.

Hope this will solve your problem.

Kind regards,

Anna.

Thank you very much!
Am I right that I can just set MODEL_PATH = 'contents/'?

I used your code, augmenting it with MODEL_PATH:

from deeppavlov import train_model, build_model
from deeppavlov.core.commands.utils import parse_config

MODEL_PATH = 'contents/'

ner_config = parse_config('ner_rus_bert')
ner_config['dataset_reader']['data_path'] = 'contents/my_data/'
ner_config['metadata']['variables']['NER_PATH'] = 'contents/'

del ner_config['metadata']['download']

ner_model = train_model(ner_config, download=True)

And now I get this new error when running the code above:

# Drop BIO or BIOES markup
---> 72     assert all(len(tag.split('-')) <= 2 for tag in y_true)
     73 
     74     y_true = [tag.split('-')[-1] for tag in y_true]

AssertionError:

Dear @egorhowtocode,

Looks like the training started successfully and the error occurred during the first metric calculation.
From what I see, the problem is with the labels in your dataset. The labels should look like the example below:

Alex B-PER
is O
going O
with O
Marty B-PER
A. I-PER
Rick I-PER
to O
Los B-LOC
Angeles I-LOC
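The expected format above (one "token tag" pair per line, sentences separated by blank lines) can be parsed with a short stdlib sketch, useful for sanity-checking your train.txt before training (read_conll is a hypothetical helper, not part of DeepPavlov):

```python
def read_conll(path):
    """Parse a CoNLL-style file: one 'token tag' pair per line,
    sentences separated by blank lines. Returns a list of
    (tokens, tags) pairs, one per sentence."""
    sentences, tokens, tags = [], [], []
    with open(path, encoding='utf8') as f:
        for line in f:
            line = line.strip()
            if not line:
                # blank line closes the current sentence
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            token, tag = line.rsplit(' ', 1)
            tokens.append(token)
            tags.append(tag)
    if tokens:  # file may not end with a blank line
        sentences.append((tokens, tags))
    return sentences
```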

Please check that any more detailed label contains only one '-' in its name, for example:

B-PHONE_NUMBER, but not B-PHONE-NUMBER
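This rule mirrors the failing assertion (`len(tag.split('-')) <= 2`), so you can screen your tags up front with a one-line check (check_bio_tags is a hypothetical helper, not part of DeepPavlov):

```python
def check_bio_tags(tags):
    """Return the tags that would fail DeepPavlov's
    `len(tag.split('-')) <= 2` check, i.e. labels with more than
    one '-'. Use '_' inside label names instead of extra dashes."""
    return [tag for tag in tags if len(tag.split('-')) > 2]
```

An empty result means all labels are acceptable; any tag it returns should be renamed, e.g. B-PHONE-NUMBER → B-PHONE_NUMBER.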

If the error occurs again, please send the full text of the Traceback.

Kind regards,
Anna.