BertPreprocessor error: KeyError: '[CLS]'

Input:
from deeppavlov.models.preprocessors.bert_preprocessor import BertPreprocessor
bert_proc = BertPreprocessor(vocab_file="/home/conversight/sai/deeppavlov/cased_L-12_H-768_A-12/", do_lower_case=False, max_seq_length=64)
bert_proc(x_train)
Output:
KeyError Traceback (most recent call last)
in <module>
----> 1 bert_proc(x_train)

~/anaconda3/envs/deeppavlov/lib/python3.7/site-packages/deeppavlov/models/preprocessors/bert_preprocessor.py in __call__(self, texts_a, texts_b)
74 examples = [InputExample(unique_id=0, text_a=text_a, text_b=text_b)
75 for text_a, text_b in zip(texts_a, texts_b)]
---> 76 return convert_examples_to_features(examples, self.max_seq_length, self.tokenizer)
77
78

~/anaconda3/envs/deeppavlov/lib/python3.7/site-packages/bert_dp/preprocessing.py in convert_examples_to_features(examples, seq_length, tokenizer)
74 input_type_ids.append(1)
75
---> 76 input_ids = tokenizer.convert_tokens_to_ids(tokens)
77
78 # The mask has 1 for real tokens and 0 for padding tokens. Only real

~/anaconda3/envs/deeppavlov/lib/python3.7/site-packages/bert_dp/tokenization.py in convert_tokens_to_ids(self, tokens)
177
178 def convert_tokens_to_ids(self, tokens):
--> 179 return convert_by_vocab(self.vocab, tokens)
180
181 def convert_ids_to_tokens(self, ids):

~/anaconda3/envs/deeppavlov/lib/python3.7/site-packages/bert_dp/tokenization.py in convert_by_vocab(vocab, items)
138 output = []
139 for item in items:
--> 140 output.append(vocab[item])
141 return output
142

KeyError: '[CLS]'

Can someone explain why I am getting this error?
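The last frame of the traceback shows the mechanism: convert_by_vocab does a plain vocab[item] lookup for every token, and [CLS] is the first token of every BERT input sequence, so an empty or incomplete vocabulary fails on it immediately. The sketch below reproduces this outside DeepPavlov; convert_by_vocab mirrors the code shown in the traceback, while the token list and toy vocabulary are made-up examples.

```python
from collections import OrderedDict


def convert_by_vocab(vocab, items):
    # Same logic as the bert_dp/tokenization.py frame in the traceback:
    # a plain dict lookup, so any token missing from `vocab` raises KeyError.
    output = []
    for item in items:
        output.append(vocab[item])
    return output


# Illustrative token sequence; in practice the tokenizer produces it.
tokens = ["[CLS]", "hello", "world", "[SEP]"]

empty_vocab = OrderedDict()  # an empty token -> index mapping
try:
    convert_by_vocab(empty_vocab, tokens)
except KeyError as err:
    print("KeyError:", err)  # fails on the very first token, '[CLS]'

# With a populated vocabulary (token -> index) the same call succeeds.
toy_vocab = OrderedDict((tok, idx) for idx, tok in enumerate(tokens))
print(convert_by_vocab(toy_vocab, tokens))  # [0, 1, 2, 3]
```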

Hey!
Can you please share the contents of bert_proc.tokenizer.vocab?
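One way to run that check is to look at the vocabulary's size and at the special tokens BERT inputs depend on. check_bert_vocab below is a hypothetical helper, not part of DeepPavlov or bert_dp; in the question's setting it would be called on bert_proc.tokenizer.vocab. The choice of required tokens ([CLS], [SEP], [UNK]) is an assumption based on how BERT inputs are built: [CLS] and [SEP] wrap each example, and [UNK] replaces unknown word pieces.

```python
from collections import OrderedDict
from typing import List, Mapping

# Assumed minimal set of special tokens a BERT vocabulary must contain.
REQUIRED_TOKENS = ("[CLS]", "[SEP]", "[UNK]")


def check_bert_vocab(vocab: Mapping[str, int]) -> List[str]:
    """Hypothetical sanity check: return a list of problems (empty if none)."""
    problems = []
    if len(vocab) == 0:
        problems.append("vocabulary is empty; check the vocab_file path")
        return problems
    for token in REQUIRED_TOKENS:
        if token not in vocab:
            problems.append(f"missing special token {token}")
    return problems


print(check_bert_vocab(OrderedDict()))
print(check_bert_vocab(OrderedDict([("[PAD]", 0), ("[UNK]", 1), ("[CLS]", 2), ("[SEP]", 3)])))  # []
```

Running such a check right after constructing the preprocessor surfaces a bad vocabulary before any text is converted, instead of as a KeyError deep inside the tokenizer.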


OrderedDict()
Thank you. I hadn't loaded the vocab file correctly, and I've now fixed the error: after specifying the correct path, it worked.
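In the Input above, vocab_file points at the checkpoint directory (note the trailing slash) rather than at a vocabulary file. Google's released BERT checkpoints, such as cased_L-12_H-768_A-12, keep their vocabulary in a vocab.txt file inside that directory, so a plausible fix is to pass that file. The sketch below assumes that file name; resolve_vocab_file is a hypothetical helper, and load_vocab only mimics the one-token-per-line format instead of calling the bert_dp implementation.

```python
import os
import tempfile
from collections import OrderedDict


def resolve_vocab_file(path: str, file_name: str = "vocab.txt") -> str:
    """Hypothetical helper: accept a vocabulary file or the checkpoint
    directory containing it, and return the path of an existing file.
    Assumes the vocabulary is named vocab.txt, as in Google's BERT releases."""
    if os.path.isdir(path):
        path = os.path.join(path, file_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"BERT vocabulary not found: {path}")
    return path


def load_vocab(vocab_file: str) -> "OrderedDict[str, int]":
    # One token per line; a token's index is its line number.
    vocab = OrderedDict()
    with open(vocab_file, encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            vocab[line.strip()] = index
    return vocab


# Demo with a throwaway directory standing in for cased_L-12_H-768_A-12/.
with tempfile.TemporaryDirectory() as ckpt_dir:
    with open(os.path.join(ckpt_dir, "vocab.txt"), "w", encoding="utf-8") as f:
        f.write("[PAD]\n[UNK]\n[CLS]\n[SEP]\nhello\n")
    vocab_file = resolve_vocab_file(ckpt_dir)  # directory in, file path out
    vocab = load_vocab(vocab_file)
    print(os.path.basename(vocab_file), vocab["[CLS]"])  # vocab.txt 2
```

The resolved path would then be passed as vocab_file to BertPreprocessor, keeping do_lower_case=False and max_seq_length=64 as in the question. Failing early with FileNotFoundError is the design choice here: a wrong path is reported where it happens, not later as a missing [CLS] token.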
