Hi!
Yes, the squad_bert_infer.json model was not trained on data with no answer, but examples with no answer could still appear during training: if a paragraph is too long, we cut the input to 384 subtokens (question + paragraph + special tokens), and the answer may fall in the part that was cut off. So the squad_bert model is able to deal with no-answer questions (it uses the [CLS] token as no answer), but it is better to train it specifically on this kind of data.
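To illustrate the [CLS]-as-no-answer idea, here is a minimal sketch in plain Python. It is not the DeepPavlov code: the function name decode_answer, the max_answer_len limit, and the toy logits are assumptions made for illustration. It compares the score of the "null" span at position 0 ([CLS]) with the best real span and returns an empty string when the null span wins.

```python
def decode_answer(tokens, start_logits, end_logits, max_answer_len=30):
    """Pick an answer span, treating position 0 ([CLS]) as 'no answer'.

    Hypothetical helper: a real implementation would also restrict
    candidate spans to the paragraph part of the input.
    """
    # Score of the null answer: both pointers on the [CLS] token.
    null_score = start_logits[0] + end_logits[0]

    # Best span that does not involve [CLS] (start <= end, limited length).
    best_score, best_span = float("-inf"), None
    for s in range(1, len(tokens)):
        for e in range(s, min(s + max_answer_len, len(tokens))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score, best_span = score, (s, e)

    # If pointing at [CLS] scores at least as high, report no answer.
    if best_span is None or null_score >= best_score:
        return ""
    s, e = best_span
    return " ".join(tokens[s:e + 1])


tokens = ["[CLS]", "who", "?", "[SEP]", "Alice", "wrote", "it", "[SEP]"]
answerable = decode_answer(tokens, [0, 0, 0, 0, 5, 0, 0, 0], [0, 0, 0, 0, 5, 0, 0, 0])
unanswerable = decode_answer(tokens, [5, 0, 0, 0, 0, 0, 0, 0], [5, 0, 0, 0, 0, 0, 0, 0])
```

A model that rarely saw truncated, answerless examples in training has had little signal to push the [CLS] score up, which is why training on explicit no-answer data helps.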
If you need a model trained on data with no-answer examples, you can try the multi_squad_noans_infer.json config. This model is based on R-Net, and the data used for training is described here: http://docs.deeppavlov.ai/en/master/features/models/squad.html#squad-with-contexts-without-correct-answers . You can also train a BERT-based model on this data.
We don’t have a pre-trained BERT model for the SQuAD 2.0 dataset, but you can train one yourself: all you need is to write a dataset_reader for SQuAD 2.0, or convert SQuAD 2.0 to the same format as SQuAD 1.1 and use the squad_bert.json config for training.
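The conversion route could look roughly like the sketch below. It relies only on the public SQuAD JSON layout (data -> paragraphs -> qas, with is_impossible marking unanswerable questions in 2.0). How no-answer examples should be encoded for training is an assumption here: the sketch uses an empty answer text with answer_start = -1, so please check that against what your dataset_reader and preprocessor expect. The function name and file names are hypothetical.

```python
import json


def squad2_to_squad1_format(squad2, noans_text="", noans_start=-1):
    """Rewrite a SQuAD 2.0 dict into the SQuAD 1.1 layout.

    ASSUMPTION: unanswerable questions are encoded as a single answer
    with text=noans_text and answer_start=noans_start.
    """
    converted = {"version": "1.1", "data": []}
    for article in squad2["data"]:
        paragraphs = []
        for paragraph in article["paragraphs"]:
            qas = []
            for qa in paragraph["qas"]:
                if qa.get("is_impossible", False):
                    # SQuAD 2.0 leaves "answers" empty for these questions.
                    answers = [{"text": noans_text, "answer_start": noans_start}]
                else:
                    answers = [
                        {"text": a["text"], "answer_start": a["answer_start"]}
                        for a in qa["answers"]
                    ]
                # Drop 2.0-only fields (is_impossible, plausible_answers).
                qas.append({"id": qa["id"], "question": qa["question"], "answers": answers})
            paragraphs.append({"context": paragraph["context"], "qas": qas})
        converted["data"].append(
            {"title": article.get("title", ""), "paragraphs": paragraphs}
        )
    return converted


# Tiny in-memory example instead of the real file.
sample = {
    "version": "v2.0",
    "data": [{
        "title": "Demo",
        "paragraphs": [{
            "context": "Alice wrote the book.",
            "qas": [
                {"id": "q1", "question": "Who wrote the book?", "is_impossible": False,
                 "answers": [{"text": "Alice", "answer_start": 0}]},
                {"id": "q2", "question": "Who painted it?", "is_impossible": True,
                 "answers": [], "plausible_answers": [{"text": "Alice", "answer_start": 0}]},
            ],
        }],
    }],
}
result = squad2_to_squad1_format(sample)
result_json = json.dumps(result)  # ready to be written to a file
```

For the real dataset you would json.load the SQuAD 2.0 train and dev files, convert each, and json.dump the results to wherever your training config reads its data from.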