Intent Catcher training and sample dataset

Hey folks,

Total beginner when it comes to AI here. I’m trying to find the sample datasets that the intent catcher configs use (DeepPavlov/deeppavlov/configs/intent_catcher at 0.17.6 · deeppavlov/DeepPavlov · GitHub).

I’m able to use the default trained classifier, but I need to add my own data. I need regex support; otherwise I’d just use one of the other configs. Does anyone have good documentation on how to set up the training and test data?

The docs say the training data is supposed to be structured like

{
    "intent_1": ["regexp1", "regexp2"]
}

which is self-explanatory, but I can’t find what test.json and valid.json are supposed to look like.
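For anyone writing their own file in that format, a quick sanity check before training can save a confusing stack trace. The sketch below is a hypothetical helper (`check_intent_data` is not part of DeepPavlov): it only checks that the top level is an object mapping intent names to lists of strings and that each pattern compiles with Python’s `re` module. It assumes the intent catcher accepts Python-style regexps; it does not check anything DeepPavlov-specific.

```python
import json
import re


def check_intent_data(data):
    """Hypothetical helper: validate an {intent: [regexp, ...]} mapping.

    Returns the number of patterns per intent and raises ValueError if the
    structure does not match or a pattern does not compile with Python's re.
    """
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    counts = {}
    for intent, patterns in data.items():
        if not isinstance(patterns, list):
            raise ValueError(f"{intent!r}: expected a list of regexps")
        for pattern in patterns:
            try:
                re.compile(pattern)
            except (re.error, TypeError) as exc:
                raise ValueError(f"{intent!r}: bad regexp {pattern!r}: {exc}") from exc
        counts[intent] = len(patterns)
    return counts


if __name__ == "__main__":
    # Example data in the documented format; the intent names are made up.
    sample = json.loads('''
    {
        "greeting": ["(hi|hello|hey)( there)?"],
        "goodbye": ["(bye|goodbye|see you)"]
    }
    ''')
    print(check_intent_data(sample))  # {'greeting': 1, 'goodbye': 1}
```

Running it on `json.load(open("train.json"))` before training flags malformed entries early.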

Training with empty test and valid files gives me this error:

  File "<...>/deeppavlov/core/trainers/nn_trainer.py", line 175, in _validate
    metrics = list(report['metrics'].items())
AttributeError: 'NoneType' object has no attribute 'items'

Maybe this is more a question about nn_trainer.
Any help would be appreciated.

Thanks!

Hi!

The train, valid, and test files are expected to have the same format.
If you do not have valid/test files, you can split your train data into train+valid+test (like here DeepPavlov/rusentiment_convers_bert.json at 0.17.6 · deeppavlov/DeepPavlov · GitHub)
or specify which parts of the dataset you want to evaluate on (DeepPavlov/intent_catcher.json at 0.17.6 · deeppavlov/DeepPavlov · GitHub): train, valid, test.
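If you prefer to produce the three files yourself rather than splitting inside the config, here is a minimal sketch. It assumes your patterns sit in one `{intent: [regexp, ...]}` mapping. `split_intent_data` and `write_splits` are hypothetical helpers, not DeepPavlov functions, and the 10%/10% fractions are arbitrary. Every output keeps the same format, and each intent always keeps at least one pattern in train.

```python
import json
import random
from pathlib import Path


def split_intent_data(data, valid_frac=0.1, test_frac=0.1, seed=42):
    """Hypothetical helper: split {intent: [regexp, ...]} into train/valid/test.

    All three parts keep the {intent: [regexp, ...]} format. Patterns are split
    per intent; intents with few patterns may end up only in train.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    parts = {"train": {}, "valid": {}, "test": {}}
    for intent, patterns in data.items():
        shuffled = list(patterns)
        rng.shuffle(shuffled)
        n = len(shuffled)
        n_valid = int(n * valid_frac)
        n_test = int(n * test_frac)
        # Never move every pattern of an intent out of train.
        if n_valid + n_test >= n:
            n_valid = n_test = 0
        chunks = {
            "valid": shuffled[:n_valid],
            "test": shuffled[n_valid:n_valid + n_test],
            "train": shuffled[n_valid + n_test:],
        }
        for name, chunk in chunks.items():
            if chunk:  # skip empty lists so intents are not listed with no data
                parts[name][intent] = chunk
    return parts


def write_splits(parts, out_dir):
    """Write train.json, valid.json and test.json into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, content in parts.items():
        (out / f"{name}.json").write_text(json.dumps(content, indent=4))
```

With only a handful of patterns per intent, valid.json and test.json can still come out empty. In that case, restricting evaluation to the parts you actually have (the second option above) is likely the simpler route.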