How do I change the dataset for the Simple intent recognition question answering bot demo?

Hello,

I'm new to the DeepPavlov framework, and I'm trying to use the chatbot with a different sample dataset. Even though I specified a link to my dataset, the bot still uses the sample FAQ dataset from DeepPavlov. I'm experimenting with the code in my own Colab notebook and running it there. Can anyone help me figure out how to point the bot to my dataset? Sorry if this is a stupid question; I'm new to DeepPavlov and to bots in general.

from deeppavlov import configs
from deeppavlov.core.common.file import read_json
from deeppavlov.core.commands.infer import build_model
from deeppavlov import configs, train_model

model_config = read_json(configs.faq.tfidf_logreg_en_faq)
model_config["dataset_reader"]["data_path"] = None
model_config["dataset_reader"]["data_url"] = "https://docs.google.com/spreadsheets/d/e/2PACX-1vQGu-u5fYNc412AwIGvGlMP0qQAbdJHJ8o4isVHS3T6jpIgXO-uKAj-2iQBXQ1LRWF7PSa4QTiJNZyo/pub?output=csv"
model_config

answer = faq(["help"])
answer

Your code is missing the model-training step: you edit model_config, but you never train a model from it, so the faq you call is not a model trained on your data and used for prediction.

However, that is not the only problem. First, set data_path to a string (an empty string works) instead of None; otherwise you will run into errors (you can try it yourself to check). Second, when I ran your code with my corrections, I got a CSV-parsing error, so check your CSV file again and remove any empty rows. Once you do that, this code should work correctly:

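from deeppavlov import configs, train_model
from deeppavlov.core.common.file import read_json

# Start from the demo FAQ config and point its dataset reader at your CSV
model_config = read_json(configs.faq.tfidf_logreg_en_faq)
model_config["dataset_reader"]["data_path"] = ''  # must be a string, not None
model_config["dataset_reader"]["data_url"] = "your-dataset-link"

# Train a new model on your data, then query it
faq = train_model(model_config)
answer = faq(["help"])
answer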
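Both data fixes from this answer can be checked in plain Python before calling train_model. The following is a minimal sketch, not part of DeepPavlov: point_reader_at and drop_empty_rows are hypothetical helpers. The first only edits the config dict (it assumes, unverified here, that the FAQ reader prefers data_url over data_path when both are set). The second assumes your data is a UTF-8 CSV file on disk, so for a Google Sheets link you would download the CSV first or simply delete the blank rows in the sheet.

```python
import copy
import csv
from typing import Optional


def point_reader_at(config: dict, data_url: Optional[str] = None,
                    data_path: Optional[str] = None) -> dict:
    """Return a copy of config whose dataset_reader points at your data.

    Hypothetical helper: it only edits the dict and never calls DeepPavlov.
    """
    if (data_url is None) == (data_path is None):
        raise ValueError("pass exactly one of data_url or data_path")
    new_config = copy.deepcopy(config)
    reader = new_config["dataset_reader"]
    # Clear the demo URL when a local file is wanted (assumption: the FAQ
    # reader prefers data_url over data_path when both are set).
    reader["data_url"] = data_url
    # Keep data_path a string; this answer reports problems with None.
    reader["data_path"] = data_path if data_path is not None else ""
    return new_config


def drop_empty_rows(src_path: str, dst_path: str) -> int:
    """Copy a CSV file, skipping rows whose cells are all blank.

    Returns the number of rows that were dropped.
    """
    dropped = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # csv.reader yields [] for an empty line and ["", ""] for ",".
            if not row or all(not cell.strip() for cell in row):
                dropped += 1
                continue
            writer.writerow(row)
    return dropped
```

For example, after cleaning a downloaded copy of your sheet, you could pass point_reader_at(read_json(configs.faq.tfidf_logreg_en_faq), data_path="path/to/clean.csv") to train_model (the path is a placeholder). Returning a copy keeps the original demo config untouched, so you can rebuild the demo model later if needed.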

Hi!
I'm also a complete beginner with this library, and I only recently started working with text, so please don't judge too harshly =)

I did exactly the same thing:

from deeppavlov.core.common.file import read_json
from deeppavlov.core.commands.infer import build_model
from deeppavlov import configs, train_model

model_config = read_json(configs.faq.tfidf_logreg_autofaq)
model_config["dataset_reader"]['data_url'] = None
model_config["dataset_reader"]['data_path'] = '/home/skytiger/Dropbox/nlp_chat_bot/files/lgd_faq.csv'
faq = train_model(model_config)

This is the output I end up getting:

[nltk_data] Downloading package punkt to /home/skytiger/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/skytiger/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package perluniprops to
[nltk_data]     /home/skytiger/nltk_data...
[nltk_data]   Unzipping misc/perluniprops.zip.
[nltk_data] Downloading package nonbreaking_prefixes to
[nltk_data]     /home/skytiger/nltk_data...
[nltk_data]   Unzipping corpora/nonbreaking_prefixes.zip.
2022-04-13 20:09:47.656 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 219: Cannot load model from /home/skytiger/.deeppavlov/models/vectorizer/tfidf_vectorizer_ruwiki_v2.pkl
2022-04-13 20:09:47.657 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 166: Initializing model sklearn.feature_extraction.text:TfidfVectorizer from scratch
2022-04-13 20:09:47.761 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 109: Fitting model sklearn.feature_extraction.text:TfidfVectorizer
2022-04-13 20:09:47.767 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 241: Saving model to /home/skytiger/.deeppavlov/models/vectorizer/tfidf_vectorizer_ruwiki_v2.pkl
2022-04-13 20:09:47.770 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 101: [saving vocabulary to /home/skytiger/.deeppavlov/models/faq/ru_mipt_answers.dict]
2022-04-13 20:09:47.771 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 219: Cannot load model from /home/skytiger/.deeppavlov/models/faq/tfidf_logreg_classifier_v2.pkl
2022-04-13 20:09:47.772 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 166: Initializing model sklearn.linear_model:LogisticRegression from scratch
2022-04-13 20:09:47.836 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 109: Fitting model sklearn.linear_model:LogisticRegression
/home/skytiger/anaconda3/envs/leadgid_faq/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
/home/skytiger/anaconda3/envs/leadgid_faq/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:469: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.
  "this warning.", FutureWarning)
2022-04-13 20:09:47.880 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 241: Saving model to /home/skytiger/.deeppavlov/models/faq/tfidf_logreg_classifier_v2.pkl
2022-04-13 20:09:47.913 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 203: Loading model sklearn.feature_extraction.text:TfidfVectorizer from /home/skytiger/.deeppavlov/models/vectorizer/tfidf_vectorizer_ruwiki_v2.pkl
2022-04-13 20:09:47.914 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 210: Model sklearn.feature_extraction.textTfidfVectorizer loaded  with parameters
2022-04-13 20:09:47.914 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 216: Fitting of loaded model can not be continued. Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter
2022-04-13 20:09:47.915 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /home/skytiger/.deeppavlov/models/faq/ru_mipt_answers.dict]
2022-04-13 20:09:47.916 ERROR in 'deeppavlov.core.common.params'['params'] at line 112: Exception in <class 'deeppavlov.core.data.simple_vocab.SimpleVocabulary'>
Traceback (most recent call last):
  File "/home/skytiger/anaconda3/envs/leadgid_faq/lib/python3.7/site-packages/deeppavlov/core/common/params.py", line 106, in from_params
    component = obj(**dict(config_params, **kwargs))
  File "/home/skytiger/anaconda3/envs/leadgid_faq/lib/python3.7/site-packages/deeppavlov/core/data/simple_vocab.py", line 62, in __init__
    self.load()
  File "/home/skytiger/anaconda3/envs/leadgid_faq/lib/python3.7/site-packages/deeppavlov/core/data/simple_vocab.py", line 118, in load
    token, cnt = self.load_line(ln)
  File "/home/skytiger/anaconda3/envs/leadgid_faq/lib/python3.7/site-packages/deeppavlov/core/data/simple_vocab.py", line 139, in load_line
    token, cnt = ln.rsplit('\t', 1)
ValueError: not enough values to unpack (expected 2, got 1)

Can you tell me what the problem is?
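The traceback points at SimpleVocabulary.load_line, which splits each line of the saved answer vocabulary (ru_mipt_answers.dict in the log) on its last tab and expects a token and a count. The following is a minimal sketch for narrowing this down, assuming the vocabulary is a UTF-8 text file read in Python's default text mode. find_bad_vocab_lines reports the lines that statement would reject, and answers_with_line_breaks flags CSV answers containing line breaks. Both helpers are hypothetical. The idea that a multi-line answer splits one vocabulary entry across two lines is an unverified guess, and so is the "Answer" column name.

```python
import csv
from typing import List, Tuple


def find_bad_vocab_lines(vocab_path: str) -> List[Tuple[int, str]]:
    """Return (line number, text) for lines lacking a tab-separated count.

    Mirrors the failing statement from the traceback:
        token, cnt = ln.rsplit('\\t', 1)
    """
    bad = []
    # Assumption: the file is read in default text mode, where universal
    # newlines make a stray "\r" end a line just like "\n".
    with open(vocab_path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if len(line.rsplit("\t", 1)) != 2:
                bad.append((number, line.rstrip("\r\n")))
    return bad


def answers_with_line_breaks(csv_path: str, answer_col: str = "Answer") -> List[int]:
    """Return 1-based data-row indices whose answer cell contains a line break.

    Assumption: answer_col is the column the dataset reader uses as answers.
    """
    flagged = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for index, row in enumerate(csv.DictReader(f), start=1):
            answer = row.get(answer_col) or ""
            if "\n" in answer or "\r" in answer:
                flagged.append(index)
    return flagged
```

If the second helper flags any rows, replacing the line breaks in those answers with spaces and retraining could be worth trying.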