Hi there Vasiliy,
Thanks for the quick response. I am running the corpus with exactly that code, but unfortunately I get the following output in a loop:
2020-04-14 08:41:38.103 INFO in 'deeppavlov.dataset_readers.odqa_reader'['odqa_reader'] at line 57: Reading files…
2020-04-14 08:41:38.111 INFO in 'deeppavlov.dataset_readers.odqa_reader'['odqa_reader'] at line 134: Building the database…
0%| | 0/300 [00:00<?, ?it/s]
0it [00:00, ?it/s]
2020-04-14 08:41:39.17 INFO in 'deeppavlov.dataset_readers.odqa_reader'['odqa_reader'] at line 57: Reading files…
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Kostis\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\Kostis\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\Kostis\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\Kostis\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="mp_main")
File "C:\Users\Kostis\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Users\Kostis\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\Kostis\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "C:\Users\Kostis\Desktop\Deeppavlov-Python\RunningPavlovWithOwnData.py", line 16, in
ranker = train_model(model_config)
File "C:\Users\Kostis\env\Lib\site-packages\deeppavlov_init.py", line 32, in train_model
train_evaluate_model_from_config(config, download=download, recursive=recursive)
File "C:\Users\Kostis\env\Lib\site-packages\deeppavlov\core\commands\train.py", line 92, in train_evaluate_model_from_config
data = read_data_by_config(config)
File "C:\Users\Kostis\env\Lib\site-packages\deeppavlov\core\commands\train.py", line 58, in read_data_by_config
return reader.read(data_path, **reader_config)
File "C:\Users\Kostis\env\Lib\site-packages\deeppavlov\dataset_readers\odqa_reader.py", line 81, in read
self._build_db(save_path, dataset_format, expand_path(data_path))
File "C:\Users\Kostis\env\Lib\site-packages\deeppavlov\dataset_readers\odqa_reader.py", line 130, in _build_db
Path(save_path).unlink()
File "C:\Users\Kostis\AppData\Local\Programs\Python\Python37\lib\pathlib.py", line 1304, in unlink
self._accessor.unlink(self)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Kostis\.deeppavlov\downloads\odqa\enwiki.db'
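A note on the traceback: the spawn.py and runpy.py frames show the script itself being executed again inside a child process, up to the train_model call on line 16. On Windows, multiprocessing starts child processes by re-importing the main script, so any top-level code runs again in every child unless it is guarded. Whether that is what keeps enwiki.db locked here is only a guess, but the usual pattern looks like the sketch below. The function train_ranker and the config name are hypothetical stand-ins, not DeepPavlov names.

```python
def train_ranker(config_name):
    """Hypothetical stand-in for the call that builds the database and trains the ranker."""
    return f"trained with {config_name}"


def main():
    # All work that may start worker processes is kept inside a function,
    # so merely importing this file does not trigger it.
    ranker = train_ranker("my_ranker_config")  # placeholder config name
    print(ranker)
    return ranker


if __name__ == "__main__":
    # Child processes created with the "spawn" start method import this file
    # under a different module name, so this block only runs in the parent.
    main()
```

With this layout, a re-import of the script in a child process only defines the functions and does not start a second training run.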
Also, while trying with my own ranker file, I changed save_path and load_path in the pipe section as follows:
"save_path": "{MODELS_PATH}/servoy_articles/servoy_documentation.npz",
"load_path": "{MODELS_PATH}/servoy_articles/servoy_documentation.npz"
But I get: FileNotFoundError: HashingTfIdfVectorizer path doesn’t exist!
Which is understandable since it doesn’t exist. But I thought it was supposed to create itself when working with my own data.
I’ve also tried creating my own empty .npz file but then the npyio.py file cannot perform:
return pickle.load(fid, **pickle_kwargs)
and thus I get:
Failed to interpret file WindowsPath('C:/Users/Kostis/.deeppavlov/models/servoy/servoy_documentation_tfidf_matrix.npz') as a pickle
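For context on that last error: a .npz file is a zip archive of .npy arrays, so a file created empty by hand is not something NumPy can read. The sketch below is a quick standard-library check of whether a file at least has that shape. The helper name looks_like_npz and the file name are made up for illustration, and the check says nothing about which arrays the vectorizer expects to find inside.

```python
import os
import tempfile
import zipfile


def looks_like_npz(path):
    """Return True if path is a zip archive whose members are all .npy files."""
    if not zipfile.is_zipfile(path):
        return False  # an empty or hand-made placeholder file ends up here
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    return bool(names) and all(name.endswith(".npy") for name in names)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty = os.path.join(tmp, "empty.npz")  # hypothetical file name
        open(empty, "wb").close()  # zero-byte file, like one created by hand
        print(looks_like_npz(empty))
```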
How should the .npz be generated then and how should the vectorizer parameters be set up when using your own data?
Or maybe something is wrong with my reader file?
I have set data_path and save_path as follows:
"data_path": "{DOWNLOADS_PATH}/servoy_articles",
"save_path": "{DOWNLOADS_PATH}/servoy.db",
The database is just a small file (12 KB) with a single table “documents” containing the fields “id” (text) and “text” (text).
I copied it from the enwiki.db file from the example.
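To make the layout concrete, here is a minimal sketch that creates a database with that table and counts its rows, which is one way to see whether the file actually contains any documents. It only uses the standard sqlite3 module. The file name, the sample rows and the helper names are placeholders, and the schema is just the two text columns described above, not necessarily the exact one enwiki.db uses.

```python
import os
import sqlite3
import tempfile


def create_documents_db(path, docs):
    """Create a 'documents' table (id TEXT, text TEXT) and insert (id, text) pairs."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS documents (id TEXT, text TEXT)")
        conn.executemany("INSERT INTO documents (id, text) VALUES (?, ?)", docs)
        conn.commit()
    finally:
        conn.close()  # release the file so other processes can open or delete it


def count_documents(path):
    """Return the number of rows in the 'documents' table."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "example.db")  # placeholder file name
        create_documents_db(db_path, [("doc1", "first article"), ("doc2", "second article")])
        print(count_documents(db_path))
```

A count of zero on the real file would mean the table exists but no documents have been written to it yet.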
So does the db need to be filled first before starting with the vectorizer? Or am I maybe doing something wrong with the permissions?
I’m lost…