Based on the instructions from here Offline with Docker and the enhancements from here, I have prepared a Dockerfile that creates an image for offline use. It works perfectly with the ner_ontonotes_bert_mult and ner_rus_bert models. However, it fails to work offline with ner_rus_convers_distilrubert_6L, because even after building, it still tries to connect to the network.
My Dockerfile:
FROM deeppavlov/base-cpu:0.17.6
RUN sed -i 's/mipt/pavlovteam/g' /base/DeepPavlov/deeppavlov/requirements/bert_dp.txt
RUN python -m deeppavlov install ner_rus_convers_distilrubert_6L && \
    python -m deeppavlov download ner_rus_convers_distilrubert_6L && \
    pip3 install --upgrade protobuf==3.20.0
CMD python -m deeppavlov riseapi ner_rus_convers_distilrubert_6L -p 5000
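From the traceback further below, the failure comes from AutoTokenizer.from_pretrained reaching out to the Hugging Face Hub at startup. One variant I considered is pre-caching the tokenizer at build time and telling transformers to use only the local cache afterwards. This is a sketch, not a verified fix: the model id DeepPavlov/distilrubert-base-cased-conversational is my guess for what the 6L config references, and TRANSFORMERS_OFFLINE may not be honored by older transformers releases.

```dockerfile
FROM deeppavlov/base-cpu:0.17.6

RUN sed -i 's/mipt/pavlovteam/g' /base/DeepPavlov/deeppavlov/requirements/bert_dp.txt

RUN python -m deeppavlov install ner_rus_convers_distilrubert_6L && \
    python -m deeppavlov download ner_rus_convers_distilrubert_6L && \
    pip3 install --upgrade protobuf==3.20.0

# Pre-cache the tokenizer so from_pretrained() can resolve it offline.
# NOTE: the model id below is an assumption; it should match the
# "vocab_file" value in the ner_rus_convers_distilrubert_6L config.
RUN python -c "from transformers import AutoTokenizer; \
AutoTokenizer.from_pretrained('DeepPavlov/distilrubert-base-cased-conversational')"

# Tell transformers to rely on the local cache only (supported in
# newer transformers releases; may be a no-op in older ones).
ENV TRANSFORMERS_OFFLINE=1

CMD python -m deeppavlov riseapi ner_rus_convers_distilrubert_6L -p 5000
```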
I even tried to add a single prediction during the image build so that it downloads everything it needs, using the following line:
RUN python -m deeppavlov predict ner_rus_convers_distilrubert_6L -f /etc/passwd
It does download something and runs the prediction, but after a disconnect and restart it still tries to fetch something and fails with the following error:
2023-02-27 12:08:11.3 ERROR in 'deeppavlov.core.common.params'['params'] at line 112: Exception in <class 'deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchTransformersNerPreprocessor'>
Traceback (most recent call last):
File "/base/DeepPavlov/deeppavlov/core/common/params.py", line 106, in from_params
component = obj(**dict(config_params, **kwargs))
File "/base/DeepPavlov/deeppavlov/models/preprocessors/torch_transformers_preprocessor.py", line 322, in __init__
self.tokenizer = AutoTokenizer.from_pretrained(vocab_file, do_lower_case=do_lower_case)
File "/base/venv/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 435, in from_pretrained
return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/base/venv/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1680, in from_pretrained
user_agent=user_agent,
File "/base/venv/lib/python3.7/site-packages/transformers/file_utils.py", line 1279, in cached_path
local_files_only=local_files_only,
File "/base/venv/lib/python3.7/site-packages/transformers/file_utils.py", line 1495, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/base/DeepPavlov/deeppavlov/__main__.py", line 4, in <module>
main()
File "/base/DeepPavlov/deeppavlov/deep.py", line 113, in main
start_model_server(pipeline_config_path, args.https, args.key, args.cert, port=args.port)
File "/base/DeepPavlov/deeppavlov/utils/server/server.py", line 179, in start_model_server
model = build_model(model_config)
File "/base/DeepPavlov/deeppavlov/core/commands/infer.py", line 62, in build_model
component = from_params(component_config, mode=mode, serialized=component_serialized)
File "/base/DeepPavlov/deeppavlov/core/common/params.py", line 106, in from_params
component = obj(**dict(config_params, **kwargs))
File "/base/DeepPavlov/deeppavlov/models/preprocessors/torch_transformers_preprocessor.py", line 322, in __init__
self.tokenizer = AutoTokenizer.from_pretrained(vocab_file, do_lower_case=do_lower_case)
File "/base/venv/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 435, in from_pretrained
return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/base/venv/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1680, in from_pretrained
user_agent=user_agent,
File "/base/venv/lib/python3.7/site-packages/transformers/file_utils.py", line 1279, in cached_path
local_files_only=local_files_only,
File "/base/venv/lib/python3.7/site-packages/transformers/file_utils.py", line 1495, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
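To check whether the build-time predict actually populated the cache that the runtime lookup uses, I inspect the transformers cache directory inside the container. This is a sketch assuming transformers' usual default location (~/.cache/huggingface/transformers, overridable via TRANSFORMERS_CACHE); if the listing is empty at runtime, the files were cached under a different path or user.

```python
import os
from pathlib import Path

def list_transformers_cache():
    """Return the file names in the transformers download cache, if any."""
    cache_dir = Path(os.environ.get(
        "TRANSFORMERS_CACHE",
        str(Path.home() / ".cache" / "huggingface" / "transformers")))
    if not cache_dir.exists():
        return []
    return sorted(p.name for p in cache_dir.iterdir())

print(list_transformers_cache() or "cache is empty or missing")
```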
Do you have an idea of how to initialize this model for offline use?