Multi-threading in NER model?

Sorry for the naive question, I am a newbie of DeepPavlov.

Is a NER model built with build_model multi-threaded (e.g. build_model(configs.ner.ner_ontonotes_bert_mult))?
Or is there a parameter/argument to set to enable multi-threading?

Thanks for your kind support!

The model works in a single thread by default and does not support multi-threading out of the box. However, is the intention to use multi-threading dictated by speed-up purposes only? This question is crucial because the heavy lifting (matrix multiplication, self-attention computation, etc.) is performed by TensorFlow, so the model can easily utilize multiple CPUs or a single GPU out of the box. Simply using a batch_size significantly larger than 1 is enough to utilize resources efficiently.
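As an illustration (a minimal sketch, not DeepPavlov's internal code), feeding the model lists of sentences in chunks is all that batched inference requires; the chunking helper below is hypothetical, and the commented-out model call assumes the config named above:

```python
def batches(samples, batch_size):
    """Yield successive chunks of `samples` of at most `batch_size` items."""
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]


# Hypothetical usage with a DeepPavlov NER model (requires deeppavlov installed):
# from deeppavlov import build_model, configs
# ner = build_model(configs.ner.ner_ontonotes_bert_mult, download=True)
# for batch in batches(sentences, batch_size=32):
#     tokens, tags = ner(batch)  # TensorFlow parallelizes the work internally
```

Passing, say, 32 sentences per call instead of 1 lets TensorFlow parallelize the underlying matrix operations across CPU cores or on the GPU without any threading on the caller's side.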

Hello mu-arkhipov

Many thanks for your kind reply.
Yes, the intention is to speed up the process, but during decoding (not training): is the suggestion to increase the batch_size still useful for decoding only?

Correct: as a rule, processing speed at inference (decoding) time, measured in samples per second, grows with batch_size. The only thing that needs to be taken into account is padding. Suppose there are two sentences in the batch with lengths 1 and 12 tokens. The first one, with length 1, will be padded up to 12 tokens, so the computations for those 11 padding tokens are wasted. A good practice is to pack together sentences of similar length; this is sometimes referred to as bucketing.
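A minimal sketch of the bucketing idea (the helper name and whitespace tokenization are assumptions for illustration, not DeepPavlov API): sort sentences by token count before chunking, so each batch contains sentences of similar length and little padding is wasted.

```python
def bucket_batches(sentences, batch_size):
    """Group sentences into batches of similar token length to reduce padding.

    Sorting by length before chunking means a 1-token sentence is batched
    with other short sentences rather than padded up to a 12-token one.
    """
    ordered = sorted(sentences, key=lambda s: len(s.split()))
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]
```

Note that the model's outputs then come back in length-sorted order, so the caller needs to track original indices if the input order matters.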

Thanks mu-arkhipov for your valuable suggestions!