Enable Batching for NER model inference

Hello,

I am interested in the Named Entity Recognition task and am adopting DeepPavlov NER models purely for inference on raw texts. To accomplish this for a given text, represented as a string variable, I went with the following code snippets (combined into one runnable sketch right after the list).

  1. Initializing the NER model as follows:
    import deeppavlov
    model = deeppavlov.build_model(model_name, download=True, install=True)
  2. Launching inference for a given text (a string variable) as follows:
    result = model([text])
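
Putting the two steps together into one self-contained snippet (the config name ner_ontonotes_bert_mult is just the one I happened to use; any NER config should work the same way):

import deeppavlov

# Build the NER model from a config name; download and install dependencies on first run
model = deeppavlov.build_model("ner_ontonotes_bert_mult", download=True, install=True)

text = "Bob Ross lived in Florida"
tokens, tags = model([text])
print(tokens[0])  # tokenized input
print(tags[0])    # predicted BIO tags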

Given a sequence of texts, I am interested in passing a series of texts at once.
Therefore my question is: how do I use DeepPavlov NER inference in batching mode?

Thanks,
Sincerely,
Nicolay

Dear @nick ,

You can use the following code snippet as a solution:

from deeppavlov import build_model

# config is the name of the NER config you already use
model = build_model(config, download=True, install=True)

# text is the list of input strings to process
batch_size = 3
results = []
for i in range(0, len(text), batch_size):
    batch_res = model(text[i:i + batch_size])
    results.append(batch_res)  # collect per-batch predictions

Hope this will help.

Thank you for your assistance, @Anna!
Passing a list of texts indeed has a positive effect on performance.

I was able to reproduce that. Let me share a few details here for those who encounter this again. According to an extensive experiment on CPU, once the batch contains at least 3 texts, it results in roughly a 1.5x performance improvement.
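
For anyone who wants to reproduce the comparison, a minimal timing sketch along these lines can be used (the config name and sample texts here are placeholders, not my exact setup):

import time
import deeppavlov

model = deeppavlov.build_model("ner_ontonotes_bert_mult", download=True, install=True)
texts = ["Bob Ross lived in Florida", "Elon Musk founded Tesla", "Microsoft was founded in the USA"] * 10

# One call per text
tic = time.perf_counter()
for t in texts:
    model([t])
print(f"per-text calls: {time.perf_counter() - tic:0.4f} s")

# One batched call for all texts at once
tic = time.perf_counter()
model(texts)
print(f"single batched call: {time.perf_counter() - tic:0.4f} s")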

I believe that the effect of this stable improvement is caused by an automatic mechanism that splits input sequences into batches sized according to some particular, implicit parameter of the NER model.
If so, then I would like to continue with the following question:

Is there a way to explicitly control the maximum batch size at initialization or at inference time?

Thank you!
Sincerely,
Nicolay

Dear @nick,

Thank you for your insight! We will look into it and add the proposed feature to our library.

Kind regards,
Anna.

Hello Anna!

Thanks for the consideration!
Since we have already seen the performance improvement, I believe batching is already implemented in some way, since the model-building mechanism is uniform to a certain extent.
A quick assumption is that it could be similar to the concept of batching in transformers pipelines, but at the model inference API level.
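
For comparison, this is roughly the kind of control I have in mind, sketched with the Hugging Face transformers API (the checkpoint name is just an illustrative placeholder):

from transformers import pipeline

# batch_size here is an explicit, user-controlled inference parameter
ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",  # illustrative NER checkpoint
    batch_size=8,
)

texts = ["Bob Ross lived in Florida", "Elon Musk founded Tesla"]
print(ner(texts))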

Therefore, it might be an already existing feature that could be taken under control with the related parameters, passed via kwargs-like arguments. If so, it might not be necessary to propose a new feature.

Finally, would it be possible to point me to the specific place in the code where I could look for the related parameter?

Thank you very much for such quick replies and the constant support! :pray:
Sincerely,
Nicolay

Dear @nick ,

You could have a closer look at our function predict_on_stream and try to modify it locally for your purposes. You can use the following code snippet as an example:

from deeppavlov.core.commands.infer import predict_on_stream
import time

# config is the name of your NER config; './test.txt' should contain one text per line
for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    tic = time.perf_counter()
    # predict_on_stream returns nothing; it prints the predictions itself
    predict_on_stream(config, batch_size=batch_size, file_path='./test.txt')
    toc = time.perf_counter()
    print("-" * 30)
    print(f"Streamed batch_size={batch_size}\nInferred in {toc - tic:0.4f} seconds")

Please note that this function accepts text only from a file; otherwise it will raise an error with the message 'To process data from terminal please use interact mode'.
It also doesn't return anything; it only prints out the result.
The function processes text in batches, but the output is printed one sample at a time.

For example,

[["'Bob", "Ross", "lived", "in", "Florida'", ","], ["O", "O", "O", "O", "B-GPE", "O"]]
[["'Elon", "Musk", "founded", "Tesla'", ","], ["O", "O", "O", "B-PERSON", "O"]]
[["'Miscrosoft", "was", "founded", "in", "the", "USA'", ","], ["O", "O", "O", "O", "O", "B-GPE", "O"]]

instead of

[[["'Bob", 'Ross', 'lived', 'in', "Florida'", ','], ["'Elon", 'Musk', 'founded', "Tesla'", ','], ["'Miscrosoft", 'was', 'founded', 'in', 'the', "USA'", ‘,’]], [['O', 'O', 'O', 'O', 'B-GPE', 'O'], ['O', 'O', 'O', 'B-PERSON', 'O'], ['O', 'O', 'O', 'O', 'O', 'B-GPE', 'O']]]

Hope this can be of some help.

Kind regards,
Anna.

Dear @Anna,

Thank you very much for the assistance and the suggestion with the code!
After investigating this option in greater detail, I found that the implementation of predict_on_stream relies on the in_x parameter of the model. I hope the information I am sharing below will be helpful for others.
I have reproduced this implementation, but replaced the file_path with a passed iterator of batches, as follows:
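
Roughly, the idea looks like this (only a sketch of what I did; the helper name predict_on_batches is mine and not part of the DeepPavlov API):

from deeppavlov import build_model

def predict_on_batches(config, batches):
    # Build the model once, then feed it pre-split batches instead of reading from a file
    model = build_model(config)
    for batch in batches:
        yield model(batch)

texts = ["Bob Ross lived in Florida", "Elon Musk founded Tesla", "Microsoft was founded in the USA"]
batch_size = 2
batches = (texts[i:i + batch_size] for i in range(0, len(texts), batch_size))

for batch_res in predict_on_batches("ner_ontonotes_bert_mult", batches):
    print(batch_res)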

The most relevant thing to this topic that we can get out of this example is the value of in_x. In my particular case, using ner_ontonotes_bert_mult, it results in 1 (a single input). That means the original predict_on_stream utilizes the model that was built from the config as is, with no batching support at all.
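
To check this value for your own config, you can inspect the built model directly (assuming, as above, that the built chainer exposes in_x):

from deeppavlov import build_model

model = build_model("ner_ontonotes_bert_mult")
print(model.in_x)       # the model's declared input names
print(len(model.in_x))  # 1 in my case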

I think it lies somewhere even further down, at the model level, such as the config metadata that we pass in the form of the related file name, which is located here:

However, from the related metadata I found that there is no batch_size configuration for the chainer, only for the train mode and purposes. That seems to close my question.

Thank you very much for the assistance!
Nicolay