Enable Batching for NER model inference

Hello,

I am interested in the Named Entity Recognition task and am adopting DeepPavlov NER models purely for inference on raw texts. To accomplish this for a given text, represented as a string variable, I went with the following code snippets (combined into one runnable sketch right after the list).

  1. Initializing the NER model as follows:
    import deeppavlov
    model = deeppavlov.build_model(model_name, download=True, install=True)
  2. Launching inference for a given text (a string variable) as follows:
    result = model([text])
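
Putting the two steps together into one self-contained snippet (the config name ner_ontonotes_bert_mult is just the one I happened to use; any NER config should work the same way):

import deeppavlov

# Build the NER model from a config name; download and install dependencies on first run
model = deeppavlov.build_model("ner_ontonotes_bert_mult", download=True, install=True)

text = "Bob Ross lived in Florida"
tokens, tags = model([text])
print(tokens[0])  # tokenized input
print(tags[0])    # predicted BIO tags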

Given a sequence of texts, I am interested in passing a series of texts at once.
Therefore my question is: how do I use DeepPavlov NER inference in batching mode?

Thanks,
Sincerely,
Nicolay

Dear @nick ,

You can use the following code snippet as a solution:

from deeppavlov import build_model

# config is the name of the NER config you already use
model = build_model(config, download=True, install=True)

# text is the list of input strings to process
batch_size = 3
results = []
for i in range(0, len(text), batch_size):
    batch_res = model(text[i:i + batch_size])
    results.append(batch_res)  # collect per-batch predictions

Hope this will help.

Thank you for your assistance, @Anna!
Passing a list of texts indeed has a positive effect on performance.

I was able to reproduce that. Let me share a few details here for those who encounter this again. According to an extensive experiment on CPU, once the batch contains at least 3 texts, it results in roughly a 1.5x performance improvement.
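
For anyone who wants to reproduce the comparison, a minimal timing sketch along these lines can be used (the config name and sample texts here are placeholders, not my exact setup):

import time
import deeppavlov

model = deeppavlov.build_model("ner_ontonotes_bert_mult", download=True, install=True)
texts = ["Bob Ross lived in Florida", "Elon Musk founded Tesla", "Microsoft was founded in the USA"] * 10

# One call per text
tic = time.perf_counter()
for t in texts:
    model([t])
print(f"per-text calls: {time.perf_counter() - tic:0.4f} s")

# One batched call for all texts at once
tic = time.perf_counter()
model(texts)
print(f"single batched call: {time.perf_counter() - tic:0.4f} s")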

I believe that the effect of this stable improvement is caused by an automatic mechanism that splits input sequences into batches sized according to some particular, implicit parameter of the NER model.
If so, then I would like to continue with the following question:

Is there a way to explicitly control the maximum batch size at initialization or at inference time?

Thank you!
Sincerely,
Nicolay

Dear @nick,

Thank you for your insight! We will look into it and add the proposed feature to our library.

Kind regards,
Anna.

Hello Anna!

Thanks for the consideration!
Since we have already seen the performance improvement, I believe batching is already implemented in some way, since the model-building mechanism is uniform to a certain extent.
A quick assumption is that it could be similar to the concept of batching in transformers pipelines, but at the model inference API level.
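
For comparison, this is roughly the kind of control I have in mind, sketched with the Hugging Face transformers API (the checkpoint name is just an illustrative placeholder):

from transformers import pipeline

# batch_size here is an explicit, user-controlled inference parameter
ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",  # illustrative NER checkpoint
    batch_size=8,
)

texts = ["Bob Ross lived in Florida", "Elon Musk founded Tesla"]
print(ner(texts))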

Therefore, it might be an already existing feature that could be taken under control with the related parameters, passed via kwargs-like arguments. If so, it might not be necessary to propose a new feature.

Finally, would it be possible to point me to the specific place in the code where I could look for the related parameter?

Thank you very much for such quick replies and the constant support! :pray:
Sincerely,
Nicolay

Dear @nick ,

You could have a closer look at our function predict_on_stream and try to modify it locally for your purposes. You can use the following code snippet as an example:

from deeppavlov.core.commands.infer import predict_on_stream
import time

# config is the name of your NER config; './test.txt' should contain one text per line
for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    tic = time.perf_counter()
    # predict_on_stream returns nothing; it prints the predictions itself
    predict_on_stream(config, batch_size=batch_size, file_path='./test.txt')
    toc = time.perf_counter()
    print("-" * 30)
    print(f"Streamed batch_size={batch_size}\nInferred in {toc - tic:0.4f} seconds")

Please note that this function accepts text only from a file; otherwise it will raise an error with the message 'To process data from terminal please use interact mode'.
It also doesn't return anything; it only prints out the result.
The function processes text in batches, but the output is printed one sample at a time.

For example,

[["'Bob", "Ross", "lived", "in", "Florida'", ","], ["O", "O", "O", "O", "B-GPE", "O"]]
[["'Elon", "Musk", "founded", "Tesla'", ","], ["O", "O", "O", "B-PERSON", "O"]]
[["'Miscrosoft", "was", "founded", "in", "the", "USA'", ","], ["O", "O", "O", "O", "O", "B-GPE", "O"]]

instead of

[[["'Bob", 'Ross', 'lived', 'in', "Florida'", ','], ["'Elon", 'Musk', 'founded', "Tesla'", ','], ["'Miscrosoft", 'was', 'founded', 'in', 'the', "USA'", ‘,’]], [['O', 'O', 'O', 'O', 'B-GPE', 'O'], ['O', 'O', 'O', 'B-PERSON', 'O'], ['O', 'O', 'O', 'O', 'O', 'B-GPE', 'O']]]

Hope this can be of some help.

Kind regards,
Anna.

Dear @Anna,

Thank you very much for the assistance and the suggestion with the code!
After investigating this option in greater detail, I found that the implementation of predict_on_stream relies on the in_x parameter of the model. I hope the information I am sharing below will be helpful for others.
I have reproduced this implementation, but replaced the file_path with a passed iterator of batches, as follows:
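
Roughly, the idea looks like this (only a sketch of what I did; the helper name predict_on_batches is mine and not part of the DeepPavlov API):

from deeppavlov import build_model

def predict_on_batches(config, batches):
    # Build the model once, then feed it pre-split batches instead of reading from a file
    model = build_model(config)
    for batch in batches:
        yield model(batch)

texts = ["Bob Ross lived in Florida", "Elon Musk founded Tesla", "Microsoft was founded in the USA"]
batch_size = 2
batches = (texts[i:i + batch_size] for i in range(0, len(texts), batch_size))

for batch_res in predict_on_batches("ner_ontonotes_bert_mult", batches):
    print(batch_res)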

The most relevant thing to this topic that we can get out of this example is the value of in_x. In my particular case, using ner_ontonotes_bert_mult, it results in 1 (a single input). That means the original predict_on_stream utilizes the model that was built from the config as is, with no batching support at all.
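
To check this value for your own config, you can inspect the built model directly (assuming, as above, that the built chainer exposes in_x):

from deeppavlov import build_model

model = build_model("ner_ontonotes_bert_mult")
print(model.in_x)       # the model's declared input names
print(len(model.in_x))  # 1 in my case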

I think it lies somewhere even further down, at the model level, such as the config metadata that we pass in the form of the related file name, which is located here:

However, from the related metadata I found that there is no batch_size configuration for the chainer, only for the train mode and purposes. That seems to close my question.

Thank you very much for the assistance!
Nicolay