Dear @nick,
You could take a closer look at our predict_on_stream function
and modify it locally for your purposes. You can use the following snippet as an example:
from deeppavlov.core.commands.infer import predict_on_stream
import time

# `config` is the model configuration you are already using
for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    tic = time.perf_counter()
    model = predict_on_stream(config, batch_size=batch_size, file_path='./test.txt')
    toc = time.perf_counter()
    print("-" * 30)
    print(f"Streamed batch_size={batch_size}\nInference completed in {toc - tic:0.4f} seconds")
Please note that this function accepts text only from a file; otherwise it raises an error with the message 'To process data from terminal please use interact mode'.
It also doesn't return anything; it only prints out the result.
The function processes text in batches, but the output is printed one sample at a time.
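If you need the predictions back as a Python object instead of printed lines, you could adapt the same batching loop locally. Below is a minimal sketch of that pattern, not DeepPavlov code: `batched_predict` is a hypothetical helper I made up for illustration, and `dummy_model` stands in for the pipeline callable you would normally get from DeepPavlov's build_model.

```python
def batched_predict(model, lines, batch_size):
    """Feed `lines` to `model` in batches and collect all predictions.

    `model` is any callable that maps a list of texts to a list of
    predictions (e.g. a DeepPavlov pipeline built with build_model).
    """
    results = []
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        # Collect the batch output instead of printing it sample by sample
        results.extend(model(batch))
    return results

# Stand-in "model" for illustration only: tags every token as O
def dummy_model(batch):
    return [[(token, "O") for token in text.split()] for text in batch]

lines = ["Bob Ross lived in Florida", "Elon Musk founded Tesla"]
predictions = batched_predict(dummy_model, lines, batch_size=8)
# `predictions` now holds one token/tag list per input line
```

The only real change from predict_on_stream is accumulating the batch outputs in a list rather than printing each sample.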
For example,
[["'Bob", "Ross", "lived", "in", "Florida'", ","], ["O", "O", "O", "O", "B-GPE", "O"]]
[["'Elon", "Musk", "founded", "Tesla'", ","], ["O", "O", "O", "B-PERSON", "O"]]
[["'Miscrosoft", "was", "founded", "in", "the", "USA'", ","], ["O", "O", "O", "O", "O", "B-GPE", "O"]]
instead of
[[["'Bob", 'Ross', 'lived', 'in', "Florida'", ','], ["'Elon", 'Musk', 'founded', "Tesla'", ','], ["'Miscrosoft", 'was', 'founded', 'in', 'the', "USA'", ',']], [['O', 'O', 'O', 'O', 'B-GPE', 'O'], ['O', 'O', 'O', 'B-PERSON', 'O'], ['O', 'O', 'O', 'O', 'O', 'B-GPE', 'O']]]
Hope this can be of some help.
Kind regards,
Anna.