Hello, I am currently building an FAQ model based on the example and guidelines in the DeepPavlov documentation. However, the “y_pred_id” values behave strangely and don't match what the documentation describes.
Following the pipeline configuration, when the FAQ model receives a query, it computes a similarity score (as a percentage) between the query and each question in the data file, selects the highest score, and returns the corresponding answer. However, the model behaves oddly when viewed this way.
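For reference, here is a minimal sketch of the selection step as I understand it, assuming the scores come back as a NumPy array with one entry per data-file row. The answers list and the sample scores are made up, and this is my reading of the behavior, not DeepPavlov's actual code:

```python
import numpy as np

# Hypothetical answers, one per question row in the data file (assumption).
answers = ["Answer A", "Answer B", "Answer C"]

# Hypothetical similarity scores for a single query, one per question row.
y_pred_probas = np.array([0.12, 0.71, 0.17])

# Take the index of the highest score...
y_pred_id = int(np.argmax(y_pred_probas))
# ...and return the answer stored at that same index.
y_pred_answer = answers[y_pred_id]

print(y_pred_id, y_pred_answer)  # 1 Answer B
```

Under this reading, y_pred_id would be the row index directly, which is exactly what does not hold in practice.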
First, “y_pred_id”, which holds the index of the highest score, does not line up with the data file. For example, if y_pred_id is 6, it corresponds not to the 6th row of the data but to the 2nd. There seems to be an offset of 4 between y_pred_id (and the positions in y_pred_probas) and the data rows. The offset is not always 4, though; sometimes it is different, so I can't correct for it systematically. I tried reading the documentation and the source code but still have no idea why this happens.
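One way to get the actual id-to-row mapping empirically, instead of guessing the offset, is to send every question from the data file back to the model and record which y_pred_id comes back. A minimal sketch follows; map_ids_to_rows and fake_predict are hypothetical helper names, and fake_predict only mimics the offset of 4 described above. In practice it would be replaced by a function that calls the real FAQ model and extracts y_pred_id from its output.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def map_ids_to_rows(predict_id: Callable[[str], int],
                    questions: List[str]) -> Dict[int, List[int]]:
    """Query the model with each data-file question and record which
    y_pred_id is returned for which row (0-based, header excluded)."""
    id_to_rows: Dict[int, List[int]] = defaultdict(list)
    for row, question in enumerate(questions):
        id_to_rows[int(predict_id(question))].append(row)
    return dict(id_to_rows)


if __name__ == "__main__":
    questions = ["q0", "q1", "q2"]

    # Hypothetical stand-in for the real model: returns row index + 4.
    def fake_predict(question: str) -> int:
        return questions.index(question) + 4

    print(map_ids_to_rows(fake_predict, questions))  # {4: [0], 5: [1], 6: [2]}
```

If several rows map to the same id, or the offset changes from row to row, the resulting table makes that visible directly.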
Moreover, since the offset is 4, what does it mean when y_pred_id is between 0 and 3? It seems to me that the FAQ model has a built-in default behavior (not mentioned in the documentation) that returns ids 0 to 3 (or something similar) when the user's utterance is close to gibberish.
Another interesting point: y_pred_id is of type numpy.int64, and when I convert it to a native Python type with the .item() method, I get -1 whenever the index was between 0 and 3. Why?
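For comparison, in isolation .item() on a numpy.int64 returns a Python int with the same value, as this minimal check shows (the sample value 2 is arbitrary):

```python
import numpy as np

# A NumPy integer scalar like the y_pred_id value returned by the model.
y_pred_id = np.int64(2)

# .item() converts a NumPy scalar to the equivalent built-in Python scalar.
native = y_pred_id.item()

print(type(y_pred_id).__name__, type(native).__name__, native)  # int64 int 2
```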
None of these behaviors are mentioned in the documentation, so I am very confused…
Thank you so much for your help!