How to set threshold that no answer is returned if no question really matched?

michaelwechner · January 17, 2020, 10:26am

Hi

I am using

python3 -m deeppavlov interact tfidf_logreg_en_faq

which works great, but when I enter a question that is completely out of context, for example “What did I dream last night?”, I would prefer that no answer from the FAQ is returned.

Is there a way to set some kind of threshold so that in such a case no answer is returned, or so that one can detect that the question did not really match any question in the FAQ?

Thanks

Michael

yoptar · January 17, 2020, 12:17pm

Hi @michaelwechner,
The simple way would be to replace "max_proba": true in the config’s proba2labels block with "confident_threshold": 0.5 or your desired threshold. This will change the output format and might even lead to returning multiple possible answers for a question if the threshold is smaller than 0.5.
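To illustrate why a threshold below 0.5 can yield several answers, here is a minimal plain-Python sketch. It is not DeepPavlov's code: the function name `labels_above_threshold` is hypothetical, and it assumes the class probabilities for one question sum to 1, so at most one of them can exceed 0.5.

```python
from typing import List


def labels_above_threshold(probas: List[float], threshold: float) -> List[int]:
    """Return the indices of all classes whose probability exceeds the threshold.

    Hypothetical helper for illustration only; it mimics the idea of a
    confidence threshold, not the library's actual implementation.
    """
    return [i for i, p in enumerate(probas) if p > threshold]


# Made-up probabilities for a single question over three FAQ entries.
probas = [0.45, 0.40, 0.15]

print(labels_above_threshold(probas, 0.5))  # [] : nothing is confident enough
print(labels_above_threshold(probas, 0.3))  # [0, 1] : two candidates pass
```

With a threshold of 0.5 or higher, the result holds either one index or none, which is the "no answer" case asked about.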

The other way would be to add a post-processor to your config that would filter answers by their probability. It could look something like this:

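from typing import List


class ProbaFilter:
    def __init__(self, threshold: float = 0.5, default_value: str = '', **kwargs):
        self.threshold = threshold          # minimum probability required to keep an answer
        self.default_value = default_value  # returned when no answer is confident enough

    def __call__(self, answers: List[str], probas: List[List[float]]):
        # Keep each answer only if its highest class probability exceeds the threshold.
        return [answer if max(ans_probas) > self.threshold else self.default_value
                for answer, ans_probas in zip(answers, probas)]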
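As a rough illustration of how such a filter behaves, the sketch below calls it directly on a small batch. The answers and probabilities are made up, and wiring the class into a DeepPavlov config is not shown here.

```python
from typing import List


class ProbaFilter:
    """Same filter as above, repeated so this example runs on its own."""

    def __init__(self, threshold: float = 0.5, default_value: str = '', **kwargs):
        self.threshold = threshold
        self.default_value = default_value

    def __call__(self, answers: List[str], probas: List[List[float]]):
        return [answer if max(ans_probas) > self.threshold else self.default_value
                for answer, ans_probas in zip(answers, probas)]


proba_filter = ProbaFilter(threshold=0.5, default_value='')

# One predicted answer and one probability vector per input question (made-up values).
answers = ['Answer A', 'Answer B']
probas = [[0.05, 0.90, 0.05],   # confident match
          [0.40, 0.35, 0.25]]   # no class stands out

filtered = proba_filter(answers, probas)
print(filtered)  # ['Answer A', '']
```

The second question gets the default value because none of its class probabilities exceeds 0.5, so the caller can treat an empty string as "no match".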

michaelwechner · January 17, 2020, 12:51pm

Hi @yoptar, thank you very much for your explanation and hint!

michaelwechner · February 1, 2020, 7:08pm

Hi @yoptar

In my training data I have the following question/answer pair:

What is the name of the president of the USA?,“Donald Trump”

When I send the following request:

"q": [
  "What is the name of the president of the USA?"
]

then I receive the following response:

[
  [
    [
      "Donald Trump"
    ],
    [
      0.0009037585956436307,
      …
      0.00020985532607495578,
      0.9807839562113666
    ]
  ]
]

When I send the following query:

"q": [
  "What is the name of the president of the Russia?"
]

then I receive the following response:

[
  [
    [
      "Donald Trump"
    ],
    [
      0.0010917099011234978,
      …
      0.0002233205864783391,
      0.948826280203339
    ]
  ]
]

I guess the result is basically the same because I don’t have the question/answer pair

What is the name of the president of the Russia?,“Vladimir Putin”

in my training data.

I understand that the two queries are nearly identical; only the words “USA” and “Russia” differ. I had hoped to be able to recognize from the response that there is no answer to this question, because the answer regarding Russia is not in the training data yet. Nevertheless, the response values are very similar:

Russia: 0.948826280203339
USA: 0.9807839562113666

Do you have a hint on how to differentiate the two, so that the program can decide not to return an answer for “Russia”?

Btw, I have set “confident_threshold”: 0.5, but I guess this does not help in such a case.

Thanks very much

Michael
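To make the numbers above concrete, the following minimal sketch checks the two reported maximum probabilities against a threshold. Only the two values quoted in the post are used. The helper name `passes` and the 0.96 cut-off are hypothetical and chosen purely for illustration.

```python
from typing import Dict

# Highest class probability reported in the post for each query.
reported_max_proba: Dict[str, float] = {
    'USA': 0.9807839562113666,
    'Russia': 0.948826280203339,
}


def passes(threshold: float) -> Dict[str, bool]:
    """Tell, per query, whether its top probability exceeds the threshold."""
    return {name: p > threshold for name, p in reported_max_proba.items()}


print(passes(0.5))   # {'USA': True, 'Russia': True}
print(passes(0.96))  # {'USA': True, 'Russia': False}
```

A threshold of 0.5 lets both queries through, which matches the observation in the post. Only a cut-off in the narrow gap between the two values separates them, and a value fitted to two examples like this might also reject legitimately rephrased questions.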
