Trying to understand tfidf

michaelwechner · May 12, 2020, 6:57am

Hi

I have trained a model with the following questions/answers

python -m deeppavlov train deeppavlov/configs/faq/tfidf_logreg_en_faq.json

Question,Answer
"aaa aaa?","Answer 1"
"bbb bbb?","Answer 2"
"ccc ccc?","Answer 3"

and test it with

python -m deeppavlov interact deeppavlov/configs/faq/tfidf_logreg_en_faq.json

and receive the following results

q::aaa

('Answer 1', [0.9919848791014834, 0.00400756044925833, 0.00400756044925833])

q::bbb

('Answer 2', [0.00400756044925833, 0.9919848791014834, 0.00400756044925833])

q::ccc

('Answer 3', [0.00400756044925833, 0.00400756044925833, 0.9919848791014834])

which somehow makes sense, but

q::zzz

('Answer 3', [0.3333333333333333, 0.3333333333333333, 0.33333333333333337])

I would expect all values to be zero, but I assume this is just how the algorithm works.
Is there some non-code documentation on how the algorithm works?

Also please see my related question some time ago

Thanks for your help

Michael

Vasily · May 12, 2020, 12:34pm

Hey @michaelwechner, thank you very much for your interest.

The output of the model is a probability distribution over the answers. That is why you get roughly [0.33, 0.33, 0.33] for the last example: the model is equally unsure about all three labels. You can decide whether to accept the top answer by setting a threshold on the maximum probability score.
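To illustrate why an unseen word such as "zzz" ends up with equal probabilities rather than zeros, here is a minimal sketch in plain Python. It is not DeepPavlov's code: the weights, biases, and the helper names `softmax`, `class_scores`, and `answer_or_none` are all made up for illustration. The assumption is that an out-of-vocabulary query becomes an all-zero TF-IDF vector, so the classifier only sees its (equal) biases, and the normalised output is uniform.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(weights, biases, features):
    """Linear score per class: dot(weights[k], features) + biases[k]."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

# Hypothetical model over the vocabulary ["aaa", "bbb", "ccc"] (assumed values).
weights = [[5.0, -2.5, -2.5],
           [-2.5, 5.0, -2.5],
           [-2.5, -2.5, 5.0]]
biases = [0.0, 0.0, 0.0]
answers = ["Answer 1", "Answer 2", "Answer 3"]

# "aaa" activates the first feature; "zzz" is unseen, so every feature is 0.
known = softmax(class_scores(weights, biases, [1.0, 0.0, 0.0]))
unknown = softmax(class_scores(weights, biases, [0.0, 0.0, 0.0]))

def answer_or_none(probs, answers, threshold):
    """Return the top answer only if its probability exceeds the threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return answers[best] if probs[best] > threshold else None

print(known)
print(unknown)
print(answer_or_none(known, answers, 0.5))
print(answer_or_none(unknown, answers, 0.5))
```

Because the probabilities always sum to 1, they can never all be zero; thresholding the top score is how a "no good match" case can be detected.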

Let me know if it’s helpful.

michaelwechner · May 12, 2020, 3:41pm

Hi @Vasily

Thanks very much for your feedback!

Yes, that’s what I thought, but I wonder whether there exists a better alternative.

I replaced

```
   "max_proba": true
```

with

```
   "confident_threshold": 0.5
```

and I would have expected that one still receives

q::zzz

('Answer 3', [0.3333333333333333, 0.3333333333333333, 0.33333333333333337])

because 0.5 > 0.3333333

but instead I received

q::zzz

(, [0.3333333333333333, 0.3333333333333333, 0.33333333333333337])

How does confident_threshold work?

Thanks again

Michael

Vasily · May 13, 2020, 10:06am

You can use only one of the three available options [confident_threshold, max_proba, top_n]; they are checked in that order of priority. When you set confident_threshold=0.5, you filter out all candidates with a probability less than or equal to 0.5, which in your case is all of them.
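The selection logic described here can be sketched roughly as follows. This is a plain-Python illustration of the stated priority and filtering rule, not the library's actual implementation; the function name `select_labels` and its signature are hypothetical.

```python
def select_labels(probs, confident_threshold=None, max_proba=False, top_n=None):
    """Pick label indices from a probability list.

    Hypothetical helper: only the first option that is set takes effect,
    in the order confident_threshold, max_proba, top_n.
    """
    if confident_threshold is not None:
        # Keep only candidates strictly above the threshold (may be empty).
        return [i for i, p in enumerate(probs) if p > confident_threshold]
    if max_proba:
        # Always return exactly one label: the most probable one.
        return [max(range(len(probs)), key=probs.__getitem__)]
    if top_n is not None:
        # Return the n most probable labels, best first.
        ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
        return ranked[:top_n]
    raise ValueError("set one of confident_threshold, max_proba, top_n")

# Probabilities reported in this thread for the query "zzz".
zzz = [0.3333333333333333, 0.3333333333333333, 0.33333333333333337]

print(select_labels(zzz, confident_threshold=0.5))  # no candidate passes 0.5
print(select_labels(zzz, max_proba=True))           # index of the largest value
```

Under this reading, max_proba always returns some answer, however unsure the model is, while confident_threshold can return nothing at all, which matches the empty result seen for "zzz".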

michaelwechner · May 13, 2020, 10:40am

Ah ok, got it. Thanks again!
