NER - name being tagged as "I-PER" alone (ner_conll2003_bert)

Is it normal/expected that, in some situations, an entity starts with an “I-” tag instead of a “B-” tag?

I’m using ner_conll2003_bert.

See this minimal input (changing a few words can actually “fix” it, but this is the smallest example I could find that reproduces the issue; please don’t mind the topic or what it’s actually saying):

{
  "x": [
    "Former President Donald J. Trump seemed exhausted Monday morning. Several minutes later Mr. Trump appeared to be awake and notice them."
  ]
}

I believe the fact that “Donald J. Trump” appears first influences how the “Trump” occurrence in the second sentence is tagged.

The output is:

[
  [
    [
      "Former",
      "President",
      "Donald",
      "J",
      ".",
      "Trump",
      "seemed",
      "exhausted",
      "Monday",
      "morning",
      ".",
      "Several",
      "minutes",
      "later",
      "Mr",
      ".",
      "Trump",
      "appeared",
      "to",
      "be",
      "awake",
      "and",
      "notice",
      "them",
      "."
    ],
    [
      "O",
      "O",
      "B-PER",
      "I-PER",
      "I-PER",
      "I-PER",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "I-PER",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O"
    ]
  ]
]
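
For completeness, a minimal sketch of how this request can be reproduced, assuming the model is served locally with the riseapi command on the default port (adjust the URL to your setup):

import requests

# Assumes the model is being served locally, e.g. with:
#   python -m deeppavlov riseapi ner_conll2003_bert -p 5000
payload = {
    "x": [
        "Former President Donald J. Trump seemed exhausted Monday morning. "
        "Several minutes later Mr. Trump appeared to be awake and notice them."
    ]
}

resp = requests.post("http://localhost:5000/model", json=payload)
tokens, tags = resp.json()[0]  # one [tokens, tags] pair per input string
for token, tag in zip(tokens, tags):
    print(token, tag)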

Thank you!

Hey @ghnp5, thank you very much for your interest!

This NER model (ner_conll2003_bert) is already outdated, which might cause inconsistencies in the NER tags. I recommend installing the latest version of DeepPavlov and using our new configuration, ner_bert_base. Your example appears to work fine with the newest version. In addition, ner_bert_base supports new entity types. Please let me know if you need further assistance.
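
A minimal sketch of what the switch looks like via the Python API (assuming the latest DeepPavlov release; download=True fetches the weights on first use, and the requirements can be installed with python -m deeppavlov install ner_bert_base):

# pip install deeppavlov
# python -m deeppavlov install ner_bert_base
from deeppavlov import build_model

ner = build_model("ner_bert_base", download=True)

tokens, tags = ner([
    "Former President Donald J. Trump seemed exhausted Monday morning. "
    "Several minutes later Mr. Trump appeared to be awake and notice them."
])
for token, tag in zip(tokens[0], tags[0]):
    print(token, tag)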

Thank you very much, @Vasily.
I’ve updated it now, and I’ll adjust my handling of the tags, since they’re different and the model identifies other entity types as well.


When looking at this page: Named Entity Recognition (NER) — DeepPavlov 1.6.0 documentation

I only see a list of models with no reference to ner_bert_base, and I simply picked the one with the highest scores for the en language. The rest of the page doesn’t reference ner_bert_base at all either.
Am I looking in the wrong place, or is the docs page just not up to date?

For sentiment, I’m using sentiment_sst_conv_bert. Is this the right one, or should I be using something else? (Note: I like that this one has 5 classes rather than just 3.)
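
For context, a minimal sketch of how I’m calling it (assuming the standard Python API; the exact output format and label names depend on the config):

from deeppavlov import build_model

# sentiment_sst_conv_bert is trained on SST, which has 5 sentiment classes.
sentiment = build_model("sentiment_sst_conv_bert", download=True)

print(sentiment(["I like that this one has five classes rather than just three."]))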


Thanks again!

Hi @Vasily,

Also, in addition to what I said above, the list of tags here: Named Entity Recognition (NER) — DeepPavlov 1.6.0 documentation
seems to be incorrect/outdated.

I’m getting different tags than what’s listed there.

I suppose what’s in the Demo is the updated list: Demo of Open-Source NLP Framework DeepPavlov.ai
(at least the tags I’m getting seem to match this list!)

@ghnp5 Indeed, while the B- and I- tagging inconsistency may occasionally arise, particularly in less frequent entity types, we are actively addressing this issue along with numerous other improvements. Our team is currently testing these enhancements on our Demo page, and we anticipate releasing this updated model in the near future.

Thank you @Vasily!

I understand.

Just checking if you also saw this message above:

When looking at this page: Named Entity Recognition (NER) — DeepPavlov 1.6.0 documentation

I only see a list of models with no reference to ner_bert_base, and I simply picked the one with the highest scores for the en language. The rest of the page doesn’t reference ner_bert_base at all either.
Am I looking in the wrong place, or is the docs page just not up to date?

For sentiment, I’m using sentiment_sst_conv_bert. Is this the right one, or should I be using something else? (Note: I like that this one has 5 classes rather than just 3.)

Also, it appears that the Demo is using ner_bert_base_mult, but that doesn’t seem to be available to us yet. I’m assuming this is what you mean by the team testing on the Demo, and that “_mult” will be available to us later?

Many thanks again!!
Good day.

@Vasily

Just repeating here what I said in my other post:

I’m now running ner_mult_long_demo, found here: New classification models by Kolpnick · Pull Request #1657 · deeppavlov/DeepPavlov · GitHub

This seems to resolve the problems for me and to be consistent with what is running on the Demo page. Great!

It’s looking better, so far.

One oddity: the word “cute” is being tagged as WEATHER_DESCRIPTOR in:

That’s cute.

But other than that, it seems pretty accurate and way better than what I was running before!

Once this model is released, I can revert all my volume overrides on the docker container :slight_smile:
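
For anyone else doing the same: the volume override only serves to make the unreleased config visible inside the container. A local copy of the JSON from that PR can also be loaded directly by path; a minimal sketch, with the path as a placeholder:

from deeppavlov import build_model

# Placeholder path to a local copy of the config JSON taken from the PR branch.
config_path = "configs/ner_mult_long_demo.json"

ner = build_model(config_path, download=True)
tokens, tags = ner(["That's cute."])
for token, tag in zip(tokens[0], tags[0]):
    print(token, tag)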

Many thanks!