NER - name being tagged as "I-PER" alone (ner_conll2003_bert)

Is it normal/expected that, in some situations, an entity starts with an “I-” tag instead of a “B-” tag?

I’m using ner_conll2003_bert.

See this minimal input (changing a few words can actually “fix” it, but this is the smallest example I could find that reproduces the issue; please don’t mind the topic or what it’s actually saying):

{
  "x": [
    "Former President Donald J. Trump seemed exhausted Monday morning. Several minutes later Mr. Trump appeared to be awake and notice them."
  ]
}

I believe the fact that “Donald J. Trump” appears first influences how the “Trump” occurrence in the second sentence is tagged.

The output is:

[
  [
    [
      "Former",
      "President",
      "Donald",
      "J",
      ".",
      "Trump",
      "seemed",
      "exhausted",
      "Monday",
      "morning",
      ".",
      "Several",
      "minutes",
      "later",
      "Mr",
      ".",
      "Trump",
      "appeared",
      "to",
      "be",
      "awake",
      "and",
      "notice",
      "them",
      "."
    ],
    [
      "O",
      "O",
      "B-PER",
      "I-PER",
      "I-PER",
      "I-PER",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "I-PER",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O",
      "O"
    ]
  ]
]
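
For completeness, a minimal sketch of how this request can be reproduced, assuming the model is served locally with the riseapi command on the default port (adjust the URL to your setup):

import requests

# Assumes the model is being served locally, e.g. with:
#   python -m deeppavlov riseapi ner_conll2003_bert -p 5000
payload = {
    "x": [
        "Former President Donald J. Trump seemed exhausted Monday morning. "
        "Several minutes later Mr. Trump appeared to be awake and notice them."
    ]
}

resp = requests.post("http://localhost:5000/model", json=payload)
tokens, tags = resp.json()[0]  # one [tokens, tags] pair per input string
for token, tag in zip(tokens, tags):
    print(token, tag)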

Thank you!

Hey @ghnp5, thank you very much for your interest!

This NER model (ner_conll2003_bert) is already outdated, which might cause inconsistencies in the NER tags. I recommend installing the latest version of DeepPavlov and using our new configuration, ner_bert_base. Your example appears to work fine with the newest version. In addition, ner_bert_base supports new entity types. Please let me know if you need further assistance.
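
A minimal sketch of what the switch looks like via the Python API (assuming the latest DeepPavlov release; download=True fetches the weights on first use, and the requirements can be installed with python -m deeppavlov install ner_bert_base):

# pip install deeppavlov
# python -m deeppavlov install ner_bert_base
from deeppavlov import build_model

ner = build_model("ner_bert_base", download=True)

tokens, tags = ner([
    "Former President Donald J. Trump seemed exhausted Monday morning. "
    "Several minutes later Mr. Trump appeared to be awake and notice them."
])
for token, tag in zip(tokens[0], tags[0]):
    print(token, tag)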

Thank you very much, @Vasily.
I’ve updated it now, and I’ll adjust my handling of the tags, since they’re different and the model identifies other entity types as well.


When looking at this page: Named Entity Recognition (NER) — DeepPavlov 1.6.0 documentation

I only see a list of models with no reference to ner_bert_base, and I simply picked the one with the highest scores for the en language. The rest of the page doesn’t reference ner_bert_base at all either.
Am I looking in the wrong place, or is the docs page just not up to date?

For sentiment, I’m using sentiment_sst_conv_bert. Is this the right one, or should I be using something else? (Note: I like that this one has 5 classes rather than just 3.)
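
For context, a minimal sketch of how I’m calling it (assuming the standard Python API; the exact output format and label names depend on the config):

from deeppavlov import build_model

# sentiment_sst_conv_bert is trained on SST, which has 5 sentiment classes.
sentiment = build_model("sentiment_sst_conv_bert", download=True)

print(sentiment(["I like that this one has five classes rather than just three."]))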


Thanks again!

Hi @Vasily,

Also, in addition to what I said above, the list of tags here: Named Entity Recognition (NER) — DeepPavlov 1.6.0 documentation
seems to be incorrect/outdated.

I’m getting different tags than what’s listed there.

I suppose what’s in the Demo is the updated list: Demo of Open-Source NLP Framework DeepPavlov.ai
(at least the tags I’m getting seem to match this list!)

@ghnp5 Indeed, while the B- and I- tagging inconsistency may occasionally arise, particularly in less frequent entity types, we are actively addressing this issue along with numerous other improvements. Our team is currently testing these enhancements on our Demo page, and we anticipate releasing this updated model in the near future.

Thank you @Vasily!

I understand.

Just checking if you also saw this message above:

When looking at this page: Named Entity Recognition (NER) — DeepPavlov 1.6.0 documentation

I only see a list of models with no reference to ner_bert_base, and I simply picked the one with the highest scores for the en language. The rest of the page doesn’t reference ner_bert_base at all either.
Am I looking in the wrong place, or is the docs page just not up to date?

For sentiment, I’m using sentiment_sst_conv_bert. Is this the right one, or should I be using something else? (Note: I like that this one has 5 classes rather than just 3.)

Also, it appears that the Demo is using ner_bert_base_mult, but that doesn’t seem to be available to us yet. I’m assuming this is what you mean by the team testing on the Demo, and that “_mult” will be available to us later?

Many thanks again!!
Good day.

@Vasily

Just repeating here what I said in my other post:

I’m now running ner_mult_long_demo, found here: New classification models by Kolpnick · Pull Request #1657 · deeppavlov/DeepPavlov · GitHub

This seems to resolve the problems for me and to be consistent with what is running on the Demo page. Great!

It’s looking better, so far.

One oddity: the word “cute” is being tagged as WEATHER_DESCRIPTOR in:

That’s cute.

But other than that, it seems pretty accurate and way better than what I was running before!

Once this model is released, I can revert all my volume overrides on the docker container :slight_smile:
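
For anyone else doing the same: the volume override only serves to make the unreleased config visible inside the container. A local copy of the JSON from that PR can also be loaded directly by path; a minimal sketch, with the path as a placeholder:

from deeppavlov import build_model

# Placeholder path to a local copy of the config JSON taken from the PR branch.
config_path = "configs/ner_mult_long_demo.json"

ner = build_model(config_path, download=True)
tokens, tags = ner(["That's cute."])
for token, tag in zip(tokens[0], tags[0]):
    print(token, tag)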

Many thanks!