Is it normal/expected that in some situations an entity starts with an “I-” tag instead of “B-”?
I’m using ner_conll2003_bert.
See this minimal input (changing a few words can actually “fix” it, but this is the best minimal example I could get to replicate the issue – please don’t mind the topic or what it’s really saying):
{
"x": [
"Former President Donald J. Trump seemed exhausted Monday morning. Several minutes later Mr. Trump appeared to be awake and notice them."
]
}
I believe the fact that “Donald J. Trump” appears first influences how the “Trump” occurrence in the second sentence is tagged: it gets “I-PER” even though the preceding token is tagged “O”.
The output is:
[
[
[
"Former",
"President",
"Donald",
"J",
".",
"Trump",
"seemed",
"exhausted",
"Monday",
"morning",
".",
"Several",
"minutes",
"later",
"Mr",
".",
"Trump",
"appeared",
"to",
"be",
"awake",
"and",
"notice",
"them",
"."
],
[
"O",
"O",
"B-PER",
"I-PER",
"I-PER",
"I-PER",
"O",
"O",
"O",
"O",
"O",
"O",
"O",
"O",
"O",
"O",
"I-PER",
"O",
"O",
"O",
"O",
"O",
"O",
"O",
"O"
]
]
]
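To make the issue concrete, here is a minimal sketch in plain Python that flags any “I-” tag that does not continue an entity of the same type. It is not part of DeepPavlov; the helper name `find_orphan_inside_tags` is my own, and the token and tag lists are copied from the output above.

```python
def find_orphan_inside_tags(tags):
    """Return indices of I- tags that do not continue an entity of the same type."""
    orphans = []
    prev = "O"
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            entity_type = tag[2:]
            # A well-formed I-X must directly follow B-X or I-X.
            if prev not in ("B-" + entity_type, "I-" + entity_type):
                orphans.append(i)
        prev = tag
    return orphans


# Tokens and tags copied from the model output above.
tokens = [
    "Former", "President", "Donald", "J", ".", "Trump", "seemed",
    "exhausted", "Monday", "morning", ".", "Several", "minutes", "later",
    "Mr", ".", "Trump", "appeared", "to", "be", "awake", "and", "notice",
    "them", ".",
]
tags = [
    "O", "O", "B-PER", "I-PER", "I-PER", "I-PER", "O", "O", "O", "O", "O",
    "O", "O", "O", "O", "O", "I-PER", "O", "O", "O", "O", "O", "O", "O", "O",
]

orphans = find_orphan_inside_tags(tags)
print([(i, tokens[i], tags[i]) for i in orphans])
# → [(16, 'Trump', 'I-PER')]
```

Only the second “Trump” (index 16) is flagged; the first mention is a well-formed B-PER / I-PER span.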
Thank you!