BERT NER Fine Tuning

Could somebody please confirm the format of the dataset for training? I'm new at this and the docs weren't very clear.

As far as I have understood, it's like this:

[sentence] [entity] [tag]

Hi @ShaleenAg,

Here are the first few sentences from the OntoNotes training set:

What    O
kind    O
of      O
memory  O
?       O

We      O
respectfully    O
invite  O
you     O
to      O
watch   O
a       O
special O
edition O
of      O
Across  B-ORG
China   I-ORG
.       O

WW      B-WORK_OF_ART
II      I-WORK_OF_ART
Landmarks       I-WORK_OF_ART
on      I-WORK_OF_ART
the     I-WORK_OF_ART
Great   I-WORK_OF_ART
Earth   I-WORK_OF_ART
of      I-WORK_OF_ART
China   I-WORK_OF_ART
:       I-WORK_OF_ART
Eternal I-WORK_OF_ART
Memories        I-WORK_OF_ART
of      I-WORK_OF_ART
Taihang I-WORK_OF_ART
Mountain        I-WORK_OF_ART

Standing        O
tall    O
on      O
Taihang B-LOC
Mountain        I-LOC
is      O
the     B-WORK_OF_ART
Monument        I-WORK_OF_ART
to      I-WORK_OF_ART
the     I-WORK_OF_ART
Hundred I-WORK_OF_ART
Regiments       I-WORK_OF_ART
Offensive       I-WORK_OF_ART
.       O

So for every word in a sentence it's [word]\t[entity-tag], and there is an empty line between sentences.
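
For what it's worth, here's a minimal Python sketch of a reader for that layout (the function name `read_conll` and the tab-splitting are my own assumptions, not taken from any particular repo):

```python
def read_conll(path):
    # One "word<TAB>tag" pair per line; a blank line marks a sentence boundary.
    sentences, words, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():          # blank line: end of current sentence
                if words:
                    sentences.append((words, tags))
                    words, tags = [], []
            else:
                word, tag = line.split("\t")
                words.append(word)
                tags.append(tag)
    if words:                             # in case the file lacks a trailing blank line
        sentences.append((words, tags))
    return sentences
```

Each element is a `(words, tags)` pair, so `len(words) == len(tags)` holds for every sentence.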

You can download the OntoNotes dataset from http://files.deeppavlov.ai/deeppavlov_data/ontonotes_ner.tar.gz


Thank you for the format… I just wanted to confirm: by training, you mean fine-tuning BERT, right?
Also, do I have to begin from the start? And is a \t necessary for the format, or will a uniform number of spaces do?

Sorry for the late reply.

Do I have to begin from the start? And is a \t necessary for the format, or will a uniform number of spaces do?

Looking at the code, any amount of spaces should work.
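
A quick illustration of why (assuming the loader splits on whitespace generically, e.g. Python's no-argument `str.split()`, rather than on a literal tab):

```python
for line in ["Taihang\tB-LOC", "Taihang   B-LOC"]:   # tab vs. a run of spaces
    word, tag = line.split()   # no-arg split() splits on any whitespace run
    assert (word, tag) == ("Taihang", "B-LOC")
```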
