Could somebody please confirm the format of the dataset for training?
I'm new at this and the docs weren't very clear.
As far as I have understood, it's like this:
[sentence] [entity] [tag]
Hi @ShaleenAg,
Here are the first few sentences from the OntoNotes train-set:
What O
kind O
of O
memory O
? O
We O
respectfully O
invite O
you O
to O
watch O
a O
special O
edition O
of O
Across B-ORG
China I-ORG
. O
WW B-WORK_OF_ART
II I-WORK_OF_ART
Landmarks I-WORK_OF_ART
on I-WORK_OF_ART
the I-WORK_OF_ART
Great I-WORK_OF_ART
Earth I-WORK_OF_ART
of I-WORK_OF_ART
China I-WORK_OF_ART
: I-WORK_OF_ART
Eternal I-WORK_OF_ART
Memories I-WORK_OF_ART
of I-WORK_OF_ART
Taihang I-WORK_OF_ART
Mountain I-WORK_OF_ART
Standing O
tall O
on O
Taihang B-LOC
Mountain I-LOC
is O
the B-WORK_OF_ART
Monument I-WORK_OF_ART
to I-WORK_OF_ART
the I-WORK_OF_ART
Hundred I-WORK_OF_ART
Regiments I-WORK_OF_ART
Offensive I-WORK_OF_ART
. O
So for every word in a sentence it's [word]\t[entity-tag],
and there is an empty line between sentences.
You can download the OntoNotes dataset from http://files.deeppavlov.ai/deeppavlov_data/ontonotes_ner.tar.gz
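To make the layout concrete, here is a minimal sketch (a hypothetical helper, not part of DeepPavlov) that parses this format: one `[word]\t[tag]` pair per line, with a blank line separating sentences.

```python
def read_conll(lines):
    """Yield (tokens, tags) pairs, one per sentence."""
    tokens, tags = [], []
    for line in lines:
        line = line.strip()
        if not line:  # a blank line ends the current sentence
            if tokens:
                yield tokens, tags
                tokens, tags = [], []
        else:
            # split() with no argument splits on any run of whitespace,
            # so both tabs and spaces work as separators
            word, tag = line.split()
            tokens.append(word)
            tags.append(tag)
    if tokens:  # flush the last sentence if the file has no trailing blank line
        yield tokens, tags

sample = """What\tO
kind\tO
of\tO
memory\tO
?\tO

Standing\tO
tall\tO
on\tO
Taihang\tB-LOC
Mountain\tI-LOC
"""

sentences = list(read_conll(sample.splitlines()))
print(len(sentences))       # 2 sentences
print(sentences[1])
```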
Thank you for the format …I just wanted to confirm: by training, you mean fine-tuning BERT, right?
Or do I have to begin from the start? And is a \t necessary for the format, or will a uniform number of spaces do?
Sorry for the late reply.
> Do I have to begin from the start? And is a \t necessary for the format, or will a uniform number of spaces do?
Looking at the code, any amount of spaces should work.
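For example, Python's default `str.split()` (which is the usual way such lines are tokenized) treats a tab and any run of spaces identically, so either separator produces the same pair:

```python
# Both a tab and multiple spaces split into the same two fields
print("Taihang\tB-LOC".split())    # ['Taihang', 'B-LOC']
print("Taihang    B-LOC".split())  # ['Taihang', 'B-LOC']
```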