I am training another bot based on the gobot architecture, and it seems there are two training datasets:
simple_dstc2.tar.gz and dstc2_v2.tar.gz. What is the real difference between them, and why do we need two?
Can gobot be trained with “dstc_slotfilling” (rather than “slotfill_raw”) while creating only one dataset? Or is there a way to convert one dataset format to the other?
Also, what is the real difference between gobot_simple (gobot_simple_dstc2.json) and gobot_dstc2 (gobot_dstc2.json), apart from the dataset formats and some minor configuration settings?
Hey! Sorry for the delayed response. Feedback like this is very helpful because we’re working on a global go-bot refactoring these days, and this mess in the configs and datasets is going to be revised and reorganized to avoid redundant duplication of all sorts.
The work is still in progress, so it’s hard to provide a clear and verified solution that fixes this without affecting anything else. The most straightforward response I can give is: "the real differences in these duplicated cases are exactly the ones you see when you diff them". It’s temporary, though, and will be improved.
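If a plain text diff of the two configs is too noisy, a key-by-key comparison may be easier to read. Below is a minimal sketch using only the Python standard library; the helper names (`flatten`, `diff_configs`, `diff_config_files`) are made up for illustration and are not part of DeepPavlov, and it assumes you have local copies of both JSON files.

```python
import json

MISSING = "<missing>"  # marker for a key that exists in only one config


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {"dotted.path": leaf_value}."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}{index}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items


def diff_configs(left, right):
    """Return {path: (left_value, right_value)} for every differing leaf."""
    a, b = flatten(left), flatten(right)
    return {
        path: (a.get(path, MISSING), b.get(path, MISSING))
        for path in sorted(set(a) | set(b))
        if a.get(path, MISSING) != b.get(path, MISSING)
    }


def diff_config_files(path_a, path_b):
    """Load two JSON config files and compare them leaf by leaf."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        return diff_configs(json.load(fa), json.load(fb))
```

For example, `diff_config_files("gobot_simple_dstc2.json", "gobot_dstc2.json")` (paths assumed to point at your local copies) returns a dict you can print line by line. Note that list items are compared by position, so a component inserted in the middle of a pipeline will make everything after it show up as changed.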
For anyone who has the same question and needs to decide which one to go with, I would recommend gobot_dstc2.json with its dataset format. This will let you run NER based on a neural net rather than just simple edit distance.
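To give a rough idea of what "simple edit distance" matching means here, below is a toy sketch. It is not the actual DeepPavlov slot filler code: the function names, the example slot values, and the `max_distance` threshold are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def match_slot(token, known_values, max_distance=2):
    """Return the closest known slot value, or None if nothing is close enough."""
    token = token.lower()
    best = min(known_values, key=lambda value: levenshtein(token, value.lower()))
    if levenshtein(token, best.lower()) <= max_distance:
        return best
    return None


# Hypothetical slot values for a "food" slot.
FOOD_VALUES = ["chinese", "italian", "indian"]
print(match_slot("chinse", FOOD_VALUES))  # typo is still matched to "chinese"
```

Matching like this only recovers values that are close to an entry in a fixed list, whereas a neural NER model can also use the surrounding context of the utterance.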