Simple_dstc2.tar.gz vs dstc2_v2.tar.gz

I am training another bot based on the gobot architecture, and there seem to be two training datasets: simple_dstc2.tar.gz and dstc2_v2.tar.gz. What is the actual difference between them, and why do we need two?

Can gobot be trained with “dstc_slotfilling” (rather than “slotfill_raw”) while creating only one dataset? Or is there a way to convert one dataset format into the other?

Also, what is the real difference between gobot_simple (gobot_simple_dstc2.json) and gobot_dstc2 (gobot_dstc2.json), apart from the dataset formats and some minor configuration settings?

Hey! Sorry for the delayed response. Feedback like this is very helpful because we are working on a global go-bot refactoring these days, and this mess in the configs and datasets is due to be revised and reorganized to avoid redundant duplication of all sorts.

The work is still in progress, so it’s hard to provide a clear and verified solution that fixes this without affecting anything else. The most straightforward answer I can give is: the real differences in these duplicated cases are exactly the ones you see when you diff them. It’s temporary, though, and will be improved.
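A minimal sketch of such a diff for two config files, assuming both are plain JSON documents. The helper names (flatten, diff_configs, diff_config_files) are made up for illustration and are not part of DeepPavlov, and the paths in the usage comment are placeholders for wherever the configs live locally.

```python
import json
from pathlib import Path

MISSING = "<missing>"  # marker for a path present in only one config


def flatten(node, prefix=""):
    """Flatten nested dicts/lists into {dotted.path: leaf_value}."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def diff_configs(left, right):
    """Return {path: (left_value, right_value)} for every path that differs."""
    a, b = flatten(left), flatten(right)
    return {
        path: (a.get(path, MISSING), b.get(path, MISSING))
        for path in sorted(a.keys() | b.keys())
        if a.get(path, MISSING) != b.get(path, MISSING)
    }


def diff_config_files(left_path, right_path):
    """Load two JSON files (hypothetical local paths) and diff them."""
    left = json.loads(Path(left_path).read_text(encoding="utf-8"))
    right = json.loads(Path(right_path).read_text(encoding="utf-8"))
    return diff_configs(left, right)


# Usage (paths are placeholders):
# for path, (old, new) in diff_config_files("gobot_simple_dstc2.json",
#                                           "gobot_dstc2.json").items():
#     print(f"{path}: {old!r} -> {new!r}")
```

List items are compared by position, so a component inserted in the middle of a pipeline will show up as many shifted differences; a plain text diff may be easier to read in that case.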

Thanks for your reply, I really needed it.

For anyone wondering about the same question and trying to decide which one to go with, I would recommend gobot_dstc2.json with its dataset format. This lets you run NER based on a neural network rather than on simple edit distance.

Hi,

After a while, I realized I had not grasped the whole difference between “dstc2-trn.json” and “dstc2-trn.jsonlist”.

One difference I found is that the latter has an additional “goals” field. However, I would still like to ask for your clarification.
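For reference, here is a rough sketch of how the field names in the two files can be compared. It rests on assumptions I have not verified against the format definition: that the .json file is a single JSON document, and that the .jsonlist file holds one JSON value per line with optional blank lines in between. The function names are hypothetical.

```python
import json


def collect_keys(node, found=None):
    """Recursively gather every dict key that appears anywhere in node."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            collect_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_keys(item, found)
    return found


def keys_in_json(text):
    """Keys used in a file that is one JSON document (assumed .json layout)."""
    return collect_keys(json.loads(text))


def keys_in_jsonlist(text):
    """Keys used in a file with one JSON value per line (assumed .jsonlist layout)."""
    found = set()
    for line in text.splitlines():
        if line.strip():  # skip blank separator lines
            collect_keys(json.loads(line), found)
    return found


# Usage (file names as in this thread, read from the current directory):
# with open("dstc2-trn.json", encoding="utf-8") as f:
#     json_keys = keys_in_json(f.read())
# with open("dstc2-trn.jsonlist", encoding="utf-8") as f:
#     jsonlist_keys = keys_in_jsonlist(f.read())
# print("only in .jsonlist:", sorted(jsonlist_keys - json_keys))
# print("only in .json:", sorted(json_keys - jsonlist_keys))
```

This only compares which field names occur, not how the records are nested or what the values mean, so it does not replace an explanation of the format.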

Why do we need the second “dstc2-trn.jsonlist” format, and in what ways is it better than the first one?