
Consultation About the data split #3

@shuxiaobo

Hello, I have been following this paper recently, but I have a question about the data split, which differs between the GitHub repository and the paper.

The GitHub repository says:

The training set consists of 1,201 pairs of conversations and associated summaries.

The validation set consists of 100 pairs of conversations and their summaries.

MTS-Dialog includes 2 test sets; each test set consists of 200 conversations and associated section headers and 

But the paper says:

We use a test set of 100 conversations and notes, randomly selected from the MTS-DIALOG dataset. The remaining pairs are used for training (1,201 pairs) and validation (400 pairs).

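The training sets appear to be identical (1,201 pairs), but the validation sets (100 vs. 400 pairs) and test sets (two sets of 200 conversations vs. one set of 100) are completely different. Could you please specify the details of the data split method?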
