PRM Training Data

Hi, I noticed that the provided BIRD train dataset contains 18,015 samples, which seem to come from the "After Syntax Tree Deduplication Stage". Is this the dataset that was used for training?

Or should we use the final "Filtered PRM Training Data" instead? If so, could you please share that dataset and describe its format? Thanks a lot!