Preprocess approach #1

@DenisVorotyntsev

Hello.

Thank you for open-sourcing your work. You've done a fantastic job, and I've learned a lot from studying your design decisions.

One part that is still unclear to me is your choice to use PrimaryAggFast for target lag features instead of more advanced feature-engineering classes such as TimeReversalSimple. I think this is partly due to two bugs in the code:

  1. In Preprocess, you use lag features only when there are several time-series id columns (fit_transform is called only if len(X.primary_id) > 1), but lag features are also useful when there is only one id column, or even none.
  2. You process the target (AbnormalTrainLabel) after building the target lag features, so those lags are computed on the unprocessed target and can't give the model useful information.
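To make both points concrete, here is a minimal stdlib-only sketch, assuming rows are plain dicts already sorted by timestamp. `add_target_lags`, `preprocess` and `process_target` are hypothetical names, not the repository's API; `process_target` merely stands in for whatever AbnormalTrainLabel does (the usage below assumes simple outlier clipping).

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def add_target_lags(
    rows: Sequence[Row], id_cols: Sequence[str], target: str, lags: Sequence[int]
) -> List[Row]:
    """Add `<target>_lag_<k>` columns, computed separately for each series.

    Works for any number of id columns: with none, the whole table is one
    series; otherwise each distinct tuple of id values is its own series.
    Assumes rows are sorted by timestamp and lags are positive integers.
    """
    history: Dict[Tuple[Any, ...], List[Any]] = defaultdict(list)
    out: List[Row] = []
    for row in rows:
        key = tuple(row[c] for c in id_cols)  # () when there are no id columns
        past = history[key]
        new_row = dict(row)  # copy, so the input rows are not mutated
        for k in lags:
            # Only earlier rows of the same series are visible (no target leakage).
            new_row[f"{target}_lag_{k}"] = past[-k] if len(past) >= k else None
        past.append(row[target])
        out.append(new_row)
    return out


def preprocess(
    rows: Sequence[Row],
    id_cols: Sequence[str],
    target: str,
    lags: Sequence[int],
    process_target: Callable[[Any], Any],  # hypothetical stand-in for AbnormalTrainLabel
) -> List[Row]:
    # Process the target FIRST, so the lags are built from the same values
    # the model is trained on; only then compute the lag features.
    processed = [{**row, target: process_target(row[target])} for row in rows]
    return add_target_lags(processed, id_cols, target, lags)


rows = [{"t": 0, "y": 1.0}, {"t": 1, "y": 100.0}, {"t": 2, "y": 3.0}]
clip = lambda v: min(max(v, 0.0), 10.0)  # assumed target processing: clip outliers
out = preprocess(rows, [], "y", [1, 2], clip)
print([r["y_lag_1"] for r in out])  # [None, 1.0, 10.0]
```

With `id_cols=[]` the lags are still produced, and because the target is clipped before the lags are built, the outlier 100.0 reaches the next row's `y_lag_1` as 10.0 rather than as the raw value.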

It's possible that, because of these two bugs, you didn't see a better score with the more advanced lag features and decided to stick with PrimaryAggFast. I may not be following the logic of the code entirely, but I'd like to discuss it for learning purposes anyway, if you're open to it.

BR,
Denis
