Random Machine Learning #3

When should feature scaling be done?

Feature scaling should be done after splitting the data into train and test datasets.

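Feature scaling techniques such as normalization should be fitted on the train data only. The scaling parameters (for example, the minimum and maximum of each feature) are computed from the train data and then applied unchanged to the test data. The main reason for scaling after the train and test split is to avoid data leakage.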
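Here is a minimal sketch of that order of operations using only the Python standard library. The dataset, the 80/20 split, and the helper names fit_min_max and apply_min_max are assumptions made up for illustration, not part of any library.

```python
import random


def fit_min_max(values):
    """Learn the scaling parameters (min and max) from the given values."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values to the range defined by previously learned parameters."""
    span = hi - lo
    if span == 0:
        # A constant feature carries no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical single-feature dataset.
data = [12.0, 45.0, 7.0, 30.0, 22.0, 90.0, 15.0, 60.0, 3.0, 51.0]

# Step 1: split first (80% train, 20% test), with a fixed seed for repeatability.
rng = random.Random(0)
shuffled = data[:]
rng.shuffle(shuffled)
split = int(0.8 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]

# Step 2: learn the scaling parameters from the train data only.
lo, hi = fit_min_max(train)

# Step 3: apply the same parameters to both train and test.
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)
```

Scaled test values can fall outside the 0 to 1 range, because the test data had no say in choosing the minimum and maximum. That is expected. Libraries such as scikit-learn follow the same pattern: fit the scaler on the train data, then transform both sets with it.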

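If we do the feature scaling before the data split, the scaling parameters are computed from all of the data, including the rows that later become the test set. The unseen test data is not literally copied into the train dataset, but information about it leaks into the training features. As a result, the performance measured on the test set tends to look better than it really is and may not reflect how the model behaves on truly new data.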
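A small sketch can make the leak visible. It standardizes the same train values twice: once with statistics computed on all the data, and once with statistics computed on the train data alone. The numbers are hypothetical and chosen so the difference is easy to see.

```python
import statistics

# Hypothetical values: the test point is far from the train points.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0]

# Leaky: statistics are computed before the split, so they include the test point.
leaky_mean = statistics.mean(train + test)
leaky_std = statistics.pstdev(train + test)

# Clean: statistics are computed from the train data only.
clean_mean = statistics.mean(train)
clean_std = statistics.pstdev(train)

# The same train values end up with different scaled features.
leaky_train = [(x - leaky_mean) / leaky_std for x in train]
clean_train = [(x - clean_mean) / clean_std for x in train]

print("mean with test data included:", leaky_mean)
print("mean from train data only:  ", clean_mean)
```

In the leaky version, the single test value shifts the mean used to scale every train row, so the model is trained on features that already carry information about the test set.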

So we should always split the data into train and test datasets first, and only then do the feature scaling, using parameters learned from the train data.
