Standardization and normalization


Standardization and normalization are two common techniques used in the preprocessing of data in machine learning and statistics. While they are related, they have distinct purposes and methods.

Standardization

Purpose:

  • The main goal of standardization is to rescale the features so that they have a mean of 0 and a standard deviation of 1.
  • It is particularly useful when the features in the dataset have different units or scales.
  • In some models, such as linear regression, it enables comparison of coefficients (parameters determining the effect of specific input features).

Effect:

  • After standardization, the features share a similar scale, which helps algorithms that rely on distance metrics (such as k-nearest neighbors or support vector machines) perform well. It can also improve training behaviour and, as described above, aid the interpretability of feature coefficients.
  • It does not guarantee that the data is within a specific range.
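To make this concrete, here is a minimal sketch of standardization using NumPy; the features and values are hypothetical, chosen only to show columns with very different scales.

```python
import numpy as np

# Hypothetical feature matrix: rows are samples, columns are features
# on very different scales (e.g. age in years, income in dollars).
X = np.array([
    [25.0, 40_000.0],
    [35.0, 65_000.0],
    [45.0, 90_000.0],
    [55.0, 120_000.0],
])

# Standardization: subtract each column's mean and divide by its
# standard deviation, giving mean 0 and standard deviation 1 per feature.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_standardized = (X - mean) / std

print(X_standardized.mean(axis=0))  # approximately [0, 0]
print(X_standardized.std(axis=0))   # approximately [1, 1]
```

In practice, the same transformation is commonly applied with scikit-learn's StandardScaler rather than by hand.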

Normalization

Purpose:

  • Normalization, on the other hand, aims to scale the features in a dataset to a specific range, usually between 0 and 1.
  • It is beneficial when the features have different ranges or when the algorithm used is sensitive to (or relies on) the magnitude of the features.

Effect:

  • After normalization, all features will be constrained to a specific range, making it particularly useful for algorithms that require input values to be within a certain interval, like neural networks.
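As a sketch, min-max scaling (the most common form of normalization) can be implemented as follows; the data here is hypothetical.

```python
import numpy as np

# Hypothetical feature matrix with features on different ranges.
X = np.array([
    [1.0, 200.0],
    [2.0, 400.0],
    [3.0, 600.0],
    [4.0, 800.0],
])

# Min-max normalization: rescale each column to the range [0, 1].
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_normalized = (X - X_min) / (X_max - X_min)

print(X_normalized.min(axis=0))  # [0, 0]
print(X_normalized.max(axis=0))  # [1, 1]
```

The equivalent is available in scikit-learn as MinMaxScaler, which also supports rescaling to ranges other than [0, 1].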

Summary

Both standardization and normalization are preprocessing techniques that aim to make the data more suitable for machine learning algorithms. The choice between them depends on the specific requirements of the algorithm being used and the characteristics of the data.

If you wish to interpret the relative magnitudes of input features in Causal Wizard results, you should ensure your data is standardized before uploading it.
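As a minimal sketch of how you might do this, assuming your data lives in a CSV file (the file name and column names below are hypothetical, not part of Causal Wizard's documented workflow), you could standardize the numeric columns with pandas before uploading:

```python
import pandas as pd

# Hypothetical input file and column names.
df = pd.read_csv("my_data.csv")
numeric_cols = ["age", "income"]  # replace with your own numeric features

# Standardize each numeric column to mean 0 and standard deviation 1.
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std()

# Save the standardized version; upload this file instead of the original.
df.to_csv("my_data_standardized.csv", index=False)
```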