Choosing a Normalization Method


Normalization is an important data pre-processing step, especially for machine learning. Why do we need it? Normalization changes the distribution of a feature data set and the distances among observation points, and it can accelerate the convergence of deep learning algorithms¹. As another example, our AxHeat product for RF heating of reservoirs applies normalization to the water saturation, which improves the stability of the algorithm.

In this blog post, we will discuss three fundamental types of normalization. Methods specific to deep learning, such as local response normalization, batch normalization, and layer normalization, will be discussed in a future blog post².

Zero-mean normalization

Equation:

x' = ( x - μ ) / σ

where μ is the mean and σ is the standard deviation of the input data.

In this case, zero-mean normalization is the same as standardization: the processed data has zero mean and unit variance, and it follows the standard normal distribution if the input is normally distributed. For clustering algorithms that use distance to measure similarity, such as K-means, zero-mean normalization is a good choice³.
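
As a minimal sketch of this transformation (using NumPy and made-up sample values):

```python
import numpy as np

# Illustrative feature values with an arbitrary mean and spread.
x = np.array([12.0, 15.0, 9.0, 20.0, 14.0])

# Zero-mean normalization: subtract the mean, divide by the standard deviation.
x_zero_mean = (x - x.mean()) / x.std()

print(x_zero_mean.mean())  # ~0.0
print(x_zero_mean.std())   # ~1.0
```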

Min-max normalization

Equation:

x' = ( x - min ) / ( max - min )

Min-max normalization is a linear normalization technique. It does not change the shape of the data distribution. However, if the minimum and maximum are not stable across input data sets, the results may be unstable as well. Min-max normalization is the most common method in image processing, since most pixel values lie in the range [0, 255].
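
A minimal sketch of min-max normalization, assuming 8-bit pixel values as the input (the values below are made up):

```python
import numpy as np

# Illustrative 8-bit pixel values in [0, 255].
pixels = np.array([0, 64, 128, 192, 255], dtype=np.float64)

# Min-max normalization rescales the data linearly into [0, 1].
pixels_scaled = (pixels - pixels.min()) / (pixels.max() - pixels.min())

print(pixels_scaled)  # [0.0, ~0.251, ~0.502, ~0.753, 1.0]
```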

Non-linear normalizations

Typical non-linear normalizations include logarithmic, exponential, and inverse trigonometric functions.

The choice of non-linear function depends on the distribution of your inputs and the expected distribution of the outputs. log() stretches values in (0, 1] apart, so small positive values become easier to distinguish. arctan() accepts any real number as input and maps it to the range (-π/2, π/2); dividing the result by π/2 rescales it into (-1, 1).
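
A minimal sketch of these two non-linear mappings (the sample values are illustrative; the log variant assumes strictly positive inputs):

```python
import numpy as np

x = np.array([0.001, 0.01, 0.1, 1.0, 10.0, 100.0])

# Logarithm: stretches small positive values apart; requires x > 0.
x_log = np.log(x)

# Arctangent: maps any real number into (-pi/2, pi/2);
# dividing by pi/2 rescales the output into (-1, 1).
x_atan = np.arctan(x) / (np.pi / 2)

print(x_log)
print(x_atan)
```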

 

Let's have a look at the data distributions after applying zero-mean, min-max, log, and arctan normalizations to samples drawn from a standard Cauchy distribution.

We randomly generate 200 sample points (shown as the blue points), which fall roughly in the range [-40.0, 40.0]. After the normalizations, the data shrinks into [-10.0, 15.0]. The resulting distributions clearly differ, and there is no absolute good or bad choice; it depends on your criteria. For instance, if your goal is to minimize the distances among the points, min-max is a good choice. If you expect evenly distributed data with clear differences between points, log may be a good idea.
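
The experiment can be approximated with a short script like the one below. This is only a sketch, not the code used to produce the figure; in particular, the way negative samples are handled in the log variant (a sign-preserving log1p) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 samples from a standard Cauchy distribution, kept within [-40, 40]
# as in the experiment described above.
samples = rng.standard_cauchy(1000)
x = samples[(samples >= -40.0) & (samples <= 40.0)][:200]

zero_mean = (x - x.mean()) / x.std()
min_max = (x - x.min()) / (x.max() - x.min())
log_norm = np.sign(x) * np.log1p(np.abs(x))  # sign-preserving log
atan_norm = np.arctan(x)

for name, data in [("zero-mean", zero_mean), ("min-max", min_max),
                   ("log", log_norm), ("arctan", atan_norm)]:
    print(f"{name:10s} min={data.min():7.3f} max={data.max():7.3f}")
```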

Lastly, scikit-learn provides handy tools to compare and visualize normalized results against your inputs⁴. It is a good idea to run candidate normalization methods on a sample data set before applying them to your real data set.
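
For example, a minimal comparison using scikit-learn's built-in scalers might look like this (the input data is made up; the example linked in [4] gives a fuller visual comparison):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustrative 2-D sample data: one column per feature.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 800.0],
              [4.0, 1600.0]])

print(StandardScaler().fit_transform(X))  # zero-mean normalization per feature
print(MinMaxScaler().fit_transform(X))    # min-max normalization per feature
```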

 

References

[1] https://www.coursera.org/learn/deep-neural-network/lecture/lXv6U/normalizing-inputs

[2] http://yeephycho.github.io/2016/08/03/Normalizations-in-neural-networks  

[3] http://ai.stanford.edu/~acoates/papers/coatesng_nntot2012.pdf

[4] http://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#normalizer