2 Easy Ways to Normalize data in Python With Examples

In this tutorial, we will learn how to normalize data in Python. Normalizing changes the scale of the data, most commonly so that the values fall between 0 and 1.

Why Do We Need To Normalize Data in Python?

Machine learning algorithms tend to perform better or converge faster when the different features (variables) are on a similar, small scale. It is therefore common practice to normalize the data before training machine learning models on it.

Normalization also makes the training process less sensitive to the scale of the features, which tends to produce better coefficients after training.

This process of making features more suitable for training by rescaling is called feature scaling.

The formula for Normalization is given below :

$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

We subtract the minimum value from each entry and then divide the result by the range, where the range is the difference between the maximum value and the minimum value.
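As a quick check of the formula described above, here is a minimal NumPy sketch. The input values are made up for illustration and are not from any dataset used in this tutorial.

```python
import numpy as np

# Illustrative values only.
x = np.array([2.0, 4.0, 6.0, 10.0])

# Subtract the minimum, then divide by the range (max - min).
x_norm = (x - x.min()) / (x.max() - x.min())

print(x_norm)  # the values are 0, 0.25, 0.5 and 1
```

The smallest entry maps to 0, the largest to 1, and everything else lands proportionally in between.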

Steps to Normalize Data in Python

We are going to discuss two different ways to normalize data in Python.

The first one uses the normalize() function from sklearn.

Using normalize() from sklearn

Let’s start by importing preprocessing from sklearn.

Now, let’s create an array using NumPy.

Now we can call normalize() on the array. By default, it normalizes the data along each row. Let’s see it in action.

Complete code

Here’s the complete code from this section :
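A minimal sketch of the three steps above might look like this. The array values are illustrative assumptions; any numeric array would do.

```python
import numpy as np
from sklearn import preprocessing

# Example values chosen for illustration.
x_array = np.array([2, 3, 5, 6, 7, 4, 8, 7, 6])

# normalize() expects 2D input, so the 1D array is wrapped in a list
# to form a single row.
normalized_arr = preprocessing.normalize([x_array])

print(normalized_arr)
```

The result is a 2D array with one row, because the input was passed as a single row.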

Output :

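We can see that all the values are now between 0 and 1. Keep in mind that normalize() gets there by dividing each row by its norm (the L2 norm by default), so the row ends up with unit length. This is not the same as the min-max formula described above, even though non-negative inputs like these also land between 0 and 1.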

You can also normalize columns in a dataset using this method. Let’s see how to do that next.

Normalize columns in a dataset using normalize()

Since normalize() works along rows by default, one option is to convert the column into an array before we apply the function.

To demonstrate, we are going to use the California Housing dataset.

Let’s start by importing the dataset.

Next, we need to pick a column and convert it into an array. We are going to use the 'total_bedrooms' column.
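Here is a rough sketch of that step. To keep it self-contained, a tiny made-up DataFrame named housing stands in for the California Housing data; the values and the choice of the other columns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for the housing dataset; the values are made up.
housing = pd.DataFrame({
    "total_rooms": [5612.0, 7650.0, 720.0, 1501.0],
    "total_bedrooms": [1283.0, 1901.0, 174.0, 337.0],
    "population": [1015.0, 1129.0, 333.0, 515.0],
})

# Pick one column and convert it into a NumPy array.
x_array = np.array(housing["total_bedrooms"])

# Wrap the array in a list so the whole column is treated as one row.
normalized_arr = preprocessing.normalize([x_array])

print(normalized_arr)
```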

Output :

How to Normalize a Dataset Without Converting Columns to Array?

Let’s see what happens when we try to normalize a dataset without converting features into arrays for processing.
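A minimal sketch of this, again using a small made-up DataFrame as a stand-in for the housing data (the values are assumptions, not real records):

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for the housing dataset; the values are made up.
housing = pd.DataFrame({
    "total_rooms": [5612.0, 7650.0, 720.0, 1501.0],
    "total_bedrooms": [1283.0, 1901.0, 174.0, 337.0],
    "population": [1015.0, 1129.0, 333.0, 515.0],
})

# Pass the whole DataFrame; with the default axis=1 each row is normalized.
d = preprocessing.normalize(housing)

# normalize() returns a NumPy array, so rebuild a DataFrame with the column names.
scaled_df = pd.DataFrame(d, columns=housing.columns)

print(scaled_df)
```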

Output :

[Figure: Normalize a dataset]

Here the values are normalized along the rows, which can be very unintuitive. Normalizing along rows means that each individual sample is normalized instead of the features.

However, you can specify the axis while calling the method to normalize along a feature (column).

The axis parameter is set to 1 by default. If we change it to 0, the normalization happens along each column instead.
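The sketch below shows the axis=0 variant. As before, the small housing DataFrame is a made-up stand-in and its values are assumptions.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for the housing dataset; the values are made up.
housing = pd.DataFrame({
    "total_rooms": [5612.0, 7650.0, 720.0, 1501.0],
    "total_bedrooms": [1283.0, 1901.0, 174.0, 337.0],
    "population": [1015.0, 1129.0, 333.0, 515.0],
})

# axis=0 normalizes each column (feature) instead of each row (sample).
d = preprocessing.normalize(housing, axis=0)
scaled_df = pd.DataFrame(d, columns=housing.columns)

print(scaled_df)
```

With axis=0, each column is divided by its own norm, so a column's result does not depend on the other columns.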

Output :

You can see that the column for total_bedrooms in the output matches the one we got above after converting it into an array and then normalizing.

Using MinMaxScaler() to Normalize Data in Python

Sklearn provides another option when it comes to normalizing data: MinMaxScaler.

This is a more popular choice for normalizing datasets, and it applies the min-max formula described earlier to each column.

Here’s the code for normalizing the housing dataset using MinMaxScaler :
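A minimal sketch, assuming a small made-up DataFrame in place of the real housing data (the values are illustrative only):

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for the housing dataset; the values are made up.
housing = pd.DataFrame({
    "total_rooms": [5612.0, 7650.0, 720.0, 1501.0],
    "total_bedrooms": [1283.0, 1901.0, 174.0, 337.0],
    "population": [1015.0, 1129.0, 333.0, 515.0],
})

# Fit the scaler on the data and transform it in one step.
scaler = preprocessing.MinMaxScaler()
d = scaler.fit_transform(housing)

# fit_transform() returns a NumPy array by default, so restore the column names.
scaled_df = pd.DataFrame(d, columns=housing.columns)

print(scaled_df)
```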

Output :

[Figure: MinMaxScaler]

You can see that the values in the output are between 0 and 1.

MinMaxScaler also lets you choose the output range through its feature_range parameter. By default, the range is set to (0, 1). Let’s see how to change the range to (0, 2).
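One way this could look, again with a made-up stand-in DataFrame (the values are assumptions for illustration):

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for the housing dataset; the values are made up.
housing = pd.DataFrame({
    "total_rooms": [5612.0, 7650.0, 720.0, 1501.0],
    "total_bedrooms": [1283.0, 1901.0, 174.0, 337.0],
    "population": [1015.0, 1129.0, 333.0, 515.0],
})

# feature_range sets the minimum and maximum of the scaled output.
scaler = preprocessing.MinMaxScaler(feature_range=(0, 2))
d = scaler.fit_transform(housing)
scaled_df = pd.DataFrame(d, columns=housing.columns)

print(scaled_df)
```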

Output :

[Figure: range (0, 2)]

The values in the output are now between 0 and 2.

Conclusion

We covered two ways to normalize data in Python using sklearn: normalize() and MinMaxScaler. Hope you had fun learning with us!

By admin
