11. Mastering Machine Learning: Demystifying Feature Scaling -- with Python Coding
Welcome back, friends! In my last article, we talked all about feature scaling and why it's so important in machine learning. Today, we're diving even deeper! If you missed the last one, don't worry, you can catch up here: [Mastering Machine Learning: Demystifying Feature Scaling]. Today, let's explore how to scale features using Python.
Before we jump into the code to implement feature scaling on our x_train_set and x_test_set, let's quickly review the output we got after splitting the data, which we covered in our last article here: [Mastering Machine Learning: The Data Splitting Advantage]. Here's a snippet for easy reference!
Today, our goal is to scale the features in the last two columns of our x_train_set and x_test_set. You might wonder: why only the last two columns? It's because the values in the first three columns were produced by one-hot encoding, so they are already binary dummy variables (0s and 1s) and don't need scaling. We're therefore focusing only on standardizing the feature values of the last two columns.
Here's the block of code that will help us accomplish our goal for today:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train_set[:, 3:] = sc.fit_transform(x_train_set[:, 3:])
x_test_set[:, 3:] = sc.transform(x_test_set[:, 3:])
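To see these four lines in action end to end, here's a minimal, self-contained sketch. The small arrays and their values are invented purely for illustration; in the real notebook, x_train_set and x_test_set come from the earlier data-splitting step:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: the first three columns are one-hot dummies, the last two are
# numeric features (think age and salary). All values are made up.
x_train_set = np.array([
    [1.0, 0.0, 0.0, 44.0, 72000.0],
    [0.0, 1.0, 0.0, 27.0, 48000.0],
    [0.0, 0.0, 1.0, 30.0, 54000.0],
    [1.0, 0.0, 0.0, 38.0, 61000.0],
])
x_test_set = np.array([
    [0.0, 1.0, 0.0, 35.0, 58000.0],
])

sc = StandardScaler()
# Fit on the training columns and scale them in place.
x_train_set[:, 3:] = sc.fit_transform(x_train_set[:, 3:])
# Reuse the training statistics on the test columns.
x_test_set[:, 3:] = sc.transform(x_test_set[:, 3:])

print(x_train_set)
print(x_test_set)
```

The dummy columns pass through untouched, while the last two columns of the training set end up with mean 0 and standard deviation 1.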
Now let's break down these lines of code step by step:
from sklearn.preprocessing import StandardScaler
This line imports the StandardScaler class from the sklearn.preprocessing module. StandardScaler standardizes features by removing the mean and scaling to unit variance.
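Under the hood, this is just the z-score formula z = (x - mean) / std applied column by column. A quick sketch with a made-up single-column array shows that the manual calculation and StandardScaler agree:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One made-up feature column, shaped (n_samples, 1) as sklearn expects.
x = np.array([[10.0], [20.0], [30.0], [40.0]])

# Manual z-score: subtract the column mean, divide by the column std.
# Note StandardScaler uses the population std (ddof=0), as np.std does.
manual = (x - x.mean()) / x.std()

scaled = StandardScaler().fit_transform(x)

print(np.allclose(manual, scaled))  # the two approaches agree
```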
sc = StandardScaler()
This line creates an instance of the StandardScaler class and assigns it to the variable sc. This instance will be used to scale the data.
x_train_set[:, 3:] = sc.fit_transform(x_train_set[:, 3:])
This line scales the training data. It applies the fit_transform method of the StandardScaler object (sc) to the subset of the training data starting from the fourth column ([:, 3:]), which means it standardizes the values in columns 4 onwards. The fit_transform method computes the mean and standard deviation necessary for scaling the data, and then transforms the data accordingly. The scaled data is then assigned back to the training set.
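After fit_transform runs, the statistics it computed are stored on the scaler object itself, which is exactly what lets us reuse them on the test set later. A small sketch (with invented numbers) shows where they live:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up numeric columns (e.g. age and salary) from a training set.
train_cols = np.array([[40.0, 60000.0],
                       [30.0, 50000.0],
                       [50.0, 70000.0]])

sc = StandardScaler()
sc.fit_transform(train_cols)

# The per-column mean and standard deviation learned from the training data
# are kept on the fitted scaler for later calls to transform().
print(sc.mean_)   # column means
print(sc.scale_)  # column standard deviations
```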
x_test_set[:, 3:] = sc.transform(x_test_set[:, 3:])
This line scales the test data. It applies the transform method of the StandardScaler object (sc) to the same subset of the test data as before ([:, 3:]). The transform method uses the mean and standard deviation computed during the training phase to transform the test data.
It's important to note that for the test set we use transform rather than fit_transform, because we want to apply the parameters (mean and standard deviation) learned from the training set so that the test set is transformed consistently. This scaled test data is then assigned back to the test set.
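To make this train/test distinction concrete, the sketch below (with made-up numbers) compares the correct approach against refitting on the test set. Reusing the training statistics preserves the fact that the test values sit above the training mean, while refitting would wrongly re-center the test set around zero and hide that shift:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[10.0], [20.0], [30.0]])  # training column, mean 20
test = np.array([[40.0], [50.0]])           # deliberately shifted higher

sc = StandardScaler()
sc.fit_transform(train)

correct = sc.transform(test)                  # reuses train mean/std
leaky = StandardScaler().fit_transform(test)  # wrong: refits on test data

print(correct)  # all positive: test sits above the training mean
print(leaky)    # re-centered to zero mean, the shift is hidden
```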
In summary, these lines of code import the StandardScaler class, create an instance of it, and use it to scale both the training and test data, ensuring that they are on the same scale for machine learning algorithms.
And now, let's print out the result! Here's how it looks:
Whoo! Pretty cool, huh? We did it! Now our entire dataset is preprocessed and ready to go. If you want to download or reference the full code, you can find it in our GitHub repository. Here's the link: [Data_Preprocessing.ipynb]. Feel free to explore and use the code in your own projects!
Next on our journey, we're diving into Regression, where we predict numerical values from input data. Sounds interesting, right? Catch you in the next article, and keep on learning!