Suppose we have a set of rules to identify which plant a given image shows. If we classify an image based on any single rule, the prediction will be flawed. Each of these rules is individually called a weak learner, because on its own it is not strong enough to make a reliable prediction. For these kinds of problems, ensemble learning is the solution.
What is Ensemble learning?
Ensemble learning is a method that enhances the performance of a machine learning model by combining several learners. Compared with a single model, an ensemble achieves better efficiency and accuracy.
Ensemble learning comes in two main types:
- Bagging — Parallel
- Boosting — Sequential
What is Boosting?
Boosting is a process that uses a set of machine learning algorithms to combine weak learners into a strong learner, in order to increase the accuracy of the model.
How does a boosting algorithm work?
As mentioned above, the basic principle of a boosting algorithm is to generate multiple weak learners and then combine their predictions to form one strong learner.
As shown in the figure above, the first three iterations each produce a weak learner:
- In the first iteration, the decision stump separates two positive signs, but it still fails to separate the remaining three positive signs.
- In the second iteration, the decision stump separates two negative signs, but again it cannot classify all of the negative signs.
- In the third iteration, the decision stump separates three positive signs and one negative sign.
These weak learners are then combined to form a strong learner, and that strong learner is used to build the final model.
Step 1: The base algorithm reads the data and assigns an equal weight to each sample.
Step 2: Incorrectly predicted samples are passed to the next base learner with higher weights, so that it focuses on them.
Step 3: Repeat step 2 until the algorithm can classify the output correctly.
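Assuming decision stumps (one-feature threshold rules) as the weak learners, the three steps can be sketched as a minimal AdaBoost-style loop. This is a toy illustration of the idea, not the post's actual code; the function names are my own.

```python
import numpy as np

def boost(X, y, n_rounds=3):
    """Minimal AdaBoost-style loop over decision stumps. y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1 / n)                      # step 1: equal weights
    learners = []
    for _ in range(n_rounds):
        # fit the best threshold stump under the current weights
        best = None
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, f] <= t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, sign)
        err, f, t, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # stump's vote weight
        pred = sign * np.where(X[:, f] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)         # step 2: up-weight mistakes
        w /= w.sum()
        learners.append((alpha, f, t, sign))
    return learners

def predict(learners, X):
    # step 3 repeated: the weighted vote of the stumps is the strong learner
    score = sum(a * s * np.where(X[:, f] <= t, 1, -1) for a, f, t, s in learners)
    return np.sign(score)
```

Each round re-weights the samples so the next stump concentrates on what the previous ones got wrong, and the final prediction is the weighted vote of all stumps.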
Boosting algorithms come in the following types:
- Adaptive Boosting (AdaBoost)
- Gradient Boosting
Adaptive Boosting follows exactly the three steps described above: it starts with equal sample weights, raises the weights of misclassified samples before fitting the next base learner, and repeats until the ensemble classifies the output correctly.
In gradient boosting, base learners are generated sequentially in such a way that each new base learner is more effective than the previous one.
Gradient boosting has three main components:
- A loss function to be optimized.
- Weak learners used to compute predictions, which are combined into a strong one.
- An additive model that adds weak learners step by step to minimize the loss function.
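Under the common choice of squared-error loss, the three components can be sketched as follows. The helper names and the use of scikit-learn's `DecisionTreeRegressor` as the weak learner are my own choices; the post does not show this code.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=50, lr=0.1):
    """Sketch of gradient boosting for regression: squared-error loss,
    shallow trees as weak learners, and an additive model fit to residuals."""
    pred = np.full(len(y), y.mean())     # initial constant model
    trees = []
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient of squared-error loss
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        pred += lr * tree.predict(X)     # additive update toward lower loss
        trees.append(tree)
    return y.mean(), trees

def gradient_boost_predict(base, trees, X, lr=0.1):
    # the strong learner is the base value plus the sum of all weak learners
    return base + lr * sum(t.predict(X) for t in trees)
```

Each tree is fitted to the current residuals (the negative gradient of the loss), so every new base learner corrects what the accumulated model still gets wrong.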
XGBoost is more efficient and optimized than the two methods above.
XGBoost is an advanced version of the gradient boosting method, designed to focus on computational speed and model efficiency. The reason for its speed is that XGBoost uses parallelization, distributed computing, and cache optimization.
Practical Example of XgBoost and Adaptive Boosting:
Import the required libraries.
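The imports would typically look like this (the exact set is an assumption, since the original snippet is not reproduced here):

```python
# Core libraries used throughout the example
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
```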
Now we load the dataset: the CSV file lives on Google Drive, so we mount the drive and then read the file using pandas.
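A sketch of this loading step: `google.colab.drive` works only inside Google Colab, so this version falls back to a tiny locally written toy CSV elsewhere. The Drive path and file names are placeholders, not the post's actual ones.

```python
import pandas as pd

try:
    from google.colab import drive           # available only inside Google Colab
    drive.mount('/content/drive')
    csv_path = '/content/drive/My Drive/dataset.csv'   # placeholder path
except ImportError:
    # Outside Colab: write a tiny toy CSV so the sketch still runs end to end
    csv_path = 'toy_dataset.csv'
    pd.DataFrame({'feature': [1, 2, 3],
                  'label':   ['a', 'b', 'a']}).to_csv(csv_path, index=False)

df = pd.read_csv(csv_path)
print(df.head())
```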
Now it is time for data preprocessing: we encode the categorical columns using LabelEncoder and OneHotEncoder.
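A sketch of the encoding step, on a toy frame standing in for the real dataset (whose columns are not shown in the post):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Toy data: one categorical feature, one numeric feature, a string target
df = pd.DataFrame({'color': ['red', 'green', 'red', 'blue'],
                   'size':  [1.0, 2.0, 3.0, 4.0],
                   'label': ['yes', 'no', 'yes', 'no']})

# One-hot encode the categorical feature column
ohe = OneHotEncoder()
color_dummies = ohe.fit_transform(df[['color']]).toarray()
X = np.hstack([color_dummies, df[['size']].values])

# Label-encode the target: 'no' -> 0, 'yes' -> 1
y = LabelEncoder().fit_transform(df['label'])
```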
After preprocessing, split the dataset into training and test sets using train_test_split.
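The split step, with stand-in arrays and an assumed 80/20 split (the post does not state its ratio):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # stand-in features
y = np.array([0, 1] * 5)           # stand-in labels

# Hold out 20% of the samples for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
```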
Now, build the model using the AdaBoost classifier, and calculate the accuracy from the confusion matrix.
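Since the post's dataset is not reproduced here, this sketch uses scikit-learn's iris data as a stand-in; the hyperparameters are assumptions, not the post's settings.

```python
from sklearn.datasets import load_iris           # stand-in for the post's CSV
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit AdaBoost with 50 weak learners
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Accuracy from the confusion matrix: correct predictions / all predictions
cm = confusion_matrix(y_test, clf.predict(X_test))
accuracy = cm.trace() / cm.sum()
print(cm)
print(accuracy)
```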
Finally, fit XGBoost to the training set and calculate its accuracy.
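A sketch of the XGBoost step, again on iris as a stand-in dataset. `xgboost` is a separate package, so this version falls back to scikit-learn's GradientBoostingClassifier if it is not installed:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier as Booster          # the post's classifier
except ImportError:
    from sklearn.ensemble import GradientBoostingClassifier as Booster  # fallback

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = Booster()                   # default hyperparameters (assumed)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(accuracy)
```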
Above code available at: https://github.com/Pravin1Borate/Xgboost-Algorithms
By Pravin Borate | Data science enthusiast || loves to read || craves to learn || lives to share gratitude