This tutorial follows the slideshow devoted to the "Bagging, Random Forest and Boosting". We show the implementation of these methods on a data file. We will follow the same steps as the slideshow i.e. we first describe the construction of a decision tree, we measure the prediction performance, and then we see how ensemble methods can improve the results. Various aspects of these methods will be highlighted: the measure of the variable importance, the influence of the parameters, the influence of the characteristics of the underlying classifier (e.g. controlling the tree size), etc.
As a first step, we will focus on R (rpart, adabag and randomforest packages) and Python (scikit-learn package). We can multiply analyses by programming. Among others, we can evaluate the influence of parameters on the performance. As a second step, we will explore the capabilities of software (Tanagra and Knime) providing turnkey solutions, very simple to implement, more accessible for people which do not like programming.
Keywords: R software, R programming, decision tree, classification tree, adabag package, rpart package, randomforest package, Python, scikit-learn package, bagging, boosting, random forest
Components: BAGGING, RND TREE, BOOSTING, C4.5, DISCRETE SELECT EXAMPLES
Tutorial: Bagging, Random Forest et Boosting
Files: randomforest_boosting_en.zip
References:
R. Rakotomalala, "Bagging, Random Forest, Boosting (slides)", December 2015.
Home >
Supervised Learning
> Random Forest - Boosting with R and Python
Wednesday, December 30, 2015
Random Forest - Boosting with R and Python
About The Author
stella
Nulla sagittis convallis arcu. Sed sed nunc. Curabitur consequat. Quisque metus enim, venenatis fermentum, mollis in, porta et, nibh. Duis vulputate elit in elit. Mauris dictum libero id justo.
Subscribe to:
Post Comments (Atom)
Find us on Facebook
Find us on Google Plus
Labels
- Association rules (8)
- Clustering (14)
- Data file handling (17)
- Decision tree (21)
- Exploratory Data Analysis (17)
- Feature Construction (6)
- Feature Selection (8)
- PLS Regression (5)
- Python (11)
- Regression analysis (13)
- Sipina (23)
- Software Comparison (49)
- Statistical methods (3)
- Supervised Learning (67)
- Tanagra (13)
- Text Mining (2)



No comments:
Post a Comment