The treatment of missing values during the learning process has been received a lot of attention of researchers. We have published a tutorial about this in the context of logistic regression induction . By contrast, the handling of the missing values during the classification process, i.e. when we apply the classifier on an unlabeled instance, is less studied. However, the problem is important. Indeed, the model is designed to work only when the instance to label is fully described. If some values are not available, we cannot directly apply the model. We need a strategy to overcome this difficulty .
In this tutorial, we are in the supervised learning context. The classifier is a logistic regression model. All the descriptors are continuous. We want to evaluate on various datasets from the UCI repository the behavior of two imputations methods: the univariate approach and the multivariate approach. The constraint is that the imputation models must rely on information from the learning sample. We consider that this last one does not contain missing values.
We note that the occurrence of the missing value on the instance to classify is "missing completely at random" in our experiments i.e. each descriptor has the same probability to be missing.
Keywords: missing values, missing features, classification model, logistic regression, multiple linear regression, r software, glm, lm, NA
Components: Binary Logistic Regression
Tutorial: : en_Tanagra_Missing_Values_Deployment.pdf
Dataset and programs (R language): md_logistic_reg_deployment.zip
References:
Howell, D.C., "Treatment of Missing Data".
M. Saar-Tsechansky, F. Provost, “Handling Missing Values when Applying Classification Models”, JMLR, 8, pp. 1625-1657, 2007.
Home >
Supervised Learning
> Handling missing values in prediction process
Sunday, October 28, 2012
Handling missing values in prediction process
About The Author
stella
Nulla sagittis convallis arcu. Sed sed nunc. Curabitur consequat. Quisque metus enim, venenatis fermentum, mollis in, porta et, nibh. Duis vulputate elit in elit. Mauris dictum libero id justo.
Labels:
Software Comparison,
Supervised Learning
Subscribe to:
Post Comments (Atom)
Find us on Facebook
Find us on Google Plus
Labels
- Association rules (8)
- Clustering (14)
- Data file handling (17)
- Decision tree (21)
- Exploratory Data Analysis (17)
- Feature Construction (6)
- Feature Selection (8)
- PLS Regression (5)
- Python (11)
- Regression analysis (13)
- Sipina (23)
- Software Comparison (49)
- Statistical methods (3)
- Supervised Learning (67)
- Tanagra (13)
- Text Mining (2)



No comments:
Post a Comment