Description
Data mining rests on mastery of fundamental data exploration techniques: descriptive, predictive and exploratory statistics. This practical course introduces methods such as regression and PCA and teaches you how to implement them in R.
Who is this training for?
Infocentre, data mining, marketing and quality managers; users and managers of business databases.
Prerequisites
Training objectives
Training program
- Introduction to modeling
- Modeling: regression.
- Statistical modeling: reminders of statistical tests.
- Data analysis.
- Introduction to R software.
- Practical work: presentation of several modeling examples.
- Installation of R and the packages to be used.
- Applications in R, with tests and interpretation on examples.
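
As a reminder of the kind of statistical test this module revisits, here is a minimal sketch in base R. The choice of Welch's two-sample t-test and of the built-in `mtcars` data set is an assumption made for illustration; the examples actually used in the course are not specified.

```r
# Illustrative only: compare fuel consumption (mpg) between automatic (am = 0)
# and manual (am = 1) cars in the built-in mtcars data set.
data(mtcars)

# Welch two-sample t-test (R's default: variances are not assumed equal)
tt <- t.test(mpg ~ am, data = mtcars)

print(tt$estimate)   # mean mpg in each group
print(tt$p.value)    # p-value under the null hypothesis of equal means
print(tt$conf.int)   # 95% confidence interval for the difference in means
```

Interpretation follows the usual pattern: a small p-value is evidence against equal means, and the confidence interval gives the plausible size of the difference.
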
- Linear regression analysis
- Principle of linear regression.
- Simple regression, when the model has a single explanatory variable, for continuous data.
- Multiple regression, when there is more than one explanatory variable.
- Other types of models for continuous data.
- Practical work: practical application in R.
- Cases of simple regression and multiple regression.
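
The difference between simple and multiple regression can be sketched with base R's `lm()`. The data set (`mtcars`) and the chosen variables are assumptions made for illustration, not the course's own case study.

```r
# Illustrative only: model fuel consumption (mpg) in the built-in mtcars data.
data(mtcars)

# Simple regression: one explanatory variable (weight)
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Multiple regression: several explanatory variables (weight and horsepower)
fit_multiple <- lm(mpg ~ wt + hp, data = mtcars)

print(coef(fit_simple))                  # intercept and slope
print(summary(fit_multiple)$r.squared)   # share of variance explained

# F test between nested models: does adding hp improve the fit?
print(anova(fit_simple, fit_multiple))
```

`summary()` on either fit gives the coefficient table, standard errors and tests that are typically interpreted in this kind of exercise.
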
- Logistic regression analysis
- Presentation of the different types of logistic regression.
- Binary logistic regression.
- Ordinal logistic regression.
- Multinomial logistic regression.
- Practical work: application in R to practical cases with non-continuous data.
- Processing data with two categories, then ordinal categories, then nominal categories.
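
For the two-category case, a minimal sketch with base R's `glm()` might look as follows. The data set and variables are assumptions chosen for illustration. Ordinal and multinomial models are commonly fitted with functions such as `MASS::polr` and `nnet::multinom`; whether the course uses them is not stated.

```r
# Illustrative only: binary outcome am (0 = automatic, 1 = manual) in mtcars.
data(mtcars)

# Binary logistic regression: probability of a manual gearbox given weight
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial)

# Coefficients are on the log-odds scale; exponentiate to get odds ratios
print(exp(coef(fit_logit)))

# Predicted probability for a hypothetical car with wt = 3 (units of 1000 lb)
p <- predict(fit_logit, newdata = data.frame(wt = 3), type = "response")
print(p)
```
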
- Component analysis
- Presentation of the different types of analyses and how to select one.
- Principal Component Analysis (PCA).
- Multiple Correspondence Analysis (MCA).
- Hierarchical Clustering on Principal Components (HCPC).
- Practical work: the principal components make it possible to understand the covariance structure of the initial variables and/or to create a smaller number of variables using this structure.
- Applications in R.
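
The idea of summarizing correlated variables by a few components, then clustering on them, can be sketched in base R. This is an assumption-laden illustration: it uses `prcomp()` and `hclust()` as stand-ins rather than a dedicated HCPC routine, the built-in `USArrests` data, and an arbitrary choice of two components and four clusters.

```r
# Illustrative only: PCA followed by hierarchical clustering on the scores.
data(USArrests)

# PCA on standardized variables (scale. = TRUE gives each variable unit variance)
pca <- prcomp(USArrests, scale. = TRUE)

# Share of total variance carried by each component
var_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_share, 3))

# Keep the first two components as a reduced set of variables
scores <- pca$x[, 1:2]

# Ward hierarchical clustering on those scores, cut into 4 groups (arbitrary)
tree <- hclust(dist(scores), method = "ward.D2")
groups <- cutree(tree, k = 4)
print(table(groups))
```

`pca$rotation` holds the loadings, which show how each initial variable contributes to each component.
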
- Factor analysis of data
- Understand the principle of factor analysis: summarize the structure of the data in a smaller number of dimensions.
- Correspondence Analysis (CA).
- Multiple Factor Analysis (MFA).
- Factor Analysis of Mixed Data (FAMD).
- Practical work: factor analysis exercises in R.
- Identification of the underlying "factors", the dimensions associated with significant variability.
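
To show what extracting dimensions from a contingency table involves, here is a minimal base-R sketch of correspondence analysis computed by singular value decomposition. It is an illustration under assumptions: the built-in `HairEyeColor` table is used as example data, and a course of this kind would more likely rely on a dedicated package than on this hand-written version.

```r
# Illustrative only: correspondence analysis of hair colour by eye colour.
# HairEyeColor is a 3-way table; sum over sex to get a 4 x 4 contingency table.
N <- apply(HairEyeColor, c(1, 2), sum)

P <- N / sum(N)            # correspondence matrix (relative frequencies)
row_mass <- rowSums(P)     # row masses
col_mass <- colSums(P)     # column masses

# Standardized residuals from the independence model
S <- diag(1 / sqrt(row_mass)) %*%
  (P - row_mass %o% col_mass) %*%
  diag(1 / sqrt(col_mass))

sv <- svd(S)
inertia <- sv$d^2                         # principal inertia of each dimension
print(round(inertia / sum(inertia), 3))   # share of inertia per dimension

# Row principal coordinates on the first two dimensions
row_coords <- (diag(1 / sqrt(row_mass)) %*% sv$u %*% diag(sv$d))[, 1:2]
rownames(row_coords) <- rownames(N)
print(round(row_coords, 3))
```

The total inertia equals the Pearson chi-square statistic of the table divided by the number of observations, which ties the dimensions back to the departure from independence.
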