Machine learning for credit scoring

Applying machine learning to predict the probability of default of companies in the mechanical engineering sector.

Last updated on Jan 31, 2022 Academic Project, R


Introduction

Banks and other financial institutions grant financing only after a careful assessment of the client's creditworthiness and solvency.

This process is commonly known as credit scoring. It applies statistical methods to information about the client (payment history, employment, balance sheet data and other personal information) and about the type of loan requested, and it summarizes the client's risk profile in a score that is used to decide whether to grant or refuse the loan.

This project analyzes the solvency of companies in the Italian general mechanical engineering sector, using a sample that includes both companies active in the market and companies that have failed. The goal is to identify the main factors that influence the probability of default in this sector and to classify companies into different risk levels.

About the project

The analysis was performed in R using machine learning methods for classification (logistic regression and linear discriminant analysis), dimensionality reduction (principal component analysis) and model selection (stepwise selection).
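As a rough illustration of how these four methods fit together in R, the sketch below runs them on simulated data. It is not the project's code: the variable names (leverage, roa, liquidity, noise), the simulated default flag and every coefficient are assumptions made up for the example, and AIDA data is not used.

```r
library(MASS)  # provides lda(); ships with standard R distributions

set.seed(42)
n <- 400

# Hypothetical balance sheet ratios (simulated, not AIDA data)
firms <- data.frame(
  leverage  = rnorm(n, mean = 0.6,  sd = 0.2),
  roa       = rnorm(n, mean = 0.04, sd = 0.05),
  liquidity = rnorm(n, mean = 1.2,  sd = 0.4),
  noise     = rnorm(n)  # regressor unrelated to default
)

# Simulated default flag: higher leverage and lower ROA raise the default odds
eta <- -1 + 3 * (firms$leverage - 0.6) - 20 * (firms$roa - 0.04)
firms$default <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression on all candidate regressors
full_logit <- glm(default ~ leverage + roa + liquidity + noise,
                  data = firms, family = binomial)

# Stepwise selection by AIC, allowing terms to be dropped and re-added
step_logit <- step(full_logit, direction = "both", trace = 0)

# Linear discriminant analysis on the same regressors
lda_fit <- lda(default ~ leverage + roa + liquidity + noise, data = firms)

# PCA on the standardized ratios, then a logistic model on the first two components
pca <- prcomp(firms[, c("leverage", "roa", "liquidity", "noise")],
              center = TRUE, scale. = TRUE)
pc_data   <- data.frame(default = firms$default, pca$x[, 1:2])
pca_logit <- glm(default ~ PC1 + PC2, data = pc_data, family = binomial)

# Estimated probabilities of default from the stepwise logistic model
pd_logit <- predict(step_logit, type = "response")
```

Comparing `AIC(step_logit)` with `AIC(pca_logit)`, or the posterior probabilities from `predict(lda_fit)$posterior` with `pd_logit`, is one simple way to set the approaches side by side.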

The data come from AIDA (Analisi Informatizzata delle Aziende Italiane), a database offered by Bureau van Dijk that collects information on Italian companies, classified by sector and geographical area.

In particular, the project is structured as follows:

Importing and cleaning data (detection of outliers and missing data)
Creation of balance sheet indices for analysis and descriptive statistics
Sampling: stratification by asset size and geographic area, and splitting into training and test samples for cross-validation
Correlation analysis and PCA
Variable and model selection, comparing logistic regression and LDA with regressors obtained from PCA and stepwise methods
Scoring and risk classes
Conclusion
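The sampling and scoring steps listed above could look roughly like the following in base R. This is a minimal sketch under stated assumptions, not the project's implementation: the data are simulated, the column names (area, size, leverage, roa), the 70/30 split and the probability cut-offs for classes A to D are all hypothetical.

```r
set.seed(1)
n <- 600

# Simulated firms with two stratification variables and two ratios (all hypothetical)
firms <- data.frame(
  area     = sample(c("North", "Centre", "South"), n, replace = TRUE),
  size     = sample(c("small", "medium", "large"), n, replace = TRUE),
  leverage = rnorm(n, mean = 0.6,  sd = 0.2),
  roa      = rnorm(n, mean = 0.04, sd = 0.05)
)
eta <- -1.5 + 3 * (firms$leverage - 0.6) - 20 * (firms$roa - 0.04)
firms$default <- rbinom(n, size = 1, prob = plogis(eta))

# Stratified 70/30 split: draw 70% of the rows within each area x size cell
strata <- interaction(firms$area, firms$size, drop = TRUE)
train_idx <- unlist(lapply(split(seq_len(n), strata), function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}))
train <- firms[train_idx, ]
test  <- firms[-train_idx, ]

# Fit on the training sample, score the test sample
fit <- glm(default ~ leverage + roa, data = train, family = binomial)
test$pd <- predict(fit, newdata = test, type = "response")

# Map estimated default probabilities to risk classes (cut-offs are illustrative only)
test$risk_class <- cut(test$pd,
                       breaks = c(0, 0.05, 0.15, 0.30, 1),
                       labels = c("A", "B", "C", "D"),
                       include.lowest = TRUE)

# Observed default rate per risk class on the test sample
class_rates <- aggregate(default ~ risk_class, data = test, FUN = mean)
```

Sampling within each cell keeps the training and test samples close to the full sample's mix of areas and sizes, which is the point of stratifying before the split.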

Not yet published; it will be available soon.

machine learning, classification, logistic regression, LDA, PCA, credit scoring, finance
