Missing Value Imputation and Dealing With Outliers
These are an important part of data pre-processing, yet they are rarely taught at DONKEY ACADEMY, which charges you a lot for a certificate that doesn't get you a job.
So okay, after that violence and double talk (from Dire Straits), here is how you deal with outliers:
1) Replace outliers or missing values with the mean or median, chosen based on the distribution (the median is the safer pick when the data is skewed), e.g. if age<20 or age>80 then age=median(age)
2) Cap them at upper and lower limits, e.g. an age distribution of 1-120 for bank customers can be capped like this: if age<20 then age=20; if age>80 then age=80
3) Use the MICE package for imputation (in R) or pandas-mice for Python (https://lnkd.in/f6Z3jj5). The idea is to fill in a missing value using the other variables, e.g. if males have a median age of 50 and females have a median age of 45, replace all missing male ages with 50 and all missing female ages with 45
4) Use outlierTest in the car package in R
This is barely the tip of the iceberg in missing values and outliers: https://lnkd.in/fus_MiF
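Points 1 to 3 above can be sketched in plain Python using only the standard library. This is a rough illustration and not a replacement for MICE: the function names, the 20/80 limits and the sample data are made up for the example. The median here is computed from the in-range values only, which is one reasonable reading of age=median(age).

```python
from statistics import median

def replace_with_median(values, low, high):
    """Point 1: swap missing (None) or out-of-range values for the median.

    Assumption: the median is taken over the in-range values only.
    """
    valid = [v for v in values if v is not None and low <= v <= high]
    mid = median(valid)
    return [v if v is not None and low <= v <= high else mid for v in values]

def cap(values, low, high):
    """Point 2: clamp each value into [low, high]; missing values stay None."""
    return [None if v is None else max(low, min(high, v)) for v in values]

def impute_by_group(rows, group_key, value_key):
    """Point 3 (simplified): fill a missing value with its group's median.

    Assumes every group has at least one non-missing value.
    """
    groups = {}
    for row in rows:
        if row[value_key] is not None:
            groups.setdefault(row[group_key], []).append(row[value_key])
    medians = {g: median(vals) for g, vals in groups.items()}
    return [
        {**row, value_key: medians[row[group_key]]}
        if row[value_key] is None else dict(row)
        for row in rows
    ]

ages = [25, 40, None, 120, 5, 60]
print(replace_with_median(ages, 20, 80))  # [25, 40, 40, 40, 40, 60]
print(cap(ages, 20, 80))                  # [25, 40, None, 80, 20, 60]

customers = [
    {"sex": "M", "age": 50}, {"sex": "M", "age": None},
    {"sex": "F", "age": 45}, {"sex": "F", "age": None},
]
print([r["age"] for r in impute_by_group(customers, "sex", "age")])  # [50, 50, 45, 45]
```

Note that capping leaves missing values alone, so you would still need to impute them separately.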