Missing Value Imputation and Dealing With Outliers

Both are an important part of data pre-processing, yet they are rarely taught at DONKEY ACADEMY, which charges you a lot for a certificate that doesn't get you a job.

So, after that violence and double talk (to borrow from Dire Straits), here is how you deal with outliers and missing values:

1) Replace outliers or missing values with the mean or the median, choosing based on the distribution you see (the median is the safer choice for skewed data). For example: if age < 20 or age > 80, then age = median(age).
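A minimal pandas sketch of step 1, assuming a DataFrame with an age column (the sample values are made up for illustration). It uses the 20 and 80 limits from the example above and the median, which pandas computes while skipping missing values; swap .median() for .mean() if the distribution looks roughly symmetric.

```python
import numpy as np
import pandas as pd

# Made-up sample: 19, 95 and 15 are out of range, one value is missing
df = pd.DataFrame({"age": [25, 34, 19, np.nan, 45, 95, 52, 38, 15, 61]})

# Median of the observed ages (pandas skips NaN by default)
median_age = df["age"].median()

# Keep ages inside [20, 80]; between() is False for NaN, so missing
# values are replaced along with the outliers
in_range = df["age"].between(20, 80)
df["age_imputed"] = df["age"].where(in_range, median_age)

print(df)
```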

2) Cap values at upper and lower limits. For example, an age distribution of 1 to 120 for bank customers can be capped so that if age < 20 then age = 20, and if age > 80 then age = 80.
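Capping like this maps directly onto pandas' Series.clip. A minimal sketch with made-up ages, using the 20 and 80 limits from the example:

```python
import pandas as pd

# Made-up bank-customer ages spanning an implausible 1 to 120 range
ages = pd.Series([18, 22, 35, 47, 63, 79, 85, 120, 1])

# Anything below 20 becomes 20, anything above 80 becomes 80
capped = ages.clip(lower=20, upper=80)

print(capped.tolist())
```

If you have no domain knowledge for the limits, a common alternative is to cap at percentiles, e.g. ages.clip(ages.quantile(0.01), ages.quantile(0.99)); the 1st and 99th percentiles are an arbitrary choice here, not a rule.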

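3) Use the mice (Multivariate Imputation by Chained Equations) package for imputation in R, or pandas-mice for Python (https://lnkd.in/f6Z3jj5). The idea is to fill in missing values using the other variables in the data. As a simple illustration: if males have a median age of 50 and females have a median age of 45, replace all missing male ages with 50 and all missing female ages with 45.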
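A minimal pandas sketch of the male/female example in step 3, with made-up data and hypothetical column names gender and age. Note that this is plain group-wise median imputation, not MICE itself: MICE iteratively models each incomplete variable from the others (in Python, scikit-learn's IterativeImputer is one MICE-inspired option).

```python
import numpy as np
import pandas as pd

# Made-up data: male ages centre on 50, female ages on 45
df = pd.DataFrame({
    "gender": ["M", "M", "M", "F", "F", "F", "M", "F"],
    "age":    [48, 52, np.nan, 44, np.nan, 46, 50, 45],
})

# Median age per gender, broadcast back to every row of that gender
group_median = df.groupby("gender")["age"].transform("median")

# Fill each missing age with its own group's median
df["age_imputed"] = df["age"].fillna(group_median)

print(df)
```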

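4) Use outlierTest() from the car package in R, which reports a Bonferroni-adjusted p-value for the most extreme studentized residual of a fitted linear model.

This is barely the tip of the iceberg when it comes to missing values and outliers: https://lnkd.in/fus_MiF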
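Returning to item 4: outlierTest() works on a model you have already fitted in R. As a rough Python sketch of the same idea (not the car implementation; the function name bonferroni_outlier_test is hypothetical, and it assumes a simple regression of y on one predictor x), compute the externally studentized residuals, take the largest in absolute value, and multiply its two-sided t p-value by the number of observations. If you'd rather not roll your own, statsmodels OLS results have an outlier_test method that does something similar.

```python
import numpy as np
from scipy import stats


def bonferroni_outlier_test(x, y):
    """Hypothetical helper: test the largest externally studentized residual
    of an OLS fit of y on x, with a Bonferroni correction."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept + one predictor
    n, p = X.shape

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Leverages: diagonal of the hat matrix X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)

    # Residual variance with each observation left out in turn
    s2 = resid @ resid / (n - p)
    s2_loo = ((n - p) * s2 - resid**2 / (1 - h)) / (n - p - 1)

    # Externally studentized residuals follow t with n - p - 1 df
    t_resid = resid / np.sqrt(s2_loo * (1 - h))
    i = int(np.argmax(np.abs(t_resid)))
    unadj_p = 2 * stats.t.sf(abs(t_resid[i]), df=n - p - 1)
    bonf_p = min(1.0, n * unadj_p)
    return i, float(t_resid[i]), float(unadj_p), float(bonf_p)


# Made-up data: a clean line plus small noise, with one planted outlier
x = np.arange(20, dtype=float)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1,
                  0.2, -0.2, 0.4, -0.1, 0.0, 0.1, -0.3, 0.2, -0.2, 0.1])
y = 2 * x + 1 + noise
y[7] += 10

idx, t_stat, p_unadj, p_bonf = bonferroni_outlier_test(x, y)
print(idx, round(t_stat, 2), p_bonf)
```

A small Bonferroni p-value (say below 0.05) flags that observation as an outlier worth investigating; it does not tell you whether to drop, cap or impute it.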

#machinelearning #algorithms #pythonprogramminglanguage #analytics #datascience #python #rstats

Author: Ajay Ohri

http://about.me/ajayohri
