Is Kaggle too tough?

Is Kaggle a website only for superhuman data scientists? No, no, no!

You can become a Kaggler quite easily:

1) Understand how kernels work, especially the input files and the output submission file. It is best to use the notebook method rather than the script method for your code.

2) Have a basic knowledge of EDA and data visualization in either R or Python. (If you don't know that EDA means exploratory data analysis, you can start learning from Kaggle Kernels itself.)

3) Have a basic knowledge of machine learning algorithms (and how to apply them), and of how to compare models using the Area Under the ROC Curve (AUC).

4) Deep learning is more advanced and is preferably done in Python.

5) Practice one hour a day. Kaggle is like a gym for the brain: do this for a year, and see where your career zooms.
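For tip 3, here is a minimal sketch of what "comparing AUC" means, in plain Python. It computes AUC as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (the Mann-Whitney rank formulation), then compares two models. The function name `roc_auc` and the toy labels and scores are made up for illustration. In practice you would more likely call a library function such as scikit-learn's `roc_auc_score`.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney U statistic (illustrative sketch).

    Equals the probability that a random positive outscores a random
    negative, with tied scores counting as one half.
    """
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    # Assign 1-based ranks, averaging the ranks of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


# Hypothetical example: two models scoring the same four labelled rows.
y_true = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]
model_b = [0.2, 0.3, 0.6, 0.9]
print(roc_auc(y_true, model_a))  # 0.75
print(roc_auc(y_true, model_b))  # 1.0
```

The higher AUC means that model B ranks positives above negatives more reliably on this toy data. An AUC of 0.5 is what random guessing would give.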

And one more thing: cross-post your code on GitHub. #bigdata #love #machinelearning #analytics #datascience #deeplearning #python #r #howto #github #datamining #datavisualization

Dear Future Data Scientists

"Be a data scientist in 6 months." "Learn R or SAS or Python in 6 weeks." "Learn data science by doing one capstone project on one dataset."

Sorry mate, there are no shortcuts to success.

Your real data science journey begins AFTER you learn the statistics, AFTER you learn the techniques, AFTER you learn tools like R/SAS/Python.

A couple of datasets like Iris / Boston / German Credit / scraped tweets won't do it. A few weeks on Kaggle won't do it.

You probably need to spend a few more months on Kaggle, and a few more months on competitive programming sites like www.hackerrank.com, to bring your data science dreams closer.

Disclaimer: I have interviewed potential data scientists, and I have taught on some of these kinds of courses. #datascience #python #programming #r #statistics #datasets

https://www.linkedin.com/feed/update/urn:li:activity:6427435737691582464

Internships are like magic for budding data scientists

https://www.linkedin.com/feed/update/urn:li:activity:6426321920102367232

Internships are the magic that converts a student into a data scientist. Not everyone can solve Kaggle contests while still in engineering or other schools. Not everyone has the money to pay for training at private institutes, which basically teach R, Python or SAS but not the analytical thinking needed when confronted with a messy real-life dataset.

A project in such training is worth much less than the experience of an internship.

Companies that have existing data science teams should also try to offer internships, both to create a steadier supply of data scientists for their operations and to build a bigger brand in data science recruitment.

Meetups and LinkedIn groups (and Facebook groups) are good places to offer internships. Students should treat attending Meetups as an informal part of their curriculum, with the goal of landing an internship in every summer/winter break. #analytics #recruiting #python #r #meetup #engineers #internships #datascience

Brave New World of Data Science

The rate at which technology is changing boggles the mind.

Machine Learning to Deep Learning.

Chatbots and blockchain.

Cloud Computing to Big Data Distributed Computing.

New open source libraries in R and Python.

The rate of innovation is increasing.

What is really good is that young people today are unafraid to take risks, and would rather start startups than stick to big companies.

The future belongs to innovation. 

#bigdata #machinelearning #deeplearning #cloudcomputing #python #r #futurism 

Why Should Budding Data Scientists Blog?

One thing I always advise my students and interns: write a blog, and keep a social media presence to distribute the content. Why? Because you are a data scientist only when the world recognizes you as one. Writing also improves your ability to explain complicated topics lucidly and systematically. How do you stand out in a crowded world of data science posts?

Write 1) simple, 2) unique, 3) useful blogs.

Having built Decisionstats to more than 1 million page views over the years, I know it works. Mazel Tov! #socialmedia #datascience #writing #blogging

A better Iris Dataset for the current era

The Iris dataset is the curse of teaching data science. It makes applying algorithms too simple. What is needed is a BIGGER DATASET, with missing values and many, many variables/features, that teaches the whole cycle of data science: not just plain machine learning but also data preprocessing, dimensionality reduction and standardization. Ideally the dataset should be bigger than memory (RAM), to teach efficiency as well. #datascience #machinelearning #algorithms
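To show why a bigger-than-memory dataset teaches something Iris cannot, here is a minimal two-pass streaming sketch using only Python's standard library. The first pass reads a CSV one row at a time and keeps per-column running statistics (Welford's online mean/variance plus a missing-value count). The second pass standardizes each row to z-scores. Several things here are illustrative assumptions rather than a prescribed method: all columns are numeric, an empty cell means "missing", missing values are imputed with the column mean, and the standard deviation is the population one. The names `RunningStats`, `column_stats` and `standardize_rows`, and the toy data, are hypothetical.

```python
import csv
import io
import math
from typing import Dict, Iterator, Optional, TextIO


def _is_missing(value: Optional[str]) -> bool:
    # Assumption: an empty (or absent) cell counts as a missing value.
    return value is None or value.strip() == ""


class RunningStats:
    """Welford's online mean/variance for one numeric column."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.missing = 0

    def update(self, raw: Optional[str]) -> None:
        if _is_missing(raw):
            self.missing += 1
            return
        x = float(raw)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; 0.0 when fewer than 2 values.
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0


def column_stats(f: TextIO) -> Dict[str, RunningStats]:
    """Pass 1: stream rows one at a time; only per-column stats are kept."""
    reader = csv.DictReader(f)
    stats = {name: RunningStats() for name in reader.fieldnames or []}
    for row in reader:
        for name in stats:
            stats[name].update(row.get(name))
    return stats


def standardize_rows(f: TextIO, stats: Dict[str, RunningStats]) -> Iterator[Dict[str, float]]:
    """Pass 2: yield z-scores; a missing cell is imputed with the mean (z = 0)."""
    for row in csv.DictReader(f):
        out = {}
        for name, s in stats.items():
            value = row.get(name)
            x = s.mean if _is_missing(value) else float(value)
            out[name] = (x - s.mean) / s.std if s.std > 0 else 0.0
        yield out


# Hypothetical toy data; with a real file you would open it once per pass.
data = "age,income\n30,50000\n,60000\n50,\n40,70000\n"
stats = column_stats(io.StringIO(data))
for name, s in stats.items():
    print(name, s.mean, round(s.std, 3), "missing:", s.missing)
for row in standardize_rows(io.StringIO(data), stats):
    print(row)
```

Memory use grows with the number of columns, not the number of rows, which is the kind of efficiency that a dataset larger than RAM forces students to think about.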

https://www.linkedin.com/feed/update/urn:li:activity:6423093728885465088