I get this question a lot: how do I shift to a data science career? I have been doing data analysis since 2004 (in SAS), when we used to call it business analytics, in R since 2007, and in Python since 2014, by which time we had rebranded business analytics as data science. So here are a few basics for people trying to SHIFT to data science.
My answer is: learn coding, learn math, and, most importantly, know when to use what for insights. Data scientists are only as good as the insights they create or miss, not the code they write.
See this first: a SlideShare presentation I put together last year for Summer School.
Do this self-examination:
- What are you good at: programming, stats, or business?
- What are you bad at: programming, stats, or business?
- What can you learn, and to what proficiency?
Learning R, SAS, or Python is easy, but there is a confusing clutter of resources out there on the internet.
Don't wanna be SAS Certified? (It's just $100, psst.)
Here is some free SAS training by DecisionStats.
There is no official certification for R or Python, though Hadoop has certifications, just as SAS does.
For R, learn R and RStudio until you have mastered some of the code here,
or browse the R packages by topic in the CRAN Task Views: https://cran.r-project.org/web/views/
A shorter tutorial on Python by the author is here
Learn pandas and scikit-learn, for example with the notebooks at https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks
Learning Statistics and Techniques
Data Mining in R
Where to learn machine learning
This comes with experience, domain research, and study.
I hope this helps. I will follow up with specific answers to specific career questions in data science soon.
At the Business Analytics Summit hosted by WeekendR,
I gave a short presentation on careers in analytics at the placement workshop of the Economics Department, Delhi School of Economics.
I basically talked about my 12 years of adventures in consulting, writing, and teaching around data science and analytics.
The revolution will not be televised, brother –Gil Scott-Heron
Veteran R community members will recall R co-founder Ross Ihaka's warning that Revolution Analytics was not truly open source.
Even if Microsoft keeps Revolution R open source for the time being after the sale,
the sale did prove Ross Ihaka right.
How you help create an open source revolution in statistics by selling a company to Microsoft beats me.
And how do you just take 6,000 packages for free from the open source community, add 6 to 9 packages of your own, and then repackage the bundle as a new innovation?
Even though Revolution Analytics created 3 CEO jobs (including SPSS founder Norman Nie), 1 name change (from Revolution Computing to Revolution Analytics), and 1 mass firing (with a 50% layoff, they won't be winning the best employer award), in the end what drives software is lots of sales, not lots of blogs.
Love for computing, not hypocrisy about the love of money, should drive science.
A potato is a potato.
In Australia or Seattle or San Francisco.
While the R community continues to move ahead with RStudio (still open source) and other interfaces,
SAS is moving forward by embracing Jupyter in its free University Edition. The name Jupyter itself comes from Julia, Python, and R. Whether you are an R fan, a Python fan, or a SAS fan, you should compare and contrast the quality of the blogs, the documentation, and the interface on your own. As a blogger and data scientist (?), I actually love all science.
Using Jupyter and SAS together with SAS University Edition
A few months ago I shared the news about Jupyter notebook support for SAS. If you have SAS for Linux, you can install a free open-source project called sas-kernel and begin running SAS code within your Jupyter notebooks. In my post, I hinted that support for this might be coming in the SAS University Edition. I’m pleased to say that this is one time where my crystal ball actually worked — Jupyter support has arrived!
(Need to learn more about SAS and Jupyter? Watch this 7-minute video from SAS Global Forum.)
How do I run Jupyter Notebook in SAS University Edition using VirtualBox?
In order to run Jupyter Notebook in SAS University Edition, you must first add the SAS University Edition vApp to VirtualBox. When you specify the URL to run Jupyter Notebook, you must specify the port number for Jupyter Notebook.
- Follow the steps to add the SAS University Edition vApp to VirtualBox.
If you want to access files from or save files to your local computer from Jupyter Notebook in SAS University Edition, you must also set up a shared folder. For more information, see the following topics:
- If you downloaded a new version in July 2016, the additional port is automatically added for you. Skip this step and proceed to step 3.
- Start the day with yoga or a fitness routine: http://www.youtube.com/watch?v=xEyyu7kk0ZI
- Eat healthy (ideally by printing out a nutrition chart and cooking your own food, but if not, eat out at a nearby place; if this is too complicated, just have a lot of salad, water, and tea)
- Get some brain exercise to improve memory and cognition with Lumosity or something else. Most people end up with mobile games (which may not be the best brain exercise games)
- Plan what you want to do during the day with a notepad and pencil (less complicated)
- DO THE DAY JOB THAT PAYS FOR COFFEE FOR 8 hours a DAY
- Skill up
- GitHub: contribute every day and be seen
- LinkedIn: promote yourself once through an SEO-optimized profile
- Twitter: #mybrand #myexpertise once a week
- Wind down for the day using yoga or relaxation music
- Meet people for dinner
- Do until
When overworked analysts use shortcuts to search huge, noisy, dirty databases, they create trails that can be mined for actual heuristics.
A heuristic technique (/hjᵿˈrɪstᵻk/; Ancient Greek: εὑρίσκω, “find” or “discover”), often called simply a heuristic, is any approach to problem solving, learning, or discovery that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goals. Where finding an optimal solution is impossible or impractical, heuristic methods can be used to speed up the process of finding a satisfactory solution
Example: a police chief in Chicago may adopt different heuristics for allocating human resources than a police chief in New York or New Orleans.
Solution: build a database of the heuristics actually in practice in that particular domain.
Additional solution: partner with search companies not just for data but also for training, and in some cases for search algorithms, database analysis, and database design reviews for Homeland Security.
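To make the "database of heuristics" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes each mined trail can be reduced to a (domain, location, rule) triple and that how often a rule is used is a reasonable first ranking. The table layout, the function names `record_heuristic` and `top_heuristics`, and the example policing rules are all hypothetical illustrations, not real police practice or an existing system.

```python
import sqlite3


def build_heuristics_db() -> sqlite3.Connection:
    """Create an in-memory table of heuristics keyed by domain and location."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE heuristics (
               domain     TEXT NOT NULL,
               location   TEXT NOT NULL,
               rule       TEXT NOT NULL,
               times_used INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def record_heuristic(conn, domain, location, rule):
    """Log one observed use of a rule; repeated uses bump its counter."""
    cur = conn.execute(
        "UPDATE heuristics SET times_used = times_used + 1 "
        "WHERE domain = ? AND location = ? AND rule = ?",
        (domain, location, rule),
    )
    if cur.rowcount == 0:  # first time we see this rule here
        conn.execute(
            "INSERT INTO heuristics (domain, location, rule, times_used) "
            "VALUES (?, ?, ?, 1)",
            (domain, location, rule),
        )


def top_heuristics(conn, domain, location, limit=3):
    """Return (rule, times_used) pairs, most frequently used first."""
    rows = conn.execute(
        "SELECT rule, times_used FROM heuristics "
        "WHERE domain = ? AND location = ? "
        "ORDER BY times_used DESC, rule ASC LIMIT ?",
        (domain, location, limit),
    )
    return rows.fetchall()


# Hypothetical trails left by analysts in two cities.
conn = build_heuristics_db()
trails = [
    ("policing", "Chicago", "staff by previous week call volume"),
    ("policing", "Chicago", "staff by previous week call volume"),
    ("policing", "Chicago", "extra patrols near transit hubs"),
    ("policing", "New Orleans", "extra patrols during festivals"),
]
for domain, location, rule in trails:
    record_heuristic(conn, domain, location, rule)

print(top_heuristics(conn, "policing", "Chicago"))
```

Keying on location as well as domain is what lets Chicago and New Orleans keep different rule sets side by side; a real version would also need provenance and outcome columns to judge whether a heuristic actually works.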
Occam’s razor (also written as Ockham’s razor, and lex parsimoniae in Latin, which means law of parsimony) is a problem-solving principle attributed to William of Ockham (c. 1287–1347), an English Franciscan friar, scholastic philosopher, and theologian. The principle can be interpreted as stating: among competing hypotheses, the one with the fewest assumptions should be selected.
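In data science, one common way to apply Occam's razor is model selection. The sketch below is one simple interpretation, assuming "assumptions" can be counted as free parameters and that models whose error is within a tolerance of the best one explain the data about equally well; formal criteria such as AIC or BIC are alternative ways to trade fit against complexity. The `Hypothesis` class, the `occam_select` function, and the error numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    name: str
    assumptions: int  # e.g. number of free parameters in the model
    error: float      # e.g. validation error; lower is better


def occam_select(candidates, tolerance=0.01):
    """Among hypotheses that fit about as well as the best one (within
    `tolerance`), pick the one with the fewest assumptions."""
    if not candidates:
        raise ValueError("need at least one hypothesis")
    best_error = min(h.error for h in candidates)
    adequate = [h for h in candidates if h.error <= best_error + tolerance]
    # Fewest assumptions wins; ties are broken by lower error, then by name.
    return min(adequate, key=lambda h: (h.assumptions, h.error, h.name))


# Hypothetical candidate models for the same data.
models = [
    Hypothesis("linear", assumptions=2, error=0.105),
    Hypothesis("quadratic", assumptions=3, error=0.100),
    Hypothesis("degree-9 polynomial", assumptions=10, error=0.098),
]

print(occam_select(models).name)  # prints "linear"
```

The tolerance carries the judgment: with `tolerance=0.005` the linear model no longer fits well enough and the quadratic one is chosen, while `tolerance=0.0` reduces to simply picking the lowest error.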
Related – How to amplify noise in social media using other algorithms