Python for R Users is now published

With the grace of God and the blessings of everyone, I am humbly announcing the publication of my third book in data science, Python for R Users: A Data Science Approach. http://as.wiley.com/WileyCDA/WileyTitle/productCd-1119126762.html

 

Unable to find vcvarsall.bat while installing Python package

Solution to error message above

1. Upgrade pip (https://stackoverflow.com/questions/15221473/how-do-i-update-pip-itself-from-inside-my-virtual-environment).

On Windows:

python -m pip install --upgrade pip

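2. Download the unofficial Windows binaries from http://www.lfd.uci.edu/~gohlke/pythonlibs/ and then install the local wheel using pip. Pick the wheel whose tags match your Python version and architecture; the example below is for 64-bit CPython 3.4 (cp34, win_amd64).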

 

C:\Users\Ajay>pip install statsmodels-0.8.0-cp34-cp34m-win_amd64.whl
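A wheel only installs if its filename tags fit the interpreter, so it can help to read those tags before downloading. The snippet below is a minimal sketch, not part of pip: the helper names (WheelTags, parse_wheel_filename, running_python_tag) are made up for illustration. It assumes a CPython interpreter and a simple, single-valued Python tag such as cp34.

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated parts."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # The last three parts are the Python, ABI and platform tags;
    # an optional build tag may sit between the version and those tags.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(parts[0], parts[1], python_tag, abi_tag, platform_tag)


def running_python_tag() -> str:
    """Tag of the running interpreter, assuming CPython (for example 'cp34')."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


tags = parse_wheel_filename("statsmodels-0.8.0-cp34-cp34m-win_amd64.whl")
print(tags.python_tag, tags.platform_tag)
print("matches this interpreter:", tags.python_tag == running_python_tag())
```

The equality check is a simplification: some wheels carry compound tags (for example py2.py3), which this sketch does not handle.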

3. Other, more complicated ways (the two methods above work best for me):

https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/

https://stackoverflow.com/questions/27670365/python-pip-install-error-unable-to-find-vcvarsall-bat-tried-all-solutions

https://stackoverflow.com/questions/2817869/error-unable-to-find-vcvarsall-bat

Change Python Version for Jupyter Notebook

There are three ways to do it. Sometimes package dependencies force analysts and developers to use older versions of Python.

1. Use conda to downgrade the Python version (if Anaconda is already installed):

conda install python=3.5.0

Hat tip: http://chris35wills.github.io/conda_python_version/

https://docs.anaconda.com/anaconda/faq#how-do-i-get-the-latest-anaconda-with-python-3-5

2. Download the latest version of Anaconda and then make a Python 3.5 environment.

To create the new environment for Python 3.5, in your Terminal window or an Anaconda Prompt, run:

conda create -n py35 python=3.5 anaconda


3. Uninstall Anaconda and install an older version of Anaconda from https://repo.continuum.io/archive/ (download the most recent Anaconda that included Python 3.5 by default, Anaconda 4.2.0).
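Whichever of the three routes you take, it is worth confirming from inside the notebook which interpreter the kernel is actually running. A minimal sketch follows; version_matches is a hypothetical helper written for this post, not a Jupyter or conda function.

```python
import sys


def version_matches(required, actual=None):
    """Return True if `actual` starts with the version tuple `required`."""
    if actual is None:
        actual = tuple(sys.version_info[:3])
    return tuple(actual[:len(required)]) == tuple(required)


print(sys.executable)  # path of the interpreter behind this kernel
print(sys.version)     # full version string
print("Python 3.5?", version_matches((3, 5)))
```

If the path printed by sys.executable does not point into the environment you created, the notebook is probably still using a kernel from a different installation.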

Lie factor in Gun Deaths Visualization

Edward Tufte, in his seminal book, talked of the lie factor. See the image below, and how Columbine seems higher than Virginia Tech thanks to the dotted line, even though it had 50% fewer casualties.

http://www.infovis-wiki.net/index.php/Lie_Factor

The “Lie Factor” is a value to describe the relation between the size of effect shown in a graphic and the size of effect shown in the data.

Edward Tufte, a professor at Yale University, defined the “Lie Factor” in his book “The Visual Display of Quantitative Information” in 1983.

He states the principle that

The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented.
Image from http://www.huffingtonpost.com/entry/what-will-happen-to-the-las-vegas-shooters-suite-at-mandalay-bay_us_59d6721ae4b0f6eed34ef753?section=us_politics
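
The Lie Factor described above is usually computed as a ratio of two relative changes: the change shown in the graphic divided by the change in the data, where each change is |second value - first value| / first value. A value near 1 means the graphic is faithful. Here is a minimal sketch of that arithmetic; the function names and the numbers are hypothetical and are not measurements taken from the image.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Hypothetical example: the data doubles (10 -> 20),
# but the bar drawn for it grows from 1 cm to 5 cm.
factor = lie_factor(1.0, 5.0, 10.0, 20.0)
print(factor)
```

In this made-up case the graphic exaggerates the change in the data four times over.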

What exactly does a Data Scientist do as a job?

I got an interesting query on LinkedIn:

  • What exactly does a data scientist do as a job?

  • What are the roles of a data scientist?

    Since I have been doing something related to data science for almost 14 years, even before data science became a term on Wikipedia (https://en.wikipedia.org/wiki/Data_science), here are my views.

    A data scientist is simply a person who can

      write code
        = in: R, Python, Java, SQL, Hadoop (Pig, HQL, MR), etc.
        = for: data storage, querying, summarization, visualization
        = how: efficiently, and in time (fast results?)
        = where: on databases, on the cloud, on servers

      and understand enough statistics

      to derive insights from data

      so that the business can make decisions.

    It involves coding, it involves presenting insights, and it involves gathering requirements like a consultant. So you need the following:

    ability to write complex SQL queries

    ability to move, create, and delete files on the command prompt in Linux

    ability to code in Python, in R, and in SAS

    ability to do machine learning (in R with the caret/party/e1071 packages, in Python with scikit-learn, in Spark with MLlib, and in SAS Enterprise Miner)

    ability to learn new languages and tools quickly (Hadoop, Hive, PySpark)

    ability to do analysis on small data using statistics (R/Python/SAS) and on big data

    ability to make presentations on insights to senior management

    That is a lot of roles for a single term: data scientist.