How Jupyter and IPython threaten the dominance of RStudio among data science developers

RStudio is the clear market leader among IDEs used by developers for data science in R.

R is the clear market leader for data science.

Python could do with more wrappers for R-like packages.

But Jupyter is awesome (once you get it working!)


Hopefully, multi-core and cloud-hosted work will become easy too. Google Cloud Datalab, with hosted Jupyter, is just the first step. See https://cloud.google.com/datalab/

One of the things I like best about Jupyter over RStudio’s interface is the ability to divide code into cells. On the other hand, the ability to install new packages from within RStudio really helps me and gives it an edge over Jupyter. The syntax prompt in the latest version of RStudio is something I wish Jupyter would really work on.

Can we have an RStudio-like interface for working with Python? Yes, Yhat made one and called it Rodeo. This is possible because the interface is based on the Ace editor (yes, essentially RStudio the company married the Ace editor to Hadley Wickham to get RStudio the product 😉). Shiny was wonderful, but for scalable data science Python and Java help me just as much as R does for big data analysis. Scalability is the key here! RPubs isn’t as popular as nbviewer is, and now we can wrap Markdown within a Jupyter notebook.
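To make the cells-plus-Markdown point concrete, here is a minimal sketch of what a notebook file looks like on disk. It assumes the nbformat 4 JSON layout and uses only the Python standard library; the helper names make_notebook and save_notebook are my own, not part of Jupyter, and in real work the nbformat package is the safer way to do this.

```python
import json


def make_notebook(markdown_text, code_text):
    """Return a minimal nbformat-4 notebook dict with one Markdown cell and one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 0,
        "metadata": {},
        "cells": [
            # A Markdown cell holds prose that the notebook renders as formatted text.
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            # A code cell holds source for the kernel; outputs start out empty.
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": code_text,
            },
        ],
    }


def save_notebook(notebook, path):
    """Write the notebook dict to disk as JSON (hypothetical helper for this sketch)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1)
```

Something like save_notebook(make_notebook("# Sales analysis", "print(1 + 1)"), "demo.ipynb") should give a file that Jupyter can open, with the heading and the code living in separate cells.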


Can Jupyter help in my data science work more than RStudio? These are early days, but I prefer a cross-platform, cross-language (Julia, Python and R) solution any day, provided it works just as seamlessly as the established market leader, RStudio.

Big data analytics is where I clearly see Jupyter helping data scientists more than RStudio, as you can use the IRKernel. I am especially hoping to see the Spark Kernel, the JavaScript kernel (https://www.npmjs.com/package/ijavascript) and others become more production-ready for business enterprises.

https://github.com/ibm-et/spark-kernel

A version of the Spark Kernel is deployed as part of the Try Jupyter! site. Select Scala 2.10.4 (Spark 1.4.1) under the New dropdown. Note that this version only supports Scala.

https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages

Python/Jupyter kernels:

The Kernel Zero is, of course, IPython, which you can get through ipykernel, and which still comes (for now) as a dependency of jupyter. The IPython kernel can be thought of as a reference implementation. Here are other available kernels:

Name | Link | Jupyter/IPython Version | Language(s) Version | 3rd party dependencies
ICSharp | https://github.com/zabirauf/icsharp | Jupyter 4.0 | C# 4.0+ | scriptcs
IRKernel | http://irkernel.github.io/ | IPython 3.0 | R 3.2 | rzmq
SageMath | http://www.sagemath.org/ | IPython 3.2 | Any |
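Each of these kernels tells Jupyter about itself through a small kernel.json file sitting in its own folder. As a rough sketch, assuming that layout, the snippet below scans a folder of kernel specs and reports each kernel’s display name and language. The function name list_kernels and the folder you pass in are my own illustration; on a real install, running jupyter kernelspec list is the supported way to see what is available.

```python
import json
from pathlib import Path


def list_kernels(kernels_dir):
    """Map each kernel folder name to (display_name, language) read from its kernel.json."""
    found = {}
    # Each kernel lives in its own subfolder, e.g. <kernels_dir>/ir/kernel.json
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        with open(spec_file, encoding="utf-8") as handle:
            spec = json.load(handle)
        found[spec_file.parent.name] = (spec.get("display_name"), spec.get("language"))
    return found
```

Pointing list_kernels at a kernels folder should return a dictionary keyed by folder name, which is a quick way to check whether R, Scala or anything else is wired up next to Python.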


Nobody sees the invisible data scientist

You are a data scientist if you help turn data into decisions. This may be a non-glamorous Excel sheet, Python or R, or writing one more query to your RDBMS. Data to Decisions is the key. Don’t turn data into just one more PowerPoint or one more spreadsheet for management to say “hmm, interesting” and then ignore. Be a data scientist for the next epoch, not just for surviving the next meeting.

Don’t believe what they told you in Harvard Business Review on competing on analytics. I am trying to talk you into competing within analytics.

Learn one new thing a day. This may be a trick in coding, a function in R or a library in Python. Read a lot of technical blogs. Write one blog post a week. Clarity of thought is only proved when you can write clear words. Blogging is a great way to build your personal brand.

You are not a data scientist if you are in the middle of Drew Conway’s Venn Diagram, that lovely impossible sweet zone of being balanced in business, coding, and math. You are a data scientist if the world acknowledges you as one.

So do those free courses or MOOCs, and do those hackathon contests, but write one blog post a week and learn seven new things in data science in a week. Learning one thing every day. That’s just it. Look at babies. They learn so many things rapidly.

Suspend your cynicism and your greed for a year or two. Focus on the knowledge. Knowledge shall set you free, but getting paid is what makes you rich. People pay data scientists for their skills, but also their branding. No one wants to lose their job because they hired a sexy data scientist. Help your client or your boss look good in front of their bosses and clients. The one way you can do that is excellence.

Like Bill and Ted, focus on your excellent adventure in data science. Tools: check. Techniques: check. Business reading: check. Blogging: double check. Make a checklist of things you need to learn every day, every week, every month, every year.

Go to meetups, you putz. Don’t just sit home on the weekend. Go shake hands with your fellow data scientists. The only way you can beat them is by learning more things faster, by being branded as a better data scientist through your writing and social media, and finally by being known in the data science fraternity as the person to go to when stuck.

Data scientists are fire fighters with code. Fight the fire in the business and a day will come when they celebrate you as a hero. Put in 10,000 hours of practise in data science. Start from giving half an hour to blogging every week, and half an hour to reading about code and techniques every day.

You give nine hours to the job, two hours to your commute, three hours to the family. You even give six hours of sleep as your brain reboots. Give yourself half an hour.

No one sees the invisible fire fighter. No one sees the man who knows a lot but is too shy to explain, share or give away a part of his code, his knowledge and his wisdom.

Code well. Dig Data Hard. In the future everything related to a decision will have a data scientist lurking somewhere. Be the guy they trust for decision making assistance.

Don’t try to maximize your brilliance in your go-to visible efforts; instead, focus on minimizing your incompetence. Curious people often find solutions wise men overlook. What made DJ Patil, Hilary Mason and Hadley Wickham celebrated data scientists? Not just their ability to learn and create, but also their ability to expand and share their learning. Surely you can share some stuff and improve your visibility.


Data Science is the new kung fu and everybody was

I get dragged, cajoled and manipulated into meetings galore. Those meetings can be summed up as: hey, I have huge data, and I spent my life with Microsoft certifications and SQL. How do I deal with this data oil, data deluge, etc.? These remind me of Bruce Lee and the way he started before he started swinging his wooden nunchaku.

Hire us, I say, laconically, to the people expecting us to solve Big Data problems on a whiteboard and a PPT.

On the other side, I train people to be data scientists because the oil rush is on and everyone wants to learn how to use a shovel or spade. Some people need a certificate to get shortlisted for an interview, some need the minimum knowledge to clear an interview, and everyone wants to know what will happen to their career after 10 years of being a data scientist. These things are like the 36 chambers of Shaolin, except the teacher is doing all the tasks on a web conference.

I don’t know, I say, apologetically, to the wannabe members of the sexiest profession, the data scientists-to-be.

I also do data science unicorn startups on the side. These things are like Jackie Chan movies, where a challenge is offered to the dojo master and a few jokes later everything is getting pummeled or broken. Sometimes there is a pretty girl, but this is rarely done. In the technology startup world in India, we don’t read Sheryl Sandberg enough.

Pay me, I say, with a greedy glint in my eye to wannabe unicorns.

 

 

Analytics for Income Tax: YES, and this is how you do it

Wise Practitioner – Predictive Analytics Interview Series: Jeff Butler at IRS Research, Analysis, and Statistics organization

Q: How would you characterize your agency’s current and/or planned use of predictive analytics?  What is one specific way in which predictive analytics actively drives decisions in your agency?

A: The IRS uses a wide range of analytic methods, tools, and technologies to address such problems as ID theft, refund fraud, inventory optimization, and other activities related to its statutory mandates. In an era of persistently reduced budgets, the use of data analytics has become more important than ever to drive innovation, risk management, and decision making across the agency.

Q: Can you describe the challenges you face or have already overcome in establishing a data-driven environment in your agency?

A: Large organizations don’t change their leopard spots overnight. Building a data-driven culture involves fundamental changes to workforce skills and business-IT relationships, which requires change leadership and long-term commitments.

Q: Can you discuss any near term goals you have for improving your agency’s use of predictive analytics?

A: The U.S. taxpayer population has some complexities that present unique challenges to the IRS. For example, high-wealth individuals often behave more like a business, and businesses with connected entities often look more like a group of interrelated economic structures than a single business. There is growing interest in network analysis and related methods as an exploratory approach to better understand these types of patterns.

Q: Can you describe a successful result from the employment of predictive analytics in your agency, i.e., cost avoidance, funds recovered, improved efficiency, etc.?

A: ID theft remains a significant challenge for the IRS—and therefore for U.S. taxpayers as well. The financial and psychological cost to families whose tax returns are fabricated by ID thieves can be devastating and long lasting. The use of data analytics has allowed the IRS to accelerate the process of verifying ID theft cases for faster case resolution, lowering direct costs through improved automation. Analytic models are also key to detecting and preventing billions of dollars in fraudulent refund claims each year.

Q: Sneak preview: Please tell us a take-away that you will provide during your talk at Predictive Analytics World for Government.

A: Greater awareness is needed by agencies that the traditional paradigm for analyzing data in massively large environments is changing and skills need to adapt. Organizational boundaries between IT and business have to be removed. Greater emphasis needs to be placed on multi-disciplinary teams that combine skills from computer science, IT, statistics, economics, and applied math.

 

-kindly contributed by

http://www.predictiveanalyticsworld.com/patimes/wise-practitioner-predictive-analytics-interview-series-jeff-butler-at-irs-research-analysis-and-statistics-organization09022015/