2013 Thank You Note

I would like to write a thank-you note to some of the people who helped make Decisionstats.com possible. We had a total of 150,644 views this year. For that, I have to thank you, dear readers, for putting up with me; it is now our seventh year.

Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec     Total
13,940  12,153  12,948  13,371  12,778  12,085  12,894  11,934  9,914   14,764  12,907  10,956  150,644

I would like to thank Chris (of Mashape) for helping me with some of the interviews I wrote here. I did 26 interviews this year for Programmable Web, and a total of 30+ articles (including the interviews) in 2013.

Of course, we have now reached 116 excellent interviews on Decisionstats.com alone (see http://goo.gl/V6UsCG). I would like to thank each one of the interviewees, who took precious time to fill out the questions.

Sponsors: I would like to thank Dr Eric Siegel (individually as an author and as founder chair of www.pawcon.com), Nadja and Ingo (for RapidMiner), Dr Jonathan (for Datamind), Chris M (for Statace.com), Gergely (author) and many more during all these six years who have kept us afloat and the servers warm in these days of cold reflection, including Gregory (of KDNuggets.com) and the erstwhile AsterData founders.

Training Partners: I would like to thank Lovleen Bhatia (of Edureka) for giving me the opportunity to make http://www.edureka.in/r-for-analytics, which now has 1721 learners as per http://www.edureka.in/.

I would also like to say a special thank you to Jigsaw Academy for giving me the opportunity to create the first affordable and quality R course in Asia: http://analyticstraining.com/2013/jigsaw-completes-training-of-300-students-on-r/

These training courses, including those by Datamind and Coursera, remain a formidable and affordable alternative to many others catching up in the analytics education game in India (an issue I wrote about here).

I would also like to thank each and every one of my students (past and present) and everyone in the #rstats and SAS-L communities, including people who may have been left out.

Thank you, sir, for helping me and Decisionstats.com!

I wish each one of you a very happy and joyous New Year and a great and prosperous 2014!

Desi Movie Review- Dhoom 3

This is Bollywood’s take on, and tribute to, Christopher Nolan. With the finest acting ensemble of Aamir Khan and Bachchan Jr, a plot borrowed half from The Prestige and The Dark Knight, and a whole lot of buddy-cop, biker-gangsta bromance, I loved this movie! That’s how nicely they mix the masala together and make it all magically real. When East meets West, magic can happen, especially in the movies (and quite literally in the case of the dancing of Anglo-Indian actress Katrina Kaif).

Go watch it; it’s a splendid year-ending movie to watch with your family.


Karl Rexer Interview on the state of Analytics

To cap off a wonderful year, we decided to interview Karl Rexer, founder of http://www.rexeranalytics.com/ and of the data mining survey that is considered the industry benchmark for the state of analytics.

Ajay: Describe the history behind doing the survey, how you came up with the idea, and which other players you think survey the data mining and statistical software market apart from you.

 Karl: Since the early 2000s I’ve been involved on the organizing and review committees for several data mining conferences and workshops. Early in the 2000s, in the hallways at these conferences I heard many analytic practitioners discussing and comparing their algorithms, data sources, challenges, tools, etc. Since we were already conducting online surveys for several of our clients, and my network of data miners is pretty large, I realized that we could easily do a survey of data miners, and share the results with the data mining community. I saw that the gap was there (and the interest), and we could help fill it. It was a way to give back to the data mining community, and also to raise awareness in the marketplace for my company, Rexer Analytics. So in 2007 we launched the first Data Miner Survey. In the first year, 314 data miners participated, and it’s just grown from there. In each of the last two surveys, over 1200 people participated. The interest we’ve seen in our research summary reports has also been astounding – we get thousands of requests for the summary reports each year. Overall, this just confirms what we originally thought: both inside the industry and beyond, people are hungry for information about data mining.

Are there other surveys and reviews of analytic professionals and the analytic marketplace? Sure. And there’s room for a variety of methodologies and perspectives. Forrester and Gartner produce several reports that cover the analytic marketplace – they largely focus on software evaluations and IT trends. There are also surveys of CIOs and IT professionals that sometimes cover analytic topics. James Taylor (Decision Management Solutions) conducted an interesting study this year of Predictive Analytics in the Cloud. And of course, there are also the KDnuggets single-question polls that provide a pulse on people’s views of topical issues.

Ajay: Over the years, what broad trends have you seen in the survey in terms of paradigms? Name your top 5 insights over these years.

Karl: Well, I can’t think of a fifth one, but I’ve got four key findings and trends we’ve seen over the years we’ve been doing the Data Miner Surveys:

  1. The dramatic rise of open-source data mining tools, especially R. Since 2010, R has been the most-used data mining tool. And in 2013, 70% of data miners report using R. R is frequently used along with other tools, but we also see an increasing number of data miners selecting R as their primary tool.
  2. Data miners consistently report that regression, decision trees, and cluster analysis are the key algorithms they turn to. Each of the surveys, from 2007 through 2013, has shown this same core triad of algorithms.
  3. The challenges data miners face are also consistent: across multiple years, the #1 challenge data miners report has been “dirty data.” The other top challenges are “explaining data mining to others” and “difficult access to data”. In response to the 2010 survey, data miners described their best practices in overcoming these three key challenges. A summary of their ideas is available on our website here: http://www.rexeranalytics.com/Overcoming_Challenges.html. And three linked “challenge” pages contain almost 200 verbatim best practice ideas collected from survey respondents.
  4. We also see that there is excitement among analytic professionals, high job satisfaction, and room for more and better analytics. People report that the number of analytic projects is increasing, and the size of analytic teams is increasing too. But still there’s room for much wider and more sophisticated use of analytics – only a minority of data miners consider their companies to be analytically sophisticated.

Ajay: What percentage of people are now doing analytics on the cloud, on mobile or tablet, versus on the desktop?

Karl: In the past few years we’ve seen a doubling in the percent of people who report doing some of their analytics using cloud environments. It’s still the minority of data miners, but it’s grown from 7% in 2010 to 10% in 2011, and to 19% in 2013.

Ajay: Your survey is free. How does it help your consulting practice?

Karl: Our main motivation for doing the Data Miner Survey is to contribute to the data mining community. We don’t want to charge a fee for the summary reports, because we want to get the information into as many people’s hands as possible. And we want people to feel free to send the report on to their friends and colleagues.

However, the Data Miner Survey does also help Rexer Analytics. It helps to raise the visibility of our company. It increases the traffic and links to our website, and therefore helps our Google rankings. And it is a great conversation starter.

Ajay: Name some statistics on how popular your survey has become over time, in terms of people filling out the survey and people reading it.

Karl: In 2007 when we launched the first Data Miner Survey, 314 data miners participated, and it’s grown nicely from there. In each of the last two surveys, over 1200 people participated. The interest we’ve seen in our research summary reports has also been growing at a dramatic rate – recently we’ve been getting thousands of requests for the summary reports each year. Additionally, we have been unveiling the highlights of the surveys with a presentation at the Fall Predictive Analytics World conferences, and it is always a popular talk.

But the most gratifying aspects about the expanded interest in our Data Miner Survey are two things:

  1. The great conversations that the Data Miner Survey has initiated. I have wonderful conversations with people by phone, email and at conferences and at colleges about the findings, the trends, and about all the great ideas people have for new and exciting ways that they want to apply analytics in their domains – everything from human resource planning to cancer research, and customer retention to fraud detection. And many people have contributed ideas for new questions or topics that we have incorporated into the survey.
  2. Seeing that people in the data mining community find the survey results useful. Many students and young people entering the field have told us the summary reports provide a great overview of the field and emerging trends. And many software vendors have told us that the survey helps them better understand the needs and preferences of hands-on data mining practitioners. I’m often surprised to see new people and places that are reading and appreciating our survey. We get emails from all corners of the globe, asking questions about the survey, or asking to share it with others. Sometime last year after receiving a question from an academic researcher in Asia, I decided to check Google Scholar to see who is citing the Data Miner Survey in their books and published papers. The list was long. And the list of online news stories, blogs and other mentions of the Data Miner Survey was even longer. I started a list of citations, with links back to the places that are citing the Data Miner Survey – you can look at the list here: http://www.rexeranalytics.com/Data_Miner_Survey_Citations.html – there are over 100 places citing our research, and the list includes 15 languages. But even more surprising was finding that someone had created a Wikipedia entry about the Data Miner Surveys. I made a couple small edits, but then I stopped. The accepted rule in the Wikipedia community is to not edit things that one has a personal interest in. However, I want to encourage any Wikipedia authors out there to go and help update https://en.wikipedia.org/wiki/Rexer%27s_Annual_Data_Miner_Survey.

Ajay: What do you think are the top 3 most insightful charts from your 2013 report?

Karl: OK, it’s tough for me to pick only 3. I think that you should pick the three that you think are the most insightful, and then blog about them and the reasons you think they’re important.

 But if you want me to pick 3, then here are three good ones:
— R Usage graph on page 16 
— Algorithm graph on page 36  
— The pair of graphs on page 19 that show that there’s still a lot of room for improvement
Happy new year!
(Ajay: You can see the wonderful report at http://www.rexeranalytics.com/, especially the collection of links in the top right corner of the home page that cite this survey.)

Misconceptions and Fallacies in Analytics Education in India

  1. Teaching a software and labeling it as analytics education: Some examples are teaching analytics with MS Excel (spreadsheet software), or teaching a statistics or optimization syllabus and tagging it as Business Analytics.
  2. Promising to teach language X but using cheaper software Y: An example is offering to teach the SPSS language but using the open source equivalent, PSPP.
  3. Overcharging for a day or two’s workshop: Even Albert Einstein could not learn a computer language in 3 days; he could just get the basics. Anything priced above $500 for less than 4 days of training is a simple effort to fool you into thinking you are getting much more than your money’s worth.
  4. Extending training to more than 2 months and then overcharging: This is a failure unless done by an accredited college.
  5. Freebies: There is no free lunch. Overcharging and then giving a discount is a standard marketing malpractice.
  6. Brand associations: Brand X is well known but has no credentials in analytics. So it ties up with a couple of analytics consultants and launches a certificate, certification or diploma program in analytics. Unfortunately this extends to the very, very best of Indian education.
  7. Hidden costs, also known as “We are cheap because we are in India”: Analytics software costs almost the same throughout the world (I did propose a PPP method for pricing software differently). Anyone offering a discount because of geography is selling you a bridge in Nigeria or a million dollars in Iraq.
  8. Self-paced learning (learn online, for a fee or for free): No, learning needs interaction and instructors; otherwise all the universities in the world would have moved their professors to research (?) and offered videos to the students for self-learning.
  9. Better, much better support: Some analytics providers aim to distinguish themselves by saying they give better support. Yet their support team is hidden, and it is mostly the instructor giving support. The best solution is to publish the names of the support team members, as is done in the support services industry.

These are personal observations and may or may not hold true for every organization. All opinions are mine only.

Even more variety in Cloud Computing Instances from AWS

If you have ever complained that R is slow because it stores data in RAM, well, here is a whole lot of RAM for you.

From:

http://aws.typepad.com/aws/2013/12/amazon-ec2-new-i2-instance-type-available-now.html

The Specs
Here are the instance sizes and the associated specs:

Instance Name   vCPU Count   RAM        Instance Storage (SSD)   Price/Hour
i2.xlarge       4            30.5 GiB   1 x 800 GB               $0.85
i2.2xlarge      8            61 GiB     2 x 800 GB               $1.71
i2.4xlarge      16           122 GiB    4 x 800 GB               $3.41
i2.8xlarge      32           244 GiB    8 x 800 GB               $6.82

 

This leaves these guys way behind

https://cloud.google.com/products/compute-engine/

High Memory

Machines for tasks that require more memory relative to virtual cores

Instance type   Virtual Cores   Memory   Price (US$)/Hour (US hosted)   Price (US$)/Hour (Europe hosted)
n1-highmem-2    2               13GB     $0.244                         $0.275
n1-highmem-4    4               26GB     $0.488                         $0.549
n1-highmem-8    8               52GB     $0.975                         $1.098
n1-highmem-16   16              104GB    $1.951                         $2.196
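
For a rough side-by-side view of the two price lists, here is a minimal Python sketch that works out the hourly price per GB of RAM for each instance. It uses only the figures quoted above (US-hosted prices for Google). The dictionary names and the helper `price_per_gb` are made up for this illustration, and it assumes GiB and GB can be treated as the same unit for a back-of-the-envelope comparison.

```python
# Hourly price per GB of RAM, from the two price lists quoted in this post.
# Assumption: GiB (AWS) and GB (Google) are treated as the same unit here.

# name: (memory, price per hour in US$)
AWS_I2 = {
    "i2.xlarge": (30.5, 0.85),
    "i2.2xlarge": (61.0, 1.71),
    "i2.4xlarge": (122.0, 3.41),
    "i2.8xlarge": (244.0, 6.82),
}

# US-hosted prices only
GCE_HIGHMEM = {
    "n1-highmem-2": (13.0, 0.244),
    "n1-highmem-4": (26.0, 0.488),
    "n1-highmem-8": (52.0, 0.975),
    "n1-highmem-16": (104.0, 1.951),
}


def price_per_gb(memory_gb, price_per_hour):
    """Return the hourly price divided by the amount of RAM."""
    return price_per_hour / memory_gb


def report(title, instances):
    """Print memory and price per GB-hour for each instance in a price list."""
    print(title)
    for name, (memory, price) in instances.items():
        rate = price_per_gb(memory, price)
        print(f"  {name:14s} {memory:6.1f} GB  ${rate:.4f} per GB-hour")


if __name__ == "__main__":
    report("AWS i2", AWS_I2)
    report("Google high memory (US hosted)", GCE_HIGHMEM)
```

On these numbers the i2 instances go much higher on RAM per machine (244 GiB against 104GB), while the Google high-memory instances come out cheaper per GB of RAM (about $0.019 against about $0.028 per hour). Keep in mind that the i2 price also covers the SSD instance storage, which this comparison ignores.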

Top 7 Business Strategy Models

UPDATED POST: Some models I use for business strategy, to analyze the huge reams of qualitative and uncertain data that business generates. I have added a bonus, the Business Canvas Model (number 2).

  1. Porter’s 5 Forces Model: To analyze industries
  2. Business Canvas
  3. BCG Matrix: To analyze product portfolios
  4. Porter’s Diamond Model: To analyze locations
  5. McKinsey 7 S Model: To analyze teams
  6. Greiner Theory: To analyze growth of organizations
  7. Herzberg Hygiene Theory: To analyze soft aspects of individuals
  8. Marketing Mix Model: To analyze the marketing mix


More to CyberWar than just defacing websites

We are trying to hypothesize realistic scenarios, based on existing or near-future technologies, that lie between the two extreme and popular notions of cyber warfare: the one promoted by Die Hard 4, and the one held by those who think it is only about bringing down databases.

A cyberwar is like any war: it will have stages.

Reconnaissance and Spying: This is not just port sniffing or spear phishing; it will identify the primary and secondary targets in the first and second wave of attacks. It will include both civilian and military strategic targets as well as tactical ones.

For example, the communication systems of military infantry and other ground forces are comparatively hardened, but completely disabling the military hospital infrastructure of a country is likely to have more psychological impact. Targets could range from industrial machines to critical hardware, all linked up nicely.

The communication system of emergency services is more easily disrupted digitally and can cause more damage than military communication. A simple example is an attack on the phone systems for emergency calls (e.g. 9-1-1 or 3-1-1 in the West, or 100, 101, 102 in India).

This could also include databases on individuals to be targeted, including their civilian family members. If you hack the Senator’s daughter’s Facebook account, trust me, it is easier than hacking the Senator’s website and just as distracting. A list of possible databases to be hacked has been written about here and here.

Add oil producing grids, dams, electricity grids and water supplies for civilians to this list and you can see how even messing with the digital diagnostics of the infrastructure can impact the efficiency of enemy response.

So the next time you wear that Pirate Bay T-shirt and participate in the masked rally, know that you are not just protesting war; you are unwillingly and unwittingly participating in one.