DECISION STATS

The Seven C’s of Viral Content: What makes content viral online?

Viral

Definition: (of an image, video, piece of information, etc.) circulated rapidly and widely from one Internet user to another.

Channels: Some content goes viral on particular channels (like 4chan or Tumblr) while it gets ignored on other social media channels.
Content: The type of content should match the audience type (technical or non-technical) and the channel used for dissemination (like Pinterest or Tumblr for images).
Celebrity: Getting an endorsement from a celebrity (say, one with a high enough influence score) greatly helps viral content reach beyond its initial network.
Credibility or Network Effects: People find it easier to like or share content that has already proved to be viral or has crossed a certain threshold of popularity. Some people will like the content simply because it is already very successful.
Customers: Content consumers can be influencers, sharers, innovators, or passive. It is critical to reach a certain threshold of certain customer types to hit viral counts.
Context: One man’s viral content is another man’s spam.
Circulation: How easy is it to circulate the content? To share it or show appreciation? To add customized comments? This affects how viral the content becomes, though it is mostly a function of the hosting website rather than the content itself.

Bonus, the 8th C: Cuteness and Catiness. On the internet, cute babies and cats rule in a duopoly.

2013 in review

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 150,000 times in 2013. If it were an exhibit at the Louvre Museum, it would take about 6 days for that many people to see it.

Click here to see the complete report.

I would like to write a thank you note to some of the people who helped make Decisionstats.com possible. We had a total of 150,644 views this year. For that, I have to thank you, dear readers, for putting up with me. It is now our seventh year.

Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec	Total
13,940	12,153	12,948	13,371	12,778	12,085	12,894	11,934	9,914	14,764	12,907	10,956	150,644
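
As a quick sanity check on the monthly figures, here is a minimal Python sketch that re-adds the twelve monthly view counts and derives a few summary numbers. The variable names are illustrative only; the figures are copied from the table.

```python
# Monthly view counts for 2013, copied from the table above.
monthly_views = {
    "Jan": 13940, "Feb": 12153, "Mar": 12948, "Apr": 13371,
    "May": 12778, "Jun": 12085, "Jul": 12894, "Aug": 11934,
    "Sep": 9914, "Oct": 14764, "Nov": 12907, "Dec": 10956,
}

# Re-add the months to confirm the reported yearly total.
total_views = sum(monthly_views.values())

# Simple summary statistics over the twelve months.
average_views = total_views / len(monthly_views)
busiest_month = max(monthly_views, key=monthly_views.get)
slowest_month = min(monthly_views, key=monthly_views.get)

print(f"Total views: {total_views:,}")
print(f"Average per month: {average_views:,.0f}")
print(f"Busiest month: {busiest_month} ({monthly_views[busiest_month]:,})")
print(f"Slowest month: {slowest_month} ({monthly_views[slowest_month]:,})")
```

The monthly counts do add up to the reported total of 150,644.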

I would like to thank Chris (of Mashape) for helping me with some of the interviews I wrote here. I did 26 interviews this year for Programmable Web, and a total of 30+ articles including the interviews in 2013.

Of course, we have now reached 116 excellent interviews on Decisionstats.com alone (see http://goo.gl/V6UsCG). I would like to thank each one of the interviewees who took precious time to fill out the questions.

Sponsors: I would like to thank Dr Eric Siegel (individually as an author and as founder chair of www.pawcon.com), Nadja and Ingo (for Rapid-Miner), Dr Jonathan (for Datamind), Chris M (for Statace.com), Gergely (author), and many more during all these six years who have kept us afloat and the servers warm in these days of cold reflection, including Gregory (of KDNuggets.com) and the erstwhile AsterData founders.

Training Partners: I would like to thank Lovleen Bhatia (of Edureka) for giving me the opportunity to make http://www.edureka.in/r-for-analytics, which now has 1721 learners as per http://www.edureka.in/

I would also like to say a special thank you to Jigsaw Academy for giving me the opportunity to create the first affordable and quality R course in Asia: http://analyticstraining.com/2013/jigsaw-completes-training-of-300-students-on-r/

These training courses, including those by Datamind and Coursera, remain a formidable and affordable alternative to many others catching up in the analytics education game in India (an issue I wrote about here).

I would also like to thank each and every one of my students (past and present) and everyone in the #rstats and SAS-L community, including people who may have been left out.

Thank you, sir, for helping me and Decisionstats.com!

I wish each one of you a very happy and joyous New Year and a great and prosperous 2014!

Desi Movie Review- Dhoom 3

This is Bollywood’s take on, and tribute to, Christopher Nolan. With the fine acting ensemble of Aamir Khan and Bachchan Jr, a plot half borrowed from The Prestige and The Dark Knight, and a whole lot of buddy-cop, biker-gangsta bromance, I loved this movie! That’s how nicely they mix the masala together and make it all magically real. When East meets West, magic can happen, especially in the movies (and quite literally in the case of the dancing of Anglo-Indian actress Katrina Kaif).

Go watch it. It’s a splendid year-ending movie to watch with your family.

Karl Rexer Interview on the state of Analytics

To cap off a wonderful year, we have decided to interview Karl Rexer, founder of http://www.rexeranalytics.com/ and of the data mining survey that is considered the industry benchmark for the state of analytics.

Ajay: Describe the history behind doing the survey and how you came up with the idea. Which other players do you think survey the data mining and statistical software market apart from you?

Karl: Since the early 2000s I’ve been involved on the organizing and review committees for several data mining conferences and workshops. Early in the 2000s, in the hallways at these conferences I heard many analytic practitioners discussing and comparing their algorithms, data sources, challenges, tools, etc. Since we were already conducting online surveys for several of our clients, and my network of data miners is pretty large, I realized that we could easily do a survey of data miners, and share the results with the data mining community. I saw that the gap was there (and the interest), and we could help fill it. It was a way to give back to the data mining community, and also to raise awareness in the marketplace for my company, Rexer Analytics. So in 2007 we launched the first Data Miner Survey. In the first year, 314 data miners participated, and it’s just grown from there. In each of the last two surveys, over 1200 people participated. The interest we’ve seen in our research summary reports has also been astounding – we get thousands of requests for the summary reports each year. Overall, this just confirms what we originally thought: both inside the industry and beyond, people are hungry for information about data mining.

Are there other surveys and reviews of analytic professionals and the analytic marketplace? Sure. And there’s room for a variety of methodologies and perspectives. Forrester and Gartner produce several reports that cover the analytic marketplace – they largely focus on software evaluations and IT trends. There are also surveys of CIOs and IT professionals that sometimes cover analytic topics. James Taylor (Decision Management Solutions) conducted an interesting study this year of Predictive Analytics in the Cloud. And of course, there are also the KDnuggets single-question polls that provide a pulse on people’s views of topical issues.

Ajay: Over the years, what broad trends have you seen in the survey in terms of paradigms? Name your top 5 insights over these years.

Karl: Well, I can’t think of a fifth one, but I’ve got four key findings and trends we’ve seen over the years we’ve been doing the Data Miner Surveys:

The dramatic rise of open-source data mining tools, especially R. Since 2010, R has been the most-used data mining tool. And in 2013, 70% of data miners report using R. R is frequently used along with other tools, but we also see an increasing number of data miners selecting R as their primary tool.
Data miners consistently report that regression, decision trees, and cluster analysis are the key algorithms they turn to. Each of the surveys, from 2007 through 2013, has shown this same core triad of algorithms.
The challenges data miners face are also consistent: Across multiple years, the #1 challenge data miners report has been “dirty data.” The other top challenges are “explaining data mining to others” and “difficult access to data”. In response to the 2010 survey, data miners described their best practices in overcoming these three key challenges. A summary of their ideas is available on our website here: http://www.rexeranalytics.com/Overcoming_Challenges.html. And three linked “challenge” pages contain almost 200 verbatim best practice ideas collected from survey respondents.
We also see that there is excitement among analytic professionals, high job satisfaction, and room for more and better analytics. People report that the number of analytic projects is increasing, and the size of analytic teams is increasing too. But still there’s room for much wider and more sophisticated use of analytics – only a minority of data miners consider their companies to be analytically sophisticated.

Ajay: What percentage of people are now doing analytics on the cloud, on mobile, or on tablet, versus desktop?

Karl: In the past few years we’ve seen a doubling in the percent of people who report doing some of their analytics using cloud environments. It’s still the minority of data miners, but it’s grown from 7% in 2010 to 10% in 2011, and to 19% in 2013.

Ajay: Your survey is free. How does it help your consulting practice?

Karl: Our main motivation for doing the Data Miner Survey is to contribute to the data mining community. We don’t want to charge a fee for the summary reports, because we want to get the information into as many people’s hands as possible. And we want people to feel free to send the report on to their friends and colleagues.

However, the Data Miner Survey does also help Rexer Analytics. It helps to raise the visibility of our company. It increases the traffic and links to our website, and therefore helps our Google rankings. And it is a great conversation starter.

Ajay: Name some statistics on how popular your survey has become over time, in terms of people filling out the survey and people reading the survey.

Karl: In 2007 when we launched the first Data Miner Survey, 314 data miners participated, and it’s grown nicely from there. In each of the last two surveys, over 1200 people participated. The interest we’ve seen in our research summary reports has also been growing at a dramatic rate – recently we’ve been getting thousands of requests for the summary reports each year. Additionally, we have been unveiling the highlights of the surveys with a presentation at the Fall Predictive Analytics World conferences, and it is always a popular talk.

But the most gratifying aspects about the expanded interest in our Data Miner Survey are two things:

The great conversations that the Data Miner Survey has initiated. I have wonderful conversations with people by phone, email and at conferences and at colleges about the findings, the trends, and about all the great ideas people have for new and exciting ways that they want to apply analytics in their domains – everything from human resource planning to cancer research, and customer retention to fraud detection. And many people have contributed ideas for new questions or topics that we have incorporated into the survey.
Seeing that people in the data mining community find the survey results useful. Many students and young people entering the field have told us the summary reports provide a great overview of the field and emerging trends. And many software vendors have told us that the survey helps them better understand the needs and preferences of hands-on data mining practitioners. I’m often surprised to see new people and places that are reading and appreciating our survey. We get emails from all corners of the globe, asking questions about the survey, or asking to share it with others. Sometime last year after receiving a question from an academic researcher in Asia, I decided to check Google Scholar to see who is citing the Data Miner Survey in their books and published papers. The list was long. And the list of online news stories, blogs and other mentions of the Data Miner Survey was even longer. I started a list of citations, with links back to the places that are citing the Data Miner Survey – you can look at the list here: http://www.rexeranalytics.com/Data_Miner_Survey_Citations.html – there are over 100 places citing our research, and the list includes 15 languages. But even more surprising was finding that someone had created a Wikipedia entry about the Data Miner Surveys. I made a couple small edits, but then I stopped. The accepted rule in the Wikipedia community is to not edit things that one has a personal interest in. However, I want to encourage any Wikipedia authors out there to go and help update https://en.wikipedia.org/wiki/Rexer%27s_Annual_Data_Miner_Survey.

Ajay: What do you think are the top 3 most insightful charts from your 2013 report?

Karl: OK, it’s tough for me to pick only 3. I think that you should pick the three that you think are the most insightful, and then blog about them and the reasons you think they’re important.

But if you want me to pick 3, then here are three good ones:

- R Usage graph on page 16
- Algorithm graph on page 36
- The pair of graphs on page 19 that show that there’s still a lot of room for improvement

Happy new year!

(Ajay: You can see the wonderful report at http://www.rexeranalytics.com/, especially the collection of links in the top right corner of the home page that cite this survey.)

Misconceptions and Fallacies in Analytics Education in India

Teaching a software and labeling it as analytics education: Some examples are teaching analytics with MS Excel (spreadsheet software), or teaching a statistics or optimization syllabus and tagging it as Business Analytics.
Promising to teach language X but using cheaper software Y: An example is offering to teach the SPSS language but using the open source equivalent, PSPP.
Overcharging for a day or two’s workshop: Even Albert Einstein could not learn a computer language in 3 days; he could just get the basics. Anything priced above $500 for less than 4 days of training is simply an effort to fool you into thinking you are getting much more than your money’s worth.
Extending training to more than 2 months and then overcharging: This is a failure unless done by an accredited college.
Freebies: There is no free lunch. Overcharging and then giving a discount is a standard marketing malpractice.
Brand Associations: Brand X is well known but has no credentials in analytics. So it ties up with a couple of analytics consultants and launches a certificate, certification, or diploma program in analytics. Unfortunately, this extends to the very best of Indian education.
Hidden costs, also known as “We are cheap because we are in India”: Analytics software costs almost the same throughout the world (I did propose a PPP method for pricing software differently). Anyone offering a discount because of geography is selling you a bridge in Nigeria or a million dollars in Iraq.
Self-Paced Learning (learn online for a fee, or free): No, learning needs interaction and instructors. Otherwise all universities in the world would have moved their professors to research (?) and offered videos to the students for self-learning.
Better, Much Better Support: Some analytics providers aim to distinguish themselves by saying they give better support. Yet their support team is hidden, and it is mostly the instructor who gives support. The best solution is to publish the names of the support team members, as is done in the support services industry.

These are personal observations and may or may not be true of every organization. All opinions are mine only.

Even more variety in Cloud Computing Instances from AWS

If you have ever complained that R is slow because it holds its data in RAM, well, here is a whole lot of RAM for you.

From: http://aws.typepad.com/aws/2013/12/amazon-ec2-new-i2-instance-type-available-now.html

The Specs
Here are the instance sizes and the associated specs:

Instance Name	vCPU Count	RAM	Instance Storage (SSD)	Price/Hour
i2.xlarge	4	30.5 GiB	1 x 800 GB	$0.85
i2.2xlarge	8	61 GiB	2 x 800 GB	$1.71
i2.4xlarge	16	122 GiB	4 x 800 GB	$3.41
i2.8xlarge	32	244 GiB	8 x 800 GB	$6.82
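
To put these numbers in perspective for a RAM-hungry R session, here is a rough Python sketch that works out the hourly price per GiB of RAM and the cost of a full day for each size. The figures are copied from the table above (prices as listed at the time of the AWS post); the function name, field names, and the 24-hour usage pattern are assumptions made for illustration, not anything published by AWS.

```python
# Instance specs copied from the table above:
# (name, vCPU count, RAM in GiB, number of 800 GB SSDs, price in USD per hour)
I2_INSTANCES = [
    ("i2.xlarge", 4, 30.5, 1, 0.85),
    ("i2.2xlarge", 8, 61.0, 2, 1.71),
    ("i2.4xlarge", 16, 122.0, 4, 3.41),
    ("i2.8xlarge", 32, 244.0, 8, 6.82),
]

SSD_SIZE_GB = 800    # each instance-store SSD is 800 GB, per the table
HOURS_PER_DAY = 24   # assumption: the instance runs for a full day


def cost_summary(instances):
    """Return per-instance cost figures derived from the listed specs."""
    rows = []
    for name, vcpus, ram_gib, ssd_count, price_per_hour in instances:
        rows.append({
            "name": name,
            "usd_per_gib_ram_hour": price_per_hour / ram_gib,
            "usd_per_day": price_per_hour * HOURS_PER_DAY,
            "ssd_total_gb": ssd_count * SSD_SIZE_GB,
        })
    return rows


summary = cost_summary(I2_INSTANCES)
for row in summary:
    print(
        f"{row['name']:<11} "
        f"${row['usd_per_gib_ram_hour']:.4f} per GiB RAM per hour, "
        f"${row['usd_per_day']:.2f} per day, "
        f"{row['ssd_total_gb']:,} GB SSD"
    )
```

The price per GiB of RAM comes out nearly the same across all four sizes (a little under 3 cents per hour), so choosing a size is mostly a question of how much memory the data needs rather than of any volume discount.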