Decisionstats on Social Media| Part 4

Here are some of the social media articles that became popular. This one was tough, as I have written about many Twitter applications, LinkedIn apps, etc., only to find a new application after a few months. But change is the nature of the game, especially if you want to stay online.

1) Spreading content on social media

My philosophy is that stuff you do not need to keep secret should be shared with as wide an audience as possible. For this reason, I prefer to meet the reading audience halfway, on Facebook, LinkedIn and Twitter, rather than play the same old "come to my website if you want to read it" game. If you love your …

Best of Decision Stats- Modeling and Text Mining Part3

Here are some of the top articles by views in an area I love: modeling and text mining.

1) Karl Rexer – Rexer Analytics

Karl produces one of the most respected surveys capturing emerging trends in data mining and technology. Karl was also one of the most enthusiastic people I have interviewed, and I am thankful for his help in getting me some more interviews.

2) Gregory Piatetsky-Shapiro

One of the earliest and easily the best knowledge discoverer of all time, Gregory produces a newsletter that is easily the must-read newsletter to be on. Gregory was doing data mining while the Google boys were still debating whether to drop out of Stanford or not.

Making Government Transparent Using R

Here is a terrific interview on O’Reilly Radar at

It talks about using open source statistics software like R to make government more transparent, for example by analyzing waste.

Some interesting extracts follow. For instance, I didn't know S was being maintained by SAS (I thought Tibco had S Plus).


James Turner: So switching gears, the other thing you’re talking about and a big part of your professional life is the R language. Now I will confess that like Erlang, R is something that is on my radar and I see and I look at it and I say, “Okay. When am I ever going to use it?” I mean Erlang is used some places, but R I guess has a very nichey type of audience, doesn’t it?

Danese Cooper: You know, interestingly enough that’s changing. I think that’s been true. R has been in production or in development, let’s say, for the last 20 years. It is patterned after the S language, which was developed in the ’60s at Bell Labs around the same time that UNIX and C were being developed. And it was S for statistics, right? R is sort of a, “If we had known then what we know now” version of S. They’ve been working on it for 20 years in an academic setting. So it has been very slow to grow. But just in the last couple of years, it’s really gotten to a place where it’s ready for enterprise use. And just this year, the people that maintain S, a company called SAS, S-a-s, in South America, south of this country, have announced that they’re going to have to support R, like it’s that widely used now, particularly in schools.

Danese Cooper works for Revolution Computing, which creates a wonderful, professional version of R called Revolution R; some of the work on parallelization and on enabling 64-bit Windows R is great. Danese also has solid open source credentials, having worked with the Board and also with Apache. O’Reilly Media’s work in open source conferences is terrific as well.

That apart, the great stuff is in the rest of this must-read interview, which is available at

The Top Decisionstats Articles -Part 2 Business Intelligence and Data Quality

I am a self-confessed novice at business intelligence. I understand the broad concepts, reporting tools, and definitely forecasting tools. But the whole-systems view still baffles me. Fortunately, I have been learning from some of the best writers in this field. Here, in order of circulation, are the top Business Intelligence articles.

Business Intelligence

1) Jill Dyche

Jill is a fabulously wise and experienced person with a great writing style. Her answers were some of the most educative I have seen in BI writing.

2) Peter Thomas

The best of British BI is epitomized by Peter Thomas, and he is truly a European giant when it comes to the field. His worst weakness is a tendency to disappear when Test cricket is around, but that is eminently understandable. I can relate to the cricket as well.

3) Karen Lopez

Karen gives excellent insight into creating mock-ups or data models before actual implementation. She has worked on this for three decades, and her wisdom is clearly visible here.

Data Quality

Data quality is such an overlooked and easy-to-fix issue that I believe any BI vendor that builds the best, most robust data quality architecture will gain the maximum Pareto-like benefits. Curiously, competing BI vendors will often compete on price, graphics appeal, etc., but the simple Garbage In, Garbage Out rule is something they should consider. The Data Quality interviews gave me an important tutorial in these aspects of data management.

1) Jim Harris

Jim is a one-man army when it comes to evangelizing data quality, and his OCDQ blog is widely read and cited.

2) Steve Sarsfield

His excellent book is the one must-read item that people in cost-cutting corporations should buy, especially if they are considering going down the Davenport "competing on analytics" route.

( To be continued- Part 3 Modeling and Text Mining

Part 4 Social Media

Part 5 Humour and Poetry )

The Top DecisionStats Articles -Part 1 Analytics

I was just looking at my web analytics numbers and we seem to have crossed some milestones.

The site has now gotten more than 50,000 views since being launched in Dec 2007.

Thank you, everyone, for your help in this. More importantly, the quality of comments has been fabulous. Since I am out of ideas for the rest of the week, here is a best-of-posts collection.
Here are some of the favorite articles as measured by number of page views. I have personal favourites as well, but these are just the ranks as per page views.

Top 5 Interviews

1) Interviews with SAS Institute leaders- I have generally found great professionalism from SAS Institute people. This is surprising because, coming from an open source background, SAS is often looked upon as a big brother. I find that to be more perception than reality as the company continues to innovate.

a) with John Sall, founder SAS Institute- This is really the biggest interview I did in terms of the person involved. To my surprise (I wasn't expecting John to say yes), the interview was really frank, and it came very fast. The answers seem to have been written by John himself.

Quote- Quantitative fields can be fairly resistant to recession- John Sall.

b) Interview with Anne Milley, Director, Product Marketing, SAS Institute- This is a favourite because it came very soon after the NYTimes article on R. One of my personal opinions is that the difference between great and good leaders is often that great leaders are humble enough to learn and then build on their strengths. It ran in two parts, and I really appreciated the in-depth answers that Anne wrote.


Analytics continues to be our middle name.

Customers vote with the cheque book.


Experimental Ad AuDio-Video

As an experiment, I will be putting random images and YouTube songs in the next 7 posts this week. These will be viewable only on my site and not in the RSS feed (now restored to full rather than summary).

Let me know if the server hangs (sigh!!), if you find them distracting, or if you know a better song.

So what happened to S Plus

S Plus, the corporate version of S (the predecessor of R), is still being marketed by Tibco Corporation, which is again rumoured to be an acquisition target of (???)

  • SAS (who have desired R-like capabilities, especially in their IML product to be released soon)
  • SAP who lost out to IBM in the SPSS acquisition
  • Oracle
  • Microsoft
  • Rogue Wave (acquirer of Visual Numerics)
  • etc etc.

Anyway, S Plus is still alive and kicking:

“The S language and the S+ application have been critical to our ability to manage big data objects intrinsic to wind analytics and wind energy development,” said Brad Horn, Director of Wind Analytics at NextEra Energy.  “We credit our long-term interface and Spotfire consulting with unlocking new ideas and sources of value.  Joint dialogue on configuration alternatives and our recent efforts to restructure legacy code is allowing us to transition from simple interactive use of S+ to a customized S+ configuration with integrated batch processing, server load balancing, and parallel processing.  S+ has a central role in supporting internal decisions and our group emphasis on scale, speed, and quality.”

  • Wavelets, Spatial Stats, EnvironmentalStats: Apply statistics for advanced analysis of signal and image data, spatially correlated data, and environmental data.
  • Resampling: Apply resampling techniques, such as bootstrap and permutation tests, to enable the use of standard statistics on smaller data sets.
  • Association Rules: Uncover relationships between variables in large data sets, most commonly to detect purchase patterns (Market Basket Analysis), or in many other areas like web site usage analysis.
  • Recode Values: Easily handle and prepare data from multiple sources by changing the values in a column to a new value.
  • Deployment and Integration:

    • Spotfire Integration: Read and write Spotfire Text Data files, and leverage examples of using Spotfire Professional to visualize, explore and share model results.
    • Custom Java & C++ nodes: Extend Spotfire Miner by writing custom nodes in Java and C++.
    • Remote Script Execution: Execute S+ scripts remotely on S+ Server to offload and distribute intensive jobs.
    • Global Worksheet Parameters: Make workflows more flexible and reusable to interactive and batch applications.
    • FlexBayes: Create more realistic models, provide a natural way to address missing data, and take advantage of prior analysis.
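The resampling item in the feature list above (bootstrap and permutation tests) can be illustrated with a small sketch. This is plain Python using only the standard library, not Spotfire S+ or Spotfire Miner code, and the function names `bootstrap_mean_ci` and `permutation_test` are made up for illustration. It assumes a simple percentile bootstrap and a two-sided test on the difference in means.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the pooled data and split it back into two groups.
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

Both functions work on small samples without assuming a particular distribution, which is the point of the resampling bullet: the data itself stands in for the unknown population.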

  • Data Access and Preparation:

    • New Data File Types: Unlock more data sources by reading new formats including Spotfire Text Data, Microsoft Excel 2007, Microsoft Access 2007, and Matlab 7.
    • JDBC Access: Access new data sources for analysis with data import and export via the sjdbc library in Spotfire S+ 8.1.