2011 Forecast-ying

Free twitter badge
Image via Wikipedia

I had recently asked some friends from my Twitter lists for their take on 2011, atleast 3 of them responded back with the answer, 1 said they were still on it, and 1 claimed a recent office event.

Anyways- I take note of the view of forecasting from

http://www.uiah.fi/projekti/metodi/190.htm

The most primitive method of forecasting is guessing. The result may be rated acceptable if the person making the guess is an expert in the matter.

Ajay- people will forecast in end 2010 and 2011. many of them will get forecasts wrong, some very wrong, but by Dec 2011 most of them would be writing forecasts on 2012. almost no one will get called on by irate users-readers- (hey you got 4 out of 7 wrong last years forecast!) just wont happen. people thrive on hope. so does marketing. in 2011- and before

and some forecasts from Tom Davenport’s The International Institute for Analytics (IIA) at

http://iianalytics.com/2010/12/2011-predictions-for-the-analytics-industry/

Regulatory and privacy constraints will continue to hamper growth of marketing analytics.

(I wonder how privacy and analytics can co exist in peace forever- one view is that model building can use anonymized data suppose your IP address was anonymized using a standard secret Coco-Cola formula- then whatever model does get built would not be of concern to you individually as your privacy is protected by the anonymization formula)

Anyway- back to the question I asked-

What are the top 5 events in your industry (events as in things that occured not conferences) and what are the top 3 trends in 2011.

I define my industry as being online technology writing- research (with a heavy skew on stat computing)

My top 5 events for 2010 were-

1) Consolidation- Big 5 software providers in BI and Analytics bought more, sued more, and consolidated more.  The valuations rose. and rose. leading to even more smaller players entering. Thus consolidation proved an oxy moron as total number of influential AND disruptive players grew.

 

2) Cloudy Computing- Computing shifted from the desktop but to the mobile and more to the tablet than to the cloud. Ipad front end with Amazon Ec2 backend- yup it happened.

3) Open Source grew louder- yes it got more clients. and more revenue. did it get more market share. depends on if you define market share by revenues or by users.

Both Open Source and Closed Source had a good year- the pie grew faster and bigger so no one minded as long their slices grew bigger.

4) We didnt see that coming –

Technology continued to surprise with events (thats what we love! the surprises)

Revolution Analytics broke through R’s Big Data Barrier, Tableau Software created a big Buzz,  Wikileaks and Chinese FireWalls gave technology an entire new dimension (though not universally popular one).

people fought wars on emails and servers and social media- unfortunately the ones fighting real wars in 2009 continued to fight them in 2010 too

5) Money-

SAP,SAS,IBM,Oracle,Google,Microsoft made more money than ever before. Only Facebook got a movie named on itself. Venture Capitalists pumped in money in promising startups- really as if in a hurry to park money before tax cuts expired in some countries.

 

2011 Top Three Forecasts

1) Surprises- Expect to get surprised atleast 10 % of the time in business events. As internet grows the communication cycle shortens, the hype cycle amplifies buzz-

more unstructured data  is created (esp for marketing analytics) leading to enhanced volatility

2) Growth- Yes we predict technology will grow faster than the automobile industry. Game changers may happen in the form of Chrome OS- really its Linux guys-and customer adaptability to new USER INTERFACES. Design will matter much more in technology on your phone, on your desktop and on your internet. Packaging sells.

False Top Trend 3) I will write a book on business analytics in 2011. yes it is true and I am working with A publisher. No it is not really going to be a top 3 event for anyone except me,publisher and lucky guys who read it.

3) Creating technology and technically enabling creativity will converge at an accelerated rate. use of widgets, guis, snippets, ide will ensure creative left brains can code easier. and right brains can design faster and better due to a global supply chain of techie and artsy professionals.

 

 

Gartner BI and Inf Mgmt Summit 2011- 30 min One on Ones

From the land Down Under, where Gartner gathers business summit thunder.

http://www.gartner.com/technology/summits/apac/business-intelligence/index.jsp

Gartner Business Intelligence
& Information Management Summit 2011

February 22 – 23 • Sydney, AUSTRALIA
gartner.com/ap/bi

Register Now

From Information to Intelligence:

Evaluate, Execute and Evolve

At Gartner Business Intelligence & Information Management Summit 2011 you will experience a unique mix of Gartner research presentations, guest keynote addresses, real-life case studies and interactive panel discussions to provide you with a holistic view of the business intelligence and performance management landscape. Information, insight and advice are channeled through an increasingly targeted and focused approach, taking you from the high-level strategic view all the way to your specific issue.

Click here to view the full agenda or download the brochure.

AGENDA HIGHLIGHTS

teamsend


Guest Keynote Address

Future Thinking – Global Trends and Thinking that are Upending your Business

Anders Sorman-Nilsson
Creative Director, Thinque

Click here to read more about this session.

Best Practice Workshops:

  • How to Become an Effective Data Warehouse Modeler
  • Analytics – Business Intelligence and Performance Management ITScore

Analyst User Roundtables:

  • Enterprise Information Management – Focusing on What Matters to the Business
  • Sharepoint – thin edge of the wedge to the MS family
  • Preparing for the 2020 workplace

Worldwide Expertise at Your Fingertips!
Your questions on Business Intelligence and Performance Management answered. Meet the Gartner Analysts presenting at the Summit and book your exclusive 30 minute one-on-one ( lap top dance) with the Analysts of your choice.

Zen and the art of applying T tests to Spam Data

Decisionstats traffic seemed up mmm but Spam is way way up

Whos spamming my dear bloggie

hmm

is it the russians doing a link spam. unlikely they dont bot against Akismet that much (as they fail)

And Captcha can be failed by python (apparently. sigh)

Is there a co relation of certain tags of posts, and count of spam- hoping to distort say blogs’s search engine rankings for SAS WPS Lawsuit in Google or jet ski across  pacific in Google.

Sigh- an old retired outlaw black hat is never kept in peace. Try doing a blog search for R in Google- Revo  is now down to number 7 (which is hmm given Google Instant)

Of course I think too much about SEO, but I dont run CPC ads- I made much more money when traffic is low – say 5-10 small businesses needing to forecast their sales .

and enjoy your Thanksgiving. Remember the Indians bring the Turkeys.

 

Interview Michael J. A. Berry Data Miners, Inc

Here is an interview with noted Data Mining practitioner Michael Berry, author of seminal books in data mining, noted trainer and consultantmjab picture

Ajay- Your famous book “Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management” came out in 2004, and an update is being planned for 2011. What are the various new data mining techniques and their application that you intend to talk about in that book.

Michael- Each time we do a revision, it feels like writing a whole new book. The first edition came out in 1997 and it is hard to believe how much the world has changed since then. I’m currently spending most of my time in the on-line retailing world. The things I worry about today–improving recommendations for cross-sell and up-sell,and search engine optimization–wouldn’t have even made sense to me back then. And the data sizes that are routine today were beyond the capacity of the most powerful super computers of the nineties. But, if possible, Gordon and I have changed even more than the data mining landscape. What has changed us is experience. We learned an awful lot between the first and second editions, and I think we’ve learned even more between the second and third.

One consequence is that we now have to discipline ourselves to avoid making the book too heavy to lift. For the first edition, we could write everything we knew (and arguably, a bit more!); now we have to remind ourselves that our intended audience is still the same–intelligent laymen with a practical interest in getting more information out of data. Not statisticians. Not computer scientists. Not academic researchers. Although we welcome all readers, we are primarily writing for someone who works in a marketing department and has a title with the word “analyst” or “analytics” in it. We have relaxed our “no equations” rule slightly for cases when the equations really do make things easier to explain, but the core explanations are still in words and pictures.

The third edition completes a transition that was already happening in the second edition. We have fully embraced standard statistical modeling techniques as full-fledged components of the data miner’s toolkit. In the first edition, it seemed important to make a distinction between old, dull, statistics, and new, cool, data mining. By the second edition, we realized that didn’t really make sense, but remnants of that attitude persisted. The third edition rectifies this. There is a chapter on statistical modeling techniques that explains linear and logistic regression, naive Bayes models, and more. There is also a brand new chapter on text mining, a curious omission from previous editions.

There is also a lot more material on data preparation. Three whole chapters are devoted to various aspects of data preparation. The first focuses on creating customer signatures. The second is focused on using derived variables to bring information to the surface, and the third deals with data reduction techniques such as principal components. Since this is where we spend the greatest part of our time in our work, it seemed important to spend more time on these subjects in the book as well.

Some of the chapters have been beefed up a bit. The neural network chapter now includes radial basis functions in addition to multi-layer perceptrons. The clustering chapter has been split into two chapters to accommodate new material on soft clustering, self-organizing maps, and more. The survival analysis chapter is much improved and includes material on some of our recent application of survival analysis methods to forecasting. The genetic algorithms chapter now includes a discussion of swarm intelligence.

Ajay- Describe your early career and how you came into Data Mining as a profession. What do you think of various universities now offering MS in Analytics. How do you balance your own teaching experience with your consulting projects at The Data Miners.

Michael- I fell into data mining quite by accident. I guess I always had a latent interest in the topic. As a high school and college student, I was a fan of Martin Gardner‘s mathematical games in in Scientific American. One of my favorite things he wrote about was a game called New Eleusis in which one players, God, makes up a rule to govern how cards can be played (“an even card must be followed by a red card”, say) and the other players have to figure out the rule by watching what plays are allowed by God and which ones are rejected. Just for my own amusement, I wrote a computer program to play the game and presented it at the IJCAI conference in, I think, 1981.

That paper became a chapter in a book on computer game playing–so my first book was about finding patterns in data. Aside from that, my interest in finding patterns in data lay dormant for years. At Thinking Machines, I was in the compiler group. In particular, I was responsible for the run-time system of the first Fortran Compiler for the CM-2 and I represented Thinking Machines at the Fortran 8X (later Fortran-90) standards meetings.

What changed my direction was that Thinking Machines got an export license to sell our first machine overseas. The machine went to a research lab just outside of Paris. The connection machine was so hard to program, that if you bought one, you got an applications engineer to go along with it. None of the applications engineers wanted to go live in Paris for a few months, but I did.

Paris was a lot of fun, and so, I discovered, was actually working on applications. When I came back to the states, I stuck with that applied focus and my next assignment was to spend a couple of years at Epsilon, (then a subsidiary of American Express) working on a database marketing system that stored all the “records of charge” for American Express card members. The purpose of the system was to pick ads to go in the billing envelope. I also worked on some more general purpose data mining software for the CM-5.

When Thinking Machines folded, I had the opportunity to open a Cambridge office for a Virginia-based consulting company called MRJ that had been a major channel for placing Connection Machines in various government agencies. The new group at MRJ was focused on data mining applications in the commercial market. At least, that was the idea. It turned out that they were more interested in data warehousing projects, so after a while we parted company.

That led to the formation of Data Miners. My two partners in Data Miners, Gordon Linoff and Brij Masand, share the Thinking Machines background.

To tell the truth, I really don’t know much about the university programs in data mining that have started to crop up. I’ve visited the one at NC State, but not any of the others.

I myself teach a class in “Marketing Analytics” at the Carroll School of Management at Boston College. It is an elective part of the MBA program there. I also teach short classes for corporations on their sites and at various conferences.

Ajay- At the previous Predictive Analytics World, you took a session on Forecasting and Predicting Subsciber levels (http://www.predictiveanalyticsworld.com/dc/2009/agenda.php#day2-6) .

It seems inability to forecast is a problem many many companies face today. What do you think are the top 5 principles of business forecasting which companies need to follow.

Michael- I don’t think I can come up with five. Our approach to forecasting is essentially simulation. We try to model the underlying processes and then turn the crank to see what happens. If there is a principal behind that, I guess it is to approach a forecast from the bottom up rather than treating aggregate numbers as a time series.

Ajay- You often partner your talks with SAS Institute, and your blog at http://blog.data-miners.com/ sometimes contain SAS code as well. What particular features of the SAS software do you like. Do you use just the Enterprise Miner or other modules as well for Survival Analysis or Forecasting.

Michael- Our first data mining class used SGI’s Mineset for the hands-on examples. Later we developed versions using Clementine, Quadstone, and SAS Enterprise Miner. Then, market forces took hold. We don’t market our classes ourselves, we depend on others to market them and then share in the revenue.

SAS turned out to be much better at marketing our classes than the other companies, so over time we stopped updating the other versions. An odd thing about our relationship with SAS is that it is only with the education group. They let us use Enterprise Miner to develop course materials, but we are explicitly forbidden to use it in our consulting work. As a consequence, we don’t use it much outside of the classroom.

Ajay- Also any other software you use (apart from SQL and J)

Michael- We try to fit in with whatever environment our client has set up. That almost always is SQL-based (Teradata, Oracle, SQL Server, . . .). Often SAS Stat is also available and sometimes Enterprise Miner.

We run into SPSS, Statistica, Angoss, and other tools as well. We tend to work in big data environments so we’ve also had occasion to use Ab Initio and, more recently, Hadoop. I expect to be seeing more of that.

Biography-

Together with his colleague, Gordon Linoff, Michael Berry is author of some of the most widely read and respected books on data mining. These best sellers in the field have been translated into many languages. Michael is an active practitioner of data mining. His books reflect many years of practical, hands-on experience down in the data mines.

Data Mining Techniques cover

Data Mining Techniques for Marketing, Sales and Customer Relationship Management

by Michael J. A. Berry and Gordon S. Linoff
copyright 2004 by John Wiley & Sons
ISB

Mining the Web cover

Mining the Web

by Michael J.A. Berry and Gordon S. Linoff
copyright 2002 by John Wiley & Sons
ISBN 0-471-41609-6

Non-English editions available in Traditional Chinese and Simplified Chinese

This book looks at the new opportunities and challenges for data mining that have been created by the web. The book demonstrates how to apply data mining to specific types of online businesses, such as auction sites, B2B trading exchanges, click-and-mortar retailers, subscription sites, and online retailers of digital content.

Mastering Data Mining

by Michael J.A. Berry and Gordon S. Linoff
copyright 2000 by John Wiley & Sons
ISBN 0-471-33123-6

Non-English editions available in JapaneseItalianTraditional Chinese , and Simplified Chinese

A case study-based guide to applying data mining techniques for solving practical business problems. These “warts and all” case studies are drawn directly from consulting engagements performed by the authors.

A data mining educator as well as a consultant, Michael is in demand as a keynote speaker and seminar leader in the area of data mining generally and the application of data mining to customer relationship management in particular.

Prior to founding Data Miners in December, 1997, Michael spent 8 years at Thinking Machines Corporation. There he specialized in the application of massively parallel supercomputing techniques to business and marketing applications, including one of the largest database marketing systems of the time.