PAWS goes to SF

Conference: Message on the LinkedIn group of Decisionstats

 


Predictive Analytics World, Feb 16-17 in San Francisco

The agenda for Predictive Analytics World – Feb. 16-17 2010 in San Francisco – has been posted: http://www.pawcon.com/sanfrancisco/2010/agenda_overview.php

February’s PAW covers hot topics and advanced methods such as social data, uplift modeling (net lift), text mining, massively parallel analytics, in-cloud deployment, and innovative applications that benefit organizations in new and creative ways.

Be sure to register by December 18 for the Super Early Bird to save $400 off the Regular Price:
http://www.predictiveanalyticsworld.com/register.php

And take an additional $50 off the Super Early Bird with discount code: LIN150

Below is some more info – let me know if you have any questions.

-Eric Siegel, Conference Chair

———–

PAW-2010 includes 25 sessions across two tracks, so you can witness how predictive analytics is applied at 1-800-FLOWERS, Amazon.com, AT&T, BBC, Canadian Automobile Association, Charles Schwab, Continental Airlines, Deutsche Postbank, Google, Group RCI, IBM, PASSUR Aerospace, PayPal (eBay), Sun Microsystems, U.S. Army, Visa, Walmart Financial Services, and Younoodle, plus special examples from the U.S. government agencies CBP, NCMI, NGIC, NSA, and SSA.

Keynote speakers include Kim Larsen, Director Advanced Analytics at Charles Schwab, Andreas S. Weigend, Ph.D., Former Chief Scientist at Amazon.com, and Program Chair Eric Siegel, Ph.D., President of Prediction Impact and former Columbia University professor.

Predictive Analytics World is the business-focused event for predictive analytics professionals, managers and commercial practitioners, covering today’s commercial deployment of predictive analytics, across industries and across software vendors.

For more information, including three pre- and post-event workshops:
http://www.predictiveanalyticsworld.com

Losing a Million Bucks: Netflix Prize Interview

I (and collective pseudo-geeks across the world) lost a potential million dollars when the following team won the Netflix Prize. In disgust, I just renewed my Netflix subscription and noticed a 10% increase in the way I liked them.

Jokes apart, here is an excerpt (perhaps one of the few ever) from an interview with the Netflix winners conducted by the great Eric Siegel, Ph.D.

Eric is conference chair of Predictive Analytics World (a King Arthur’s round table of a conference for all the shining knights of the data analytics world).

Citation: http://www.predictiveanalyticsworld.com/layman-netflix-leader.php

[ES] With no relevant background in statistics — let alone product recommendations specifically — what capabilities or background did make your success possible? Do you consider yourselves mathematicians, or at least strong with math?

[MC] I am certainly not a mathematician – I have engineering level skill. I consider Martin Piotte to have an exceptional mathematical mind (he participated successfully in international math contests when he was a student) even though he never formally studied in that field. In the end, the mathematics used in this contest seem very complex, but are really rather simple. Compared to what most people think, this was more of an engineering contest than a mathematical contest [See Martin’s response below for elaboration on this central point. -Ed]. Also, I think that having a perhaps less in-depth but wider array of skills and knowledge helped us.

[ES] You’ve said, when first getting started, you learned many core strategies/techniques from the Netflix Prize discussion board. Did you do much reading or research elsewhere to ramp up?

[MC] Having started late in the competition, the forum was a good starting point as many avenues had already been explored and links had been posted to many interesting papers. In the end though, reading and getting a good understanding of the actual research papers was a very important step. The forum was also a place where people proposed new (sometimes far fetched) ideas; these ideas often inspired us to come up with our own creative innovations.

PAWS is a great place to meet, greet and do business. Though it is only 5 hours away, I have too much homework to do and grade while at the University of Tennessee (for now).

Here is a very interesting poll that they are running; it is good to see conferences take feedback in such a transparent manner:

paws poll

Interview Neil Raden Founder of Hired Brains Inc

Here is an interview with one terrific person who has always inspired my writing (or at least my attempts to write) on data and systems. Neil Raden is a giant in the publishing and consulting space for business intelligence, analytics, and decision management. In a nice interview, Neil talks of his passion for his work, his prolific authoring of white papers, his seminal work with James Taylor and how he sees the BI space evolving.

The history of BI pretty much follows the history of computing platforms. First we had time-sharing, then mainframes, then minis, then client-server vs. PC, then a number of passes at distributed computing, such as CORBA, then SOA and now the cloud. - Neil Raden


Ajay- Describe your career in math and technology and your current activities. How would you explain what you do for a living to a group of high school students who are wondering whether or not to take up mathematical and technical subjects?

Neil- I didn’t earn a dime at the career I was meant for, consulting, until I was 33 years old. So I would tell college students not to be in such a hurry to corner themselves into a career. It may take a while to figure out what you really want.

Though I went to college to study theatre, within a few weeks I was inspired by a math professor and switched my major. From that point on, it was pads of paper and sharp pencils. I was totally in my own head with math. I never took a statistics course, or even differential equations, because I was consumed by discrete math (graph theory too), topology and logic and later game theory/economics.

When I went looking for a job in 1974, in the midst of a deep recession, I was confronted with the stark reality (in New York) that I could be a COBOL programmer or an actuary. I chose the latter. Working at AIG in New York in the 70’s was pretty exciting. We broke new ground in commercial property and casualty insurance and reinsurance every week. I was part of a small R&D group under the chief actuary, who reported directly to Maurice Greenberg, the legendary (but now maligned) inventor of AIG, and I loved the work.

I had to go back and teach myself probability and statistics to get through the exams, but ultimately, two kids and one on the way in NYC on one not-so-great salary was a deal-breaker. I left AIG and joined a software company doing modeling and prediction. The rest, as they say, is history. I formed my own consulting company in 1985 and I’m still at it.

To me, consulting isn’t something you do between jobs or a title you get because you implement software for clients. Consulting is a craft, it’s a career and it is rather easy to do but very difficult to learn. I work very hard to teach this to people who work for me. It’s about commitment, hard work and, most of all, ethics and being authentic with your client.

Ajay- Writing books is lonely yet rewarding work. Could you briefly tell us about your recent book, Smart (Enough) Systems?

Neil- I have to credit my partner, James Taylor, with the concept for the book. He was working at Fair Isaac (now FICO) at the time and this was exactly what he was doing there. It was a little tangential to my work, but when James approached me, he said he wanted a partner who was proficient in the data integration and analytics aspects of EDM (Enterprise Decision Management).

James made it pretty easy because 1) he is very prolific and 2) he took most of my comments and integrated them without argument.

I’d say I was pretty lucky and it went very well. I don’t know if I’ll ever write another book. I suppose I won’t know until the idea hits me. I’m sure it will be more difficult doing it on my own.

Ajay- What are the various stages that you have seen the BI industry go through? What are the next few years going to bring us? What is on your wishlist of changes the industry should make for better customer ROI?

Neil- The history of BI pretty much follows the history of computing platforms. First we had time-sharing, then mainframes, then minis, then client-server vs. PC, then a number of passes at distributed computing, such as CORBA, then SOA and now the cloud. But while the locus of BI storage, computing and presentation has changed, its focus changes very slowly.

Historically, there have been two major subject areas in BI: finance and sales/marketing. All of the other subject areas still rest on the periphery.

Complex Event Processing (CEP), for example, is making a lot of noise lately, but there is not much implementation. Visualization is here to stay. When the BI app and the Web app are the same, BI will be everywhere, but it will be a sort of pyrrhic victory because it won’t be recognized as such. Now you can take all of this with a grain of salt because I don’t really follow the industry per se; I’m more interested in how my clients can apply the technology to get the results they need.

Ajay- There is a lot of buzz about predictive analytics lately. Do you think it will have a noticeable impact or is it just the latest thing?

Neil- There are only so many people who understand quantitative methods and it isn’t going to grow very much. This puts a damper on PA (Predictive Analytics) because no manager is going to act on the recommendations of a black box without an articulate quant who can explain the methodology and the limits of its precision.

That isn’t a bad thing, and those who practice in predictive analytics will prosper.

On the other hand, I believe there will be an expansion of the use of generic PA models that have been vetted in practice. The FICO score is a good example, and the ability to develop and implement these applications (it’s much easier now thanks to PA software and computing environments in general) should allow for a nice market to develop around them. This is especially true with decision automation systems, like logistics, material handling, credit authorization, etc.

Ajay- What were your most interesting projects as an implementer? Most rewarding?

Neil- Most interesting: I was the Chairman of an Advisory Board at Sandia National Laboratories for a few years. Our goal was to encourage the lab to adopt more modern and effective information management tools for their dual purpose of

1) designing and manufacturing nuclear weapons (frightening isn’t it?) and

2) certification of nuclear waste repositories.

I was able to work with scientists, physicists, engineers, geologists and computer scientists, all from backgrounds very different from those I normally engaged. The problems were monumental.

Most rewarding: We developed a data warehouse to capture the daily sales of products at the most detailed level for a cosmetics company. They never had this information before because the retailers were counters in hundreds of department stores. Thus they were able for the first time to truly understand the “sell through” of their products. Beyond just allowing a better understanding of the flow, they could tailor their promotions and, not much later, implement a continuous replenishment system.

The president of the company came to the launch and explained how we had allowed the company to do things it had never done before which would change it for the better. You don’t get those accolades from the CEO very often.

Ajay- You’ve written forty white papers. That’s a lot. What impact do you think they’ve had?

Neil- I couldn’t tell you. I don’t track downloads; my website doesn’t even require registration. I don’t see them quoted or cited very often, but then, people don’t quote or cite others’ work in this field very often anyway. I can say that I have many repeat customers among the vendors, so they must be deriving some value from them.

Ajay- What are your views on creating a community for the top 100 BI analysts in the world, a bit like a Reuters or a partnership firm? How pleased do you think BI vendors will be by this?

Neil- I was actually involved in an effort like this about a dozen years ago, called BI Alliance. Doug Hackney and I started it, and we had about a dozen BI luminaries in the organization. I’ll try to remember some: Sid Adelman, David Marco, Richard Winter, David Foote, Herb Edelstein.

You could only join if you were an independent or the head of your own firm.

It was a useful marketing tool as we were able to 1) share references and 2) staff projects. But it sort of lost its inertia after a few years.

But a few hundred BI analysts? Are there that many?? LOL I don’t know how the vendors would react, but I sort of doubt this sort of organization would have any kind of clout – too many divergent opinions.

Ajay- Do you think the work you do matters?

Neil- It certainly has an economic impact on my family! LOL I don’t know, I hope it does and proportionate to my income versus the size of the industry, yes, I guess it does. Not necessarily directly though.

A company in Dayton or Macon doesn’t make a decision because I said so, but I think I do influence some analysts and vendors and to the extent I influence them, then I guess I do. I limit my analysis to my clients. If they think this work matters, then it does.

Biography-

Neil Raden, consultant, analyst and author, is followed by technology providers, consultants and even other analysts. His knowledge of analytical applications is the result of thirty years of intensive work. He is the founder of Hired Brains, a research and advisory firm in Santa Barbara, CA, offering research and analysis services to technology providers as well as providing consulting and implementation services. Mr. Raden began his career as a casualty actuary with AIG before moving into software engineering and consulting in the application of analytics in fields as diverse as health care, nuclear waste management and cosmetics marketing. His blog can be found at intelligententerprise.com/experts/raden/. He is the author of dozens of articles and white papers, has contributed to numerous books, and is the co-author of “Smart (Enough) Systems” (Prentice Hall, 2007) with James Taylor. nraden@hiredbrains.com

Alternatively, you can just follow Neil Raden on Twitter at neilraden

Interview SPSS Olivier Jouve

SPSS recently launched a major series of products in its text mining and data mining product portfolio and rebranded data mining to the PASW series. In an exclusive and extensive interview, Olivier Jouve, Vice President, Corporate Development at SPSS Inc, talks of science careers, the recent launches, open source support for R by SPSS, cloud computing and business intelligence.

Ajay: Describe your career in Science. Are careers in science less lucrative than careers in business development? What advice would you give to people re-skilling in the current recession on learning analytical skills?

Olivier: I have a Master of Science in Geophysics and Master of Science in Computer Sciences, both from Paris VI University. I have always tried to combine science and business development in my career as I like to experience all aspects – from idea to concept to business plan to funding to development to marketing to sales.

There was a study published earlier this year that said two of the three best jobs are related to math and statistics. This is reinforced by three societal forces that are converging – better uses of mathematics to drive decision making, the tremendous growth and storage of data, and especially in this economy, the ability to deliver ROI. With more and more commercial and government organizations realizing the value of Predictive Analytics to solve business problems, being equipped with analytical skills can only enhance your career and provide job security.

Ajay: So SPSS has launched new products within its Predictive Analytics Software (PASW) portfolio – Modeler 13 and Text Analytics 13? Is this old wine in a new bottle? What is new in technical terms? What is new for customers looking to mine textual information?

Olivier: Our two new products – PASW Modeler 13 (formerly Clementine) and PASW Text Analytics 13 (formerly Text Mining for Clementine) – extend and automate the power of data mining and text analytics to the business user, while significantly enhancing the productivity, flexibility and performance of the expert analyst.

PASW Modeler 13 data mining workbench has new and enhanced functionality that quickly takes users through the entire data mining process – from data access and preparation to model deployment. Some of the newest features include Automated Data Preparation that conditions data in a single step by automatically detecting and correcting quality errors; Auto Cluster that gives users a simple way to determine the best cluster algorithm for a particular data set; and full integration with PASW Statistics (formerly SPSS Statistics).

With PASW Text Analytics 13, SPSS provides the most complete view of the customer through the combined analysis of text, web and survey data. While other companies only provide the text component, SPSS couples text with existing structured data, permitting more accurate results and better predictive modeling. The new version includes pre-built categories for satisfaction surveys, advanced natural language processing techniques, and it supports more than 30 different languages.

Ajay: SPSS has supported open source platforms – Python and R – before it became fashionable to do so. How has this helped your company?

Olivier: Open source software helps the democratization of the analytics movement and SPSS is keen on supporting that democratization while welcoming open source users (and their creativity) into the analytics framework.

Ajay: What are the differences and similarities between Text Analytics and Search Engines? Can we mix the two as well using APIs?

Olivier: Search Engines are fundamentally top-down in that you know what you are looking for when launching a query. However, Text Analytics is bottom-up, uncovering hidden patterns, relationships and trends locked in unstructured data – including call center notes, open-ended survey responses, blogs and social networks. Now businesses have a way of pulling key concepts and extracting customer sentiments, such as emotional responses, preferences and opinions, and grouping them into categories.

For instance, a call center manager will have a hard time working out why customers are unhappy and churn by running a search engine over millions of call center notes. What would be the query? But, by using Text Analytics, that same call center manager will discover the main reasons why customers are unhappy, and be able to predict if they are going to churn.

Ajay: Why is Text Analytics so important? How will companies use it now and into the future?

Olivier: Actually, the question you should ask is, “Why is unstructured data so important?” Today, more than ever, people love to share their opinions – through the estimated 183 billion emails sent, the 1.6 million blog posts, millions of inquiries captured in call center notes, and thousands of comments on diverse social networking sites and community message boards. And, let’s not forget all the data that flows through Twitter. Companies today would be short-sighted to ignore what their customers are saying about their products and services, in their own words. Those opinions – likes and dislikes – are essential nuggets and carry far more insight than demographic or transactional data for reducing customer churn, improving satisfaction, fighting crime, detecting fraud and increasing marketing campaign results.

Ajay: How is SPSS venturing into cloud computing and SaaS?

Olivier: SPSS has been at the origin of the PMML standard to allow organizations to provision their computing power in a very flexible manner – just like provisioning computing power through cloud computing. SPSS strongly believes in the benefits of a cloud computing environment, which is why all of our applications are designed with Service Oriented Architecture components. This enables SPSS to be flexible enough to meet the demands of the market as they change with respect to delivery mode. We are currently analyzing business and technical issues related to SPSS technologies in the cloud, such as the scoring and delivery of analytics. In regards to SaaS, we currently offer hosted services for our PASW Data Collection (formerly Dimensions) survey research suite of products.

Ajay: Do you think business intelligence is an over used term? Why do you think BI and Predictive Analytics failed in mortgage delinquency forecasting and reporting despite the financial sector being a big spender on BI tools?

Olivier: There is a big difference between business intelligence (BI) and Predictive Analytics. Traditional BI technologies focus on what’s happening now or what’s happened in the past by primarily using financial or product data. For organizations to take the most effective action, they need to know and plan for what may happen in the future by using people data – and that’s harnessed through Predictive Analytics.

Another way to look at it – Predictive covers the entire capture, predict and act continuum – from the use of survey research software to capture customer feedback (attitudinal data), to creating models to predict customer behaviors, and then acting on the results to improve business processes. Predictive Analytics, unlike BI, provides the secret ingredient and answers the question, “What will the customer do next?”

That being said, financial institutions didn’t need to use Predictive Analytics to see that some lenders sold mortgages to unqualified individuals likely to default. Predictive Analytics is an incredible application used to detect fraud, waste and abuse. Companies in the financial services industry can focus on mitigating their overall risk by creating better predictive models that not only encompass richer data sets, but also better rules-based automation.

Ajay: What do people do at SPSS to have fun when they are not making complex mathematical algorithms?

Olivier: SPSS employees love our casual, friendly atmosphere, our professional and talented colleagues, and our cool, cutting-edge technology. The fun part comes from doing meaningful work with great people, across different groups and geographies. Of course being French, I have ensured that my colleagues are fully educated on the best wine and cuisine. And being based in Chicago, there is always a spirited baseball debate between the Cubs and White Sox. However, I am yet to convince anyone that rugby is a better sport.

Biography

Olivier Jouve is Vice President, Corporate Development, at SPSS Inc. He is responsible for defining SPSS strategic directions and growth opportunities through internal development, mergers and acquisitions and/or tactical alliances. As a pioneer in the field of data and text mining for the last 20 years, he has created the foundation of Text Analytics technology for analyzing customer interactions at SPSS. Jouve is a successful serial entrepreneur and has had his works published internationally in the area of Analytical CRM, text mining, search engines, competitive intelligence and knowledge management.

KNIME and Zementis shake hands

Two very good and very customer-centric (and open source) companies shook hands on a strategic partnership today.

KNIME (www.knime.org) and Zementis (www.zementis.com).

Decision Stats has been covering these companies. Both products are amazingly good, sync very well thanks to their support of the PMML standard, and lower costs considerably for the consumer (http://www.decisionstats.com/2009/02/knime/ and http://www.decisionstats.com/2009/02/interview-michael-zeller-ceozementis/).

KNIME has both a free personal license and a commercial license, and it supports R very well thanks to PMML (the www.dmg.org initiative).

See http://www.knime.org/blog/export-and-convert-r-models-pmml-within-knime

The following example R script learns a decision tree based on the Iris-Data and exports this as PMML and as an R model which is understood by the R Predictor node:

# load the library for learning a tree model
library(rpart)
# load the pmml export library
library(pmml)
# R holds the input data table handed to the script by KNIME;
# use its class column as the predicted column to build the decision tree
dt <- rpart(class ~ ., data = R)
# export to PMML
r_pmml <- pmml(dt)
# write the PMML model to an export file
write(toString(r_pmml), file = "C:/R.pmml")
# provide the native R model at the out-port
R <- dt
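
For readers who want to try the same idea outside KNIME, here is a minimal standalone sketch. It is not from the KNIME blog post: it assumes R's built-in iris data frame (whose label column is Species rather than class), writes to a hypothetical temporary file name, and treats the pmml package as optional, exporting only if it is installed.

```r
# Standalone sketch (assumptions: built-in iris data, temp-file output path)
library(rpart)  # decision trees; ships with standard R distributions

# Fit a classification tree predicting Species from the four measurements
dt <- rpart(Species ~ ., data = iris, method = "class")

# Quick sanity check: predicted classes on the training data
pred <- predict(dt, newdata = iris, type = "class")
accuracy <- mean(pred == iris$Species)

# Export to PMML only when the optional pmml package is available
pmml_file <- file.path(tempdir(), "iris_tree.pmml")  # hypothetical file name
if (requireNamespace("pmml", quietly = TRUE)) {
  r_pmml <- pmml::pmml(dt)                # build the PMML document for the tree
  XML::saveXML(r_pmml, file = pmml_file)  # XML is installed alongside pmml
}
```

The resulting PMML file is the kind of artifact that a PMML-aware scoring engine could load, which is the hand-off the partnership above relies on.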

 

Zementis brings the total cost of ownership and the total pain of creating scored models down to something close to $1/hour, thanks to its proprietary ADAPA engine.