The world of Predictive Analytics: It's back

PAW 2009 is back with a slam-dunk lineup of sponsors and keynote speakers.

The deadline for early bird registration is Sept 4.

What’s holding you back?

https://www.eiseverywhere.com/ereg/index.php?eventid=5215&PHPSESSID=36nq7po4hjoasvkcsv0tm5ppj3&

Pricing
Predictive Analytics World Fall 2009

Includes breakfasts, lunches, priceless networking during coffee breaks, the PAW Reception, and full access to program sessions and sponsor expositions.

Event                                            Early Bird Price     Regular Price
                                                 (July 1 – Sept 4)

Two Day Pass (Oct 20-21)                         $1390                $1590
Predictive Modeling Methods Workshop (Oct 22)    $795                 $895
Putting Predictive Analytics to Work (Oct 19)    $795                 $895
Hands-On Predictive Analytics (Oct 19)           $795                 $895


Disclaimer- I have no monetary transactions with the PAW conference, but as a blog partner I get access to interviews, book reviews, or content.

How they stack up: IDC on Business Analytics

So here is Intelligent Enterprise on the latest IDC rankings of Business Intelligence and Business Analytics vendors. If you ever wondered how big the big boys were, read it at the link cited below.

Citation:

http://www.intelligententerprise.com/info_centers/ent_dev/showArticle.jhtml;jsessionid=QL4IYMWB1MSIHQE1GHPSKHWATMY32JVN?articleID=219401120

In 2008, Oracle led the overall market, followed in order by SAP, IBM, SAS and Microsoft, the report said. Rounding out the top 10 were Teradata, Fair Isaac, Informatica, Infor and MicroStrategy, respectively

and

IDC divides the business analytics software market into four primary segments: analytic applications, business intelligence tools, data warehousing platform software and spatial information analytics tools.

and

Fourth-place SAS’ broad portfolio spans all business analytics market segments and is exclusively dedicated to this market. “The company leads in the advanced analytics tools segment and is within the top two vendors in two other market segments,” IDC said.

It’s a brilliant analysis and survey. IDC and Intelligent Enterprise- thanks a tonne for letting us know.

Best of Decision Stats- Modeling and Text Mining Part3

Here are some of the top articles by views in an area I love: modeling and text mining.

1) Karl Rexer – Rexer Analytics

http://www.decisionstats.com/2009/06/09/interview-karl-rexer-rexer-analytics/

Karl produces one of the most respected surveys that captures emerging trends in data mining and technology. Karl was also one of the most enthusiastic people I have interviewed- and I am thankful for his help in getting me some more interviews.

2) Gregory Piatetsky-Shapiro

One of the earliest and easily the best knowledge discoverer of all time, Gregory produces http://www.kdnuggets.com, whose newsletter is easily the must-have newsletter to be on. Gregory was doing data mining while the Google boys were still debating whether or not to drop out of Stanford.

So what happened to S Plus

S-Plus, the corporate version of S (the predecessor of R), is still being marketed by Tibco Corporation, again rumoured to be an acquisition target of (???):

  • SAS (who have desired R-like capabilities, especially in their IML product to be released soon)
  • SAP, who lost out to IBM in the SPSS acquisition
  • Oracle
  • Microsoft
  • Rogue Wave (acquirer of Visual Numerics)
  • etc etc.

Anyway, S Plus is still alive and kicking:

“The S language and the S+ application have been critical to our ability to manage big data objects intrinsic to wind analytics and wind energy development,” said Brad Horn, Director of Wind Analytics at NextEra Energy.  “We credit our long-term interface and Spotfire consulting with unlocking new ideas and sources of value.  Joint dialogue on configuration alternatives and our recent efforts to restructure legacy code is allowing us to transition from simple interactive use of S+ to a customized S+ configuration with integrated batch processing, server load balancing, and parallel processing.  S+ has a central role in supporting internal decisions and our group emphasis on scale, speed, and quality.”

http://spotfire.tibco.com/news/press-releases/2009/2_17_2009.aspx

  • Wavelets, Spatial Stats, EnvironmentalStats: Apply statistics for advanced analysis of signal and image data, spatially correlated data, and environmental data.
  • Resampling: Apply resampling techniques, such as bootstrap and permutation tests, to enable the use of standard statistics on smaller data sets.
  • Association Rules: Uncover relationships between variables in large data sets, most commonly to detect purchase patterns (Market Basket Analysis), or in many other areas like web site usage analysis.
  • Recode Values: Easily handle and prepare data from multiple sources by changing the values in a column to a new value.
  • Deployment and Integration:

    • Spotfire Integration: Read and write Spotfire Text Data files, and leverage examples of using Spotfire Professional to visualize, explore and share model results.
    • Custom Java & C++ nodes: Extend Spotfire Miner by writing custom nodes in Java and C++.
    • Remote Script Execution: Execute S+ scripts remotely on S+ Server to offload and distribute intensive jobs.
    • Global Worksheet Parameters: Make workflows more flexible and reusable to interactive and batch applications.
    • FlexBayes: Create more realistic models, provide a natural way to address missing data, and take advantage of prior analysis.

    Data Access and Preparation:

    • New Data File Types: Unlock more data sources by reading new formats including Spotfire Text Data, Microsoft Excel 2007, Microsoft Access 2007, and Matlab 7.
    • JDBC Access: Access new data sources for analysis with data import and export via the sjdbc library in Spotfire S+ 8.1.

    Citation:

    http://spotfire.tibco.com/Products/S-Plus-Overview.aspx

    http://spotfire.tibco.com/Products/Whatsnew-Splus.aspx
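
The Resampling item in the feature list above mentions bootstrap and permutation tests for small data sets. As a rough illustration only, written in Python rather than S+ and not taken from the Spotfire documentation, a percentile bootstrap confidence interval for a mean might be sketched like this. The function name bootstrap_ci, the sample values, and the default parameters are all hypothetical.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.

    Hypothetical helper for illustration; not an S+ or Spotfire API.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same interval
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up small sample, e.g. eight measurements.
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]
low, high = bootstrap_ci(sample)
print(f"sample mean: {statistics.mean(sample):.3f}")
print(f"approx. 95% bootstrap interval: ({low:.3f}, {high:.3f})")
```

The idea is that the spread of the resampled means stands in for the sampling distribution, so no normality assumption is needed for a small sample. A permutation test follows the same pattern, but shuffles group labels instead of resampling with replacement.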


    Interview Gregory Piatetsky KDNuggets.com

    Here is an interview with Gregory Piatetsky, founder and editor of KDnuggets (www.KDnuggets.com), one of the oldest and biggest independent industry websites for data mining and analytics.


    Ajay- Please describe your career in science and the many challenges and rewards that came with it. Name any scientific research, degrees, teaching, etc.


    Gregory-
    I was born in Moscow, Russia and went to a top math high school in Moscow. A unique challenge for me was that my father was one of the leading mathematicians in the Soviet Union. While I liked math (and still do), I quickly realized while still in high school that I would never be as good as my father, and that a math career was not for me.

    Fortunately, I discovered computers and really liked the process of programming and solving applied problems.  At that time (late 1970s) computers were not very popular and it was not clear that one can make a career in computers.  However I was very lucky that I was able to pursue what I liked and find demand for my skills.

    I got my MS in 1979 and PhD in 1984 in Computer Science from New York University.
    I was interested in AI (perhaps thanks to a lot of science fiction I read as a kid), but found a job in databases, so I was looking for ways to combine them.

    In 1984 I joined GTE Labs where I worked on research in databases and AI, and in 1989 started the first project on Knowledge Discovery in data. To help convince my management that there will be a demand for this thing called “data mining” (GTE management did not see much future for it), I also organized a AAAI workshop on the topic.

    I thought “data mining” is not sexy enough name, and so I called it “Knowledge Discovery in Data”, or KDD.  Since 1989, I was working on KDD and data mining in all aspects – more on my page www.kdnuggets.com/gps.html

    Ajay- How would you encourage a young science entrepreneur in this recession?

    Gregory- Many great companies were started or grew in a recession, e.g.
    http://www.insidecrm.com/features/businesses-started-slump-111108/

    Recession may be compared to a brush fire which removes dead wood and allows new trees to grow.

    Ajay- What prompted you to set up KD Nuggets? Any reasons for the name (kNowledge Discovery Nuggets). Describe some key milestones in this iconic website for data mining people.

    Gregory- After a third KDD workshop in 1993 I started a newsletter to connect about 50 people who attended the workshop and possibly others who were interested in data mining and KDD.  The idea was that it will have short items or “nuggets” of information. Also, at that time a popular metaphor for data miner was gold miners who were looking for gold “nuggets”.  So, I wanted a newsletter with “nuggets” – short, valuable items about Knowledge Discovery.  Thus, the name KDnuggets.

    In 1994 I created a website on data mining at GTE and in 1997, after I left  GTE , I moved it to the current domain name www.kdnuggets.com .

    In 1999, I was working for startup which provided data mining services to financial industry.  However, because of Y2K issues, all banks etc froze their systems in the second half of 1999, and we had very little work (and our salaries were reduced as well).  I decided that I will try to get some ads and was able to get companies like SPSS and Megaputer to advertise.

    Since 2001, I am an independent consultant and KDnuggets is only part of what I am doing.  I also do data mining consulting, and actively participate in SIGKDD (Director 1998-2005, Chair 2005-2009).

    Some people think that KDnuggets is a large company, with a publisher, webmaster, editor, ad salesperson, billing dept, etc.  KDnuggets indeed has all these functions, but it is all me and my two cats.

    Ajay- I am impressed by the fact that KDnuggets is almost a dictionary or encyclopedia for data mining. But apart from advertising you have not been totally commercial: many features of your newsletter remain ad free, you still maintain a minimalistic look, and you do not take sponsorship aligned with one big vendor. What is your vision for keeping KDnuggets truly independent in the years to come?

    Gregory- My vision for KDnuggets is to be a comprehensive resource for the data mining community, and I really enjoyed maintaining such a resource for the first 7-8 years completely non-commercially. However, when I became self-employed, I could not do KDnuggets without any income, so I selectively introduced ads, and only those which are relevant to data mining.

    I like to think of KDnuggets as a Craigslist for the data mining community.

    I certainly realize the importance of social media and Web 2.0 (and interested people can follow my tweets at tweeter.com/kdnuggets)  and plan to add more social features to KDnuggets.

    Still, just like Wikipedia and Facebook do not make New York Times obsolete, I think there is room and need for an edited website, especially for such a nerdy and not very social group like data miners.

    Ajay- What is the worst mistake or error in writing or publishing that you made? What is the biggest triumph or high moment in the Nuggets' history?

    Gregory- My biggest mistake is probably in choosing the name kdnuggets – in retrospect,  I could have used a shorter and easier to spell domain name, but in 1997 I never expected that I will still be publishing www.KDnuggets.com 12 years later.

    Ajay- Who are your favourite data mining students (having known so many people)? What qualities do you think set a data mining person apart from those in other sciences?

    Gregory- I was only an adjunct professor for a short time, so I did not really have data mining students, but I was privileged enough to know many current data mining leaders when they were students.  Among more recent students, I am very impressed with Jure Leskovec, who just finished his PhD and got the best KDD dissertation award.

    Ajay- What does Gregory Piatetsky do for fun when he is not informing the world on analytics and knowledge discovery?

    Gregory- I enjoy travelling with my family, and in the summer I like biking and windsurfing.
    I also read a lot, and am currently in the middle of reading Proust (which I periodically dilute with other, lighter books).

    Ajay- What are your favourite blogs and websites to read? Any plans to visit India?

    Gregory- I visit many blogs on www.kdnuggets.com/websites/blogs.html

    and I like especially
    – Matthew Hurst blog: Data Mining: Text Mining, Visualization, and Social Media
    – Occam’s Razor by Avinash Kaushik, examining web analytics.
    – Juice Analytics, blogging about analytics and visualization
    – Geeking with Greg, exploring the future of personalized information.

    I also like your website decisionstats.com and plan to visit it more frequently

    I visited many countries, but not yet India – waiting for the right occasion !

    Biography

    (http://www.kdnuggets.com/gps.html)

    Gregory Piatetsky-Shapiro, Ph.D. is the President of KDnuggets, which provides research and consulting services in the areas of data mining, web mining, and business analytics. Gregory is considered to be one of the founders of the data mining and knowledge discovery field. Gregory edited or co-edited many collections on data mining and knowledge discovery, including two best-selling books: Knowledge Discovery in Databases (AAAI/MIT Press, 1991) and Advances in Knowledge Discovery in Databases (AAAI/MIT Press, 1996), and has over 60 publications in the areas of data mining, artificial intelligence and database research.

    Gregory is the founder of Knowledge Discovery in Database (KDD) conference series. He organized and chaired the first three Knowledge Discovery in Databases (KDD) workshops in 1989, 1991, and 1993. He then served as the Chair of KDD Steering committee and guided the conversion of KDD workshops into leading international conferences on data mining. He also was the General Chair of the KDD-98 conference.

    Social Network Analysis: Using R

    Here is a great video and slides on doing statistical network analysis using R. It is by Drew Conway from NYU.

    Social Network Analysis in R from Drew Conway on Vimeo.