Visual Guides to CRISP-DM, KDD and SEMMA

UPDATED: Here are three great examples of visualizations that make a process easy to understand. Please click on the images to read them clearly.

1) The first visualizes CRISP-DM and was made by Nicole Leaper.


2) KDD (Knowledge Discovery in Databases): a visualization by Fayyad, whom I have interviewed, with work by Gregory Piatetsky-Shapiro, who has also been interviewed by this website.


3) I am also attaching a visual representation of SEMMA.



Knowledge Discovery in Databases (KDD) using PostgreSQL and #Rstats

Here is a brief primer for beginners on configuring an open source database and querying it from an open source analytics package.

All you need to know is how to read!


1. Download PostgreSQL from the PostgreSQL website.

Remember to store or memorize the password for the user postgres!

Create a connection using pgAdmin, which is available from the Start Menu.

2. Download the PostgreSQL ODBC driver (use the Win 64 edition on 64-bit Windows) and install it.

3. Go to

Start Menu\Control Panel\All Control Panel Items\Administrative Tools\Data Sources (ODBC)

4. Configure the connection details in System DSN and User DSN using the Add button. Test the connection to check that it is working.

5. Start R, then install and load the RODBC package.

6. Use the following initial code in R. If you know SQL, you can do the rest.
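> library(RODBC)

> # List the ODBC data sources (DSNs) visible to R
> odbcDataSources(type = c("all", "user", "system"))
SQLServer              PostgreSQL30             PostgreSQL35W
"SQL Server"    "PostgreSQL ANSI(x64)" "PostgreSQL Unicode(x64)"

> # Connect using the DSN name configured in step 4
> ajay <- odbcConnect("PostgreSQL30", uid = "postgres", pwd = "XX")

> # List the tables available through this connection
> sqlTables(ajay)
1        postgres      public      names      TABLE

> # Read the whole "names" table into a data frame
> crimedat <- sqlFetch(ajay, "names")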
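
For the "if you know SQL" part, here is a minimal sketch of running an arbitrary SQL query through the same kind of connection and closing it afterwards. The helper name run_query is hypothetical (it is not part of RODBC), and the DSN, user, password and table name are simply the placeholder values from the session above. The RODBC functions used (odbcConnect, sqlQuery, odbcClose) are called with the package prefix, so the helper can be defined before the package is loaded.

```r
# Hypothetical helper: open a connection, run one SQL query, close the connection.
run_query <- function(dsn, uid, pwd, query) {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  # odbcConnect() returns -1 (with a warning) when the connection fails;
  # a successful connection has class "RODBC"
  if (!inherits(channel, "RODBC")) {
    stop("could not connect to DSN: ", dsn)
  }
  # Close the connection when the function exits, even after an error
  on.exit(RODBC::odbcClose(channel))
  # sqlQuery() returns a data frame on success, or a character vector
  # of error messages if the query fails
  RODBC::sqlQuery(channel, query)
}

# Example call (needs the DSN from step 4 and a running PostgreSQL server):
# first_rows <- run_query("PostgreSQL30", "postgres", "XX",
#                         "SELECT * FROM names LIMIT 10")
# head(first_rows)
```

Compared with sqlFetch, which pulls a whole table, sqlQuery lets the database do the filtering and aggregation, so only the rows you need travel into R.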

Interview: Gregory Piatetsky

Here is an interview with Gregory Piatetsky, founder and editor of KDnuggets, one of the oldest and biggest independent industry websites for data mining and analytics.


Ajay- Please describe your career in science, and the many challenges and rewards that came with it. Name any scientific research, degrees, teaching, etc.

Gregory- I was born in Moscow, Russia and went to a top math high school in Moscow. A unique challenge for me was that my father was one of the leading mathematicians in the Soviet Union. While I liked math (and still do), I quickly realized while still in high school that I would never be as good as my father, and that a math career was not for me.

Fortunately, I discovered computers and really liked the process of programming and solving applied problems. At that time (late 1970s) computers were not very popular and it was not clear that one could make a career in computers. However, I was very lucky that I was able to pursue what I liked and find demand for my skills.

I got my MS in 1979 and PhD in 1984 in Computer Science from New York University.
I was interested in AI (perhaps thanks to a lot of science fiction I read as a kid), but found a job in databases, so I was looking for ways to combine them.

In 1984 I joined GTE Labs, where I worked on research in databases and AI, and in 1989 started the first project on Knowledge Discovery in data. To help convince my management that there would be a demand for this thing called “data mining” (GTE management did not see much future for it), I also organized an AAAI workshop on the topic.

I thought “data mining” was not a sexy enough name, and so I called it “Knowledge Discovery in Data”, or KDD. Since 1989, I have been working on KDD and data mining in all aspects – more on my page.

Ajay- How would you encourage a young science entrepreneur in this recession?

Gregory- Many great companies were started or grew in a recession.

A recession may be compared to a brush fire, which removes dead wood and allows new trees to grow.

Ajay- What prompted you to set up KDnuggets? Any reasons for the name (Knowledge Discovery Nuggets)? Describe some key milestones in this iconic website for data mining people.

Gregory- After the third KDD workshop in 1993 I started a newsletter to connect about 50 people who attended the workshop and possibly others who were interested in data mining and KDD. The idea was that it would have short items or “nuggets” of information. Also, at that time a popular metaphor for data miners was gold miners who were looking for gold “nuggets”. So, I wanted a newsletter with “nuggets” – short, valuable items about Knowledge Discovery. Thus, the name KDnuggets.

In 1994 I created a website on data mining at GTE and in 1997, after I left GTE, I moved it to the current domain name.

In 1999, I was working for a startup which provided data mining services to the financial industry. However, because of Y2K issues, all banks etc. froze their systems in the second half of 1999, and we had very little work (and our salaries were reduced as well). I decided that I would try to get some ads and was able to get companies like SPSS and Megaputer to advertise.

Since 2001, I have been an independent consultant and KDnuggets is only part of what I am doing. I also do data mining consulting, and actively participate in SIGKDD (Director 1998-2005, Chair 2005-2009).

Some people think that KDnuggets is a large company, with publisher, webmaster, editor, ad salesperson, billing dept, etc. KDnuggets indeed has all these functions, but it is all me and my two cats.

Ajay- I am impressed by the fact that KDnuggets is almost a dictionary or encyclopedia for data mining. But apart from advertising you have not been totally commercial: many features of your newsletter remain ad free, you still maintain a minimalistic look, and you do not take sponsorship aligned with one big vendor. What is your vision for keeping KDnuggets truly independent in the years to come?

Gregory- My vision for KDnuggets is to be a comprehensive resource for the data mining community, and I really enjoyed maintaining such a resource for the first 7-8 years completely non-commercially. However, when I became self-employed, I could not do KDnuggets without any income, so I selectively introduced ads, and only those which are relevant to data mining.

I like to think of KDnuggets as a Craigslist for the data mining community.

I certainly realize the importance of social media and Web 2.0 (interested people can follow my tweets) and plan to add more social features to KDnuggets.

Still, just like Wikipedia and Facebook do not make the New York Times obsolete, I think there is room and need for an edited website, especially for such a nerdy and not very social group as data miners.

Ajay- What is the worst mistake or error in writing or publishing that you have made? What is the biggest triumph or high moment in the history of KDnuggets?

Gregory- My biggest mistake is probably in choosing the name kdnuggets – in retrospect, I could have used a shorter and easier to spell domain name, but in 1997 I never expected that I would still be publishing 12 years later.

Ajay- Who are your favourite data mining students (having known so many people)? What qualities do you think set a data mining person apart from people in other sciences?

Gregory- I was only an adjunct professor for a short time, so I did not really have data mining students, but I was privileged enough to know many current data mining leaders when they were students.  Among more recent students, I am very impressed with Jure Leskovec, who just finished his PhD and got the best KDD dissertation award.

Ajay- What does Gregory Piatetsky do for fun when he is not informing the world on analytics and knowledge discovery?

Gregory- I enjoy travelling with my family, and in the summer I like biking and windsurfing. I also read a lot, and am currently in the middle of reading Proust (which I periodically dilute with other, lighter books).

Ajay- What are your favourite blogs and websites to read? Any plans to visit India?

Gregory- I visit many blogs, and I especially like:
– Matthew Hurst blog: Data Mining: Text Mining, Visualization, and Social Media
– Occam’s Razor by Avinash Kaushik, examining web analytics.
– Juice Analytics, blogging about analytics and visualization
– Geeking with Greg, exploring the future of personalized information.

I also like your website and plan to visit it more frequently.

I have visited many countries, but not yet India – waiting for the right occasion!



Gregory Piatetsky-Shapiro, Ph.D. is the President of KDnuggets, which provides research and consulting services in the areas of data mining, web mining, and business analytics. Gregory is considered to be one of the founders of the data mining and knowledge discovery field. Gregory edited or co-edited many collections on data mining and knowledge discovery, including two best-selling books: Knowledge Discovery in Databases (AAAI/MIT Press, 1991) and Advances in Knowledge Discovery in Databases (AAAI/MIT Press, 1996), and has over 60 publications in the areas of data mining, artificial intelligence and database research.

Gregory is the founder of the Knowledge Discovery in Databases (KDD) conference series. He organized and chaired the first three Knowledge Discovery in Databases (KDD) workshops in 1989, 1991, and 1993. He then served as the Chair of the KDD Steering Committee and guided the conversion of KDD workshops into leading international conferences on data mining. He also was the General Chair of the KDD-98 conference.

Conferences: KXEN and KDD 09

Here is an announcement regarding one of the foremost conferences on knowledge discovery, KDD 2009, which is being held in Paris. We have previously interviewed the joint general chair of the conference, KXEN’s Francoise Soulie Fogelman.

Indeed, given KXEN’s exciting release of its social network analysis software, KSN, the company is also a gold sponsor of the conference. You should view the archives here or read more here

From KXEN’s press release:

World’s Best Data Mining Knowledge and Expertise on Show in Paris at KDD-09

Eminent data mining researchers, academics and practitioners from across the world are honing their presentation skills and charging their laptops in readiness for the industry’s largest and most respected conference, this year being staged for the first time in Europe, in the city of Paris.

The knowledge discovery and data mining 2009 (KDD-09) event will bring together more than 600 specialists, representing the single largest body of expertise in the science and application of data mining technology for industry, government and academia. They will discuss recent discoveries in data mining and share innovative ways of applying the technology in real world business.

Running from the 28th June to 1st July, KDD-09 will feature more than 120 presentations by experts from the US, Europe, Scandinavia and Asia-Pacific. A 20% increase in papers submitted reflects the growing importance of data mining in financially constrained markets. Companies taking part include Orange as a platinum sponsor and Microsoft adCenterLabs and KXEN as gold sponsors. Silver sponsors are Bayesia, Google, HP labs, Pervasive, SAS, Vadis and Yahoo!. Other sponsors include Alberta Center for Machine Learning, Pascal2, Socio Logiciels, Statsoft, Zementis, SFDS, IBM and SIGMOD.

Joint general chair of KDD-09, Francoise Soulie Fogelman, VP Business Development KXEN, says the conference offers a unique chance to see the very latest thinking in data mining. “Some of the best minds from the scientific and business communities will be there, ready and willing to share the results of their cutting edge research and data mining projects with end users. No other industry event offers anything like the depth and breadth of expertise on offer here.”

A particular focus for 2009 will be social network analysis: the discovery and use for competitive advantage of the links between people in social and professional networks. Currently a hot topic among data mining professionals – especially those working in the telecommunications sector – this technique will feature in theoretical and workshop presentations. Details will also be revealed of the world’s first practical applications involving industrial scale volumes of data. Gold sponsor KXEN will present on its booth its recently revealed KSN social network module, helping companies extract valuable new intelligence for better customer acquisition, retention, cross-sell and up-sell campaigns.

Other exhibitors include sponsors as well as Cambridge University Press, Cap Digital, Elsevier, Morgan & Claypool Publishers, Oracle, Salford Systems, Springer and Taylor & Francis CRC Press.

Also high on the agenda are real-time Web applications for data mining for custom advertising and personalized offers, both seen as crucial to online marketing and sales but both also requiring technologies able to handle very large volumes of data in real time.

Away from science and technology, delegates will also have a chance to sample the best of Paris architecture and hospitality on the evening of 29th June in the main reception room at the exclusive Hotel de Ville – a venue normally reserved for visiting heads of state. A cocktail reception hosted by KXEN will follow presentations, including a welcome from Jean-Louis Missika, the Deputy Mayor of Paris in charge of Innovation, Research and Universities.

There will also be the presentation of awards of the KDD cup by Dr. Isabelle Guyon (ClopiNet). The cup is awarded to the winners of a contest around predicting customer scores from large marketing databases. It, and other prize awards, are being sponsored by the French telecommunications company Orange and Google.

KDD-09 is organized by the data mining special interest group of the Association for Computing Machinery (ACM), the world’s largest educational and scientific computing society. The ACM provides resources that advance computing both as a science and a profession. ACM provides the computing field’s premier digital library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

More details, program & registration:

About KXEN

KXEN, The Data Mining Automation Company™ delivers next-generation Customer Lifecycle Analytics to enterprises that depend on analytics as a competitive advantage. KXEN’s Data Mining Automation Solution drives significant improvements in customer acquisition, retention, cross-sell and risk applications. Its solution integrates predictive analytics into strategic business processes, allowing customers to drive greater value into their business. Find out more by visiting


