Conferences: KXEN and KDD 09

Here is an announcement about KDD 2009, one of the foremost conferences on knowledge discovery, which is being held in Paris. We have interviewed the joint general chair of the conference, KXEN’s Francoise Soulie Fogelman, here: http://www.decisionstats.com/2009/03/27/interview-franoise-soulie-fogelman-kxen/

Indeed, given the exciting release of its social network analysis software, KSN, KXEN is also a gold sponsor of the conference. You can view last year’s archives at http://www.kdd2008.com/ or read more at http://www.kdd.org/kdd2009/index.html

From KXEN’s Press Release-

World’s Best Data Mining Knowledge and Expertise on Show in Paris at KDD-09

Eminent data mining researchers, academics and practitioners from across the world are honing their presentation skills and charging their laptops in readiness for the industry’s largest and most respected conference, this year being staged for the first time in Europe, in the city of Paris.

The knowledge discovery and data mining 2009 (KDD-09) event will bring together more than 600 specialists, representing the single largest body of expertise in the science and application of data mining technology for industry, government and academia. They will discuss recent discoveries in data mining and share innovative ways of applying the technology in real world business.

Running from the 28th June to 1st July, KDD-09 will feature more than 120 presentations by experts from the US, Europe, Scandinavia and Asia-Pacific. A 20% increase in papers submitted reflects the growing importance of data mining in financially constrained markets. Companies taking part include Orange as a platinum sponsor and Microsoft adCenterLabs and KXEN as gold sponsors. Silver sponsors are Bayesia, Google, HP labs, Pervasive, SAS, Vadis and Yahoo!. Other sponsors include Alberta Center for Machine Learning, Pascal2, Socio Logiciels, Statsoft, Zementis, SFDS, IBM and SIGMOD.

Joint general chair of KDD-09, Francoise Soulie Fogelman, VP Business Development KXEN, says the conference offers a unique chance to see the very latest thinking in data mining. “Some of the best minds from the scientific and business communities will be there, ready and willing to share the results of their cutting edge research and data mining projects with end users. No other industry event offers anything like the depth and breadth of expertise on offer here.”

A particular focus for 2009 will be social network analysis: the discovery and use for competitive advantage of the links between people in social and professional networks. Currently a hot topic among data mining professionals – especially those working in the telecommunications sector – this technique will feature in theoretical and workshop presentations. Details will also be revealed of the world’s first practical applications involving industrial scale volumes of data. Gold sponsor KXEN will present on its booth its recently revealed KSN social network module, helping companies extract valuable new intelligence for better customer acquisition, retention, cross-sell and up-sell campaigns.

Other exhibitors include sponsors as well as Cambridge University Press, Cap Digital, Elsevier, Morgan & Claypool Publishers, Oracle, Salford Systems, Springer and Taylor & Francis/CRC Press.

Also high on the agenda are real-time Web applications for data mining for custom advertising and personalized offers, both seen as crucial to online marketing and sales but both also requiring technologies able to handle very large volumes of data in real time.

Away from science and technology, delegates will also have a chance to sample the best of Paris architecture and hospitality on the evening of 29th June in the main reception room at the exclusive Hotel de Ville – a venue normally reserved for visiting heads of state. A cocktail reception hosted by KXEN will follow presentations, including a welcome from Jean-Louis Missika, the Deputy Mayor of Paris in charge of Innovation, Research and Universities.

There will also be the presentation of awards of the KDD cup by Dr. Isabelle Guyon (ClopiNet). The cup is awarded to the winners of a contest around predicting customer scores from large marketing databases. It, and other prize awards, are being sponsored by the French telecommunications company Orange and Google.

KDD-09 is organized by the data mining special interest group of the Association for Computing Machinery (ACM), the world’s largest educational and scientific computing society. The ACM provides resources that advance computing both as a science and a profession. ACM provides the computing field’s premier digital library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

More details, program & registration: http://www.kdd.org/kdd2009/index.html

About KXEN

KXEN, The Data Mining Automation Company™ delivers next-generation Customer Lifecycle Analytics to enterprises that depend on analytics as a competitive advantage. KXEN’s Data Mining Automation Solution drives significant improvements in customer acquisition, retention, cross-sell and risk applications. Its solution integrates predictive analytics into strategic business processes, allowing customers to drive greater value into their business. Find out more by visiting www.kxen.com.


Disclaimer- I am a social media consultant to KXEN.

KXEN Case Studies : Financial Sector

Here are the summaries of some excellent success stories that KXEN has achieved working with partners in the financial world over the years.

Fraud Modeling- Dışbank (acquired by Fortis), Turkey

1. Dışbank increased the number of identified fraudulent applications by 200%, from 7 to 21 per day.

2. More than 50 fraudsters using counterfeit cards at merchant locations or fraudulent applications have been arrested since April 2004, when the fraud modeling system was set up.


A large Bank on the U.S. East Coast

1. Response Modeling

Previously it took the modeling group four weeks to build one model with several hundred variables, using traditional modeling tools. KXEN took one hour for the same problem and doubled the lift in the top decile because it included variables that had not been used for this business question before.

2. Data Quality

To build a cross/up-sell model for a direct marketing campaign to high net worth customers, the modelers needed four weeks using 1,500 variables. Again it took one hour with KXEN, which uncovered significant problems with some of the top predictive variables. Further investigation proved that these problems were created in the data merge of the mail file and response file, which produced several “perfect” predictors. The model was re-run without these variables and immediately put into production.
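The two ideas in this case study, lift in the top decile and “perfect” predictors created by a bad merge, can be illustrated with a small sketch. This is not KXEN’s method, which the case study does not describe. It is a generic check in plain Python, and the function names, the toy data and the `max_levels` cut-off are all my own assumptions.

```python
from typing import Dict, List, Sequence


def top_decile_lift(scores: Sequence[float], responded: Sequence[int]) -> float:
    """Response rate among the top 10% of scores divided by the overall rate."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(flag for _, flag in ranked[:top_n]) / top_n
    overall_rate = sum(responded) / len(responded)
    return top_rate / overall_rate


def perfect_predictors(columns: Dict[str, Sequence], responded: Sequence[int],
                       max_levels: int = 20) -> List[str]:
    """Flag low-cardinality columns in which every value maps to one outcome.

    max_levels is an assumed cut-off so that ID-like columns, where every
    value is unique, are not flagged by this crude check.
    """
    flagged = []
    for name, values in columns.items():
        outcomes_by_value = {}
        for value, flag in zip(values, responded):
            outcomes_by_value.setdefault(value, set()).add(flag)
        pure = all(len(outcomes) == 1 for outcomes in outcomes_by_value.values())
        if pure and 1 < len(outcomes_by_value) <= max_levels:
            flagged.append(name)
    return flagged


# Toy data (hypothetical): 20 customers, 4 responders, scores from 20 down to 1.
scores = list(range(20, 0, -1))
responded = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
columns = {
    "region": ["N", "S"] * 10,
    # A flag that leaked in from the response file: it mirrors the outcome.
    "response_file_flag": ["Y" if r else "N" for r in responded],
}

print(top_decile_lift(scores, responded))
print(perfect_predictors(columns, responded))
```

A variable flagged this way is not necessarily wrong, but it is worth tracing back to the merge step before the model goes into production.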

Le Crédit Lyonnais

1. Around 160 score models now built annually – compared to around 10 previously – for 130 direct marketing campaigns.
2. KXEN software has allowed LCL to drive up response rates, leading to more value-added services for customers.

Finansbank, Turkey

1. Within 4 months of starting the project to combat dormancy using KXEN’s solution, the bank had successfully reactivated half of its previously dormant customers, according to Kunter Kutluay, Finansbank Director of Marketing and Risk Analytics.

Bank Austria Creditanstalt, Austria

1. Some 4.5 terabytes of data are held in the bank’s operational systems, with a further 2 terabytes archived. Analytical models created in KXEN are automatically fed through the bank’s scoring engine in batches, weekly or monthly depending on the schema.

“But we are looking at a success rate of target customer deals in the area of three to five per cent with KXEN. Before that, it was one per cent or less.”
Werner Widhalm, Head of the Customer Knowledge Management Unit.

Barclays

1. Barclays’ Teradata warehouse holds information on some 14 million active customers, with data on many different aspects of customer behaviour. Previously, analysts had to manually whittle down several thousand fields of data to a core of only a few hundred to fit the limitations of the modelling process. Now, all of the variables can be fed straight into the predictive model.

Summary- KXEN has had a tremendous response across all aspects of data modelling in the financial sector, where the time taken to build, deploy and analyze a model is much more crucial than in many other sectors. I will follow this up with case studies of other KXEN successes across multiple domains.

Source – http://www.kxen.com/index.php?option=com_content&task=view&id=220&Itemid=786

Disclaimer- I am a social media consultant for KXEN.

What is Linux

If you think Linux is too serious, take five minutes and watch these videos, the winners of a contest sponsored by the people at http://video.linuxfoundation.org/contest/winners

This one came in third, so check out the other ones too.

Citation-http://video.linuxfoundation.org/
http://video.linuxfoundation.org/contest/winners

Bring it on Bing

A few notes on Bing


  • The design is better (read: newer). Google still thinks design is something they studied and forgot in semester 1 of engineering, but the iPod-like design is cool.
  • I like the preview link feature: just hover the mouse to get a sleek preview of what the search result goes to. I think it saves A LOT of time.
  • Surprisingly, there are more results than on Google, and in a different order.
  • The image results were again different from Google’s, but I liked the image options in the left margin.
  • Google’s results are still more pertinent (but not by much) on the first page, but Bing’s archive seemed fresher (like catching the changed URL of my Linkedin profile, while Google gave an error).

Overall summary- it is NEW and DIFFERENT and GOOD. Good enough to add to the toolbar. But not great enough to leave 8-year-old habits of Googling it. Unless the Google guys really bung it up.

Citation- http://bing.com


KXEN Webinar on Automation

Here is a webinar from KXEN on automation. Having seen the product in action multiple times, I find it is always a Wow moment when you see KXEN build a model in 5 minutes flat from thousands of variables and tens of thousands of rows. If you have not seen the latest version of KXEN in action, do take 60 minutes out to see this.

From http://www.kxen.com/index.php?option=com_content&task=view&id=546&Itemid=985

KXEN’s Automation Revolutionizes Modeling Productivity

  • Date: June 9, 2009
  • Time: 9:00 am Pacific/12:00 noon Eastern
  • Duration: 60 minutes

Register Now!

You have already recognized improved marketing performance by investing in a campaign management solution and data mining tools. Why might you be interested in KXEN, the leader in data mining automation?

If you are like many businesses these days, you would like to be able to do more with less. You have a limited analytical team and more modeling requirements than ever.

With KXEN, our customers are able to produce models in 1/10th to 1/100th of the time of traditional data mining tools, while not sacrificing model accuracy or robustness.

What makes KXEN different? In this webinar, you will learn how and why KXEN is unique. And why your business might want to select KXEN as your data mining solution.

What will be demonstrated

This presentation will show you how KXEN automates the data mining process including:

  • Data Preparation
  • Variable Selection
  • Model Building
  • Model Validation
  • Scoring Code Generation

Who should attend

Statisticians, data miners, data analysts, business analysts and marketing executives who want to increase the productivity of their analytics team.


Disclaimer- I am a consultant to KXEN on social media

More R please

Some R news

0 The R Foundation Website. I guess the http://www.r-project.org team is busy prettifying the website before the annual R users conference kicks in. (I was told it has the aesthetic visual appeal of a dead cat splattered on the autobahn: a very HTML 4.0 kind of retro look.)

I can’t believe the R site and R core honchos find the following image the prettiest one to represent the graphical abilities of R.

The R core site has tremendous functionality and demand, though I wonder if they could just put up some ads and get some funding, or a two-way research tie-up with Google. Google uses R extensively, can help with online methods as well, and is listed as a supporting organization at http://www.r-project.org/foundation/memberlist.html

The R archives are a collection of emails, and that’s not documentation at all, but...

1 The REvolution R website, and particularly David Smith’s blog, is a great way to stay updated on R news: http://blog.revolution-computing.com/

I have covered REvolution R before, and they are truly impressive.

http://www.decisionstats.com/2009/01/31/interviewrichard-schultz-ceo-revolution-computing/

It seems the domain name revolutioncomputing.com was squatted (by NC?), so that’s why the web name is hyphenated. It is a very lucid website, though I do request them to put up more videos/podcasts, and a Tweet this button would be great :))

And another, more techie post is here:

http://blog.revolution-computing.com/2009/05/verifying-zipfs-powerdistribution-law-for-cities.html

Another great source is Twitter. It seems that R users on Twitter use the hashtag #rstats to search for R news and code, which should help R bloggers and, at a later date, users.

Click here to check it out:

http://search.twitter.com/search?q=%23rstats
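A small caveat on hashtag links like this one: a literal # in a URL starts the fragment, which the browser does not send to the server, so the hashtag has to be percent-encoded as %23 to reach the search page. Here is a minimal sketch in Python of building such a link. The helper name is my own, and the base address is simply the one used in this post.

```python
from urllib.parse import urlencode

# Base address as used in the post; the hashtag is the one R users tweet with.
SEARCH_BASE = "http://search.twitter.com/search"


def hashtag_search_url(hashtag: str) -> str:
    """Return a search URL with the hashtag percent-encoded in the query."""
    return SEARCH_BASE + "?" + urlencode({"q": hashtag})


print(hashtag_search_url("#rstats"))
# prints http://search.twitter.com/search?q=%23rstats
```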

2 Some more R forums and sites

Forum for R Enterprise Users http://www.revolution-computing.com/forum

An R Tips Site http://onertipaday.blogspot.com/

The R Journal ( yes there is a journal for all hard working R fans) http://journal.r-project.org/

R on Linkedin http://www.linkedin.com/groups?about=&gid=77616

and the Analytic Bridge community group for R

http://www.analyticbridge.com/group/rprojectandotherfreesoftwaretools

3 Here is a terrific post by Robert Grossman at http://blog.rgrossman.com/2009/05/17/running-r-on-amazons-ec2/

I liked the way he built the case for using R on Amazon EC2 (a business case, not a use case) and then proceeded to a step-by-step tutorial: a simple and powerful blog post. I hope R comes out with a standardized online R doc like that, a single-point searchable archive for code, something like the SAS online doc (which remains free for WPS users 😉 ), but the way the web is evolving it seems the present mishmash method will continue.

From his post, here are the main steps to use R on a pre-configured AMI.

Set up.
The set up needs to be done just once.

1. Set up an Amazon Web Services (AWS) account by going to:

aws.amazon.com.

If you already have an Amazon account for buying books and other items from Amazon, then you can use this account also for AWS.
2. Login to the AWS console
3. Create a “key-pair” by clicking on the link “Key Pairs” in the Configuration section of the Navigation Menu on the left hand side of the AWS console page.
4. Click on the “Create Key Pair” button, about a quarter of the way down the page.
5. Name the key pair and save it to a working directory, say /home/rlg/work.

Launching the AMI. These steps are done whenever you want to launch a new AMI.

1. Login to the AWS console. Click on the Amazon EC2 tab.
2. Click the “AMIs” button under the “Images and Instances” section of the left navigation menu of the AWS console.
3. Enter “opendatagroup” in the search box and select the AMI labeled “opendatagroup/r-timeseries.manifest.xml”, which is AMI instance “ami-ea846283”.
4. Enter the number of instances to launch (1), the name of the key pair that you have previously created, and select “web server” for the security group. Click the launch button to launch the AMI. Be sure to terminate the AMI when you are done.
5. Wait until the status of the AMI is “running.” This usually takes about 5 minutes.

Accessing the AMI.

1. Get the public IP address of the new AMI. The easiest way to do this is to select the AMI by checking the box. This provides some additional information about the AMI at the bottom of the window. You can copy the IP address there.
2. Open a console window and cd to your working directory which contains the key-pair that you previously downloaded.
3. Type the command:
ssh -i testkp.pem -X root@ec2-67-202-44-197.compute-1.amazonaws.com

Here we assume that the name of the key-pair you created is “testkp.pem.” The flag “-X” starts a session that supports X11. If you don’t have X11 on your machine, you can still login and use R but the graphics in the example below won’t be displayed on your computer.

Using R on the AMI.

1. Change your directory and start R

#cd examples
#R
2. Test R by entering a R expression, such as:

> mean(1:100)
[1] 50.5
>
3. From within R, you can also source one of the example scripts to see some time series computations:

> source('NYSE.r')
4. After a minute or so, you should see a graph on your screen. After the graph is finished being drawn, you should see a prompt:

CR to continue

Enter a carriage return and you should see another graph. You will need to enter a carriage return 8 times to complete the script (you can also choose to break out of the script if you get bored with all the graphs).
5. When you are done, exit your R session with a control-D. Exit your ssh session with an “exit” and terminate your AMI from the Amazon AWS console. You can also choose to leave your AMI running (it is only a few dollars a day).

Acknowledgements: Steve Vejcik from Open Data Group wrote the R scripts and configured the AMI.

Ajay- Terrific R companies, blogs, tweets, research and sites, but do let me know your feedback. Just un-other R day.

Creating Online Communities


Some time back I asked the question: how much do you think it would cost to have the top 100 bloggers on the SAS language on the same page, in such a manner that the RSS feeds get updated on their own? The answer is here-

Wordframe. I had covered this software before in comparison to Ning.com, and it compared favorably.

This is a small startup, based in Eastern Europe and very hard working. They allegedly wanted to become open source and had plans to create third party applications when I checked with them in January, but this may be on hold for a new product launch.

Would sas.com pay a $1000 set up fee and a $200 monthly fee to get the top 100 SAS bloggers on their sascommunity.org website?

Would Oracle pay a $1000 set up fee and a $200 monthly fee to get the top 100 Oracle bloggers on a website sponsored by them?

How much would Aster Data pay for, say, 100 bloggers about Hadoop (ahem, assuming there are 100 people who CAN blog about Hadoop; a bit like Einstein’s 5 people in the world who can understand his theory of relativity)?
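For a rough sense of scale, the fees quoted above can be turned into a yearly figure. This is only back-of-the-envelope arithmetic in Python. The 12-month horizon is my assumption, and the function name is made up for illustration.

```python
def community_cost(setup_fee: float, monthly_fee: float, months: int, bloggers: int):
    """Return (total cost, cost per blogger) over the given number of months."""
    total = setup_fee + monthly_fee * months
    return total, total / bloggers


# Fees as quoted in the post; 12 months and 100 bloggers are the assumed scenario.
total, per_blogger = community_cost(setup_fee=1000, monthly_fee=200, months=12, bloggers=100)
print(f"First-year total: ${total}, per blogger: ${per_blogger:.2f}")
# prints First-year total: $3400, per blogger: $34.00
```

On those assumptions the first year works out to $34 per blogger, which is the number a sponsor would weigh against the value of the community.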

Check this site out.
