Interesting Times

Probably for the first time, I am reproducing a comment from a reader in its entirety. As an ex-GE Finance and ex-Citi man, the following parable struck close to home.

Here are some nice views from Randall Stross of http://enhilex.com/ on the current economic crisis. If his views touch a chord, do write back.

Hello Everyone, There is a lot of talk about what made Wall St. fail. Having spent some time inside firms that played pivotal roles, I must say that I believe there are as many reasons for the failure as there are ways to manifest a lack of integrity, diligence, responsibility and accountability. As recent failures demonstrate, all attempts to legislate those characteristics have failed, and for the same reasons. The manifestations I found were like the following examples:

1. Here’s a short conversation in a hallway. I would later realize that we were standing in front of the manager’s office… Me: “So, why don’t your models account for employment as a factor that influences the borrower’s ability to pay his mortgage?” Him, looking at me like I had two heads: “Er, uh… Well, that is because… (very long pause) there is no reliable data on that. Yes, that’s right. So we leave it out…” That was the reply from a very intelligent person who knew he would be shown the door if he’d told me the truth. Lay persons understand that models are useful, though imperfect. But the example given above is something else: this model’s design was intended to mislead. People who do things like this may have been “good” employees, but they were not good citizens. The “good” employees remain… Anyone with an IQ over 50 knows that past performance does not predict future performance when you don’t account for the differences between past and present in the entirety of the nexus where the question belongs.

2. Whilst sitting next to me in a development lab, a young reporting analyst was directed to “hard code” the sum of a column of figures headed for regulators – because the bottom line “wasn’t good enough.” The young man hesitated. Relieved, I put my hand on his wrist and quietly suggested that he ask for our client’s manager to send that request to him in an email. The manager left the lab, calling us obscene names. Of course, the email never came.

3. A friend of mine working as a funder for a large mortgage firm was fired for refusing to fund a loan she knew was fraudulent. I’m proud to know the people in #2 and #3. These people have integrity and loyalty to principles the investing public can count on. But they both lost their jobs and have moved on into other fields, away from Wall St-ish areas. So, it is the people who are left that will work to rebuild confidence in the market. OK, now — who’s left? I wish we could let it all fall down and stand back up without the bums who caused it all. We do know who they are…

PS: I hope he is neither a spammer nor joking. This brings back a lot of old memories from when I worked for the big hot financial companies. Do you have a personal story like that?

Conferences: KXEN and KDD 09

Here is an announcement regarding one of the foremost conferences on knowledge discovery, KDD 2009, which is being held in Paris. We have interviewed the joint general chair of the conference, KXEN’s Francoise Soulie Fogelman, here: http://www.decisionstats.com/2009/03/27/interview-franoise-soulie-fogelman-kxen/

Indeed, given KXEN’s exciting release of their social network analysis software, KSN, they are also gold sponsors of the conference. You can view the archives at http://www.kdd2008.com/ or read more at http://www.kdd.org/kdd2009/index.html

From KXEN’s Press Release-

World’s Best Data Mining Knowledge and Expertise on Show
in Paris at KDD-09

Eminent data mining researchers, academics and practitioners from across the world are honing their presentation skills and charging their laptops in readiness for the industry’s largest and most respected conference, this year being staged for the first time in Europe, in the city of Paris.

The Knowledge Discovery and Data Mining 2009 (KDD-09) event will bring together more than 600 specialists, representing the single largest body of expertise in the science and application of data mining technology for industry, government and academia. They will discuss recent discoveries in data mining and share innovative ways of applying the technology in real-world business.

Running from the 28th June to 1st July, KDD-09 will feature more than 120 presentations by experts from the US, Europe, Scandinavia and Asia-Pacific. A 20% increase in papers submitted reflects the growing importance of data mining in financially constrained markets. Companies taking part include Orange as a platinum sponsor and Microsoft adCenterLabs and KXEN as gold sponsors. Silver sponsors are Bayesia, Google, HP labs, Pervasive, SAS, Vadis and Yahoo!. Other sponsors include Alberta Center for Machine Learning, Pascal2, Socio Logiciels, Statsoft, Zementis, SFDS, IBM and SIGMOD.

Joint general chair of KDD-09, Francoise Soulie Fogelman, VP Business Development KXEN, says the conference offers a unique chance to see the very latest thinking in data mining. “Some of the best minds from the scientific and business communities will be there, ready and willing to share the results of their cutting edge research and data mining projects with end users. No other industry event offers anything like the depth and breadth of expertise on offer here.”

A particular focus for 2009 will be social network analysis: the discovery and use, for competitive advantage, of the links between people in social and professional networks. Currently a hot topic among data mining professionals, especially those working in the telecommunications sector, this technique will feature in theoretical and workshop presentations. Details will also be revealed of the world’s first practical applications involving industrial-scale volumes of data. Gold sponsor KXEN will present at its booth its recently released KSN social network module, helping companies extract valuable new intelligence for better customer acquisition, retention, cross-sell and up-sell campaigns.

Other exhibitors include sponsors as well as Cambridge University Press, Cap Digital, Elsevier, Morgan Claypool Publishers, Oracle, Salford Systems, Springer and Taylor & Francis CRC press.

Also high on the agenda are real-time Web applications for data mining for custom advertising and personalized offers, both seen as crucial to online marketing and sales but both also requiring technologies able to handle very large volumes of data in real time.

Away from science and technology, delegates will also have a chance to sample the best of Paris architecture and hospitality on the evening of 29th June in the main reception room at the exclusive Hotel de Ville – a venue normally reserved for visiting heads of state. A cocktail reception hosted by KXEN will follow presentations, including a welcome from Jean-Louis Missika, the Deputy Mayor of Paris in charge of Innovation, Research and Universities.

There will also be the presentation of the KDD Cup awards by Dr. Isabelle Guyon (ClopiNet). The cup is awarded to the winners of a contest around predicting customer scores from large marketing databases. It, and other prize awards, are sponsored by the French telecommunications company Orange and by Google.

KDD-09 is organized by the data mining special interest group of the Association for Computing Machinery (ACM), the world’s largest educational and scientific computing society. The ACM provides resources that advance computing both as a science and a profession. ACM provides the computing field’s premier digital library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

More details, program & registration: http://www.kdd.org/kdd2009/index.html

About KXEN

KXEN, The Data Mining Automation Company™ delivers next-generation Customer Lifecycle Analytics to enterprises that depend on analytics as a competitive advantage. KXEN’s Data Mining Automation Solution drives significant improvements in customer acquisition, retention, cross-sell and risk applications. Its solution integrates predictive analytics into strategic business processes, allowing customers to drive greater value into their business. Find out more by visiting www.kxen.com.


Disclaimer- I am a social media consultant to KXEN.

KXEN Case Studies : Financial Sector

Here are the summaries of some excellent success stories that KXEN has achieved working with partners in the financial world over the years.

Fraud Modeling: Disbank (acquired by Fortis), Turkey

1. Disbank increased the number of identified fraudulent applications by 200%, from 7 to 21 per day.

2. More than 50 fraudsters using counterfeit cards at merchant locations or making fraudulent applications have been arrested since April 2004, when the fraud modeling system was put in place.


A large Bank on the U.S. East Coast

1. Response Modeling

Previously it took the modeling group four weeks to build one model with several hundred variables, using traditional modeling tools. KXEN took one hour for the same problem and doubled the lift in the top decile because it included variables that had not been used for this business question before.
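For readers who want the arithmetic behind that claim: top-decile lift is simply the response rate among the best-scored 10% of customers divided by the overall response rate. A quick sketch in Python (the toy data and function name are mine, not KXEN’s):

```python
def top_decile_lift(scores, outcomes):
    """Lift in the top decile: response rate among the top 10%
    of scored customers divided by the overall response rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    n_top = max(1, len(ranked) // 10)          # size of the top decile
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    overall_rate = sum(outcomes) / len(outcomes)
    return top_rate / overall_rate

# Toy example: 20 customers, responders concentrated among high scores.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
          0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(top_decile_lift(scores, outcomes))
```

Here the top decile (2 customers) responds at 100% against an overall rate of 15%, so the lift is about 6.7; “doubling the lift” means the best-scored customers respond at twice the multiple of the base rate they did before.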

2. Data Quality

Building a Cross/Up-sell Model for a direct marketing campaign to high net worth customers, the modelers needed four weeks using 1500 variables. Again it took one hour with KXEN, which uncovered significant problems with some of the top predictive variables. Further investigation proved that these problems were created in the data merge of the mail file and response file, creating several “perfect” predictors. The model was re-run, removing these variables, and immediately put into production.
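Those merge-created “perfect” predictors are a classic symptom of target leakage, and screening for them can be automated. A minimal sketch, with hypothetical data and a simple majority-rule check of my own (not KXEN’s actual method):

```python
def perfect_predictors(rows, target_key, threshold=0.99):
    """Flag variables whose values predict the target almost perfectly,
    a common symptom of leakage from a botched mail/response-file merge."""
    keys = [k for k in rows[0] if k != target_key]
    flagged = []
    for k in keys:
        # For each value of the variable, find the majority target class,
        # then measure how often that majority rule is right overall.
        by_value = {}
        for r in rows:
            by_value.setdefault(r[k], []).append(r[target_key])
        correct = sum(max(ys.count(0), ys.count(1)) for ys in by_value.values())
        if correct / len(rows) >= threshold:
            flagged.append(k)
    return flagged

# Toy data: 'merge_flag' leaks the response; 'age_band' does not.
rows = [
    {"age_band": "20-30", "merge_flag": 1, "responded": 1},
    {"age_band": "20-30", "merge_flag": 0, "responded": 0},
    {"age_band": "30-40", "merge_flag": 1, "responded": 1},
    {"age_band": "30-40", "merge_flag": 0, "responded": 0},
    {"age_band": "40-50", "merge_flag": 0, "responded": 0},
    {"age_band": "40-50", "merge_flag": 1, "responded": 1},
]
print(perfect_predictors(rows, "responded"))
```

A variable that predicts the target this perfectly is almost always an artifact of the data build rather than a genuine signal, which is exactly what the modelers found in the merged mail and response files.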

Le Crédit Lyonnais

1. Around 160 score models are now built annually, compared to around 10 previously, for 130 direct marketing campaigns.
2. KXEN software has allowed LCL to drive up response rates, leading to more value-added services for customers.

Finansbank, Turkey

1. Within 4 months of starting the project to combat dormancy using KXEN’s solution, the bank had successfully reactivated half of its previously dormant customers, according to Kunter Kutluay, Finansbank Director of Marketing and Risk Analytics.

Bank Austria Creditanstalt , Austria

1. Some 4.5 terabytes of data are held in the bank’s operational systems, with a further 2 terabytes archived. Analytical models created in KXEN are automatically fed through the bank’s scoring engine in batches, weekly or monthly depending on the schema.

“But we are looking at a success rate of target customer deals in the area of three to five per cent with KXEN. Before that, it was one per cent or less.”
Werner Widhalm, Head of the Customer Knowledge Management Unit.

Barclays

1. Barclays’ Teradata warehouse holds information on some 14 million active customers, with data on many different aspects of customer behaviour. Previously, analysts had to manually whittle down several thousand fields of data to a core of only a few hundred to fit the limitations of the modelling process. Now, all of the variables can be fed straight into the predictive model.
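That manual whittling is essentially variable screening, and the simplest automated version just ranks candidates by their association with the target. A hedged sketch (a plain Pearson-correlation filter of my own devising, not Barclays’ or KXEN’s actual method):

```python
import math

def screen_variables(columns, target, top_k=3):
    """Rank candidate variables by absolute Pearson correlation with the
    target and keep the top_k, instead of whittling them down by hand."""
    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy) if sx and sy else 0.0
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:top_k]

# Toy example: 'balance' tracks the target, 'noise' does not.
target = [1, 2, 3, 4, 5, 6]
columns = {
    "balance": [2, 4, 6, 8, 10, 12],     # perfectly correlated
    "tenure":  [1, 1, 2, 2, 3, 3],       # strongly correlated
    "noise":   [5, 1, 4, 2, 6, 3],       # weakly correlated
}
print(screen_variables(columns, target, top_k=2))
```

Real screening over thousands of fields uses richer criteria than a single correlation, but the principle is the same: let the machine rank the candidates rather than an analyst.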

Summary: KXEN has achieved tremendous results across all aspects of data modelling in the financial sector, where the time spent building, deploying and analyzing a model is much more crucial than in many other sectors. I will follow this up with case studies on other KXEN successes across multiple domains.

Source – http://www.kxen.com/index.php?option=com_content&task=view&id=220&Itemid=786

Disclaimer- I am a social media consultant for KXEN.

What is Linux

If you think Linux is too serious, take five minutes out and watch these videos, the winners of a contest sponsored by the people at http://video.linuxfoundation.org/contest/winners

This one came in third, so check out the other ones too.

Citation-http://video.linuxfoundation.org/
http://video.linuxfoundation.org/contest/winners

Bring it on Bing

A few notes on Bing

[Screenshot: Bing home page in Mozilla Firefox]

  • The design is better (read: newer). Google still thinks design is something they studied and forgot in semester 1 of engineering, but the iPod-like design is cool.
  • I like the link preview feature: just hover the mouse to get a sleek preview of where the search result goes. It saves A LOT of time, I think.
  • Surprisingly, the results are more numerous and in a different order than Google’s.
  • The image results were again different from Google’s, but I liked the image options in the left margin.
  • Google’s results are still more pertinent (but not by much) on the first page, but Bing’s archive seemed fresher (like catching my LinkedIn profile’s changed URL while Google gave an error).

Overall summary: it is NEW and DIFFERENT and GOOD. Good enough to add to the toolbar, but not great enough to break an 8-year-old habit of Googling it. Unless the Google guys really bung it up.

Citation- http://bing.com

[Screenshot: Bing image search in Mozilla Firefox]

KXEN Webinar on Automation

Here is a webinar from KXEN on automation. Having seen the product in action multiple times, it is always a wow moment when you see KXEN build a model in 5 minutes flat from thousands of variables and tens of thousands of rows. If you have not seen the latest version of KXEN in action, do take 60 minutes out to see this.

From http://www.kxen.com/index.php?option=com_content&task=view&id=546&Itemid=985

KXEN’s Automation Revolutionizes Modeling Productivity

  • Date: June 9, 2009
  • Time: 9:00 am Pacific/12:00 noon Eastern
  • Duration: 60 minutes

Register Now!

You have already realized improved marketing performance by investing in a campaign management solution and data mining tools. So why might you be interested in KXEN, the leader in data mining automation?

If you are like many businesses these days, you would like to be able to do more with less. You have a limited analytical team and more modeling requirements than ever.

With KXEN, our customers are able to produce models in 1/10th to 1/100th of the time of traditional data mining tools, while not sacrificing model accuracy or robustness.

What makes KXEN different? In this webinar, you will learn how and why KXEN is unique. And why your business might want to select KXEN as your data mining solution.

What will be demonstrated

This presentation will show you how KXEN automates the data mining process including:

Data Preparation
Variable Selection
Model Building
Model Validation
Scoring Code Generation
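To make those stages concrete, here is a deliberately tiny end-to-end sketch of such a pipeline (my own toy stage functions, not KXEN’s internals; scoring code generation is omitted):

```python
def prepare(rows):
    """Data preparation: fill missing numeric values with the column mean."""
    keys = rows[0].keys()
    means = {k: sum(r[k] for r in rows if r[k] is not None) /
                max(1, sum(1 for r in rows if r[k] is not None)) for k in keys}
    return [{k: (r[k] if r[k] is not None else means[k]) for k in keys}
            for r in rows]

def select_variables(rows, target_key):
    """Variable selection: keep variables that vary at all (a trivial filter)."""
    return [k for k in rows[0]
            if k != target_key and len({r[k] for r in rows}) > 1]

def build_model(rows, variables, target_key):
    """Model building: a trivial above-the-mean majority-vote 'model'."""
    cutoffs = {v: sum(r[v] for r in rows) / len(rows) for v in variables}
    return lambda row: int(sum(row[v] > cutoffs[v] for v in variables)
                           > len(variables) / 2)

def validate(model, rows, target_key):
    """Model validation: accuracy of the model's predictions."""
    return sum(model(r) == r[target_key] for r in rows) / len(rows)

# Toy customer data with one missing value.
rows = [
    {"income": 50, "visits": 3, "buys": 0},
    {"income": 90, "visits": 9, "buys": 1},
    {"income": 40, "visits": None, "buys": 0},
    {"income": 95, "visits": 8, "buys": 1},
]
clean = prepare(rows)
variables = select_variables(clean, "buys")
model = build_model(clean, variables, "buys")
print(validate(model, clean, "buys"))
```

The point of automation is that each of these stages runs without an analyst hand-tuning it, which is what turns a four-week modeling cycle into an hour.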

Who should attend

Statisticians, data miners, data analysts, business analysts and marketing executives who want to increase the productivity of their analytics team.

Register Now!

Disclaimer- I am a consultant to KXEN on social media

More R please

some R news

0. The R Foundation website: I guess the http://www.r-project.org team is busy prettifying the site before the annual R users conference kicks in (I was told it has the aesthetic appeal of a dead cat splattered on the autobahn, a very HTML 4.0 kind of retro look).

I can’t believe the R site and R core honchos find the following image the prettiest one to represent the graphical abilities of R.

The R core site has tremendous functionality and demand, though I wonder if they could just put up some ads to get some funding, or a two-way research tie-up with Google. Google uses R extensively, can help with online methods as well, and is listed as a supporting organization at http://www.r-project.org/foundation/memberlist.html

The R archives are a collection of emails, and that’s not documentation at all.

1. The Revolution R website, and particularly David Smith’s blog, is a great way to stay updated on R news: http://blog.revolution-computing.com/

I have covered REvolution R before, and they are truly impressive.

http://www.decisionstats.com/2009/01/31/interviewrichard-schultz-ceo-revolution-computing/

It seems the domain name revolutioncomputing.com was squatted (by NC?), so that’s why the hyphenated web name. It is a very lucid website, though I do request them to put up more video/podcasts, and a Tweet This button would be great :))

and another more techie post here

http://blog.revolution-computing.com/2009/05/verifying-zipfs-powerdistribution-law-for-cities.html

Another great source is Twitter: it seems R users on Twitter use the hashtag #rstats to search for R news and code, which should help R bloggers and, at a later date, users.

Click here for checking it out

http://search.twitter.com/search?q=%23rstats

2. Some more R forums and sites

Forum for R Enterprise Users http://www.revolution-computing.com/forum

An R tips site http://onertipaday.blogspot.com/

The R Journal ( yes there is a journal for all hard working R fans) http://journal.r-project.org/

R on Linkedin http://www.linkedin.com/groups?about=&gid=77616

and the Analytic Bridge community group for R

http://www.analyticbridge.com/group/rprojectandotherfreesoftwaretools

3. Here is a terrific post by Robert Grossman

at http://blog.rgrossman.com/2009/05/17/running-r-on-amazons-ec2/

I liked the way he built the case for using R on Amazon EC2 (a business case, not a use case) and then proceeded to a step-by-step tutorial: a simple and powerful blog post. I hope R comes out with a standardized online R doc like that, a single-point searchable archive for code, something like the SAS online doc (which remains free for WPS users 😉 ), but the way the web is evolving it seems the present mish-mash method will continue.

Here are the main steps to use R on a pre-configured AMI.

Set up.
The setup needs to be done just once.

1. Set up an Amazon Web Services (AWS) account by going to:

aws.amazon.com.

If you already have an Amazon account for buying books and other items from Amazon, then you can use this account also for AWS.
2. Login to the AWS console
3. Create a “key-pair” by clicking on the link “Key Pairs” in the Configuration section of the Navigation Menu on the left-hand side of the AWS console page.
4. Click on the “Create Key Pair” button, about a quarter of the way down the page.
5. Name the key pair and save it to a working directory, say /home/rlg/work.

Launching the AMI. These steps are done whenever you want to launch a new AMI.

1. Login to the AWS console. Click on the Amazon EC2 tab.
2. Click the “AMIs” button under the “Images and Instances” section of the left navigation menu of the AWS console.
3. Enter “opendatagroup” in the search box and select the AMI labeled “opendatagroup/r-timeseries.manifest.xml”, which is AMI instance “ami-ea846283”.
4. Enter the number of instances to launch (1), the name of the key pair that you have previously created, and select “web server” for the security group. Click the launch button to launch the AMI. Be sure to terminate the AMI when you are done.
5. Wait until the status of the AMI is “running.” This usually takes about 5 minutes.

Accessing the AMI.

1. Get the public IP address of the new AMI. The easiest way to do this is to select the AMI by checking the box. This provides some additional information about the AMI at the bottom of the window. You can copy the IP address there.
2. Open a console window and cd to your working directory which contains the key-pair that you previously downloaded.
3. Type the command:
ssh -i testkp.pem -X root@ec2-67-202-44-197.compute-1.amazonaws.com

Here we assume that the name of the key-pair you created is “testkp.pem.” The flag “-X” starts a session that supports X11. If you don’t have X11 on your machine, you can still login and use R but the graphics in the example below won’t be displayed on your computer.

Using R on the AMI.

1. Change your directory and start R

#cd examples
#R
2. Test R by entering a R expression, such as:

> mean(1:100)
[1] 50.5
>
3. From within R, you can also source one of the example scripts to see some time series computations:

> source('NYSE.r')
4. After a minute or so, you should see a graph on your screen. After the graph is finished being drawn, you should see a prompt:

CR to continue

Enter a carriage return and you should see another graph. You will need to enter a carriage return 8 times to complete the script (you can also choose to break out of the script if you get bored with all the graphs).
5. When you are done, exit your R session with a Control-D. Exit your ssh session with an “exit” and terminate your AMI from the Amazon AWS console. You can also choose to leave your AMI running (it is only a few dollars a day).

Acknowledgements: Steve Vejcik from Open Data Group wrote the R scripts and configured the AMI.

Terrific R companies, blogs, tweets, research and sites, but do let me know your feedback. Just another R day. Ajay