KXEN and a Data Mining Survey

Recently KXEN, the data mining and modeling automation company that has also pioneered social network analytics software, came in for a bit of customer love in a data mining survey.


As per the site

KXEN's next generation automated data mining software is a strategic solution for 90% of user organizations and has won their support and praise in a new customer satisfaction survey, the findings of which are revealed today.

Of almost 2,000 users polled 90% of those responding said the company’s advanced analytics software was strategic to their activity, 87% were highly or very highly satisfied and 85% agreed the software had met or exceeded all of their expectations. The results underscore KXEN’s growing importance in a market traditionally dominated by more costly, harder to use first generation offerings.

KXEN's analytic software was also highly rated for its simple, clear interface, with all respondents agreeing that KXEN solutions were easy to use, and 90% stating its new graphical front end had brought yet more usability benefits. Confirming these findings, users responding included sales, marketing and other line of business staff as well as specialist analysts, data miners, academics and statisticians.

Turning to the results of using KXEN's software, 98% of all those responding stated it had improved their overall business, with the same number agreeing it had speeded up their data modeling activities. 96% said KXEN had increased the value of predictive analytics in their companies.

Of course, there are numerous surveys (probably the best is from KDnuggets), and I am trying to find the raw data and samples for this survey as I write. But it is a promising step up for a company I have admired since 2004, when I first tested its software, and as late as last year I was building online models with it. Predictably, Roger Haddad, whom we interviewed in January 2009, was all praise for his team and its splendid product. Well done, guys, take a bow; it is about time! A great example of a company that builds innovative analytics quietly, without getting into any tangles with open source or business intelligence sentiments.

Ajay- I am a consultant to KXEN for Social Networks Analysis.

KXEN – Automated Regression Modeling

I have used KXEN many times for building and testing propensity models. The regression modeling feature of KXEN is awesome in the sense that it makes models very easy to build and deliver.

The KXEN package responsible for this is K2R, which uses robust regression. A word on the basic mathematical theory behind KXEN’s automated modeling: the technique is called Structural Risk Minimization. You can read more on the basic mathematical technique at http://www.svms.org/srm/. The following is an extract from the same source.

Structural risk minimization (SRM) (Vapnik and Chervonenkis, 1974) is an inductive principle for model selection used for learning from finite training data sets. It describes a general model of capacity control and provides a trade-off between hypothesis space complexity (the VC dimension of approximating functions) and the quality of fitting the training data (empirical error). The procedure is outlined below.

  1. Using a priori knowledge of the domain, choose a class of functions, such as polynomials of degree n, neural networks having n hidden layer neurons, a set of splines with n nodes or fuzzy logic models having n rules.
  2. Divide the class of functions into a hierarchy of nested subsets in order of increasing complexity. For example, polynomials of increasing degree.
  3. Perform empirical risk minimization on each subset (this is essentially parameter selection).
  4. Select the model in the series whose sum of empirical risk and VC confidence is minimal.

Sewell (2006)

SVMs use the spirit of the SRM principle.

“Structural risk minimization (SRM) (Vapnik 1995) uses a set of models ordered in terms of their complexities. An example is polynomials of increasing order. The complexity is generally given by the number of free parameters. VC dimension is another measure of model complexity. In equation 4.37, we can have a set of decreasing λi to get a set of models ordered in increasing complexity. Model selection by SRM then corresponds to finding the model simplest in terms of order and best in terms of empirical error on the data.”
Alpaydin (2004), pages 80-81
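To make the four-step procedure in the extract concrete, here is a minimal sketch in Python. It is not KXEN's patented algorithm. It assumes polynomials of increasing degree as the nested hierarchy, takes the capacity h to be the number of coefficients, and borrows a Vapnik-style confidence term purely for illustration. The function names, the eta value and the toy data are all my own assumptions.

```python
import numpy as np


def vc_confidence(h, n, eta=0.05):
    """Illustrative Vapnik-style confidence term for capacity h and n samples."""
    return np.sqrt((h * (np.log(2 * n / h) + 1) - np.log(eta / 4)) / n)


def srm_select(x, y, max_degree=6):
    """Pick the polynomial degree minimizing empirical risk + VC confidence."""
    n = len(x)
    results = []
    # Step 2: nested subsets of increasing complexity (degree 0, 1, 2, ...)
    for degree in range(max_degree + 1):
        # Step 3: empirical risk minimization inside the subset (least squares)
        coeffs = np.polyfit(x, y, degree)
        emp_risk = float(np.mean((np.polyval(coeffs, x) - y) ** 2))
        # Assumption: capacity h = number of free parameters of the polynomial
        h = degree + 1
        bound = emp_risk + float(vc_confidence(h, n))
        results.append((degree, emp_risk, bound))
    # Step 4: keep the model whose sum of the two terms is minimal
    best = min(results, key=lambda r: r[2])
    return best[0], results


# Toy data: a noisy quadratic (made up for this sketch)
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

best_degree, table = srm_select(x, y)
for degree, emp_risk, bound in table:
    print(f"degree={degree}  empirical_risk={emp_risk:.4f}  bound={bound:.4f}")
print("selected degree:", best_degree)
```

The empirical risk keeps falling as the degree grows, while the confidence term keeps rising. The selected model is the one where the sum bottoms out, which is the compromise between fit and robustness that the extract describes.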

Now back to the automated regression modeling.

Robust Regression

K2R is a universal solution for Classification, Regression, and Attribute Importance. It enables the prediction of behaviors (nominal targets) or quantities (continuous targets).

Unlike traditional regression algorithms, K2R can safely handle very high numbers of input attributes (over 10,000) in an automated fashion. K2R provides indicators and graphs to ensure that the quality and robustness of trained models can be easily assessed. K2R graphically displays the attribute importance, which provides the relative importance of each attribute for explaining a given business question. At the same time it gives a clear indication of which attributes either contain no relevant information or are redundant with other attributes.

Benefits: The business value of a data mining project is increased by either training more models or completing the project faster. The ability to train more models allows a larger number of scenarios to be tested at a higher level of granularity. For example, if a direct marketing campaign benefits from separate models trained per region, per customer segment, per month, the automation of K2R allows all of these models to be trained and safely deployed using the same amount or fewer resources than with traditional tools.

What: K2R is a regression algorithm that allows building models to predict categories or continuous variables.

Why: Traditionally, building robust predictive models required a lot of time and expertise, which prevented companies from using data mining as part of their everyday business decisions. K2R makes it easy to build and deploy predictive models in a fraction of the time it takes using classical statistical tools.

How: K2R maps a set of descriptive attributes (model inputs) to target attributes (model output). It uses an algorithm patented by KXEN, which is a derivation of a principle described by V. Vapnik as “Structural Risk Minimization.” Instead of looking for the best performance on a known dataset, K2R automatically finds the best compromise between quality and robustness. The resulting models are expressed as a polynomial expression of the input numbers. The only element specified by the user is the polynomial degree. To improve modeling speed, K2R can also build multi-target models.

Benefits for the business user: K2R allows the business user to easily build and understand advanced predictive models without statistical knowledge. A model can be created in a matter of minutes. Two performance indicators describe model quality (Ki) and model reliability, or the ability to produce similar results on new data (Kr).

K2R graphically displays the individual variable contribution to the model, which helps to select the most important variables explaining a given business question. At the same time it avoids focusing on data that contains no information.

Models can directly be applied in a simulation mode for a single input dataset predicting the score for an individual business question in real time.

Benefits for the Data Mining expert: K2R frees time for Data Mining professionals to apply their expertise in areas where they add more value instead of spending several days to tune a model. K2R produces results within minutes (less than 15 seconds on a laptop with 50,000 lines and 20 variables).

Here is a case study from the company itself.

Marketing campaign usage scenario

* Send a “Test mailing” to 5000 customers to offer them a new product
* Collect the results of your test mailing to build a “Training” data set that associates things you know about customers prior to the mailing with the answers to your business question
* Train a model to “predict” the Yes/No answer
* Check the quality and robustness of your model (Ki, Kr)
* Apply the model to the 1,000,000 other customers in your database: this model associates each individual customer with a probability for answering Yes. Because you are using a robust model, the sum of probabilities is a good indicator of how many people will answer yes to this mail
* Send your mailing only to those customers with a high probability to respond positively, or use our built-in profit curves to optimize your return on the campaign
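The campaign steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain logistic regression trained by gradient descent stands in for K2R, the customer data is simulated, the "rest of the database" is shrunk to 100,000 rows to keep it quick, and the 0.5 mailing cut-off is arbitrary. KXEN's Ki and Kr indicators are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)


def simulate_customers(n):
    """Made-up customers: two known attributes and a Yes/No answer."""
    X = rng.normal(size=(n, 2))
    true_p = 1.0 / (1.0 + np.exp(-(-2.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1])))
    y = (rng.random(n) < true_p).astype(float)
    return X, y


def add_intercept(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Stand-in model: logistic regression fitted by batch gradient descent."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-add_intercept(X) @ w))


# Test mailing to 5000 customers, then train on the collected answers
X_test_mailing, y_test_mailing = simulate_customers(5000)
w = train_logistic(X_test_mailing, y_test_mailing)

# Apply the model to the other customers in the database
X_rest, y_rest = simulate_customers(100_000)
p = predict_proba(w, X_rest)

expected_yes = p.sum()            # sum of probabilities = expected number of Yes
actual_yes = y_rest.sum()         # only known here because the data is simulated
targets = np.flatnonzero(p >= 0.5)  # mail only the likely responders

print(f"expected responders: {expected_yes:.0f}, simulated actual: {actual_yes:.0f}")
print(f"customers selected for the mailing: {len(targets)}")
```

The point of the sketch is the last step in the list: when the probabilities are well calibrated, adding them up gives a usable estimate of how many people will say yes before a single letter is sent.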

Example: Regression: Dealer evaluation usage scenario

* Collect information about the past performance of your dealers two years ago and associate how much of your product they sold 1 year ago
* Train a model to predict how much a dealer will sell based on the available information
* Check the quality and robustness of the model (Ki, Kr)
* Apply the model to all of your dealers today: the model associates each dealer with an estimation of how many products he will sell
* Sum up the estimates to predict how much you will sell next year. This is the base line for your sales forecast.
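The dealer scenario has the same shape with a continuous target. The sketch below is an assumption-laden illustration: ordinary least squares stands in for K2R, and the dealers, their attributes (`past_sales`, `outlets`) and the sales figures are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)


def simulate_dealers(n):
    """Made-up dealers: past performance and the sales that followed."""
    past_sales = rng.uniform(200, 1000, size=n)
    outlets = rng.integers(1, 6, size=n).astype(float)
    next_sales = 50 + 1.1 * past_sales + 20 * outlets + rng.normal(scale=30, size=n)
    return np.column_stack([past_sales, outlets]), next_sales


def fit_linear(X, y):
    """Stand-in model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Performance two years ago paired with what each dealer sold one year ago
X_hist, y_hist = simulate_dealers(300)
coef = fit_linear(X_hist, y_hist)

# Apply the model to the dealers as they are today
X_today, y_future = simulate_dealers(300)
per_dealer_estimate = predict(coef, X_today)

# Sum the estimates to get the baseline for the sales forecast
baseline_forecast = per_dealer_estimate.sum()
print(f"baseline forecast: {baseline_forecast:.0f}")
print(f"simulated actual:  {y_future.sum():.0f}")
```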

In my next post I will include screenshots of how to build an automated regression model using KXEN.

Ajay Disclaimer- I am a consultant to KXEN for social networks.

Interview Françoise Soulie Fogelman, KXEN

This week KXEN launched its social network analysis tool, thus gaining a unique edge by being the first to launch social network tools for analytics. Having worked with KXEN as an analyst on scoring models, I am aware of the remarkable innovations they bring to their premium products. In an exclusive interview, KXEN's Vice President for Strategic Business Development, Françoise Soulie Fogelman, agreed to shed some light on this remarkable new development in statistical software.


Ajay- Françoise, how does the Social Network Analysis module help model building for marketing professionals?

Françoise- The KXEN Social Network Analysis module (KSN) helps build models which take into account interactions between customers. This is done in 3 stages:

  • The data describing interactions is used to build a social network structure (actually, usually various social network structures are built in one pass through the data). You can explore your network to understand better the behavior of a given customer and what is happening around him.
  • From each social network structure, a set of attributes is automatically built by KSN for each node: it could be number of neighbors, average value of a given customer attribute among neighbors… Actually, you can have statistics on anything you have loaded into the system as customer node decoration. Usually, you'll generate at this stage a few tens of social attributes per social network structure.
  • You then join these social attributes to the existing customer attributes. After that, you build your model as usual.
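The three stages Françoise describes can be sketched in plain Python. This is not the KSN implementation. The call records, customer names and the `monthly_spend` attribute are invented, and only two social attributes are computed (number of neighbors and the average of one attribute among neighbors).

```python
from collections import defaultdict

# Hypothetical interaction data: who called whom
call_records = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]

# Hypothetical existing customer attributes (the "node decoration")
customers = {
    "ann": {"monthly_spend": 30.0},
    "bob": {"monthly_spend": 50.0},
    "cat": {"monthly_spend": 20.0},
    "dan": {"monthly_spend": 80.0},
    "eve": {"monthly_spend": 10.0},  # no calls on record
}


def build_network(edges):
    """Stage 1: turn interaction records into an undirected network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def social_attributes(neighbors, customers, attribute):
    """Stage 2: derive per-node social attributes from the network."""
    result = {}
    for node in customers:
        nbrs = neighbors.get(node, set())
        values = [customers[n][attribute] for n in nbrs if n in customers]
        result[node] = {
            "n_neighbors": len(nbrs),
            f"avg_neighbor_{attribute}": sum(values) / len(values) if values else None,
        }
    return result


def join_attributes(customers, social):
    """Stage 3: join social attributes to the existing customer attributes."""
    return {name: {**attrs, **social[name]} for name, attrs in customers.items()}


network = build_network(call_records)
social = social_attributes(network, customers, "monthly_spend")
enriched = join_attributes(customers, social)

for name, row in enriched.items():
    print(name, row)
```

The enriched rows are then just a wider customer table, so the model is built as usual on top of them.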

Ajay- But how does the KSN module work, and which mathematical technique is it based on (or is it just the addition of extra variables)? Are there any proprietary patents that KXEN has filed in this field (both automated modeling as well as social network analysis)?

Françoise- The KSN module uses graph theory for extracting social attributes. KXEN has not filed a patent in relation to KSN.

Ajay- There are many modeling software packages, but very few involve social network analysis, though many companies have expressed interest in this. What are the present rivals to the KSN module specifically in software, and who do you think the future rivals will be?

Françoise- There are many software tools, but when it comes to the ability to handle very large graphs, not very many are left. We consider that our only real competitor today is SAS, who has an offer for Social Network Analysis, but this product is specifically targeted at fraud in banking and insurance. There are also companies positioned in Telco, usually offering a consulting service built around an internal product. We think our solution is unique in its ability to handle very large volumes (we're talking here about more than 40 M nodes and 300 M links) and to address all industry domains. As usual, we offer a tool which is an exploratory tool, giving the customer the ability to produce by himself as many models as he wants.

Ajay- Who would be the typical customer or potential clients for the KSN module? In which domains would this module be not so relevant? Are there any specific case studies that you can point out?

Françoise- This is a first version, so we do not really know yet who the typical customer will be and cannot point yet to case studies. However, Telco operators have expressed a very strong interest and we already have a Telco customer with whom we've worked on marketing projects. So our first case studies will most certainly come from Telco. We are working on some research projects in the retail space. We think that banks (for fraud), social sites, blog sites and forums will be our next customers. The sector where I do not see (yet?) a potential is manufacturing industries.

Ajay- How would the privacy concerns of customers be addressed with the kind of social network analysis that KSN can now offer to marketers?

Françoise- KXEN offers a tool to build models and is not concerned with the problem of collecting, storing and exploiting data: this is the KXEN customer's responsibility. Depending upon the country, there are various jurisdictions protecting the storage and use of data and those will naturally apply to building and analyzing Social Networks. However, in the case of Social Network Analysis the issue of ethical use will be more sensitive.

Ajay- What kind of hardware solutions go best with KXEN’s software? Which other BI vendors do your offerings best complement?

Françoise- KXEN software in general, and KSN in particular, runs on any platform. When using KSN to build decent-size graphs (with tens of millions of nodes and hundreds of millions of links, for example), a 64-bit architecture is required. A recent survey of KXEN customers shows that the BI suites used by our customers are mostly MicroStrategy and Business Objects (SAP). We also like very much to mention Advizor Solutions, which offers data visualization software already embedding KXEN technology.

Ajay- Do you think text mining as well as the Data Fusion approach can work for online web analytics, search engines or ad targeting?

Françoise- Of course, our data fusion approach can be very well suited for online web analytics and ad targeting (we have a number of partners that are either already using KXEN for this purpose or developing applications in these domains using KXEN technology). We would be more cautious for search engines per se.

Ajay- Are there any plans for offering KXEN products as a service (like Salesforce.com) instead of the server-based approach?

Françoise- We do not yet have plans to offer KXEN products as a service, but, again, we have partners such as Kognitio that offer analytics platforms embedding KXEN.


Brief Biography-


Françoise Soulie Fogelman is responsible for leading KXEN business development, identifying new business opportunities for KXEN and working with Product Development, Sales and Marketing to help promote KXEN's offer. She is also in charge of managing KXEN's University Program.

Ms Soulie Fogelman has over 30 years of experience in data mining and CRM both from an academic and a business perspective. Prior to KXEN, she directed the first French research team on Neural Networks at Paris 11 University where she was a CS Professor. She then co-founded Mimetics, a start-up that processes and sells development environment, optical character recognition (OCR) products and services using neural network technology, and became its Chief Scientific Officer. After that she started the Data Mining and CRM group at Atos Origin and, most recently, she created and managed the CRM Agency for Business & Decision, a French IS company specialized in Business Intelligence and CRM.

Ms Soulie Fogelman holds a master's degree in mathematics from Ecole Normale Superieure and a PhD in Computer Science from University of Grenoble. She was advisor to over 20 PhD students on data mining, has authored more than 100 scientific papers and books and has been an invited speaker at many academic and business events.


(Ajay- So it seems like interesting software, and with the marketing avenues for social networking growing, and analytics modelers exploring the last bit of data for incremental lift, this is an area where we can be sure of new developments soon. I wonder what the response from other analytics vendors, including open source developers, will be, as this does seem a promising area for statistical modelling as well as analysis. What do you think? Can I search all data from Twitter, Facebook, search results on Indeed.com and LinkedIn and add it to your credit profile to create a better propensity model? 🙂 Will the credit or marketing behavior scores of your friends affect your propensity, and thus the telecom ads you see while surfing?)

KXEN releases Social Network Analysis tool

KXEN, the automated model making software company, added one more innovation by being one of the first major data mining and analytics vendors to release an analytics tool for Social Network Analysis.

From the Press Release

San Francisco, Paris, London, March 24th, 2009

New Social Network Analysis Module Strengthens KXEN Automated Data Mining

Leverages connections between people to boost marketing campaign results and profitability

Sales, marketing and customer retention campaigns are set to become smarter, more effective and more profitable thanks to a new social network analysis module from data mining automation vendor KXEN. By exploiting the connections between customers of telcos, banks, retailers and others, KXEN’s new KSN module has shown more than 15% lift improvement in campaign results.

KSN identifies the otherwise hidden links – call records or bank transfers for instance – between friends, families, co-workers and other communities and extracts significant social metrics, pinpointing who are the best connected and who plays the most important role in any group. In this way it reveals valuable new customer intelligence that – when added to existing customer information – can strengthen significantly user organizations’ customer acquisition, retention, cross-sell and up-sell campaigns.

Using KSN, companies can increase the accuracy and precision of their campaigns by leveraging the many more customer attributes that the module reveals, allowing them to better predict when customers may be about to churn to another provider, close an account, or buy a new product. A feature unique-to-KXEN allows the analysis of multiple networks and their evolution over time, exposing specific patterns of behaviors like rotational churn, fraud and identity theft.

"Traditional marketing relies on models based solely on customer-vendor interaction and assumes customers act independently of each other," says KXEN’s founder and CEO Roger Haddad. "But the new social network technology in KSN recognizes that customers do indeed interact with each other, and exploits that knowledge to drive up the effectiveness and completeness of marketing and sales campaigns."

KSN integrates with and enriches existing data mining environments or may be deployed entirely standalone. Exploiting viral marketing thinking, it eliminates the normally tedious and labor intensive aspects of social network analysis. The module provides as many social network maps as users want, recognizing that individuals may belong to many different networks across their business, family and social lives.

KSN, shipping from today, complements the existing KXEN Text Coder module which allows organizations to include plain text data into their analytics activities. Together the two modules, along with KXEN’s core software, are behind the company’s Data Fusion approach which combines structured and unstructured data from multiple sources to generate fast accurate results, thus maximizing organizations’ return on analytics investments.

About KXEN

KXEN, The Data Mining Automation Company delivers next-generation Customer Lifecycle Analytics to enterprises that depend on analytics as a competitive advantage. KXEN’s Data Mining Automation Solution drives significant improvements in customer acquisition, retention, cross-sell and risk applications. Its solution integrates predictive analytics into strategic business processes, allowing customers to drive greater value into their business. Find out more by visiting www.kxen.com.

I found the statement by the CEO quite interesting –


"Traditional marketing relies on models based solely on customer-vendor interaction and assumes customers act independently of each other," says KXEN’s founder and CEO Roger Haddad. "But the new social network technology in KSN recognizes that customers do indeed interact with each other, and exploits that knowledge to drive up the effectiveness and completeness of marketing and sales campaigns."


I wonder if other analytics vendors are creating or releasing products like these.

Using Web 2.0 for Analytics 2.0

Here is a great video tutorial on YouTube by Zementis, creator of ADAPA, the cloud scoring engine for next-gen predictive analytics.



A few weeks back, I was working with the ADAPA engine on a consulting gig, and Ron Ramos, the head of sales, mentioned that though they have extensive documentation, they were planning a video tutorial on YouTube as well.

Beats a PDF every time, doesn't it!

I wonder why companies continue to spend huge, and I mean huge, amounts on white papers and PDFs when they can have much better customer support using a bit of audio, video and even Twitter support.

This is surprisingly true even for companies working at the cutting edge with other technologies, despite the essentially free availability of these tools.


I mean, if companies can spend huge amounts on predictive solutions for the big, big datasets, why can't they offer some solutions or apps for the web and social media? (An exception is KXEN, of course, with its new Social Network Analysis module.)

Imagine a future –

Example:

  • Hello SAS, my code won't run, blah blah blah.
  • SAS Support on Twitter: Okay, do this.
  • Hello SPSS, where can I find some stuff on Python? I got lost on the website.
  • SPSS Support on Skype/Twitter: Dude, do this, click this link!

It is much better than endless rounds of email and aggravation. As for the list server method, well, users should try and test www.twitter.com for user groups.

Get (or at Least Try) Clicky

After writing about various online analytics tools, a favorite of mine is Clicky from http://getclicky.com/

It is recommended especially to WordPress users because it has a customized WP plugin (which means all the fun without any of the code), has an affiliate referral program (it’s a tough economy!) and can help you track individual visitors with a great deal of analytical value.

You can view the www.getclicky.com site for their own list of benefits.

Short of actually building your own clickstream-capturing application (which is quite useful for tracking and building models for web mining), Clicky's clickstream data is extremely helpful in generating insights, as it is at the record/visitor level and can be sliced, diced and viewed for your custom insights.

As for building automated models using web data, there is another software from KXEN (www.Kxen.com). They have been around for some time, though they have lost out on a couple of big chances, but their web scoring module is definitely worth a dekko.