Interview David Smith, REvolution Computing

Here is an interview with REvolution Computing’s Director of Community, David Smith.

“Our development team spent more than six months making R work on 64-bit Windows (and optimizing it for speed), which we released as REvolution R Enterprise bundled with ParallelR.” – David Smith

Ajay- Tell us about your journey in science. In particular, tell us what attracted you to R and the open source movement.

David- I got my start in science in 1990 working with CSIRO (the government science organization in Australia) after I completed my degree in mathematics and computer science. Seeing the diversity of projects the statisticians there worked on really opened my eyes to statistics as the way of objectively answering questions about science.

That’s also when I was first introduced to the S language, the forerunner of R. I was hooked immediately; it was just so natural for doing the work I had to do. I also had the benefit of a wonderful mentor, Professor Bill Venables, who at the time was teaching S to CSIRO scientists at remote stations around Australia. He brought me along on his travels as an assistant. I learned a lot about the practice of statistical computing helping those scientists solve their problems (and got to visit some great parts of Australia, too).

Ajay- How do you think we should help bring more students to the fields of mathematics and science?

David- For me, statistics is the practical application of mathematics to the real world of messy data, complex problems and difficult conclusions. And in recent years, lots of statistical problems have broken out of geeky science applications to become truly mainstream, even sexy. In our new information society, graduating statisticians have a bright future ahead of them which I think will inevitably draw more students to the field.

Ajay- Your blog at REvolution Computing is one of the best technical corporate blogs. In particular, the monthly round-up of new packages, R events and product launches is written in a lucid style. Are there any plans for a REvolution Computing community or network as well, instead of just the blog?

David- Yes, definitely. We recently hired Danese Cooper as our Open Source Diva to help us in this area. Danese has a wealth of experience building open-source communities, such as for Java at Sun. We’ll be announcing some new community initiatives this summer. In the meantime, of course, we’ll continue with the Revolutions blog, which has proven to be a great vehicle for getting the word out about R to a community that hasn’t heard about it before. Thanks for the kind words about the blog, by the way — it’s been a lot of fun to write. It will be a continuing part of our community strategy, and I even plan to expand the roster of authors in the future, too. (If you’re an aspiring R blogger, please get in touch!)

Ajay- I kind of get confused between what exactly is 32-bit or 64-bit computing in terms of hardware and software. What is the deal there? How do Enterprise solutions from REvolution take care of 64-bit computing? How exactly do parallel computing and optimized math libraries in REvolution R help as compared to other flavors of R?

David- Fundamentally, 64-bit systems allow you to process larger data sets with R — as long as you have a version of R compiled to take advantage of the increased memory available. (I wrote about some of the technical details behind this recently on the blog.) One of the really exciting trends I’ve noticed over the past six months is that R is being applied to larger and more complex problems in areas like predictive analytics and social networking data, so being able to process the largest data sets is key.

One common misperception is that 64-bit systems are inherently faster than their 32-bit equivalents, but this isn’t generally the case. To speed up large problems, the best approach is to break the problem down into smaller components and run them in parallel on multiple machines. We created the ParallelR suite of packages to make it easy to break down such problems in R and run them on a multiprocessor workstation, a local cluster or grid, or even cloud computing systems like Amazon’s EC2.
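The decomposition Smith describes is language-agnostic. As a rough sketch (in Python rather than R, and not using ParallelR’s actual API), splitting a large job into independent pieces and farming them out to worker processes looks like this:

```python
from multiprocessing import Pool

def simulate(seed):
    # Placeholder for one independent unit of work, e.g. one
    # bootstrap replicate or one backtest scenario.
    return sum(i * seed for i in range(1000))

if __name__ == "__main__":
    # Break the big job into independent pieces and map them over
    # a pool of worker processes; a cluster or cloud backend works
    # the same way conceptually, just with remote workers.
    with Pool(processes=4) as pool:
        results = pool.map(simulate, range(8))
    # Combine the partial results at the end.
    print(sum(results))
```

The key property that makes this work, in ParallelR as in any parallel framework, is that each piece can run without talking to the others.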

“While the core R team produces versions of R for 64-bit Linux systems, they don’t make one for Windows. Our development team spent more than six months making R work on 64-bit Windows (and optimizing it for speed), which we released as REvolution R Enterprise bundled with ParallelR. We’re excited by the scale of the applications our subscribers are already tackling with a combination of 64-bit and parallel computing.”

Ajay- Command line is oh so commanding. Please describe any plans to support or help any R GUI like Rattle or R Commander. Do you think REvolution R can get more users if it does help a GUI?

David- Right now we’re focusing on making R easier to use for programmers by creating a new GUI for programming and debugging R code. We heard feedback from some clients who were concerned about training their programmers in R without a modern development environment available. So we’re addressing that by improving R to make the “standard” features programmers expect (like step debugging and variable inspection) work in R and integrating it with the standard environment for programmers on Windows, Visual Studio.

In my opinion, R’s strength lies in its combination of high-quality statistical algorithms with a language ideal for applying them, so “hiding” the language behind a general-purpose GUI negates that strength a bit. On the other hand, it would be nice to have an open-source “user-friendly” tool for desktop statistical analysis, so I’m glad others are working to extend R in that area.

Ajay- Companies like SAS are investing in SaaS and cloud computing. Zementis offers scored models on the cloud through PMML. Any views on just building the model or analytics on the cloud itself?

David- To me, cloud computing is a cost-effective way of dynamically scaling hardware to the problem at hand. Not everyone has access to a 20-machine cluster for high-performance computing — and even those that do can’t instantly convert it to a cluster of 100 or 1000 machines to satisfy a sudden spike in demand. REvolution R Enterprise with ParallelR is unique in that it provides a platform for creating sophisticated data analysis applications distributed in the cloud, quickly and easily.

Using clouds for building models is a no-brainer for parallel-computing problems: I recently wrote about how parallel backtesting for financial trading can easily be deployed on Amazon EC2, for example. PMML is a great way of deploying static models, but one of the big advantages of cloud computing is that it makes it possible to update your model much more frequently, to keep your predictions in tune with the latest source data.

Ajay- What are the major alliances that REvolution has in the industry?

David- We have a number of industry partners. Microsoft and Intel, in particular, provide financial and technical support allowing us to really strengthen and optimize R on Windows, a platform that has been somewhat underserved by the open-source community. With Sybase, we’ve been working on combining REvolution R and Sybase RAP to produce some exciting advances in financial risk analytics. Similarly, we’ve been doing work with Vhayu’s Velocity database to provide high-performance data extraction. On the life sciences front, Pfizer is not only a valued client but in many ways a partner who has helped us “road-test” commercial-grade R deployment with great success.

Ajay- What are the major R packages that REvolution supports and optimizes and how exactly do they work/help?

David- REvolution R works with all the R packages: in fact, we provide a mirror of CRAN so our subscribers have access to the truly amazing breadth and depth of analytic and graphical methods available in third-party R packages. Those packages that perform intensive mathematical calculations automatically benefit from the optimized math libraries that we incorporate in REvolution R Enterprise. In the future, we plan to work with the authors of some key packages to provide further improvements — in particular, to make packages work with ParallelR to reduce computation times in multiprocessor or cloud computing environments.

Ajay- Are you planning to lay off people during the recession? Does REvolution Computing offer internships to college graduates? What do people at REvolution Computing do to have fun?

David- On the contrary, we’ve been hiring recently. We don’t have an intern program in place just yet, though. For me, it’s been a really fun place to work. Working for an open-source company has a different vibe than the commercial software companies I’ve worked for before. The most fun for me has been meeting with R users around the country and sharing stories about how R is really making a difference in so many different venues — over a few beers of course!


David Smith
Director of Community

David has a long history with the statistical community. After graduating with a degree in Statistics from the University of Adelaide, South Australia, David spent four years researching statistical methodology at Lancaster University (United Kingdom), where he also developed a number of packages for the S-PLUS statistical modeling environment. David continued his association with S-PLUS at Insightful (now TIBCO Spotfire) where for more than eight years he oversaw the product management of S-PLUS and other statistical and data mining products. David is the co-author (with Bill Venables) of the tutorial manual, An Introduction to R, and one of the originating developers of ESS: Emacs Speaks Statistics. Prior to joining REvolution, David was Vice President, Product Management at Zynchros, Inc.

Ajay- To know more about David Smith and REvolution Computing, do visit http://www.revolution-computing.com and http://www.blog.revolution-computing.com
Also see the interview with Richard Schultz, CEO of REvolution Computing, here:

http://www.decisionstats.com/2009/01/31/interviewrichard-schultz-ceo-revolution-computing/

White Riders

Here is a nice company started by a fellow batchmate from the Indian Institute of Management, Kaustubh Mishra. It is called White Riders, a relative pioneer in adventure travel. Note these bikers are well-behaved MBAs, imparting team-building management lessons along the way. I caught up with Kaustubh long enough for him to tell me why he chose the adventure travel business.

Ajay- What has been the story of your career, and what message would you like to send to young people aspiring to an MBA or just starting their careers?

Kaustubh- My first job was as a peon with the SPCA, handling paperwork, dishes, etc. My father wanted to see me buy a bicycle with my own money, and that is why it happened. Thanks to Papa, I learnt some important lessons while serving people. During graduation I was doing odd jobs, like being a faculty member at a computer institute, freelance programming, etc.

My first experience of a large organization came at Bharti Telecom, where I did my summer internship. It was a market research project, and I remember sleeping in an interviewee’s cabin during a survey. After my PGDM from IIML, I got into Tech Pacific, then ICICI, then ABN AMRO. Please visit my LinkedIn profile for more details.

My message to people doing their MBA is simple – the MBA is not the end, it is just a via media for you to get into a good career. Get into an MBA because YOU want to do it and not because everyone else is doing it. There are so many career options in front of you; follow your heart.

For people starting their careers, just 7 words – realize the power within & follow your dreams.

Ajay- Why did you create a startup? Why did you name it White Collar Company (there was an ad for a business school reunion which had the same name)? What is your vision for White Collar Company?
Kaustubh- When I was doing my job, I was always overachieving targets, but after some time a rut sets in. I also realized that complete freedom and maximum returns for my efforts were absent. There were so many things, ideas, etc. simmering inside me, but I could not act on them there. To do all that, I had to venture out on my own, and venture I did. So the biggest reason I started my own company was to put my ideas into practice.

White Collar is a name generally associated with knowledge. I first wanted to name it ‘White’ but the name, domain name, trademarks, etc. were not available. White denotes knowledge. Our goddess of knowledge and learning, ‘Saraswati’, is dressed in white. As all my ventures are essentially about knowledge and learning, so: White Collar. And ‘White Collar Biker’ sounds cool and very oxymoronish.

I see White Collar Company becoming known as the cradle of new ideas, innovation and creativity in the field of knowledge. A university is next in some years.

Ajay- What are the key learnings you have had in this short period? Name some companies in the United States that are similar to your company. What do you think is the market potential of this segment?
Kaustubh- We are in three industries – adventure tourism, corporate training and HR advisory. While in the first and the last there are people doing nearly the same thing (I would not say exactly, because we do have our USPs), in corporate training White Collar Company is the only company in the world conducting management training through motorcycles.

With innovation and RoI being extremely important in training, the market potential is huge. In adventure tourism also the potential is great, as we are waking up to it. In consultancy, as we operate in the SME space, the potential again is very large.

It has been a short period in which to have big learnings, but I have been applying lessons from my previous jobs here, like vendor management, marketing channel management, etc. But yes, I did learn the art of hard bargaining and negotiation during this short period.

Ajay- Is an MBA (IIM or otherwise) necessary for success? Comments please.
Kaustubh- Ajay, your question here says success. Before answering it, I would first differentiate between the two kinds of success we are talking about. Success in corporate life is different from success as an entrepreneur.

For being successful as a corporate executive, MBA to a certain extent is good. It gives you certain kind of thought processes and also a platform for future success.

However, if we talk of a successful entrepreneur, I personally do not think an MBA will matter much. In fact I often talk of the ‘1st of the month’ syndrome – this is the comfort of getting a handsome amount deposited as your salary every month. When you get into that comfort zone, it becomes very hard to come out. The larger the amount, the harder it gets. For a successful entrepreneur, perseverance, self-belief, the ability to trust and the ability to take risks are very important. I doubt any MBA is going to give you that. The very same thought processes and ways of thinking that help you succeed in corporate life need to be challenged as an entrepreneur.

Ajay- What’s your vision for your website? Which website is a good analogy for it? Why should anyone visit the website?

Kaustubh- I am not a technical person, but having said that, I see my website as the focal point of my business. I built the website myself using widgets, etc., and going forward all my business will happen from the site. By 2010, we will put a strong CRM and PRM on the website, thus enabling all business processes to be routed through it. Like I said, I am not a techie, but I think Web 2.0, the participative nature of the internet and cloud computing are going to help me save and optimize. We already have an online chat built into the site; any customer can come and get more details about our programs.

Going forward, customers will be able to make bookings themselves on the site. Vendors will be able to log in and do all necessary business through the website, and we plan to implement SFA for our employees. I believe this answers the vision and why anyone should visit my site.

Ajay- What is your favorite incident in this short period of your startup? What were the key learnings? Are you seeking venture capital funds?

Kaustubh- For customers, I thought the typical profile would be young males, so I was delighted when a female became our first customer. We have tweaked our marketing strategy and positioning after that.

At this stage my baby is too young and fragile. If I give her crutches to walk, she will never be able to stand up herself and be counted. So while we will go for external funding at some point of time, that time is not now. With our kind of business model, right now we are not ready for the interference of a venture capitalist.


So if you always wanted to travel to India and have an adventure as well, contact Kaustubh at http://www.wccindia.com/rider/R_kaustubh.html and he will show you how to be a White Rider too.


Read more about his company here – http://www.wccindia.com/rider/whywhite.html

Interview Dominic Pouzin, Data Applied

Here is an interview with Dominic Pouzin, CEO of http://www.data-applied.com, which is a startup making waves in the field of data visualization.
Ajay- Describe your career in applied science. What made you decide to pursue a career in science? Some people think that careers in science are boring. How would you convince a high school student to choose a career in science?

Dominic- It’s important to realize that we are surrounded by products of science and engineering. By products of science, I mean bridges we cross on our way to work, video games we play for entertainment, or even the fabric of clothes we wear. Anyone who is curious should want to know how things really work. In that case, a scientific education makes sense, because it provides the tools necessary to understand and improve our world. I would also argue that a scientific training can be a stepping stone towards high levels of achievement in other fields. For example, to become a financial wizard, a top patent attorney, or direct large clinical trials, a scientific education serves as a strong foundation. In addition, it’s probably easier to switch from science to another field than the other way round. Who wants to learn about matrix calculus in their forties? In my case, I graduated with a Master’s degree in Computer Science, and spent 10 years at Microsoft leading software development teams for the Windows server, Exchange server, and Dynamics CRM product lines. I wish that, along the way, I had found time for a PhD in data mining, but years of practical software engineering experience also has its advantages.

Ajay- What advice would you give to someone who just got laid off, and is pondering whether he should / should not start a business?

Dominic- Working for a large company used to mean trading some autonomy for more stability and access to a wide array of resources. However, in this economy, the terms of the equation have changed. Many workers who lost their jobs found that this stability had disappeared. Others found that resources have become scarcer due to shrinking budgets. With this shift in the balance, entrepreneurship starts becoming more appealing.

Creating your own business might sound daunting, but, for example, creating a Washington State LLC in the US takes about 15 minutes, costs 200 dollars, and only requires an Internet connection. Managing payroll may sound like a big headache, but again, specialized companies can handle all payroll matters on your behalf for only a few dollars a month. So while this part is relatively easy, you also need two things which are more difficult to come by:

a/ an unshakable belief in what you are trying to achieve, and

b/ a willingness to handle anything that comes your way.

You need to think like a commando soldier who just landed on a beach: you’ve got great skills, but you’re alone, and can’t afford to fail. Practically, you may find yourself working for weeks or months with little or no income, with friends and family thinking that you are wasting your time. So, if necessary, try finding a co-founder to boost your confidence and motivate one another. Also, unless you want to spend most of your time chasing people for money, personal savings are a must.

Ajay- So describe your company. How does data visualization work? What differentiates your company from so many data visualization companies?

Dominic- We’re trying to stir things up a bit in terms of making it easier for regular business users to benefit from data mining. For example, we enable new “BI in the cloud” scenarios by allowing users to simply point a browser to access analysis results, or by allowing applications to submit and analyze data using an XML-based API. Built-in collaboration features, and more interactive visualizations, are also definitely part of our story.

Finally, while we focus on data mining (ex: time series forecasting, association rule mining, decision trees, etc.), we also make available other things such as pivot charts or tree maps. No data mining algorithm there, but why should business users care as long as the insight is there?
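The interview doesn’t document the XML-based API mentioned above, so as a purely hypothetical sketch (the endpoint, element names, and attributes are invented for illustration, not Data Applied’s actual schema), an application submitting a small table for analysis might build a request document like this, using Python’s standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical request document: a named table of rows plus the
# analysis task to run on it. The real API's element names and
# endpoint are not shown in the interview.
request = ET.Element("AnalysisRequest", task="decision-tree")
table = ET.SubElement(request, "Table", name="churn")
for age, churned in [(34, "yes"), (51, "no")]:
    row = ET.SubElement(table, "Row")
    ET.SubElement(row, "Age").text = str(age)
    ET.SubElement(row, "Churned").text = churned

# Serialize to a string, ready to POST to the analysis service.
payload = ET.tostring(request, encoding="unicode")
print(payload)
```

The appeal of such an API for “BI in the cloud” is that any language with an XML library and an HTTP client can submit data and retrieve results, with no client software to install.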

[Screenshot: Data Applied overview]

To answer your question about visualization, most packages offer basic features such as the ability to pick colors, or to change labels, etc. For differences to emerge, you have to ask the right questions.

* Access: does visualization require an application to be installed on each computer? Our visualizations work directly from a web page, so there is nothing to install (and upgrades are automatic).

* Search: can visualization results be searched, so as to enable drill-down scenarios? In the age of Google, we enable search everywhere, so that views can be constrained to what the user is looking for.

* Collaboration: can visualization results be tagged using comments, or shared with other users while securely controlling access, etc.? Visualization is only a starting point – chances are that you will need to talk to someone before analysis is complete – so we offer plenty of collaboration features.

* Export: how easy is it for a business user to present analysis results to management in a way that is understandable? We make it easy to export visualization content to a shared gallery, and as presentation-ready images.

There are a couple of other things we do as well in terms of interaction (ex: zoom, select, focus, smart graph layout), and a couple we don’t have yet (ex: geo-mapping, export to PDF).

But in conclusion, I would say that useful data visualization is as much about the way you present data (and that must be compelling!), as it is about how one accesses, searches, secures, shares, or exports visualizations.

Ajay- The technology sector has been affected the most by debates over the immigration of skilled workers. As a technology worker, what do you have to say about immigration? What do you have to say about outsourcing? Do you have any plans for selling your products outside the United States?

Dominic- I am a US permanent resident, half French, half British, and my wife is Indian. So you won’t find it surprising to hear that I am in favor of immigration. In 1996, as an engineering student in France, I made the unusual choice to study one year at the Indian Institute of Technology (Delhi).

In fact, I was the only one in my engineering college (France’s largest) to select India as a destination (my friends all went to the US, UK, Australia, Germany, etc.). Now that India has become a recognized player in the IT field, several dozen students from the same engineering college have chosen India as a destination. So I guess the migration is starting to flow both ways!

Also, among the people I used to work with at Microsoft and who left to start a company, a good proportion are immigrants. So it’s important to recognize that immigrants not only help fill high-tech positions, but also create jobs.

Finally, as an entrepreneur trying to keep costs low, outsourcing is a tool you can’t afford to ignore. For example, websites such as http://www.elance.com provide easy access to the global marketplace. For those worried about quality, it’s possible to review customer ratings and portfolios. We keep track of visitors coming to our website, and the majority of the visitors to date have been from outside the US.

Ajay-  What is the basic science used by your company’s product?

Dominic – We use a client / server model. On the server, at the lowest level, we use SQL databases (accessed using ODBC), acting as data and configuration repositories.

Immediately above that sits a computing layer, which offers scalable, distributed data mining algorithms. We implement algorithms which scale well with the number of rows and attributes, but also properly handle a mix of discrete / numeric / missing values.

For example, just for clustering, the literature has some incredibly powerful algorithms (ex: WaveCluster, an algorithm based on wavelet transforms), but which also fail as soon as you enter real-world situations (ex: some fields are discrete).

On top of the computing layer sits a rich, secure web-based XML API, which allows users to manipulate analysis and collaboration objects, while enforcing security.

For the client, we built a web-based visualization application using Microsoft Silverlight. To ensure client / server communications are as efficient as possible, we use a fair amount of data compression and caching.
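The compression-and-caching idea is generic, and its payoff is easy to demonstrate. A minimal Python sketch (illustrative only, not Data Applied’s Silverlight client code) shows why compressing repetitive XML responses and caching them by key cuts client/server traffic dramatically:

```python
import zlib

# A repetitive XML payload, typical of tabular analysis results.
payload = ("<row><cluster>3</cluster><score>0.91</score></row>" * 200).encode()

# Compress before sending over the wire; repetitive markup
# compresses extremely well.
compressed = zlib.compress(payload, 9)

# Cache the compressed response by a key, so a repeated request
# for the same view skips both recomputation and recompression.
cache = {"results:run42": compressed}

# The round trip is lossless.
assert zlib.decompress(cache["results:run42"]) == payload
print(f"{len(payload)} -> {len(compressed)} bytes")
```

Real clients typically layer this behind HTTP content negotiation (gzip/deflate) rather than calling a compression library directly, but the economics are the same.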

Ajay-  Who are your existing clients and what is the product launch plan for next year?

Dominic- We’re only in alpha mode right now, so our next customers are in fact beta testers. We’re still busy adding new features. It’s good to be small and nimble, it allows us to move quickly. Sorry, I can’t confirm any launch date yet!

Ajay-  What does the CEO of a startup company do, when he has free time (assuming he has any)?

Dominic- When you spend most of your time working on analytics, it’s sometimes hard to leave your analytical brain at work.

For example, I am sure that readers who come to your website and visit a casino can’t help themselves and immediately start calculating the exact odds of winning (instead of just having fun).

Among other things, I enjoy challenging friends to programming puzzles (actually, they’re recycled Microsoft interview questions). My current bedtime reading is a book about data compression. I think you got the picture!

******************************************************************

Dominic is currently making promising data visualization products at http://data-applied.com/. To read more about him, please visit his profile page: http://www.analyticbridge.com/profile/DominicPouzin

Interview Ron Ramos, Zementis

Here is an interview with Ron Ramos, Director, Zementis. Ron Ramos wants to move predictions from desktops and servers to the remote cloud using the Zementis ADAPA scoring solution. I have tested the ADAPA solution myself and made some suggestions on tutorials. Zementis is a terrific company with a great product in ADAPA and a big early-mover advantage (see http://www.decisionstats.com/?s=zementis for the Zementis 5-minute video and an earlier interview a few months back with Michael Zeller, a friend, and CEO of Zementis).

Ajay- Describe your career journey. How would you motivate your children or young people to follow careers in science, or at least pay more attention to science subjects? What advice would you give to young tech entrepreneurs in this recession – the ones chasing dreams of iMobile Applications, cloud computing, etc.?

Ron- Science and a curious mind go together. I remember when I first met a friend of mine who is a professor of cognitive sciences at the University of California. To me, he represents the quest for scientific knowledge. Not only has he been studying visual space perception, visual control of locomotion, and spatial cognition, but he is also interested in every single aspect of the world around him. I believe that if we are genuinely interested and curious to know how and why things are the way they are, we are a step closer into appreciating and willing to participate in the collective quest for scientific knowledge.

Our current economic troubles are not affecting a single industry. The problem is widespread. So, tech entrepreneurs should not view this recession as targeted at technology. It is new technology in clean, renewable fuels which will most probably define what is to come. I am also old enough to know that everything is cyclical, and so this recession will lead us to great progress. iMobile Applications and Cloud Computing are here to stay, since these are technologies that just make sense. Cloud Computing benefits from the pay-as-you-go model, which because of its affordability is bound to allow for widespread use and availability of computing where we have not seen it before.

The most interesting and satisfying effect one can have is transformation – do that which changes people’s lives, and your own at the same time. I like the concept of doing well and doing good at the same time. My emphasis has always been marketing and sales in every business in which I have been involved. ADAPA delivers on the promise of predictive analytics – decisioning in real-time.

Ajay- How do you think cloud computing will change the model deployment market by 2011? SAS Institute is also building a 70-million-dollar facility for private clouds. Do you think private clouds with tied-in applications would work?

Ron- Model deployment in the cloud is already a reality. By 2011, we project that most models will be deployed in the cloud (private or not). With time though, private clouds will most probably need to embrace the use of open standards such as PMML. I believe open standards such as PMML, which allows for true interoperability, will become widespread among the data mining community; be used in any kind of computing environment; and, be moved from cloud to cloud.

Ajay- I am curious – who is Zementis’ competition in cloud-deployed models? Where is ADAPA deployment NOT suitable for scoring models – at what break-off point does the size of the data make people realize that the cloud is better than a server? Do you think internal organization IT support teams fear that cloud vendors would take their power away?

Ron- Zementis is the first and only company to provide a scoring engine on the cloud. Other data mining companies have announced their intention to move to cloud computing environments. The size of the data you need to score is not something that should be taken into account when determining whether scoring should be done in the cloud or not. In ADAPA, models can be uploaded and managed through an intuitive web console, and all virtual machines can be launched or terminated with the click of a mouse. Since ADAPA instances run from $0.99/hour, it can appeal to small and large scoring jobs. For small jobs, the cost is minimal and deployment of models is fast. For large jobs, the cloud offers scalability: many ADAPA instances can be set to run at the same time.

 

Cloud computing is changing the way models are deployed, but all organizations still need to manage their data and so IT can concentrate on that. Scoring on the cloud makes the job of IT easier.

Ajay- Which is a case where ADAPA deployment is not suited? Software like KXEN’s offers model export into many formats like PMML, SQL, C++, SAS, etc. Do you think Zementis would benefit if it had such converter-like utilities on its site for PMML conversion, say from SAS code to PMML code? Do you think PMML is here to stay for a long time?

Ron- Yes, PMML is here to stay. Version 4.0 is about to be released, so this is a very mature standard embraced by all leading data mining vendors. I believe the entire community will benefit from having converters to PMML, since it allows models to be represented by an open and well-documented standard. Also, since different tools already import and export PMML, data miners and modelers are then set free to move their models around. True interoperability!
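For readers who haven’t seen PMML, it is an XML vocabulary for describing a trained model. A simplified skeleton of a classification tree (illustrative only; the DMG specification defines the required attributes and many more model types) looks something like this:

```xml
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header copyright="example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="risk_tree" functionName="classification">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <Node score="low">
      <True/>
      <Node score="high">
        <SimplePredicate field="income" operator="lessThan" value="20000"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
```

Because the model is plain, well-specified XML rather than proprietary binary, any compliant engine, ADAPA included, can load and score it, which is the interoperability Ramos refers to.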

Ajay – Name some specific customer success stories and costs saved.

Ron – As a team, we spent our early development time working on assignments in the mortgage business.  That’s what gave rise to the concept of ADAPA – enabling smart decisions as an integral part of the overall business strategy.  It became obvious to us that we were in fact totally horizontal with application in any industry that had knowledge to be gained from its data.  If only they could put their artful predictive models to work – easily integrated and deployed, able to be invoked directly from the business’ applications using web services, with returned results downloaded for further processing and visualization.  There is no expensive upfront investment in software licenses and hardware; no long-term extended implementation and time-to-production.  The savings are obvious, the ROI pyrotechnic.

Our current users, both enterprise installations and Amazon EC2 subscribers, report great results, and for a variety of good reasons we tend to respect their anonymity:

Zementis ADAPA Case Study #1:

Financial Institution Embraces Real-time Decisions.

Decision Management: A leading financial company wanted to implement an enterprise-wide decision system to automate credit decisions across Retail, Wholesale, and Correspondent business channels. A key requirement for the company's enterprise strategy was to select a solution which could execute and manage rules as well as predictive analytics on demand and in real-time. With minimal prior automation in place, the challenge was to execute guidelines and pricing for a variety of business scenarios. Complex underwriting combined with intricate pricing matrices presents obstacles for employees and customers in correctly assessing the available choices from a myriad of financial products. While embracing a new processing paradigm, the goal for integrating the solution with the existing infrastructure was to ensure minimal impact on already established processes and to not jeopardize origination volume.

Following a comprehensive market review, the financial institution selected the Zementis ADAPA Enterprise Edition because of its key benefits as a highly scalable decision engine based on open standards. The ADAPA framework, they concluded, ensures real-time execution capabilities for rules and predictive analytics across all products and all business channels.

Working directly with senior business and IT management, Zementis efficiently executed on an iterative deployment strategy which enabled the joint project team to roll out a comprehensive Retail solution in less than three months. Accessed in Retail offices across the country, the ADAPA decision engine assists more than 700 loan officers to determine eligibility of a borrower with the system instantly displaying conditions or exceptions to guidelines as well as precise pricing for each scenario. The Wholesale division exposes the ADAPA decision engine to a large network of several thousand independent brokers who explore scenarios and submit their applications online. While rules were authored in Excel format, a favorite of many business users, predictive models were developed in various analytics tools and deployed in ADAPA via the Predictive Model Markup Language (PMML) standard. Extending its value across the entire enterprise, ADAPA emerged as the central decision hub for vital credit, risk, pricing, and other operational decisions.

Zementis ADAPA Case Study #2:

Delivering Predictive Analytics in the Cloud.

A specialized consulting firm with a focus on predictive analytics needed a cost-effective, agile deployment framework to deliver predictive models to their clients.  The firm specializes in outsourcing the development of predictive models for their clients, using various tools like R, SAS, and SPSS. Supporting open standards, the natural choice was to utilize the Predictive Model Markup Language (PMML) to transfer the models from the scientist’s development environment to a deployment infrastructure.  One key benefit of PMML is to remain development tool agnostic.  The firm selected the Zementis ADAPA Predictive Analytics Edition on the Amazon Elastic Compute Cloud (Amazon EC2) which provides a scalable, reliable deployment platform based on the PMML standard and Service Oriented Architecture (SOA).

With ADAPA, the firm was able to shorten the time-to-market for new models delivered to clients from months to just a few hours.  In addition, ADAPA enables their clients to benefit from a cost-effective SaaS utility-model, whereby the Zementis ADAPA engine is available on-demand at a fraction of the cost of traditional software licenses, eliminating upfront capital expenditures in both hardware and software. The ADAPA Predictive Analytics Edition has given the firm a highly competitive model delivery process and its clients an unprecedented agility in the deployment and integration of predictive analytics in their business processes.

Zementis ADAPA Case Study #3:

Assessing Risk in Real-Time for On-Line Merchant.

An on-line merchant with millions of customers needed to assess risk for submitted transactions before they are sent to a credit-card processor. Following a comprehensive data analysis phase, several models addressing specific data segments were built in a well-known model development platform. Once model development is complete, models are exported in the PMML (Predictive Model Markup Language) standard. The deployment solution is the ADAPA Enterprise Edition, using its capabilities for data segmentation, data transformation, and model execution. ADAPA was selected as the optimal choice for deployment, not only because PMML-based models can easily be uploaded and are available for execution in seconds, but also because ADAPA Enterprise Edition offers the seamless integration of rules and predictive analytics within a single Enterprise Decision Management solution.

ADAPA was deployed on-site and configured to handle high-volume, mission-critical transactions.  The firm not only leveraged the real-time capabilities of ADAPA, but also its integrated reporting framework.  It was very important for the merchant to assess model impact on credit card transactions on a daily basis. Given that ADAPA allows for reports to be uploaded and managed via its web administration console, the reporting team was able to design new reports, schedule them for routine execution, and send the results in PDF format for analysis to the business department with the required agility. During the implementation of the roll-out strategy, the ADAPA web console and its ease of use allowed for effective management of rules and models as well as active monitoring of deployed models and impact of decisions on the business operation.

For more on Zementis, see www.zementis.com

Interview SPSS Olivier Jouve

SPSS recently launched a major series of products in its text mining and data mining product portfolio and rebranded its data mining line as the PASW series. In an exclusive and extensive interview, Olivier Jouve, Vice President of Corporate Development at SPSS Inc., talks about science careers, the recent launches, SPSS's open source support for R, cloud computing and business intelligence.

Ajay: Describe your career in Science. Are careers in science less lucrative than careers in business development? What advice would you give to people re-skilling in the current recession on learning analytical skills?

Olivier: I have a Master of Science in Geophysics and a Master of Science in Computer Sciences, both from Paris VI University. I have always tried to combine science and business development in my career, as I like to experience all aspects: from idea to concept to business plan to funding to development to marketing to sales.

There was a study published earlier this year that said two of the three best jobs are related to math and statistics. This is reinforced by three converging societal forces: better use of mathematics to drive decision making, the tremendous growth and storage of data, and, especially in this economy, the ability to deliver ROI. With more and more commercial and government organizations realizing the value of Predictive Analytics to solve business problems, being equipped with analytical skills can only enhance your career and provide job security.

Ajay: So SPSS has launched new products within its Predictive Analytics Software (PASW) portfolio, Modeler 13 and Text Analytics 13? Is this old wine in a new bottle? What is new in technical terms? What is new in terms of customers looking to mine textual information?

Olivier: Our two new products, PASW Modeler 13 (formerly Clementine) and PASW Text Analytics 13 (formerly Text Mining for Clementine), extend and automate the power of data mining and text analytics to the business user, while significantly enhancing the productivity, flexibility and performance of the expert analyst.

The PASW Modeler 13 data mining workbench has new and enhanced functionality that quickly takes users through the entire data mining process, from data access and preparation to model deployment. Some of the newest features include Automated Data Preparation, which conditions data in a single step by automatically detecting and correcting quality errors; Auto Cluster, which gives users a simple way to determine the best cluster algorithm for a particular data set; and full integration with PASW Statistics (formerly SPSS Statistics).

With PASW Text Analytics 13, SPSS provides the most complete view of the customer through the combined analysis of text, web and survey data.   While other companies only provide the text component, SPSS couples text with existing structured data, permitting more accurate results and better predictive modeling. The new version includes pre-built categories for satisfaction surveys, advanced natural language processing techniques, and it supports more than 30 different languages.

Ajay: SPSS supported open source platforms – Python and R – before it became fashionable to do so. How has this helped your company?

Olivier: Open source software helps the democratization of the analytics movement and SPSS is keen on supporting that democratization while welcoming open source users (and their creativity) into the analytics framework.

Ajay: What are the differences and similarities between Text Analytics and Search Engines? Can we mix the two as well using APIs?

Olivier: Search Engines are fundamentally top-down, in that you know what you are looking for when launching a query. Text Analytics, however, is bottom-up, uncovering hidden patterns, relationships and trends locked in unstructured data, including call center notes, open-ended survey responses, blogs and social networks. Now businesses have a way of pulling out key concepts, extracting customer sentiments such as emotional responses, preferences and opinions, and grouping them into categories.

For instance, a call center manager will have a hard time extracting why customers are unhappy and churn by using a search engine over millions of call center notes. What would the query be? But by using Text Analytics, that same manager will discover the main reasons why customers are unhappy, and be able to predict whether they are going to churn.
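The top-down/bottom-up distinction can be sketched in a few lines: rather than querying the notes for a known term, a (toy) text-analytics pass surfaces recurring complaint categories on its own. The tiny lexicon and notes below are invented, and real systems use natural language processing rather than word lookup:

```python
from collections import Counter

# Invented mini-lexicon mapping complaint words to categories. A real
# text analytics product learns and extracts concepts; this hard-coded
# lookup only illustrates the bottom-up direction of the analysis.
COMPLAINT_TERMS = {"dropped": "service", "slow": "service",
                   "overcharged": "billing", "fee": "billing",
                   "rude": "support"}

def categorize(notes):
    """Count complaint categories across free-text call center notes."""
    counts = Counter()
    for note in notes:
        for word in note.lower().split():
            word = word.strip(".,!?")
            if word in COMPLAINT_TERMS:
                counts[COMPLAINT_TERMS[word]] += 1
    return counts

notes = ["Call dropped twice, very slow line.",
         "Overcharged again, another hidden fee!"]
result = categorize(notes)
```

No query was needed: the categories "service" and "billing" emerge from the notes themselves, which is the pattern-discovery direction Olivier contrasts with search.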

Ajay: Why is Text Analytics so important?  How will companies use it now and into the future?
Olivier: Actually, the question you should ask is, "Why is unstructured data so important?" Today, more than ever, people love to share their opinions: through the estimated 183 billion emails sent, the 1.6 million blog posts, millions of inquiries captured in call center notes, and thousands of comments on diverse social networking sites and community message boards. And let's not forget all the data that flows through Twitter. Companies today would be short-sighted to ignore what their customers are saying about their products and services, in their own words. Those opinions, likes and dislikes, are essential nuggets and bear far more insight than demographic or transactional data for reducing customer churn, improving satisfaction, fighting crime, detecting fraud and increasing marketing campaign results.

Ajay: How is SPSS venturing into cloud computing and SaaS?

Olivier: SPSS has been at the origin of the PMML standard, which allows organizations to provision their computing power in a very flexible manner, just like provisioning computing power through cloud computing. SPSS strongly believes in the benefits of a cloud computing environment, which is why all of our applications are designed with Service Oriented Architecture components. This enables SPSS to be flexible enough to meet the demands of the market as they change with respect to delivery mode. We are currently analyzing business and technical issues related to SPSS technologies in the cloud, such as the scoring and delivery of analytics. In regards to SaaS, we currently offer hosted services for our PASW Data Collection (formerly Dimensions) survey research suite of products.

Ajay: Do you think business intelligence is an overused term? Why do you think BI and Predictive Analytics failed in mortgage delinquency forecasting and reporting, despite the financial sector being a big spender on BI tools?

Olivier: There is a big difference between business intelligence (BI) and Predictive Analytics. Traditional BI technologies focus on what's happening now or what's happened in the past, primarily using financial or product data. For organizations to take the most effective action, they need to know and plan for what may happen in the future by using people data, and that's harnessed through Predictive Analytics.

Another way to look at it: Predictive Analytics covers the entire capture, predict and act continuum, from the use of survey research software to capture customer feedback (attitudinal data), to creating models to predict customer behaviors, to acting on the results to improve business processes. Predictive Analytics, unlike BI, provides the secret ingredient and answers the question, "What will the customer do next?"

That being said, financial institutions didn't need Predictive Analytics to see that some lenders sold mortgages to unqualified individuals likely to default. Predictive Analytics is an incredible tool for detecting fraud, waste and abuse. Companies in the financial services industry can focus on mitigating their overall risk by creating better predictive models that encompass not only richer data sets, but also better rules-based automation.

Ajay: What do people do at SPSS to have fun when they are not making complex mathematical algorithms?
Olivier: SPSS employees love our casual, friendly atmosphere, our professional and talented colleagues, and our cool, cutting-edge technology. The fun part comes from doing meaningful work with great people, across different groups and geographies. Of course, being French, I have ensured that my colleagues are fully educated on the best wine and cuisine. And being based in Chicago, there is always a spirited baseball debate between the Cubs and White Sox. However, I have yet to convince anyone that rugby is a better sport.

Biography

Olivier Jouve is Vice President, Corporate Development, at SPSS Inc. He is responsible for defining SPSS's strategic direction and growth opportunities through internal development, mergers and acquisitions, and tactical alliances. A pioneer in the field of data and text mining for the last 20 years, he created the foundation of the Text Analytics technology for analyzing customer interactions at SPSS. Jouve is a successful serial entrepreneur and has published work internationally in the areas of analytical CRM, text mining, search engines, competitive intelligence and knowledge management.

Interview KNIME Fabian Dill

We have covered KNIME.com's open source platform earlier. On the eve of its new product launch, KNIME.com co-founder Fabian Dill reveals his thoughts in an exclusive interview.

From the Knime.com website

The modular data exploration platform KNIME, originally developed solely at the University of Konstanz, Germany, enables the user to visually create data flows (or pipelines), execute selected analysis steps, and later investigate the results through interactive views on data and models. KNIME already has more than 2,000 active users in diverse application areas, ranging from early drug discovery and customer relationship analysis to financial information integration.

Ajay – What prompted you personally to be part of KNIME and not join a big technology  company?  What does the future hold for KNIME in 2009-10?

Fabian -I was excited when I first joined the KNIME team in 2005. Back then, we were working exclusively on the open source version backed by some academic funding. Being part of the team that put together such a professional data mining environment from scratch was a great experience. Growing this into a commercial support and development arm has been a thrill as well. The team and the diverse experiences gained from helping get a new company off the ground and being involved in everything it takes to enable this to be successful made it unthinkable for me to work anywhere else.

We continue to develop the open source arm of KNIME, and many new features lie ahead: text, image, and time series processing, as well as better support for variables. We are constantly working on adding new nodes. KNIME 2.1 is expected in the fall, and some of the ongoing development can already be found on the KNIME Labs page (http://labs.knime.org).

The commercial division is providing support and maintenance subscriptions for the freely available desktop version. At the same time we are developing products which will streamline the integration of KNIME into existing IT infrastructures:

  • the KNIME Grid Support lets you run your compute-intensive (sub-) workflows or nodes on a grid or cluster;

  • KNIME Reporting makes use of KNIME's flexibility to gather the data for your report and provides simplified views (static, or interactive dashboards) of the resulting workflow and its results; and

  • the KNIME Enterprise Server facilitates company-wide installation of KNIME and supports collaboration between departments and sites by providing central workflow repositories, scheduled and remote execution, and user rights management.

Ajay -Software as a service and Cloud Computing is the next big thing in 2009. Are there any plans to put KNIME on a cloud computer and charge clients for the hour so they can build models on huge data without buying any hardware but just rent the time?

Fabian – Cloud computing is an agile and client-centric approach and therefore fits nicely into the KNIME framework, especially considering that we are already working on support for distributed computing of KNIME workflows (see above). However, we have no immediate plans for KNIME workflow processing on a per-use charge or similar. That’s an interesting idea, though. The way KNIME nodes are nicely encapsulated (and often even distributable themselves) would make this quite natural.

Ajay – What differentiates KNIME from other products such as RPro and Rapid Miner, for example? What are the principal challenges you have faced in developing it? Why do customers like and dislike it?

Fabian- Every tool has its strengths and weaknesses depending on the task you actually want to accomplish. The focus of KNIME is to support users in their quest to understand large, heterogeneous data and make sense of it. For this task, you cannot rely only on classical data mining techniques wrapped in a command line or otherwise configurable environment; simple, intuitive access to those tools is required, along with support for visual exploration through interactive linking and brushing techniques.

By design, KNIME is a modular integration platform, which makes it easy to write your own nodes (with the easy-to-use API) or to integrate existing libraries or tools.

We integrated Weka, for example, because of its vast library of state-of-the-art machine learning algorithms, the open source program R – in order to provide access to a rich library of statistical functions (and of course many more) – and parts of the Chemistry Development Kit (CDK). All these integrations follow the KNIME requirements for easy and intuitive usage so the user does not need to understand the details of each tool in great depth.
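KNIME itself is a Java application, so the following Python sketch is only an analogy for the dataflow idea Fabian describes: each node consumes a table and emits a result, so nodes compose into a pipeline without any node needing to know about the others. The node functions and data are invented:

```python
# Toy "nodes" in the KNIME sense: each takes tabular data in and passes
# a result on, so an analysis is just a chain of small, reusable steps.
def read_node(rows):
    """Source node: in a real workflow this would read a file or database."""
    return list(rows)

def filter_node(rows, predicate):
    """Row filter node: keep only rows matching a condition."""
    return [r for r in rows if predicate(r)]

def stats_node(rows, key):
    """Statistics node: summarize one column of the incoming table."""
    values = [r[key] for r in rows]
    return {"n": len(values), "mean": sum(values) / len(values)}

data = [{"age": 30, "churned": True},
        {"age": 50, "churned": False},
        {"age": 40, "churned": True}]

# Compose the nodes into a pipeline: read -> filter churned -> mean age.
summary = stats_node(filter_node(read_node(data), lambda r: r["churned"]), "age")
```

The integration point Fabian mentions is exactly this interface: a Weka learner, an R script, or a CDK routine can each be wrapped as one more node with the same table-in, table-out contract.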

A number of our commercial partners, such as Schroedinger, Infocom, Symyx and Tripos, among others, also follow this paradigm and similarly integrate their tools into KNIME. Academic collaborations, such as the one with ETH Zurich, Switzerland on the High Content Screening platform HC/DC, represent another positive outcome of this open architecture. We believe that this strictly result-oriented approach, based on a carefully designed and professionally coded framework, is a key factor in KNIME's broad acceptance. Another big differentiator: right from the start, KNIME has been developed by a team of software developers with decades of industrial software engineering experience.

Ajay – Are there any Asian plans for KNIME? Any other open source partnerships in the pipeline?

Fabian – We have a Japan-based partner, Infocom, who operates in the fields of life science. But we are always open for other partnerships, supporters, or collaborations.

In addition to the open source integrations mentioned above (Weka, R, CDK, HC/DC), there are many other different projects in the works and partnerships under negotiation. Keep an eye on our blog and on our Labs@KNIME page (labs.knime.org).

ABOUT

KNIME – development started in January 2004. Since then: 10 releases; approx. 350,000 lines of code; 25,000 downloads; an estimated 2000 active users. KNIME.com was founded in June 2008 in Zurich, Switzerland.

Fabian Dill – has been working for and with KNIME since 2005; co-founder of KNIME.com.

Interview Visual Numerics Alicia McGreevey

Here is an interview with the head of marketing of Visual Numerics, Alicia McGreevey.

Visual Numerics® is the leading provider of data analysis software, visualization solutions and expert consulting for technical, business and scientific communities worldwide (see http://www.vni.com ).

Ajay – Describe your career in science so far. How would you explain embeddable analytics to a high school student who has to decide between getting an MBA or a science degree?

Alicia – I think of analytics as analyzing a situation so you can make a decision. To do that objectively, you need data about your situation. Data can be anything: foreign currency exchange rates, the daily temperature here in Houston, or Tiger Woods's record at the Masters tournament when he's not leading after the 3rd round.

Embedding analytics is simply making the analysis part of an application close to, or embedded with, your data. As an example, we have a customer in Germany, GFTA (Gesellschaft Fuer Trendanalysen), who has built an application that embeds analytics to analyze historic and live tick foreign exchange rate data. Their application gives treasuries and traders predictions on what is about to happen to exchange rates so they can make good decisions on when to buy or sell.
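GFTA's actual models are proprietary, so as a toy illustration of "analytics embedded next to the data," here is a moving-average crossover signal computed inline as tick data arrives. The rates, window sizes, and buy/sell rule are all invented:

```python
# A minimal embedded analytic: the analysis function lives in the same
# process as the data stream, so each new tick can trigger a decision.
def moving_average(ticks, window):
    """Mean of the last `window` ticks, or None if too few have arrived."""
    if len(ticks) < window:
        return None
    return sum(ticks[-window:]) / window

def signal(ticks, short=3, long=5):
    """Invented rule: buy when the short-term average crosses above the
    long-term average, sell when it is below, hold until enough data."""
    s, l = moving_average(ticks, short), moving_average(ticks, long)
    if s is None or l is None:
        return "hold"
    return "buy" if s > l else "sell"

rates = [1.30, 1.31, 1.32, 1.34, 1.36]  # invented EUR/USD ticks
decision = signal(rates)
```

The point is architectural rather than financial: because the analytic is embedded, the application can act on each tick immediately instead of exporting data to a separate analysis tool.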

Embedding analytics is as much a business discipline as it is science. Historically, our analytics have been used predominantly by the government and scientific community to perform heavy science and engineering research. As business intelligence becomes increasingly important to compete in today’s marketplace, our analytics can now be found driving business decisions in industries like financial services, healthcare and manufacturing. Partners like Teradata and SAP are embedding our analytics into their software as a way to extend their current offerings. As their customers demand more custom BI solutions to fit unique data sets, our analytics provide a more affordable approach to meet that need. Customers now have an option to implement custom BI without incurring the massive overhead that you would typically find in a one-size-fits-all solution.

If you’re a student, I’d recommend you invest time and course work in the area of analytics regardless of the discipline you choose to study. The term analytics is really just a fancy term for math and statistics. I’ve taken math and statistics courses as part of a science curriculum and as part of a business curriculum. Being able to make optimal decisions by objectively analyzing data is a skill that will help you in business, science, engineering, or any area.

Ajay – You have been working behind the scenes quietly building math libraries that power many partners. Could you name a few success stories so far.

Alicia – One of the most interesting things about working at Visual Numerics is our customers. They create fascinating analytic applications using mathematical and statistical functions from our libraries. A few examples:

  • Total, who you probably know as one of the world’s super major oil companies, uses our math optimization routines in an application that automatically controls the blending of components in the production of gasoline, diesel and heavy fuels. By making best use of components, Total helps minimize their refining costs while maximizing revenue.

  • The Physics Department at the University of Kansas uses nonlinear equation solvers from our libraries to develop more efficient particle beam simulations. By simulating the behavior of particle beams in particle accelerators, scientists can better design particle accelerators, like the LHC or Large Hadron Collider, for high-energy research.

  • A final example that I think is interesting, given the current economic situation, is from one of our financial customers, RiskMetrics Group. RiskMetrics uses functions from our libraries to do financial stress testing that allows portfolio fund managers to simulate economic events, like the price of oil spiking 10% or markets diving 20%. They use this information to predict impacts on their portfolios and make better decisions for their clients.
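The stress-testing idea in the last example can be sketched in a few lines. The positions and shock sizes below are invented, and real stress tests model correlations and repricing rather than simple per-asset percentage shocks:

```python
# Invented portfolio of position values (in dollars).
portfolio = {"oil_futures": 1_000_000,
             "equities": 5_000_000,
             "bonds": 4_000_000}

def stress(positions, shocks):
    """Portfolio P&L under a dict of per-asset fractional shocks.
    Assets without a shock are assumed unchanged (a simplification)."""
    return sum(value * shocks.get(asset, 0.0)
               for asset, value in positions.items())

# "Oil spikes 10%, equity markets dive 20%" as a scenario dict.
scenario = {"oil_futures": 0.10, "equities": -0.20}
pnl = stress(portfolio, scenario)
```

Running many such scenario dicts through the same function is the core loop of scenario analysis: the manager sees which hypothetical events hurt the portfolio most before they happen.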

Ajay – What have been the key moments in Visual Numerics' path so far?

Alicia – Our company has been in business for over 38 years, rooted in the fundamentals of mathematics and statistics. It started off as IMSL, offering IMSL Numerical Libraries as a high performance computing tool for numerical analysis. Before visualization was fashionable, we saw visualization as an important part of the data analysis process. As a result, the company merged with Precision Visuals, makers of PV-WAVE (our visual data analysis product) in the 1990s to become what is now known as Visual Numerics.

Looking back at recent history, a major event for Visual Numerics was definitely when SAP AG licensed the libraries at the end of 2007. For several years leading up to 2007, we’d seen increased interest in our libraries from independent software vendors (ISVs). More and more ISVs with broad product offerings were looking to provide their customers with analytic capabilities, so we had invested considerably in making the libraries more attractive to this type of customer. Having SAP, one of the largest and most respected ISVs in the world, license our products gave us confidence that we could be a valued OEM partner to this type of customer.

Ajay – What are the key problems you face in your day-to-day job as a Visual Numerics employee? How do you have fun when not building math libraries?

Alicia – In marketing, our job is to help potential users of our libraries understand what it is we offer so that they can determine if what we offer is of value to them. Often the hardest challenge we face is simply finding that person. Since our libraries are embeddable, they’ve historically been used by programmers. So we’ve spent a lot of time at developer conferences and sponsoring developer websites, journals and academic programs.

One product update this year is that we've made the libraries available from Python, a dynamic scripting language. Making IMSL Library functions available from Python basically means that someone who is not a trained programmer can now use the math and stats capabilities in the IMSL Libraries just like a C, Java, .NET or Fortran developer. It's an exciting development, though it brings with it the challenge of letting a whole new set of potential users know about the capabilities of the libraries. It's a fun challenge, though.
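The IMSL-from-Python API itself isn't shown in the interview, so the snippet below only illustrates the general mechanism that makes a compiled math library callable from a scripting language, using the standard C math library as a stand-in:

```python
import ctypes
import ctypes.util

# Load a compiled native library (here the C math library, standing in
# for any numerical library) and declare one function's signature
# before calling it from Python.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

root2 = libm.sqrt(2.0)  # native code, invoked from a scripting language
```

A vendor binding typically wraps this kind of low-level call in Pythonic functions, which is what lets non-programmers use the same routines as C or Fortran developers.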

On the more fun side of things, you may be interested to know that our expertise in math and statistics led us to some Hollywood fame. At one point, we were selected to review scripts for the crime-busting drama NUMB3RS, which aired on CBS in the US and features an FBI Special Agent who recruits his brilliant mathematician brother to use the science of mathematics, with its complex equations, to solve the trickiest crimes in Los Angeles. So yes, the math behind the show is real, and it is exciting indeed to see how math can be applied in all aspects of our lives, including ferreting out criminals on TV!

Ajay – What is the story ahead? How do you think Visual Numerics can help demand forecasting and BI say bye to the recession?

Alicia – We're seeing more success stories from customers using analytics and data to make good decisions, and I think the more organizations leverage analytics, the faster we'll emerge from this economic slump.

As an example, we have a partner, nCode International, who makes software to help manufacturers collect and analyze test data and use the analysis to make design decisions. Using it, automobile manufacturers can, for example, analyze real-world driving pattern data for different geographic areas (e.g., emerging markets like China and India versus established markets like the USA and Europe) and design the perfect vehicle for specific markets.

So the analytic successes are out there and we know that organizations have multitudes of data. Certainly every organization that we work with has more data today than ever before. For analytics to help us say Bye to the recession, I think we need to continue to promote our successes, make analytic tools available to more users, and get users across multiple disciplines and industries using analytics to make the best possible decisions for their organizations.

Personal Biography:

As Director of Marketing for Visual Numerics, Alicia is an authority on how organizations are using advanced analytics to improve performance. Alicia brings over 15 years of experience working with scientists and customers in the planning and development of new technology products and developing go to market plans. She has a B.A. in Mathematics from Skidmore College and an M.B.A. from the University of Chicago Booth School of Business.