Google – Turns the Page

Duderstadt Center "The Dude", which ...
Image via Wikipedia

Meet Google’s new CEO

Larry Page
Co-Founder and President, Products

Larry Page was Google’s founding CEO and grew the company to more than 200 employees and profitability before moving into his role as president of products in April 2001. He continues to share responsibility for Google’s day-to-day operations with Eric Schmidt and Sergey Brin.

The son of Michigan State University computer science professor Dr. Carl Victor Page, Larry’s love of computers began at age six. While following in his father’s footsteps in academics, he became an honors graduate from the University of Michigan, where he earned a bachelor’s degree in engineering, with a concentration on computer engineering. During his time in Ann Arbor, Larry built an inkjet printer out of Lego™ bricks.

While in the Ph.D. program in computer science at Stanford University, Larry met Sergey Brin, and together they developed and ran Google, which began operating in 1998. Larry went on leave from Stanford after earning his master’s degree.

In 2002, Larry was named a World Economic Forum Global Leader for Tomorrow. He is a member of the National Advisory Committee (NAC) of the University of Michigan College of Engineering, and together with co-founder Sergey Brin, Larry was honored with the Marconi Prize in 2004. He is a trustee on the board of the X PRIZE, and was elected to the National Academy of Engineering in 2004.

and no coincidence but it reminded me of the Metallica video- Turn the Page. Forgive the Pun, herr Eric

https://www.youtube.com/watch?v=dOibtqWo6z4

Carole-Ann’s 2011 Predictions for Decision Management

Carole-Ann’s 2011 Predictions for Decision Management

For Ajay Ohri on DecisionStats.com

What were the top 5 events in 2010 in your field?
  1. Maturity: the Decision Management space was made up of technology vendors, big and small, that typically focused on one or two aspects of this discipline.  Over the past few years, we have seen a lot of consolidation in the industry – first with Business Intelligence (BI) then Business Process Management (BPM) and lately in Business Rules Management (BRM) and Advanced Analytics.  As a result the giant Platform vendors have helped create visibility for this discipline.  Lots of tiny clues finally bubbled up in 2010 to attest of the increasing activity around Decision Management.  For example, more products than ever were named Decision Manager; companies advertised for Decision Managers as a job title in their job section; most people understand what I do when I am introduced in a social setting!
  2. Boredom: unfortunately, as the industry matures, inevitably innovation slows down…  At the main BRMS shows we heard here and there complaints that the technology was stalling.  We heard it from vendors like Red Hat (Drools) and we heard it from bored end-users hoping for some excitement at Business Rules Forum’s vendor panel.  They sadly did not get it
  3. Scrum: I am not thinking about the methodology there!  If you have ever seen a rugby game, you can probably understand why this is the term that comes to mind when I look at the messy & confusing technology landscape.  Feet blindly try to kick the ball out while superhuman forces are moving randomly the whole pack – or so it felt when I played!  Business Users in search of Business Solutions are facing more and more technology choices that feel like comparing apples to oranges.  There is value in all of them and each one addresses a specific aspect of Decision Management but I regret that the industry did not simplify the picture in 2010.  On the contrary!  Many buzzwords were created or at least made popular last year, creating even more confusion on a muddy field.  A few examples: Social CRM, Collaborative Decision Making, Adaptive Case Management, etc.  Don’t take me wrong, I *do* like the technologies.  I sympathize with the decision maker that is trying to pick the right solution though.
  4. Information: Analytics have been used for years of course but the volume of data surrounding us has been growing to unparalleled levels.  We can blame or thank (depending on our perspective) Social Media for that.  Sites like Facebook and LinkedIn have made it possible and easy to publish relevant (as well as fluffy) information in real-time.  As we all started to get the hang of it and potentially over-publish, technology evolved to enable the storage, correlation and analysis of humongous volumes of data that we could not dream of before.  25 billion tweets were posted in 2010.  Every month, over 30 billion pieces of data are shared on Facebook alone.  This is not just about vanity and marketing though.  This data can be leveraged for the greater good.  Carlos pointed to some fascinating facts about catastrophic event response team getting organized thanks to crowd-sourced information.  We are also seeing, in the Decision management world, more and more applicability for those very technology that have been developed for the needs of Big Data – I’ll name for example Hadoop that Carlos (yet again) discussed in his talks at Rules Fest end of 2009 and 2010.
  5. Self-Organization: it may be a side effect of the Social Media movement but I must admit that I was impressed by the success of self-organizing initiatives.  Granted, this last trend has nothing to do with Decision Management per se but I think it is a great evolution worth noting.  Let me point to a couple of examples.  I usually attend traditional conferences and tradeshows in which the content can be good but is sometimes terrible.  I was pleasantly surprised by the professionalism and attendance at *un-conferences* such as P-Camp (P stands for Product – an event for Product Managers).  When you think about it, it is already difficult to get a show together when people are dedicated to the tasks.  How crazy is it to have volunteers set one up with no budget and no agenda?  Well, people simply show up to do their part and everyone has fun voting on-site for what seems the most appealing content at the time.  Crowdsourcing applied to shows: it works!  Similar experience with meetups or tweetups.  I also enjoyed attending some impromptu Twitter jam sessions on a given topic.  Social Media is certainly helping people reach out and get together in person or virtually and that is wonderful!

A segment of a social network
Image via Wikipedia

What are the top three trends you see in 2011?

  1. Performance:  I might be cheating here.   I was very bullish about predicting much progress for 2010 in the area of Performance Management in your Decision Management initiatives.  I believe that progress was made but Carlos did not give me full credit for the right prediction…  Okay, I am a little optimistic on timeline…  I admit it…  If it did not fully happen in 2010, can I predict it again in 2011?  I think that companies want to better track their business performance in order to correct the trajectory of course but also to improve their projections.  I see that it is turning into reality already here and there.  I expect it to become a trend in 2011!
  2. Insight: Big Data being available all around us with new technologies and algorithms will continue to propagate in 2011 leading to more widely spread Analytics capabilities.  The buzz at Analytics shows on Social Network Analysis (SNA) is a sign that there is interest in those kinds of things.  There is tremendous information that can be leveraged for smart decision-making.  I think there will be more of that in 2011 as initiatives launches in 2010 will mature into material results.
    5 Ways to Cultivate an Active Social Network
    Image by Intersection Consulting via Flickr
  3. Collaboration:  Social Media for the Enterprise is a discipline in the making.  Social Media was initially seen for the most part as a Marketing channel.  Over the years, companies have started experimenting with external communities and ideation capabilities with moderate success.  The few strategic initiatives started in 2010 by “old fashion” companies seem to be an indication that we are past the early adopters.  This discipline may very well materialize in 2011 as a core capability, well, or at least a new trend.  I believe that capabilities such Chatter, offered by Salesforce, will transform (slowly) how people interact in the workplace and leverage the volumes of social data captured in LinkedIn and other Social Media sites.  Collaboration is of course a topic of interest for me personally.  I even signed up for Kare Anderson’s collaboration collaboration site – yes, twice the word “collaboration”: it is really about collaborating on collaboration techniques.  Even though collaboration does not require Social Media, this medium offers perspectives not available until now.

Brief Bio-

Carole-Ann is a renowned guru in the Decision Management space. She created the vision for Decision Management that is widely adopted now in the industry. Her claim to fame is the strategy and direction of Blaze Advisor, the then-leading BRMS product, while she also managed all the Decision Management tools at FICO (business rules, predictive analytics and optimization). She has a vision for Decision Management both as a technology and a discipline that can revolutionize the way corporations do business, and will never get tired of painting that vision for her audience. She speaks often at Industry conferences and has conducted university classes in France and Washington DC.

Leveraging her Masters degree in Applied Mathematics / Computer Science from a “Grande Ecole” in France, she started her career building advanced systems using all kinds of technologies — expert systems, rules, optimization, dashboarding and cubes, web search, and beta version of database replication – as well as conducting strategic consulting gigs around change management.

She now tweets as @CMatignon, blogs at blog.sparklinglogic.com and interacts at community.sparklinglogic.com.

She started her career building advanced systems using all kinds of technologies — expert systems, rules, optimization, dashboarding and cubes, web search, and beta version of database replication.  At Cleversys (acquired by Kurt Salmon & Associates), she also conducted strategic consulting gigs mostly around change management.

While playing with advanced software components, she found a passion for technology and joined ILOG (acquired by IBM).  She developed a growing interest in Optimization as well as Business Rules.  At ILOG, she coined the term BRMS while brainstorming with her Sales counterpart.  She led the Presales organization for Telecom in the Americas up until 2000 when she joined Blaze Software (acquired by Brokat Technologies, HNC Software and finally FICO).

Her 360-degree experience allowed her to gain appreciation for all aspects of a software company, giving her a unique perspective on the business.  Her technical background kept her very much in touch with technology as she advanced.

She also became addicted to Twitter in the process.  She is active on all kinds of social media, always looking for new digital experience!

Outside of work, Carole-Ann loves spending time with her two boys.  They grow fruits in their Northern California home and cook all together in the French tradition.

profile on LinkedIn

TwitterFollow me on Twitter

Filtering to Gain Social Network Value
Image by Intersection Consulting via Flickr
Social Networks Hype Cycle
Image by fredcavazza via Flickr

Interview Luis Torgo Author Data Mining with R

Example of k-nearest neighbour classification
Image via Wikipedia

Here is an interview with Prof Luis Torgo, author of the recent best seller “Data Mining with R-learning with case studies”.

Ajay- Describe your career in science. How do you think can more young people be made interested in science.

Luis- My interest in science only started after I’ve finished my degree. I’ve entered a research lab at the University of Porto and started working on Machine Learning, around 1990. Since then I’ve been involved generally in data analysis topics both from a research perspective as well as from a more applied point of view through interactions with industry partners on several projects. I’ve spent most of my career at the Faculty of Economics of the University of Porto, but since 2008 I’m at the department of Computer Science of the Faculty of Sciences of the same university. At the same time I’ve been a researcher at LIAAD / Inesc Porto LA (www.liaad.up.pt).

I like a lot what I do and like science and the “scientific way of thinking”, but I cannot say that I’ve always thought of this area as my “place”. Most of all I like solving challenging problems through data analysis. If that translates into some scientific outcome than I’m more satisfied but that is not my main goal, though I’m kind of “forced” to think about that because of the constraints of an academic career.

That does not mean I’m not passionate about science, I just think there are many more ways of “doing science” than what is reflected in the usual “scientific indicators” that most institutions seem to be more and more obsessed about.

Regards interesting young people in science that is a hard question that I’m not sure I’m qualified to answer. I do tend to think that young people are more sensible to concrete examples of problems they think are interesting and that science helps in solving, as a way of finding a motivation for facing the hard work they will encounter in a scientific career. I do believe in case studies as a nice way to learn and motivate, and thus my book 😉

Ajay- Describe your new book “Data Mining with R, learning with case studies” Why did you choose a case study based approach? who is the target audience? What is your favorite case study from the book

Luis- This book is about learning how to use R for data mining. The book follows a “learn by doing it” approach to data mining instead of the more common theoretical description of the available techniques in this discipline. This is accomplished by presenting a series of illustrative case studies for which all necessary steps, code and data are provided to the reader. Moreover, the book has an associated web page (www.liaad.up.pt/~ltorgo/DataMiningWithR) where all code inside the book is given so that easy copy-paste is possible for the more lazy readers.

The language used in the book is very informal without many theoretical details on the used data mining techniques. For obtaining these theoretical insights there are already many good data mining books some of which are referred in “further readings” sections given throughout the book. The decision of following this writing style had to do with the intended target audience of the book.

In effect, the objective was to write a monograph that could be used as a supplemental book for practical classes on data mining that exist in several courses, but at the same time that could be attractive to professionals working on data mining in non-academic environments, and thus the choice of this more practically oriented approach.

Regards my favorite case study that is a hard question for an author… still I would probably choose the “Predicting Stock Market Returns” case study (Chapter 3). Not only because I like this challenging problem, but mainly because the case study addresses all aspects of knowledge discovery in a real world scenario and not only the construction of predictive models. It tackles data collection, data pre-processing, model construction, transforming predictions into actions using different trading policies, using business-related performance metrics, implementing a trading simulator for “real-world” evaluation, and laying out grounds for constructing an online trading system.

Obviously, for all these steps there are far too many options to be possible to describe/evaluate all of them in a chapter, still I do believe that for the reader it is important to see the overall picture, and read about the relevant questions on this problem and some possible paths that can be followed at these different steps.

In other words: do not expect to become rich with the solution I describe in the chapter !

Ajay- Apart from R, what other data mining software do you use or have used in the past. How would you compare their advantages and disadvantages with R

Luis- I’ve played around with Clementine, Weka, RapidMiner and Knime, but really only playing with teaching goals, and no serious use/evaluation in the context of data mining projects. For the latter I mainly use R or software developed by myself (either in R or other languages). In this context, I do not think it is fair to compare R with these or other tools as I lack serious experience with them. I can however, tell you about what I see as the main pros and cons of R. The main reason for using R is really not only the power of the tool that does not stop surprising me in terms of what already exists and keeps appearing as contributions of an ever growing community, but mainly the ability of rapidly transforming ideas into prototypes. Regards some of its drawbacks I would probably mention the lack of efficiency when compared to other alternatives and the problem of data set sizes being limited by main memory.

I know that there are several efforts around for solving this latter issue not only from the community (e.g. http://cran.at.r-project.org/web/views/HighPerformanceComputing.html), but also from the industry (e.g. Revolution Analytics), but I would prefer that at this stage this would be a standard feature of the language so the the “normal” user need not worry about it. But then this is a community effort and if I’m not happy with the current status instead of complaining I should do something about it!

Ajay- Describe your writing habit- How do you set about writing the book- did you write a fixed amount daily or do you write in bursts etc

Luis- Unfortunately, I write in bursts whenever I find some time for it. This is much more tiring and time consuming as I need to read back material far too often, but I cannot afford dedicating too much consecutive time to a single task. Actually, I frequently tease my PhD students when they “complain” about the lack of time for doing what they have to, that they should learn to appreciate the luxury of having a single task to complete because it will probably be the last time in their professional life!

Ajay- What do you do to relax or unwind when not working?

Luis- For me, the best way to relax from work is by playing sports. When I’m involved in some game I reset my mind and forget about all other things and this is very relaxing for me. A part from sports I enjoy a lot spending time with my family and friends. A good and long dinner with friends over a good bottle of wine can do miracles when I’m too stressed with work! Finally,I do love traveling around with my family.

Luis Torgo

Short Bio: Luis Torgo has a degree in Systems and Informatics Engineering and a PhD in Computer Science. He is an Associate Professor of the Department of Computer Science of the Faculty of Sciences of the University of Porto. He is also a researcher of the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) belonging to INESC Porto LA. Luis Torgo has been an active researcher in Machine Learning and Data Mining for more than 20 years. He has lead several academic and industrial Data Mining research projects. Luis Torgo accompanies the R project almost since its beginning, using it on his research activities. He teaches R at different levels and has given several courses in different countries.

For reading “Data Mining with R” – you can visit this site, also to avail of a 20% discount the publishers have generously given (message below)-

For more information and to place an order, visit us at http://www.crcpress.com.  Order online and apply 20% Off discount code 907HM at checkout.  CRC is pleased to offer free standard shipping on all online orders!

link to the book page  http://www.crcpress.com/product/isbn/9781439810187

Price: $79.95
Cat. #: K10510
ISBN: 9781439810187
ISBN 10: 1439810184
Publication Date: November 09, 2010
Number of Pages: 305
Availability: In Stock
Binding(s): Hardback 

Choosing R for business – What to consider?

A composite of the GNU logo and the OSI logo, ...
Image via Wikipedia

Additional features in R over other analytical packages-

1) Source Code is given to ensure complete custom solution and embedding for a particular application. Open source code has an advantage that is extensively peer- reviewed in Journals and Scientific Literature.  This means bugs will found, shared and corrected transparently.

2) Wide literature of training material in the form of books is available for the R analytical platform.

3) Extensively the best data visualization tools in analytical software (apart from Tableau Software ‘s latest version). The extensive data visualization available in R is of the form a variety of customizable graphs, as well as animation. The principal reason third-party software initially started creating interfaces to R is because the graphical library of packages in R is more advanced as well as rapidly getting more features by the day.

4) Free in upfront license cost for academics and thus budget friendly for small and large analytical teams.

5) Flexible programming for your data environment. This includes having packages that ensure compatibility with Java, Python and C++.

 

6) Easy migration from other analytical platforms to R Platform. It is relatively easy for a non R platform user to migrate to R platform and there is no danger of vendor lock-in due to the GPL nature of source code and open community.

Statistics are numbers that tell (descriptive), advise ( prescriptive) or forecast (predictive). Analytics is a decision-making help tool. Analytics on which no decision is to be made or is being considered can be classified as purely statistical and non analytical. Thus ease of making a correct decision separates a good analytical platform from a not so good analytical platform. The distinction is likely to be disputed by people of either background- and business analysis requires more emphasis on how practical or actionable the results are and less emphasis on the statistical metrics in a particular data analysis task. I believe one clear reason between business analytics is different from statistical analysis is the cost of perfect information (data costs in real world) and the opportunity cost of delayed and distorted decision-making.

Specific to the following domains R has the following costs and benefits

  • Business Analytics
    • R is free per license and for download
    • It is one of the few analytical platforms that work on Mac OS
    • It’s results are credibly established in both journals like Journal of Statistical Software and in the work at LinkedIn, Google and Facebook’s analytical teams.
    • It has open source code for customization as per GPL
    • It also has a flexible option for commercial vendors like Revolution Analytics (who support 64 bit windows) as well as bigger datasets
    • It has interfaces from almost all other analytical software including SAS,SPSS, JMP, Oracle Data Mining, Rapid Miner. Existing license holders can thus invoke and use R from within these software
    • Huge library of packages for regression, time series, finance and modeling
    • High quality data visualization packages
    • Data Mining
      • R as a computing platform is better suited to the needs of data mining as it has a vast array of packages covering standard regression, decision trees, association rules, cluster analysis, machine learning, neural networks as well as exotic specialized algorithms like those based on chaos models.
      • Flexibility in tweaking a standard algorithm by seeing the source code
      • The RATTLE GUI remains the standard GUI for Data Miners using R. It was created and developed in Australia.
      • Business Dashboards and Reporting
      • Business Dashboards and Reporting are an essential piece of Business Intelligence and Decision making systems in organizations. R offers data visualization through GGPLOT, and GUI like Deducer and Red-R can help even non R users create a metrics dashboard
        • For online Dashboards- R has packages like RWeb, RServe and R Apache- which in combination with data visualization packages offer powerful dashboard capabilities.
        • R can be combined with MS Excel using the R Excel package – to enable R capabilities to be imported within Excel. Thus a MS Excel user with no knowledge of R can use the GUI within the R Excel plug-in to use powerful graphical and statistical capabilities.

Additional factors to consider in your R installation-

There are some more choices awaiting you now-
1) Licensing Choices-Academic Version or Free Version or Enterprise Version of R

2) Operating System Choices-Which Operating System to choose from? Unix, Windows or Mac OS.

3) Operating system sub choice- 32- bit or 64 bit.

4) Hardware choices-Cost -benefit trade-offs for additional hardware for R. Choices between local ,cluster and cloud computing.

5) Interface choices-Command Line versus GUI? Which GUI to choose as the default start-up option?

6) Software component choice- Which packages to install? There are almost 3000 packages, some of them are complimentary, some are dependent on each other, and almost all are free.

7) Additional Software choices- Which additional software do you need to achieve maximum accuracy, robustness and speed of computing- and how to use existing legacy software and hardware for best complementary results with R.

1) Licensing Choices-
You can choose between two kinds of R installations – one is free and open source from http://r-project.org The other R installation is commercial and is offered by many vendors including Revolution Analytics. However there are other commercial vendors too.

Commercial Vendors of R Language Products-
1) Revolution Analytics http://www.revolutionanalytics.com/
2) XL Solutions- http://www.experience-rplus.com/
3) Information Builder – Webfocus RStat -Rattle GUI http://www.informationbuilders.com/products/webfocus/PredictiveModeling.html
4) Blue Reference- Inference for R http://inferenceforr.com/default.aspx

  1. Choosing Operating System
      1. Windows

 

Windows remains the most widely used operating system on this planet. If you are experienced in Windows based computing and are active on analytical projects- it would not make sense for you to move to other operating systems. This is also based on the fact that compatibility problems are minimum for Microsoft Windows and the help is extensively documented. However there may be some R packages that would not function well under Windows- if that happens a multiple operating system is your next option.

        1. Enterprise R from Revolution Analytics- Enterprise R from Revolution Analytics has a complete R Development environment for Windows including the use of code snippets to make programming faster. Revolution is also expected to make a GUI available by 2011. Revolution Analytics claims several enhancements for it’s version of R including the use of optimized libraries for faster performance.
      1. MacOS

 

Reasons for choosing MacOS remains its considerable appeal in aesthetically designed software- but MacOS is not a standard Operating system for enterprise systems as well as statistical computing. However open source R claims to be quite optimized and it can be used for existing Mac users. However there seem to be no commercially available versions of R available as of now for this operating system.

      1. Linux

 

        1. Ubuntu
        2. Red Hat Enterprise Linux
        3. Other versions of Linux

 

Linux is considered a preferred operating system by R users due to it having the same open source credentials-much better fit for all R packages and it’s customizability for big data analytics.

Ubuntu Linux is recommended for people making the transition to Linux for the first time. Ubuntu Linux had an marketing agreement with revolution Analytics for an earlier version of Ubuntu- and many R packages can  installed in a straightforward way as Ubuntu/Debian packages are available. Red Hat Enterprise Linux is officially supported by Revolution Analytics for it’s enterprise module. Other versions of Linux popular are Open SUSE.

      1. Multiple operating systems-
        1. Virtualization vs Dual Boot-

 

You can also choose between having a VMware VM Player for a virtual partition on your computers that is dedicated to R based computing or having operating system choice at the startup or booting of your computer. A software program called wubi helps with the dual installation of Linux and Windows.

  1. 64 bit vs 32 bit – Given a choice between 32 bit versus 64 bit versions of the same operating system like Linux Ubuntu, the 64 bit version would speed up processing by an approximate factor of 2. However you need to check whether your current hardware can support 64 bit operating systems and if so- you may want to ask your Information Technology manager to upgrade atleast some operating systems in your analytics work environment to 64 bit operating systems.

 

  1. Hardware choices- At the time of writing this book, the dominant computing paradigm is workstation computing followed by server-client computing. However with the introduction of cloud computing, netbooks, tablet PCs, hardware choices are much more flexible in 2011 than just a couple of years back.

Hardware costs are a significant cost to an analytics environment and are also  remarkably depreciated over a short period of time. You may thus examine your legacy hardware, and your future analytical computing needs- and accordingly decide between the various hardware options available for R.
Unlike other analytical software which can charge by number of processors, or server pricing being higher than workstation pricing and grid computing pricing extremely high if available- R is well suited for all kinds of hardware environment with flexible costs. Given the fact that R is memory intensive (it limits the size of data analyzed to the RAM size of the machine unless special formats and /or chunking is used)- it depends on size of datasets used and number of concurrent users analyzing the dataset. Thus the defining issue is not R but size of the data being analyzed.

    1. Local Computing- This is meant to denote when the software is installed locally. For big data the data to be analyzed would be stored in the form of databases.
      1. Server version- Revolution Analytics has differential pricing for server -client versions but for the open source version it is free and the same for Server or Workstation versions.
      2. Workstation
    2. Cloud Computing- Cloud computing is defined as the delivery of data, processing, systems via remote computers. It is similar to server-client computing but the remote server (also called cloud) has flexible computing in terms of number of processors, memory, and data storage. Cloud computing in the form of public cloud enables people to do analytical tasks on massive datasets without investing in permanent hardware or software as most public clouds are priced on pay per usage. The biggest cloud computing provider is Amazon and many other vendors provide services on top of it. Google is also coming for data storage in the form of clouds (Google Storage), as well as using machine learning in the form of API (Google Prediction API)
      1. Amazon
      2. Google
      3. Cluster-Grid Computing/Parallel processing- In order to build a cluster, you would need the RMpi and the SNOW packages, among other packages that help with parallel processing.
    3. How much resources
      1. RAM-Hard Disk-Processors- for workstation computing
      2. Instances or API calls for cloud computing
  1. Interface Choices
    1. Command Line
    2. GUI
    3. Web Interfaces
  2. Software Component Choices
    1. R dependencies
    2. Packages to install
    3. Recommended Packages
  3. Additional software choices
    1. Additional legacy software
    2. Optimizing your R based computing
    3. Code Editors
      1. Code Analyzers
      2. Libraries to speed up R

citation-  R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing,Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

(Note- this is a draft in progress)

Event: Predictive analytics with R, PMML and ADAPA

From http://www.meetup.com/R-Users/calendar/14405407/

The September meeting is at the Oracle campus. (This is next door to the Oracle towers, so there is plenty of free parking.) The featured talk is from Alex Guazzelli (Vice President – Analytics, Zementis Inc.) who will talk about “Predictive analytics with R, PMML and ADAPA”.

Agenda:
* 6:15 – 7:00 Networking and Pizza (with thanks to Revolution Analytics)
* 7:00 – 8:00 Talk: Predictive analytics with R, PMML and ADAPA
* 8:00 – 8:30 General discussion

Talk overview:

The rule in the past was that whenever a model was built in a particular development environment, it remained in that environment forever, unless it was manually recoded to work somewhere else. This rule has been shattered with the advent of PMML (Predictive Modeling Markup Language). By providing a uniform standard to represent predictive models, PMML allows for the exchange of predictive solutions between different applications and various vendors.

Once exported as PMML files, models are readily available for deployment into an execution engine for scoring or classification. ADAPA is one example of such an engine. It takes in models expressed in PMML and transforms them into web-services. Models can be executed either remotely by using web-services calls, or via a web console. Users can also use an Excel add-in to score data from inside Excel using models built in R.

R models have been exported into PMML and uploaded in ADAPA for many different purposes. Use cases where clients have used the flexibility of R to develop and the PMML standard combined with ADAPA to deploy range from financial applications (e.g., risk, compliance, fraud) to energy applications for the smart grid. The ability to easily transition solutions developed in R to the operational IT production environment helps eliminate the traditional limitations of R, e.g. performance for high volume or real-time transactional systems and memory constraints associated with large data sets.

Speaker Bio:

Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language which is the de facto standard used to represent predictive models. The book, entitled PMML in Action: Unleashing the Power of Open Standards for Data Mining and Predictive Analytics, is available on Amazon.com. As the Vice President of Analytics at Zementis, Inc., Dr. Guazzelli is responsible for developing core technology and analytical solutions under ADAPA, a PMML-based predictive decisioning platform that combines predictive analytics and business rules. ADAPA is the first system of its kind to be offered as a service on the cloud.
Prior to joining Zementis, Dr. Guazzelli was involved in not only building but also deploying predictive solutions for large financial and telecommunication institutions around the globe. In academia, Dr. Guazzelli worked with data mining, neural networks, expert systems and brain theory. His work in brain theory and computational neuroscience has appeared in many peer reviewed publications. At Zementis, Dr. Guazzelli and his team have been involved in a myriad of modeling projects for financial, health-care, gaming, chemical, and manufacturing industries.

Dr. Guazzelli holds a Ph.D. in Computer Science from the University of Southern California and a M.S and B.S. in Computer Science from the Federal University of Rio Grande do Sul, Brazil.

IPSUR – A Free R Textbook

Here is a free R textbook called IPSUR-

http://ipsur.r-forge.r-project.org/book/index.php

IPSUR stands for Introduction to Probability and Statistics Using R, ISBN: 978-0-557-24979-4, which is a textbook written for an undergraduate course in probability and statistics. The approximate prerequisites are two or three semesters of calculus and some linear algebra in a few places. Attendees of the class include mathematics, engineering, and computer science majors.

IPSUR is FREE, in the GNU sense of the word. Hard copies are available for purchase here from Lulu and will be available (coming soon) from the other standard online retailers worldwide. The price of the book is exactly the manufacturing cost plus the retailers’ markup. You may be able to get it even cheaper by downloading an electronic copy and printing it yourself, but if you elect this route then be sure to get the publisher-quality PDF from theDownloads page. And double check the price. It was cheaper for my students to buy a perfect-bound paperback from Lulu and have it shipped to their door than it was to upload the PDF to Fed-Ex Kinkos and Xerox a coil-bound copy (and on top of that go pick it up at the store).

If you are going to buy from anywhere other than Lulu then be sure to check the time-stamp on the copyright page. There is a 6 to 8 week delay from Lulu to Amazon and you may not be getting the absolute latest version available.

Refer to the Installation page for instructions to install an electronic copy of IPSUR on your personal computer. See the Feedback page for guidance about questions or comments you may have about IPSUR.

Also see http://ipsur.r-forge.r-project.org/rcmdrplugin/index.php for the R Cmdr Plugin

This plugin for the R Commander accompanies the text Introduction to Probability and Statistics Using R by G. Jay Kerns. The plugin contributes functions unique to the book as well as specific configuration and functionality to R Commander, the pioneering work by John Fox of McMaster University.

RcmdrPlugin.IPSUR’s primary goal is to provide a user-friendly graphical user interface (GUI) to the open-source and freely available R statistical computing environment. RcmdrPlugin.IPSUR is equipped to handle many of the statistical analyses and graphical displays usually encountered by upper division undergraduate mathematics, statistics, and engineering majors. Available features are comparable to many expensive commercial packages such as Minitab, SPSS, and JMP-IN.

Since the audience of RcmdrPlugin.IPSUR is slightly different than Rcmdr’s, certain functionality has been added and selected error-checks have been disabled to permit the student to explore alternative regions of the statistical landscape. The resulting benefit of increased flexibility is balanced by somewhat increased vulnerability to syntax errors and misuse; the instructor should keep this and the academic audience in mind when usingRcmdrPlugin.IPSUR in the classroom

Interview:Richard Schultz , CEO REvolution Computing

Here is an interview with the CEO of REvolution Computing, Richard Schultz. Mr. Schultz offers his perspectives on aspects of the open source, predictive analytics, cloud computing as well his vision for R Commercial.

Note from Ajay-As I blogged previously, commercial establishments now have an option to use R commercially with a full service contract and all guarantees which they expect and get from existing analytics software vendors.

Ajay -Linux has not really succeeded in capturing Windows /Desktop Operating market. What are the technical and business reasons that you think R will succeed in analytics desktop software market.

Richard- To start, Linux was never really targeted at the Windows desktop market, but rather at deseating proprietary Unix deployments (particularly in finance), which it did quite successfully.  This is a similar trend to what we’re seeing in the R world – it’s not that R is generally replacing Excel, for instance.  In addition, with the large and growing base of both users and contributors, the vibrancy of the R community has taken on a life of its own.

As to R and Windows, two things are worth noting:

1. Microsoft has moved rapidly to embrace R and REvolution for that matter.

2. Windows is still the predominate operating system in large commercial enterprises. Because we deploy R on multiprocessors, which are now common on all computers including those pre-loaded with Windows, REvolution R is very much at home in both Windows, Mac, and Linux environments.

Ajay- What are the biggest challenges to Revolution Computing while explaining R Pro to users of traditional statistics softwares. What are the biggest advantages?

Richard- The biggest challenge is getting the word out that there now exists validated and supported R products designed for commercial use. But that’s changing rapidly, as your own interest in REvolution Computing demonstrates. Our biggest advantages are several:

1. we are focused on building a close and collegial relationship with the open source R community;

2. our company has a deep history in super computing and parallelization;

3. with, by Intel’s estimate, over 1 million R users and growing, there is a large community eager to adapt our products as its members advance their careers in the business and research worlds.

Ajay- Which softwares do you think will be affected the most by R’s spread across colleges and companies. What do you believe will be their strategies to compete.


Richard – I want to be politic here. Let me say that the programming software likely most affected by the rise of R is probably proprietary.

We see many opportunities to partner and leverage the strengths of REvolution’s products specifically – high performance, handling of large data, validation, IDE / user interface.

Ajay- How do you intend to incorporate the cloud computing and Software as a Service Model for R Pro. When , if at all, do you think it be possible  for a person to simply upload a zipped csv file, work on a remote cloud computer for analytics and forecasting, and just pay for the hired software,hardware,bandwidth.

Richard – We were thinking of something based on the Ohri framework.  ;-). ( Ajay- Touché!)

In fact, we have deployed, and are deploying cloud-based REvolution R for clients, and it’s something we expect to evolve as those technologies evolve.


Ajay- Asian countries have huge demand for analytics, and are more price conscious on softwares. What would your strategy to sell in Asia /China and India be.

Richard – Open source can be a tremendous win for users in Asia / China / India.  The upfront costs are low, the technology is leading-edge, and there is a distribution network for support.  REvolution has partners, and is continuing to build its partner network to be able to reach these markets.  We expect to accelerate our efforts in these regions toward the end of 2009.

Ajay- What has been the story so far for your career. What prompted you to join/start Revolution Computing. What would be the advice you would give to young science graduates in today’s recession.

Richard – My own background is in computer science, business… and music. Through school I held various positions at IBM, and after graduate school, I worked at Dunn & Bradsteet in a product management role and developed a taste for entrepreneurship. I’ve started two companies so far, MetaServer, a business intelligence middleware company that catered to the insurance industry, and REvolution Computing. Today, MetaServer is part of Oracle. And I continue to play music – guitar and piano. One of these days we’ll get a REvolution Computing band together.

My advice to young science graduates is the same recession or no: follow your enthusiasms; find a passion outside of work like playing music; master open source program languages because that is the future and the future is here.

About Richard Schultz –Chief Executive Officer,REvolution Computing

Richard guides REvolution’s long-range business strategy and leads the company’s teams on a daily basis. His experience developing and growing Business Intelligence software companies includes founding and leading Metaserver, Inc., now a part of Oracle, from inception to sale. Richard has been named Innovator of the Year by Business New Haven; served on the board of the Connecticut Venture Group; and been the keynote speaker for CIO Forum and other technology industry events.  A graduate of Washington University with degrees in Computer Science, Business and Music, Richard also holds a Masters degree in Computer Science from the State University of New York at Stonybrook and has held senior positions at Dunn and Bradstreet and IBM.

Ajay -REvolution Computing has been a leader in this field and going by the latest product launch –well you can try it yourself and see from here http://www.revolution-computing.com