PMML Plugin for Greenplum now available
From a press release from Zementis announcing the Universal PMML Plug-in for in-database scoring. Available now for the EMC Greenplum Database, a high-performance massively parallel processing (MPP) database, the plug-in leverages the Predictive Model Markup Language (PMML) to execute predictive models directly within EMC Greenplum, for highly optimized in-database scoring.
Developed by the Data Mining Group (DMG), PMML is supported by all major data mining vendors, e.g., IBM SPSS, SAS, Teradata, FICO, STATISTICA, MicroStrategy, TIBCO and Revolution Analytics, as well as open source tools like R, KNIME and RapidMiner. With PMML, models built in any of these data mining tools can now instantly be deployed in the EMC Greenplum database. The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.
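As a rough illustration of the R side of this workflow, a model fitted in R can be exported to a PMML file with the pmml package. This is only a minimal sketch and is not taken from the press release: it assumes the pmml and XML packages are installed, the model and the file name are made up for the example, and loading the resulting file into Greenplum through the plug-in is not shown.

```r
library(pmml)  # assumed to be installed; provides pmml()
library(XML)   # assumed to be installed; provides saveXML()

# Illustrative model on a built-in data set (not from the press release)
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

# Convert the fitted model to a PMML document
model_pmml <- pmml(fit)

# Write the PMML to a file; the path is a hypothetical example
pmml_path <- file.path(tempdir(), "iris_lm.pmml")
saveXML(model_pmml, file = pmml_path)
```

The resulting file is plain XML, which is the piece a PMML-aware scoring engine would consume.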
Related Articles
- Creating New Capabilities With An Analytics Lab (chucksblog.emc.com)
- EMC Greenplum releases Community Edition of MPP database product, big data analysis gets cheaper still (zdnet.com)
- EMC lets go of Greenplum Community Edition (go.theregister.com)
- Greenplum, Big Data, and an Open Source Card (arnoldit.com)
- EMC launches free edition of Greenplum database (zdnet.com)
HIGHLIGHTS from REXER Survey: R gives best satisfaction
A Summary report from Rexer Analytics Annual Survey
HIGHLIGHTS from the 4th Annual Data Miner Survey (2010):
• FIELDS & GOALS: Data miners work in a diverse set of fields. CRM / Marketing has been the #1 field in each of the past four years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals are also the goals identified by the most data miners surveyed.
• ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. This year, for the first time, the survey asked about Ensemble Models, and 22% of data miners report using them.
A third of data miners currently use text mining and another third plan to in the future.
• MODELS: About one-third of data miners typically build final models with 10 or fewer variables, while about 28% generally construct models with more than 45 variables.
• TOOLS: After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other. STATISTICA, which has also been climbing in the rankings, is selected as the primary data mining tool by the most data miners (18%). Data miners report using an average of 4.6 software tools overall. STATISTICA, IBM SPSS Modeler, and R received the strongest satisfaction ratings in both 2010 and 2009.
• TECHNOLOGY: Data Mining most often occurs on a desktop or laptop computer, and frequently the data is stored locally. Model scoring typically happens using the same software used to develop models. STATISTICA users are more likely than other tool users to deploy models using PMML.
• CHALLENGES: As in previous years, dirty data, explaining data mining to others, and difficult access to data are the top challenges data miners face. This year data miners also shared best practices for overcoming these challenges. The best practices are available online.
• FUTURE: Data miners are optimistic about continued growth in the number of projects they will be conducting, and growth in data mining adoption is the number one “future trend” identified. There is room to improve: only 13% of data miners rate their company’s analytic capabilities as “excellent” and only 8% rate their data quality as “very strong”.
Please contact us if you have any questions about the attached report or this annual research program. The 5th Annual Data Miner Survey will be launching next month. We will email you an invitation to participate.
Information about Rexer Analytics is available at www.RexerAnalytics.com. Rexer Analytics continues its impressive journey; see http://www.rexeranalytics.com/Clients.html
My only thought: since most data miners use multiple tools, including free tools as well as paid software, perhaps a pie chart of market share by revenue and by volume would be handy.
Some ideas on comparing diverse data mining projects by data size or complexity would also help.
Related Articles
- Skills of a good data miner (zyxo.wordpress.com)
- 7 Data Blogs To Explore (readwriteweb.com)
- FBI Data-Mining Program:Total Information Awareness (alitarhini.wordpress.com)
Interview Anne Milley JMP
Here is an interview with Anne Milley, a notable thought leader in the world of analytics. Anne is now Senior Director, Analytical Strategy in Product Marketing for JMP, the leading data visualization software from the SAS Institute.
Ajay-What do you think are the top 5 unique selling points of JMP compared to other statistical software in its category?
Anne-
JMP combines incredible analytic depth and breadth with interactive data visualization, creating a unique environment optimized for discovery and data-driven innovation.
With an extensible framework using JSL (JMP Scripting Language), and integration with SAS, R, and Excel, JMP becomes your analytic hub.
JMP is accessible to all kinds of users. A novice analyst can dig into an interactive report delivered by a custom JMP application. An engineer looking at his own data can use built-in JMP capabilities to discover patterns, and a developer can write code to extend JMP for herself or others.
State-of-the-art DOE capabilities make it easy for anyone to design and analyze efficient experiments to determine which adjustments will yield the greatest gains in quality or process improvement – before costly changes are made.
Not to mention, JMP products are exceptionally well designed and easy to use. See for yourself and check out the free trial at www.jmp.com.
Ajay- What are the challenges and opportunities of expanding JMP’s market share? Do you see JMP expanding its conferences globally to engage global audiences?
Anne-
We realized solid global growth in 2010. The release of JMP Pro and JMP Clinical last year along with continuing enhancements to the rest of the JMP family of products (JMP and JMP Genomics) should position us well for another good year.
With the growing interest in analytics as a means to sustained value creation, we have the opportunity to help people along their analytic journey – to get started, take the next step, or adopt new paradigms speeding their time to value. The challenge is doing that as fast as we would like.
We are hiring internationally to offer even more events, training and academic programs globally.
Ajay- What are the current and proposed educational and global academic initiatives of JMP? How can we see more JMP in universities across the world (say India- China etc)?
Anne-
We view colleges and universities both as critical incubators of future JMP users and as places where attitudes about data analysis and statistics are formed. We believe that a positive experience in learning statistics makes a person more likely to eventually want and need a product like JMP.
For most students – and particularly for those in applied disciplines of business, engineering and the sciences – the ability to make a statistics course relevant to their primary area of study fosters a positive experience. Fortunately, there is a trend in statistical education toward a more applied, data-driven approach, and JMP provides a very natural environment for both students and researchers.
Its user-friendly navigation, emphasis on data visualization and easy access to the analytics behind the graphics make JMP a compelling alternative to some of our more traditional competitors.
We’ve seen strong growth in the education markets in the last few years, and JMP is now used in nearly half of the top 200 universities in the US.
Internationally, we are at an earlier stage of market development, but we are currently working with both JMP and SAS country offices and their local academic programs to promote JMP. For example, we are working with members of the JMP China office and faculty at several universities in China to support the use of JMP in the development of a master’s curriculum in Applied Statistics there, touched on in this AMSTAT News article.
Ajay- What future trends do you see for 2011 in this market (say top 5)?
Anne-
Growing complexity of data (text, image, audio…) drives the need for more and better visualization and analysis capabilities to make sense of it all.
More “chief analytics officers” are making better use of analytic talent – people are the most important ingredient for success!
JMP has been on the vanguard of 64-bit development, and users are now catching up with us as 64-bit machines become more common.
Users should demand easy-to-use, exploratory and predictive modeling tools as well as robust tools to experiment and learn to help them make the best decisions on an ongoing basis.
All these factors and more fuel the need for the integration of flexible, extensible tools with popular analytic platforms.
Ajay-You enjoy organic gardening as a hobby. How do you think hobbies and unwind time help people be better professionals?
Anne-
I am lucky to work with so many people who view their work as a hobby. They have other interests too, though, some of which are work-related (statistics is relevant everywhere!). Organic gardening helps me put things in perspective and be present in the moment. More than work defines who you are. You can be passionate about your work as well as passionate about other things. I think it’s important to spend some leisure time in ways that bring you joy and contribute to your overall wellbeing and outlook.
Btw, nice interviews over the past several months—I hadn’t kept up, but will check it out more often!
Biography (source: http://www.sas.com/knowledge-exchange/business-analytics/biographies.html)

Anne Milley
Anne Milley is Senior Director of Analytics Strategy at JMP Product Marketing at SAS. Her ties to SAS began with bank failure prediction at Federal Home Loan Bank Dallas and continued at 7-Eleven Inc. She has authored papers and served on committees for F2006, KDD, SIAM, A2010 and several years of SAS’ annual data mining conference. Milley is a contributing faculty member for the International Institute of Analytics. anne.milley@jmp.com
OK Cupid Data Visualization- Flow Chart to your Heart
Quite appropriate for Valentine's Day: OkCupid remains innovative in how it uses data (in this case, questionnaire data).
Related Articles
- OkCupid: Finding your Valentine with R (revolutionanalytics.com)
- OkCupid Demystifies Dating with Big Data (gigaom.com)
- OkCupid’s Love Math Doesn’t Solve The Equation [They Blinded Us With Science] (jezebel.com)
- OK Cupid Finds That It’s Our Differences That Make Us Attractive (Aw) (thegloss.com)
- Match.com Buys OkCupid for $50M (appscout.com)
R Node- and other Web Interfaces to R
R Node is a great web interface to R.
http://squirelove.net/r-node/doku.php
Features
- Access to an R server backend via a web browser UI.
- The web browser UI works in all modern browsers, including IE 7 and 8 (excluding SVG-based graphs).
- Username/password login (both from the browser to the R-Node server, and from the R-Node server to Rserve and R).
- Per-user R sessions. Each user can have their own R workspace, or they can share.
- Support for most R commands that perform statistical analysis and provide textual feedback.
- Support for most standard R commands that provide graphical feedback via server-side generation of the graphs. Some graphs (e.g. plot()) can be plotted via SVG client-side as well.
- Downloading of generated graphs.
- Accessing R help files using help() and ? commands (Note: R v2.10 altered how help is provided, so this currently is not working in R v2.10).
- Uploading files to work with their data in R.
- Many commands will work. Try a command; if it does not work, use the feedback button in the application to let us know.
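To get a feel for what "textual feedback" and "per-user R sessions" involve, here is a minimal sketch in base R of how a web front end might evaluate a command string and hand back the printed output. This is not R-Node's actual code: run_command and session are hypothetical names, and a real server would also need authentication and sandboxing.

```r
# Hypothetical helper: evaluate one command string inside a per-user
# environment and return whatever the R console would have printed.
run_command <- function(cmd, env) {
  tryCatch(
    capture.output({
      res <- withVisible(eval(parse(text = cmd), envir = env))
      if (res$visible) print(res$value)  # mimic console auto-printing
    }),
    # Report failures as text instead of stopping the server
    error = function(e) paste("Error:", conditionMessage(e))
  )
}

# One environment per user plays the role of that user's workspace
session <- new.env(parent = globalenv())

run_command("x <- c(1, 2, 3, 4)", session)  # assignment: nothing printed
out <- run_command("mean(x)", session)      # "[1] 2.5"
err <- run_command("stop('boom')", session) # "Error: boom"
```

Keeping a separate environment per user is one simple way to separate workspaces; sharing a workspace just means handing the same environment to several users.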
Limitations
- Various R functions are not supported. These include:
  - Installation of new R packages.
  - Searching of help via ??.
  - Example calls (via example()).
- Rweb, the first and now not so up-to-date option: Web-based Statistical Analysis (last modified 25-Jun-1999). JSS paper: http://www.jstatsoft.org/v04/i01/
- R-Online: https://user.cs.tu-berlin.de/~ulfi/cgi-bin/r-online/r-online.cgi (the official FAQ seems outdated)
- Rcgi (it is not clear whether the project is still active, as per the official FAQ): http://www.ms.uky.edu/~statweb/testR3.html
- Rphp
- RWui: http://sysbio.mrc-bsu.cam.ac.uk/Rwui/
- R.rsp: http://cran.r-project.org/web/packages/R.rsp/index.html
- Rserve: http://www.rforge.net/doc/packages/Rserve/00Index.html
- Rpad: http://rpad.googlecode.com/svn-history/r76/Rpad_homepage/index.html
- CGIwithR. JSS paper citation: CGIwithR: Facilities for processing Web forms using R. Journal of Statistical Software, 8(10), pp. 1-8, 2003. See also a lecture on aspects of using CGI.
- rApache: http://biostat.mc.vanderbilt.edu/rapache/
- Open Infrastructure for Outcomes with a live reporting module using RSessionDA
- Free statistics software: the Wessa server using R (see http://www.wessa.net/rwasp_arimaforecasting.wasp). Citation: Wessa, P. (2011), Free Statistics Software, Office for Research Development and Education, version 1.1.23-r6, URL http://www.wessa.net/
- An impressive implementation of time series analysis based on R and JavaScript. This web server creates separate browser windows for data entry, graphics, and procedure selection. These windows are dynamic. For example, after entering data there is no submit button to submit the data. The procedure selection window is used to start the analysis, which uses the current values in the data window.
- Online multivariate analysis and graphical displays from PBIL, Lyon
- An R web server for robust rank-based linear models
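Making an interactive GUI in gWidgets can be as easy as creating the following script:
# gWidgets also needs a toolkit backend package installed (e.g. gWidgetsRGtk2)
library(gWidgets)
# Create the window hidden so it can be filled before it is shown
w <- gwindow('simple interactive GUI with one button', visible=FALSE)
g <- ggroup(cont=w)
# The handler runs each time the button is clicked
b <- gbutton('click me', cont=g, handler=function(h,...) {
  gmessage('hello world', parent=b)
})
visible(w) <- TRUE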
A big and slightly outdated resource page (which I used to find some of these resources):
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/StatCompCourse
And the famous site at http://www.yeroon.net/ggplot2/ (but no sharing of this site's source code, sigh!)

That's all for now, but watch this space: it's exciting (to watch AND to code).
Code Enhancers for R
This page lists code editors (or IDEs)
https://rforanalytics.wordpress.com/code-enhancers-for-r/
Graphical User Interfaces for R
https://rforanalytics.wordpress.com/graphical-user-interfaces-for-r/
ODBC /Databases for R
https://rforanalytics.wordpress.com/odbc-databases-for-r/
Related Articles
- WebTunes provides Web-based iTunes interface (macworld.com)
- 5 Reasons to Use Twitter Web Interface (madrasgeek.com)
- Getting Started With Riak & Python (pragmaticbadger.com)
- Rserve – Binary R server – RForge.net (rforge.net)
- How to Run Apache and Node.js on the Same Server (readwriteweb.com)
- Rserve – TCP/IP interface to R – RoSuDa – Lehrstuhl für Rechnerorientierte Statistik und Datenanalyse – Universität Augsburg (stats.math.uni-augsburg.de)
SAS to R Challenge: Unique benchmarking
An interesting announcement from Revolution Analytics promises to convert your legacy SAS code to R, not only cheaper but faster. It's a very interesting challenge, and I wonder how SAS users, corporations, customers, and the Institute itself will react.
http://www.revolutionanalytics.com/sas-challenge/
Are you paying for expensive software licenses and hardware to run time-consuming statistical analyses on big data sets?
If you’re doing linear regressions, logistic regressions, predictions, or multivariate crosstabulations* there’s something you should know: Revolution Analytics can get the same results for a substantially lower cost and faster than SAS®.
For a limited time only, Revolution Analytics invites you to take the SAS to R Challenge. Let us prove that we can deliver on our promise of replicating your results in R, faster and cheaper than SAS.
Here’s how it works:
Fill out the short form below, and one of our conversion experts will contact you to discuss the SAS code you want to convert. If we think Revolution R Enterprise can get the same results faster than SAS, we’ll convert your code to R free of charge. Our goal is to demonstrate that Revolution R Enterprise will produce the same results in less time. There’s no obligation, but if you choose to convert, we guarantee that your license cost for Revolution R Enterprise will be less than half what you’re currently paying for the equivalent SAS software.**
It’s that simple.
We’ll show you that you don’t need expensive hardware and software to do high quality statistical analysis of big data. And we’ll show that you don’t need to tie up your computing resources with long running operations. With Revolution R Enterprise, you can run analyses on commodity hardware using Linux or Windows, scale to terabyte-class data problems and do it at processing speeds you would never have thought possible.
Sign up now, and we will be in touch shortly.
—————————-
SAS is a registered trademark of the SAS Institute, Cary, NC, in the US and other countries.
*Additional statistical algorithms are being rapidly added to Revolution R Enterprise. Custom development services are also available.
**Revolution Analytics retains the right to determine eligibility for this offer. Offer available until March 31, 2011.
Related Articles
- Revolution R Enterprise 4.2 now available (revolutionanalytics.com)
- Live from Strata (revolutionanalytics.com)
- Revolution Analytics in 2010 (revolutionanalytics.com)
- UBIT: SAS for Windows (ubit.buffalo.edu)
- What’s Next for Revolution R and Hadoop? (revolutionanalytics.com)
- A simple test to predict coronary artery disease (r-bloggers.com)













