John Sall sets JMP 9 free to tango with R

 


 

John Sall, founder of SAS and JMP, has released JMP 9, the latest blockbuster edition of the flagship (JMP stands for John’s Macintosh Program).

To kill all birds with one software, it is integrated with both R and SAS, and the brochure frankly lists all the qualities. Why am I excited about JMP 9’s integration with R and SAS? It combines manipulation of bigger datasets (thanks to SAS) with R’s superb library of statistical packages and a great statistical GUI (JMP). This makes JMP the latest software, after SAS/IML, RapidMiner, KNIME and Oracle Data Miner, to showcase its R integration (without getting into the GPL compliance need for showing source code: it does not ship R, and advises you to just freely download R). I am sure Peter Dalgaard and Frank Harrell are overjoyed that the R base and Hmisc packages will be used by fellow statisticians and students from within JMP, which after all is made in the neighboring state of North Carolina.

Best of all, a JMP 30-day trial is free, so no money is lost if you download JMP 9 (and no, they don’t ask for your credit card number, or do they?), but they do have a huuuuuuge form to fill in before you download. Still, JMP 9 the software is more thoughtfully designed than the email-prospect-leads form, and the extra functionality in the free 30-day trial is worth it.

Also see “New Features in JMP 9” at http://www.jmp.com/software/jmp9/pdf/new_features.pdf

which has this regarding R.

Working with R

R is a programming language and software environment for statistical computing and graphics. JMP now supports a set of JSL functions to access R. The JSL functions provide the following options:

• open and close a connection between JMP and R

• exchange data between JMP and R

• submit R code for execution

• display graphics produced by R

JMP and R each have their own sets of computational methods.

R has some methods that JMP does not have. Using JSL functions, you can connect to R and use these R computational methods from within JMP.

Textual output and error messages from R appear in the log window. R must be installed on the same computer as JMP.

JMP is not distributed with a copy of R. You can download R from the Comprehensive R Archive Network Web site: http://cran.r-project.org

Because JMP is supported as both a 32-bit and a 64-bit Windows application, you must install the corresponding 32-bit or 64-bit version of R.

For details, see the Scripting Guide book.

and the trial download page (search-optimized URL):

http://www.sas.com/apps/demosdownloads/jmptrial9_PROD__sysdep.jsp?packageID=000717&jmpflag=Y

In related news (“Richest man in North Carolina also ranks nationally”, charlotte.news14.com), Jim Goodnight is now just as rich as Mark Zuckerberg, creator of Facebook, though they are probably not making a movie about Jim yet (imagine a movie titled “The Statistical Software”: not quite the same dude feel as “The Social Network”).

See John’s latest interview :

The People Behind the Software: John Sall

http://blogs.sas.com/jmp/index.php?/archives/352-The-People-Behind-the-Software-John-Sall.html

Interview John Sall Founder JMP/SAS Institute

https://decisionstats.com/2009/07/28/interview-john-sall-jmp/

SAS Early Days

https://decisionstats.com/2010/06/02/sas-early-days/

Going Deap : Algols in Python


Here is an important new step for Python, an established language for statistical programming (it used to be pushed hard by SPSS in its pre-IBM days, and the RPy package integrates R and Python).

Well, the news (http://www.kdnuggets.com/2010/10/eap-evolutionary-algorithms-in-python.html) is the release of Distributed Evolutionary Algorithms in Python. If your understanding of modeling means running a regression and iterating on it, you may need to read some more. If you have felt frustrated by the lack of parallelization in statistical software, as well as by your own hardware constraints, well, go DEAP (and for corporate types the licensing is LGPL: http://www.gnu.org/licenses/lgpl.html).

http://code.google.com/p/deap/

DEAP

DEAP is intended to be an easy to use distributed evolutionary algorithm library in the Python language. Its two main components are modular and can be used separately. The first module is a Distributed Task Manager (DTM), which is intended to run on a cluster of computers. The second part is the Evolutionary Algorithms in Python (EAP) framework.

DTM

DTM is a distributed task manager that is able to spread workload over a bunch of computers using a TCP or an MPI connection.

DTM include the following features:

 

EAP

Features

EAP includes the following features:

  • Genetic algorithm using any imaginable representation
    • List, Array, Set, Dictionary, Tree, …
  • Genetic programming using prefix trees
    • Loosely typed, Strongly typed
    • Automatically defined functions (new v0.6)
  • Evolution strategies (including CMA-ES)
  • Multi-objective optimisation (NSGA-II, SPEA-II)
  • Parallelization of the evaluations (and maybe more) (requires python2.6 and preferably python2.7) (new v0.6)
  • Genealogy of an evolution (that is compatible with NetworkX) (new v0.6)
  • Hall of Fame of the best individuals that lived in the population (new v0.5)
  • Milestones that take snapshot of a system regularly (new v0.5)

 

Documentation

See the eap user’s guide for EAP 0.6 documentation.

Requirement

The most basic features of EAP require Python 2.5 (we simply do not offer support for 2.4). To use multiprocessing you will need Python 2.6, and to combine the toolbox with the multiprocessing module you need Python 2.7, which supports pickling partial functions.
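Why does pickling partial functions matter here? The multiprocessing module hands work to other processes by pickling it, so a function with some arguments pre-filled has to survive a pickle round trip. Here is a minimal sketch of that round trip using only the standard library; it does not use the EAP toolbox, and the name `triple` is just an illustration.

```python
import functools
import operator
import pickle

# A partial function: operator.mul with its first argument fixed to 3.
triple = functools.partial(operator.mul, 3)

# multiprocessing ships callables to worker processes by pickling them,
# so a partial must survive this round trip to be usable in that setting.
restored = pickle.loads(pickle.dumps(triple))

# The restored object behaves like the original partial.
result = restored(4)
print(result)
```

On an interpreter that cannot pickle partial objects, the `pickle.dumps` call is the step that fails.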

Projects using EAP

If you want your project listed here, simply send us a link and a brief description and we’ll be glad to add it.

and from the wordpress.com blog (funny how people like code.google.com but not blogger.google.com anymore) at http://deapdev.wordpress.com/

EAP is part of the DEAP project, which also includes some facilities for the automatic distribution and parallelization of tasks over a cluster of computers. The D part of DEAP, called DTM, is under intense development and currently available as an alpha version. DTM currently provides two and a half ways to distribute workload on a cluster or LAN of workstations, based on MPI and TCP communication managers.

This public release (version 0.6) is more complete and simpler than ever. It includes Genetic Algorithms using any imaginable representation, Genetic Programming with strongly and loosely typed trees in addition to automatically defined functions, Evolution Strategies (including Covariance Matrix Adaptation), multiobjective optimization techniques (NSGA-II and SPEA2), easy parallelization of algorithms and much more like milestones, genealogy, etc.

We are impatient to hear your feedback and comments on that system at .

Best,

François-Michel De Rainville
Félix-Antoine Fortin
Marc-André Gardner
Christian Gagné
Marc Parizeau

Laboratoire de vision et systèmes numériques
Département de génie électrique et génie informatique
Université Laval
Quebec City (Quebec), Canada

and if you are new to Python (sigh), here are some statistical things (read: ad-van-cED analytics using Python) in a SlideShare deck from Visual Numerics (pre Rogue Wave acquisition).

Also see,

http://code.google.com/p/deap/wiki/SimpleExample
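If you have never seen a genetic algorithm, here is a rough plain-Python sketch of the kind of loop such a library automates: a population of bit lists, tournament selection, one-point crossover, bit-flip mutation, and a "hall of fame" that remembers the best individual seen. It deliberately does not use the EAP API; every function name and parameter value below is my own assumption, chosen for illustration.

```python
import random


def one_max(individual):
    """Fitness: the number of 1 bits (the classic OneMax toy problem)."""
    return sum(individual)


def tournament(population, k, rng):
    """Pick k individuals at random and return the fittest of them."""
    return max(rng.sample(population, k), key=one_max)


def crossover(a, b, rng):
    """One-point crossover: head of parent a joined to tail of parent b."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def evolve(n_bits=30, pop_size=40, generations=50, seed=0):
    """Run the loop and return the best individual ever seen."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    hall_of_fame = max(population, key=one_max)
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            parent_a = tournament(population, 3, rng)
            parent_b = tournament(population, 3, rng)
            child = crossover(parent_a, parent_b, rng)
            offspring.append(mutate(child, 1.0 / n_bits, rng))
        population = offspring
        best = max(population, key=one_max)
        if one_max(best) > one_max(hall_of_fame):
            hall_of_fame = best
    return hall_of_fame


best = evolve()
print(one_max(best), best)
```

The fitness evaluations inside the loop are independent of each other, which is why this family of algorithms parallelizes so naturally.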

 

 

 

Top ten RRReasons R is bad for you ?


R is the programming language based out of www.r-project.org

R is bad for you because –

1) It is slower with bigger datasets than the SPSS language and the SAS language. If you use bigger datasets, then you should either consider more hardware, or try and wait for some of the ODBC connect packages.

2) It needs more time to learn than the SAS language. Much more time to learn how to do much more.

3) R programmers are paid less than SAS programmers. They prefer it that way. It equates the satisfaction of creating a package in development with a worldwide community with the satisfaction of using a package and earning much more money per hour.

4) It forces you to learn the exact details of what you are doing due to its object oriented structure. Thus you either get no answer or get an exact answer. Your customer pays you by the hour not by the correct answers.

5) You can not push a couple of buttons or refer to a list of top ten most commonly used commands to finish the project.

6) It is free. And open for all. It is socialism expressed in code. Some of the packages are built by university professors. It is free. Free is bad. Who pays the mortgages of the software programmers if all software is free? Who pays for the Friday picnics? Who pays for the Good Night cruises?

7) It is free. Your organization will not commend you for saving them money; they will question why you did not recommend this before. And why did you approve all those packages that expire in 2011? R is fReeeeee. Customers feel good while spending money. The more software budgets you approve, the higher your salary. R thReatens all that.

8) It is impossible to install a package you do not need or want. There is no one calling you on the phone to consider one more package or solution. R can make you lonely.

9) R mostly uses the command line. The command line is from the Seventies. Or the Eighties. The GUIs Rcmdr and Rattle are there but still…..

10) R forces you to learn new stuff by the month. You prefer to only earn by the month. Till the day your job got offshored…

Written by an R user in the English language

( which fortunately was not copyrighted otherwise we would be paying Britain for each word)

Ajay- The above post was reprinted by personal request. It was written in January 2009 and may not be truly valid now. It is meant to be taken in good humor, not so seriously.

BI Software

Here is the brand new release from Jaspersoft at a groovy price of $9,000. Somebody stop these guys!

It’s a great company to watch for buyouts as well, given their expertise in REPORTING and their clientele, especially for anyone looking to improve their standing in both the open source world and reporting software branding.

From AOL-owned Arrogantion’s site http://www.crunchbase.com/company/jaspersoft

 

Total $24.5M
Series D, 8/07 1
Scale Venture Partners
SAP Ventures
Doll Capital Management
Partech International
Morgenthaler Ventures
$12M
Unattributed, 12/08 2
Adams Street Partners
Red Hat
Morgenthaler Ventures
Doll Capital Management
Partech International

 

 

The news-

Announcing JasperReports Server Professional

More Resources

Webinar: Introducing JasperReports Server Professional

Thursday October 14

In this live webinar, learn how a new solution from Jaspersoft combines the world’s favorite reporting server with powerful, mature report server functionality—for about 80% less.

  • Date: Thu, Oct 14
  • Time: 10:00 AM PDT
  • Duration: 60 minutes

The World’s Most Powerful and Affordable Reporting Server

Limited Time Introductory Offer: Starting from $9,000 (restrictions apply)

JasperReports Server is the recommended product for organizations requiring an affordable reporting solution for interactive, operational, and production-based reporting. Deployed as a standalone reporting server or integrated inside another application, JasperReports Server is a flexible, powerful, interactive reporting environment for small or large enterprises.

Powered by the world’s most popular reporting tools in JasperReports and iReport, developers and users can take advantage of more interactivity, security, and scheduling of their reports.

Key Benefits:

  • Affordable: Unlimited reports for unlimited users starting at $9,000
  • Powerful: Report scheduling and distribution to 1,000s of users on a single server
  • Flexible: Web service architecture simplifies application integration
  • Secure: Centralized repository authenticates report access
  • Interactive: Easy to interact, self-serve parameterized-based reports
  • Visual appeal: Flash-based charts and maps engage users and enhance applications
  • Open: Access to any data source including relational, XML, Hibernate, EJB, POJO, and custom

 

Speaking of videos, here is a great video on BI from good ol’ Tennessee: a 27-minute tutorial on BI for newbies.

 

Dataists shake up R community with a rocking contest


The newly created Dataists are making waves on Hacker News and beyond with their innovative contest: a Recommendation Engine for R Packages.

Not only is the contest useful, it is likely to teach R users some data hacking skills, as well as the basics of creating a GitHub project.

Read more here: http://www.dataists.com/2010/10/using-data-tools-to-find-data-tools-the-yo-dawg-of-data-hacking/

For that reason, we’ve settled on the more manageable question, “which packages are most often installed by normal R users?”

This last question could potentially be answered in a variety of ways. Our current approach uses a convenience sample of installation data that we’ve collected from volunteers in the R community, who kindly agreed to send us a list of the packages they have on their systems. We’ve anonymized this data and compiled a set of metadata-based predictors that allow us to predict the installation probabilities quite well. We’re releasing all of our current work, including the data we have and all of the code we’ve used so far for our exploratory analyses. The contest itself will go live on Kaggle on Sunday and will end four months from Sunday on February 10, 2011. The rules, prizes and official data sets are all described below.

Rules and Prizes

To win the contest, you need to predict the probability that a user U has a package P installed on their system for every pair, (U, P). We’ll assess your performance using ROC methods, which will be evaluated against a held out test data set. The winning team will receive 3 UseR! books of their choosing. In order to win the contest, you’ll have to provide your analysis code to us by creating a fork of our GitHub repository. You’ll also be required to provide a written description of your approach. We’re asking for so much openness from the winning team because we want this contest to serve as a stepping stone for the R community. We’re also hoping that enterprising data hackers will extend the lessons learned through this contest to other programming languages.
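To make the scoring idea concrete, here is a small sketch of one common ROC summary, the area under the ROC curve (AUC), computed for predicted installation probabilities over (U, P) pairs. The extract only says "ROC methods", so treating AUC as the measure is my assumption, and the user names, package names and numbers below are invented purely for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out truth: 1 if user U has package P installed.
installed = {
    ("user1", "ggplot2"): 1,
    ("user1", "Hmisc"): 0,
    ("user2", "ggplot2"): 1,
    ("user2", "Rcmdr"): 0,
    ("user3", "Hmisc"): 1,
}

# Hypothetical predicted probabilities for the same (U, P) pairs.
predicted = {
    ("user1", "ggplot2"): 0.9,
    ("user1", "Hmisc"): 0.2,
    ("user2", "ggplot2"): 0.6,
    ("user2", "Rcmdr"): 0.7,
    ("user3", "Hmisc"): 0.4,
}

pairs = sorted(installed)
auc = roc_auc([installed[p] for p in pairs], [predicted[p] for p in pairs])
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of installed over non-installed pairs.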

Extract from: http://www.dataists.com/2010/10/using-data-tools-to-find-data-tools-the-yo-dawg-of-data-hacking/

Read the full article there

Top R Interviews

 


 

Here is a list of the top R-related interviews I have done (in random order):

1) John Fox, creator of R Commander

https://decisionstats.com/2009/09/14/interview-professor-john-fox-creator-r-commander/

2) Dr Graham Williams, creator of Rattle

https://decisionstats.com/2009/01/13/interview-dr-graham-williams/

3) David Smith, back when he was Community Director of what was then Revolution Computing.

https://decisionstats.com/2009/05/29/interview-david-smith-revolution-computing/

and his second interview

https://decisionstats.com/2010/08/03/q-a-with-david-smith-revolution-analytics/

4) Richard Schultz, the first CEO of Revolution Computing (now Revolution Analytics)

https://decisionstats.com/2009/01/31/interviewrichard-schultz-ceo-revolution-computing/

5) Bob Muenchen, author of R for SAS and SPSS Users and R for Stata Users

https://decisionstats.com/2010/06/29/interview-r-for-stata-users/

https://decisionstats.com/2008/10/16/r-for-sas-and-spss-users/

6) Karim Chine, creator of Biocep, Cloud Computing for R

https://decisionstats.com/2009/06/21/interview-karim-chine-biocep-cloud-computing-with-r/

7) Paul van Eikeren, Inference for R, the first enterprise package to use R from within MS Office.

https://decisionstats.com/2009/06/04/inference-for-r/

8) Hadley Wickham, creator of ggplot2 and R author

https://decisionstats.com/2010/01/12/interview-hadley-wickham-r-project-data-visualization-guru/

That’s a lot of R interviews. I need to balance them out a bit, I guess.

Learning Hadoop

Curious about learning Hadoop, a hot resume skill?

Try

http://www.cloudera.com/hadoop-training/#certification

Cloudera Certification for Hadoop

Cloudera Certification establishes you as a trusted and valuable resource for those working with Hadoop. Whether your company is just looking into the technology or your customers are asking for help, Cloudera Certification demonstrates your ability to solve problems using Hadoop.

  • Consultants, developers and technical leaders can use Cloudera Certification to demonstrate their experience with Hadoop.
  • Employers can use Cloudera Certification to identify candidates for new jobs or internal promotions, as well as ensure team members share a common knowledge base.
  • Customers can reduce risk by relying on contractors and suppliers who retain current Cloudera Certification for their personnel.

If you’d like to obtain Cloudera Certification for Developers or Administrators, see

http://www.cloudera.com/hadoop-training/#certification