Matlab-Mathematica-R and GPU Computing

MathWorks has announced a Parallel Computing Toolbox for MATLAB, which also enables GPU computing:

http://www.mathworks.com/products/parallel-computing/

Parallel Computing Toolbox™ lets you solve computationally and data-intensive problems using multicore processors, GPUs, and computer clusters. High-level constructs—parallel for-loops, special array types, and parallelized numerical algorithms—let you parallelize MATLAB® applications without CUDA or MPI programming. You can use the toolbox with Simulink® to run multiple simulations of a model in parallel.

MATLAB GPU Support

The toolbox provides eight workers (MATLAB computational engines) to execute applications locally on a multicore desktop. Without changing the code, you can run the same application on a computer cluster or a grid computing service (using MATLAB Distributed Computing Server™). You can run parallel applications interactively or in batch.

Parallel Computing with MATLAB on Amazon Elastic Compute Cloud (EC2)

There is also a video on using Mathematica with GPUs.

R also has many packages for GPU computing.

Parallel computing: GPUs

from http://cran.r-project.org/web/views/HighPerformanceComputing.html

  • The gputools package by Buckner provides several common data-mining algorithms which are implemented using a mixture of NVIDIA's CUDA language and cuBLAS library. Given a computer with an NVIDIA GPU these functions may be substantially more efficient than native R routines. The rpud package provides an optimised distance metric for NVIDIA-based GPUs.
  • The cudaBayesreg package by da Silva implements the rhierLinearModel from the bayesm package using NVIDIA's CUDA language and tools to provide high-performance statistical analysis of fMRI voxels.
  • The rgpu package (see below for link) aims to speed up bioinformatics analysis by using the GPU.
  • The magma package provides an interface to the hybrid GPU/CPU library Magma (see below for link).
  • The gcbd package implements a benchmarking framework for BLAS and GPUs (using gputools).

I searched for SAS and GPU, and for SPSS and GPU, but found nothing. Maybe they would do well to at least test this alternative hardware.

Also see this comparison of two ways to run MATLAB on GPUs, the Jacket product versus Parallel Computing Toolbox:

http://www.accelereyes.com/products/compare

GNU PSPP- The Open Source SPSS

If you are an SPSS user (for statistics, not data mining) you can also try out GNU PSPP, which is the open source equivalent and is almost eerily impressive in performance. It is available at http://www.gnu.org/software/pspp/ or http://pspp.awardspace.com/ and you can read more at http://en.wikipedia.org/wiki/PSPP

PSPP is a program for statistical analysis of sampled data. It is a Free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions.

The most important of these exceptions are that there are no “time bombs”; your copy of PSPP will not “expire” or deliberately stop working in the future. Neither are there any artificial limits on the number of cases or variables which you can use. There are no additional packages to purchase in order to get “advanced” functions; all functionality that PSPP currently supports is in the core package.

PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands.

A brief list of some of the features of PSPP follows:

  • Supports over 1 billion cases.
  • Supports over 1 billion variables.
  • Syntax and data files are compatible with SPSS.
  • Choice of terminal or graphical user interface.
  • Choice of text, postscript or html output formats.
  • Inter-operates with Gnumeric, OpenOffice.Org and other free software.
  • Easy data import from spreadsheets, text files and database sources.
  • Fast statistical procedures, even on very large data sets.
  • No license fees.
  • No expiration period.
  • No unethical “end user license agreements”.
  • Fully indexed user manual.
  • Free Software; licensed under GPLv3 or later.
  • Cross platform; Runs on many different computers and many different operating systems.

PSPP is particularly aimed at statisticians, social scientists and students requiring fast convenient analysis of sampled data.

and

Features

This software provides a basic set of capabilities: frequencies, cross-tabs, comparison of means (T-tests and one-way ANOVA), linear regression, reliability (Cronbach’s Alpha, not failure or Weibull), re-ordering data, non-parametric tests, factor analysis and more.

At the user’s choice, statistical output and graphics are done in ASCII, PDF, PostScript or HTML formats. A limited range of statistical graphs can be produced, such as histograms, pie-charts and np-charts.

PSPP can import Gnumeric, OpenDocument and Excel spreadsheets, Postgres databases, comma-separated values and ASCII files. It can export files in the SPSS ‘portable’ and ‘system’ file formats and to ASCII files. Some of the libraries used by PSPP can be accessed programmatically; PSPP-Perl provides an interface to the libraries used by PSPP.
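For readers unfamiliar with the reliability statistic mentioned above: Cronbach's alpha for k items is k/(k-1) times (1 minus the sum of the item variances divided by the variance of the total score). The sketch below computes it in plain Python. It is not PSPP code, and the function name and the small score table are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    `scores` is a list of rows, one row per respondent and one column per
    item. Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*scores))  # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: 4 respondents answering 3 items.
example_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(example_scores)
```

Values close to 1 indicate that the items move together; in PSPP itself the same statistic comes from its reliability procedure rather than from hand-written code.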

Origins

The PSPP project (originally called “Fiasco”) is a free, open-source alternative to the proprietary statistics package SPSS. SPSS is closed-source and includes a restrictive licence and digital rights management. The author of PSPP considered this ethically unacceptable, and decided to write a program which might with time become functionally identical to SPSS, except that there would be no licence expiry, and everyone would be permitted to copy, modify and share the program.

Release history

  • 0.7.5 June 2010 http://pspp.awardspace.com/
  • 0.6.2 October 2009
  • 0.6.1 October 2008
  • 0.6.0 June 2008
  • 0.4.0.1 August 2007
  • 0.4.0 August 2005
  • 0.3.0 April 2004
  • 0.2.4 January 2000
  • 0.1.0 August 1998

Third Party Reviews

In the book “SPSS For Dummies”, the author discusses PSPP under the heading of “Ten Useful Things You Can Find on the Internet” [1]. In 2006, the South African Statistical Association presented a conference which included an analysis of how PSPP can be used as a free replacement to SPSS [2].

Citation-

Please send FSF & GNU inquiries to gnu@gnu.org. There are also other ways to contact the FSF. Please send broken links and other corrections (or suggestions) to bug-gnu-pspp@gnu.org.

Copyright © 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc., 51 Franklin St – Suite 330, Boston, MA 02110, USA – Verbatim copying and distribution of this entire article are permitted worldwide, without royalty, in any medium, provided this notice, and the copyright notice, are preserved.

Q&A with David Smith, Revolution Analytics.

Here is a set of questions that David Smith of Revolution Analytics was kind enough to answer after the launch of RevoScaleR, the new R package which integrates Hadoop and R.

Ajay- How does RevoScaleR work from a technical viewpoint in terms of Hadoop integration?

David- The point isn’t that there’s a deep technical integration between Revolution R and Hadoop, rather that we see them as complementary (not competing) technologies. Hadoop is amazing at reliably (if slowly) processing huge volumes of distributed data; the RevoScaleR package complements Hadoop by providing statistical algorithms to analyze the data processed by Hadoop. The analogy I use is to compare a freight train with a race car: use Hadoop to slog through a distributed data set and use Map/Reduce to output an aggregated, rectangular data file; then use RevoScaleR to perform statistical analysis on the processed data (and use the speed of RevoScaleR to iterate through many model options to find the best one).

Ajay- How is it different from MapReduce and Rhipe, the existing R/Hadoop packages?
David- They’re complementary. In fact, we’ll be publishing a white paper soon by Saptarshi Guha, author of the Rhipe R/Hadoop integration, showing how he uses Hadoop to process vast volumes of packet-level VOIP data to identify call time/duration from the packets, and then do a regression on the table of calls using RevoScaleR. There’s a little more detail in this blog post: http://blog.revolutionanalytics.com/2010/08/announcing-big-data-for-revolution-r.html
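
The division of labour David describes, where a Map/Reduce job turns raw packets into a rectangular table of calls that is then modelled separately, can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce steps, not Hadoop or Rhipe code; the packet records, field names and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical packet records: (call_id, timestamp_in_seconds).
packets = [
    ("call-a", 10.0), ("call-b", 12.5), ("call-a", 10.5),
    ("call-a", 40.0), ("call-b", 20.0), ("call-c", 7.0),
]


def map_packet(packet):
    """Map step: emit a (key, value) pair for each packet."""
    call_id, timestamp = packet
    return call_id, timestamp


def reduce_call(call_id, timestamps):
    """Reduce step: collapse one call's packets into a single table row."""
    start = min(timestamps)
    return {"call_id": call_id, "start": start,
            "duration": max(timestamps) - start,
            "packets": len(timestamps)}


def map_reduce(records):
    # Shuffle: group the mapped values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in map(map_packet, records):
        groups[key].append(value)
    return [reduce_call(key, values) for key, values in sorted(groups.items())]


calls = map_reduce(packets)  # one row per call, ready for modelling
```

The output has one row per call, which is the kind of aggregated table a regression can then be run on.
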
Ajay- Is it going to be proprietary, free or licensable (open source)?
David- RevoScaleR is a proprietary package, available to paid subscribers (or free to academics) with Revolution R Enterprise. (If you haven’t seen it, you might be interested in this Q&A I did with Matt Shotwell: http://biostatmatt.com/archives/533 )
Ajay- Are there any existing client case studies for terabyte-level analysis using R?
David- The VOIP example above gets close, but most of the case studies we’ve seen in beta testing have been in the tens to hundreds of GB range. We’ve tested RevoScaleR on larger data sets internally, but we’re eager to hear about real-life use cases in the terabyte range.
Ajay- How can I use RevoScaleR on my dual-core Windows/Intel laptop for, say, 5 GB of data?
David- One of the great things about RevoScaleR is that it’s designed to work on commodity hardware like a dual-core laptop. You won’t be constrained by the limited RAM available, and the parallel processing algorithms will make use of all cores available to speed up the analysis even further. There’s an example in this white paper (http://info.revolutionanalytics.com/bigdata.html) of doing linear regression on 13Gb of data on a simple dual-core laptop in less than 5 seconds.
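
One general way to fit a regression on data that does not fit in RAM is to stream through it in chunks and keep only a handful of running sums. The source does not say how RevoScaleR does this internally, so the sketch below is an assumption-laden illustration of the chunked idea for a single predictor, not the product's algorithm. The function names and the synthetic data generator are hypothetical.

```python
def fit_line_in_chunks(chunks):
    """Fit y = intercept + slope * x from an iterable of chunks.

    Each chunk is a list of (x, y) pairs. Only five running sums are kept,
    so memory use does not grow with the number of rows.
    """
    n = sx = sy = sxx = sxy = 0.0
    for chunk in chunks:
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are constant or data is empty")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def make_chunks(n_rows, chunk_size):
    """Hypothetical data source: yields rows of y = 3 + 2x, chunk by chunk."""
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        yield [(float(i), 3.0 + 2.0 * i) for i in range(start, stop)]


intercept, slope = fit_line_in_chunks(make_chunks(n_rows=10_000, chunk_size=1_000))
```

Because the running sums from different chunks simply add, each chunk could also be summarised on a separate core and the partial sums combined at the end.
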
Ajay- Thanks to David Smith for this fast response, and congratulations to him, Saptarshi Guha, Dr Norman Nie and the rest of the team at Revolution Analytics on this new product launch.

My latest creation

I have just teamed up to create my latest venture, Kush Cognitives (Kush is my son). The firm is going to build websites, do statistical analysis and offer social media services. It merges all my previous ventures and skills. After almost 3 years of working on and off with multiple people, this one is with a friend in the US.

Over the years (since 2007) I have made http://virtua-analytics.com (defunct), Swarajya Analytics Private Limited (www.swanplc.com – now sold) and now Kush Cognitives. I have gone through the models of proprietorship and corporation and now partnership.

Kush Cognitives is hosted at Decisionstats.com (as our flagship website) and we have shifted the blog to Decisionstats.Wordpress.com

We are aiming at startups and the small and medium segments first, but we retain capabilities for bigger clients as well. Less Bullshit and More Bang for your Buck.

So wish us luck, and if you need social media advice, statistical analysis, or help with the technical side of creating websites, give us a buzz at http://decisionstats.com

This also includes customized training in R, SAS and other statistical software, from a practical, user-oriented point of view. We are able to cater to both US and Indian clients.

regards

Ajay Ohri

Image Courtesy-michelangelo