Open Source Compiler for the SAS Language: GNU Dap

(Image: A Bold GNU Head, via Wikipedia)

I am still testing this out.

But if you know a bit more about make and compiling software in Ubuntu, check out

http://www.gnu.org/software/dap/

I loved the humorous introduction:

Dap is a small statistics and graphics package based on C. Version 3.0 and later of Dap can read SBS programs (based on the utterly famous, industry standard statistics system with similar initials – you know the one I mean)! The user wishing to perform basic statistical analyses is now freed from learning and using C syntax for straightforward tasks, while retaining access to the C-style graphics and statistics features provided by the original implementation. Dap provides core methods of data management, analysis, and graphics that are commonly used in statistical consulting practice (univariate statistics, correlations and regression, ANOVA, categorical data analysis, logistic regression, and nonparametric analyses).

Anyone familiar with the basic syntax of C programs can learn to use the C-style features of Dap quickly and easily from the manual and the examples contained in it; advanced features of C are not necessary, although they are available. (The manual contains a brief introduction to the C syntax needed for Dap.) Because Dap processes files one line at a time, rather than reading entire files into memory, it can be, and has been, used on data sets that have very many lines and/or very many variables.
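Dap's line-at-a-time processing is the classic streaming-statistics pattern. As an illustration only (in Python, not Dap's C-style syntax), Welford's one-pass algorithm computes a mean and variance while reading one line at a time, so memory use stays constant however long the file is:

```python
# Sketch (not Dap itself): the streaming idea in Python.
# Welford's one-pass algorithm updates running totals per line,
# so memory use is constant no matter how many lines the file has.
import io

def streaming_mean_var(lines):
    """Return (n, mean, sample variance) from an iterable of numeric lines."""
    n, mean, m2 = 0, 0.0, 0.0
    for line in lines:
        x = float(line)
        n += 1
        delta = x - mean
        mean += delta / n          # running mean
        m2 += delta * (x - mean)   # running sum of squared deviations
    var = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, var

data = io.StringIO("2\n4\n4\n4\n5\n5\n7\n9\n")
print(streaming_mean_var(data))  # n=8, mean≈5.0, sample variance≈4.571
```

This is the same reason Dap can cope with data sets of very many lines: only the running totals are ever held in memory.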

I wrote Dap to use in my statistical consulting practice because the aforementioned utterly famous, industry standard statistics system is (or at least was) not available on GNU/Linux and costs a bundle every year under a lease arrangement. And now you can run programs written for that system directly on Dap! I was generally happy with that system, except for the graphics, which are all but impossible to use, and there were a number of clumsy constructs left over from its ancient origins.

Sample output (http://www.gnu.org/software/dap/):

  • Unbalanced ANOVA
  • Crossed, nested ANOVA
  • Random model, unbalanced
  • Mixed model, balanced
  • Mixed model, unbalanced
  • Split plot
  • Latin square
  • Missing treatment combinations
  • Linear regression
  • Linear regression, model building
  • Ordinal cross-classification
  • Stratified 2×2 tables
  • Loglinear models
  • Logit  model for linear-by-linear association
  • Logistic regression

Sounds too good to be true: GNU Dap joins WPS Workbench and Dulles Open’s Carolina as the third SAS language compiler (besides the now defunct BASS software). See http://en.wikipedia.org/wiki/SAS_language#Controversy

     

    Also see http://en.wikipedia.org/wiki/DAP_(software)

Dap was written to be a free replacement for SAS, but users are assumed to have a basic familiarity with the C programming language in order to permit greater flexibility. Unlike R, it has been designed to cope with very large data sets, even when the size of the data exceeds the size of the computer’s memory.

    WPS Version 2.5.1 Released – can still run SAS language/data and R

However, this is what Phil Rack, the reseller, is quoting at http://www.minequest.com/Pricing.html:

    Windows Desktop Price: $884 on 32-bit Windows and $1,149 on 64-bit Windows.

The Bridge to R is available on the Windows platforms and is available for free to customers who license WPS through MineQuest, LLC. Companies and organizations outside of North America may purchase a license for the Bridge to R, which starts at $199 per desktop or $599 per server.

    Windows Server Price: $1,903 per logical CPU for 32-bit and $2,474 for 64-bit.

Note that Linux server versions are available but do not yet support the Eclipse IDE and are command line only.

WPS sure seems to be going well, but their pricing is no longer fixed; on the home website, you gotta fill out a form. Ditto for the 30-day free evaluation.

    http://www.teamwpc.co.uk/products/wps/modules/core

    Data File Formats

The WPS Core module presently supports the data file formats below (the original table also detailed read and write support for compressed and uncompressed data for each format):

• SD2 (SAS version 6 data set)
• SAS7BDAT (SAS version 7 data set)
• SAS7BDAT (SAS version 8 data set)
• SAS7BDAT (SAS version 9 data set)
• SASSEQ (SAS version 8/9 sequential file)
• V8SEQ (SAS version 8 sequential file)
• V9SEQ (SAS version 9 sequential file)
• WPD (WPS native data set)
• WPDSEQ (WPS native sequential file)
• XPORT (transport format)

Additional access to EXCEL, SPSS and dBASE files is supported by utilising the WPS Engine for DB Files module.

and they have a new product release on Valentine's Day 2011 (oh these Europeans!)

    From the press release at http://www.teamwpc.co.uk/press/wps2_5_1_released

    WPS Version 2.5.1 Released 

    New language support, new data engines, larger datasets, improved scalability

    LONDON, UK – 14 February 2011 – World Programming today released version 2.5.1 of their WPS software for workstations, servers and mainframes.

    WPS is a competitively priced, high performance, highly scalable data processing and analytics software product that allows users to execute programs written in the language of SAS. WPS is supported on a wide variety of hardware and operating system platforms and can connect to and work with many types of data with ease. The WPS user interface (Workbench) is frequently praised for its ease of use and flexibility, with the option to include numerous third-party extensions.

    This latest version of the software has the ability to manipulate even greater volumes of data, removing the previous 2^31 (2 billion) limit on number of observations.
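The 2^31 figure is worth an aside (mine, not part of the press release): it is the ceiling of a signed 32-bit observation counter. A quick Python illustration:

```python
# The 2^31 observation limit is the ceiling of a signed 32-bit
# counter: 2**31 - 1 is the largest value it can hold, and one
# more wraps around to a large negative number.
import ctypes

limit = 2**31 - 1
print(limit)                            # 2147483647 (~2.1 billion)
print(ctypes.c_int32(limit + 1).value)  # wraps to -2147483648
```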

Complementing extended data processing capabilities, World Programming has worked hard to boost the performance, scalability and reliability of the WPS software to give users the confidence they need to run heavy workloads whilst delivering maximum value from available computer power.

    WPS version 2.5.1 offers additional flexibility with the release of two new data engines for accessing Greenplum and SAND databases. WPS now comes with eleven data engines and can access a huge range of commonly used and industry-standard file-formats and databases.

    Support in WPS for the language of SAS continues to expand with more statistical procedures, data step functions, graphing controls and many other language items and options.

    WPS version 2.5.1 is available as a free upgrade to all licensed users of WPS.

    Summary of Main New Features:

    • Supporting Even Larger Datasets
  WPS is now able to process very large data sets, as the previous size limit of 2^31 observations has been completely lifted.
    • Performance and Scalability Boosted
      Performance and scalability improvements across the board combine to ensure even the most demanding large and concurrent workloads are processed efficiently and reliably.
    • More Language Support
  WPS 2.5.1 continues the expansion of its language support with over 70 new language items, including new Procedures, Data Step functions and many other language items and options.
    • Statistical Analysis
      The procedure support in WPS Statistics has been expanded to include PROC CLUSTER and PROC TREE.
    • Graphical Output
      The graphical output from WPS Graphing has been expanded to accommodate more configurable graphics.
    • Hash Tables
      Support is now provided for hash tables.
    • Greenplum®
      A new WPS Engine for Greenplum provides dedicated support for accessing the Greenplum database.
    • SAND®
      A new WPS Engine for SAND provides dedicated support for accessing the SAND database.
    • Oracle®
      Bulk loading support now available in the WPS Engine for Oracle.
    • SQL Server®
  To enhance existing SQL Server database access, a new SQLSERVR (please note spelling) facility is available in the ODBC engine.
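The hash-table item above corresponds to what SAS language programmers know as the data step hash object: an in-memory keyed lookup used while streaming through a larger table. A rough Python analogue of that pattern (illustrative only; this is not WPS or SAS syntax):

```python
# Illustrative sketch of the keyed-lookup pattern behind data step
# hash objects, done with a Python dict: a small lookup table is
# loaded into memory once, and the large table is streamed row by row.
lookup = {"NC": "North Carolina", "NY": "New York"}  # small table

big_table = [                     # rows streamed from the large table
    {"id": 1, "state": "NC"},
    {"id": 2, "state": "TX"},
    {"id": 3, "state": "NY"},
]

merged = []
for row in big_table:
    # .get() is the hash lookup; unmatched keys get a missing value
    row["state_name"] = lookup.get(row["state"], "")
    merged.append(row)

print([r["state_name"] for r in merged])  # ['North Carolina', '', 'New York']
```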

    More Information:

    Existing Users should visit www.teamwpc.co.uk/support/wps/release where you can download a readme file containing more information about all the new features and fixes in WPS 2.5.1.

    New Users should visit www.teamwpc.co.uk/products/wps where you can explore in more detail all the features available in WPS or request a free evaluation.

    and from http://www.teamwpc.co.uk/products/wps/data it seems they are going on the BIG DATA submarine as well-

    Data Support 

    Extremely Large Data Size Handling

WPS is now able to handle extremely large data sets, now that the previous limit of 2^31 observations has been lifted.

    Access Standard Databases

    Use I/O Features in WPS Core

    • CLIPBOARD (Windows only)
    • DDE (Windows only)
    • EMAIL (via SMTP or MAPI)
    • FTP
    • HTTP
    • PIPE (Windows and UNIX only)
    • SOCKET
    • STDIO
    • URL

    Use Standard Data File Formats

Interview: David Katz, Dataspora / David Katz Consulting

Here is an interview with David Katz, founder of David Katz Consulting (http://www.davidkatzconsulting.com/) and an analyst at the noted firm Dataspora (http://dataspora.com/). He is a featured speaker at Predictive Analytics World (http://www.predictiveanalyticsworld.com/sanfrancisco/2011/speakers.php#katz).

Ajay- Describe your background working with analytics. How can we make analytics and science more attractive career options for young students?

    David- I had an interest in math from an early age, spurred by reading lots of science fiction with mathematicians and scientists in leading roles. I was fortunate to be at Harry and David (Fruit of the Month Club) when they were in the forefront of applying multivariate statistics to the challenge of targeting catalogs and other snail-mail offerings. Later I had the opportunity to expand these techniques to the retail sphere with Williams-Sonoma, who grew their retail business with the support of their catalog mailings. Since they had several catalog titles and product lines, cross-selling presented additional analytic challenges, and with the growth of the internet there was still another channel to consider, with its own dynamics.

    After helping to found Abacus Direct Marketing, I became an independent consultant, which provided a lot of variety in applying statistics and data mining in a variety of settings from health care to telecom to credit marketing and education.

    Students should be exposed to the many roles that analytics plays in modern life, and to the excitement of finding meaningful and useful patterns in the vast profusion of data that is now available.

    Ajay-  Describe your most challenging project in 3 decades of experience in this field.

    David- Hard to choose just one, but the educational field has been particularly interesting. Partnering with Olympic Behavior Labs, we’ve developed systems to help identify students who are most at-risk for dropping out of school to help target interventions that could prevent dropout and promote success.

    Ajay- What do you think are the top 5 trends in analytics for 2011.

    David- Big Data, Privacy concerns, quick response to consumer needs, integration of testing and analysis into business processes, social networking data.

Ajay- Do you think techniques like RFM and LTV are adequately utilized by organizations? How can they be propagated further?

David- Organizations vary amazingly in how sophisticated or unsophisticated they are in analytics. A key factor in success as a consultant is to understand where each client is on this continuum and how well that serves their needs.
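For readers unfamiliar with RFM, here is a toy scoring sketch (mine, not David's): customers are scored on Recency, Frequency and Monetary value. Fixed thresholds are used here for brevity; real implementations typically use quintile splits.

```python
# Toy RFM (Recency, Frequency, Monetary) scoring sketch with
# made-up customers and thresholds: each dimension is scored 1-3.
from datetime import date

customers = {
    "A": {"last_order": date(2011, 2, 1), "orders": 12, "spend": 900.0},
    "B": {"last_order": date(2010, 6, 15), "orders": 2, "spend": 80.0},
}

def rfm_score(c, today=date(2011, 3, 1)):
    days = (today - c["last_order"]).days
    r = 3 if days <= 60 else (2 if days <= 180 else 1)       # recency
    f = 3 if c["orders"] >= 10 else (2 if c["orders"] >= 3 else 1)  # frequency
    m = 3 if c["spend"] >= 500 else (2 if c["spend"] >= 100 else 1)  # monetary
    return r, f, m

print(rfm_score(customers["A"]))  # (3, 3, 3): recent, frequent, high spend
print(rfm_score(customers["B"]))  # (1, 1, 1): lapsed, rare, low spend
```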

Ajay- What are the various software packages you have worked with in this field? Name your favorite per category.

David- I started out using COBOL (that dates me!), then concentrated on SAS for many years. More recently R is my favorite because of its coverage, currency and programming model, and its debugging capabilities.

    Ajay- Independent consulting can be a strenuous job. What do you do to unwind?

    David- Cycling, yoga, meditation, hiking and guitar.

    Biography-

    David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting.

    David Katz has been in the forefront of applying statistical models and database technology to marketing problems since 1980. He holds a Master’s Degree in Mathematics from the University of California, Berkeley. He is one of the founders of Abacus Direct Marketing and was previously the Director of Database Development for Williams-Sonoma.

    He is the founder and President of David Katz Consulting, specializing in sophisticated statistical services for a variety of applications, with a special focus on the Direct Marketing Industry. David Katz has an extensive background that includes experience in all aspects of direct marketing from data mining, to strategy, to test design and implementation. In addition, he consults on a variety of data mining and statistical applications from public health to collections analysis. He has partnered with consulting firms such as Ernst and Young, Prediction Impact, and most recently on this project with Dataspora.

For more on David's session at Predictive Analytics World, San Francisco, see http://www.predictiveanalyticsworld.com/sanfrancisco/2011/agenda.php#day2-16a

    Room: Salon 5 & 6
    4:45pm – 5:05pm

    Track 2: Social Data and Telecom 
    Case Study: Major North American Telecom
    Social Networking Data for Churn Analysis

    A North American Telecom found that it had a window into social contacts – who has been calling whom on its network. This data proved to be predictive of churn. Using SQL, and GAM in R, we explored how to use this data to improve the identification of likely churners. We will present many dimensions of the lessons learned on this engagement.
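The engagement used SQL and GAM in R; as a purely hypothetical illustration of the feature-extraction step, the sketch below (Python, with made-up names and data) turns who-called-whom records into per-subscriber social features, such as the number of distinct contacts and how many of those contacts have already churned:

```python
# Hypothetical sketch (not the engagement's actual code): derive
# social-network features for churn modelling from call records.
# Having contacts who already churned is often predictive of churn.
from collections import defaultdict

calls = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
         ("dave", "alice"), ("dave", "bob")]
churned = {"carol"}  # subscribers known to have left

contacts = defaultdict(set)
for caller, callee in calls:      # treat the call graph as undirected
    contacts[caller].add(callee)
    contacts[callee].add(caller)

features = {
    sub: {"degree": len(peers),                    # distinct contacts
          "churned_contacts": len(peers & churned)}  # contacts who left
    for sub, peers in contacts.items()
}
print(features["alice"])  # {'degree': 3, 'churned_contacts': 1}
```

Features like these would then feed a model (a GAM, in the case study) alongside usage and billing variables.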

    Speaker: David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting

    Exhibit Hours
Monday, March 14th: 10:00am to 7:30pm

Tuesday, March 15th: 9:45am to 4:30pm

    Revolution R Enterprise 4.2

Revo R gets more and more yum yum. Version 4.2 adds the following new features:

    • Direct import of SAS data sets into the native, efficient XDF file format
    • Direct import of fixed-format text data files into XDF file format
• New commands to read subsets of rows and variables from XDF files in memory
    • Many enhancements to the R Productivity Environment (RPE) for Windows
    • Expanded and updated user documentation
    • Added support on Linux for the big-data statistics package RevoScaleR
    • Added support on Windows for Web Services integration of predictive analytics with RevoDeployR.

Revolution R Enterprise 4.2 is available immediately for 64-bit Red Hat Enterprise Linux systems and both 32-bit and 64-bit Windows systems. Pricing starts at $1,000 per single-user workstation.

And it's free for academic licenses, so come on guys, it is worth at least one download and test.

    http://www.revolutionanalytics.com/downloads/free-academic.php

     

    SAS to R Challenge: Unique benchmarking

(Image: Flag of the Town of Cary, via Wikipedia)

An interesting announcement from Revolution Analytics promises to convert your legacy SAS language code not only more cheaply but faster. It's a very, very interesting challenge, and I wonder how SAS users, corporates and customers, as well as the Institute itself, will react.

    http://www.revolutionanalytics.com/sas-challenge/

    Take the SAS to R Challenge

    Are you paying for expensive software licenses and hardware to run time-consuming statistical analyses on big data sets?

    If you’re doing linear regressions, logistic regressions, predictions, or multivariate crosstabulations* there’s something you should know: Revolution Analytics can get the same results for a substantially lower cost and faster than SAS®.

    For a limited time only, Revolution Analytics invites you take the SAS to R Challenge. Let us prove that we can deliver on our promise of replicating your results in R, faster and cheaper than SAS.

    Take the challenge

    Here’s how it works:

    Fill out the short form below, and one of our conversion experts will contact you to discuss the SAS code you want to convert. If we think Revolution R Enterprise can get the same results faster than SAS, we’ll convert your code to R free of charge. Our goal is to demonstrate that Revolution R Enterprise will produce the same results in less time. There’s no obligation, but if you choose to convert, we guarantee that your license cost for Revolution R Enterprise will be less than half what you’re currently paying for the equivalent SAS software.**

    It’s that simple.

    We’ll show you that you don’t need expensive hardware and software to do high quality statistical analysis of big data. And we’ll show that you don’t need to tie up your computing resources with long running operations. With Revolution R Enterprise, you can run analyses on commodity hardware using Linux or Windows, scale to terabyte-class data problems and do it at processing speeds you would never have thought possible.

    Sign up now, and we will be in touch shortly.

    Take the challenge

     

    —————————-

    SAS is a registered trademark of the SAS Institute, Cary, NC, in the US and other countries.

    *Additional statistical algorithms are being rapidly added to Revolution R Enterprise. Custom development services are also available.

    **Revolution Analytics retains the right to determine eligibility for this offer. Offer available until March 31, 2011.

    R Commander Plugins-20 and growing!

(Image: the first graphical user interface, 1973, via Wikipedia)
    R Commander Extensions: Enhancing a Statistical Graphical User Interface by extending menus to statistical packages

    R Commander ( see paper by Prof J Fox at http://www.jstatsoft.org/v14/i09/paper ) is a well known and established graphical user interface to the R analytical environment.
While the original GUI was created for a basic statistics course, the enabling of extensions (or plug-ins, see http://www.r-project.org/doc/Rnews/Rnews_2007-3.pdf) has greatly enhanced the possible use and scope of this software. Here we give a list of all known R Commander plugins and their uses, along with brief comments.

    1. DoE – http://cran.r-project.org/web/packages/RcmdrPlugin.DoE/RcmdrPlugin.DoE.pdf
    2. doex
    3. EHESampling
    4. epack- http://cran.r-project.org/web/packages/RcmdrPlugin.epack/RcmdrPlugin.epack.pdf
    5. Export- http://cran.r-project.org/web/packages/RcmdrPlugin.Export/RcmdrPlugin.Export.pdf
    6. FactoMineR
    7. HH
    8. IPSUR
    9. MAc- http://cran.r-project.org/web/packages/RcmdrPlugin.MAc/RcmdrPlugin.MAc.pdf
    10. MAd
    11. orloca
    12. PT
    13. qcc- http://cran.r-project.org/web/packages/RcmdrPlugin.qcc/RcmdrPlugin.qcc.pdf and http://cran.r-project.org/web/packages/qcc/qcc.pdf
    14. qual
    15. SensoMineR
    16. SLC
    17. sos
    18. survival-http://cran.r-project.org/web/packages/RcmdrPlugin.survival/RcmdrPlugin.survival.pdf
    19. SurvivalT
    20. Teaching Demos

Note the naming convention for the above plugins: each package name is always the prefix “RcmdrPlugin.” followed by the name listed above.
Also, a plugin must already be installed locally to be visible in R Commander’s load-plugin list, and R Commander loads the plugin after restarting. Hence it is advisable to load all R Commander plugins at the beginning of the analysis session.

However, the notable plugins are:
    1) DoE for Design of Experiments-
    Full factorial designs, orthogonal main effects designs, regular and non-regular 2-level fractional
    factorial designs, central composite and Box-Behnken designs, latin hypercube samples, and simple D-optimal designs can currently be generated from the GUI. Extensions to cover further latin hypercube designs as well as more advanced D-optimal designs (with blocking) are planned for the future.
    2) Survival- This package provides an R Commander plug-in for the survival package, with dialogs for Cox models, parametric survival regression models, estimation of survival curves, and testing for differences in survival curves, along with data-management facilities and a variety of tests, diagnostics and graphs.
    3) qcc -GUI for  Shewhart quality control charts for continuous, attribute and count data. Cusum and EWMA charts. Operating characteristic curves. Process capability analysis. Pareto chart and cause-and-effect chart. Multivariate control charts
4) epack- an Rcmdr plug-in based on the time series functions. It also depends on packages like tseries, abind, MASS, xts and forecast. It covers log-exceptions GARCH
and the following models: ARIMA, GARCH, HoltWinters.
5) Export- The package helps users to graphically export Rcmdr output to LaTeX or HTML code,
    via xtable() or Hmisc::latex(). The plug-in was originally intended to facilitate exporting Rcmdr
    output to formats other than ASCII text and to provide R novices with an easy-to-use,
    easy-to-access reference on exporting R objects to formats suited for printed output. The
    package documentation contains several pointers on creating reports, either by using
    conventional word processors or LaTeX/LyX.
    6) MAc- This is an R-Commander plug-in for the MAc package (Meta-Analysis with
    Correlations). This package enables the user to conduct a meta-analysis in a menu-driven,
    graphical user interface environment (e.g., SPSS), while having the full statistical capabilities of
    R and the MAc package. The MAc package itself contains a variety of useful functions for
    conducting a research synthesis with correlational data. One of the unique features of the MAc
    package is in its integration of user-friendly functions to complete the majority of statistical steps
    involved in a meta-analysis with correlations. It uses recommended procedures as described in
    The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).

A help query for ??RcmdrPlugin reveals the following information, which can be quite overwhelming given that almost 20 plugins are now available:

RcmdrPlugin.DoE::DoEGlossary - Glossary for DoE terminology as used in RcmdrPlugin.DoE
RcmdrPlugin.DoE::Menu.linearModelDesign - RcmdrPlugin.DoE Linear Model Dialog for experimental data
RcmdrPlugin.DoE::Menu.rsm - RcmdrPlugin.DoE response surface model Dialog for experimental data
RcmdrPlugin.DoE::RcmdrPlugin.DoE-package - R-Commander plugin package that implements design of experiments facilities from packages DoE.base, FrF2 and DoE.wrapper into the R-Commander
RcmdrPlugin.DoE::RcmdrPlugin.DoEUndocumentedFunctions - Functions used in menus
RcmdrPlugin.doex::ranblockAnova - Internal RcmdrPlugin.doex objects
RcmdrPlugin.doex::RcmdrPlugin.doex-package - Install the DOEX Rcmdr Plug-In
RcmdrPlugin.EHESsampling::OpenSampling1 - Internal functions for menu system of RcmdrPlugin.EHESsampling
RcmdrPlugin.EHESsampling::RcmdrPlugin.EHESsampling-package - Help with EHES sampling
RcmdrPlugin.Export::RcmdrPlugin.Export-package - Graphically export objects to LaTeX or HTML
RcmdrPlugin.FactoMineR::defmacro - Internal RcmdrPlugin.FactoMineR objects
RcmdrPlugin.FactoMineR::RcmdrPlugin.FactoMineR - Graphical User Interface for FactoMineR
RcmdrPlugin.IPSUR::IPSUR-package - An IPSUR Plugin for the R Commander
RcmdrPlugin.MAc::RcmdrPlugin.MAc-package - Meta-Analysis with Correlations (MAc) Rcmdr Plug-in
RcmdrPlugin.MAd::RcmdrPlugin.MAd-package - Meta-Analysis with Mean Differences (MAd) Rcmdr Plug-in
RcmdrPlugin.orloca::activeDataSetLocaP - RcmdrPlugin.orloca: A GUI for orloca-package (internal functions)
RcmdrPlugin.orloca::RcmdrPlugin.orloca-package - RcmdrPlugin.orloca: A GUI for orloca-package
RcmdrPlugin.orloca::RcmdrPlugin.orloca.es - RcmdrPlugin.orloca.es: a graphical interface for the orloca package (in Spanish)
RcmdrPlugin.qcc::RcmdrPlugin.qcc-package - Install the Demos Rcmdr Plug-In
RcmdrPlugin.qual::xbara - Internal RcmdrPlugin.qual objects
RcmdrPlugin.qual::RcmdrPlugin.qual-package - Install the quality Rcmdr Plug-In
RcmdrPlugin.SensoMineR::defmacro - Internal RcmdrPlugin.SensoMineR objects
RcmdrPlugin.SensoMineR::RcmdrPlugin.SensoMineR - Graphical User Interface for SensoMineR
RcmdrPlugin.SLC::Rcmdr.help.RcmdrPlugin.SLC - RcmdrPlugin.SLC: A GUI for slc-package (internal functions)
RcmdrPlugin.SLC::RcmdrPlugin.SLC-package - RcmdrPlugin.SLC: A GUI for SLC R package
RcmdrPlugin.sos::RcmdrPlugin.sos-package - Efficiently search R Help pages
RcmdrPlugin.steepness::Rcmdr.help.RcmdrPlugin.steepness - RcmdrPlugin.steepness: A GUI for steepness-package (internal functions)
RcmdrPlugin.steepness::RcmdrPlugin.steepness - RcmdrPlugin.steepness: A GUI for steepness R package
RcmdrPlugin.survival::allVarsClusters - Internal RcmdrPlugin.survival Objects
RcmdrPlugin.survival::RcmdrPlugin.survival-package - Rcmdr Plug-In Package for the survival Package
RcmdrPlugin.TeachingDemos::RcmdrPlugin.TeachingDemos-package - Install the Demos Rcmdr Plug-In