Here is a new-old system in open source for
for building and scoring statistical models designed to work with data sets that are too large to fit into memory.
http://code.google.com/p/augustus/
Augustus is an open source software toolkit for building and scoring statistical models. It is written in Python and its
most distinctive features are:
• Ability to be used on sets of big data; these are data sets that exceed either memory capacity or disk capacity, so
that existing solutions like R or SAS cannot be used. Augustus is also perfectly capable of handling problems
that can fit on one computer.
• PMML compliance and the ability to both:
– produce models with PMML-compliant formats (saved with extension .pmml).
– consume models from files with the PMML format.
Augustus has been tested and deployed on serveral operating systems. It is intended for developers who work in the
financial or insurance industry, information technology, or in the science and research communities.
Usage
Augustus produces and consumes Baseline, Cluster, Tree, and Ruleset models. Currently, it uses an event-based
approach to building Tree, Cluster and Ruleset models that is non-standard.
New to PMML ?
Read on http://code.google.com/p/augustus/wiki/PMML
The Predictive Model Markup Language or PMML is a vendor driven XML markup language for specifying statistical and data mining models. In other words, it is an XML language so that Continue reading “PMML Augustus”






