
PMML 4.0

There are some nice changes in PMML 4.0. PMML (Predictive Model Markup Language) is the XML-based standard for representing data mining models. To quote the DMG (Data Mining Group) itself:

PMML uses XML to represent mining models. The structure of the models is described by an XML Schema. One or more mining models can be contained in a PMML document. A PMML document is an XML document with a root element of type PMML. The general structure of a PMML document is:

  <?xml version="1.0"?>
  <PMML version="4.0"
    xmlns="http://www.dmg.org/PMML-4_0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >

    <Header copyright="Example.com"/>
    <DataDictionary> ... </DataDictionary>

    ... a model ...

  </PMML>
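
To make that skeleton concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. It builds the root PMML element with a Header and an empty DataDictionary in the 4.0 namespace shown above, then parses the result back and checks the root. The helper names build_skeleton and check_root are my own and hypothetical; the placeholder for the model itself is left out, just as in the quoted structure.

```python
import xml.etree.ElementTree as ET

# Namespace of PMML 4.0 documents, taken from the general structure above.
PMML_NS = "http://www.dmg.org/PMML-4_0"


def build_skeleton(copyright_text: str) -> str:
    """Build a minimal PMML 4.0 shell: root, Header and an empty DataDictionary."""
    ET.register_namespace("", PMML_NS)  # serialize as a default xmlns, no ns0: prefix
    root = ET.Element(f"{{{PMML_NS}}}PMML", {"version": "4.0"})
    ET.SubElement(root, f"{{{PMML_NS}}}Header", {"copyright": copyright_text})
    ET.SubElement(root, f"{{{PMML_NS}}}DataDictionary")
    # ... a model element would be appended to root here ...
    return ET.tostring(root, encoding="unicode")


def check_root(document: str) -> str:
    """Parse a document and return its PMML version; fail if the root is not PMML."""
    root = ET.fromstring(document)
    if root.tag != f"{{{PMML_NS}}}PMML":
        raise ValueError(f"unexpected root element: {root.tag}")
    return root.get("version")


doc = build_skeleton("Example.com")
print(check_root(doc))  # prints 4.0
```

A real document would also need DataField entries and a model element before any scoring engine would accept it.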

So what is new in version 4.0? Here are the changes, section by section, including some powerful new modeling features. For anyone with some XML knowledge, PMML is the way to go.

PMML 4.0 – Changes from PMML 3.2

Associations

  • Itemset and AssociationRule elements are no longer enclosed within a “Choice” element
  • Added different scoring procedures: recommendation, exclusiveRecommendation and ruleAssociation with explanation and example
  • Changed version to “4.0” from “3.2” in the example(s)
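
The three new scoring procedures differ only in which rules are allowed to fire for a given basket of items. The sketch below is my reading of them, not the spec's text. Every procedure requires the rule's antecedent to be in the basket. exclusiveRecommendation additionally requires that the consequent is not already fully in the basket. ruleAssociation requires the whole rule, consequent included, to be present. The Rule class and matching_rules function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    confidence: float


def matching_rules(rules, basket, algorithm):
    """Return the rules that fire for `basket` under one of the three procedures."""
    items = frozenset(basket)
    fired = []
    for rule in rules:
        if not rule.antecedent <= items:
            continue  # all three procedures need the antecedent in the basket
        if algorithm == "recommendation":
            fired.append(rule)
        elif algorithm == "exclusiveRecommendation":
            # assumption: skip rules whose consequent the basket already contains
            if not rule.consequent <= items:
                fired.append(rule)
        elif algorithm == "ruleAssociation":
            # the basket must contain the whole rule
            if rule.consequent <= items:
                fired.append(rule)
        else:
            raise ValueError(f"unknown algorithm: {algorithm}")
    return fired


rules = [
    Rule(frozenset({"bread"}), frozenset({"butter"}), 0.8),
    Rule(frozenset({"bread"}), frozenset({"milk"}), 0.6),
]
basket = {"bread", "butter"}
```

With this basket, recommendation fires both rules, exclusiveRecommendation only suggests milk, and ruleAssociation only matches the bread-to-butter rule the customer already satisfied.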

BuiltinFunctions

Added the following functions:
  • isMissing
  • isNotMissing
  • equal
  • notEqual
  • lessThan
  • lessOrEqual
  • greaterThan
  • greaterOrEqual
  • isIn
  • isNotIn
  • and
  • or
  • not
  • if
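
Most of the new built-ins are plain comparisons and boolean logic. The interesting part is how they treat missing values. Here is a minimal Python sketch that maps each name to a function, with None standing in for a missing value. Two behaviors are my assumptions, not checked against the spec: comparisons return missing when either argument is missing, and if returns missing when the condition is false and no else value is given. Missing handling for isIn, and, or and not is not modeled. The BUILTINS table and evaluate helper are hypothetical.

```python
import operator

MISSING = None  # stand-in for a PMML missing value in this sketch


def _compare(op):
    """Wrap a binary comparison so a missing argument gives a missing result (assumed)."""
    def fn(a, b):
        if a is MISSING or b is MISSING:
            return MISSING
        return op(a, b)
    return fn


BUILTINS = {
    "isMissing": lambda x: x is MISSING,
    "isNotMissing": lambda x: x is not MISSING,
    "equal": _compare(operator.eq),
    "notEqual": _compare(operator.ne),
    "lessThan": _compare(operator.lt),
    "lessOrEqual": _compare(operator.le),
    "greaterThan": _compare(operator.gt),
    "greaterOrEqual": _compare(operator.ge),
    "isIn": lambda x, *values: x in values,
    "isNotIn": lambda x, *values: x not in values,
    "and": lambda *args: all(args),
    "or": lambda *args: any(args),
    "not": lambda x: not x,
    # Eager and simplified: a falsy (including missing) condition takes the else branch.
    "if": lambda cond, then, otherwise=MISSING: then if cond else otherwise,
}


def evaluate(name, *args):
    """Evaluate one built-in, roughly as an Apply element naming it would."""
    return BUILTINS[name](*args)


label = evaluate("if", evaluate("greaterThan", 42, 18), "adult", "minor")
```

Here label ends up as "adult". Nesting evaluate calls mirrors how Apply elements nest in a PMML transformation.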

ClusteringModel

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

Conformance

  • Changed all version references from “3.2” to “4.0”

DataDictionary

  • No changes

Functions

  • No changes

GeneralRegression

  • Changed to allow for Cox survival models and model ensembles
    • Add new model type: CoxRegression.
    • Allow an empty regression model when the model type is CoxRegression, so that a baseline-only model can be represented.
    • Add new optional model attributes: endTimeVariable, startTimeVariable, subjectIDVariable, statusVariable, baselineStrataVariable, modelDF.
    • Add an optional Matrix in Predictor to specify a contrast matrix, and an optional attribute referencePoint in Parameter.
    • Add new elements: BaseCumHazardTables, EventValues, BaselineStratum, BaselineCell.
    • Add examples of scoring for Cox Regression and contrast matrices.
    • Add new type of distribution: tweedie.
    • Add new attribute in model: targetReferenceCategory, so that the model can be used in MiningModel.
    • Changed version to “4.0” from “3.2” in the example(s)
    • Added reference to ModelExplanation element in the model XSD
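
For readers new to Cox models, here is what scoring one boils down to. This is the standard proportional-hazards formula, S(t | x) = exp(-H0(t) · exp(Σ βj (xj - refj))), as a small Python sketch. The times and hazards lists play the role of the BaselineCell entries in BaseCumHazardTables, and reference plays the role of each Parameter's referencePoint. That mapping, and the step-function lookup (H0 is 0 before the first tabulated time), are my assumptions. Strata and the spec's exact edge-case rules are not handled, and the function names are hypothetical.

```python
import bisect
import math


def baseline_cum_hazard(times, hazards, t):
    """Step-function lookup: H0 at the largest tabulated time <= t, 0 before the first."""
    i = bisect.bisect_right(times, t)
    return 0.0 if i == 0 else hazards[i - 1]


def cox_survival(x, beta, reference, times, hazards, t):
    """Survival probability S(t | x) under a Cox proportional-hazards model."""
    # Linear predictor centered on the reference point of each parameter.
    linear = sum(b * (xi - ri) for b, xi, ri in zip(beta, x, reference))
    cum_hazard = baseline_cum_hazard(times, hazards, t) * math.exp(linear)
    return math.exp(-cum_hazard)


times = [1.0, 2.0, 5.0]      # hypothetical baseline table: event times ...
hazards = [0.1, 0.25, 0.6]   # ... and cumulative baseline hazard at each time
```

For a subject sitting exactly at the reference point the linear term is zero, so cox_survival([1.0], [0.5], [1.0], times, hazards, 3.0) is simply exp(-0.25). Before the first event time the survival probability is 1.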

GeneralStructure

Header

  • No changes

Interoperability

  • Changed: “As a result, a new approach for interoperability was required and is being introduced in PMML version 3.2.” to “As a result, a new approach for interoperability was introduced in PMML version 3.2.”

MiningSchema

  • Added frequencyWeight and analysisWeight as new options for usageType. They will not affect scoring, but will make model information more complete.
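
Because the weight fields only document how the model was built, a scorer can simply ignore them. Here is a minimal sketch that reads a MiningSchema fragment and keeps the fields a simple regression- or tree-style model would score on. For simplicity it keeps only fields whose usageType is active, which is the default when the attribute is absent. The field names are hypothetical, and the PMML namespace is left out of the fragment for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical MiningSchema fragment (namespace omitted for brevity).
MINING_SCHEMA = """
<MiningSchema>
  <MiningField name="age" usageType="active"/>
  <MiningField name="income"/>
  <MiningField name="churn" usageType="predicted"/>
  <MiningField name="count" usageType="frequencyWeight"/>
  <MiningField name="w" usageType="analysisWeight"/>
</MiningSchema>
"""


def scoring_inputs(schema_xml):
    """Return the names of active fields; weights and the target are skipped."""
    root = ET.fromstring(schema_xml.strip())
    inputs = []
    for field in root.iter("MiningField"):
        usage = field.get("usageType", "active")  # "active" when the attribute is absent
        if usage == "active":
            inputs.append(field.get("name"))
    return inputs
```

Calling scoring_inputs(MINING_SCHEMA) gives ["age", "income"]. The count and w weight fields drop out, which matches the claim that they do not affect scoring.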

ModelComposition

  • No longer used; replaced by MultipleModels

ModelExplanation

  • New addition to PMML 4.0 that contains information to explain the models, model fit statistics, and visualization information.

ModelVerification

  • No changes

MultipleModels

  • Replaces ModelComposition. Important additions are segmentation and ensembles.
  • Added reference to ModelExplanation element in the model XSD
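
Segmentation and ensembles come down to two questions: which segments apply to a record, and how their outputs are combined. In the sketch below, each segment is a (predicate, model, weight) triple. The method names selectFirst, majorityVote, average and weightedAverage follow the multiple-model methods as I remember them from the PMML MiningModel segmentation. Treat those names, the triple layout and the combine function as assumptions rather than the spec's exact API.

```python
from collections import Counter
from statistics import mean


def combine(segments, record, method):
    """Score the segments whose predicate holds for `record` and combine their outputs."""
    matching = [(model, weight) for predicate, model, weight in segments if predicate(record)]
    if not matching:
        return None  # no segment applies to this record
    if method == "selectFirst":
        # Segmentation: only the first matching segment's model is used.
        model, _ = matching[0]
        return model(record)
    outputs = [(weight, model(record)) for model, weight in matching]
    if method == "majorityVote":
        return Counter(pred for _, pred in outputs).most_common(1)[0][0]
    if method == "average":
        return mean(pred for _, pred in outputs)
    if method == "weightedAverage":
        total = sum(w for w, _ in outputs)
        return sum(w * p for w, p in outputs) / total
    raise ValueError(f"unsupported method: {method}")


segments = [
    (lambda r: r["age"] < 30, lambda r: 1.0, 1.0),  # young-customer segment
    (lambda r: True, lambda r: 3.0, 3.0),           # catch-all segment
]
```

For a 25-year-old, selectFirst returns 1.0 from the young-customer model, average returns 2.0, and weightedAverage returns 2.5. For a 40-year-old only the catch-all segment applies.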

NaïveBayes

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

NeuralNetwork

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

Output

  • Extended the output type to include association rule models. The changes add a number of new attributes: “ruleFeature”, “algorithm”, “rank”, “rankBasis”, “rankOrder” and “isMultiValued”. A new enumeration value “ruleValue” is added to RESULT-FEATURE.
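
The rank, rankBasis and rankOrder attributes let an output field say "give me the n-th best matching rule". A minimal sketch of that selection: sort the fired rules by the chosen statistic, then pick position rank (1-based). Using confidence and support as bases and "descending"/"ascending" as order values is my assumption about how these attributes are meant to be used. ScoredRule and nth_rule are hypothetical names.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredRule:
    consequent: str
    confidence: float
    support: float


def nth_rule(rules, rank=1, rank_basis="confidence", rank_order="descending"):
    """Pick the rank-th rule (1-based) after sorting by the rank_basis statistic."""
    ordered = sorted(rules, key=operator.attrgetter(rank_basis),
                     reverse=(rank_order == "descending"))
    return ordered[rank - 1] if 0 < rank <= len(ordered) else None


rules = [
    ScoredRule("milk", confidence=0.6, support=0.2),
    ScoredRule("butter", confidence=0.8, support=0.1),
    ScoredRule("jam", confidence=0.7, support=0.3),
]
```

With these rules, the top rule by confidence recommends butter, rank 2 gives jam, and switching the basis to support puts jam first. Asking for a rank beyond the number of fired rules yields no value.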

Regression

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

RuleSet

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

Sequence

  • Changed version to “4.0” from “3.2” in the example(s)

Statistics

  • Accommodate weighted counts by replacing INT-ARRAY with NUM-ARRAY in DiscrStats and ContStats
  • Change xs:nonNegativeInteger to xs:double in several places
  • Add a new boolean attribute ‘weighted’ to the UnivariateStats and PartitionFieldStats elements
  • Add a new attribute cardinality in Counts
  • Some very long lines in this document are now wrapped.
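
Why the switch from integer to numeric arrays? Once records carry weights, a category's "count" is a sum of weights and is generally not a whole number. A tiny illustration with made-up data (the weighted_counts helper is hypothetical):

```python
from collections import defaultdict


def weighted_counts(values, weights):
    """Category counts where each record contributes its (possibly fractional) weight."""
    counts = defaultdict(float)
    for value, weight in zip(values, weights):
        counts[value] += weight
    return dict(counts)


counts = weighted_counts(["a", "b", "a"], [0.5, 2.0, 1.25])
print(counts)  # {'a': 1.75, 'b': 2.0}
```

A count of 1.75 can be stored in a NUM-ARRAY or an xs:double, but not in an INT-ARRAY or an xs:nonNegativeInteger.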

SupportVectorMachine

  • Added optional attribute threshold
  • Added optional attribute classificationMethod
  • Attribute alternateTargetCategory removed from SupportVectorMachineModel element and moved to SupportVectorMachine element
  • Changed the example slightly
  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD
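
To show what threshold and classificationMethod control, here is a sketch of one-against-one classification. Each pairwise machine produces a decision value, the threshold turns that value into one of its two categories, and the category with the most votes wins. Two details are my reading, not verified against the spec: a value below the threshold maps to the machine's target category and anything else to its alternate category, and the default threshold is 0. Ties are broken arbitrarily here. The function names are hypothetical.

```python
from collections import Counter


def binary_svm_label(decision_value, target_category, alternate_category, threshold=0.0):
    """Assumed convention: below the threshold -> target, otherwise -> alternate."""
    return target_category if decision_value < threshold else alternate_category


def one_against_one(decisions, threshold=0.0):
    """decisions: (decision_value, target_category, alternate_category) per pairwise machine."""
    votes = Counter(binary_svm_label(v, t, a, threshold) for v, t, a in decisions)
    return votes.most_common(1)[0][0]


decisions = [(-0.7, "A", "B"), (-0.3, "A", "C"), (0.4, "B", "C")]
```

Here the three machines vote A, A and C, so one_against_one(decisions) returns "A". Raising the threshold to 0.5 flips only the last machine's vote, to B.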

Targets

  • No changes

Taxonomy

  • Changed: “A TableLocator may contain any description which helps an application to locate a certain table. PMML 3.2 does not yet define the content. PMML users have to use their own extensions. The same applies to InlineTable.” to “A TableLocator may contain any description which helps an application to locate a certain table. PMML standard does not yet define the content. PMML users have to use their own extensions. The same applies to InlineTable.”

Text

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

TimeSeriesModel

  • New addition to PMML 4.0 to support time series models
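
As I understand it, the 4.0 TimeSeriesModel is built around exponential smoothing. The simplest variant (no trend, no seasonality) fits in a few lines. The sketch below does not read any PMML. It only shows the recurrence: level = α·y + (1 - α)·previous level, with flat forecasts from the final level. The function name is hypothetical.

```python
def simple_exponential_smoothing(series, alpha, horizon=1):
    """Return `horizon` flat forecasts from the final smoothed level."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


forecast = simple_exponential_smoothing([10, 12, 14], alpha=0.5, horizon=2)
print(forecast)  # [12.5, 12.5]
```

Trend and seasonal variants add further smoothed components on top of this level, but the update pattern is the same.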

Transformations

  • No changes

TreeModel

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

Sources

http://www.dmg.org/v4-0/GeneralStructure.html

http://www.dmg.org/v4-0/Changes.html

And here are some companies already using PMML:

http://www.dmg.org/products.html

I found the tool at http://www.dmg.org/coverage/ much more interesting though (see screenshot).

Zementis, which we have covered in our interviews, has played a stellar role in bringing together this common standard for data mining. Note that the Kxen model is also highlighted there.

The best PMML converter tutorial is here:

http://www.zementis.com/videos/PMML_Converter_iGoogle_gadget_2_demo.htm

