There are some nice changes in PMML 4.0. PMML (Predictive Model Markup Language) is the XML-based standard for representing data mining models, or, quoting the DMG (Data Mining Group) itself:
PMML uses XML to represent mining models. The structure of the models is described by an XML Schema. One or more mining models can be contained in a PMML document. A PMML document is an XML document with a root element of type PMML. The general structure of a PMML document is:
<?xml version="1.0"?>
<PMML version="4.0"
      xmlns="http://www.dmg.org/PMML-4_0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Header copyright="Example.com"/>
  <DataDictionary> ... </DataDictionary>
  ... a model ...
</PMML>
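To make that structure concrete, here is a minimal sketch that reads such a document with Python's standard library. The DataField named "x" is made up to fill the elided DataDictionary, and the model itself is left out, so treat this as an illustration of the skeleton rather than a valid scoring document.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_0"

# The skeleton quoted above, with one hypothetical DataField and no model.
DOC = """<?xml version="1.0"?>
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Header copyright="Example.com"/>
  <DataDictionary>
    <DataField name="x" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""


def summarize(pmml_text):
    """Return the root tag, version, copyright and field names of a PMML document."""
    root = ET.fromstring(pmml_text)
    ns = {"p": PMML_NS}
    header = root.find("p:Header", ns)
    fields = [f.get("name") for f in root.findall("p:DataDictionary/p:DataField", ns)]
    return {
        "root": root.tag,  # ElementTree writes namespaced tags as {namespace}name
        "version": root.get("version"),
        "copyright": header.get("copyright"),
        "fields": fields,
    }


summary = summarize(DOC)
print(summary)
```

Because the document declares a default namespace, every lookup has to be namespace-qualified, which is why the "p" prefix mapping is passed to find and findall.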
So what is new in version 4.0? Here are some powerful modeling changes. For anyone with any XML knowledge, PMML is the way to go.
PMML 4.0 – Changes from PMML 3.2
Associations
- Itemset and AssociationRule elements are no longer enclosed within a “Choice” element
- Added different scoring procedures: recommendation, exclusiveRecommendation and ruleAssociation with explanation and example
- Changed version to “4.0” from “3.2” in the example(s)
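The three scoring procedures differ in which rules they select for a given basket. The sketch below is my reading of the selection criteria: recommendation needs the antecedent to be in the basket, exclusiveRecommendation additionally needs the consequent not to be fully in it, and ruleAssociation needs both. The function name, the tuple representation of rules and the sample items are assumptions for illustration, and the ranking of the selected rules is left out.

```python
def matching_rules(rules, basket, procedure):
    """Select (antecedent, consequent) rules that fire for a basket of items."""
    basket = set(basket)
    hits = []
    for antecedent, consequent in rules:
        a, c = set(antecedent), set(consequent)
        if procedure == "recommendation":
            ok = a <= basket
        elif procedure == "exclusiveRecommendation":
            ok = (a <= basket) and not (c <= basket)
        elif procedure == "ruleAssociation":
            ok = (a <= basket) and (c <= basket)
        else:
            raise ValueError("unknown scoring procedure: " + procedure)
        if ok:
            hits.append((antecedent, consequent))
    return hits


# Hypothetical rules and basket, only to exercise the three procedures.
RULES = [
    (("beer",), ("chips",)),
    (("beer", "chips"), ("salsa",)),
    (("wine",), ("cheese",)),
]
BASKET = {"beer", "chips"}

for name in ("recommendation", "exclusiveRecommendation", "ruleAssociation"):
    print(name, matching_rules(RULES, BASKET, name))
```

With this basket, exclusiveRecommendation keeps only the rule that suggests something new (salsa), while ruleAssociation keeps only the rule the basket already fully satisfies.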
BuiltinFunctions
Added the following functions:
- isMissing
- isNotMissing
- equal
- notEqual
- lessThan
- lessOrEqual
- greaterThan
- greaterOrEqual
- isIn
- isNotIn
- and
- or
- not
- if
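These functions are called from Apply elements, with FieldRef and Constant elements as arguments. Below is a rough sketch of how an evaluator for them might look. Two things are my own simplifications and not taken from the spec: a missing value is modeled as None and makes any function other than isMissing, isNotMissing and if return None, and constants are parsed as numbers when possible. The "age" field and the labels are made up, and the snippet omits the PMML namespace to keep the tag names short.

```python
import xml.etree.ElementTree as ET

FUNCTIONS = {
    "equal": lambda a, b: a == b,
    "notEqual": lambda a, b: a != b,
    "lessThan": lambda a, b: a < b,
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
    "greaterOrEqual": lambda a, b: a >= b,
    "isIn": lambda value, *members: value in members,
    "isNotIn": lambda value, *members: value not in members,
    "and": lambda *args: all(args),
    "or": lambda *args: any(args),
    "not": lambda a: not a,
}


def evaluate(node, record):
    """Evaluate a FieldRef, Constant or Apply element against a dict of field values."""
    if node.tag == "FieldRef":
        return record.get(node.get("field"))  # None stands for a missing value
    if node.tag == "Constant":
        try:
            return float(node.text)
        except ValueError:
            return node.text
    if node.tag != "Apply":
        raise ValueError("unsupported element: " + node.tag)
    name = node.get("function")
    args = [evaluate(child, record) for child in node]  # eager, for simplicity
    if name == "isMissing":
        return args[0] is None
    if name == "isNotMissing":
        return args[0] is not None
    if name == "if":
        if args[0] is None:
            return None
        if args[0]:
            return args[1]
        return args[2] if len(args) > 2 else None
    if any(arg is None for arg in args):
        return None  # assumed missing-value propagation
    return FUNCTIONS[name](*args)


# if(isMissing(age), "unknown", if(lessThan(age, 18), "minor", "adult"))
EXPR = ET.fromstring("""<Apply function="if">
  <Apply function="isMissing"><FieldRef field="age"/></Apply>
  <Constant>unknown</Constant>
  <Apply function="if">
    <Apply function="lessThan"><FieldRef field="age"/><Constant>18</Constant></Apply>
    <Constant>minor</Constant>
    <Constant>adult</Constant>
  </Apply>
</Apply>""")


def age_group(age):
    return evaluate(EXPR, {"age": age})


print(age_group(None), age_group(12), age_group(40))
```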
ClusteringModel
- Changed version to “4.0” from “3.2” in the example(s)
- Added reference to ModelExplanation element in the model XSD
Conformance
- Changed all version references from “3.2” to “4.0”
DataDictionary
- No changes
Functions
- No changes
GeneralRegression
- Changed to allow for Cox survival models and model ensembles
- Add new model type: CoxRegression.
- Allow an empty regression model when the model type is CoxRegression, so that a baseline-only model can be represented.
- Add new optional model attributes: endTimeVariable, startTimeVariable, subjectIDVariable, statusVariable, baselineStrataVariable, modelDF.
- Add optional Matrix in Predictor to specify a contrast matrix, optional attribute referencePoint in Parameter.
- Add new elements: BaseCumHazardTables, EventValues, BaselineStratum, BaselineCell.
- Add examples of scoring for Cox Regression and contrast matrices.
- Add new type of distribution: tweedie.
- Add new attribute in model: targetReferenceCategory, so that the model can be used in MiningModel.
- Changed version to “4.0” from “3.2” in the example(s)
- Added reference to ModelExplanation element in the model XSD
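For readers new to Cox models, the baseline cumulative hazard tables are what turn a linear predictor into a survival probability. The sketch below uses the textbook proportional hazards formula S(t|x) = exp(-H0(t) * exp(x·β)) with a step-function lookup of H0. It is not the scoring procedure from the PMML spec (which also involves reference points and strata), and the table, coefficient and field names are invented.

```python
import math
from bisect import bisect_right


def baseline_cum_hazard(table, t):
    """Step-function lookup: the value at the largest tabulated time <= t, 0 before the first."""
    times = [time for time, _ in table]
    i = bisect_right(times, t)
    return 0.0 if i == 0 else table[i - 1][1]


def survival(table, coefficients, x, t):
    """Textbook Cox survival probability for covariates x at time t."""
    r = sum(coefficients[name] * x[name] for name in coefficients)
    return math.exp(-baseline_cum_hazard(table, t) * math.exp(r))


# Hypothetical baseline table of (time, cumulative hazard) pairs, sorted by time.
TABLE = [(1.0, 0.1), (2.0, 0.25), (5.0, 0.6)]
BETA = {"age": 0.02}

print(survival(TABLE, BETA, {"age": 0.0}, 3.0))   # baseline subject
print(survival(TABLE, BETA, {"age": 10.0}, 3.0))  # higher risk, lower survival
```

An empty coefficient set gives r = 0, which is the baseline-only case mentioned in the list above.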
GeneralStructure
- Updated the XML namespace to “http://www.dmg.org/PMML-4_0” from “http://www.dmg.org/PMML-3_2”
- Added TimeSeriesModel to the PMML XSD
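Since the namespace changes with the version, a consumer can use it to tell 3.2 and 4.0 documents apart before doing anything else. A small sketch, assuming only the two namespace URIs mentioned above (the lookup table and function name are mine):

```python
import xml.etree.ElementTree as ET

KNOWN_NAMESPACES = {
    "http://www.dmg.org/PMML-3_2": "3.2",
    "http://www.dmg.org/PMML-4_0": "4.0",
}


def pmml_version_from_namespace(pmml_text):
    """Return the PMML version implied by the root namespace, or None if unknown."""
    root = ET.fromstring(pmml_text)
    if not root.tag.startswith("{"):
        return None  # no namespace declared on the root element
    namespace = root.tag[1:].split("}", 1)[0]
    return KNOWN_NAMESPACES.get(namespace)


OLD_DOC = '<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2"/>'
NEW_DOC = '<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0"/>'

print(pmml_version_from_namespace(OLD_DOC), pmml_version_from_namespace(NEW_DOC))
```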
Header
- No changes
Interoperability
- Changed: “As a result, a new approach for interoperability was required and is being introduced in PMML version 3.2.” to “As a result, a new approach for interoperability was introduced in PMML version 3.2.”
MiningSchema
- Added frequencyWeight and analysisWeight as new options for usageType. They will not affect scoring, but will make model information more complete.
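As an illustration, the sketch below groups the fields of a MiningSchema by usageType, so the weight fields can be separated from the fields a scorer actually reads. The field names are invented; a MiningField without a usageType attribute is treated as active, which is the schema default.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_0"

# Hypothetical schema; only the usageType values come from the changelog.
SCHEMA = """<MiningSchema xmlns="http://www.dmg.org/PMML-4_0">
  <MiningField name="age" usageType="active"/>
  <MiningField name="income"/>
  <MiningField name="churn" usageType="predicted"/>
  <MiningField name="n_customers" usageType="frequencyWeight"/>
  <MiningField name="survey_weight" usageType="analysisWeight"/>
</MiningSchema>"""


def fields_by_usage(schema_text):
    """Map each usageType to the list of field names declared with it."""
    root = ET.fromstring(schema_text)
    grouped = {}
    for field in root.findall("p:MiningField", {"p": PMML_NS}):
        usage = field.get("usageType", "active")
        grouped.setdefault(usage, []).append(field.get("name"))
    return grouped


usage = fields_by_usage(SCHEMA)
print(usage)
```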
ModelComposition
- No longer used; replaced by MultipleModels
ModelExplanation
- New addition to PMML 4.0 that contains information to explain the models, model fit statistics, and visualization information.
ModelVerification
- No changes
MultipleModels
- Replaces ModelComposition. Important additions are segmentation and ensembles.
- Added reference to ModelExplanation element in the model XSD
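The difference between the two ideas is easy to show in a few lines: segmentation picks one model based on the input record, while an ensemble runs several models and combines their outputs. The sketch below is generic Python, not the PMML element structure; the function names, the "region" field and the toy models are all hypothetical.

```python
from collections import Counter


def score_segmented(segments, record):
    """Segmentation: use the first model whose predicate accepts the record."""
    for predicate, model in segments:
        if predicate(record):
            return model(record)
    return None  # no segment matched


def score_ensemble_average(models, record):
    """Ensemble for numeric outputs: average the scores of all models."""
    scores = [model(record) for model in models]
    return sum(scores) / len(scores)


def score_ensemble_majority(models, record):
    """Ensemble for categorical outputs: return the most common prediction."""
    votes = Counter(model(record) for model in models)
    return votes.most_common(1)[0][0]


# Toy models standing in for real regression or classification models.
SEGMENTS = [
    (lambda r: r["region"] == "EU", lambda r: 1.0),
    (lambda r: True, lambda r: 2.0),  # catch-all segment
]
REGRESSORS = [lambda r: 1.0, lambda r: 2.0, lambda r: 3.0]
CLASSIFIERS = [lambda r: "yes", lambda r: "no", lambda r: "yes"]

print(score_segmented(SEGMENTS, {"region": "US"}))
print(score_ensemble_average(REGRESSORS, {}))
print(score_ensemble_majority(CLASSIFIERS, {}))
```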
NaïveBayes
- Changed version to “4.0” from “3.2” in the example(s)
- Added reference to ModelExplanation element in the model XSD
NeuralNetwork
- Changed version to “4.0” from “3.2” in the example(s)
- Added reference to ModelExplanation element in the model XSD
Output
- Extended output type to include Association rule models. The changes add a number of new attributes: “ruleFeature”, “algorithm”, “rank”, “rankBasis”, “rankOrder” and “isMultiValued”. A new enumeration type “ruleValue” is added to the RESULT-FEATURE
Regression
- Changed version to “4.0” from “3.2” in the example(s)
- Added reference to ModelExplanation element in the model XSD
RuleSet
- Changed version to “4.0” from “3.2” in the example(s)
- Added reference to ModelExplanation element in the model XSD
Sequence
- Changed version to “4.0” from “3.2” in the example(s)
Statistics
- Accommodate weighted counts by replacing INT-ARRAY with NUM-ARRAY in DiscrStats and ContStats
- Change xs:nonNegativeInteger to xs:double in several places
- Add new boolean attribute ‘weighted’ to UnivariateStats and PartitionFieldStats elements
- Add new attribute cardinality in Counts
- Also some very long lines in this document are now wrapped.
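The move from integer to numeric arrays makes sense once records carry weights: the "count" for a category becomes a sum of weights, which is generally not a whole number. A tiny sketch with made-up values and weights:

```python
from collections import defaultdict


def weighted_counts(values, weights=None):
    """Sum the weight of each distinct value; with no weights every record counts as 1."""
    if weights is None:
        weights = [1] * len(values)
    counts = defaultdict(float)
    for value, weight in zip(values, weights):
        counts[value] += weight
    return dict(counts)


VALUES = ["a", "b", "a"]
WEIGHTS = [0.5, 2.0, 1.25]

print(weighted_counts(VALUES))           # plain frequencies
print(weighted_counts(VALUES, WEIGHTS))  # fractional "counts"
```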
SupportVectorMachine
- Added optional attribute threshold
- Added optional attribute classificationMethod
- Attribute alternateTargetCategory removed from SupportVectorMachineModel element and moved to SupportVectorMachine element
- Changed the example slightly
- Changed version to “4.0” from “3.2” in the example(s)
- Added reference to ModelExplanation element in the model XSD
Targets
- No changes
Taxonomy
- Changed: “A TableLocator may contain any description which helps an application to locate a certain table. PMML 3.2 does not yet define the content. PMML users have to use their own extensions. The same applies to InlineTable.” to “A TableLocator may contain any description which helps an application to locate a certain table. PMML standard does not yet define the content. PMML users have to use their own extensions. The same applies to InlineTable.”
Text
- Changed version to “4.0” from “3.2” in the example(s)
- Added reference to ModelExplanation element in the model XSD
TimeSeriesModel
- New addition to PMML 4.0 to support Time series models
Transformations
- No changes
TreeModel
- Changed version to “4.0” from “3.2” in the example(s)
- Added reference to ModelExplanation element in the model XSD
And here are some companies already using PMML.
I found the tool at http://www.dmg.org/coverage/ much more interesting, though (see screenshot).
Zementis, whom we have covered in the interviews, has played a stellar role in bringing together this common standard for data mining. Note that the Kxen model is also highlighted there.
The best PMML converter tutorial is here:
http://www.zementis.com/videos/PMML_Converter_iGoogle_gadget_2_demo.htm