Jill Dyche on 2012

In part 3 of the series of predictions for 2012, here is Jill Dyché of Baseline Consulting/DataFlux.

Part 2 was Timo Elliott of SAP (http://www.decisionstats.com/timo-elliott-on-2012/), and Part 1 was Jim Kobielus of Forrester (http://www.decisionstats.com/jim-kobielus-on-2012/).

Ajay: What are the top trends you saw happening in 2011?

Well, I hate to say I saw them coming, but I did. A lot of managers committed some pretty predictable mistakes in 2011. Here are a few we witnessed live and up close:

1. In the spirit of “size matters,” data warehouse teams continued to trumpet the volumes of data stored on their enterprise data warehouses. But a peek under the covers of these warehouses reveals that the data isn’t integrated. Essentially, what they have is a variety of heterogeneous virtual data marts co-located on a single server. Neat. Big. Maybe even worthy of a magazine article about how many petabytes you’ve got. But it’s not efficient, and hardly the example of data standardization and re-use that everyone expects from analytical platforms these days.

2. Development teams still didn’t factor data integration and provisioning into their project plans in 2011. So we saw multiple projects spawn duplicate efforts around data profiling, cleansing, and standardization, not to mention conflicting policies and business rules for the same information. Bummer, since IT managers should know better by now. The problem is that no one owns the problem. Which brings me to the next mistake…

3. No one’s accountable for data governance. Yeah, there’s a council. And they meet. And they talk. Sometimes there’s lunch. And then nothing happens, because no one’s really rewarded (or penalized, for that matter) for data quality improvements or new policies. And so the reports spewing from the data mart are still fraught with errors, and no one trusts the resulting decisions.

But all is not lost, since we’re already seeing some encouraging signs in 2012. And yes, I’d classify some of them as bona fide trends.

Ajay: What are some of those trends?

Job descriptions for data stewards, data architects, Chief Data Officers, and other information-enabling roles are becoming crisper, and the KPIs for these roles are becoming more specific. Data management organizations are being divorced from specific lines of business and from IT, becoming specialty organizations (okay, COEs if you must) in their own right. The value proposition for master data management now includes not just the reconciliation of heterogeneous data elements but also support for key business strategies. And C-level executives are holding the data people accountable for improving speed to market and driving down costs, not just for delivering cleaner data. In short, data is becoming a business enabler. Which, I have to say editorially, is better late than never!

Ajay: Anything surprise you, Jill?

I have to say that President Obama’s mention of data management in his State of the Union speech was an unexpected but pretty powerful endorsement of the importance of information in both the private and public sectors.

I’m also sort of surprised that data governance isn’t being driven more often by the need for internal and external privacy policies. Our clients are constantly asking us how to tightly couple privacy policies into their applications and data sources. The need to protect PCI data and other highly sensitive data elements has made executives twitchy. But they’re still not linking that need to data governance.

I should also mention that I’ve been impressed by the people who call me after having their “aha!” moment and realizing that data transcends analytic systems. It’s operational, it’s pervasive, and it’s dynamic. I figured this epiphany would happen in a few years, once data quality tools became a commodity (they’re far from it). But it’s happening now. And that’s good for all types of businesses.

About-

Jill Dyché has written three books and numerous articles on the business value of information technology. She advises clients and executive teams on leveraging technology and information to enable strategic business initiatives. Last year her company Baseline Consulting was acquired by DataFlux Corporation, where she is currently Vice President of Thought Leadership. Find her blog posts on www.dataroundtable.com.

Interview Scott Gidley CTO and Founder, DataFlux

Here is an interview with Scott Gidley, CTO and co-founder of leading data quality company DataFlux. DataFlux is part of SAS Institute, and in 2011 it acquired Baseline Consulting in addition to launching the latest version of its Master Data Management product.

Analytics 2011 Conference

From http://www.sas.com/events/analytics/us/

The Analytics 2011 Conference Series combines the power of SAS’s M2010 Data Mining Conference and F2010 Business Forecasting Conference into one conference covering the latest trends and techniques in the field of analytics. Analytics 2011 Conference Series brings the brightest minds in the field of analytics together with hundreds of analytics practitioners. Join us as these leading conferences change names and locations. At Analytics 2011, you’ll learn through a series of case studies, technical presentations and hands-on training. If you are in the field of analytics, this is one conference you can’t afford to miss.

Conference Details

October 24-25, 2011
Grande Lakes Resort
Orlando, FL

Analytics 2011 topic areas include:

Augustus: a PMML model producer, consumer and scoring engine

I just checked out this new software for making PMML models. It is called Augustus and was created by the Open Data Group (http://opendatagroup.com/), which is headed by Robert Grossman, who was the first proponent of using R on Amazon EC2.

Perhaps a company like Zementis (http://adapasupport.zementis.com/) could use this to further test, enhance or benchmark on EC2. Zementis recently held a joint webinar with Revolution Analytics.

https://code.google.com/p/augustus/

Recent News

  • Augustus v 0.4.3.1 has been released
  • Added a guide (pdf) for including Augustus in the Windows System Properties.
  • Updated the install documentation.
  • Augustus 2010.II (Summer) release is available. This is v 0.4.2.0.
  • Added performance discussion concerning the optional cyclic garbage collection.

See Recent News for more details and all recent news.

Augustus

Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.

There is also a version for use with PMML 3 models. It is able to produce and consume models with tens of thousands of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree and Naive-Bayes models.

Augustus is written in Python and is freely available under the GNU General Public License, version 2.

See the page “Which version is right for me” for more details regarding the different versions.

PMML

Predictive Model Markup Language (PMML) is an XML markup language for describing statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters that define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems which create statistical and data mining models (“PMML Producers”) can easily interoperate with systems which deploy PMML models for scoring or other operational purposes (“PMML Consumers”).

Change Detection using Augustus

For information regarding using Augustus with Change Detection and Health and Status Monitoring, please see change-detection.

Open Data

Open Data Group provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.

It has staff in Chicago and San Francisco and clients throughout the U.S. Open Data Group began operations in 2002.


Overview

The example contains plots, generated in R, of scoring results from Augustus. Each point on a graph represents one use of the scoring engine, and each chart aggregates multiple Augustus runs. A Baseline (Change Detection) model was used to score data with multiple segments.
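The idea of scoring segmented data against a Baseline (Change Detection) model can be illustrated with a minimal Python sketch. This is not Augustus code: it assumes a simple z-value test (similar in spirit to one of the tests a PMML Baseline model can describe), and the segment keys, sample values and alert threshold of 3.0 are hypothetical. Each segment gets its own baseline, and each new observation is scored only against the baseline of its own segment.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical training data: (segment key, observed value) pairs.
TRAINING = [
    ("store_a", 10.0), ("store_a", 12.0), ("store_a", 11.0), ("store_a", 9.0),
    ("store_b", 100.0), ("store_b", 104.0), ("store_b", 96.0), ("store_b", 100.0),
]

def fit_baselines(rows):
    """Return {segment: (mean, population std dev)} computed per segment."""
    by_segment = defaultdict(list)
    for segment, value in rows:
        by_segment[segment].append(value)
    return {seg: (mean(vals), pstdev(vals)) for seg, vals in by_segment.items()}

def z_score(baselines, segment, value):
    """Score one observation against the baseline of its own segment."""
    mu, sigma = baselines[segment]
    if sigma == 0:
        # A constant baseline: any deviation at all counts as a change.
        return math.inf if value != mu else 0.0
    return (value - mu) / sigma

def flag_changes(baselines, observations, threshold=3.0):
    """Yield (segment, value, z, alert), where alert marks |z| >= threshold."""
    for segment, value in observations:
        z = z_score(baselines, segment, value)
        yield segment, value, z, abs(z) >= threshold

if __name__ == "__main__":
    baselines = fit_baselines(TRAINING)
    for row in flag_changes(baselines, [("store_a", 11.5), ("store_b", 130.0)]):
        print(row)
```

Keeping one baseline per segment is what lets a value of 11.5 be normal for one segment while 130 is flagged for another whose typical level is around 100.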

Typical Use

Augustus is typically used to construct models and score data with models. Augustus includes a dedicated application for creating, or producing, predictive models rendered as PMML-compliant files. Scoring is accomplished by consuming PMML-compliant files describing an appropriate model. Augustus provides a dedicated application for scoring data with four classes of models: Baseline (Change Detection) models, Tree models, Regression models and Naive Bayes models. The typical model development and use cycle with Augustus is as follows:

  1. Identify suitable data with which to construct a new model.
  2. Provide a model schema which prescribes the requirements for the model.
  3. Run the Augustus producer to obtain a new model.
  4. Run the Augustus consumer on new data to effect scoring.
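The four-step cycle above can be illustrated with a small, self-contained Python sketch. It does not use Augustus or its configuration files. Instead, it assumes a one-variable linear regression as the model, fits it by ordinary least squares (steps 1 and 3), writes it out as a minimal PMML RegressionModel document (the producer side), and then parses that document to score new records (the consumer side, step 4). The field names `x` and `y`, the sample data and the PMML 4.0 namespace URI are illustrative assumptions, and step 2 (the model schema) is reduced to the fixed choice of input and target fields.

```python
import xml.etree.ElementTree as ET

NS = "http://www.dmg.org/PMML-4_0"  # assumed PMML 4.0 namespace URI

def produce(xs, ys, input_field="x", target_field="y"):
    """Producer: fit y = a + b*x by ordinary least squares and return PMML text."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    pmml = ET.Element("PMML", {"xmlns": NS, "version": "4.0"})
    ET.SubElement(pmml, "Header", {"description": "toy linear regression"})
    dd = ET.SubElement(pmml, "DataDictionary", {"numberOfFields": "2"})
    for name in (input_field, target_field):
        ET.SubElement(dd, "DataField", {"name": name, "optype": "continuous", "dataType": "double"})
    model = ET.SubElement(pmml, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    ET.SubElement(schema, "MiningField", {"name": input_field})
    ET.SubElement(schema, "MiningField", {"name": target_field, "usageType": "predicted"})
    table = ET.SubElement(model, "RegressionTable", {"intercept": repr(a)})
    ET.SubElement(table, "NumericPredictor",
                  {"name": input_field, "exponent": "1", "coefficient": repr(b)})
    return ET.tostring(pmml, encoding="unicode")

def consume(pmml_text, records):
    """Consumer: parse the PMML regression table and score each record (a dict)."""
    root = ET.fromstring(pmml_text)
    ns = {"p": NS}
    table = root.find("p:RegressionModel/p:RegressionTable", ns)
    intercept = float(table.get("intercept"))
    predictors = [(p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
                  for p in table.findall("p:NumericPredictor", ns)]
    # Prediction = intercept + sum of coefficient * value ** exponent.
    return [intercept + sum(c * rec[name] ** e for name, c, e in predictors)
            for rec in records]

if __name__ == "__main__":
    # Steps 1-3: suitable data, then run the producer to obtain a model.
    doc = produce([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
    # Step 4: run the consumer on new data.
    print(consume(doc, [{"x": 5.0}, {"x": 0.5}]))
```

Because the model travels only as PMML text, the consumer needs no knowledge of how the producer fitted it, which is the interoperability point PMML is designed around.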

Separate consumer and producer applications are supplied for Baseline (Change Detection) models, Tree models, Regression models and Naive Bayes models. The producer and consumer applications are configured with XML-formatted files; the specification of these configuration files and of the model schema is given in the Augustus documentation. The consumers provide some configurability of the output, but users will often add post-processing to render the output according to their needs. A variety of mechanisms exist for transmitting data, but users may need to provide their own preprocessing to accommodate their particular data source.

In addition to the producer and consumer applications, Augustus is structured around, and provided with, libraries relevant to the development and use of predictive models. Broadly speaking, these consist of components that address the use of PMML and components that are specific to Augustus.

Post Processing

Augustus can accommodate a post-processing step. While not necessary, it is often useful to:

  • Re-normalize the scoring results or perform an additional transformation.
  • Supplement the results with global metadata such as timestamps.
  • Format the results.
  • Select certain interesting values from the results.
  • Restructure the data for use with other applications.
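As an illustration of the post-processing steps listed above, here is a small Python sketch. It is not Augustus's own output handling: the input format (a list of dicts with hypothetical `segment` and `score` keys), the min-max re-normalization, the 0.8 cut-off and the CSV output are all assumptions chosen for the example.

```python
import csv
import io
from datetime import datetime, timezone

def post_process(results, cutoff=0.8, run_time=None):
    """Re-normalize scores, add a timestamp, keep high scores and emit CSV text."""
    run_time = run_time or datetime.now(timezone.utc)
    scores = [r["score"] for r in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    rows = []
    for r in results:
        norm = (r["score"] - lo) / span  # re-normalize to the range [0, 1]
        if norm >= cutoff:  # select only the "interesting" results
            rows.append({
                "segment": r["segment"],
                "normalized_score": round(norm, 3),
                "timestamp": run_time.isoformat(),  # global metadata
            })
    # Restructure and format the selection as CSV for use by other applications.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["segment", "normalized_score", "timestamp"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    raw = [{"segment": "a", "score": 2.0},
           {"segment": "b", "score": 10.0},
           {"segment": "c", "score": 9.0}]
    print(post_process(raw))
```

Passing `run_time` explicitly makes a whole batch share one timestamp, which also keeps the output reproducible when testing.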

How to balance your online advertising and your offline conscience

I recently found an interesting example of a website that both makes a lot of money and is much more efficient than any free service or non-profit. It is called Ecosia.

If you want to see a website that balances administrative costs with a transparent way to make the world better, this is a great example.

http://ecosia.org/how.php

HOW IT WORKS

  • You search with Ecosia.
  • Perhaps you click on an interesting sponsored link.
  • The sponsoring company pays Bing or Yahoo for the click.
  • Bing or Yahoo gives the bigger chunk of that money to Ecosia.
  • Ecosia donates at least 80% of this income to support WWF’s work in the Amazon.
  • If you like what we’re doing, help us spread the word!

Key facts about the park:

  • World’s largest tropical forest reserve (38,867 square kilometers, or about the size of Switzerland)
  • Home to about 14% of all amphibian species and roughly 54% of all bird species in the Amazon, not to mention large populations of at least eight threatened species, including the jaguar
  • Includes part of the Guiana Shield, which contains 25% of the world’s remaining tropical rainforests, 80 to 90% of which are still pristine
  • Holds the last major unpolluted water reserves in the Neotropics, containing approximately 20% of all of the Earth’s fresh water
  • One of the last tropical regions on Earth largely unaltered by humans
  • Significant contributor to climatic regulation via heat absorption and carbon storage

http://ecosia.org/statistics.php

They claim to have donated 141,529.42 EUR!!!

http://static.ecosia.org/files/donations.pdf

Well, suppose you are the web admin of a very popular website, like Wikipedia.

One way to meet server costs is to say openly, “Hey, I need to balance my costs, so I need some money.”

The other way is to use online advertising.

I started mine with Google AdSense.

Cost per mille (CPM, the price per thousand impressions) gives you a very low return compared to contacting an ad sponsor directly.

But it’s a great data experiment, as you can monitor:

  • which companies are likely to be advertised on your site (assume Google knows more about its algorithms than you do)
  • which formats (banner, text or Flash) have what kind of conversion rates
  • what the expected pay-off rates are for various keywords or companies (for example, business intelligence software, predictive analytics software and statistical computing software are similar but have different expected returns, if you remember your economics class)

Now, based on the above data, you know the minimum baseline to expect from a private advertiser, compared with a public, crowd-sourced search engine network (like Google or Bing).

Let’s say you have 100,000 page views monthly, and assume one out of 1,000 page views leads to a click. Say the advertiser pays you $1 per click (which, at that click-through rate, is $1 per 1,000 impressions).

Then your expected revenue is $100. But if your clicks are priced at $2.50 each and your click-through rate rises to 3 out of 1,000 impressions (both very moderate increases that can be achieved by basic placement optimization of ad type, graphics, etc.), your new revenue is $750.

Being a good Samaritan, you decide to share some of this with your audience, say 4 Amazon books per month (or 1 free Amazon book per week). That costs you $200 and leaves you with some $550.

Wait! It doesn’t end there. Adam Smith’s invisible hand moves on.

You say, “Hmm, let me put $100 toward an annual paper-writing contest of $1,000, donate $200 to One Laptop per Child (or to the Amazon rainforest, or to Haiti, etc.), pay $100 for upgraded server hosting, and put $350 into online advertising, say $200 for search engines and $150 for Facebook.”
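The arithmetic in this example can be checked with a few lines of Python. The figures (100,000 monthly page views, click-through rates of 1 and 3 per 1,000, $1 and $2.50 per click, $200 for books and the five spending items) come straight from the text; the function and variable names are only illustrative.

```python
def monthly_revenue(page_views, clicks_per_1000, price_per_click):
    """Expected ad revenue = page views x click-through rate x price per click."""
    clicks = page_views * clicks_per_1000 / 1000
    return clicks * price_per_click

baseline = monthly_revenue(100_000, 1, 1.00)   # 100 clicks at $1
optimized = monthly_revenue(100_000, 3, 2.50)  # 300 clicks at $2.50
books = 200                                    # 4 Amazon books per month
remaining = optimized - books

# Spending items from the example, in dollars per month.
allocations = {
    "paper contest": 100,
    "donation": 200,
    "server hosting": 100,
    "search engine ads": 200,
    "Facebook ads": 150,
}
planned = sum(allocations.values())

print(f"baseline ${baseline:.0f}, optimized ${optimized:.0f}, after books ${remaining:.0f}")
print(f"planned spending ${planned}, gap vs. remaining ${planned - remaining:.0f}")
```

Running it shows that the planned items add up to $750, the full monthly revenue, which is $200 more than the $550 left after buying the books.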

Whoa!

In month 1 you should see more people visiting you for the first time. If you have a good return rate (returning visitors as a percentage of all visitors) and a low bounce rate (visits shorter than 5 seconds), your traffic should see at least a 20% jump in new arrivals and a 5-10% increase in long-term arrivals. Ignoring bounces, within three months you will have one of the following:

1) an interesting case study on online and social media advertising statistics, tangible motivations for increasing community response, and some good data for study
2) hopefully, better cost management of your server expenses
3) very hopefully, a positive cash flow

You could even set a percentage and share the monthly (or, better, annual) figures with your readers and advertisers.

Go ahead, change the world!

The key paradigms here are:

  • sharing your traffic and revenue openly with everyone
  • donating to a suitable cause
  • helping increase awareness of that cause
  • basing allocations on fixed percentages rather than absolute numbers, to ensure your site and cause are sustained for years
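The last point, fixed percentages rather than absolute numbers, can be sketched in Python as follows. The shares are hypothetical (the text does not give any); the point is that every item scales with whatever the month's revenue turns out to be, so the plan never commits more money than came in.

```python
# Hypothetical shares of monthly ad revenue; they must add up to 100%.
SHARES = {
    "reader giveaways": 0.25,
    "donation to a cause": 0.25,
    "server hosting": 0.15,
    "advertising": 0.35,
}

def allocate(revenue, shares=SHARES):
    """Split revenue by fixed percentages so spending always matches income."""
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"shares must sum to 1.0, got {total}")
    return {item: round(revenue * share, 2) for item, share in shares.items()}

if __name__ == "__main__":
    for revenue in (100.0, 750.0):  # a slow month and a good month
        print(revenue, allocate(revenue))
```

In a slow month every item shrinks proportionally instead of pushing the site into the red.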

Short Interview Jill Dyche

Here is a brief, one-question interview with Jill Dyché, co-founder of Baseline Consulting.

In 2010, it was more about consciousness-raising in the executive suite: getting C-level managers to understand the ongoing value proposition of BI, why MDM isn’t their father’s database, and how data governance can pay for itself over time. Some companies succeeded with these consciousness-raising efforts. Some didn’t.

But three big trends in 2011 would be:

1. Predictive analytics in the cloud. The technology is now ready, and so is the market, including SMB companies.
2. Enterprise search being baked into (commoditized) BI software tools. (The proliferation of static reports is SO 2006!)
3. Data governance will begin paying dividends. Until now it was all about common policies for data. In 2011, it will be about ROI.

I do a “Predictions for the coming year” article every January for TDWI.

Note: Jill’s January TDWI article seems worth waiting for.

About-

Source: http://www.baseline-consulting.com/pages/page.asp?page_id=49125

Partner and Co-Founder

Jill Dyché is a partner and co-founder of Baseline Consulting. She is responsible for key client strategies and market analysis in the areas of data governance, business intelligence, master data management, and customer relationship management.

Jill counsels boards of directors on the strategic importance of their information investments.

Author

Jill is the author of three books on the business value of IT. Jill’s first book, e-Data (Addison Wesley, 2000), has been published in eight languages. She is a contributor to Impossible Data Warehouse Situations: Solutions from the Experts (Addison Wesley, 2002), and her book The CRM Handbook (Addison Wesley, 2002) is the bestseller on the topic.

Jill’s work has been featured in major publications such as Computerworld, Information Week, CIO Magazine, the Wall Street Journal, the Chicago Tribune and Newsweek.com. Jill’s latest book, Customer Data Integration (John Wiley and Sons, 2006), was co-authored with Baseline partner Evan Levy and shows the business breakthroughs achieved with integrated customer data.

Industry Expert

Jill is a featured speaker at industry conferences, university programs, and vendor events. She serves as a judge for several IT best practice awards. She is a member of the Society of Information Management and Women in Technology, a faculty member of TDWI, and serves as a co-chair for the MDM Insight conference. Jill is a columnist for DM Review, and a blogger for BeyeNETWORK and Baseline Consulting.

Cisco SocialMiner

A new product from Cisco to mine social media for analytics on sentiment:

http://www.cisco.com/en/US/products/ps11349/index.html

Cisco SocialMiner is a social media customer care solution that can help you proactively respond to customers and prospects communicating through public social media networks like Twitter, Facebook, or other public forums or blogging sites. By providing social media monitoring, queuing, and workflow to organize customer posts on social media networks and deliver them to your social media customer care team, your company can respond to customers in real time using the same social network they are using.

Cisco SocialMiner provides:

• The ability to configure multiple campaigns to search for customer postings on the public social web about your company’s products, services, or area of expertise
• Filtering of social contacts based on preconfigured campaign filters to focus campaign searches
• Routing of social contacts to skilled customer care representatives in the contact center or to experts in the enterprise; multiple people can work together to handle responses to customer postings through shared work queues
• Detailed metrics for social media customer care activities, campaign reports, and team reports

With Cisco SocialMiner, your company can listen and respond to customer conversations originating in the social web. Being proactive can help your company enhance its service, improve customer loyalty, garner new customers, and protect your brand.
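The general idea behind campaigns, filters, tags and shared work queues can be sketched in a few lines of Python. This is not Cisco SocialMiner's API or data model: the campaign names, keywords, tags and queue names are hypothetical. Each post is matched against every campaign's keyword filter, tagged accordingly, and appended to the work queue of the team that handles that tag.

```python
import re
from collections import defaultdict, deque

# Hypothetical campaigns: campaign name -> (keywords to match, tag to apply).
CAMPAIGNS = {
    "router_support": ({"router", "outage", "broken"}, "customer_support"),
    "pricing_questions": ({"price", "quote", "discount"}, "sales"),
}

# Hypothetical mapping from tag to the shared work queue of the team handling it.
TAG_TO_QUEUE = {"customer_support": "support_team", "sales": "sales_team"}

def tag_post(text):
    """Return the tag of every campaign whose keyword filter matches the post."""
    words = set(re.findall(r"[a-z0-9_']+", text.lower()))
    return [tag for keywords, tag in CAMPAIGNS.values() if words & keywords]

def route(posts):
    """Append each post to the queue for each of its tags; untagged posts are dropped."""
    queues = defaultdict(deque)
    for post in posts:
        for tag in tag_post(post):
            queues[TAG_TO_QUEUE[tag]].append(post)
    return queues

if __name__ == "__main__":
    posts = [
        "My router is broken again!",
        "Any discount on the new price list?",
        "Great weather today",
    ]
    for queue, items in route(posts).items():
        # Agents would take work from the front of a queue with items.popleft().
        print(queue, list(items))
```

Using a FIFO deque per queue means several agents can share one queue and handle posts in the order they arrived.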

Table 1. Features and Benefits of Cisco SocialMiner 8.5

Product Baseline Features

Social media feeds
• Feeds are configurable sources to capture public social contacts that contain specific words, terms, or phrases.
• Feeds enable you to search for information on the public social web about your company’s products, services, or area of expertise.
• Cisco SocialMiner supports the following types of feeds:
  • Facebook
  • Twitter

Campaign management
• Groups feeds into campaigns to organize all posting activity related to a product category or business objective
• Produces metrics on campaign activity
• Provides the ability to configure multiple campaigns to search for customer postings on specific products or services
• Groups social contacts for handling by the social media customer care team
• Enables filtering of social contacts based on preconfigured campaign filters to focus campaign searches

Route and queue social contacts
• Enables routing of social contacts to skilled customer care representatives in the contact center
• Draws on expertise in the enterprise by allowing multiple people in the enterprise to work together to handle responses to customer postings through shared work queues
• Enables automated distribution of work to improve efficiency and effectiveness of social media engagement

Tagging
• Allows work to be routed to the appropriate team by grouping each post or social contact into different categories; for example, a post can be marked with the “customer_support” tag, and it will then appear on a customer support agent’s queue for processing

Social media customer care metrics
• Provides detailed metrics on social media customer care activities, campaign reports, and team reports
• Measures work and results
• Manages to service-level goals
• Supports brand management
• Optimizes staffing
• Includes dashboarding of social media posting activity when Cisco Unified Intelligence Center is used

Reporting for social contacts
• Provides a reporting database that can be accessed using any reporting tool, including Cisco Unified Intelligence Center
• Enables customer care management to accurately report on and track social media interactions by the contact center

OpenSocial-compliant gadgets and Representational State Transfer (REST) application programming interfaces (APIs)
• Provides flexible user interface options
• Enables extensive opportunities for customization

Optional integration with full suite of Cisco Collaboration tools
• Allows you to take advantage of the full suite of Cisco Collaboration tools, including Cisco Quad, Cisco Show and Share, and Cisco Pulse technology, to help your social media customer care team quickly find answers to help customers efficiently and effectively
• Easy to maintain with existing IT personnel

Operating Environment

Cisco Unified Computing System (UCS) C-Series or B-Series Servers
• Requires a Cisco UCS C-Series or B-Series Server.
• Server consolidation means lower cost per server with Cisco UCS Servers.

Architecture

Scalability
• One server supports up to 30 simultaneous social media customer care users and 10,000 social contacts per hour.

Management

Cisco Unified Real-Time Monitoring Tool (RTMT)
• Operational management is enhanced through integration with the Cisco Unified RTMT, providing consistent application monitoring across Cisco Unified Communications Solutions.

Simple Network Management Protocol (SNMP)
• SNMP with an associated MIB is supported through the Cisco Voice Operating System (VOS).

Reporting

Cisco Unified Intelligence Center
• Create customizable reports of social media customer care events using Cisco Unified Intelligence Center (purchased separately).
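The scalability figure in Table 1 (one server for up to 30 simultaneous users and 10,000 social contacts per hour) lends itself to a quick sizing estimate in Python. The sketch assumes, which the table does not state, that capacity adds up linearly across servers and that whichever limit is reached first determines the server count; the example workload is hypothetical.

```python
import math

USERS_PER_SERVER = 30         # simultaneous care users per server (Table 1)
CONTACTS_PER_SERVER = 10_000  # social contacts per hour per server (Table 1)

def servers_needed(users, contacts_per_hour):
    """Estimate servers from whichever per-server limit binds (assumes linear scaling)."""
    by_users = math.ceil(users / USERS_PER_SERVER)
    by_contacts = math.ceil(contacts_per_hour / CONTACTS_PER_SERVER)
    return max(1, by_users, by_contacts)

if __name__ == "__main__":
    # Hypothetical workload: 45 agents handling 12,000 contacts per hour.
    print(servers_needed(45, 12_000))
```

Taking the maximum of the two estimates reflects that a deployment has to satisfy both the user limit and the contact-rate limit at once.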