SAP and University Alliances

Great stuff on SAP’s University Network. I exchanged emails with them before turning them over to the departmental guys; they are really serious about expanding the pool of analysts.

More analytics and BI companies should do this. Buzz me at www.twitter.com/decisionstats if you would like to do it with U Tenn students; we are currently working on the world’s biggest supercomputer at the nearby Oak Ridge National Lab.


Advanced Analytics on Multi-Terabyte Datasets- Conferences

Some news on Data Mining 2009 by Aster Data –

SAS and Aster Data to Present “Advanced Analytics on Multi-Terabyte Datasets” at M2009 in Las Vegas – Oct. 26-27
Learn how the tight coupling of SQL and MapReduce provided by Aster Data creates new ‘big data’ analytics opportunities when combined with SAS. Aster Data will exhibit throughout the event.

And also a nice webcast by Curt Monash on the same Big Data topic:

Mastering MapReduce Webinar Series, Session 1
“Big Data Reality: The Role of MapReduce in Big Data Management and Analysis”- Oct. 15
Industry analyst Curt Monash explains the basics of MapReduce, key use cases, and which industries and applications are heavily using MapReduce. Topics include recommendations for integrating MapReduce in an enterprise business intelligence and data warehousing environment.
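
For readers new to the idea, the basic MapReduce pattern can be sketched in a few lines of plain Python. This is a single-machine toy, not Hadoop or Aster nCluster code, and the page-view records and function names are made up for illustration: the map step emits key/value pairs, a shuffle groups them by key, and the reduce step combines each group.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit one (site, 1) pair per page-view record."""
    site, _user = record
    yield site, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical page-view log: (site, user) tuples.
page_views = [
    ("news.example", "u1"),
    ("shop.example", "u2"),
    ("news.example", "u3"),
]
counts = map_reduce(page_views)
print(counts)
```

In a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data across the network, which is where the cost advantage on very large datasets is supposed to come from.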

Also,

Here is a brief synopsis of the Aster Data (http://www.facebook.com/pages/Aster-Data-Systems/5601042375) sponsored Big Data Summit (http://www.facebook.com/pages/Big-Data-Summit/143312171156), which I attended:

  • A Plan for Large Scale Data Analytics: How to Utilize Aster nCluster and Hadoop in a Symbiotic
    Relationship to Support Processing in Excess of 100 Billion Rows Per Month
    – Michael Brown and Will Duckworth
    (EVP, Software Engineering, comScore, Inc. and Director, Software Engineering, comScore, Inc.)

This talk covered the special needs of comScore in handling big data, and why MapReduce and Hadoop seem to be the cost-effective solutions for really big data while the RDBMS seems stuck in the middle with mid-sized data. It was also broadly informative on the statistical challenges of the future, given the explosion of data.

  • Making Sense of Hadoop – Its Fit With Data Warehouses – Colin White
    (President and Founder of BI Research)

Colin brought a nice perspective on open source Hadoop vis-à-vis the proprietary packages and the traditional DBMS. His view is that no software is perfect for all needs, that every product that sells has its own good points, and that the converging solution could be a heterogeneous mix of the above.

  • MapReduce Inside a Database System – When and How Case Studies from ShareThis, Specific Media, and Other – Tasso Argyros (Chief Technology Officer and Co-Founder of Aster Data)

This was a more detailed look at the big product launch (the Hadoop Connector) by Tasso, and an interesting look at time series analysis using nPath rather than SQL. It was interesting given the ongoing convergence of analytics and business intelligence.

Also, Tasso lived up to his presenting charm with an excellent pitch on nPath (as his interview said).

  • Large-Scale Analytics at LinkedIn – Jonathan Goldman
    (Former Principal Scientist at LinkedIn)

This was nice given Jonathan’s perspective (he has a PhD in physics), and he now does consulting for LinkedIn while maintaining his interests in education. He covered the special needs of social media websites, designing experiments on the fly with huge real-time datasets, and some interesting visualizations (for example, India and America have the second-biggest cross-country LinkedIn connections after USA-UK). Apparently LinkedIn (http://www.facebook.com/group.php?gid=2211231478) does not sound so good when translated into Chinese. (At dinner I learnt from a Chinese fellow student that China censors Facebook. Sigh!)

  • Networking Mixer: Beer, wine, hot hors d’oeuvres

I got interviewed (AFTER I had mixed some beer and wine for myself). It was the first video interview I have given. (You know, I have taken SOME interviews by email and plan to do some more while in Vegas for Data Mining 2009 with SAS: http://www.facebook.com/group.php?gid=2227381262 )

They are still editing that interview 😉

That was all. You need to send me a Facebook invite to see the rest of the NY trip, or better still, just join the Facebook page of Decision Stats at

http://www.facebook.com/pages/DecisionStats/191421035186

After two weeks I hope to have some more coverage of Data Mining 2009 while at the same time enjoying my much needed Fall Break. Life at the University of Tennessee is looking up (since we beat Georgia 45-19 🙂 )


Interview Michael Zeller, CEO Zementis on PMML

Here is a topic-specific interview with Michael Zeller of Zementis on PMML, the de facto standard for data mining.


Ajay- What is PMML?

Mike- The Predictive Model Markup Language (PMML) is the leading standard for statistical and data mining models and supported by all leading analytics vendors and organizations. With PMML, it is straightforward to develop a model on one system using one application and deploy the model on another system using another application. PMML reduces complexity and bridges the gap between development and production deployment of predictive analytics.

PMML is governed by the Data Mining Group (DMG), an independent, vendor-led consortium that develops data mining standards.
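
(An aside from me, not part of Mike’s answer.) As a rough illustration of the “develop in one tool, deploy in another” idea, here is a minimal Python sketch of the deployment side. The PMML document below is hand-written for this example: the field names age, income, and score and all the coefficients are made up, and the 4.4 namespace is just one version of the standard. The score function is a hypothetical helper that only understands a plain linear RegressionModel; a real scoring engine covers far more of the standard, so please read this as a sketch and not as how any vendor does it.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical PMML document for illustration only.
PMML_DOC = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="Hypothetical linear model for illustration"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.25"/>
      <NumericPredictor name="income" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Namespace map so the XPath-style lookups match the PMML 4.4 elements.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def score(pmml_text, record):
    """Score one record against a linear RegressionModel (sketch only)."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        name = predictor.get("name")
        coefficient = float(predictor.get("coefficient"))
        exponent = int(predictor.get("exponent", "1"))  # defaults to 1
        result += coefficient * record[name] ** exponent
    return result


prediction = score(PMML_DOC, {"age": 40.0, "income": 10.0})
print(prediction)  # 1.5 + 0.25*40 + 0.5*10 = 16.5
```

The point is that the scoring code never needs to know which tool produced the file; it only needs the model to follow the standard.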

Ajay- Why can PMML help any business?

Mike– PMML ensures business agility with respect to data mining, predictive analytics, and enterprise decision management. It provides one standard, one deployment process, across all applications, projects and business divisions. In this way, business stakeholders, analytic scientists, and IT are finally speaking the same language.

In the current global economic crisis more than ever, a company must become more efficient and optimize business processes to remain competitive. Predictive analytics is widely regarded as the next logical step, implementing more intelligent, real-time decisions across the enterprise.

However, the deployment of decisions based on predictive models and statistical algorithms has been a hurdle for many companies. Typically, it has been a complex, costly process to get such models integrated into operational systems. With the PMML standard, this no longer is the case. PMML simply eliminates the deployment complexity for predictive models.

A standard also provides choices among vendors, allowing us to implement best-of-breed solutions, and creating a common knowledge framework for internal teams – analytics, IT, and business – as well as external vendors and consultants. In general, having a solid standard is a sign of a mature analytics industry, creating more options for users and, most importantly, propelling the total analytics market to the next level.

Ajay- Can PMML help your existing software in analytics and BI?

Mike- PMML has been widely accepted among vendors; almost all major analytics and business intelligence vendors already support the standard. If you have any such software package in-house, you most likely have PMML at your disposal already.

For example, you can develop your models in any of the tools that support PMML, e.g., SPSS, SAS, Microstrategy, or IBM, and then deploy that model in ADAPA, which is the Zementis decision engine. Or you can even choose from various open source tools, like R and KNIME.


Ajay- How do Zementis, ADAPA, and PMML fit together?

Mike- Zementis has been an avid supporter of the PMML standard and is very active in the development of the standard. We contributed to the PMML package for the open source R Project. Furthermore, we created a free PMML Converter tool which helps users to validate and correct PMML files from various vendors and convert legacy PMML files to the latest version of the standard.

Most prominently with ADAPA, Zementis launched the first cloud-computing scoring engine on the Amazon EC2 cloud. ADAPA is a highly scalable deployment, integration and execution platform for PMML-based predictive models. Not only does it give you all the benefits of being fully standards-based, using PMML and web services, but it also leverages the cloud for scalability and cost-effectiveness.

By being a Software as a Service (SaaS) application on Amazon EC2, ADAPA provides extreme flexibility, from casual usage which only costs a few dollars a month all the way to high-volume mission critical enterprise decision management which users can seamlessly launch in the United States or in European data centers.

Ajay- What are some examples where PMML helped companies save money?

Mike- For any consulting company focused on developing predictive analytics models for clients, PMML provides tremendous benefits, both for clients and for the service provider. Standardizing on PMML defines a clear deliverable – a PMML model – which clients can deploy instantly. There are no fixed requirements on which specific tools to choose for development or deployment; it is only important that the model adheres to the PMML standard, which becomes the common interface between the business partners. This eliminates miscommunication and lowers the overall project cost. Another example is where a company has taken advantage of the capability to move models instantly from development to operational deployment. It allows them to quickly update models based on market conditions, say in the area of risk management and fraud detection, or to roll out new marketing campaigns.

Personally, I think the biggest opportunities are still ahead of us as more and more businesses embrace operational predictive analytics. The true value of PMML is to facilitate a real-time decision environment where we leverage predictive models in every business process, at every customer touch point and on-demand to maximize value.

Ajay- Where can I find more information about PMML?

Mike- First there is the Data Mining Group (DMG) web site at http://www.dmg.org

I strongly encourage any company that has a significant interest in predictive analytics to become a member and help drive the development of the standard.

We also created a knowledge base of PMML-related information at http://www.predictive-analytics.info and there is a PMML interest group on LinkedIn at http://www.linkedin.com/groupRegistration?gid=2328634

This group is more geared toward a general discussion forum for business benefits and end-user questions, and it is a great way to get started with PMML.

Last but not least, the Zementis web site at http://www.zementis.com

It contains various PMML example files, the PMML Converter tool, as well as links to PMML resource pages on the web.

For more on Michael Zeller and Zementis read his earlier interview at https://decisionstats.wordpress.com/2009/02/03/interview-michael-zeller-ceozementis-2/

Interview Ken O’Connor, Business Intelligence Consultant

Here is an interview with an industry veteran of Business Intelligence, Ken O’Connor.

Ajay- Describe your career journey across the full development cycle of Business Intelligence.

Ken- I started my career in the early 80’s in the airline industry, where I worked as an application programmer and later as a systems programmer. I took a computer science degree by night. The airline industry was one of the first to implement computer systems in the ‘60s, and the legacy of being an early adopter was that airline reservation systems were developed in Assembler. Remarkable as it sounds now, as application programmers, we wrote our own file access methods. Even more remarkable, as systems programmers, we modified the IBM supplied Operating System, originally known as the Airline Control Program (ACP), later renamed as Transaction Processing Facility (TPF). The late ‘80s saw the development of Global “Computer Reservations Systems” (CRS systems) including AMADEUS and GALILEO. I moved from Aer Lingus, a small Irish airline, to work in London on the British Airways systems, to enable the British Airways systems to share information and communicate with the new Global CRS systems.

I learnt very important lessons during those years.

* The criticality of standards

* The drive for interoperability of systems

* The drive towards information sharing

* The drive away from bespoke development

In the 90’s I returned to Dublin, where I worked as an independent consultant with IBM on many data intensive projects. On one project I was lead developer in the IBM Dublin Laboratory on the development of the Data Replication tool called “Data Propagator NonRelational”. This tool automatically propagates updates made on IMS databases to DB2 databases. On this project, we successfully piloted the use of the Cleanroom Development Method, as part of IBM’s drive towards Six Sigma quality.

In the past 15 years I have moved away from IT towards the business. I describe myself as a Hybrid. I believe there is a serious communications gap between business users and IT, and this is a frequent cause of project failures. I seek to bridge that gap. I ensure that requirements are clear, measurable, testable, and capable of being easily understood and signed off by business owners.

One of my favorite programmes was Euro Changeover. This was a hugely data intensive programme. It was the largest changeover undertaken by European Financial Institutions. I worked as an independent consultant with the IBM Euro Centre of Competence. I developed changeover strategies for a number of Irish Enterprises, and was the End to End IT changeover process owner in a major Irish bank. Every application and every data store holding currency sensitive data (not just amounts, but currency signs etc.) had to be converted at exactly the same time to ensure that all systems successfully switched to euro processing on 1st January 2002.

I learnt many, many lasting lessons about data the hard way on Euro Changeover programmes, such as:

* The extent to which seemingly separate applications share operational data – often without the knowledge of the owning application.

* The extent to which business users use (abuse) data fields to hold information never intended for the data field.

* The critical distinction between the underlying data (in a data store) and the information displayed to a business user.

I have worked primarily on what I call “End of food chain” projects and programmes, such as Single View of Customer, data migrations, and data population of repositories for BASEL II and Anti Money Laundering (AML) systems. Business Intelligence is another example of an “End of food chain” project. “End of food-chain” projects share the following characteristics:

* Dependent on existing data

* No control over the quality of existing data they depend on

* No control over the data entry processes by which the data they require is captured.

* The data required may have been captured many years previously.

Recently, I have shared my experience of “Enterprise wide data issues” in a series of posts on my blog, together with a process for assessing the status of those issues within an Enterprise (more details). In my experience, the success of a Business Intelligence programme and the ease with which an Enterprise completes “End of food chain” data dependent programmes directly depends on the status of the common Enterprise Wide data issues I have identified.

Ajay- Describe the educational scene for science graduates in Ireland. What steps do you think governments and universities can take to better teach science and keep young people excited about it?

Ken- I am not in a position to comment on the educational scene for science graduates in Ireland. However, I can say that currently there are insufficient numbers of school children studying science in primary and 2nd level education. There is a need to excite young people about science. There is a need for more interactive science museums, like W5 in Belfast which is hugely successful. Kids love to get involved, and practical science can be great fun.

Ajay- What are some of the key trends in business intelligence that you have seen?

Ken- Since the earliest days of my career, I have seen an ever increasing move towards standards based interoperability of systems, and interchange of data. This has accelerated dramatically in recent years. This is the good news. Further good news is the drive towards the use of external reference databases to verify the accuracy of data, at point of data entry (See blog post on Upstream prevention by Henrik Liliendahl Sørensen). One example of this drive is cloud based verification services from new companies like Ireland based Clavis Technology.

The harsh reality is that “Old hardware goes into museums, while old software goes into production every night”. Enterprises have invested vast amounts of money in legacy applications over decades. These legacy systems access legacy data in legacy data stores. This legacy data will continue to pose challenges in the delivery of Business Intelligence to the Business community that needs it. These challenges will continue to provide opportunities for Data Quality professionals.

Ajay- What is going to be the next fundamental change in this industry in your opinion?

Ken- The financial crisis will result in increased regulatory requirements. This will be good news for the Business Intelligence / Data Quality industry. In time, it will no longer be sufficient to provide the regulator with ‘just’ the information requested. The regulator will want to see the process by which the information was gathered, the process controls, and evidence of the quality of the underlying data from which the information was derived. This move will result in funding for Data Governance programmes, which will lead to increased innovation in our industry.

Ajay- Describe your startup Map My Business, your target customer and your vision for it.

Ken- I started MapMyBusiness.com as a “recession buster”. Ireland was hit particularly hard by the financial crisis. I had become over dependent on the financial services industry, and a blanket ban on the use of external consultants left me with no option but to reinvent myself. MapMyBusiness.com helps small businesses to attract clients, by getting them on Google page one. Having been burnt by an over dependence on one industry, my vision is to diversify. I believe that Data Governance is industry independent, and I am focussing on increasing my customer base for my Data Governance consultancy skills, via my company Professional IT Personnel Ltd.

Ajay- What do you do when not working with customers or blogging on your website?

Ken- I try to achieve a reasonable work/life balance. I am married with two children aged 12 and 10, and like to spend time with them, especially outdoors, walking, hiking, playing tennis etc. I am involved in my community, lobbying for improved cycling infrastructure in our area (more details). Ireland, like most countries, is facing an obesity epidemic, due to an increasingly sedentary lifestyle. Too many people get little or no exercise, and don’t have the time, willpower, or perhaps money, to regularly work out in a gym. By including “Active Travel” in our daily lives – by walking or cycling to schools and local amenities, we can get enough physical exercise to prevent obesity, and obesity related health problems. We need to make our cities, towns and villages more pedestrian and cyclist friendly, to encourage “active travel”. My voluntary work in this area introduced me to mapping (see example), and enabled me to set up MapMyBusiness.com.

Biography-

Ken O’Connor is an independent IT Consultant with almost 30 years of work experience. He specialises in Data: Data Migration, Data Population, Data Governance, Data Quality, Data Profiling… His company is called Professional IT Personnel Ltd.

Ken started his blog (Ken O’ Connor Data Consultant) to share his experience and to learn from the experience of others. Dylan Jones, editor of dataqualitypro, describes Ken as a “grizzled veteran”, with almost 30 years’ experience across the full development lifecycle.

New York Diner

In a New York Thai restaurant
I dine alone being new to York town
Borrowing conversation from left and right
Bringing no conversation of my own in the fading twilight

As bubbles slowly bubble from a sparkling dollar five glass
I watch from shadows as pretty people come and go as they say excuse me and quickly pass

I am an odd ball I know
Brown monkey nowhere to go
The waiters give me a look best called quizzical
What on the napkin do I scribble

Will the fellow eat and clear in peace start by giving chicken panang a nibble

Will I pay up this after all is west Harlem
Asians don’t tip they have been before on this trip

And I drink and devour
Dinner fine and dine
Watching conversation sparkle up
As sparkling wine goes down
I nod I say people are just the same
Appearances change but they play the same old games

Up when happy when sad they are down
Every big big city every new yet old town

We drink different wines
But then think similar thoughts

Daily joys and same different struggles
That our love and life bought

Wine brings heat to our face
Letting my jacket slip a bit
The waiter slips me a seen it all look
Are you serious he thinks you silly twit

Leaving all pretences
I chug wine like we chug beer
Expensive to my sponsors
But hey it brings me cheer

Ole  lady on my left
Drunk college chicks on my right
Smart  dame right across room
Cute Thai waitress completes a pleasant sight


Chug chug chug
We drink sparkling wine
Eating and being merry
Old wine makes new troubles all fine

Now thinking deeper-

In the middle of urban sub arcana
Face to face verbal smacks in your space
Comes a concept called Americana

Passionate adjectives and superlative passions
Americana is a euphemism for monetary nirvana

Nasal voices on my right
Deep bass slightly in front
To my left a wavering voice wavers
Aromatic cacophony my ears take the brunt

Wine slipping down slowly
But hey rising so fast
After effects may disappear soon
But the mellow pleasure promises to last