Tag Archives: Organizations
UPDATED POST: Some models I use for business strategy, to analyze the huge reams of qualitative and uncertain data that businesses generate. As a bonus, I have added the Business Canvas.
- Porter's Five Forces Model: to analyze industries
- Business Canvas
- BCG Matrix: to analyze product portfolios
- Porter's Diamond Model: to analyze locations
- McKinsey 7S Model: to analyze teams
- Greiner's Growth Model: to analyze the growth of organizations
- Herzberg's Hygiene Theory: to analyze soft aspects of individuals
- Marketing Mix Model: to analyze the marketing mix
Here is an interview with James G. Kobielus, Senior Program Director, Product Marketing, Big Data Analytics Solutions at IBM. Special thanks to Payal Patel Cudia of IBM’s communications team for helping with the logistics.
Ajay- What are the specific parts of the IBM platform that deal with the three layers of Big Data: variety, velocity and volume?
James- Well, first of all, let’s talk about the IBM Information Management portfolio. Our Big Data platform addresses the three layers of Big Data to varying degrees: some products address all three together, some two of the three, and some just one. We don’t have separate products for variety, velocity and volume.
Let us define these three layers. Volume refers to the hundreds of terabytes and petabytes of data stored inside organizations today. Velocity refers to the whole continuum from batch to real-time, continuous, streaming data.
Variety refers to multi-structured data, from structured to unstructured files, managed and stored in a common platform and analyzed through common tooling.
For volume, IBM has a highly scalable Big Data platform. This includes the Netezza and InfoSphere groups of products, and Watson-like technologies that can support petabytes of data for analytics. But really, support for volume ranges across IBM’s Information Management portfolio, on both the database side and the advanced analytics side.
For real-time velocity, we have real-time data acquisition. We have a product called IBM InfoSphere Streams, part of our Big Data platform, that is specifically built for streaming real-time data acquisition and delivery through complex event processing. We also have a very rich range of offerings that help clients build a Hadoop environment that can scale.
Our Hadoop platform is the most real-time-capable in the industry. We are differentiated by the sheer breadth, sophistication, functional depth and tooling integrated into our Hadoop platform, and by our streaming offering integrated into that platform. We also offer a great range of modeling and analysis tools, more than pretty much any other offering in the Big Data space.
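Complex event processing on a stream, as described above for real-time velocity, largely comes down to evaluating rules over a time window that slides forward as events arrive. The following is a generic plain-Python sketch, not the API of any IBM product; the `Event` type, the `BurstDetector` rule and its `threshold`/`window` parameters are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since stream start (assumed unit)
    source: str       # e.g. a user or device id
    kind: str         # event type, e.g. "login_failed"


class BurstDetector:
    """Hypothetical CEP rule: alert when one source emits `threshold`
    events of type `kind` within `window` seconds."""

    def __init__(self, kind: str, threshold: int, window: float):
        self.kind = kind
        self.threshold = threshold
        self.window = window
        self._recent: Dict[str, Deque[float]] = {}  # per-source timestamps inside the window

    def on_event(self, event: Event) -> bool:
        """Consume one event; return True if it completes a burst."""
        if event.kind != self.kind:
            return False
        times = self._recent.setdefault(event.source, deque())
        times.append(event.timestamp)
        # Evict timestamps that have slid out of the window (assumes in-order arrival).
        while event.timestamp - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


if __name__ == "__main__":
    detector = BurstDetector("login_failed", threshold=3, window=10.0)
    stream = [Event(0.0, "u1", "login_failed"), Event(1.0, "u1", "login_failed"),
              Event(2.0, "u1", "login_failed"), Event(30.0, "u2", "login_ok")]
    for e in stream:
        if detector.on_event(e):
            print(f"alert: burst from {e.source} at t={e.timestamp}")
```

Keeping only the timestamps that still fall inside the window means memory per source is bounded by the event rate, which is the property that lets such rules run continuously rather than in batch.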
Attached- Jim’s slides from Hadoop World
Ajay- Any plans for Mahout on Hadoop?
Jim- I can’t speak about product plans. We have plans, but I can’t tell you anything more. We do have a feature in BigInsights called SystemML, a library for machine learning.
Ajay- How integral are acquisitions for IBM in the Big Data space (Netezza, Cognos, SPSS, etc.)? Is it true that everything you have in Big Data is acquired, or is the famous IBM R&D contributing here? (See a partial list of IBM acquisitions at http://www.ibm.com/investor/strategy/acquisitions.wss )
Jim- We have developed a lot on our own. We have the deepest R&D of anybody in the industry in all things Big Data.
For example, Watson has BigInsights Hadoop at its core. Apache Hadoop is the heart and soul of Big Data (see http://www-01.ibm.com/software/data/infosphere/hadoop/ ). A great deal of what makes BigInsights so differentiated is that not everything in it has been built by the Hadoop community.
Out of necessity, we have built security, modeling, monitoring and governance capabilities into BigInsights to make it truly enterprise-ready. That is one example of where we have leveraged open source, built our own tools and technologies, and layered them on top of the open source code.
Yes, of course, we have made many strategic acquisitions related to Big Data management over the last several years, and we continue to do so. This quarter we have made three acquisitions with strong relevance to Big Data. One of them is Vivisimo (http://www-03.ibm.com/press/us/en/pressrelease/37491.wss ).
Vivisimo provides federated Big Data discovery, search and profiling capabilities to help you figure out what data is out there and how relevant it is to your data science project, which helps you answer the question of which data you should bring into your Hadoop cluster.
We also acquired Varicent, which is more about performance management, and Tealeaf, a customer experience solution provider; customer experience management and optimization is one of the hot killer apps for Hadoop in the cloud. We have made a great many acquisitions with clear relevance to Big Data.
Netezza already had a massively parallel analytics database product with an embedded library of models called Netezza Analytics, and in-database capabilities to massively parallelize MapReduce and other analytics management functions inside the database. In many ways, Netezza provided capabilities similar to those IBM had provided for many years under the IBM Smart Analytics System, or ISAS (http://www-01.ibm.com/software/data/infosphere/what-is-advanced-analytics/ ).
There is a difference between Netezza and ISAS.
ISAS was built predominantly in-house over several years. If you go back about a decade, IBM acquired Ascential Software, a product portfolio that was the heart and soul of IBM InfoSphere Information Server, which is core to our Big Data platform. In addition to Netezza, IBM bought SPSS two years back. We already had data mining and predictive modeling tools in the InfoSphere portfolio, but we realized we needed best of breed; SPSS provided that, so IBM acquired them.
Cognos: We had some BI reporting capabilities in the InfoSphere portfolio, some built ourselves and some that came, to varying degrees, from prior acquisitions. But clearly Cognos was one of the best BI vendors, and we lacked such a rich tool set for visualization and cubing, so for that reason we acquired Cognos.
There is also Unica, which does marketing campaign optimization, in many ways a killer app for Hadoop. Projects like that are driving many enterprises.
Ajay- How would you rank these acquisitions in order of strategic importance, rather than date of acquisition or price paid?
Jim- Think of Big Data as an ecosystem whose components are fitted to particular data analytics and data management functions. Is the database the core, or the modeling tool, or the governance tools, or the hardware platform? Everything is critically important. We would love to hear what you think has been most important. Each acquisition has played a critical role in building the deepest and broadest solution offering in Big Data. We offer the hardware, software, professional services and hosting services. I don’t think there is any validity to a rank-order system.
Ajay- What open source initiatives has the Big Data group undertaken or planned?
Jim- Here is what we are doing now. We are very much involved with the Apache Hadoop community, and we continue to evolve the open source code that everyone leverages. We have built BigInsights on Apache Hadoop. With BigInsights 1.4, we are the closest and most up to date of all commercial distributions in terms of the version numbers of the Apache Hadoop components (HBase, HDFS, Pig, etc.).
We have an R library integrated with BigInsights and an R library integrated with Netezza Analytics. There is support for R models within the SPSS portfolio. So we already have a fair amount of support for R across the portfolio.
Ajay- What are some of the concerns (privacy, security, regulation) that you think can dampen the promise of Big Data?
Jim- There are no showstoppers; there is really strong momentum. Some of the concerns within the Hadoop space are the immaturity of the technology, the immaturity of some of the commercial offerings that implement Hadoop, and the lack of formal standardization for Hadoop.
There is no open standards body that declares or ratifies the latest versions of Mahout, MapReduce, HDFS, etc. There is no industry-consensus reference framework for layering these different subprojects. There are no open APIs. There are no certifications, interoperability standards or organizations to certify different vendors’ interoperability around a common API or framework.
The lack of standardization is troubling in this whole market. It creates risks for users, because users are adopting multiple Hadoop products, and there are lots of Hadoop deployments in the corporate world built around purely open source Apache Hadoop. There may be no assurance that these multiple platforms will interoperate seamlessly. That’s a huge issue, because it magnifies the risk. And it increases the need for end users to develop their own custom integration code if they want to move data, or MapReduce jobs, between multiple distributions.
Governance is also a consideration. Right now Hadoop is used for high-volume ETL on multi-structured and unstructured data sources, or as exploratory sandboxes for data scientists. These important applications account for the majority of Hadoop deployments. Some Hadoop deployments are standalone unstructured data marts for specific applications like sentiment analysis.
Hadoop is not yet ready for data warehousing. We don’t see a lot of Hadoop being used as an alternative to data warehouses for managing the single version of truth of system-of-record data. That day will come, but first the marketplace needs a broader range of mature data governance mechanisms, master data management and data profiling products that enterprises can use to make sure the data inside their Hadoop clusters is clean and is the single version of truth. That day has not arrived yet.
One of the great things about IBM’s acquisition of Vivisimo is that one piece of that overall governance picture is discovery and profiling for unstructured data, and Vivisimo has done that very well for several years.
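Data profiling of the kind mentioned above usually starts with simple per-field statistics: how many values are missing, how many are distinct, and which types actually appear. The sketch below assumes structured records held as Python dicts (profiling unstructured content, as Vivisimo does, is a different problem); the `profile` function and its output fields are illustrative only, not any vendor's product.

```python
from collections import Counter
from typing import Any, Dict, List


def profile(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Per-field profile: missing count, distinct count, value types and
    min/max over numeric values. Absent keys, None and "" count as missing."""
    fields = sorted({key for rec in records for key in rec})
    report: Dict[str, Dict[str, Any]] = {}
    for field in fields:
        values = [rec.get(field) for rec in records]
        present = [v for v in values if v is not None and v != ""]
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        report[field] = {
            "missing": len(values) - len(present),
            "distinct": len({repr(v) for v in present}),  # repr tolerates unhashable values
            "types": dict(Counter(type(v).__name__ for v in present)),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
        }
    return report


if __name__ == "__main__":
    customers = [
        {"id": 1, "email": "ann@example.com", "balance": 120.5},
        {"id": 2, "email": "", "balance": None},
        {"id": 2, "email": "ann@example.com"},
    ]
    for field, stats in profile(customers).items():
        print(field, stats)
```

In the sample data, three rows but only two distinct `id` values is the kind of signal (a possible duplicate) that data stewards would want surfaced before data is trusted as a single version of truth.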
What we will see is that vendors such as IBM continue to evolve the security features inside their Hadoop platforms. We will beef up our data governance capabilities for this new world of Hadoop as the core of Big Data, and we will continue to build up our ability to integrate multiple databases in our Hadoop platform, so that customers can use some data from Hadoop, some from a traditional relational data warehouse, and maybe some NoSQL technology, each in a different role within a very complex Big Data environment.
That latter hybrid deployment model is becoming standard across many enterprises for Big Data. The cause for concern is that when your Big Data deployment has a bit of Hadoop, a bit of NoSQL, a bit of EDW and a bit of in-memory, there are no open standards or frameworks for putting it all together into a unified framework, not just for interoperability but also for deployment.
There needs to be a virtualization or abstraction layer for unified access to all these different Big Data platforms: by the users and developers writing the queries, and by administrators, so they can manage data, resources and jobs across all these disparate platforms in a seamless, unified way with visual tooling. That grand scenario, the virtualization layer, is not yet there in any standard way across the Big Data market. It will evolve; it may take 5-10 years, but it will evolve.
So, that’s the concern that can dampen some of the enthusiasm for Big Data Analytics.
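The virtualization layer Jim describes can be pictured as a catalog that maps logical dataset names to platform adapters sharing one interface, so a query never names the platform it runs on. This is a minimal sketch, assuming in-memory stand-ins for Hadoop, EDW or NoSQL backends; `Backend`, `InMemoryBackend` and `VirtualCatalog` are hypothetical names, and a real layer would also push filters down to each platform, manage resources and jobs, and enforce security.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


class Backend(ABC):
    """Uniform interface that every (hypothetical) platform adapter implements."""

    @abstractmethod
    def scan(self, dataset: str, where: Predicate) -> Iterable[Row]:
        """Yield rows of `dataset` that satisfy `where`."""


class InMemoryBackend(Backend):
    """Stand-in for a real platform such as Hadoop, an EDW or a NoSQL store."""

    def __init__(self, name: str, tables: Dict[str, List[Row]]):
        self.name = name
        self.tables = tables

    def scan(self, dataset: str, where: Predicate) -> Iterable[Row]:
        return (row for row in self.tables[dataset] if where(row))


class VirtualCatalog:
    """Routes logical dataset names to backends, so callers never name a platform."""

    def __init__(self) -> None:
        self._routes: Dict[str, Backend] = {}

    def register(self, dataset: str, backend: Backend) -> None:
        self._routes[dataset] = backend

    def query(self, dataset: str, where: Predicate = lambda row: True) -> List[Row]:
        if dataset not in self._routes:
            raise KeyError(f"no backend registered for {dataset!r}")
        return list(self._routes[dataset].scan(dataset, where))

    def join(self, left: str, right: str, on: str) -> List[Row]:
        """Naive in-memory hash join of two datasets that may live on different backends."""
        index: Dict[Any, List[Row]] = {}
        for rrow in self.query(right):
            index.setdefault(rrow[on], []).append(rrow)
        return [{**lrow, **rrow} for lrow in self.query(left) for rrow in index.get(lrow[on], [])]


if __name__ == "__main__":
    catalog = VirtualCatalog()
    catalog.register("clicks", InMemoryBackend("hadoop", {"clicks": [{"cust": 1, "page": "home"}]}))
    catalog.register("customers", InMemoryBackend("edw", {"customers": [{"cust": 1, "name": "Ann"}]}))
    print(catalog.join("clicks", "customers", on="cust"))
```

The design point is the single `Backend` interface: adding a new platform means writing one adapter, while queries and joins written against the catalog stay unchanged.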
You can read more about Jim at http://www.linkedin.com/pub/james-kobielus/6/ab2/8b0 or
follow him on Twitter at http://twitter.com/jameskobielus
You can read more about IBM Big Data at http://www-01.ibm.com/software/data/bigdata/
This is a guest blog post by Carole-Ann Matignon of Sparkling Logic. You can see more on Sparkling Logic at http://my.sparklinglogic.com/
Decision Management is about combining predictive models and business rules to automate decisions for your business. Insurance underwriting, loan origination or workout, and claims processing are all very good use cases for that discipline… But there is a hiccup… It ain’t as easy as you would expect…
If you have a neat model, then most tools will allow you to export it as a PMML model. PMML stands for Predictive Model Markup Language, a standard XML representation for predictive model formulas. Many model development tools let you export it without much effort. Many BRMSs (Business Rules Management Systems) let you import it. Tada… The model is ready for deployment.
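To make the export/import step concrete, here is a sketch that reads a hand-written PMML 4.1 `RegressionModel` with Python's standard `xml.etree.ElementTree` and turns it into a scoring function. The model, its coefficients and its field names (`grocery_spend_3m`, `num_late_payments`) are made up for illustration, and only the plain `NumericPredictor` case is handled; a real BRMS import supports much more of the standard. Note that the loader also returns the list of input variables the model needs, which is where the variable problem discussed in this post shows up.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# A toy PMML 4.1 linear regression; names and coefficients are illustrative only.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="illustrative toy model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="grocery_spend_3m" optype="continuous" dataType="double"/>
    <DataField name="num_late_payments" optype="continuous" dataType="double"/>
    <DataField name="risk_score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="grocery_spend_3m"/>
      <MiningField name="num_late_payments"/>
      <MiningField name="risk_score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="grocery_spend_3m" coefficient="-0.001"/>
      <NumericPredictor name="num_late_payments" coefficient="0.75"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_1"}


def load_linear_scorer(pmml_text: str) -> Tuple[Callable[[Dict[str, float]], float], List[str]]:
    """Return (score, required_fields) for a single-table PMML RegressionModel
    that uses only NumericPredictor terms."""
    root = ET.fromstring(pmml_text)
    model = root.find("p:RegressionModel", NS)
    table = model.find("p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    terms = [(p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
             for p in table.findall("p:NumericPredictor", NS)]
    # MiningFields default to usageType="active": these are the inputs the model needs.
    required = [f.get("name") for f in model.findall("p:MiningSchema/p:MiningField", NS)
                if f.get("usageType", "active") == "active"]

    def score(record: Dict[str, float]) -> float:
        missing = [name for name in required if name not in record]
        if missing:
            raise KeyError(f"record lacks model variables: {missing}")
        return intercept + sum(coef * record[name] ** exp for name, coef, exp in terms)

    return score, required


if __name__ == "__main__":
    score, required = load_linear_scorer(PMML_DOC)
    print(required, score({"grocery_spend_3m": 500.0, "num_late_payments": 2}))
```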
The problem that we keep seeing over and over in the industry is the issue around variables.
Those neat predictive models are formulas based on variables that may or may not exist as-is in your object model. When the variable is itself a formula based on the object model, like the min, max or sum of the dollar amount spent on groceries in the past 3 months, and the object model comes with transaction details so that you can compute it by iterating through those transactions, then the problem is not “that” big. PMML 4 introduced some support for those variables.
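When the model variable is an aggregate that can be recomputed from operational transaction details, the mapping can live in a small derivation function. This is a sketch under stated assumptions: the `Transaction` fields and the `grocery_spend_features` name are hypothetical, "past 3 months" is approximated as the 90 days ending on `as_of`, and an empty window yields 0.0 for min and max (a real model may expect a missing-value code instead).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable


@dataclass(frozen=True)
class Transaction:
    day: date
    category: str
    amount: float  # dollars


def grocery_spend_features(txns: Iterable[Transaction], as_of: date,
                           days: int = 90) -> Dict[str, float]:
    """Derive sum/min/max of grocery spend over the `days` days ending on `as_of`
    by iterating over transaction details (90 days approximates 3 months)."""
    start = as_of - timedelta(days=days)
    amounts = [t.amount for t in txns
               if t.category == "groceries" and start < t.day <= as_of]
    return {
        "grocery_sum_3m": sum(amounts, 0.0),
        "grocery_min_3m": min(amounts, default=0.0),  # empty window -> 0.0 (assumption)
        "grocery_max_3m": max(amounts, default=0.0),
    }


if __name__ == "__main__":
    history = [Transaction(date(2012, 3, 1), "groceries", 40.0),
               Transaction(date(2012, 4, 15), "groceries", 60.0),
               Transaction(date(2012, 4, 20), "fuel", 30.0)]
    print(grocery_spend_features(history, as_of=date(2012, 4, 30)))
```

This only works because the operational object model still carries the individual transactions; if the warehouse stored only a pre-computed monthly total, the same variable could not be rebuilt this way.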
The issue that is not easy to fix, and yet quite frequent, is when the model development data model does not resemble the operational one. Your Data Warehouse very likely flattened the object model, and pre-computed some aggregations that make the mapping very hard to restore.
It is clearly not an impossible project, as many organizations do it today. It comes with significant overhead, though, forcing modelers to involve IT resources to extract the right data for the model to be operationalized. It is a heavy process that is well justified for heavy-duty models developed over a long period with significant ROI.
This is a show-stopper, though, for other initiatives that do not have the same ROI, or that would require model refreshes too frequent to be viable. Here, I refer to “real” model refresh that involves model reengineering, not just a re-weighting of the same variables.
For those initiatives where time is of the essence, the challenge will be to bring those two worlds, the modelers and the business rules experts, closer together in order to streamline the development AND deployment of analytics beyond the model formula. The great opportunity I see is the potential for better, coordinated tuning of the cut-off rules in the context of model refinement. In other words: the opportunity to refine the strategy as a whole. Very ambitious? I don’t think so.
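One way to picture coordinated tuning of the cut-off rules is a decision function in which hard business rules run first and a cut-off is then applied to the model score, with several candidate cut-offs evaluated against historical outcomes. This is a minimal sketch; the `Applicant` fields, the age and amount rules, and the convention that a higher score means higher risk are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Applicant:
    score: float            # model output; higher = riskier (assumed convention)
    age: int
    requested_amount: float


def decide(app: Applicant, cutoff: float, max_amount: float = 50_000.0) -> str:
    """Hard business rules first, then the cut-off rule on the model score."""
    if app.age < 18:                       # hypothetical eligibility rule
        return "decline"
    if app.requested_amount > max_amount:  # hypothetical referral rule
        return "refer"
    return "approve" if app.score < cutoff else "decline"


def evaluate_cutoffs(history: Sequence[Tuple[Applicant, bool]],
                     cutoffs: List[float]) -> Dict[float, Dict[str, float]]:
    """For each candidate cut-off: approval rate over all applicants and
    bad rate among the approved, using known historical outcomes (True = bad)."""
    results: Dict[float, Dict[str, float]] = {}
    for cutoff in cutoffs:
        approved_bad = [bad for app, bad in history if decide(app, cutoff) == "approve"]
        results[cutoff] = {
            "approval_rate": len(approved_bad) / len(history),
            "bad_rate": sum(approved_bad) / len(approved_bad) if approved_bad else 0.0,
        }
    return results


if __name__ == "__main__":
    past = [(Applicant(0.1, 30, 1000), False), (Applicant(0.4, 40, 2000), False),
            (Applicant(0.6, 25, 3000), True), (Applicant(0.9, 50, 4000), True)]
    for cutoff, stats in evaluate_cutoffs(past, [0.5, 0.7]).items():
        print(cutoff, stats)
```

Re-running `evaluate_cutoffs` each time the model is refreshed is one simple way to keep the cut-off rule and the model tuned together instead of in separate silos.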
About Carole-Ann Matignon
Carole-Ann Matignon – Co-Founder, President & Chief Executive Officer
She is a renowned guru in the Decision Management space. She created the vision for Decision Management that is now widely adopted in the industry. Her claim to fame is managing the strategy and direction of Blaze Advisor, the leading BRMS product, while she also managed all the Decision Management tools at FICO (business rules, predictive analytics and optimization). She has a vision for Decision Management both as a technology and a discipline that can revolutionize the way corporations do business, and will never get tired of painting that vision for her audience. She speaks often at industry conferences and has conducted university classes in France and Washington, DC.
She started her career building advanced systems using all kinds of technologies — expert systems, rules, optimization, dashboarding and cubes, web search, and beta version of database replication. At Cleversys (acquired by Kurt Salmon & Associates), she also conducted strategic consulting gigs around change management.
While playing with advanced software components, she found a passion for technology and joined ILOG (acquired by IBM). She developed a growing interest in Optimization as well as Business Rules. At ILOG, she coined the term BRMS while brainstorming with her Sales counterpart. She led the Presales organization for Telecom in the Americas up until 2000 when she joined Blaze Software (acquired by Brokat Technologies, HNC Software and finally FICO).
Her 360-degree experience allowed her to gain appreciation for all aspects of a software company, giving her a unique perspective on the business. Her technical background kept her very much in touch with technology as she advanced.
In part 3 of the series of predictions for 2012, here is Jill Dyché of Baseline Consulting/DataFlux.
Part 2 was Timo Elliott of SAP at http://www.decisionstats.com/timo-elliott-on-2012/ and Part 1 was Jim Kobielus of Forrester at http://www.decisionstats.com/jim-kobielus-on-2012/
Ajay: What are the top trends you saw happening in 2011?
Well, I hate to say I saw them coming, but I did. A lot of managers committed some pretty predictable mistakes in 2011. Here are a few we witnessed in 2011 live and up close:
1. In the spirit of “size matters,” data warehouse teams continued to trumpet the volumes of stored data on their enterprise data warehouses. But a peek under the covers of these warehouses reveals that the data isn’t integrated. Essentially this means a variety of heterogeneous virtual data marts co-located on a single server. Neat. Big. Maybe even worthy of a magazine article about how many petabytes you’ve got. But it’s not efficient, and hardly the example of data standardization and re-use that everyone expects from analytical platforms these days.
2. Development teams still didn’t factor data integration and provisioning into their project plans in 2011. So we saw multiple projects spawn duplicate efforts around data profiling, cleansing, and standardization, not to mention conflicting policies and business rules for the same information. Bummer, since IT managers should know better by now. The problem is that no one owns the problem. Which brings me to the next mistake…
3. No one’s accountable for data governance. Yeah, there’s a council. And they meet. And they talk. Sometimes there’s lunch. And then nothing happens because no one’s really rewarded—or penalized for that matter—on data quality improvements or new policies. And so the reports spewing from the data mart are still fraught and no one trusts the resulting decisions.
But all is not lost since we’re seeing some encouraging signs already in 2012. And yes, I’d classify some of them as bona-fide trends.
Ajay: What are some of those trends?
Job descriptions for data stewards, data architects, Chief Data Officers, and other information-enabling roles are becoming crisper, and the KPIs for these roles are becoming more specific. Data management organizations are being divorced from specific lines of business and from IT, becoming specialty organizations (okay, COEs if you must) in their own right. The value proposition for master data management now includes not just the reconciliation of heterogeneous data elements but the support of key business strategies. And C-level executives are holding the data people accountable for improving speed to market and driving down costs, not just delivering cleaner data. In short, data is becoming a business enabler. Which, I have to just say editorially, is better late than never!
Ajay: Anything surprise you, Jill?
I have to say that Obama mentioning data management in his State of the Union speech was an unexpected but pretty powerful endorsement of the importance of information in both the private and public sector.
I’m also sort of surprised that data governance isn’t being driven more frequently by the need for internal and external privacy policies. Our clients are constantly asking us how to tightly couple privacy policies into their applications and data sources. The need to protect PCI data and other highly sensitive data elements has made executives twitchy. But they’re still not linking that need to data governance.
I should also mention that I’ve been impressed with the people who call me who’ve had their “aha!” moment and realize that data transcends analytic systems. It’s operational, it’s pervasive, and it’s dynamic. I figured this epiphany would happen in a few years once data quality tools became a commodity (they’re far from it). But it’s happening now. And that’s good for all types of businesses.
Jill Dyché has written three books and numerous articles on the business value of information technology. She advises clients and executive teams on leveraging technology and information to enable strategic business initiatives. Last year her company Baseline Consulting was acquired by DataFlux Corporation, where she is currently Vice President of Thought Leadership. Find her blog posts on www.dataroundtable.com.