Interview James G Kobielus IBM Big Data

Here is an interview with  James G Kobielus, who is the Senior Program Director, Product Marketing, Big Data Analytics Solutions at IBM. Special thanks to Payal Patel Cudia of IBM’s communication team,for helping with the logistics for this.

Ajay -What are the specific parts of the IBM Platform that deal with the three layers of Big Data -variety, velocity and volume

James-Well first of all, let’s talk about the IBM Information Management portfolio. Our big data platform addresses the three layers of big data to varying degrees either together in a product , or two out of the three or even one of the three aspects. We don’t have separate products for the variety, velocity and volume separately.

Let us define these three layers-Volume refers to the hundreds of terabytes and petabytes of stored data inside organizations today. Velocity refers to the whole continuum from batch to real time continuous and streaming data.

Variety refers to multi-structure data from structured to unstructured files, managed and stored in a common platform analyzed through common tooling.

For Volume-IBM has a highly scalable Big Data platform. This includes Netezza and Infosphere groups of products, and Watson-like technologies that can support petabytes volume of data for analytics. But really the support of volume ranges across IBM’s Information Management portfolio both on the database side and the advanced analytics side.

For real time Velocity, we have real time data acquisition. We have a product called IBM Infosphere, part of our Big Data platform, that is specifically built for streaming real time data acquisition and delivery through complex event processing. We have a very rich range of offerings that help clients build a Hadoop environment that can scale.

Our Hadoop platform is the most real time capable of all in the industry. We are differentiated by our sheer breadth, sophistication and functional depth and tooling integrated in our Hadoop platform. We are differentiated by our streaming offering integrated into the Hadoop platform. We also offer a great range of modeling and analysis tools, pretty much more than any other offering in the Big Data space.

Attached- Jim’s slides from Hadoop World

Ajay- Any plans for Mahout for Hadoop

Jim- I cant speak about product plans. We have plans but I cant tell you anything more. We do have a feature in Big Insights called System ML, a library for machine learning.

Ajay- How integral are acquisitions for IBM in the Big Data space (Netezza,Cognos,SPSS etc). Is it true that everything that you have in Big Data is acquired or is the famous IBM R and D contributing here . (see a partial list of IBM acquisitions at at http://www.ibm.com/investor/strategy/acquisitions.wss )

Jim- We have developed a lot on our own. We have the deepest R and D of anybody in the industry in all things Big Data.

For example – Watson has Big Insights Hadoop at its core. Apache Hadoop is the heart and soul of Big Data (see http://www-01.ibm.com/software/data/infosphere/hadoop/ ). A great deal that makes Big Insights so differentiated is that not everything that has been built has been built by the Hadoop community.

We have built additions out of the necessity for security, modeling, monitoring, and governance capabilities into BigInsights to make it truly enterprise ready. That is one example of where we have leveraged open source and we have built our own tools and technologies and layered them on top of the open source code.

Yes of course we have done many strategic acquisitions over the last several years related to Big Data Management and we continue to do so. This quarter we have done 3 acquisitions with strong relevance to Big Data. One of them is Vivisimo (http://www-03.ibm.com/press/us/en/pressrelease/37491.wss ).

Vivisimo provides federated Big Data discovery, search and profiling capabilities to help you figure out what data is out there,what is relevance of that data to your data science project- to help you answer the question which data should you bring in your Hadoop Cluster.

 We also did Varicent , which is more performance management and we did TeaLeaf , which is a customer experience solution provider where customer experience management and optimization is one of the hot killer apps for Hadoop in the cloud. We have done great many acquisitions that have a clear relevance to Big Data.

Netezza already had a massively parallel analytics database product with an embedded library of models called Netezza Analytics, and in-database capabilties to massively parallelize Map Reduce and other analytics management functions inside the database. In many ways, Netezza provided capabilities similar to that IBM had provided for many years under the Smart Analytics Platform (http://www-01.ibm.com/software/data/infosphere/what-is-advanced-analytics/ ) .

There is a differential between Netezza and ISAS.

ISAS was built predominantly in-house over several years . If you go back a decade ago IBM acquired Ascential Software , a product portfolio that was the heart and soul of IBM InfoSphere Information Manager that is core to our big Data platform. In addition to Netezza, IBM bought SPSS two years back. We already had data mining tools and predictive modeling in the InfoSphere portfolio, but we realized we needed to have the best of breed, SPSS provided that and so IBM acquired them.

 Cognos– We had some BI reporting capabilities in the InfoSphere portfolio that we had built ourselves and also acquired for various degrees from prior acquisitions. But clearly Cognos was one of the best BI vendors , and we were lacking such a rich tool set in our product in visualization and cubing and so for that reason we acquired Cognos.

There is also Unica – which is a marketing campaign optimization which in many ways is a killer app for Hadoop. Projects like that are driving many enterprises.

Ajay- How would you rank order these acquisitions in terms of strategic importance rather than data of acquisition or price paid.

Jim-Think of Big Data as an ecosystem that has components that are fitted to particular functions for data analytics and data management. Is the database the core, or the modeling tool the core, or the governance tools the core, or is the hardware platform the core. Everything is critically important. We would love to hear from you what you think have been most important. Each acquisition has helped play a critical role to build the deepest and broadest solution offering in Big Data. We offer the hardware, software, professional services, the hosting service. I don’t think there is any validity to a rank order system.

Ajay-What are the initiatives regarding open source that Big Data group have done or are planning?

Jim- What we are doing now- We are very much involved with the Apache Hadoop community. We continue to evolve the open source code that everyone leverages.. We have built BigInsights on Apache Hadoop. We have the closest, most up to date in terms of version number to Apache Hadoop ( Hbase,HDFS, Pig etc) of all commercial distributions with our BigInsights 1.4 .

We have an R library integrated with BigInsights . We have a R library integrated with Netezza Analytics. There is support for R Models within the SPSS portfolio. We already have a fair amount of support for R across the portfolio.

Ajay- What are some of the concerns (privacy,security,regulation) that you think can dampen the promise of Big Data.

Jim- There are no showstoppers, there is really a strong momentum. Some of the concerns within the Hadoop space are immaturity of the technology, the immaturity of some of the commercial offerings out there that implement Hadoop, the lack of standardization for formal sense for Hadoop.

There is no Open Standards Body that declares, ratifies the latest version of Mahout, Map Reduce, HDFS etc. There is no industry consensus reference framework for layering these different sub projects. There are no open APIs. There are no certifications or interoperability standards or organizations to certify different vendors interoperability around a common API or framework.

The lack of standardization is troubling in this whole market. That creates risks for users because users are adopting multiple Hadoop products. There are lots of Hadoop deployments in the corporate world built around Apache Hadoop (purely open source). There may be no assurance that these multiple platforms will interoperate seamlessly. That’s a huge issue in terms of just magnifying the risk. And it increases the need for the end user to develop their own custom integrated code if they want to move data between platforms, or move map-reduce jobs between multiple distributions.

Also governance is a consideration. Right now Hadoop is used for high volume ETL on multi structured and unstructured data sources, or Hadoop is used for exploratory sand boxes for data scientists. These are important applications that are a majority of the Hadoop deployments . Some Hadoop deployments are stand alone unstructured data marts for specific applications like sentiment analysis like.

Hadoop is not yet ready for data warehousing. We don’t see a lot of Hadoop being used as an alternative to data warehouses for managing the single version of truth of system or record data. That day will come but there needs to be out there in the marketplace a broader range of data governance mechanisms , master data management, data profiling products that are mature that enterprises can use to make sure their data inside their Hadoop clusters is clean and is the single version of truth. That day has not arrived yet.

One of the great things about IBM’s acquisition of Vivisimo is that a piece of that overall governance picture is discovery and profiling for unstructured data , and that is done very well by Vivisimo for several years.

What we will see is vendors such as IBM will continue to evolve security features inside of our Hadoop platform. We will beef up our data governance capabilities for this new world of Hadoop as the core of Big Data, and we will continue to build up our ability to integrate multiple databases in our Hadoop platform so that customers can use data from a bit of Hadoop,some data from a bit of traditional relational data warehouse, maybe some noSQL technology for different roles within a very complex Big Data environment.

That latter hybrid deployment model is becoming standard across many enterprises for Big Data. A cause for concern is when your Big Data deployment has a bit of Hadoop, bit of noSQL, bit of EDW, bit of in-memory , there are no open standards or frameworks for putting it all together for a unified framework not just for interoperability but also for deployment.

There needs to be a virtualization or abstraction layer for unified access to all these different Big Data platforms by the users/developers writing the queries, by administrators so they can manage data and resources and jobs across all these disparate platforms in a seamless unified way with visual tooling. That grand scenario, the virtualization layer is not there yet in any standard way across the big data market. It will evolve, it may take 5-10 years to evolve but it will evolve.

So, that’s the concern that can dampen some of the enthusiasm for Big Data Analytics.

About-

You can read more about Jim at http://www.linkedin.com/pub/james-kobielus/6/ab2/8b0 or

follow him on Twitter at http://twitter.com/jameskobielus

You can read more about IBM Big Data at http://www-01.ibm.com/software/data/bigdata/

Google Cloud is finally here

Amazon gets some competition, and customers should see some relief, unless Google withdraws commitment on these products after a few years of trying (like it often does now!)

 

http://cloud.google.com/products/index.html

Machine Type Pricing
Configuration Virtual Cores Memory GCEU * Local disk Price/Hour $/GCEU/hour
n1-standard-1-d 1 3.75GB *** 2.75 420GB *** $0.145 0.053
n1-standard-2-d 2 7.5GB 5.5 870GB $0.29 0.053
n1-standard-4-d 4 15GB 11 1770GB $0.58 0.053
n1-standard-8-d 8 30GB 22 2 x 1770GB $1.16 0.053
Network Pricing
Ingress Free
Egress to the same Zone. Free
Egress to a different Cloud service within the same Region. Free
Egress to a different Zone in the same Region (per GB) $0.01
Egress to a different Region within the US $0.01 ****
Inter-continental Egress At Internet Egress Rate
Internet Egress (Americas/EMEA destination) per GB
0-1 TB in a month $0.12
1-10 TB $0.11
10+ TB $0.08
Internet Egress (APAC destination) per GB
0-1 TB in a month $0.21
1-10 TB $0.18
10+ TB $0.15
Persistent Disk Pricing
Provisioned space $0.10 GB/month
Snapshot storage** $0.125 GB/month
IO Operations $0.10 per million
IP Address Pricing
Static IP address (assigned but unused) $0.01 per hour
Ephemeral IP address (attached to instance) Free
* GCEU is Google Compute Engine Unit — a measure of computational power of our instances based on industry benchmarks; review the GCEU definition for more information
** coming soon
*** 1GB is defined as 2^30 bytes
**** promotional pricing; eventually will be charged at internet download rates

Google Prediction API

Tap into Google’s machine learning algorithms to analyze data and predict future outcomes.

Leverage machine learning without the complexity
Use the familiar RESTful interface
Enter input in any format – numeric or text

Build smart apps

Learn how you can use Prediction API to build customer sentiment analysis, spam detection or document and email classification.

Google Translation API

Use Google Translate API to build multilingual apps and programmatically translate text in your webpage or application.

Translate text into other languages programmatically
Use the familiar RESTful interface
Take advantage of Google’s powerful translation algorithms

Build multilingual apps

Learn how you can use Translate API to build apps that can programmatically translate text in your applications or websites.

Google BigQuery

Analyze Big Data in the cloud using SQL and get real-time business insights in seconds using Google BigQuery. Use a fully-managed data analysis service with no servers to install or maintain.
Figure

Reliable & Secure

Complete peace of mind as your data is automatically replicated across multiple sites and secured using access control lists.
Scale infinitely

You can store up to hundreds of terabytes, paying only for what you use.
Blazing fast

Run ad hoc SQL queries on
multi-terabyte datasets in seconds.

Google App Engine

Create apps on Google’s platform that are easy to manage and scale. Benefit from the same systems and infrastructure that power Google’s applications.

Focus on your apps

Let us worry about the underlying infrastructure and systems.
Scale infinitely

See your applications scale seamlessly from hundreds to millions of users.
Business ready

Premium paid support and 99.95% SLA for business users

Apps for Google Drive

I kind of liked the fact that Google Drive has a lot of apps already- even though it is quite young.

Especially the mechanical engineer in me liked the AutoCAD app and the video editing apps, the online bitcoin wallet, free project scheduling app, the cloud’s first (?) open office document reader and etc

Developers would especially like playing with the OAuth Playground app for Google Drive on the Google Chrome platform.

Check out  for yourself.

https://chrome.google.com/webstore/category/collection/drive_apps

Software Review- Google Drive versus Dropbox

Here are some notes from reviewing Google Drive  https://drive.google.com/ vs Dropbox https://www.dropbox.com/.

1) Google Drive gives more free space upfront  than Dropbox.5GB versus 2GB

2) Dropbox has a referral system 500 mb per referral while there is no referral system for Google Drive

3) The sync facility with Google Docs makes Google Drive especially useful for prior users of Google Docs.

4) API access to Google Drive is only for Chrome apps which is intriguing!

https://developers.google.com/drive/apps_overview

Apps will not have any API access to files unless users have first installed the app in Chrome Web Store.

You can use the Dropbox API much more easily –

See the platforms at

https://www.dropbox.com/developers/start/core

Choose your platform:

iOS Android Python Ruby

But-

(though I wonder if you set the R working directory to the local shared drive for Google Drive it should sync up as well but of course be slower –http://scrogster.wordpress.com/2011/01/29/using-dropbox-with-r-2/)

5) Google Drive icon is ugly (seriously, dude!) , but the features in the Windows app is just the same as the Dropbox App. Too similar 😉

 

6) Upgrade space is much more cheaper to Google Drive than Dropbox ( by Google Drive prices being exactly  a quarter of prices on Dropbox and max storage being 16 times as much). This will affect power storage users. I expect to see some slowdown in Dropbox new business unless G Drive has outage (like Gmail) . Existing users at Dropbox probably wont shift for the small dollar amount- though it is quite easy to do so.

 

Install Google Drive on your local workstation and cut and paste your Dropbox local folder to the Google Drive local folder!!

7) Dropbox deserves credit for being first (like Hotmail and AOL) but Google Drive is almost better in all respects!

Google Drive

Free
5 GB of Drive (0% used)
10 GB of Gmail (48% used)
1 GB of Picasa (0% used)

Upgrade:

25 GB
2,49 $ / Month
+25 GB for Drive and Picasa
Bonus: Your Gmail storage will be upgraded to 25 GB.
Choose this plan

100 GB
4,99 $ / Month
+100 GB for Drive and Picasa
Bonus: Your Gmail storage will be upgraded to 25 GB.
Choose this plan

 Need more storage?

Up to 16 TB available

Dropbox–

Current account type

Large DropboxDropbox Badge greenFree
Free
Up to 18 GB (2 GB + 500 MB per referral)
Account info 

Other account types

Large DropboxDropbox Badge orange50 GB +
Pro 50
+1 GB per referral, up to +32 GB
$9.99/month or $99.00/year Upgrade to Pro 50
Large DropboxDropbox Badge purple100 GB +
Pro 100
+1 GB per referral, up to +32 GB
$19.99/month or $199.00/year Upgrade to Pro 100
Triple DropboxDropbox For Teams Badge1 TB +
Teams
Plans starting at 1 TB
Large shared quota, centralized admin and billing, and more!

 

 

 

Google introduces Google Play

Some nice new features from the big G men from Mountain view. Google Play- for movies, games, apps, music and books. Nice to see entertainment is back on Google’s priority.

 

See this to read more

https://play.google.com/about/

When will I get Google Play?

About Google Play

Q: What is Google Play?
A: Google Play is a new digital content experience from Google where you can find your favorite music, movies, books, and Android apps and games. It’s your entertainment hub: you can access it from the web or from your Android device or even TV, and all your content is instantly available across all of these devices.

Q: What is your strategy with Google Play?
A: Our goal with Google Play is to bring together all your favorite content in one place that you can access across your devices. Specifically, digital content is fundamental to the mobile experience, so bringing all of this content together in one place for users makes the Android platform even more compelling. We’re also simplifying digital content for Google users – you can go to the Google Play website on your desktop and purchase and experience the latest movies, music and books. With Google Play, we’re giving you a simpler way to get your digital content.

Q: What will the experience be for users? What will happen to my existing account?
A: All content and apps in your existing account will remain in your account, but will transition to Google Play. On your device, the Android Market app icon will become the Google Play store icon. You’ll see “Play Store.” For the movies, books and music apps, you’ll begin to see Play versions of these as well, such as “Play Music,” and “Play Movies.”

Q: When will I get Google Play? What markets is this available in?
A: We’ll be rolling out Google Play globally starting today. On the web, Google Play will be live today. On devices, it will take a few days for the Android Market app to update to the Google Play Store app. The music, books and movies apps will also receive an update today.
Around the globe, Google Play will include Android apps and games. In countries where we have already launched music, books or movies, you will see those categories available in Google Play, too.

Q: I live outside the US. When will I get the books, music or movies verticals? I only see Android apps and games?
A: We want to bring different content categories to as many countries as possible. We’ve already launched movies and books in several countries outside the U.S. and will continue to do so overtime, but we don’t have a specific timeline to share.

Q: What types of content are available in my country?

  • Paid Apps: Available in these countries
  • Movies: Available in US, UK, Canada, and Japan
  • eBooks: Available in US, UK, Canada, and Australia
  • Music: Available in US

 

Q: Does this mean Google Music and the Google eBookstore will cease to exist? What about my account?
A: Both Google Music and the Google eBookstore are now part of Google Play. Your music and your books, including anything you bought, are still there, available to you in Google Play and accessible through your Google account.

Q: Where did my Google eBooks books go? Will I still have access to them?
A: Your books are now part of Google Play. Your books are still there, available to you in your Google Play library and accessible through your Google account.

Q: I don’t use an Android phone, can I still use Google Play?
A: Yes. Google Play is available on any computer with a modern browser at play.google.com. On the web, you can browse and buy books, movies and music. You can read books on the Google Play web reader, listen to music on your computer or watch movies online. Your digital content is all stored in the cloud, so you can access from anywhere using your Google Account.
We’ve also created ways to experience your music and books on other platforms such as the Google Books iOS app.

Q: Why do I not see Google Play yet on my device?
A: Please see our help center article on this here.

Q: How can I contact Google Play consumer support?
A: You can call or email our team here.

Opera Unite- the future of cloud computing browsers

The boys (and ladies) at opera have been busy writing code , while the rest of the coders on the cloud were issuing press releases, attending meetings or just sky diving from the cloud. Judging by the language of apps and extensions, it seems that the  engineers de Vikings et Slavs were busy coding while the Anglo Saxons were busy preparing for IPOs.

I really like the complete anonymity offered by Opera and especially Opera Unite

1) The Adblock option blocks all ads (same as other extensions)

2) The lovely Opera Unite has incredible apps for peer to peer sharing. You can create your own spotify, host your own chat application, transfer files, remote manage your computer. C’est magnifique!

Some really awesome apps on Opera Unite

All these apps can make your own desktop into a remotely managed website- so SOPA is irrelevant even if passed without any protest or non violent protests

(SOPA- an acronym for STOP OBAMA or STOP A (?) , since OBAMA is the one the internet really supports , and he is dependent on that goodwill for fundraising or A is the acronym of a legendary media myth of an imaginary web based organization (imaginary as in iota)

QUOTE

I think it would be a good idea.

 Mahatma Gandhiwhen asked what he thought of Western civilization

Google Apps Terms of Use- Termination

TERMINATION
You may discontinue your use of Google services at any time. You agree that Google may at any time and for any reason, including a period of account inactivity, terminate your access to Google services, terminate the Terms, or suspend or terminate your account. In the event of termination, your account will be disabled and you may not be granted access to Google services, your account or any files or other content contained in your account. Sections 10 (Termination), 13 (Indemnity), 14 (Disclaimer of Warranties), 15 (Limitations of Liability), 16 (Exclusions and Limitations) and 19 (including choice of law, severability and statute of limitations), of the Terms, shall survive expiration or termination.

Source-

http://www.google.com/apps/intl/en/terms/user_terms.html

Related-

 

https://www.youtube-nocookie.com/v/BcxRfg96dTQ?version=3&hl=en_US&rel=0