Tag Archives: jobs
Interview Rob J Hyndman Forecasting Expert #rstats
Here is an interview with Prof Rob J Hyndman, who has created many time series forecasting methods and has authored books as well as R packages on the subject.
Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year, which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million, which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net).
Interview James G Kobielus IBM Big Data
Here is an interview with James G Kobielus, who is the Senior Program Director, Product Marketing, Big Data Analytics Solutions at IBM. Special thanks to Payal Patel Cudia of IBM’s communication team for helping with the logistics for this.
Ajay- What are the specific parts of the IBM Platform that deal with the three layers of Big Data: variety, velocity and volume?
James- Well, first of all, let’s talk about the IBM Information Management portfolio. Our big data platform addresses the three layers of big data to varying degrees, either all together in one product, or two out of the three, or even just one of the three aspects. We don’t have separate products for variety, velocity and volume.
Let us define these three layers. Volume refers to the hundreds of terabytes and petabytes of stored data inside organizations today. Velocity refers to the whole continuum from batch to real time continuous and streaming data.
Variety refers to multi-structure data from structured to unstructured files, managed and stored in a common platform analyzed through common tooling.
For Volume, IBM has a highly scalable Big Data platform. This includes the Netezza and InfoSphere groups of products, and Watson-like technologies that can support petabyte volumes of data for analytics. But really the support of volume ranges across IBM’s Information Management portfolio, both on the database side and the advanced analytics side.
For real time Velocity, we have real time data acquisition. We have a product called IBM InfoSphere, part of our Big Data platform, that is specifically built for streaming real time data acquisition and delivery through complex event processing. We have a very rich range of offerings that help clients build a Hadoop environment that can scale.
Our Hadoop platform is the most real time capable of all in the industry. We are differentiated by our sheer breadth, sophistication and functional depth and tooling integrated in our Hadoop platform. We are differentiated by our streaming offering integrated into the Hadoop platform. We also offer a great range of modeling and analysis tools, pretty much more than any other offering in the Big Data space.
Attached- Jim’s slides from Hadoop World
Ajay- Any plans for Mahout for Hadoop?
Jim- I can’t speak about product plans. We have plans but I can’t tell you anything more. We do have a feature in BigInsights called SystemML, a library for machine learning.
Ajay- How integral are acquisitions for IBM in the Big Data space (Netezza, Cognos, SPSS etc)? Is it true that everything that you have in Big Data is acquired, or is the famous IBM R and D contributing here? (See a partial list of IBM acquisitions at http://www.ibm.com/investor/strategy/acquisitions.wss )
Jim- We have developed a lot on our own. We have the deepest R and D of anybody in the industry in all things Big Data.
For example – Watson has BigInsights Hadoop at its core. Apache Hadoop is the heart and soul of Big Data (see http://www-01.ibm.com/software/data/infosphere/hadoop/ ). A great deal of what makes BigInsights so differentiated is that not everything in it has been built by the Hadoop community.
We have built additions out of the necessity for security, modeling, monitoring, and governance capabilities into BigInsights to make it truly enterprise ready. That is one example of where we have leveraged open source and we have built our own tools and technologies and layered them on top of the open source code.
Yes of course we have done many strategic acquisitions over the last several years related to Big Data Management and we continue to do so. This quarter we have done 3 acquisitions with strong relevance to Big Data. One of them is Vivisimo (http://www-03.ibm.com/press/us/en/pressrelease/37491.wss ).
Vivisimo provides federated Big Data discovery, search and profiling capabilities to help you figure out what data is out there and what the relevance of that data is to your data science project, and so to help you answer the question of which data you should bring into your Hadoop cluster.
We also did Varicent, which is more performance management, and we did TeaLeaf, which is a customer experience solution provider, where customer experience management and optimization is one of the hot killer apps for Hadoop in the cloud. We have done a great many acquisitions that have a clear relevance to Big Data.
Netezza already had a massively parallel analytics database product with an embedded library of models called Netezza Analytics, and in-database capabilities to massively parallelize Map Reduce and other analytics management functions inside the database. In many ways, Netezza provided capabilities similar to those that IBM had provided for many years under the Smart Analytics Platform (http://www-01.ibm.com/software/data/infosphere/what-is-advanced-analytics/ ).
There is a differential between Netezza and ISAS.
ISAS was built predominantly in-house over several years. If you go back a decade, IBM acquired Ascential Software, a product portfolio that was the heart and soul of IBM InfoSphere Information Manager, which is core to our Big Data platform. In addition to Netezza, IBM bought SPSS two years back. We already had data mining tools and predictive modeling in the InfoSphere portfolio, but we realized we needed to have the best of breed. SPSS provided that, and so IBM acquired them.
Cognos- We had some BI reporting capabilities in the InfoSphere portfolio that we had built ourselves and also acquired to various degrees through prior acquisitions. But clearly Cognos was one of the best BI vendors, and we were lacking such a rich tool set in our product in visualization and cubing, and so for that reason we acquired Cognos.
There is also Unica, which is a marketing campaign optimization product and in many ways a killer app for Hadoop. Projects like that are driving many enterprises.
Ajay- How would you rank order these acquisitions in terms of strategic importance rather than date of acquisition or price paid?
Jim- Think of Big Data as an ecosystem that has components that are fitted to particular functions for data analytics and data management. Is the database the core, or the modeling tool the core, or the governance tools the core, or is the hardware platform the core? Everything is critically important. We would love to hear from you what you think have been most important. Each acquisition has helped play a critical role in building the deepest and broadest solution offering in Big Data. We offer the hardware, software, professional services, the hosting service. I don’t think there is any validity to a rank order system.
Ajay- What are the initiatives regarding open source that the Big Data group has done or is planning?
Jim- What we are doing now: we are very much involved with the Apache Hadoop community. We continue to evolve the open source code that everyone leverages. We have built BigInsights on Apache Hadoop. With our BigInsights 1.4, we are the closest and most up to date, in terms of version number, to Apache Hadoop (HBase, HDFS, Pig etc) of all commercial distributions.
We have an R library integrated with BigInsights. We have an R library integrated with Netezza Analytics. There is support for R models within the SPSS portfolio. We already have a fair amount of support for R across the portfolio.
Ajay- What are some of the concerns (privacy, security, regulation) that you think can dampen the promise of Big Data?
Jim- There are no showstoppers; there is really a strong momentum. Some of the concerns within the Hadoop space are the immaturity of the technology, the immaturity of some of the commercial offerings out there that implement Hadoop, and the lack of standardization in any formal sense for Hadoop.
There is no Open Standards Body that declares or ratifies the latest version of Mahout, Map Reduce, HDFS etc. There is no industry consensus reference framework for layering these different sub projects. There are no open APIs. There are no certifications or interoperability standards, or organizations to certify different vendors’ interoperability around a common API or framework.
The lack of standardization is troubling in this whole market. That creates risks for users because users are adopting multiple Hadoop products. There are lots of Hadoop deployments in the corporate world built around Apache Hadoop (purely open source). There may be no assurance that these multiple platforms will interoperate seamlessly. That’s a huge issue in terms of just magnifying the risk. And it increases the need for the end user to develop their own custom integrated code if they want to move data between platforms, or move map-reduce jobs between multiple distributions.
Also governance is a consideration. Right now Hadoop is used for high volume ETL on multi structured and unstructured data sources, or Hadoop is used for exploratory sandboxes for data scientists. These are important applications that make up a majority of the Hadoop deployments. Some Hadoop deployments are stand alone unstructured data marts for specific applications such as sentiment analysis.
Hadoop is not yet ready for data warehousing. We don’t see a lot of Hadoop being used as an alternative to data warehouses for managing the single version of truth of system-of-record data. That day will come, but there needs to be out there in the marketplace a broader range of mature data governance mechanisms, master data management and data profiling products that enterprises can use to make sure the data inside their Hadoop clusters is clean and is the single version of truth. That day has not arrived yet.
One of the great things about IBM’s acquisition of Vivisimo is that a piece of that overall governance picture is discovery and profiling for unstructured data, and that has been done very well by Vivisimo for several years.
What we will see is that vendors such as IBM will continue to evolve security features inside our Hadoop platforms. We will beef up our data governance capabilities for this new world of Hadoop as the core of Big Data, and we will continue to build up our ability to integrate multiple databases in our Hadoop platform so that customers can use some data from a bit of Hadoop, some data from a bit of traditional relational data warehouse, and maybe some NoSQL technology for different roles within a very complex Big Data environment.
That latter hybrid deployment model is becoming standard across many enterprises for Big Data. A cause for concern is that when your Big Data deployment has a bit of Hadoop, a bit of NoSQL, a bit of EDW and a bit of in-memory, there are no open standards or frameworks for putting it all together in a unified way, not just for interoperability but also for deployment.
There needs to be a virtualization or abstraction layer for unified access to all these different Big Data platforms by the users/developers writing the queries, by administrators so they can manage data and resources and jobs across all these disparate platforms in a seamless unified way with visual tooling. That grand scenario, the virtualization layer is not there yet in any standard way across the big data market. It will evolve, it may take 5-10 years to evolve but it will evolve.
So, that’s the concern that can dampen some of the enthusiasm for Big Data Analytics.
About-
You can read more about Jim at http://www.linkedin.com/pub/james-kobielus/6/ab2/8b0 or
follow him on Twitter at http://twitter.com/jameskobielus
You can read more about IBM Big Data at http://www-01.ibm.com/software/data/bigdata/
New Economics Theories for the new Tech World
When I was doing my MBA (a decade ago), one of the principal theories on why corporations exist was 1) Shareholder Value creation (grow wealth for investors) and a notable second was 2) Stakeholder Value creation- creating jobs for societies, providing tax to countries, providing employees with stable employment and incentives, and of course creating monetary value for shareholders.
There were two ways you could raise money- debt or equity. Debt had the advantage of interest payments being tax deductible, but debt payments had to be met regularly. Equity had the advantage (for the company) that equity holders were the last ones to be paid if the company was closed down, and dividend payouts to stockholders could be deferred in a low revenue year or for planning reasons. That extra risk for equity holders is what justified the rate of return on equity being generally higher than the cost of debt.
Or in plain English, over the long term raising money from shareholders in exchange for stock was more expensive than selling bonds or borrowing from the banks.
Hybrid combinations of debt and equity were warrants and debentures that started off as one form of instrument and over a period of time gave much more flexibility and risk safety nets to both issuers and subscribers of capital. Another hybrid was stock options (now considered as a default option of rewarding employees in technology companies, but this was not always the case).
The use of call and put options in debentures, and the idea of a vesting period in stock options, was to promote long term stability and minimize fluctuations in stock prices and employee attrition, besides of course minimizing the weighted average cost of capital. Venture capital was another class of capital known for both huge rates of return and risk taking (?)
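For readers who want to see the weighted average cost of capital as arithmetic, here is a minimal sketch of the standard textbook formula, WACC = (E/V) x Re + (D/V) x Rd x (1 - Tc). The function name and all the numbers below are made up for illustration and do not describe any real company.

```python
def wacc(equity_value, debt_value, cost_of_equity, cost_of_debt, tax_rate):
    """Textbook weighted average cost of capital.

    All inputs are illustrative assumptions: market values of equity and
    debt, required returns as decimals, and a corporate tax rate.
    """
    total = equity_value + debt_value
    if total <= 0:
        raise ValueError("total capital must be positive")
    equity_weight = equity_value / total
    debt_weight = debt_value / total
    # Interest is tax deductible, so debt is costed after tax.
    after_tax_cost_of_debt = cost_of_debt * (1 - tax_rate)
    return equity_weight * cost_of_equity + debt_weight * after_tax_cost_of_debt


# Hypothetical firm: 600 of equity at 12%, 400 of debt at 6%, 30% tax rate.
example = wacc(600.0, 400.0, 0.12, 0.06, 0.30)
print(f"WACC: {example:.4f}")
```

With these made-up inputs, debt costs only 4.2% after tax against 12% for equity, which is the "debt is cheaper than equity" point from the paragraphs above.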
But in today’s world, where a Google has three classes of shares, companies trade shares before IPOs, and valuations of technology companies sink and rise by huge percentages over weeks (especially as they near IPO dates), I wonder if traditional theories in finance need a much stronger overhaul.
Or do markets need a regulatory overhaul that would enable stock exchanges to have once more the credibility they had as the primary sources of raising capital?
Who will guard the guardians? Their conscience- the regulators or the news media?
There are ways of raising money that are not evil.
But they are not perfectly fair either.
Revolution Webinar Series #Rstats
Revolution Analytics Webinar-
Featured Webinar
David Champagne, CTO, Revolution Analytics
Tuesday, December 20th, 11:00AM – 11:30AM Pacific
Traditional IT infrastructure is simply unable to meet the demands of the new “Big Data Analytics” landscape. Many enterprises are turning to the “R” statistical programming language and Hadoop (both open source projects) as a potential solution. This webinar will introduce the statistical capabilities of R within the Hadoop ecosystem. We’ll cover:
- An introduction to new packages developed by Revolution Analytics to facilitate interaction with the data stores HDFS and HBase so that they can be leveraged from the R environment
- An overview of how to write Map Reduce jobs in R using Hadoop
- Special considerations that need to be made when working with R and Hadoop.
We’ll also provide additional resources that are available to people interested in integrating R and Hadoop.
Upcoming Webinars
Wed, Dec 14th, 11:00AM – 11:30AM PT
Revolution R Enterprise – 100% R and More
R users already know why the R language is the lingua franca of statisticians today: because it’s the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this webinar, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.
Free online education by Stanford and MIT
One more reason American education is the best in the world- it has a big heart.
Stanford just announced free courses starting from Jan 2012- and they are online (so no visa blues), free (as in speech and as in beer), and just the same as the actual courses (yes, the homework will have to be done, and the dog cannot eat the homework).
MIT meanwhile has 2000 courses at http://ocw.mit.edu/courses/ - but I liked Stanford’s minimal, clutter free interface (I read Steve Jobs’ biography- the interface hangover continues).
Hurrah for Stanford!
MIT needs to DESIGN their free online courses website and maybe do more search engine optimization at
Mafia Wars 2 -The review of the new game on G+
Mafia Wars 2 tries to be Steve Jobs working on a PC. It trades storyline for fancier graphics, smoother icons and design, and the numeric fun of piling up scores is almost gone. Did you mention social gaming? There are hardly any incentives for the “social” part of the game. It looks like a gaudy Vegas twist on a beloved franchise.
Leave sequels to the boys in Hollywood, Pincus/Zynga- you will need to be much more original to create the next blockbuster. Of course, given the size of Zynga’s captive addict-base, the game will be a hit. But it will be a hit more like Transformers 3 than the likes of Blade Runner or Terminator 2.
May we suggest another NEW game, rather than rebooting the squeezed lemon juice of a beloved and now departed friend-chise.
PS The song is irritating too, and the skimpily clad players are insulting.
10 Ways We will miss Steve Jobs
I am not an Apple fanboy. In fact I don’t use a Mac (because Linux works well for me at much cheaper rates).
I am going to miss Steve Jobs like I miss …… still.
1) The Original Pirate – I liked Steve Jobs ever since I saw Pirates of Silicon Valley, I wanted to be like the Jobs who created jobs http://en.wikipedia.org/wiki/Pirates_of_Silicon_Valley
Artists steal. Yeah baby!
2) Music- iTunes. Improbably, the man who came up with the idea of music @ 99 cents helped more artists earn money in the era of Napster. Music piracy is not dead, but at 99 cents you CAN afford the songs.
3) Aesthetics and Design- as competitive barriers. It was all about the interface. People care about interfaces. Shoddy software won’t sell.
4) Portable Music- yes, I once wrote a poem on my first iPod. http://www.decisionstats.com/ode-to-an-ipod/ No, it does not rank among the top ten poems on the iPod in SERP.
The Walkman’s evolution was the iPod – and it was everywhere.
5) Big Phones can be cool too- I loved my iPhone and so did everyone. But that’s because making cool phones before that was all about making the tiniest, thinnest phone. Using video chat and web surfing on the iPhone was way cooler than anything before or since.
6) Apps for Money for Geeks. Yes the Apps marketplace was more enriching to the geek universe than all open source put together.
7) Turtleneck Steve- You knew when Steve Jobs was about to make a presentation, because for one week before and one week after, the whole tech media behaved either like fanboys or like “we are too cool to be Apple fanboys but we will report it still”. The man who wrote no code sold more technology than everyone else using just a turtleneck and presentations.
8) Pixar toons- Yes Pixar toons made sure cartoons were pieces of art and not just funny stuff anymore. This one makes me choke up
9) Kicking Microsoft butt- Who else but Steve could borrow money from MS and then beat it in every product he wanted to?
10) Not being evil. Steve Jobs made more money for more geeks than anyone, and he made it look good! The original DON’T BE EVIL guy who never needed to say it aloud.
Take a bow Steve Jobs (or touch the first Apple product that comes to your hand after reading this!)
The article was first written on Aug 25, 2011 on the news of Steve Jobs’ resignation. It has been updated to note his departing from this planet as of yesterday.








