High Performance Analytics

Marry Big Data analytics to high performance computing, and you get this season's buzzword: High Performance Analytics.

It basically consists of parallelized code running on custom hardware, in-database analytics for speed, and cloud computing or high performance computing environments. On an operational level, it consists of analytics software partnering with data software (databases, MapReduce, Hadoop) plus some hardware (mostly from HP or IBM). It is considered a high-margin, highly profitable business with a small number of deals compared to, say, desktop licenses.
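The "parallelized code" idea rests on splitting one computation into independent pieces whose small results can be combined afterwards. Below is a minimal Python sketch, assuming a simple mean as the analytic task. `partial_stats` and `parallel_mean` are hypothetical names, and threads only stand in for the separate cores or nodes a real HPC or MapReduce system would use (in CPython, threads will not actually speed up this arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    # "Map" step: each worker summarizes its chunk as (sum, count)
    # instead of shipping the raw values back.
    return sum(chunk), len(chunk)


def parallel_mean(values, n_workers=4):
    if not values:
        raise ValueError("cannot take the mean of an empty list")
    # Split the data into roughly equal chunks, one per worker.
    size = max(1, -(-len(values) // n_workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Threads illustrate the structure only; a real system would run the
    # map step on separate processes, cores or machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    # "Reduce" step: combine the small per-chunk summaries.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(parallel_mean(list(range(1, 101))))  # 50.5
```

The (sum, count) pair is what makes a mean easy to parallelize; a statistic like the median cannot be combined from fixed-size per-chunk summaries this way and needs a different strategy.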

According to HPC Wire, a great newsletter for keeping up to date on HPC, SAS Institute has been busy on this front, partnering with EMC Greenplum and Teradata (which also acquired SAS partner Aster Data to gain a much-needed foothold in the MapReduce/SQL space).

Save the Data

I just came across an online campaign here:

http://sunlightfoundation.com/savethedata/

Some of the most important technology programs that keep Washington accountable are in danger of being eliminated. Data.gov, USASpending.gov, the IT Dashboard and other federal data transparency and government accountability programs are facing a massive budget cut, despite only being a tiny fraction of the national budget. Help save the data and make sure that Congress doesn’t leave the American people in the dark.

I wonder why the federal government or nonprofit agencies cannot help create a SPARQL database, and, in these days of cloud computing, why a tech major cannot donate storage space for it. After all, despite the high US corporate tax rate, US technology companies do end up paying a lower effective rate thanks to tax breaks and routing revenue overseas.

In this new age, data is power, and the US has led the way in using technology to further its values, especially in the Middle East. The datasets should be made public and transitioned to the private sector and academia for research and redesign, enabling data augmentation without further straining a budget already burdened by a massive deficit, heavy borrowing and the cost of fighting three wars. Of particular interest would be datasets on campaign finances and donors, especially given the large number of small retail donors and the role of internet marketing in elections; these would also serve as an example of democracy and change. Even countries like China could create an internal dashboard, with restricted access rights, that tracks corruption and expense efficiency to help with rural and urban governance.

Why search optimization can make you like Rebecca Black

A highly optimized blog post or piece of web content can get you a lot of attention, just like Rebecca Black’s video (provided it passes the search engine's new quality metrics).

But if the underlying content is weak, or based on a shoddy understanding of the subject, it can attract lots of horrid comments and ensure that bad word of mouth spreads about the content, or about you, despite your hard work.

An example of this is copy-and-paste journalism, especially in technology circles, where even a website or blog with a higher PageRank can get away with scraping or stealing content from one or more lower-ranked websites after adding a cursory “expert comment”. The same happens when someone who is basically a corporate communications or PR (public relations) specialist is given a technical text and encouraged to write about it without completely understanding it.

A mild technical defect in the search engine algorithm is that it does not seem to pay attention to when the content was published, so the copying website or blog can pass as fresher content even if it has practically 90% of the same words. The second flaw is excessive or manual punishment of heavy linking; this can encourage search-optimization-minded people to hoard links or discourage trackbacks.
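Taking publication date into account for near-copies is not hard in principle. Here is a minimal Python sketch using word shingles and Jaccard similarity, a standard near-duplicate technique; it is not how any particular search engine works, and the 0.5 threshold, the function names and the example URLs are all hypothetical. Among documents that are near-duplicates of each other, it credits the earliest-published one as the likely original.

```python
from datetime import date


def shingles(text, k=3):
    # Overlapping k-word sequences ("shingles") of the lowercased text.
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    # Fraction of shingles the two documents share.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def likely_original(docs, threshold=0.5, k=3):
    """docs: list of (url, published_date, text). For each URL, return the
    URL of the earliest-published near-duplicate (possibly itself)."""
    sets = [shingles(text, k) for _, _, text in docs]
    result = {}
    for i, (url, _, _) in enumerate(docs):
        # Every document is a near-duplicate of itself, so this is never empty.
        candidates = [docs[j] for j in range(len(docs))
                      if jaccard(sets[i], sets[j]) >= threshold]
        result[url] = min(candidates, key=lambda d: d[1])[0]
    return result


docs = [
    ("small-blog.example/post", date(2011, 3, 1),
     "the quick brown fox jumps over the lazy dog near the river bank"),
    ("big-site.example/story", date(2011, 3, 3),
     "expert comment: the quick brown fox jumps over the lazy dog near the river bank"),
    ("other.example/note", date(2011, 3, 2),
     "an entirely different article about data warehouses and analytics"),
]
result = likely_original(docs)
print(result)
```

Here the later, higher-profile copy is mapped back to the smaller blog that published first, because the added "expert comment" barely changes the shingle overlap.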

A free internet is one that promotes free sharing of content and does not encourage stealing, unauthorized scraping or content copying. Unfortunately, current search engine optimization can encourage scraping and content copying without paying much attention to the origin of the words.

In addition, the analytical rigor with which search algorithms search your inbox (as in searching all emails for a keyword) or media-rich sites (like YouTube) is on a different, lower level of quality altogether. The chances of getting garbage results are much higher when searching media content or emails.

IBM and Revolution team to create new in-database R

From the Press Release at http://www.revolutionanalytics.com/news-events/news-room/2011/revolution-analytics-netezza-partnership.php

Under the terms of the agreement, the companies will work together to create a version of Revolution’s software that takes advantage of IBM Netezza’s i-class technology so that Revolution R Enterprise can run in-database in an optimal fashion.

About IBM

For information about IBM Netezza, please visit: http://www.netezza.com.
For information on IBM Information Management, please visit: http://www.ibm.com/software/data/information-on-demand/
For information on IBM Business Analytics, please visit the online press kit: http://www.ibm.com/press/us/en/presskit/27163.wss
Follow IBM and Analytics on Twitter: http://twitter.com/ibmbizanalytics
Follow IBM analytics on Tumblr: http://smarterplanet.tumblr.com/tagged/new_intelligence
IBM YouTube Analytics Channel: http://www.youtube.com/user/ibmbusinessanalytics
For information on IBM Smarter Systems: http://www-03.ibm.com/systems/smarter/

About Revolution Analytics

Revolution Analytics is the leading commercial provider of software and services based on the open source R project for statistical computing. Led by predictive analytics pioneer Norman Nie, the company brings high performance, productivity and enterprise readiness to R, the most powerful statistics language in the world. The company’s flagship Revolution R product is designed to meet the production needs of large organizations in industries such as finance, life sciences, retail, manufacturing and media. Used by over 2 million analysts in academia and at cutting-edge companies such as Google, Bank of America and Acxiom, R has emerged as the standard of innovation in statistical analysis. Revolution Analytics is committed to fostering the continued growth of the R community through sponsorship of the Inside-R.org community site, funding of R user groups worldwide and free licenses of Revolution R Enterprise for everyone in academia.


Netezza, an IBM Company, is the global leader in data warehouse, analytic and monitoring appliances that dramatically simplify high-performance analytics across an extended enterprise. IBM Netezza’s technology enables organizations to process enormous amounts of captured data at exceptional speed, providing a significant competitive and operational advantage in today’s data-intensive industries, including digital media, energy, financial services, government, health and life sciences, retail and telecommunications.

The IBM Netezza TwinFin® appliance is built specifically to analyze petabytes of detailed data significantly faster than existing data warehouse options, and at a much lower total cost of ownership. It stores, filters and processes terabytes of records within a single unit, analyzing only the relevant information for each query.

Using Revolution R Enterprise & Netezza Together

Revolution Analytics and IBM Netezza have announced a partnership to integrate Revolution R Enterprise and the IBM Netezza TwinFin  Data Warehouse Appliance. For the first time, customers seeking to run high performance and full-scale predictive analytics from within a data warehouse platform will be able to directly leverage the power of the open source R statistics language. The companies are working together to create a version of Revolution’s software that takes advantage of IBM Netezza’s i-class technology so that Revolution R Enterprise can run in-database in an optimal fashion.

This partnership integrates Revolution R Enterprise with IBM Netezza’s high performance data warehouse and advanced analytics platform to help organizations combat the challenges that arise as complexity and the scale of data grow.  By moving the analytics processing next to the data, this integration will minimize data movement – a significant bottleneck, especially when dealing with “Big Data”.  It will deliver high performance on large scale data, while leveraging the latest innovations in analytics.

With Revolution R Enterprise for IBM Netezza, advanced R computations are available for rapid analysis of data volumes in the hundreds of terabytes, and can deliver 10-100x performance improvements at a fraction of the cost of traditional analytics vendors.

Teradata buys Aster Data for $260+ million

This just in! Big party in San Carlos this weekend.

Teradata is acquiring Aster Data’s business, including its intellectual property and technology product line, through a merger transaction. Teradata plans to support Aster Data’s customers and integrate its employees immediately upon completion of the acquisition, which is expected to occur in the second quarter of 2011. Teradata acquired an 11 percent ownership interest in Aster Data in September 2010, and has agreed to pay an additional $263 million for the remaining ownership interest, net of debt and other expenses. In addition, through this acquisition, Teradata will obtain approximately $21 million of cash which Aster Data is expected to have on its balance sheet at closing.

http://www.asterdata.com/news/110303-Teradata-to-Acquire-Aster-Data.php
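A rough back-of-envelope reading of the announced figures, as a Python sketch. It assumes the $263 million buys exactly the remaining 89% stake and ignores the "net of debt and other expenses" adjustment and whatever Teradata paid for its original 11%, which the announcement does not state here. The result is an illustration, not a valuation reported by either company.

```python
# Figures from the announcement, in US$ millions.
stake_already_owned = 0.11      # Teradata's existing 11% interest
payment_for_remainder = 263.0   # additional payment for the remaining interest
cash_acquired = 21.0            # cash expected on Aster Data's balance sheet

# Assumption: the $263M buys exactly the remaining 89% of the company.
implied_value_100pct = payment_for_remainder / (1 - stake_already_owned)

# Cash acquired at closing offsets part of the payment.
net_outlay_after_cash = payment_for_remainder - cash_acquired

print(f"Implied value of 100%: ${implied_value_100pct:.1f}M")
print(f"Net outlay after acquired cash: ${net_outlay_after_cash:.1f}M")
```

Under these assumptions the whole company is priced at a little under $300 million, and the net cash cost of the remaining stake is about $242 million.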

 

R Graphs Resources

Relevant GUIs-

GrapheR and Deducer

https://rforanalytics.wordpress.com/graphical-user-interfaces-for-r/

Websites-


Graphics by Examples

UCLA: Academic Technology Services, Statistical Consulting Group. From https://www.ats.ucla.edu/stat/R/gbe/default.htm (accessed Feb 10, 2011)

Quick-R

http://www.statmethods.net/graphs/

Graph Gallery

http://addictedtor.free.fr/graphiques/allgraph.php

Frank McCown

https://www.harding.edu/fmccown/r/

Detailed Tutorial

https://math.illinoisstate.edu/dhkim/rstuff/rtutor.html

Advanced Data Visualization

Hadley Wickham

Courses- http://had.co.nz/stat645/

and Package-  http://had.co.nz/ggplot2/

example-

http://had.co.nz/ggplot2/geom_density.html

Interview: David Katz, Dataspora/David Katz Consulting

Here is an interview with David Katz, founder of David Katz Consulting (http://www.davidkatzconsulting.com/) and an analyst at the noted firm Dataspora (http://dataspora.com/). He is a featured speaker at Predictive Analytics World (http://www.predictiveanalyticsworld.com/sanfrancisco/2011/speakers.php#katz).

Ajay- Describe your background working with analytics. How can we make analytics and science more attractive career options for young students?

David- I had an interest in math from an early age, spurred by reading lots of science fiction with mathematicians and scientists in leading roles. I was fortunate to be at Harry and David (Fruit of the Month Club) when they were at the forefront of applying multivariate statistics to the challenge of targeting catalogs and other snail-mail offerings. Later I had the opportunity to expand these techniques to the retail sphere with Williams-Sonoma, who grew their retail business with the support of their catalog mailings. Since they had several catalog titles and product lines, cross-selling presented additional analytic challenges, and with the growth of the internet there was still another channel to consider, with its own dynamics.

After helping to found Abacus Direct Marketing, I became an independent consultant, which provided a lot of variety in applying statistics and data mining in a variety of settings from health care to telecom to credit marketing and education.

Students should be exposed to the many roles that analytics plays in modern life, and to the excitement of finding meaningful and useful patterns in the vast profusion of data that is now available.

Ajay-  Describe your most challenging project in 3 decades of experience in this field.

David- Hard to choose just one, but the educational field has been particularly interesting. Partnering with Olympic Behavior Labs, we’ve developed systems to help identify students who are most at-risk for dropping out of school to help target interventions that could prevent dropout and promote success.

Ajay- What do you think are the top 5 trends in analytics for 2011?

David- Big Data, Privacy concerns, quick response to consumer needs, integration of testing and analysis into business processes, social networking data.

Ajay- Do you think techniques like RFM (recency, frequency, monetary value) and LTV (lifetime value) are adequately utilized by organizations? How can they be propagated further?

David- Organizations vary amazingly in how sophisticated or unsophisticated they are in analytics. A key factor in success as a consultant is to understand where each client is on this continuum and how well that serves their needs.
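For readers new to RFM, here is a minimal Python sketch of one common way to score customers on recency, frequency and monetary value by ranking them into bins. It is a generic illustration, not David's method; the function names, the three-bin choice and the sample transactions are hypothetical, and ties are simply broken by input order.

```python
from datetime import date


def rfm_table(transactions, as_of):
    """transactions: list of (customer_id, purchase_date, amount).
    Returns {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for cust, when, amount in transactions:
        last, freq, total = table.get(cust, (None, 0, 0.0))
        last = when if last is None or when > last else last
        table[cust] = (last, freq + 1, total + amount)
    return {c: ((as_of - last).days, f, m) for c, (last, f, m) in table.items()}


def rank_score(values, n_bins, higher_is_better=True):
    # Score 1..n_bins by rank position (simple quantile binning).
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * n_bins // len(values) + 1
    return scores


def rfm_scores(transactions, as_of, n_bins=3):
    table = rfm_table(transactions, as_of)
    customers = sorted(table)
    # Fewer days since the last purchase is better, so recency is reversed.
    r = rank_score([table[c][0] for c in customers], n_bins, higher_is_better=False)
    f = rank_score([table[c][1] for c in customers], n_bins)
    m = rank_score([table[c][2] for c in customers], n_bins)
    return {c: (r[i], f[i], m[i]) for i, c in enumerate(customers)}


transactions = [
    ("alice", date(2011, 3, 1), 120.0), ("alice", date(2011, 2, 1), 80.0),
    ("bob", date(2010, 12, 15), 40.0),
    ("carol", date(2011, 2, 20), 300.0), ("carol", date(2011, 1, 5), 50.0),
    ("carol", date(2010, 11, 1), 25.0),
]
scores = rfm_scores(transactions, date(2011, 3, 14))
print(scores)
```

A customer scored (3, 3, 3) is recent, frequent and high-spending; in a mailing program, the combined score is typically used to decide who receives the next offer.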

Ajay- What are the various software tools you have worked with in this field? Name your favorite per category.

David- I started out using COBOL (that dates me!), then concentrated on SAS for many years. More recently R has become my favorite because of its coverage, currency, programming model and debugging capabilities.

Ajay- Independent consulting can be a strenuous job. What do you do to unwind?

David- Cycling, yoga, meditation, hiking and guitar.

Biography-

David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting.

David Katz has been at the forefront of applying statistical models and database technology to marketing problems since 1980. He holds a Master’s Degree in Mathematics from the University of California, Berkeley. He is one of the founders of Abacus Direct Marketing and was previously the Director of Database Development for Williams-Sonoma.

He is the founder and President of David Katz Consulting, specializing in sophisticated statistical services for a variety of applications, with a special focus on the Direct Marketing Industry. David Katz has an extensive background that includes experience in all aspects of direct marketing from data mining, to strategy, to test design and implementation. In addition, he consults on a variety of data mining and statistical applications from public health to collections analysis. He has partnered with consulting firms such as Ernst and Young, Prediction Impact, and most recently on this project with Dataspora.

For more on David’s session at Predictive Analytics World, San Francisco, see http://www.predictiveanalyticsworld.com/sanfrancisco/2011/agenda.php#day2-16a

Room: Salon 5 & 6
4:45pm – 5:05pm

Track 2: Social Data and Telecom 
Case Study: Major North American Telecom
Social Networking Data for Churn Analysis

A North American Telecom found that it had a window into social contacts – who has been calling whom on its network. This data proved to be predictive of churn. Using SQL, and GAM in R, we explored how to use this data to improve the identification of likely churners. We will present many dimensions of the lessons learned on this engagement.

Speaker: David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting
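To make "who has been calling whom" concrete, here is a minimal Python sketch of one simple social churn feature: the share of each subscriber's distinct contacts who have already churned. This is a hypothetical illustration, not the feature set or SQL/GAM pipeline used in the engagement described above; the function name and sample data are made up.

```python
def churned_contact_share(calls, churned):
    """calls: iterable of (caller, callee) pairs; churned: set of subscriber
    ids who have already left. Returns, per subscriber, the fraction of
    distinct contacts (called or calling) who have churned."""
    contacts = {}
    for a, b in calls:
        if a == b:
            continue  # ignore self-calls
        # Treat the call graph as undirected: a call links both parties.
        contacts.setdefault(a, set()).add(b)
        contacts.setdefault(b, set()).add(a)
    return {
        sub: len(peers & churned) / len(peers)
        for sub, peers in contacts.items()
    }


calls = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A"), ("E", "F")]
share = churned_contact_share(calls, churned={"B", "F"})
print(share)
```

A feature like this would then sit alongside ordinary account variables as an input to a predictive model, for example a GAM, which can capture a non-linear relationship between the churned-contact share and the probability of churning.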

Exhibit Hours
Monday, March 14th: 10:00am to 7:30pm

Tuesday, March 15th: 9:45am to 4:30pm