Analyzing Conversations on Twitter

If you are a marketing , analyst relationship, public relationship or a product manager who uses or abuses social media, you sometimes need to track what influencers and analysts are saying. A tool called Bettween allows you to capture public conversations between two influential (or interesting) tweeps.

See conversations between Neil Raden http://www.beyeblogs.com/raden/ and Curt Monash http://www.dbms2.com/ two noted BI gurus

http://bettween.com/neilraden/curtmonash

  • @NEILRADEN66
  • @CURTMONASH61
  • TOTAL MESSAGES127
  • SHARE CONVERSATION


unless Google decides to license its Wave technology to Twitter for separate encrypted , or public tweets. 🙂 They do share some history and employees (cough cough) or Twitter waits to create or better its public /protected tweet mode to be more granular

http://bettween.com/neilraden/curtmonash#statistics

tools to analyze Twitter conversations in SAS

Business Analytics Analyst Relations /Ethics/White Papers

Curt Monash, whom I respect and have tried to interview (unsuccessfully) points out suitable ethical dilemmas and gray areas in Analyst Relations in Business Intelligence here at http://www.dbms2.com/2010/07/30/advice-for-some-non-clients/

If you dont know what Analyst Relations are, well it’s like credit rating agencies for BI software. Read Curt and his landscaping of the field here ( I am quoting a summary) at http://www.strategicmessaging.com/the-ethics-of-white-papers/2010/08/01/

Vendors typically pay for

  1. They want to connect with sales prospects.
  2. They want general endorsement from the analyst.
  3. They specifically want endorsement from the analyst for their marketing claims.
  4. They want the analyst to do a better job of explaining something than they think they could do themselves.
  5. They want to give the analyst some money to enhance the relationship,

Merv Adrian (I interviewed Merv here at http://www.dudeofdata.com/?p=2505) has responded well here at http://www.enterpriseirregulars.com/23040/white-paper-sponsorship-and-labeling/

None of the sites I checked clearly identify the work as having been sponsored in any way I found obvious in my (admittefly) quick scan. So this is an issue, but it’s not confined to Oracle.

My 2 cents (not being so well paid 😉 are-

I think Curt was calling out Oracle (which didnt respond) and not Merv ( whose subsequent blog post does much to clarify).

As a comparative new /younger blogger in this field,
I applaud both Curt to try and bell the cat ( or point out what everyone in AR winks at) and for Merv for standing by him.

In the long run, it would strengthen analyst relations as a channel if they separate financial payment of content from bias. An example is credit rating agencies who forgot to do so in BFSI and see what happened.

Customers invest millions of dollars in BI systems trusting marketing collateral/white papers/webinars/tests etc. Perhaps it’s time for an industry association for analysts so that individual analysts don’t knuckle down under vendor pressure.

It is easier for someone of Curt, Merv’s stature to declare editing policy and disclosures before they write a white paper.It is much harder for everyone else who is not so well established.

White papers can take as much as 25,000$ to produce- and I know people who in Business Analytics (as opposed to Business Intelligence) slog on cents per hour cranking books on R, SAS , webinars, trainings but there are almost no white papers in BA. Are there any analytics independent analysts who are not biased by R or SAS or SPSS or etc etc. I am not sure but this looks like a good line to  pursue 😉 – provided ethical checks and balances are established.

Personally I know of many so called analytics communities go all out to please their sponsors so bias in writing does exist (you cant praise SAS on a R Blogging Forum or R USers Meet and you cant write on WPS at SAS Community.org )

– at the same time someone once told me- It is tough to make a living as a writer, and that choice between easy money and credible writing needs to be respected.

Most sponsored white papers I read are pure advertisements, directed at CEOs rather than the techie community at large.

Almost every BI vendor claims to have the fastest database with 5X speed- and benchmarking in technical terms could be something they could do too.

Just like Gadget sites benchmark products, you can not benchmark BI or even BA products as it is written not to do so  in many licensing terms.

Probably that is the reason Billions are spent in BI and the positive claims are doubtful ( except by the sellers). Similarly in Analytics, many vendors would have difficulty justifying their claims or prices if they are subjected to a side by side comparison. Unfortunately the resulting confusion results in shoddy technology coming stronger due to more aggressive marketing.

Open Source and Software Strategy

Curt Monash at Monash Research pointed out some ongoing open source GPL issues for WordPress and the Thesis issue (Also see http://ma.tt/2009/04/oracle-and-open-source/ and  http://www.mattcutts.com/blog/switching-things-around/).

As a user of both going upwards of 2 years- I believe open source and GPL license enforcement are general parts of software strategy of most software companies nowadays. Some thoughts on  open source and software strategy-Thesis remains a very very popular theme and has earned upwards of 100,000 $ for its creator (estimate based on 20k plus installs and 60$ avg price)

  • Little guys like to give away code to get some satisfaction/ recognition, big guys give away free code only when its necessary or when they are not making money in that product segment anyway.
  • As Ethan Hunt said, ” Every Hero needs a Villian”. Every software (market share) war between players needs One Big Company Holding more market share and Open Source Strategy between other player who is not able to create in house code, so effectively out sources by creating open source project. But same open source propent rarely gives away the secret to its own money making project.
    • Examples- Google creates open source Android, but wont reveal its secret algorithm for search which drives its main profits,
    • Google again puts a paper for MapReduce but it’s Yahoo that champions Hadoop,
    • Apple creates open source projects (http://www.apple.com/opensource/) but wont give away its Operating Source codes (why?) which help people buys its more expensive hardware,
    • IBM who helped kickstart the whole proprietary code thing (remember MS DOS) is the new champion of open source (http://www.ibm.com/developerworks/opensource/) and
    • Microsoft continues to spark open source debate but read http://blogs.technet.com/b/microsoft_blog/archive/2010/07/02/a-perspective-on-openness.aspx and  also http://www.microsoft.com/opensource/
    • SAS gives away a lot of open source code (Read Jim Davis , CMO SAS here , but will stick to Base SAS code (even though it seems to be making more money by verticals focus and data mining).
    • SPSS was the first big analytics company that helps supports R (open source stats software) but will cling to its own code on its softwares.
    • WordPress.org gives away its software (and I like Akismet just as well as blogging) for open source, but hey as anyone who is on WordPress.com knows how locked in you can get by its (pricy) platform.
    • Vendor Lock-in (wink wink price escalation) is the elephant in the room for Big Software Proprietary Companies.
    • SLA Quality, Maintenance and IP safety is the uh-oh for going in for open source software mostly.
  • Lack of IP protection for revenue models for open source code is the big bottleneck  for a lot of companies- as very few software users know what to do with source code if you give it to them anyways.
    • If companies were confident that they would still be earning same revenue and there would be less leakage or theft, they would gladly give away the source code.
    • Derivative softwares or extensions help popularize the original softwares.
      • Half Way Steps like Facebook Applications  the original big company to create a platform for third party creators),
      • IPhone Apps and Android Applications show success of creating APIs to help protect IP and software control while still giving some freedom to developers or alternate
      • User Interfaces to R in both SAS/IML and JMP is a similar example
  • Basically open source is mostly done by under dog while top dog mostly rakes in money ( and envy)
  • There is yet to a big commercial success in open source software, though they are very good open source softwares. Just as Google’s success helped establish advertising as an alternate ( and now dominant) revenue source for online companies , Open Source needs a big example of a company that made billions while giving source code away and still retaining control and direction of software strategy.
  • Open source people love to hate proprietary packages, yet there are more shades of grey (than black and white) and hypocrisy (read lies) within  the open source software movement than the regulated world of big software. People will be still people. Software is just a piece of code.  😉

(Art citation-http://gapingvoid.com/about/ and http://gapingvoidgallery.com/

CommeRcial R- Integration in software

Some updates to R on the commercial side.

Revolution Computing is apparently now renamed Revolution Analytics. Hopefully this and the GUI development will help pay more focused attention on working in R in a mainstream office situation. I am still waiting for David Smith’s cheery hey-guys-we-changed-again blog post though at a new site called inside-r.org/ or his old blog site at blog.revolution-computing.com

They probably need to hire more people now – Curt Monash, noted all-things-data software guru has the inside dope here

Techworld writes more here at http://www.techworld.com.au/article/345288/startup_wants_r_alternative_ibm_sas

The company’s software is priced “aggressively” versus IBM and SAS. A single supported workstation costs $2,000 for an annual subscription. Pricing for server-based licenses varies depending on the implementation.

But Revolution Analytics faces a tough challenge from those larger vendors, as well as the likes of XLSolutions, which offers R training and a competing software package, R-Plus.

SPSS though continues to integrate R solidly and also march ahead with Python (which is likely to be the next gen in statistical programming if it keeps up) http://insideout.spss.com/

With the release of Version 18 of IBM SPSS Statistics and the Developer product, easy-to-install versions of the Python and R materials are posted.  In particular, look for the R Essentials link on the main page or from the Plugins page.  It installs the R Plugin, the correct version of R, and a bunch of example R integrations as bundles.  It’s much easier to get going with this now.

Netezza , a business intelligence vendor promises more integration and even a training in R based analytics here

R Modeling for TwinFin i-Class

Objective
Learn how to use TwinFin i-Class for scaling up the R language.

Description
In this class, you’ll learn how to use R to create models using huge data and how to create R algorithms that exploit our asymmetric massively parallel (AMPP®) architecture. Netezza has seamlessly integrated with R to offload the heavy lifting of the computational processing on TwinFin i-Class. This results in higher performance and increased scalability for R. Sign up for this class to learn how to take advantage of TwinFin i-Class for your R modeling. Topics include:

  1. R CRAN package installation on TwinFin i-Class
  2. Creating models using R on TwinFin i-Class
  3. Creating R algorithms for TwinFin i-Class

Format
Hands-on classroom lecture, lab exercises, tour

Audience
Knowledgeable R users – modelers, analytic developers, data miners

Course Length
0.5 day: 12pm-4pm Wednesday, June 23 OR 8am-12pm Thursday, June 24 OR 1pm-5pm Thursday, June 24, 2010

Delivery
Enzee Universe 2010, Boston, MA

Student Prerequisites

  • Working knowledge of R and parallel computing
  • Have analytic, compute-intensive challenges
  • Understanding of data mining and analytics”

My favourite GUI in stats , JMP (also from SAS Institute) is going to deploy R integration as soon as this September – Read more here- http://www.sas.com/news/preleases/JMP-to-R-integrationSGF10.html

Also SAS-IML studio is not lagging behind

The next release of SAS/IML will extend R integration to the server environment – enabling users to deploy results in batch mode and access R from SAS on additional platforms, such as UNIX and Linux.

I am kind of happy at one of the best GUI’s integrating with one of the most innovative stats softwares. It’s like two of your best friends getting married. (see screenshots of the softwares)

All in all- R as a platform making good overall progress from all sides of the corporate software spectrum which can only be good for R developers as well as users/students.

Advanced Analytics on Multi-Terabyte Datasets- Conferences

Some news on Data Mining 2009 by Aster Data –

SAS and Aster Data to Present “Advanced Analytics on Multi-Terabyte Datasets” at M2009 in Las Vegas – Oct. 26-27
Learn how the tight coupling of SQL and MapReduce provided by Aster Data creates new ‘big data’ analytics opportunities when combined with SAS. Aster Data will exhibit throughout the event.
More

And also a nice  webcast by Curt Monash on the same Big Data topic-

Mastering MapReduce Webinar Series, Session 1
“Big Data Reality: The Role of MapReduce in Big Data Management and Analysis”- Oct. 15
Industry analyst Curt Monash explains the basics of MapReduce, key uses cases, and which industries and applications are heavily using MapReduce. Topics include recommendations for integrating MapReduce in an enterprise business intelligence and data warehousing environment.
More

Also,

Here is a brief synopsis on the Aster Data ( http://www.facebook.com/pages/Aster-Data-Systems/5601042375) Sponsored Big Data Summit  ( http://www.facebook.com/pages/Big-Data-Summit/143312171156 )which I attended-

  • A Plan for Large Scale Data Analytics: How to Utilize Aster nCluster and Hadoop in a Symbiotic
    Relationship to Support Processing in Excess of 100 Billion Rows Per Month
    – Michael Brown and Will Duckworth
    (EVP, Software Engineering, comScore, Inc. and Director, Software Engineering, comScore, Inc.)

This talked of the special needs of Com Score in handling big data and why Map Reduce and Hadoop seem to be the cost effective solutions for big big data while RDBMS seems stuck in the middle of middle data. Broadly informative on the statistical challenges of the future given the explosion of data as well.

  • Making Sense of Hadoop – Its Fit With Data Warehouses – Colin White
    (President and Founder of BI Research)

Colin brought a nice perspective on the open source Hadoop vis a vis the Properietary packages and the traditional DBMS. His perspective on the solution is no software is perfect for all needs while all softwares that sell have their own good points while the converging solution could be a heterogeneous solution of the above.

  • MapReduce Inside a Database System – When and How Case Studies from ShareThis, Specific Media, and Other – Tasso Argyros (Chief Technology Officer and Co-Founder of Aster Data)

This was a more detailed look at the Big Product Launch ( the Hadoop Connector) by Tasso and an interesting look at time series analysis using nPath rather than SQL . Interesting given the ongoing convergence analytics and business intelligence.

Also Tasso lived up to his presenting charm with an excellent pitch on nPath (as his interview said ).

  • Large-Scale Analytics at LinkedIn – Jonathan Goldman
    (Former Principal Scientist at LinkedIn)

This was nice given Jonathan’s perscpective ( he has Phd In Physics) and now does consulting for LinkedIn while maintaining his interests in education- the special needs for social media websites, designing experiments on the fly with huge real time datasets as well as some interesting visualizations (like India and America have the second biggest cross country Li connections after USA- UK. Apparently Linkedin ( http://www.facebook.com/group.php?gid=2211231478 ) does not sound so good when translated in Chinese ( AT Dinner I learnt from a fellow Chinese student that China censors Facebook – sigh!).

  • Networking Mixer: Beer, wine, hot hors d’oeuvres

I got interviewed ( AFTER) I had mixed some Beer and Wine for myself. The Video interview which was the first video interview I have given ( You know- I have taken SOME interviews by Email and plan to do some more while in Vegas for the Data Mining 2009  with SAS http://www.facebook.com/group.php?gid=2227381262)

They are still editing that interview 😉

—That was all – you need to send me a Facebook invite to see the rest of the NY trip or better still just join the Facebook page of Decision Stats at

http://www.facebook.com/pages/DecisionStats/191421035186

After two weeks I hope to have some more coverage on Data Mining 2009 while at the same time enjoying my much needed Fall Break-  Life at University at Tennessee is looking up ( since we beat Georgia 45-19 🙂 )

r*xE5HeUJa(%

Buddypress for Analytical Buddies??

Let us assume there are top 100 analysts in the world mostly using WordPress or Typepad or Blogger to make posts

Managing them is quite a challenge.

What is marketing ROI of analyst relationships for a Business Intelligence vendor- Curt Monash is the Aerosmith of Business Intelligence Analysts so he can tell it better.

How about a magical community where you just use their mostly Feedburner of Feedblitz RSS feeds to create a self automated community.

Serach Engine Optimization can be tricked by keeping that community website free from Google or Search Engines ( yes it can be done).

Use numerical etc as in Linkedin to spur rivalry by shifting their page positions up and down, or by clicking repeatedly on some posts to manipulate their views on blog posts.

What would SAS pay to have all SAS analysts in one webpage. or SPSS to have all SPSS analysts in one webpage.

Six months later suddenly open the website for search engines, and the RSS feed has downloaded all the posts of all the top 50 analysts of the world. Google advertsing wont matter because hey we have a mega vendor sponsor- while individual bloggers / analysts have no collective strength now as the community is too big.

So much blah blah-

What software would you use.

you can choose between

Ning.com ( but it mostly non Blog feeds based)

or Wordframe.com ( which interface and name sounds suspiciously like WordPress software)

Or you can choose a customized WordPress Solution called Buddy Press.

Here is the software-

BuddyPress

BuddyPress will transform an installation of WordPress MU into a social network platform.

BuddyPress is a suite of WordPress plugins and themes, each adding a distinct new feature. BuddyPress contains all the features you’d expect from WordPress but aims to let members socially interact. Read More ?

Note this was just a generic case study for making a case for open source based community softwares. Resemblance to any thing is a matter of coincidence – except for Curt Monash of course.

Cost of Customized WordPress Software for communties is a big zero- it is free and open source and tjousands of plugins can be installed and maintained for it.

See an existing installation here

www.decisionstats.com/community

or at www.buddypress.org