terrific Tr.im trims Tweet time

Okay, the title of this post was a bad attempt at a haiku. But the tr.im plugin for Firefox is incredible and helps you tweet interesting reading in a matter of seconds. More importantly, it shows you the analytics behind how many actual users went to that particular tr.im URL. While Tr.im is yet another URL-shortening service like tinyurl.com and bit.ly, what makes Tr.im stand out in a terrific manner are the following innovations:

1) A user-friendly Firefox plugin that can be downloaded from https://addons.mozilla.org/en-US/firefox/addon/10232/

See the screenshot of the Tr.im panel, which conveniently opens on the left. The statistics can be seen in a separate window (note the TwitterFox application, also open on the right; that is a separate application).

2) Analytics for tracking the locations of people who click on the URL, and whether they were human or a bot.

3) Seamless Twitter integration even for multiple accounts

So it seems you will soon run out of excuses to stay away from Twitter, and all the additional social network data being generated could really help the next generation of response and online propensity models.

Tr.im that!!


KXEN – Automated Regression Modeling

I have used KXEN many times for building and testing propensity models. The regression modeling feature of KXEN is awesome in the sense that it makes models very easy to build and deliver.

The KXEN package K2R is responsible for this, and it uses robust regression. A word on the basic mathematical theory behind KXEN's automated modeling: the technique is called Structural Risk Minimization. You can read more on the underlying mathematics at http://www.svms.org/srm/. The following is an extract from that source.

Structural risk minimization (SRM) (Vapnik and Chervonenkis, 1974) is an inductive principle for model selection used for learning from finite training data sets. It describes a general model of capacity control and provides a trade-off between hypothesis space complexity (the VC dimension of approximating functions) and the quality of fitting the training data (empirical error). The procedure is outlined below.

  1. Using a priori knowledge of the domain, choose a class of functions, such as polynomials of degree n, neural networks having n hidden layer neurons, a set of splines with n nodes or fuzzy logic models having n rules.
  2. Divide the class of functions into a hierarchy of nested subsets in order of increasing complexity. For example, polynomials of increasing degree.
  3. Perform empirical risk minimization on each subset (this is essentially parameter selection).
  4. Select the model in the series whose sum of empirical risk and VC confidence is minimal.

Sewell (2006)

SVMs use the spirit of the SRM principle.

“Structural risk minimization (SRM) (Vapnik 1995) uses a set of models ordered in terms of their complexities. An example is polynomials of increasing order. The complexity is generally given by the number of free parameters. VC dimension is another measure of model complexity. In equation 4.37, we can have a set of decreasing λi to get a set of models ordered in increasing complexity. Model selection by SRM then corresponds to finding the model simplest in terms of order and best in terms of empirical error on the data.”
Alpaydin (2004), pages 80-81
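The four-step SRM procedure quoted above can be sketched with a toy example (all data below is invented, and the fixed per-degree penalty is only a crude stand-in for the VC confidence term, not the actual bound):

```python
import numpy as np

def srm_select(x, y, max_degree=6, penalty=0.05):
    """Pick the polynomial degree minimizing empirical risk plus
    a complexity penalty (a crude stand-in for the VC confidence term)."""
    best_degree, best_score = None, float("inf")
    for degree in range(1, max_degree + 1):          # step 2: nested subsets by degree
        coeffs = np.polyfit(x, y, degree)            # step 3: empirical risk minimization
        mse = np.mean((y - np.polyval(coeffs, x)) ** 2)
        score = mse + penalty * degree               # step 4: fit plus complexity
        if score < best_score:
            best_degree, best_score = degree, score
    return best_degree

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 2 * x**2 - x + rng.normal(0, 0.1, size=50)       # quadratic signal plus noise
print(srm_select(x, y))  # selects degree 2
```

The nested subsets here are polynomials of increasing degree, exactly the example in step 2; the real SRM bound replaces the linear penalty with a VC-dimension confidence term.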

Now back to the automated regression modeling.

Robust Regression

K2R is a universal solution for Classification, Regression, and Attribute Importance. It enables the prediction of behaviors (nominal targets) or quantities (continuous targets).

Unlike traditional regression algorithms, K2R can safely handle a very high number of input attributes (over 10,000) in an automated fashion. K2R provides indicators and graphs to ensure that the quality and robustness of trained models can be easily assessed. K2R graphically displays attribute importance, which gives the relative importance of each attribute for explaining a given business question. At the same time it gives a clear indication of which attributes either contain no relevant information or are redundant with other attributes.

Benefits: The business value of a data mining project is increased by either training more models or completing the project faster. The ability to train more models allows a larger number of scenarios to be tested at a higher level of granularity. For example, if a direct marketing campaign benefits from separate models trained per region, per customer segment, or per month, the automation of K2R allows all of these models to be trained and safely deployed using the same amount of resources as traditional tools, or fewer.

What: K2R is a regression algorithm that allows building models to predict categories or continuous variables.

Why: Traditionally, building robust predictive models required a lot of time and expertise, which prevented companies from using data mining as part of their everyday business decisions. K2R makes it easy to build and deploy predictive models in a fraction of the time it takes with classical statistical tools.

How: K2R maps a set of descriptive attributes (model inputs) to target attributes (model outputs). It uses an algorithm patented by KXEN, which is a derivation of the principle described by V. Vapnik as "Structural Risk Minimization." Instead of looking for the best performance on a known dataset, K2R automatically finds the best compromise between quality and robustness. The resulting models are expressed as a polynomial expression of the inputs. The only element specified by the user is the polynomial degree. To improve modeling speed, K2R can also build multi-target models.
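As a rough illustration of what such a polynomial model looks like once trained (the coefficients below are made up for illustration; KXEN's actual fitting algorithm is patented and not reproduced here), here is a degree-1 score over three hypothetical inputs:

```python
# Hypothetical degree-1 (linear) scoring polynomial over encoded inputs.
# These coefficients are invented; K2R would learn them under its
# quality-vs-robustness criterion rather than plain least squares.
coefficients = {"age": 0.012, "tenure_months": 0.004, "recent_purchases": 0.15}
intercept = -1.2

def score(customer):
    """Higher score means more likely to answer the business question with 'yes'."""
    return intercept + sum(w * customer[name] for name, w in coefficients.items())

print(round(score({"age": 40, "tenure_months": 24, "recent_purchases": 3}), 3))  # -0.174
```

Raising the user-specified degree would add squared and cross terms over the same inputs; the scoring logic stays the same.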

Benefits for the business user: K2R allows the business user to easily build and understand advanced predictive models without statistical knowledge. A model can be created in a matter of minutes. Two performance indicators describe model quality (Ki) and model reliability, or the ability to produce similar results on new data (Kr).

K2R graphically displays the individual variable contribution to the model, which helps to select the most important variables explaining a given business question. At the same time it avoids focusing on data that contains no information.

Models can be applied directly in a simulation mode, predicting the score for a single input record for a given business question in real time.

Benefits for the Data Mining expert: K2R frees time for data mining professionals to apply their expertise where they add more value, instead of spending several days tuning a model. K2R produces results within minutes (less than 15 seconds on a laptop for 50,000 rows and 20 variables).

Here is a case study from the company itself.

Marketing campaign usage scenario

* Send a “test mailing” to 5,000 customers to offer them a new product.
* Collect the results of your test mailing to build a “training” data set that associates things you knew about customers prior to the mailing with the answers to your business question.
* Train a model to “predict” the Yes/No answer.
* Check the quality and robustness of your model (Ki, Kr).
* Apply the model to the 1,000,000 other customers in your database: the model associates each individual customer with a probability of answering yes. Because you are using a robust model, the sum of these probabilities is a good indicator of how many people will answer yes to the mailing.
* Send your mailing only to those customers with a high probability of responding positively, or use the built-in profit curves to optimize your return on the campaign.
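The scoring arithmetic in the last two steps can be sketched in a few lines (the probabilities below are invented; any well-calibrated classifier would supply them):

```python
# Hypothetical scored customer base: each entry is P(answers yes).
scores = [0.02, 0.85, 0.40, 0.05, 0.91, 0.10, 0.77, 0.03]

# With a robust, well-calibrated model, the sum of probabilities
# estimates how many customers will respond.
expected_responders = sum(scores)

# Mail only the customers above a probability cut-off.
cutoff = 0.5
targets = [i for i, p in enumerate(scores) if p > cutoff]

print(round(expected_responders, 2))  # 3.13 expected yes answers
print(targets)                        # customers 1, 4 and 6 get the mailing
```

The same sum-of-probabilities trick only works when the model is calibrated, which is exactly why the robustness indicator (Kr) matters before deployment.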

Example: Regression: Dealer evaluation usage scenario

* Collect information about the past performance of your dealers two years ago and associate it with how much of your product they sold one year ago.
* Train a model to predict how much a dealer will sell based on the available information.
* Check the quality and robustness of the model (Ki, Kr).
* Apply the model to all of your dealers today: the model associates each dealer with an estimate of how many products he will sell.
* Sum the estimates to predict how much you will sell next year. This is the baseline for your sales forecast.

In my next post I will include screenshots showing how to build an automated regression model using KXEN.

Disclaimer: I am a consultant to KXEN for social networks. – Ajay

TwitterFox – Twitter for busy people

Here is a nice Firefox plugin for people who want to start using Twitter without losing too much time. It sits quietly in one corner and delivers gentle tweets. Think of it as a big instant messenger: big in terms of the number of followers, for people busy in terms of time, but very nice and comfortable. The screenshot says it all; all you need to do is use Firefox and install the plugin from http://twitterfox.net/.

Heavily recommended for non-users of Twitter who are curious about what this thing is all about.

Screenshots courtesy of myself and the gentle people at http://twitterfox.net/.


TwitterFox is a Firefox extension that notifies you of your friends’ tweets on Twitter.

This extension adds a tiny icon on the status bar which notifies you when your friends update their tweets. Also it has a small text input field to update your tweets.

Install TwitterFox

If you want to get updates of TwitterFox, feel free to follow @TwitterFox.

New Features and Changes in Version 1.7.7.1

  • Support for Firefox 3.1b3
  • Added a context menu to each tweet, which has:
    • Copy
    • Re-tweet
    • Open this tweet in new tab
    • Delete tweet
  • Auto extract is.gd and bit.ly links.
  • Added Mark all as read menu item to main context menu.
  • Increased contrast of background color of read/unread messages.
  • Added in-reply-to-status-id parameter for status update.
  • Added da-DK, th-TH, vi-VN, ar-SA, ar, and kw-GB translations.
  • Bug fixes.

Does Twitter reduce Blogging?

One more post on Twitter, you may sigh, but wait. I am examining Twitter as an economic complement or substitute to blogging, and trying to come up with a mathematical rule to disprove the Null Hypothesis:

Twitter does not affect the blogging of individuals or communities as a whole. Or does it?

Twitter reduces blogging because

  1. Twitter is easier to do; creating a blog is a different ball game.

  2. Tweeting is two-way and interactive, while blogging is mostly a one-way broadcast.

  3. People respond to tweets and retweet them much more than they comment on or forward blog posts. This is due to the inherent design of the software.

  4. Twitter is chaotic, but so is real life, in which the human brain processes and sorts information from colleagues, family and friends. Blogging has a structure that helps the reader more than the writer.

  5. It is easier to tweet, and faster to get your point across, than to blog.

  6. People allocate a set amount of time for social media activities and personal branding. This may be elastic, but not totally so. Hence the rise of Twitter time in people's lives means less time to read and write blogs.

Now to a more quantitative study.
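Before looking at the aggregate data, the shape of such a test can be sketched on invented numbers (the monthly counts below are entirely made up to show the mechanics; a real test needs per-blogger panel data):

```python
# Hypothetical monthly activity for one blogger: tweets sent vs posts written.
# A strongly negative correlation would support "Twitter reduces blogging".
tweets = [5, 20, 45, 80, 120, 150]
posts = [14, 12, 11, 9, 7, 5]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(tweets, posts)
print(round(r, 3))  # strongly negative for this invented series
```

Even a strongly negative correlation would not prove causation; it could just as easily reflect a third factor, such as the fixed social media time budget in reason 6 above.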

We get statistics from Technorati's State of the Blogosphere and add in WordPress stats to boot.

(credit -http://technorati.com/blogging/state-of-the-blogosphere/ )

A chart of total WordPress.com blogs since  launch:

(credit- http://en.wordpress.com/stats/ )

Note new signups can be seen for WordPress.com at http://en.wordpress.com/stats/signups/

Fatigue could be a reason why Twitter is hotting up while Blogging sees steady state growth.

The following figure from Technorati's 2008 report sums it up best.

http://technorati.com/blogging/state-of-the-blogosphere/who-are-the-bloggers/

But when I compare the June 2008 blogging-frequency numbers with the 2007 report, I am not able to reconcile the numbers.

(Source -http://technorati.com/blogging/state-of-the-blogosphere/the-how-of-blogging/ )

http://www.sifry.com/alerts/archives/000493.html

It seems that blog posts did get a boost from the 2008 elections, and the current low traffic may simply be due to a lack of issues in the blogosphere. The rise in Twitter traffic is also due to the creation of applications by third-party providers, a trend that has made Twitter the number three social media site.

Based on the data, it does not seem that Twitter reduces blog posts to a significant degree. After all, Twitter is also a great medium to disseminate and spread the word about good blog posts.

It is simply too early to say that Twitter is reducing blogging, though there seem to be clear trends along that line.

What about you? If you are a blogger, is your blog post frequency affected by your tweeting activities?

iBi – Business Intelligence Applications on the iPhone

From the press release at QlikTech at http://www.qlikview.com/Contents.aspx?id=9836

They actually end up promoting Oracle's mobile BI app even while trying to bash it.

QlikTech, the world’s fastest-growing Business Intelligence (BI) company, today announced the immediate availability of QlikView for iPhone, the very first truly interactive mobile BI app built specifically for the iPhone. Unlike Oracle’s mobile BI offering that features a rigid interface and limited functionality, QlikView for iPhone fully leverages iPhone’s multitouch and GPS features to deliver QlikView’s renowned, industry-defining interactive capabilities. The result is a groundbreaking app that puts the power of sophisticated, real-time business answers in the hands of mobile users worldwide. It can be downloaded for free from Apple’s Mobile App Store on iTunes.

Product Highlights:

  • Interactive – click through line items on a list box or chart to get to answers, going deep into regional or product data.
  • Coverflow – flip through relevant business analyses; make a new selection and those changes are instantly reflected throughout.
  • GPS-enabled – automatically delivers local customer sales, service or inventory data as reps approach a customer or supplier facility.
  • Feature-rich – use Search, Bookmark and Shake to Erase

Smarter, Faster, Real-Time Interactive Analysis
Mobile professionals need access to comprehensive, real-time information, not static reports that lack detail from offerings like Oracle’s mobile BI tool. With QlikView for iPhone, salespeople can drill deep into accounts and get granular, up-to-the minute answers and analysis that help them do their job better. From specific customer or product data, down to a single SKU or employee name, QlikView for iPhone gets users what they need, the moment they need it.

“We comprehensively surveyed the BI mobile landscape and it was clear all previous attempts at addressing user needs failed miserably,” said Anthony Deighton, SVP Product, QlikTech.   “Just posting a static report on a mobile screen, as Oracle’s solution does, may be marginally helpful, but creates a tremendously frustrating user experience, leaving no opportunity to interact with the data. With QlikView for iPhone, users get a mobile view of a relevant data subset, as well as access to the specific answers they seek. This interactive dynamic is the only way to truly fulfill the promise of mobile BI.”

The Only Mobile BI Tool with Multitouch, Coverflow and GPS Integration
QlikView for iPhone takes full advantage of the iPhone’s native interface. The entire application is multitouch driven with complete implementation of the iPhone finger gestures users are accustomed to. Simple finger-swipes and finger-pinches enable users to select, interact and drill down into data. And to clear selections, all users have to do is shake the device. Apple’s popular 3-D Coverflow feature is also enabled, allowing users to “flip” through analyses in the same way they would through album covers and artists in iTunes. Real-time data changes are also instantly reflected in every Coverflow chart.

And here is the actual Oracle application

http://www.oracle.com/appserver/business-intelligence/business-indicators.html

Enhance Productivity for Mobile Business Users
Oracle Business Indicators is the first in a series of business applications for delivering Oracle business information to the Apple iPhone. The application provides mobile business users with real-time, secure access to business performance information on one of the industry’s most exciting and engaging mobile devices – Apple iPhone.

Oracle Business Indicators allows users to view and interact with Oracle Business Intelligence (BI) Applications that include financial, human resources, supply chain, and customer relationship management analytics, as well as analytical alerts generated by Oracle Delivers, an integrated component of Oracle Business Intelligence Enterprise Edition Plus (OBIEE). Leveraging full advantage of the Apple iPhone mobile platform, Oracle Business Indicators is built as a native application to offer highly intuitive and flexible features including browse, search, and favorites for a superior overall end user experience.

BENEFITS
* Pre-defined business indicators-Pre-built metrics and reports include financial, human resources, supply chain, and customer relationship management analytics.
* Timely alerts on exception conditions-Enables the mobile user to review alerts generated by conditions pre-defined in Oracle Delivers. A user can select an alert entry and immediately review an associated analytic report.
* Superior user experience-Offers a highly intuitive user interface for browsing, searching, and locating business performance metrics.
* Robust security-Based on the same user security model as Oracle BI Applications. Also supports Secure Sockets Layer (SSL) encryption technology.

SAS commits $70 million to Cloud Computing

From the official SAS website

http://www.sas.com/news/preleases/CCF2009.html

SAS to build $70 million cloud computing facility

New cloud computing facility will support needed data-intensive customer solutions

CARY, NC  (Mar. 19, 2009)  –  SAS, the leader in business analytics software and services, announces today it is building a 38,000-square-foot cloud computing facility to provide the additional data-handling capacity needed to expand SAS’ OnDemand offerings and hosted solutions.

As the need for hosted solutions grows, new research and development jobs will be generated at SAS’ Cary, N.C., world headquarters, where the majority of R&D employees (more than 1,400) are located.

“This project is proof that, despite the down economy, SAS continues to grow and innovate,” said Jim Goodnight, CEO of SAS. “The growing demand by our customers for hosted solutions has given us this opportunity to invest even further in North Carolina and the Cary community.”

In keeping with SAS’ commitment to protecting the environment, the facility will be built to Leadership in Energy and Environmental Design (LEED) standards for water and energy conservation. The sustainable construction methods encourage recycling of materials, similar to the Executive Briefing Center under construction on the Cary campus. SAS’ first LEED building, SAS Canada’s headquarters in Toronto, opened in April 2006.

In keeping with LEED standards, about 60 percent of the project’s construction and equipment spending will be in North Carolina.  Approximately 1,000 people will be involved in its design and construction.

The facility will include two 10,000-square-foot server farms. Server Farm 1 is anticipated to be on-line mid-2010 and support growth for three to five years.  Server Farm 2 will be constructed as a shell and will be populated with mechanical and electrical infrastructure once Server Farm 1 reaches 80 percent capacity.  The facility will be built on SAS’ Cary campus.

Apparently SAS Institute believes in creating jobs (and thousands of them) during the recession! Jim is clearly in top intellectual shape despite his, err, vintage. Imagine: with just a browser you could be crunching billions of bytes of data from a beach in Goa! Thankfully they did not believe the hot air McKinsey put out on cloud computing (read it here: http://smartdatacollective.com/Home/17942).

McKinsey attacks Cloud Computing as making no sense

McKinsey, that fine think tank of intellectuals, recently dubbed cloud computing as not making sense, thus trying to throttle in its infancy a paradigm that could make companies across the world more competitive than they are today by helping them cut costs precisely when they need it most. The attempt to recommend virtualization rather than remote computing is another attempt to cloud rather than clear the air on cloud computing. Most consulting companies would have pointed out industry affiliations and disclaimers on which companies they represent or have represented.

Read other comments at the NYT article

Its study uses Amazon.com’s Web service offering as the price of outsourced cloud computing, since its service is the best-known and it publishes its costs. On that basis, according to McKinsey, the total cost of the data center functions would be $366 a month per unit of computing output, compared with $150 a month for the conventional data center. “The industry has assumed the financial benefits of cloud computing and, in our view, that’s a faulty assumption,” said Will Forrest, a principal at McKinsey, who led the study.

My take on this is here-

Cloud computing will have lower costs as economies of scale kick in, as they did for nearly all technologies. McKinsey partners must be having a hard time meeting their annual bonuses if they have not factored this basic assumption into their cost projections. Cloud computing converts this into mass infrastructure, away from the present scenario where you pay annual licenses for software that you use less than 60 percent of the day, and buy hardware that is obsolete in three to four years (which of course gives accountants a reason to help you with depreciation and tax benefits). Renting a computer in the sky is simpler, and you would not need any consultant to advise you on what configuration you need.
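A back-of-the-envelope version of this economics argument (all figures below are illustrative placeholders, not McKinsey's or Amazon's actual numbers): a fixed-cost box beats pay-per-hour cloud only above a utilization break-even point.

```python
HOURS_PER_MONTH = 720

def on_premise_monthly(fixed_cost=150.0):
    """Licenses, depreciation, power: paid regardless of utilization."""
    return fixed_cost

def cloud_monthly(utilization, hourly_rate=0.50):
    """Pay only for the hours actually used."""
    return utilization * HOURS_PER_MONTH * hourly_rate

break_even = 150.0 / (HOURS_PER_MONTH * 0.50)  # ~42% utilization
for u in (0.10, 0.40, 0.80):
    cheaper = "cloud" if cloud_monthly(u) < on_premise_monthly() else "on-premise"
    print(f"utilization {u:.0%}: cloud ${cloud_monthly(u):.0f}/month -> {cheaper}")
```

At the sub-60-percent utilization suggested above, which side wins depends on where the hourly rate lands; economies of scale push that rate, and hence the break-even point, down over time.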

McKinsey has deep ties with the outsourcing industry in India: from its seminal paper in 1999, to the first concept knowledge center that helped start the industry, to its alumni across the outsourcing sector, a mutually symbiotic relationship, particularly in business research. Cloud computing actually helps virtual teams: with no need for server farms and IT bureaucracies, Indian outsourcing could reduce a lot of costs, along with direct users in America. The intermediaries and consultants would be affected the most.

Indeed, I am speaking at Cloud Slam 09, precisely on how cloud computing can help lower the digital divide by giving high-performance computing to anyone with a thin-client laptop and a browser. Developing countries need access to HPC to plan their resources and growth in an environmentally optimized manner.

http://www.decisionstats.com