Does Facebook Deserve a $100 Billion Valuation?

Some questions on my mind as I struggle to decide whether to bet my money and pension savings on the Facebook IPO-

1) Revenue Mix- What percentage of Facebook's revenues comes from banner ads versus gaming partners like Zynga? How dependent is Facebook on gaming partners? (Zynga has Google as an investor.) What mix of revenue depends on countries with strict privacy regulation, like those in Europe, versus countries like the USA?

2) Do 800 million users of Facebook mean a $100 billion valuation? That is a valuation of $125 per customer in lifetime NPV terms. Since ad revenue is itself a percentage of actual goods and services sold, how much worth of goods and services do consumers have to buy per capita to generate $125 worth of ads for FB? E.g., if companies spend 5% of product cost on Facebook ads, does that mean each FB account is hoped to buy $2,500 worth of goods from the internet and from Facebook (assuming they also buy from Amazon etc.)?
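The back-of-envelope arithmetic above can be sketched as follows (the 5% ad-spend share of product cost is this post's own illustrative assumption, not a reported figure):

```python
# Implied per-user economics of a $100B valuation on 800M users.
valuation = 100e9      # proposed valuation, USD
users = 800e6          # Facebook user base

value_per_user = valuation / users            # implied lifetime NPV per user
ad_share_of_sales = 0.05                      # assumed: ads cost 5% of goods sold
implied_spend_per_user = value_per_user / ad_share_of_sales

print(value_per_user)          # 125.0
print(implied_spend_per_user)  # 2500.0
```

So every account would need to buy $2,500 of Facebook-ad-influenced goods over its lifetime just to justify the price tag under this assumption.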

3) Corporate Governance- Unlike Google, Facebook has faced troubling ethical questions from the day it started. These include charges of intellectual property theft, but also non-transparent FB stock-option pricing in secondary markets before the IPO, private placement by Wall Street bankers like Goldman Sachs, and major investments by Russian internet media corporations. (read- http://money.cnn.com/2011/01/03/technology/facebook_goldman/index.htm)

4) Retention of key employees post-IPO- Key employees at Google are actually ex-Microsofties. Key FB staff are ex-Google people. Where will the key FB people go when they are bored and rich after the IPO?

5) Does the macroeconomic condition justify Facebook's premium and private-equity multiple?

Will FB be the next Google (in terms of investor returns), or will it be like Groupon? I suspect the answer is: it depends on the market discounting these assumptions while factoring in sentiment (as well as the unloading of stock by a large number of FB stockholders in week 1).

Baby, you are a rich man. But not $100 billion rich. Yet. Maybe $80 billion isn't that bad.

Quantitative Modeling for Arbitrage Positions in Ad-Keyword Internet Marketing

Assume you treat an ad keyword as an equity stock. There are slight differences in the cost of advertising for that keyword across locations (Zurich vs. Delhi) and across channels (Facebook vs. Google). You earn revenue if your website ranks naturally in organic search for the keyword, and you pay costs to drive traffic to your website for that keyword.
An arbitrage position is defined as a riskless profit that exists when the cost of a keyword is less than the revenue from that keyword. We take AdSense and AdWords as the primary examples.
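The arbitrage definition above can be sketched in a few lines (the per-click costs and per-visit revenues below are made-up illustrations, not real AdWords/AdSense prices):

```python
# An arbitrage position exists when the cost of buying traffic for a keyword
# is less than the revenue that traffic generates on the site.
def is_arbitrage(cost_per_click, revenue_per_visit):
    return revenue_per_visit > cost_per_click

# keyword: (cost per click, revenue per visit) -- hypothetical figures
keywords = {
    "business intelligence software": (4.00, 3.50),
    "flower shop for birthdays": (0.30, 0.45),
}

arbitrage_positions = sorted(
    k for k, (cost, revenue) in keywords.items() if is_arbitrage(cost, revenue)
)
print(arbitrage_positions)  # ['flower shop for birthdays']
```

In practice the comparison would run over location and channel pairs as well, since the same keyword can price differently on each.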
There are primarily two types of economic curves on whose foundation the commerce of the internet resides-
1) Cost Curve- The cost of advertising to drive traffic to the website (Google AdWords, Twitter Ads, Facebook Ads, LinkedIn Ads)
2) Revenue Curve- Revenue from ads clicked by the incoming traffic on the website (AdSense, link ads, banner ads, ad-sharing programs, in-game ads)
The cost and revenue curves are primarily dependent on two things-
1) Type of Keyword- Also sub-dependent on
a) Location of the prospective customer, and
b) Net present value of the good or service to be eventually purchased
For example, a keyword targeting sales of enterprise “business intelligence software” should ideally cost X times as much as a keyword for “flower shop for birthdays”, where X is the expected payoff from sales of business intelligence software divided by the expected payoff from sales of flowers (in, say, Daytona Beach, Florida or Austin, Texas).
2) Traffic Volume- Also sub-dependent on the time series:
a) Seasonality- the annual shopping cycle
b) Cyclicality- macroeconomic shifts in the time series
The cost and revenue curves are not linear. Ideally they would be continuous in a definite exponential or polynomial manner, but in reality they may have sharp inflections due to location, time, and web-traffic-volume thresholds.
Type of Keyword- For example, keywords targeting sales of Eminem albums might shoot up in a non-linear manner were the musician to die.
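The multiple X described above can be sketched numerically (the conversion rates and sale values here are hypothetical illustrations):

```python
# X = expected payoff of keyword A / expected payoff of keyword B,
# where expected payoff = conversion rate * value of the eventual sale.
def expected_payoff(conversion_rate, sale_value):
    return conversion_rate * sale_value

bi_payoff = expected_payoff(0.01, 50_000)   # enterprise BI software deal
flower_payoff = expected_payoff(0.05, 60)   # birthday flower order

X = bi_payoff / flower_payoff
print(round(X, 1))  # 166.7 -- the BI keyword could fairly cost ~167x more
```

The same ratio could then be shifted by seasonality or location factors to trace out the full cost curve.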
The third, and not so publicly known, component of both the cost and revenue curves is internet-industry dynamics, including the relative market shares of internet advertising platforms and the percentage splits between content creators and ad-providing platforms.
For example, based on internet advertising spend, people believe that internet advertising is currently heading for a duopoly in which Google and Facebook are the top two players, while Microsoft/Skype/Yahoo and LinkedIn/Twitter offer niche options but primarily depend on price-setting by Google/Bing/Facebook.
It is difficult to quantify the elasticity and efficiency of these market curves, as most literature and research on this comes from in-house corporate teams, or from advisors, mentors, or consultants to the primary market leaders, in a kind of incestuous fraternal hold on public academic research.
It is recommended that-
1) a balance be found between the need for corporate secrecy to protect shareholder/stakeholder value maximization and the need for data liberation to spur innovation and grow the internet ad pie faster;
2) cost and revenue curves across different keywords, times, locations, and service providers be studied by quants for hedging internet ad inventory and/or choosing arbitrage positions. This kind of analysis is done for groups of stocks and commodities in the financial world, but as commerce grows on the internet it may need more specific and independent quants;
3) attention be paid to how cost and revenue curves mature with the level of sophistication of the underlying economy; Brazil, Russia, China, Korea, the US, and Sweden may be in different stages of internet ad-market evolution.
For example-
A study of cost and revenue curves for certain keywords across domains, ad providers, and locations from 2003-2008 could help academia and research (much more than non-quantitative reports like top-ten lists of popular terms) while ensuring that current algorithmic weightings are not inadvertently given away.
Part 2 of this series will explore ways to create third-party resellers of keywords and to measure the impact of search- and ad-engine optimization based on keywords.

Analytics Conferences for 2012

NOTE: Early Bird registration for PAW and TAW San Francisco is January 20th – $400 lower than Onsite Price.

CONFERENCE: Predictive Analytics World – San Francisco
March 4-10, 2012 in San Francisco, CA
http://predictiveanalyticsworld.com/sanfrancisco/2012
Discount Code: AJBP12

CONFERENCE: Text Analytics World – San Francisco
March 6-7, 2012 in San Francisco, CA
http://textanalyticsworld.com/sanfrancisco/2012
Discount Code: AJBP12

VARIOUS ANALYTICS WORKSHOPS:
A plethora of 1-day workshops are held alongside PAW and TAW
For details see: http://pawcon.com/sanfrancisco/2012/analytics_workshops.php

SEMINAR: Predictive Analytics for Business, Marketing & Web
March 22-23, 2012 in New York City, NY
July 26-27, 2012 in São Paulo, Brazil
A concentrated training program led by Eric Siegel.
http://businessprediction.com

CONFERENCE: Predictive Analytics World – Toronto
April 26-27, 2012 in Toronto, Ontario
http://predictiveanalyticsworld.com/toronto/2012
Discount Code: AJBP12

CONFERENCE: Predictive Analytics World – Chicago
June 25-26, 2012 in Chicago, IL
http://www.predictiveanalyticsworld.com/chicago/2012/
Discount Code: AJBP12

MORE ANALYTICS EVENTS:
PAW Düsseldorf: November 6-7, 2012 – http://www.predictiveanalyticsworld.de
PAW London: November 27-28, 2012 – http://www.pawcon.com
PAW Videos: Available on-demand – http://www.pawcon.com/video

Topic Models

Some stuff on Topic Models-

http://en.wikipedia.org/wiki/Topic_model

In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. An early topic model was probabilistic latent semantic indexing (PLSI), created by Thomas Hofmann in 1999.[1] Latent Dirichlet allocation (LDA), perhaps the most common topic model currently in use, is a generalization of PLSI developed by David Blei, Andrew Ng, and Michael Jordan in 2002, allowing documents to have a mixture of topics.[2] Other topic models are generally extensions on LDA, such as Pachinko allocation, which improves on LDA by modeling correlations between topics in addition to the word correlations which constitute topics. Although topic models were first described and implemented in the context of natural language processing, they have applications in other fields such as bioinformatics.

http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation

In statistics, latent Dirichlet allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word’s creation is attributable to one of the document’s topics. LDA is an example of a topic model
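The generative story in the excerpt above can be sketched directly (the two topics and the six-word vocabulary are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["gene", "dna", "cell", "ball", "goal", "team"]
# Each topic is a probability distribution over the vocabulary.
topics = np.array([
    [0.5, 0.3, 0.2, 0.0, 0.0, 0.0],  # a "biology" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "sports" topic
])
alpha = np.array([0.5, 0.5])  # Dirichlet prior over per-document topic mixtures

def generate_document(n_words=8):
    theta = rng.dirichlet(alpha)                 # this document's topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(len(topics), p=theta)     # pick a topic for this word
        w = rng.choice(len(vocab), p=topics[z])  # pick a word from that topic
        words.append(vocab[w])
    return words

doc = generate_document()
print(doc)
```

Fitting LDA runs this story in reverse: given only the documents, it infers the topic distributions and each document's mixture.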

David M Blei’s page on Topic Models-

http://www.cs.princeton.edu/~blei/topicmodeling.html

The topic models mailing list is a good forum for discussing topic modeling.

In R, the topicmodels and lda packages implement these models.

Some resources I compiled on Slideshare based on the above-

Automatically creating tags for big blogs with WordPress

I use the simple-tags plugin in WordPress for automatically creating and posting tags. I am hoping this makes the site easier to navigate. Given that I had not been a very efficient tagger before, this plugin can be really useful for creating tags for more than 100 (or 1,000) posts, especially for WordPress-based blog aggregators.

 

 

The plugin is available here –

Simple Tags is the successor of the Simple Tagging Plugin. It is THE perfect tool to manage your WP terms for any taxonomy.

It was written with this philosophy: best performance, better security, and a lot of new functions.

This plugin is developed on WordPress 3.3, with the constant WP_DEBUG set to TRUE.

  • Administration
  • Tag suggestions from the Yahoo! Term Extraction API, OpenCalais, Alchemy, Zemanta, Tag The Net, and the local DB, via AJAX requests
    • Compatible with TinyMCE, FCKeditor, WYMeditor and QuickTags
  • Tag management (rename, delete, merge, search and add tags, edit tag IDs)
  • Mass-edit tags (more than 50 posts at once)
  • Auto-link tags in post content
  • Auto tags!
  • Type-ahead tag input / AJAX autocompletion
  • Click tags
  • Ability to tag pages (not only posts) and include them in tag results
  • Easy configuration! (in WP admin)

The above plugin can be combined with the RSS Aggregator plugin for Search Engine Optimization purposes

Ajay- You can also combine this plugin with an RSS auto-post blog aggregator (read instructions here) and create SEO-optimized blog aggregation or curation.

Related –http://www.decisionstats.com/creating-a-blog-aggregator-for-free/

Information Ladder for Analytics

One very commonly used diagram in marketing and sales by analytics providers, which is hardly ever credited to its author, is the Information Ladder.

http://en.wikipedia.org/wiki/Information_ladder

The information ladder is a diagram created by education professor Norman Longworth to describe the stages in human learning. According to the ladder, a learner moves through the following progression to construct “wisdom” at the highest level from “data” at the lowest level:

Data → Information → Knowledge → Understanding → Insight → Wisdom

Whereas the first two steps can be scientifically exactly defined, the upper parts belong to the domain of psychology and philosophy.

I sometimes think the information ladder, and especially its latter two parts, is underutilized, under-quantified as metrics, and rarely understood completely by the wise men in analytics and information display.

Some visual versions are below

 

Funnily enough, it is one of the rare concepts first inspired by poetry-

http://en.wikipedia.org/wiki/DIKW

The earliest formalized distinction between wisdom, knowledge, and information may have been made by poet and playwright T. S. Eliot:

Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

 

Business Analytics Projects

In my view, analytics projects go through these four broad phases-

  • Business Problem Phase- What needs to be done?
  1. Increase Revenues
  2. Cut Costs
  3. Investigate Unusual Events
  4. Project Timelines
  • Technical Problem Phase- Technical problems in project execution
  1. Data Availability / Data Quality / Data Augmentation Costs
  2. Statistical (technique-based approach)- Hypothesis Formulation, Sampling, Iterations
  3. Programming (tool-based approach)- Analytics Platform Coding (Input, Formats, Processing)
  • Technical Solution Phase- Problem solving using the tools and skills available
  1. Data Cleaning / Outlier Treatment / Missing Value Imputation
  2. Statistical (technique-based approach)- Error Minimization, Model Validation, Confidence Levels
  3. Programming (tool-based approach)- Analytics Platform Coding (Output, Display, Graphs)
  • Business Solution Phase- Put it all together in a Word document, presentation, and/or spreadsheet
  1. Finalized Forecasts, Models, and Data Strategies
  2. Improvements in existing processes
  3. Control and Monitoring of analytical results post-implementation
  4. Legal and Compliance guidelines for execution
  5. (Internal or External) Client Satisfaction and Expectation Management
  6. Audience Feedback based on presenting the final deliverable to a broader audience