Adding / to robots.txt again

So I tried to do without search engines and rely only on social sharing. But for a small blog like mine, almost 75% of traffic comes via search engines.
Maybe the ratio of traffic from search to social will change in the future.

I now have enough data to conclude that search is the ONLY statistically significant driver of traffic (for a small blog).
If you are a blogger, you should definitely give Google Webmaster Tools a go.


URL    Googlebot type    Fetch Status                                         Fetch date
       Web               Denied by robots.txt                                 1/19/12 8:25 PM
       Web               Success (URL and linked pages submitted to index)    12/27/11 9:55 PM


Also, from Google Analytics, I see that denying search traffic does not increase direct/referral traffic in any meaningful way.

So my hypothesis was that some direct traffic was being mis-counted as search traffic due to Chrome and toolbar search. Well, the hypothesis was wrong 🙂

Also, Google seems to drop URLs quite quickly (within 18 hours), and I will test the rebound in SERPs in a few hours. I was using meta tags, blocking via robots.txt, and requesting removal via Webmaster Tools (a combination of the three may have helped).

To my surprise, search traffic declined to 5-10 but did not become 0. I wonder why that happens (I even got a few Google queries per day), even though I was blocking “/” in robots.txt.
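Here is a minimal sketch of what blocking “/” means to a well-behaved crawler. It uses Python's standard-library urllib.robotparser. The two robots.txt bodies and the example.com URL are illustrative assumptions, not the actual files from this blog. It only models whether a compliant bot may crawl a URL; it does not explain what stays in the index or why a few search visits still trickle in.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies (assumptions for illustration only).
ROBOTS_BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

ROBOTS_ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com stands in for the blog's real domain.
post_url = "http://example.com/2012/01/some-post/"
print(can_crawl(ROBOTS_BLOCK_ALL, "Googlebot", post_url))  # False
print(can_crawl(ROBOTS_ALLOW_ALL, "Googlebot", post_url))  # True
```

Running the same check against both versions of the file is a quick way to confirm the change before waiting on the next crawl.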


Net net: the numbers below show that, as of now, in a non-SOPA, non-social world, search engines remain the webmaster’s only true friend (till they come up with another Panda or whatever update 😉 )

Who made LibreOffice




513 individuals contributed to OpenOffice.org (and whose contributions were imported into LibreOffice) or LibreOffice until 2011-11-11 09:02:38.

Developers committing code since 2010-09-28

Ruediger Timm
Commits: 89832
Joined: 2000-10-10
Kurt Zenker
Commits: 32763
Joined: 2000-09-25
Oliver Bolte
Commits: 31795
Joined: 2000-09-19
Vladimir Glazunov
Commits: 30289
Joined: 2000-12-04
Jens-Heiner Rechtien [hr]
Commits: 29314
Joined: 2000-09-18
Ivo Hinkelmann
Commits: 10228
Joined: 2002-09-09
Caolán McNamara
Commits: 5952
Joined: 2000-10-10
Frank Schoenheit [fs]
Commits: 5019
Joined: 2000-09-19
Hans-Joachim Lankenau
Commits: 3077
Joined: 2000-09-19
Ocke Janssen [oj]
Commits: 2861
Joined: 2000-09-20
Mathias Bauer
Commits: 2606
Joined: 2000-09-20
Oliver Specht
Commits: 2458
Joined: 2000-09-21
Philipp Lohmann [pl]
Commits: 2132
Joined: 2000-09-21
Tor Lillqvist
Commits: 2035
Joined: 2010-03-23
Stephan Bergmann
Commits: 1993
Joined: 2000-10-04
Christian Lippka ORACLE
Commits: 1811
Joined: 2000-09-25

We do not distinguish between commits that were imported from the OOo code base and those that went directly into the LibreOffice code base as:
a) it is technically not possible to distinguish between commits that go directly into the LibreOffice code base and commits that were merged in from the OOo code base, and
b) contributors to the OOo code base should also be credited for the excellent work they do.

Do note that LibreOffice is divided into 20 git repositories. Pushing a change into all repositories will be counted as 20 commits as there is no way to distinguish this from 20 separate commits.
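A rough sketch of why that matters for the numbers above: if commits are tallied per repository and then summed, one change pushed everywhere is counted once per repository. The repository names and the author below are made up for illustration, and this is not the script the LibreOffice credits page actually uses.

```python
from collections import Counter


def count_commits(repo_logs):
    """Sum commits per author across repositories, one count per log entry."""
    totals = Counter()
    for authors in repo_logs.values():
        totals.update(authors)
    return totals


# Hypothetical data: one author pushes the same change to all 20 repositories.
repo_logs = {f"repo-{i:02d}": ["Example Author"] for i in range(1, 21)}
totals = count_commits(repo_logs)
print(totals["Example Author"])  # 20, although it was one logical change
```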

Total contributions to the TDF Wiki

1223 individuals contributed:

2011 Analytics Recap

Events in the field of data that impacted us in 2011

1) Oracle unveiled plans for R Enterprise. This is one of the strongest statements of its focus on in-database analytics. Oracle also unveiled plans for a Public Cloud.

2) SAS Institute released version 9.3 of SAS, a major analytics software package in industry use.

3) IBM acquired many companies in analytics and high tech. Again. However, the expected benefits from Cognos-SPSS integration are yet to show a spectacular change in market share.

2011 Selected acquisitions

Emptoris Inc., December 2011
Cúram Software Ltd., December 2011
DemandTec, December 2011
Platform Computing, October 2011
Q1 Labs, October 2011
Algorithmics, September 2011
i2, August 2011
Tririga, March 2011


4) SAP promised a lot with SAP HANA. Again, no major oohs and ahs in terms of market share fluctuations within analytics.

5) Amazon continued to lower prices of cloud computing and offer more options.

6) Google continues to dilly-dally with its analytics and cloud-based APIs. I do not expect all the APIs in the Google APIs suite to survive and be viable in the enterprise software space. This includes Google Cloud Storage, Cloud SQL, and the Prediction API. Some of the location-based and translation-based APIs may have interesting spin-offs that may be very commercially lucrative.

7) Microsoft did, hmm, I forgot. Except for its investment in Revolution Analytics round 1 many seasons ago, very little excitement has come from MS plans in data mining. The plugins for cloud-based data mining from Excel remain promising, while Azure remains a stealth-mode starter.

8) Revolution Analytics promised us a GUI and didn’t deliver (not yet, anyway 🙂 ). But it did reveal much better enterprise software: Revolution R 5.0 is one of the strongest enterprise software packages in the R/statistical computing space, and R’s memory handling problem is now more an issue of perception than of substance, thanks to newer advances in how it is used.

9) More conferences, more books and more news on analytics startups in 2011. Big Data analytics remained a strong buzzword. Expect more from this space including creative uses of Hadoop based infrastructure.

10) Data privacy issues continue to hamper and impede effective analytics usage. So does rational and balanced regulation in some of the most advanced economies. We expect more regulation and better guidelines in 2012.

Ads Alliance on Internet

Just saw the Digital Advertising Alliance’s (DAA) Self-Regulatory Program for Online Behavioral Advertising.

Multi-Site Data Collection Principles Broaden Self Regulation Beyond Online Behavioral Advertising

The new Principles consist of the following specific requirements:

  1. Transparency and consumer control for purposes other than OBA – The Multi-Site Data Principles call for organizations that collect Multi-Site Data for purposes other than OBA to provide transparency and control regarding Internet surfing across unrelated Websites.
  2. Collection / use of data for eligibility determination – The Multi-Site Data Principles prohibit the collection, use or transfer of Internet surfing data across Websites for determination of a consumer’s eligibility for employment, credit standing, healthcare treatment and insurance.
  3. Collection / use of children’s data – The Multi-Site Data Principles state that organizations must comply with the Children’s Online Privacy Protection Act (COPPA).
  4. Meaningful accountability – The Multi-Site Data Principles are subject to enforcement through strong accountability mechanisms.

The DAA Self-Regulatory Principles


The cross-industry Self-Regulatory Principles for Multi-Site Data augment the Self-Regulatory Principles for Online Behavioral Advertising (OBA) by covering the prospective collection of Web site data beyond that collected for OBA purposes. The existing OBA Principles and definitions remain in full force and effect and are not limited by the new principles.

The cross-industry Self-Regulatory Principles for Online Behavioral Advertising were developed by leading industry associations to apply consumer-friendly standards to online behavioral advertising across the Internet. Online behavioral advertising increasingly supports the convenient access to content, services, and applications over the Internet that consumers have come to expect at no cost to them.

The Education Principle calls for organizations to participate in efforts to educate individuals and businesses about online behavioral advertising and the Principles.

The Transparency Principle calls for clearer and easily accessible disclosures to consumers about data collection and use practices associated with online behavioral advertising. It will result in new, enhanced notice on the page where data is collected through links embedded in or around advertisements, or on the Web page itself.

The Consumer Control Principle provides consumers with an expanded ability to choose whether data is collected and used for online behavioral advertising purposes. This choice will be available through a link from the notice provided on the Web page where data is collected.

The Consumer Control Principle requires “service providers”, a term that includes Internet access service providers and providers of desktop applications software such as Web browser “tool bars” to obtain the consent of users before engaging in online behavioral advertising, and take steps to de-identify the data used for such purposes.

The Data Security Principle calls for organizations to provide appropriate security for, and limited retention of data, collected and used for online behavioral advertising purposes.

The Material Changes Principle calls for obtaining consumer consent before a Material Change is made to an entity’s Online Behavioral Advertising data collection and use policies unless that change will result in less collection or use of data.

The Sensitive Data Principle recognizes that data collected from children and used for online behavioral advertising merits heightened protection, and requires parental consent for behavioral advertising to consumers known to be under 13 on child-directed Web sites. This Principle also provides heightened protections to certain health and financial data when attributable to a specific individual.

The Accountability Principle calls for development of programs to further advance these Principles, including programs to monitor and report instances of uncorrected non-compliance with these Principles to appropriate government agencies. The CBBB and DMA have been asked and agreed to work cooperatively to establish accountability mechanisms under the Principles.


Ajay- So why the self-regulation?

Answer- Shoddy maths in behaviorally targeted ads is leading to a glut of targeted ads, more than consumers can reasonably be expected to click on, given consumer spending. On the internet, unlike on television, cost is less of a barrier to OVER ADVERTISING.


Moving data between Windows and Ubuntu VMWare partition

I use Windows 7 on my laptop (it came pre-installed) and Ubuntu using the VMware Player. What are the advantages of using VMware Player instead of creating a dual-boot system? Well, I can quickly shift from Ubuntu to Windows and back again without restarting my computer every time. This approach lets me use software that runs only on Windows, and also run software like Rattle, the R data mining GUI, that is much easier to install on Linux.

However, if your statistical software is on your virtual disk and your data is on your Windows disk, you need a way to move data from Windows to Ubuntu.

The solution to this as per Ubuntu forums is –

Open My Computer, browse to the folder you want to share.  Right-click on the folder, select Properties.  Sharing tab.  Select the radio button to “Share this Folder”.  Change the default generated name if you wish; add a description if you wish.  Click the Permissions button to modify the security settings of what users can read/write to the share.

On the Linux side, it depends on the distro, the shell, and the window manager.

Well, Ubuntu makes it really easy to configure the Linux steps to move data between Windows and Linux partitions.



VMware makes it easy to share between your Windows (host) and Linux (guest) OS


Step 1

and step 2

Do this



Start the Wizard

When you finish the wizard and share a drive or folder, the question is: hey, where do I see my shared ones?


see this folder in Linux- /mnt/hgfs (bingo!)
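Once the share shows up, here is a minimal sketch for listing what is shared and pulling a file into the guest. It assumes shared folders are enabled and mounted at /mnt/hgfs as described above. The function names, the share name, and the file name in the comments are hypothetical, so swap in your own.

```python
import shutil
from pathlib import Path

# Where VMware shared folders appear inside the Ubuntu guest (per this post).
HGFS_ROOT = Path("/mnt/hgfs")


def list_shares(root=HGFS_ROOT):
    """Return the names of shared folders under root, or [] if root is missing."""
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def copy_from_share(share, filename, dest_dir, root=HGFS_ROOT):
    """Copy one file from a shared folder into dest_dir and return the new path."""
    source = root / share / filename
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, dest_dir / filename))


# Prints an empty list on a machine where /mnt/hgfs does not exist.
print(list_shares())
# Hypothetical usage: copy_from_share("mydata", "sales.csv", Path.home() / "data")
```

Copying the file into the guest's own disk first keeps R from reading large data sets through the shared mount every time.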

Hacker HW – Make the folder /mnt/hgfs a shortcut in Places on your Ubuntu startup

Hacker Hw 2-

Upload using an anon email your VM dark data to Ubuntu one

Delete VM

Purge using software XX

Reinstall VM and bring back backup


Note time to do this




-General Sharing in Windows



Just open the Network tab in Ubuntu- see screenshots below-

Windows will now ask your Ubuntu user for login-

Once logged in to Windows from within Ubuntu in VMware, this is what happens:

You see a tab called “users on (windows username)-pc” appear on your Ubuntu desktop (see top right of the screenshot).

If you double-click it, you see your Windows path.

You can now just click and drag data between your Windows and Linux partitions, just the way you do it in Windows.

So based on this, if you want to build decision trees, artificial neural networks, regression models, and even time series models for zero capital expenditure, you can use both Ubuntu and R without compromising on your organization’s Windows-only IT policy (there is a shortage of Ubuntu-trained IT administrators in the enterprise world).

Revised Installation Procedure for utilizing both Ubuntu /R/Rattle data mining on your Windows PC.

Using VMWare to build a free data mining system in R, as well as isolate your analytics system (thus using both Linux and Windows without overburdening your machine)

First Time

  1. and Install
  2. Only
  3. Create New Virtual Image in VM Ware Player
  4. Applications > Terminal, then run sudo apt-get install r-base (to download and install R)
  5. sudo R (to open R)
  6. Once R is open, type install.packages("rattle") to install Rattle
  7. library(rattle) will load Rattle
  8. rattle() will open the GUI
Getting Data from Host to Guest VM
Next Time
  1. Go to VM Player
  2. Open the VM
  3. sudo R in terminal to bring up R
  4. library(rattle) within R
  5. rattle()
At this point, even if you don’t know any Linux and don’t know any R, you can create data mining models using the Rattle GUI (and time series models using the epack plugin in the R Commander GUI). What can Rattle do in data mining? See this slideshow-
If Google Docs is banned as per your enterprise organizational IT policy of having Windows Explorer only- well you can see these screenshots

Zynga Mafia Wars 2 on Google Plus

The latest game on Google Plus is a clone of one of the most important games in social gaming history: Mafia Wars 2. It is early days, and a more detailed review will follow, but there has been a design paradigm change in terms of icons, fonts and storyline. Will this capture gamers’ attention? Time will tell.

Use R for Business- Competition worth $ 20,000 #rstats

All you contest junkies, R lovers and general change the world people, here’s a new contest to use R in a business application


$20,000 in Prizes for Users Solving Business Problems with R


PALO ALTO, Calif. – September 1, 2011 – Revolution Analytics, the leading commercial provider of R software, services and support, today announced the launch of its “Applications of R in Business” contest to demonstrate real-world uses of applying R to business problems. The competition is open to all R users worldwide and submissions will be accepted through October 31. The Grand Prize winner for the best application using R or Revolution R will receive $10,000.

The bonus-prize winner for the best application using features unique to Revolution R Enterprise – such as its big-data analytics capabilities or its Web Services API for R – will receive $5,000. A panel of independent judges drawn from the R and business community will select the grand and bonus prize winners. Revolution Analytics will present five honorable mention prize winners each with $1,000.

“We’ve designed this contest to highlight the most interesting use cases of applying R and Revolution R to solving key business problems, such as Big Data,” said Jeff Erhardt, COO of Revolution Analytics. “The ability to process higher-volume datasets will continue to be a critical need and we encourage the submission of applications using large datasets. Our goal is to grow the collection of online materials describing how to use R for business applications so our customers can better leverage Big Analytics to meet their analytical and organizational needs.”

To enter Revolution Analytics’ “Applications of R in Business” competition
