New Delhi R User Group and other thoughts

  1. The New Delhi R User Group, which I co-founded in 2011, now has 111 members, 5 organizers (including me, now exiled to Canada) and two sponsors: Revolution Analytics (meetup fees) and Mimir Technologies (location, office and refreshments). It has held 7-8 successful meetups.
  2. Despite Meetup.com's high fees (once converted into Indian rupees), the New Delhi group continues to stick with it, thanks to Revolution’s generous support.
  3. UseRs continue to be a diverse group, including expatriate Americans, doctors, researchers, analysts and IT people.
  4. I am hoping more people in India create local R groups in other cities.
  5. I hope someone comes up with a non-spammy, inexpensive alternative to Meetup.com, which is not really an open platform (not really FOAS).
  6. The quality of technology and the speed at which New Delhi users continue to pick up and spread R amaze me.
  7. Fun Fact: India has the largest number of analytics professionals certified in the SAS language, a fact that has been noticed by the SAS clone WPS, which is the option of choice for third-party training institutes that undercut the SAS Institute itself on pricing while teaching the SAS language.
  8. Funny Facts: I have kick-started R language programs for two institutes in India, but am still searching for a better and cheaper alternative to Coursera (hands-off) training and online (paid, with support) training. Is there any FOAS training in R which offers tutoring as well? I have tried SageBourse, again with mixed results. It worked better for SAS language queries. I am loath to open my own training business as it takes time away from my writing.

Top five ways to do business unethically in India

Over a decade-long career, I have often been reminded of a saying from erstwhile mentors in a long-forgotten consulting email group: it is not WHAT you KNOW, it is WHO you KNOW. The power of WHO you KNOW can defeat even what you know, have learnt or worked hard at. Accordingly, these are some wry observations on how businesses sometimes take shortcuts in India, and the whys and wherefores.

1) Regulatory Arbitrage due to Lack of Regulatory Oversight: This is especially true of labor practices. It includes underpaying Caucasians and other non-Indians for internships or jobs (in the name of sponsoring the work visa). India is an extremely inexpensive place to stay in, but it is sometimes unfriendly (in terms of laws, not people) to people visiting from the West. This ranges from amusing things, like non-Indian visitors paying 10 times the price at the Taj Mahal, to not-so-funny things, like paying them lower salaries because they need a reason to stay on. Unfortunately, underpaying aliens happens in many countries, but it is much better regulated in the West.

2) Stealing Intellectual Property: I have often known people to take presentations and even Excel macros from their old workplace to the new one. Almost no one gets prosecuted for intellectual property theft (unless you are caught with 10,000 pirated music or film CDs).

3) Using Pirated Software: Lack of awareness of FOSS means many SMEs take shortcuts, including downloading software from Pirate Bay and using it to work for clients in the West. For example, this could be as simple as downloading SAS software from the Internet, or using WPS software for training while misrepresenting it under the SAS Institute’s name (with added confusion because SAS is a software, a company and a language). Other major companies suffer from this too, notably Microsoft.

It could also be as complex as using academic versions of enterprise software for business purposes. In each case, because of the geography, the legal risk is quite low and the returns from pirated software quite high. It also lets the unethical vendor quote lower prices than the one who is doing it straight.

One way to avoid this is to ask your vendor to show you how many legal licences it holds for its software. This can also help in cutting down exaggerated bench strength claims of vendors, as sometimes businesses hire many people and then put them on internal projects.

4) Illegal Trade Practices: This includes making employees sign a one-year bond not to leave the company after they have visited the West for company work, in the name of training. It also includes abusing the loopholes in various types of visa.

5) Ignoring signed contracts and illegally negotiating to lower prices at every step, in collusion with other vendors (there is no effective anti-trust act), while exploiting the completely inadequate and lengthy process of filing court cases in India.
Almost every non-Indian client I know pays on time; almost every Indian client I know needs reminders. This is more of a mindset problem, stemming from the known reluctance to file lawsuits in India given slow progress in the courts (India has 1.2 billion people, and per capita access to judges and lawyers is quite low). The buzzword is: How much can we settle this for? Let's do a settlement!

In the long run, this is choking off the growth and potential of SMEs in India. In a continuing series, I will help non-Indian users with ways to use technology for legal remedies in India for intellectual property, along with known case studies and examples.

Using R for Cricket Analysis #rstats #IPL

#Downloading the data for batting across all formats of cricket
library(XML)
url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=11;template=results;type=batting"
tables=readHTMLTable(url,stringsAsFactors = FALSE)
#Note we wrote stringsAsFactors=FALSE to avoid getting factor variables,
#since we will need to convert these variables to numeric variables
table2=tables$"Overall figures"
rm(tables)
#Creating new variables from Span
table2$Debut=as.numeric(substr(table2$Span,1,4))
table2$LastYr=as.numeric(substr(table2$Span,6,10))
table2$YrsPlayed=table2$LastYr-table2$Debut
#Creating new variables. In cricket a not out score is denoted by *, which can cause data quality errors.
#This is treated by grepl for finding and gsub for removing the *.
#Note the double \ to escape the regex character
table2$HSNotOut=grepl("\\*",table2$HS)
table2$HS2=gsub("\\*","",table2$HS)
#Creating a FOR loop (!) to convert variables to numeric variables
#(entries that are not plain numbers become NA, with a warning)
for (i in 3:17) {
    table2[, i] <- as.numeric(table2[, i])
}

And we see why Sachin Tendulkar is the best (using ggplot via Deducer).

[Image: dmancasestudy5]
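The cleaning steps in the script above can be tried offline on a tiny hand-made table before pointing them at the live page. This is only a sketch: the data frame `toy`, its player names and all its numbers are made up for illustration (they are not real cricket statistics), and it assumes Span is always formatted as YYYY-YYYY. It also converts the Runs column by name, which is a swapped-in alternative to the positional 3:17 loop.

```r
# Hypothetical rows for illustration only; not real cricket statistics
toy <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  Span   = c("1989-2013", "2004-2012", "1996-2011"),
  HS     = c("200*", "183", "153*"),
  Runs   = c("18000", "9000", "11000"),
  stringsAsFactors = FALSE
)

# Split the Span (assumed YYYY-YYYY) into debut year, last year and career length
toy$Debut     <- as.numeric(substr(toy$Span, 1, 4))
toy$LastYr    <- as.numeric(substr(toy$Span, 6, 9))
toy$YrsPlayed <- toy$LastYr - toy$Debut

# Flag not-out high scores, then strip the * before converting to numeric
toy$HSNotOut <- grepl("\\*", toy$HS)
toy$HS2      <- as.numeric(gsub("\\*", "", toy$HS))

# Convert a character column by name instead of by column position
toy$Runs <- as.numeric(toy$Runs)

# Inspect the resulting column types
str(toy)
```

Converting by name is a little more verbose than a loop over column numbers, but it keeps working if columns are added or reordered.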

Also see 

  • Freakonomics Challenge-
    1. Prove match fixing does not and cannot exist in IPL
    2. Create an ideal fantasy team

New Delhi UseRs March 2013 MeetUp #rstats

The fifth New Delhi UseRs meetup happened at Mimir Tech’s premises in Green Park, New Delhi. I presented on using GUIs for easier transitioning to R from other software, but limited it to Deducer (for data visualization, specifically templates and facets in ggplot) and Rattle (for data mining). We also discussed a couple of other things, including how to apply R in other business domains, and open source alternatives to Meetup.com.

Dhiraj Rajaram creates India’s first billion dollar analytics company

Dhiraj Rajaram was recently featured in the Economic Times as the CEO and founder of India’s first analytics startup with a billion-dollar valuation.

Mu Sigma attracts a clutch of foreign investors, gets valued at $1 billion, Dhiraj Rajaram is now king of data

This year, the company which employs 2,500 people across a development centre in Bangalore and offices in the US, UK and Australia, will build a data analytics lab in the US and hire 400 data scientists there.

I first met Dhiraj in Q1 2008 for a job. We didn't reach an agreement, partly because I needed to be close to my son (who was 4 months old), and I ended up taking a contract with another Bangalore-based company. What impressed me at that time was something I rarely see in India’s analytics entrepreneurs:

1) A Grand Vision: Dhiraj said, "I am trying to build the largest math factory in the world."

2) Focus: Dhiraj was focused only on analytics projects. No quick and easy outsourcing of low-end tasks for him.

3) Positivity: Not once during the entire two-hour interaction did he say a negative word about competition, attrition, challenges or pressures.

4) Flamboyance: I sometimes wonder why a colorful culture like India’s ends up with people being so meek in corporate settings. Dhiraj was probably one of the most flamboyant senior analytics leaders.

But I had some concerns in Q1 2008, including the plans for an IPO (I thought that was early) and senior management flux (the COO left within a few months).

Anyway, Dhiraj grew the 200-strong team to around 900 by Q3 2010. He again called me for a job interview, and we again found that there was nothing I was really good at for an analytics company, given my interest in open source, blogging and writing books, and my morbid fear of managing people in operations. However, I noticed some changes:

  1. There were greater signs of a process-driven orientation (including messages to keep meetings short).
  2. There were newer people in senior management.
  3. Dhiraj was slightly more restrained in his frank talk (given his increasing stature and the demands on his time and attention).
  4. I loved the sign on his office: Jugad. Literally that means ingenuity in Hindi, and it gives a glimpse into the maverick, brilliant and flamboyant nature of the CEO.

Again, there were some odd points. Mu Sigma continued to have the perception (true or false, I don't know) of high attrition at junior levels. There were also rumours that Dhiraj had become a bit autocratic in management (of which I found no sign). I found that the biggest problem Mu Sigma and Dhiraj had was that they were creating enemies just by shaking up the slow IT services mindset of India, where easy money was available through low-quality labor arbitrage. This cultural opposition to anything new (like a pure analytics company) or anything rapid (like a company that scales up organically) could have stopped lesser men, but Mu Sigma moved on.

So it was quite nice to read the news that an Indian company had finally broken the 1 billion mark. Allow me some leeway here. I truly believe analytics and maths have no nationality. But given the rampant poverty in India, what we need is more aggressive and impatient businessmen like Mr Rajaram, rather than the chalta hain ("it is okay") attitude.

Dhiraj and team, take a bow. You make us proud!

Talking on Big Data Analytics

I am being sponsored to go to a Government of India sponsored talk on Big Data Analytics in Bangalore on Friday the 13th of July. If you are in Bangalore, India, you may drop in for a dekko. Schedule and Abstracts (I am on page 7 out of 9).

Your tax payer money is hard at work- (hassi majak only if you are a desi. hassi to fassi.)

13 July 2012 (9.30 – 11.00 & 11.30 – 1.00)
Big Data Big Analytics
The talk will showcase using open source technologies in statistical computing for big data, namely the R programming language and its use cases in big data analysis. It will review case studies using the Amazon Cloud, custom packages in R for Big Data, tools like Revolution Analytics' RevoScaleR package, as well as the newly launched SAP Hana used with R. We will also review Oracle R Enterprise. In addition we will show some case studies using BigML.com (using Clojure), and approaches using PiCloud. It will also showcase some of Google's APIs for Big Data Analysis.

Lastly we will talk on social media analysis, national security use cases (i.e. cyber war) and privacy hazards of big data analytics.

[Embedded presentation: Schedule]

[Embedded document: Abstracts]

 

Indian Court un-blocks Pirate Bay

http://www.bbc.co.uk/news/technology-18551471

Web users in India are once again able to access video and file-sharing sites, including The Pirate Bay. The country’s Madras High Court has changed its earlier censorship order which centred on the issue of internet copyright.

It states that only specific web addresses – URLs – carrying the pirated content should be blocked, but not the entire website.

“The order of interim injunction dated 25/04/2012 is hereby clarified that the interim injunction is granted only in respect of a particular URL where the infringing movie is kept and not in respect of the entire website,” reads the updated decision.