Tag Archives: ajay
Ajay- Describe how you started using R. What are some of the benefits you noticed on moving to R?
Jeff- I began using R in an internship while working on my undergraduate degree. I was provided with some unformatted R code and asked to modularize the code then wrap it up into an R package for distribution alongside a publication.
To be honest, as a Computer Science student with training more heavily emphasizing the big high-level languages, R took some getting used to for me. It wasn’t until after I concluded that initial project and began using R to do my own data analysis that I began to realize its potential and value. It was the first scripting language which really made interactive use appealing to me; the experience of exploring a dataset in R was unlike anything…
I am being sponsored to attend a Government of India sponsored talk on Big Data Analytics at Bangalore on Friday, the 13th of July. If you are in Bangalore, India, you may drop in for a dekko. Schedule and Abstracts (I am on page 7 out of 9).
Your tax payer money is hard at work- (hassi majak only if you are a desi. hassi to fassi.)
13 July 2012 (9.30 – 11.00 & 11.30 – 1.00)
Big Data Big Analytics
The talk will showcase open source technologies in statistical computing for big data, namely the R programming language and its use cases in big data analysis. It will review case studies using the Amazon Cloud, custom packages in R for Big Data, tools like Revolution Analytics' RevoScaleR package, as well as the newly launched SAP Hana used with R. We will also review Oracle R Enterprise. In addition, we will show some case studies using BigML.com (using Clojure) and approaches using PiCloud, and showcase some of Google's APIs for Big Data Analysis.
Lastly, we will talk about social media analysis, national security use cases (i.e. cyber war), and the privacy hazards of big data analytics.
A common task in business analytics is to take a random sample of a very large dataset to test your analytics code. Note that most business analytics datasets are either data.frame in structure (records as rows and variables as columns) or database bound. This is partly due to a legacy of traditional analytics software.
Here is how we do it in R-
• Referring to parts of a data.frame rather than the whole dataset.
Using square brackets to reference variable columns and rows
The notation dataset[i,j] refers to the element in the ith row and jth column.
The notation dataset[i,] refers to all elements in the ith row, or a record for a data.frame.
The notation dataset[,j] refers to all elements in the jth column, or a variable for a data.frame.
For a data.frame dataset
> nrow(dataset) #This gives number of rows
> ncol(dataset) #This gives number of columns
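As a quick illustration of the notation above, here is a minimal sketch on a small made-up data.frame. The column names and values are assumptions for illustration only, not data from this post.

```r
# Hypothetical data.frame: 4 records (rows) and 3 variables (columns)
dataset <- data.frame(age    = c(25, 32, 47, 51),
                      income = c(30, 45, 80, 62),
                      spend  = c(12, 20, 33, 25))

one_value  <- dataset[2, 3]   # element in the 2nd row and 3rd column
one_record <- dataset[2, ]    # all elements in the 2nd row (one record)
one_column <- dataset[, 3]    # all elements in the 3rd column (one variable)

n_rows <- nrow(dataset)       # number of rows
n_cols <- ncol(dataset)       # number of columns

print(one_record)
```

Note that dataset[2, ] is still a data.frame with one row, while dataset[, 3] comes back as a plain vector.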
An example for correlation between only a few variables in a data.frame.
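The original code for this example is not shown here, so the following is only a sketch of one way to do it. It assumes the built-in mtcars data.frame as a stand-in dataset, and the three columns chosen are arbitrary.

```r
# Assumption: mtcars (shipped with R) stands in for the post's dataset
vars <- c("mpg", "hp", "wt")

# Select only the chosen columns, then compute their correlation matrix
cor_subset <- cor(mtcars[, vars])

print(round(cor_subset, 2))   # 3 x 3 matrix, rounded for display
```

Selecting columns by name, as here, or by position (for example mtcars[, 1:3]) both work with the square bracket notation.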
Splitting a dataset into test and control.
ts.test=dataset2[1:200,] #First 200 rows (note the comma: without it, R selects columns)
ts.control=dataset2[201:275,] #Next 75 rows
Random sampling enables us to work on a smaller size of the whole dataset.
Use sample(x) to create a random permutation of the vector x.
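A minimal sketch of a random permutation with sample. The vector x and the seed are assumptions chosen only to make the run repeatable.

```r
set.seed(42)        # arbitrary seed so the same permutation comes back each run
x <- 1:10           # hypothetical vector to shuffle

perm <- sample(x)   # same values as x, in random order, none repeated
print(perm)
```

One caveat: if x is a single number n (at least 1), sample(x) draws from 1:n rather than returning x itself.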
Suppose we want to take a 5% sample of a data frame with no replacement.
Let us create a dataset ajay of random numbers
#This is the kind of code line that frightens most MBAs!!
Note we use the round function to round off values.
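The code line itself is missing from this copy of the post. As a sketch of what such a line could look like, assume 2000 normally distributed random numbers arranged into 10 columns. The mean, standard deviation, dimensions, and seed below are all assumptions, not the original values.

```r
set.seed(1)   # arbitrary seed for repeatability

# Draw 2000 random numbers, round them to whole numbers, shape them
# into 10 columns, and convert to a data.frame of 200 rows
ajay <- as.data.frame(matrix(round(rnorm(2000, mean = 5, sd = 15)), ncol = 10))

dim(ajay)       # rows and columns
head(ajay, 3)   # first three records
```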
This is a typical business data scenario when we want to select only a few records to do our analysis (or test our code), but have all the columns for those records. Let us assume we want to sample only 5% of the whole data so we can run our code on it
Then the number of rows in the new object will be 0.05*nrow(ajay). That will be the size of the sample.
The sample function picks that many row numbers from the original object, using the size parameter.
We also use replace=FALSE (or F) so that we do not pick the same row again and again. The new_rows is thus a 5% sample of the existing rows.
Then we use square brackets, as in ajay[new_rows,], to get the sampled records with all their columns.
You can change the percentage from 5% to whatever you want accordingly.
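Putting the sampling steps together, here is a minimal sketch. The dataset is regenerated with assumed parameters so the block runs on its own, and the names sample_size and ajay_sample are introduced only for illustration.

```r
set.seed(123)   # arbitrary seed for repeatability

# Assumed stand-in for the dataset ajay: 200 rows, 10 columns of rounded random numbers
ajay <- as.data.frame(matrix(round(rnorm(2000, mean = 5, sd = 15)), ncol = 10))

# Size of the sample: 5% of 200 rows = 10 rows
sample_size <- round(0.05 * nrow(ajay))

# Pick row numbers from 1..nrow(ajay) without replacement
new_rows <- sample(nrow(ajay), size = sample_size, replace = FALSE)

# Keep only the sampled rows, with all columns
ajay_sample <- ajay[new_rows, ]
dim(ajay_sample)
```

Rounding the size guards against a fractional row count when 5% of the rows is not a whole number.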
Over the Christmas break, I created a Google AdWords campaign using the $100 credit generously given by Google. I did it using my alumni id, even though I have a perfectly normal Gmail id. I guess if Google allows me to use the credit on any account, well, I will take it. And so a free experiment was born.
But whom to target with Google, if not Google itself? It seemed logical.
So I created a campaign for the names of prominent Googlers (from a list on Google+ at https://plus.google.com/103399926392582289066/posts/LX4g7577DqD ) and limited the ad location to Mountain View, California.
NULL HYPOTHESIS- People who are googled a lot from within the office are either popular or just checking themselves.
My ad was-
Hire Ajay Ohri
or see screenshot below.
Here are the results: 88 clicks and 43,000 impressions (and $83 of Google’s own money).
Clearly Vic Gundotra is googled a lot within Mountain View, California. Does he Google himself?
So is Matt Cutts. Does he Google himself, or does he get elves to help him?
To my disappointment, not many people clicked my LI offer, so I am still blogging.
And there were few clicks on Marissa Mayer. Why Google her when she is right down the corridor?
The null hypothesis is thus rejected. Also most clicks were from display and not from search.
I need to find something better to do with the Christmas break this year. I still have a credit of $16 left.