I am listing a set of basic R functions that let you start on business analytics, that is, analyzing a dataset (a data.frame). I am doing this as a reference both for myself and for anyone who wants to learn R quickly.
I am not covering data import functions, because data manipulation is a separate beast altogether. Instead, I assume you already have a dataset ready for analysis, and I list the top R commands you need to analyze it.
For anyone who thought R was too hard to learn, here are ten functions to get started with R:
1) str(dataset) shows the structure of the dataset: the number of observations, and the name, type and first few values of each variable.
2) names(dataset) gives the names of the variables.
3) sapply(dataset, mean) returns the mean of each variable (all columns must be numeric). Older versions of R accepted mean(dataset) directly; current versions return NA with a warning instead.
4) sapply(dataset, sd) returns the standard deviation of each numeric variable. sd(dataset) worked in older versions of R but is no longer supported.
5) summary(dataset) gives the minimum, first quartile, median, mean, third quartile and maximum of each variable.
That gives me the basic statistics I need for a dataset.
> data(faithful)
> names(faithful)
[1] "eruptions" "waiting"
> str(faithful)
'data.frame':   272 obs. of  2 variables:
 $ eruptions: num  3.6 1.8 3.33 2.28 4.53 ...
 $ waiting  : num  79 54 74 62 85 55 88 85 51 85 ...
> summary(faithful)
   eruptions        waiting    
 Min.   :1.600   Min.   :43.0  
 1st Qu.:2.163   1st Qu.:58.0  
 Median :4.000   Median :76.0  
 Mean   :3.488   Mean   :70.9  
 3rd Qu.:4.454   3rd Qu.:82.0  
 Max.   :5.100   Max.   :96.0  
> sapply(faithful, mean)
eruptions   waiting 
 3.487783 70.897059 
> sapply(faithful, sd)
eruptions   waiting 
 1.141371 13.594974 
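If a dataset mixes numeric and non-numeric columns, applying mean or sd to every column produces NA values for the non-numeric ones. The sketch below collects the basic statistics for the numeric columns only. describe_numeric is a hypothetical helper written for this example, not a base R function, and the choice of statistics (mean, sd, min, max) is an assumption you can change.

```r
# describe_numeric() is a hypothetical helper, not part of base R:
# it returns one row of basic statistics per numeric column of a data frame.
describe_numeric <- function(df) {
  # Keep only the numeric columns so factors or text do not produce NA
  num <- df[vapply(df, is.numeric, logical(1))]
  data.frame(
    mean = vapply(num, mean, numeric(1), na.rm = TRUE),
    sd   = vapply(num, sd,   numeric(1), na.rm = TRUE),
    min  = vapply(num, min,  numeric(1), na.rm = TRUE),
    max  = vapply(num, max,  numeric(1), na.rm = TRUE)
  )
}

data(faithful)
describe_numeric(faithful)

# Also works on mixed data: iris has four numeric columns and one factor
describe_numeric(iris)
```

vapply is used instead of sapply so that each statistic is guaranteed to come back as a single number per column.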
6) table(dataset$variable) gives a basic frequency count of a single variable. The $ operator selects one variable from the dataset, similar to dataset.variable in other statistical languages.
> table(faithful$waiting)

43 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 62 63 64 65 66 67 68 69 70 
 1  3  5  4  3  5  5  6  5  7  9  6  4  3  4  7  6  4  3  4  3  2  1  1  2  4 
71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 
 5  1  7  6  8  9 12 15 10  8 13 12 14 10  6  6  2  6  3  6  1  1  2  1  1 
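A raw frequency table is easier to read once it is sorted or turned into proportions. This is a small sketch using only base R functions (sort, head, prop.table, which.max); showing the top five values is an arbitrary choice for illustration.

```r
data(faithful)
tab <- table(faithful$waiting)

# The five most common waiting times, largest count first
head(sort(tab, decreasing = TRUE), 5)

# Relative frequencies (proportions) instead of raw counts
round(prop.table(tab), 3)

# The most frequent value, i.e. the mode of waiting
names(which.max(tab))
```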
Or I can cross-tabulate the whole dataset, counting how often each combination of values occurs, using
> table(faithful)
         waiting
eruptions 43 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 62 63 64 65 66 67
    1.6    0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    1.667  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0
    1.7    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
    1.733  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
..... (output truncated)
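Because eruptions takes many distinct values, the full cross-tabulation is mostly zeros. One common way to get a readable two-way table is to group each variable into ranges with cut() first. The cut points below (3 minutes for eruptions, 70 minutes for waiting) and the group labels are hypothetical choices for illustration, not values from this post.

```r
data(faithful)
# Hypothetical cut points chosen for illustration only:
# eruptions split at 3 minutes, waiting split at 70 minutes
eruption_group <- cut(faithful$eruptions, breaks = c(0, 3, Inf),
                      labels = c("short eruption", "long eruption"))
waiting_group  <- cut(faithful$waiting, breaks = c(0, 70, Inf),
                      labels = c("short wait", "long wait"))

# Two-way frequency table of the grouped variables
xtab <- table(eruption_group, waiting_group)
xtab

# The same table with row and column totals added
addmargins(xtab)
```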
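7) plot(dataset) plots the dataset. For a data frame with two variables, such as faithful, it draws a scatterplot of one against the other; with more variables it draws a scatterplot matrix.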
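A short sketch of both cases, adding axis labels and a title to the two-column plot. The label text is an illustrative choice; iris is used only because it is a built-in dataset with more than two numeric columns.

```r
data(faithful)
# Two columns: plot() draws one scatterplot, first column on the x axis
plot(faithful,
     xlab = "Eruption length (minutes)",
     ylab = "Waiting time (minutes)",
     main = "Old Faithful geyser")

# More than two columns: plot() draws a scatterplot matrix, as pairs() does
plot(iris[, 1:4])
```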
8) hist(dataset$variable) draws a histogram of a single numeric variable, which is better for looking at its distribution:
> hist(faithful$waiting)
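hist() also accepts labels and a suggested number of bins, and it returns the bin edges and counts it used, which is handy when you want the numbers behind the picture. The title, axis label, colour and breaks = 20 below are illustrative assumptions; R treats a single number for breaks as a suggestion and may adjust it.

```r
data(faithful)
# hist() draws the histogram and invisibly returns the bins it used
h <- hist(faithful$waiting,
          breaks = 20,  # a suggested number of bins; R may pick "pretty" values
          main = "Waiting time between eruptions",
          xlab = "Waiting time (minutes)",
          col = "grey")
h$breaks  # bin edges
h$counts  # number of observations in each bin
```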
9) boxplot(dataset) draws a box plot for each variable side by side, showing the median, quartiles and any outliers.
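boxplot(faithful) puts both variables on one axis, but eruptions (about 1.6 to 5.1 minutes in the summary) and waiting (43 to 96 minutes) are on very different scales, so the eruptions box gets squashed. A minimal sketch of one workaround is to draw each variable in its own panel; like hist(), boxplot() also returns the numbers it drew.

```r
data(faithful)
# Two panels side by side; save the old settings so they can be restored
old_par <- par(mfrow = c(1, 2))
b_erupt <- boxplot(faithful$eruptions, main = "eruptions", ylab = "minutes")
b_wait  <- boxplot(faithful$waiting,   main = "waiting",   ylab = "minutes")
par(old_par)

# Rows of $stats: lower whisker, lower hinge, median, upper hinge, upper whisker
b_wait$stats
```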
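10) The tenth function for a beginner is cor(dataset$var1, dataset$var2), which gives the correlation between two variables. cor(dataset) gives the correlation matrix of all the variables in a numeric dataset: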
> cor(faithful)
          eruptions   waiting
eruptions 1.0000000 0.9008112
waiting   0.9008112 1.0000000
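cor() returns only the coefficient. As a sketch of two common next steps, the code below computes a rank-based (Spearman) correlation and uses cor.test() to get a p-value and a 95% confidence interval for the Pearson correlation; both are standard functions in base R's stats package.

```r
data(faithful)
# Pearson correlation between two specific variables (the default method)
cor(faithful$eruptions, faithful$waiting)

# Spearman rank correlation, less sensitive to outliers and non-linearity
cor(faithful$eruptions, faithful$waiting, method = "spearman")

# cor.test() adds a p-value and a 95% confidence interval (Pearson by default)
ct <- cor.test(faithful$eruptions, faithful$waiting)
ct$estimate
ct$conf.int
ct$p.value
```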
I assume that, as a beginner, you will use one of the GUIs listed at http://rforanalytics.wordpress.com/graphical-user-interfaces-for-r/ to import and export data. I will cover ten steps to data manipulation in R in another post.