Tag Archives: r commander
Data Visualization for R packages at Github #rstats
I noticed this article some time back by the most excellent hacker John Myles White (author of Machine Learning for Hackers)
http://www.johnmyleswhite.com/notebook/2012/08/12/the-social-dynamics-of-the-r-core-team/
Professor John Fox, whom we have interviewed here as the creator of R Commander, talked on this at useR! 2008
http://www.statistik.uni-dortmund.de/useR-2008/slides/Fox.pdf
I also noticed that the R Project is stuck on SVN (yes or no? comment please) while some part of the rest of the world has moved on to Git. See
http://en.wikipedia.org/wiki/Git_%28software%29
Is Git really that good compared to SVN?
http://stackoverflow.com/questions/871/why-is-git-better-than-subversion
Maybe. I think that with 5000 packages and more, the R project needs more presence on GitHub and should at least consider Git for the distributed and international project R is becoming.
The second meetup for R New Delhi Users
The R Users of New Delhi met for the second time on Dec 15, 2012. We meet on the third Saturday of every month.
We talked about epidemiology using the epicalc package (we have 1 doctor and 1 biostatistician), cloud computing (we have two IT guys) and business analytics. We also discussed the GUIs R Commander, Rattle, and Deducer for beginners and people transitioning to R from other analytics software. We also discussed the R for SAS and SPSS Users book and the R for Data Mining book. The free book on R for epidemiology (
http://cran.r-project.org/doc/contrib/Epicalc_Book.pdf
) was mentioned. Not bad for 1 hour.
We are currently unfunded and unsponsored. I hope to get some sponsors to give away R books (excluding my own) to encourage users and group members. The only catch to joining this meetup group: you either need to attend (and be local) or present something (if you are not in Delhi).
I have been trying to get this group to go from Vector to Matrix level to get a bigger sponsorship from Revolution, but I am constrained by meeting in a public cafe. That is due to change, since we managed to get one sponsor for a meeting place in Noida (a business school batchmate who owns his office).
http://www.revolutionanalytics.com/news-events/r-user-group/
Deadlines for applications are:
- March 31, 2013 for Matrix and Array level groups.
- September 30, 2013 for Vector level groups.
2013 Sponsorship Levels
The size of the annual grant depends on the size of your group.
| Level | For groups that are: | Requirements | Annual Grant ($USD) |
| Vector | Just getting started | A group name, group webpage, and a focus on R. (Here are some tips on starting up a new R user group.) | $100 |
| Matrix | Smaller but established | 3 meetings in last 6 months with 30 attendees or more. | $500 |
| Array | Larger and groups | 3 meetings in last 6 months with 60 attendees or more. | $1000 |
Interviews and Reviews: More R #rstats
I got interviewed on moving on from Excel to R in Human Resources (HR) here at
http://www.hrtecheurope.com/blog/?p=5345
“There is a lot of data out there and it’s stored in different formats. Spreadsheets have their uses but they’re limited in what they can do. The spreadsheet is bad when getting over 5000 or 10000 rows – it slows down. It’s just not designed for that. It was designed for much higher levels of interaction.
In the business world we really don’t need to know every row of data, we need to summarise it, we need to visualise it and put it into a powerpoint to show to colleagues or clients.”
And a more recent interview with my fellow IIML mate, and editor at Analytics India Magazine
http://analyticsindiamag.com/interview-ajay-ohri-author-r-for-business-analytics/
AIM: Which R packages do you use the most and which ones are your favorites?
AO: I use R Commander and Rattle a lot, and I use the dependent packages. I use car for regression, and forecast for time series, and many packages for specific graphs. I have not mastered ggplot though but I do use it sometimes. Overall I am waiting for Hadley Wickham to come up with an updated book to his ecosystem of packages as they are very formidable, completely comprehensive and easy to use in my opinion, so much I can get by the occasional copy and paste code.
A surprising review at R-bloggers.com / Intelligent Trading
http://intelligenttradingtech.blogspot.in/2012/10/book-review-r-for-business-analytics.html
The good news is that many of the large companies do not view R as a threat, but as a beneficial tool to assist their own software capabilities.
After assisting and helping R users navigate through the dense forest of various GUI interface choices (in order to get R up and running), Mr. Ohri continues to handhold users through step by step approaches (with detailed screen captures) to run R from various simple to more advanced platforms (e.g. CLOUD, EC2) in order to gather, explore, and process data, with detailed illustrations on how to use R’s powerful graphing capabilities on the back-end.
Do you want to write a review too? You can visit the site here
http://www.springer.com/statistics/book/978-1-4614-4342-1
JSS launches special edition for GUI for #Rstats
I love GUIs (graphical user interfaces), whether they are Tcl/Tk based, GTK based or even Qt based. As a researcher they help me with faster coding, as a consultant they help with faster transition of projects from startup to handover stage, and as an R instructor they help me get people to learn R faster.
I wish Python had some GUIs though
From the open access Journal of Statistical Software-
JSS Special Volume 49: Graphical User Interfaces for R
Pedro M. Valero-Mora, Ruben Ledesma
Vol. 49, Issue 1, Jun 2012
Submitted 2012-06-03, Accepted 2012-06-03
Ya-Shan Cheng, Chien-Yu Peng
Vol. 49, Issue 2, Jun 2012
Submitted 2010-12-31, Accepted 2011-06-29
Joris J. Snellenburg, Sergey Laptenok, Ralf Seger, Katharine M. Mullen, Ivo H. M. van Stokkum
Vol. 49, Issue 3, Jun 2012
Submitted 2011-01-20, Accepted 2011-09-16
Marcel Austenfeld, Wolfram Beyschlag
Vol. 49, Issue 4, Jun 2012
Submitted 2011-01-05, Accepted 2012-02-20
Byron C. Wallace, Issa J. Dahabreh, Thomas A. Trikalinos, Joseph Lau, Paul Trow, Christopher H. Schmid
Vol. 49, Issue 5, Jun 2012
Submitted 2010-11-01, Accepted 2012-12-20
Bei Huang, Dianne Cook, Hadley Wickham
Vol. 49, Issue 6, Jun 2012
Submitted 2011-01-20, Accepted 2012-04-16
John Fox, Marilia S. Carvalho
Vol. 49, Issue 7, Jun 2012
Submitted 2010-12-26, Accepted 2011-12-28
Ian Fellows
Vol. 49, Issue 8, Jun 2012
Submitted 2011-02-28, Accepted 2011-09-08
Stefan Rödiger, Thomas Friedrichsmeier, Prasenjit Kapat, Meik Michalke
Vol. 49, Issue 9, Jun 2012
Submitted 2010-12-28, Accepted 2011-05-06
John Verzani
Vol. 49, Issue 10, Jun 2012
Submitted 2010-12-17, Accepted 2011-05-11
Antony Unwin
Vol. 49, Issue 11, Jun 2012
Submitted 2010-12-08, Accepted 2011-07-15
Moving data between Windows and Ubuntu VMWare partition
I use Windows 7 on my laptop (it came pre-installed) and Ubuntu using VMware Player. What are the advantages of using VMware Player instead of creating a dual-boot system? Well, I can quickly shift from Ubuntu to Windows and back again without restarting my computer every time. This approach lets me use software that runs only on Windows, and run software like Rattle, the R data mining GUI, that is much easier to install on Linux.
However, if your statistical software is on your virtual disk and your data is on your Windows disk, you need a way to move data from Windows to Ubuntu.
The solution to this, as per this VMware Communities thread, is -
http://communities.vmware.com/thread/55242
Open My Computer, browse to the folder you want to share. Right-click on the folder, select Properties. Sharing tab. Select the radio button to “Share this Folder”. Change the default generated name if you wish; add a description if you wish. Click the Permissions button to modify the security settings of what users can read/write to the share.
On the Linux side, it depends on the distro, the shell, and the window manager.
Well, Ubuntu makes it really easy to configure the Linux steps to move data between Windows and Linux partitions.
NEW UPDATE-
VMware makes it easy to share between your Windows (host) and Linux (guest) OS
Step 1
and step 2
Do this
and
Start the Wizard
When you finish the wizard and have shared a drive or folder, where do you see the shared ones?
Look in this folder in Linux: /mnt/hgfs (bingo!)
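Once the share is visible under /mnt/hgfs, reading the data from R inside the Ubuntu guest could look like the sketch below. It assumes the shared folder holds at least one CSV file; the variable names are illustrative, and the code simply finds nothing if the folder is not mounted.

```r
# Shared folders from the VMware host appear under this mount point in the guest
shared_dir <- "/mnt/hgfs"

# List CSV files in any shared folder; empty if nothing is shared or mounted
csv_files <- if (dir.exists(shared_dir)) {
  list.files(shared_dir, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
} else {
  character(0)
}
print(csv_files)

# Read the first CSV found, if any, into a data frame
if (length(csv_files) > 0) {
  mydata <- read.csv(csv_files[1])
  str(mydata)
}
```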
Hacker HW – Make this folder (/mnt/hgfs) a shortcut in Places on your Ubuntu startup
Hacker Hw 2-
Upload using an anon email your VM dark data to Ubuntu one
Delete VM
Purge using software XX
Reinstall VM and bring back backup
Note time to do this
-General Sharing in Windows
Just open the Network tab in Ubuntu- see screenshots below-
Windows will now ask your Ubuntu user for login-
Once logged in to Windows from within Ubuntu on VMware, this is what happens
You see a tab called "users on (windows username)-pc" appear on your Ubuntu desktop (see top right of the screenshot)
If you double click it, you see your Windows path
You can now just click and drag data between your Windows and Linux partitions, just the way you do it in Windows.
So based on this, if you want to build decision trees, artificial neural networks, regression models, and even time series models for zero capital expenditure, you can use both Ubuntu and R without compromising on your organization's Windows-only IT policy (there is a shortage of Ubuntu trained IT administrators in the enterprise world)
Revised Installation Procedure for utilizing both Ubuntu /R/Rattle data mining on your Windows PC.
Using VMWare to build a free data mining system in R, as well as isolate your analytics system (thus using both Linux and Windows without overburdening your machine)
First Time
http://downloads.vmware.com/d/info/desktop_end_user_computing/vmware_player/4_0
Download and Install
http://www.ubuntu.com/download/ubuntu/download
Download Only- Create New Virtual Image in VM Ware Player
- Applications > Terminal, then type sudo apt-get install r-base (to download and install R)
- sudo R (to open R)
- Once R is opened, type install.packages("rattle") to install Rattle
- library(rattle) will load Rattle
- rattle() will open the GUI
Getting Data from Host to Guest VM
Next Time
- Go to VM Player
- Open the VM
- sudo R in terminal to bring up R
- library(rattle) within R
- rattle()
At this point, even if you don't know any Linux and don't know any R, you can create data mining models using the Rattle GUI (and time series models using the epack plugin in the R Commander GUI) – What can Rattle do in data mining? See this slideshow-
http://www.decisionstats.com/data-mining-with-r-gui-rattle-rstats/
If Google Docs is banned as per your enterprise organizational IT policy of having Windows Explorer only- well you can see these screenshots
http://rattle.togaware.com/rattle-screenshots.html
#rstats -Basic Data Manipulation using R
Continuing my series on basic data manipulation using R, for people who know analytics and are new to R.
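1 Keeping only some variables
Using subset we can keep only the variables we want (Sitka89 is from the MASS package)-
Sitka89 <- subset(Sitka89, select=c(size,Time,treat))
This will keep only the variables we have selected (size, Time, treat).
2 Dropping some variables
Harman23.cor$cov.arm.span <- NULL
This deletes the variable named cov.arm.span in the dataset Harman23.cor. It assumes the dataset has been converted to a data frame first, for example with as.data.frame(Harman23.cor), since the built-in object is a list.
3 Keeping records based on character condition
Titanic.sub1<-subset(as.data.frame(Titanic),Sex=="Male")
Note the double equal-to sign. Titanic ships as a table, so it is converted to a data frame before subsetting.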
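A minimal sketch of steps 1 to 3 on a small made-up data frame, so it runs without any extra packages. The data frame `toy` and its column names are assumptions for illustration only and do not come from the datasets named above.

```r
# Hypothetical toy data frame; names and values are illustrative only
toy <- data.frame(
  id    = 1:4,
  size  = c(4.5, 5.1, 3.9, 6.0),
  treat = c("ozone", "control", "ozone", "control"),
  note  = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# 1. Keep only some variables
kept <- subset(toy, select = c(id, size, treat))

# 2. Drop one variable by assigning NULL to it
dropped <- toy
dropped$note <- NULL

# 3. Keep records matching a character condition (note the ==)
ozone_only <- subset(toy, treat == "ozone")

print(names(kept))
print(names(dropped))
print(ozone_only)
```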
4 Keeping records based on date/time condition
subset(DF, as.Date(Date) >= '2009-09-02' & as.Date(Date) <= '2009-09-04')
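As a rough illustration of the date filter above, here is the same pattern on a made-up data frame. `DF`, its `Date` column stored as text, and the values are all assumptions for the example.

```r
# Hypothetical data frame with dates stored as character strings
DF <- data.frame(
  Date  = c("2009-09-01", "2009-09-02", "2009-09-03", "2009-09-05"),
  Value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Convert the boundaries explicitly so both sides of each comparison are Dates
start <- as.Date("2009-09-02")
end   <- as.Date("2009-09-04")

in_range <- subset(DF, as.Date(Date) >= start & as.Date(Date) <= end)
print(in_range)
```

Converting the boundary strings with as.Date() up front is a small design choice: it makes the comparison explicit instead of relying on R to coerce the text for you.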
5 Converting Date Time Formats into other formats
If the variable dob is "01/04/1977" then the following will convert it into a date-time object
z=strptime(dob,"%d/%m/%Y")
and if the same date is 01Apr1977
z=strptime(dob,"%d%b%Y")
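A short sketch of the two conversions, plus writing the date back out in another format. The variable names are illustrative. Note that %b matches abbreviated month names in the current locale, so the second parse assumes an English locale.

```r
# Day/month/year with slashes
dob1 <- "01/04/1977"
z1 <- strptime(dob1, "%d/%m/%Y")   # returns a POSIXlt date-time
d1 <- as.Date(z1)                  # keep just the calendar date

# Same date written as 01Apr1977; %b is locale-dependent (English assumed)
dob2 <- "01Apr1977"
z2 <- strptime(dob2, "%d%b%Y")

# Convert the parsed date into another text format
iso <- format(d1, "%Y-%m-%d")
print(iso)
print(class(z1))
```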
6 Difference in Date Time Values and Using Current Time
The difftime function helps in creating differences in two date time variables.
difftime(time1, time2, units='secs')
or
difftime(time1, time2, tz = "", units = c("auto", "secs", "mins", "hours", "days", "weeks"))
For current system date time values you can use
Sys.time()
Sys.Date()
This value can be put in the difftime function shown above to calculate age or time elapsed.
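For instance, age in years can be approximated from a difftime in days. This is a sketch with a fixed reference date so the result is reproducible; in practice you would pass Sys.Date() instead. Dividing by 365.25 is an approximation, not an exact calendar age.

```r
dob <- as.Date("1977-04-01")

# Fixed "today" for a reproducible example; use Sys.Date() in real use
today <- as.Date("2013-01-15")

# Time elapsed in days
elapsed <- difftime(today, dob, units = "days")

# Approximate age in years (365.25 days per year on average)
age_years <- as.numeric(elapsed) / 365.25

print(elapsed)
print(floor(age_years))
```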
7 Keeping records based on numerical condition
Titanic.sub1<-subset(as.data.frame(Titanic),Freq >37)
For enhanced usage-
you can also use the R Commander GUI with the sub menu Data > Active Dataset
8 Sorting Data
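Sorting a Data Frame in Ascending Order by a variable
AggregatedData<- sort(AggregatedData, by=~ Package)
Sorting a Data Frame in Descending Order by a variable
AggregatedData<- sort(AggregatedData, by=~ -Installed)
Note that the by= formula form is not part of base R's sort(); it needs an add-on package that supplies a sort method for data frames (Deducer provides one).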
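If no add-on package is loaded, the same two sorts can be sketched in base R with order(). The contents of `AggregatedData` here are made up for illustration.

```r
# Hypothetical data: package names and install counts
AggregatedData <- data.frame(
  Package   = c("rattle", "car", "forecast"),
  Installed = c(120, 300, 45),
  stringsAsFactors = FALSE
)

# Ascending by Package: order() returns the row indices in sorted order
asc <- AggregatedData[order(AggregatedData$Package), ]

# Descending by Installed
desc <- AggregatedData[order(AggregatedData$Installed, decreasing = TRUE), ]

print(asc)
print(desc)
```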
9 Transforming a Dataset Structure around a single variable
Using the reshape2 package we can use the melt and acast functions
library("reshape2")
tDat.m<- melt(tDat)
tDatCast<- acast(tDat.m,Subject~Item)
If we choose not to use the reshape2 package, we can use the default reshape function in R. Please note that this takes longer for bigger datasets.
df.wide <- reshape(df, idvar="Subject", timevar="Item", direction="wide")
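To see what the base reshape call produces, here is a sketch on a tiny long-format data frame. The `Score` column and the values are assumptions for the example.

```r
# Long format: one row per Subject/Item combination
df <- data.frame(
  Subject = c(1, 1, 2, 2),
  Item    = c("A", "B", "A", "B"),
  Score   = c(5, 7, 6, 8),
  stringsAsFactors = FALSE
)

# Wide format: one row per Subject, one Score column per Item
df.wide <- reshape(df, idvar = "Subject", timevar = "Item", direction = "wide")

print(df.wide)
```

Each value of Item becomes its own column, named by joining the measured variable and the item with a dot.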
10 Type in Data
Using scan() function we can type in data in a list
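Called with no arguments, scan() reads numbers typed at the console until a blank line. The sketch below uses the text argument instead so it runs non-interactively; the values are made up.

```r
# Numeric values, as if typed at the console
typed <- scan(text = "12 15 9 20", quiet = TRUE)

# Character values need what = character()
pkgs <- scan(text = "car rattle forecast", what = character(), quiet = TRUE)

# A list template in 'what' reads alternating fields into a named list
as_list <- scan(text = "1 a 2 b", what = list(num = 0, chr = ""), quiet = TRUE)

print(typed)
print(pkgs)
print(as_list)
```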
11 Using diff for lags and cumsum for cumulative sums
We can use the diff function to calculate the difference between two successive values of a variable.
diff(Dataset$X)
The cumsum function gives the cumulative sum
cumsum(Dataset$X)
> x=rnorm(10,20) #This gives 10 normally distributed random numbers with mean 20 (and the default standard deviation of 1)
> x
[1] 20.76078 19.21374 18.28483 20.18920 21.65696 19.54178 18.90592 20.67585
[9] 20.02222 18.99311
> diff(x)
[1] -1.5470415 -0.9289122 1.9043664 1.4677589 -2.1151783 -0.6358585 1.7699296
[8] -0.6536232 -1.0291181
> cumsum(x)
[1] 20.76078 39.97453 58.25936 78.44855 100.10551 119.64728 138.55320
[8] 159.22905 179.25128 198.24438
> diff(x,2) # The full form is diff(x, lag = 1, differences = 1, ...), where differences is the order of differencing; the 2 here is passed as lag
[1] -2.4759536 0.9754542 3.3721252 -0.6474195 -2.7510368 1.1340711 1.1163064
[8] -1.6827413
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
12 Merging Data
Deducer GUI makes it much simpler to merge datasets. The simplest syntax for a merge statement is
totalDataframeZ <- merge(dataframeX,dataframeY,by=c("AccountId","Region"))
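A small sketch of that merge on two made-up data frames that share the AccountId and Region keys. The Sales and Returns columns and all values are assumptions for illustration.

```r
dataframeX <- data.frame(
  AccountId = c(1, 2, 3),
  Region    = c("N", "S", "E"),
  Sales     = c(100, 200, 300),
  stringsAsFactors = FALSE
)
dataframeY <- data.frame(
  AccountId = c(1, 2, 4),
  Region    = c("N", "S", "W"),
  Returns   = c(5, 10, 15),
  stringsAsFactors = FALSE
)

# Default merge keeps only rows whose keys appear in both data frames
totalDataframeZ <- merge(dataframeX, dataframeY, by = c("AccountId", "Region"))

# all = TRUE keeps unmatched rows from both sides, filling gaps with NA
totalAll <- merge(dataframeX, dataframeY, by = c("AccountId", "Region"), all = TRUE)

print(totalDataframeZ)
print(totalAll)
```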
13 Aggregating and group processing of a variable
We can use multiple methods for aggregating and by group processing of variables.
Two functions we explore here are aggregate and tapply. Referring to the R Online Manual at
[http://stat.ethz.ch/R-manual/R-patched/library/stats/html/aggregate.html]
## Compute the averages for the variables in 'state.x77', grouped
## according to the region (Northeast, South, North Central, West) that
## each state belongs to
aggregate(state.x77, list(Region = state.region), mean)
Using tapply
## tapply(Summary Variable, Group Variable, Function)
Reference [http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm#tapply]
We can also use specialized packages for data manipulation. For additional by-group processing you can see the doBy package as well as the plyr package for data manipulation. doBy contains a variety of utilities including:
1) Facilities for groupwise computations of summary statistics and other facilities for working with grouped data,
2) General linear contrasts and LSMEANS (least-squares-means, also known as population means),
3) HTMLreport for automatic generation of an HTML file from an R script with a minimum of markup,
4) various other utilities.
It is available at [http://cran.r-project.org/web/packages/doBy/index.html]
plyr is also available, at [http://cran.r-project.org/web/packages/plyr/index.html].
plyr is a set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics.
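The tapply pattern mentioned above can be sketched on the same built-in state data used in the aggregate example. Choosing the Income column is just an illustration.

```r
# aggregate: mean of every state.x77 variable within each region
region_means <- aggregate(state.x77, list(Region = state.region), mean)

# tapply(Summary Variable, Group Variable, Function):
# mean income within each region
income_by_region <- tapply(state.x77[, "Income"], state.region, mean)

print(region_means[, c("Region", "Income")])
print(income_by_region)
```

aggregate returns a data frame with one row per group and all variables summarised, while tapply returns a named vector for the single variable you pass in.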










