Tag Archives: stack overflow
While the SAS language has a beautifully designed ODS (Output Delivery System) for saving the output of an analysis to Excel files (and HTML and other formats), in R you can simply pass the result object to write.table and save it as a CSV file using the file parameter of write.table.
As a business analytics consultant, I often have to present the output from a Proc Means or Proc Freq (SAS) or a summary/describe/table command (in R) as a final report. Copying and pasting is not feasible, especially for large amounts of text or on remote computers.
Using the following, we can simply save the output in R:
# We shifted the directory, so we can save output without typing the entire path again and again for each step.
# I have found the summary command most useful for initial analysis and final display (particularly during the data munging step).
# I assigned the result of the analysis step (summary) to a new object; it could also be names, describe (Hmisc) or table (for frequency analysis).
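The steps in the comments above can be sketched roughly as follows. This is only a minimal illustration: the output folder, the file name summary_mtcars.csv, the object name ajay_summary and the use of the built-in mtcars data are all assumptions of mine, not the original code.

```r
# Hypothetical output folder; replace with your own reporting directory.
out_dir <- file.path(tempdir(), "analysis_output")
dir.create(out_dir, showWarnings = FALSE)

# Shift the working directory once, so later calls need only a file name.
old_wd <- setwd(out_dir)

# Assign the analysis step to an object (summary here; table() etc. work similarly).
ajay_summary <- summary(mtcars)

# Save the object as a CSV file through the file parameter of write.table.
write.table(ajay_summary, file = "summary_mtcars.csv", sep = ",", row.names = FALSE)

# Restore the previous working directory.
setwd(old_wd)
```

The saved file can then be opened in Excel or attached to a report without any copying and pasting.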
Note: This is for beginners using R for business analytics with a large number of variables.
If you have a large number of files in a local directory to be read into R, you can avoid typing the entire path again and again by changing the working directory to that folder and passing just the file name to the file parameter of read.table.
and so on…
Maybe there is a better approach somewhere on Stack Overflow or R-help, but this one works well enough.
You can then merge the objects created, ajayt1 and ajayt2… (to be continued)
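A rough sketch of the approach described above. Only the object names ajayt1 and ajayt2 come from the post; the folder, the file names file1.csv and file2.csv, the sample data and the id key column are made up for illustration.

```r
# Hypothetical input folder with two small CSV files, created here so the
# example is self-contained.
data_dir <- file.path(tempdir(), "input_files")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, sales = c(10, 20, 30)),
          file.path(data_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3, region = c("N", "S", "E")),
          file.path(data_dir, "file2.csv"), row.names = FALSE)

# Change the working directory once instead of repeating the full path.
old_wd <- setwd(data_dir)
ajayt1 <- read.table("file1.csv", header = TRUE, sep = ",")
ajayt2 <- read.table("file2.csv", header = TRUE, sep = ",")
setwd(old_wd)

# Merge the two objects on their shared key column (assumed to be "id").
merged <- merge(ajayt1, ajayt2, by = "id")
print(merged)
```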
Here is an interview with JJ Allaire, founder of RStudio. RStudio is the IDE that has overtaken other IDEs within the R community in terms of ease of use. On the eve of their latest product launch, JJ talks to DecisionStats about RStudio and more.
Ajay- So what is new in the latest version of RStudio and how exactly is it useful for people?
JJ- The initial release of RStudio as well as the two follow-up releases we did last year were focused on the core elements of using R: editing and running code, getting help, and managing files, history, workspaces, plots, and packages. In the meantime users have also been asking for some bigger features that would improve the overall work-flow of doing analysis with R. In this release (v0.95) we focused on three of these features:
Projects. R developers tend to have several (and often dozens) of working contexts associated with different clients, analyses, data sets, etc. RStudio projects make it easy to keep these contexts well separated (with distinct R sessions, working directories, environments, command histories, and active source documents), switch quickly between project contexts, and even work with multiple projects at once (using multiple running versions of RStudio).
Version Control. The benefits of using version control for collaboration are well known, but we also believe that solo data analysis can achieve significant productivity gains by using version control (this discussion on Stack Overflow talks about why). In this release we introduced integrated support for the two most popular open-source version control systems: Git and Subversion. This includes changelist management, file diffing, and browsing of project history, all right from within RStudio.
Code Navigation. When you look at how programmers work a surprisingly large amount of time is spent simply navigating from one context to another. Modern programming environments for general purpose languages like C++ and Java solve this problem using various forms of code navigation, and in this release we’ve brought these capabilities to R. The two main features here are the ability to type the name of any file or function in your project and go immediately to it; and the ability to navigate to the definition of any function under your cursor (including the definition of functions within packages) using a keystroke (F2) or mouse gesture (Ctrl+Click).
Ajay- What’s the product road map for RStudio? When can we expect the IDE to turn into a full fledged GUI?
JJ- Linus Torvalds has said that “Linux is evolution, not intelligent design.” RStudio tries to operate on a similar principle—the world of statistical computing is too deep, diverse, and ever-changing for any one person or vendor to map out in advance what is most important. So, our internal process is to ship a new release every few months, listen to what people are doing with the product (and hope to do with it), and then start from scratch again making the improvements that are considered most important.
Right now some of the things which seem to be top of mind for users are improved support for authoring and reproducible research, various editor enhancements including code folding, and debugging tools.
What you’ll see us do in a given release is work on a combination of frequently requested features, smaller improvements to usability and work-flow, bug fixes, and finally architectural changes required to support current or future feature requirements.
While we do try to base what we work on as closely as possible on direct user-feedback, we also adhere to some core principles concerning the overall philosophy and direction of the product. So for example the answer to the question about the IDE turning into a full-fledged GUI is: never. We believe that textual representations of computations provide fundamental advantages in transparency, reproducibility, collaboration, and re-usability. We believe that writing code is simply the right way to do complex technical work, so we’ll always look for ways to make coding better, faster, and easier rather than try to eliminate coding altogether.
Ajay- Describe your journey in science from a high school student to your present work in R. I noticed you have been very successful in making software products, most of which have been proprietary or sold to companies.
Why did you get into open source products with RStudio? What are your plans for monetizing RStudio further down the line?
JJ- In high school and college my principal areas of study were Political Science and Economics. I also had a very strong parallel interest in both computing and quantitative analysis. My first job out of college was as a financial analyst at a government agency. The tools I used in that job were SAS and Excel. I had a dim notion that there must be a better way to marry computation and data analysis than those tools, but of course no concept of what this would look like.
From there I went more in the direction of general purpose computing, starting a couple of companies where I worked principally on programming languages and authoring tools for the Web. These companies produced proprietary software, which at the time (between 1995 and 2005) was a workable model because it allowed us to build the revenue required to fund development and to promote and distribute the software to a wider audience.
By 2005 it was however becoming clear that proprietary software would ultimately be overtaken by open source software in nearly all domains. The cost of development had shrunk dramatically thanks to both the availability of high-quality open source languages and tools as well as the scale of global collaboration possible on open source projects. The cost of promoting and distributing software had also collapsed thanks to the efficiency of both distribution and information diffusion on the Web.
When I heard about R and learned more about it, I became very excited and inspired by what the project had accomplished. A group of extremely talented and dedicated users had created the software they needed for their work and then shared the fruits of that work with everyone. R was a platform that everyone could rally around because it worked so well, was extensible in all the right ways, and most importantly was free (as in speech) so users could depend upon it as a long-term foundation for their work.
So I started RStudio with the aim of making useful contributions to the R community. We started with building an IDE because it seemed like a first-rate development environment for R that was both powerful and easy to use was an unmet need. Being aware that many other companies had built successful businesses around open-source software, we were also convinced that we could make RStudio available under a free and open-source license (the AGPLv3) while still creating a viable business. At this point RStudio is exclusively focused on creating the best IDE for R that we can. As the core product gets where it needs to be over the next couple of years we’ll then also begin to sell other products and services related to R and RStudio.
In 1995 Joseph J. (JJ) Allaire co-founded Allaire Corporation with his brother Jeremy Allaire, creating the web development tool ColdFusion. In March 2001, Allaire was sold to Macromedia where ColdFusion was integrated into the Macromedia MX product line. Macromedia was subsequently acquired by Adobe Systems, which continues to develop and market ColdFusion.
After the sale of his company, Allaire became frustrated at the difficulty of keeping track of research he was doing using Google. To address this problem, he co-founded Onfolio in 2004 with Adam Berrey, former Allaire co-founder and VP of Marketing at Macromedia.
On March 8, 2006, Onfolio was acquired by Microsoft where many of the features of the original product are being incorporated into the Windows Live Toolbar. On August 13, 2006, Microsoft released the public beta of a new desktop blogging client called Windows Live Writer that was created by Allaire’s team at Microsoft.
Starting in 2009, Allaire has been developing a web-based interface to the widely used R technical computing environment. A beta version of RStudio was publicly released on February 28, 2011.
JJ Allaire received his B.A. from Macalester College (St. Paul, MN) in 1991.
RStudio is an integrated development environment (IDE) for R which works with the standard version of R available from CRAN. Like R, RStudio is available under a free software license. RStudio is designed to be as straightforward and intuitive as possible to provide a friendly environment for new and experienced R users alike. RStudio is also a company, and they plan to sell services (support, training, consulting, hosting) related to the open-source software they distribute.
I have been watching Revolution Analytics' products almost since the inception of the company. It has managed to weather storms, naysayers and critics with a simple and effective strategy of launching good software, making good partnerships and keeping up media visibility with white papers, joint webinars, blogs, conferences and events.
Here, however, is a listing of the technical contributions made by Revolution Analytics products to the #rstats project.
1) Useful packages, mostly for parallel processing or more efficient computing, such as:
- foreach (http://cran.r-project.org/web/packages/foreach/index.html)
- nws (http://cran.r-project.org/web/packages/nws/)
- iterators (http://cran.r-project.org/web/packages/iterators/index.html)
- doSMP (http://cran.r-project.org/web/packages/doSMP/index.html)
- doSNOW (http://cran.r-project.org/web/packages/doSNOW/index.html)
- doMC (http://cran.r-project.org/web/packages/doMC/index.html)
- revoIPC (http://cran.r-project.org/web/packages/revoIPC/)
2) RevoScaleR package to beat R’s memory problem (this is probably the best contribution in my opinion, as it has yet to be replicated in open source R and is a clear-cut reason for going for the paid version)
- Efficient XDF File Format designed to efficiently handle huge data sets.
- Data Step Functionality to quickly clean, transform, explore, and visualize huge data sets.
- Data selection functionality to store huge data sets out of memory, and select subsets of rows and columns for in-memory operation with all R functions.
- Visualize Large Data sets with line plots and histograms.
- Built-in Statistical Algorithms for direct analysis of huge data sets:
- Summary Statistics
- Linear Regression
- Logistic Regression
- On-the-fly data transformations to include derived variables in models without writing new data files.
- Extend Existing Analyses by writing user-defined R functions to “chunk” through huge data sets.
- Direct import of fixed-format text data files and SAS data sets into .xdf format
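To give a feel for what "chunking" through a data set means, here is a minimal sketch in plain base R. It is not RevoScaleR code and does not use the XDF format; the temporary CSV file, the column name x and the chunk size of 250 rows are all assumptions chosen for illustration.

```r
# Create a small stand-in for a "huge" file (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:1000), csv_path, row.names = FALSE)

# Read the file in chunks of 250 rows and accumulate a running sum,
# so the whole data set never has to be in memory at once.
con <- file(csv_path, open = "r")
header <- readLines(con, n = 1)   # skip the header line
total <- 0
n <- 0
repeat {
  lines <- readLines(con, n = 250)
  if (length(lines) == 0) break   # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = "x")
  total <- total + sum(chunk$x)
  n <- n + nrow(chunk)
}
close(con)

chunk_mean <- total / n
print(chunk_mean)
```

The same pattern of reading a block, updating a running statistic and discarding the block extends to other summary statistics that can be built up incrementally.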
3) RevoDeployR for API-based R solutions. I think this feature will become more important as time goes on, but it seems a lower-visibility offering right now.
- Collection of Web services implemented as a RESTful API.
- .NET Client library, including COM interoperability to call R from VBA.
- Management Console for securely administrating servers, scripts and users through HTTP and HTTPS.
- XML and JSON format for data exchange.
- Built-in security model for authenticated or anonymous invocation of R Scripts.
- Repository for storing R objects and R Script execution artifacts.
4) Revolution's IDE (or Productivity Environment) for a faster coding environment than the command line. The GUI by Revolution Analytics is in the works. Having used the IDE, I find only the Code Snippets function to be a clear differentiator from newer IDEs and GUIs. The code snippets feature is awesome though, and even someone who doesn't know much R can get an analysis set up quite quickly and accurately.
- Full-featured Visual Debugger for debugging R scripts, with call stack window and step-in, step-over, and step-out capability.
- Enhanced Script Editor with hover-over help, word completion, find-across-files capability, automatic syntax checking, bookmarks, and navigation buttons.
- Run Selection, Run to Line and Run to Cursor evaluation
- R Code Snippets to automatically generate fill-in-the-blank sections of R code with tooltip help.
- Object Browser showing available data and function objects (including those in packages), with context menus for plotting and editing data.
- Solution Explorer for organizing, viewing, adding, removing, rearranging, and sourcing R scripts.
- Customizable Workspace with dockable, floating, and tabbed tool windows.
- Version Control Plug-in available for the open source Subversion version control software.
Marketing contributions from Revolution Analytics:
1) Sponsoring R sessions and user meets.
2) Evangelizing R at conferences and working with corporate partners including JasperSoft, Microsoft, IBM and others listed at http://www.revolutionanalytics.com/partners/
3) Helping with online initiatives like http://www.inside-r.org/ (which is curiously dormant and now largely superseded by R-Bloggers.com) and the syntax highlighting tool at http://www.inside-r.org/pretty-r. In addition, Revolution has been proactive in reaching out to the community.
4) Helping pioneer blogging about R and Twitter hashtag discussions, and contributing to Stack Overflow discussions. Within a short while, the #rstats online community has overtaken many more established names, partly due to the decentralized nature of its working.
Did I miss something? Yes, they share their code under the GPL.
Let me know through your feedback.
Before you rev up those keyboards and shoot off a snarky comment, consider this statement: there are many ways to run (and ruin) economies. But they still have not found a replacement for money. Yes, happiness is important. Search Engine is good.
So unless they start a new branch of economics with a lot more motivational theory and psychology and a lot less quant, especially for open source projects, money (revenue, sales) is the only true measure of success in enterprise software. Particularly if you have competitors who are making more money selling the same class of software.
Popularity contests are for high school quarterbacks, so even if your open source software is popular in downloads, email discussions, Stack Overflow or (more…)
I was searching for some basic syntax in R (basically cross tabs and density plots) and I came across the Quick-R site.
It's really a nice site for R beginners and anyone trying to remember some syntax.
R syntax can be very simple: a histogram is just hist(), a boxplot is just boxplot() and a t-test is just t.test().
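As a quick illustration of how short these calls are, here is a small sketch using the built-in mtcars data. The choice of columns and the value mu = 20 in the t-test are arbitrary assumptions of mine, not taken from the site.

```r
# mtcars ships with base R; the columns used here are only for illustration.
hist(mtcars$mpg)                    # histogram of miles per gallon
boxplot(mpg ~ cyl, data = mtcars)   # boxplots of mpg by number of cylinders

# One-sample t-test: is the mean mpg different from 20? (mu = 20 is arbitrary)
tt <- t.test(mtcars$mpg, mu = 20)
print(tt$p.value)
```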
Here is an example from the site-
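# Simple Histogram
hist(mtcars$mpg)
# Colored Histogram with Different Number of Bins
hist(mtcars$mpg, breaks=12, col="red")
# Add a Normal Curve (Thanks to Peter Dalgaard)
x <- mtcars$mpg
h <- hist(x, breaks=10, col="red", xlab="Miles Per Gallon",
  main="Histogram with Normal Curve")
xfit <- seq(min(x), max(x), length=40)        # grid of x values for the curve
yfit <- dnorm(xfit, mean=mean(x), sd=sd(x))   # normal density at each grid point
yfit <- yfit*diff(h$mids[1:2])*length(x)      # rescale the density to the count axis
lines(xfit, yfit, col="blue", lwd=2)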
Histograms can be a poor method for determining the shape of a distribution because they are so strongly affected by the number of bins used.
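A small sketch of that sensitivity, using the same mtcars data. The break settings 5 and 12 are arbitrary choices of mine, and R treats them only as suggestions for the number of bins.

```r
x <- mtcars$mpg

# Compute (without plotting) two histograms of the same data
# with different suggested numbers of breaks.
h_few  <- hist(x, breaks = 5,  plot = FALSE)
h_many <- hist(x, breaks = 12, plot = FALSE)

# The bin counts, and hence the apparent shape, differ between the two.
print(h_few$counts)
print(h_many$counts)

# Draw them side by side to compare the shapes visually.
op <- par(mfrow = c(1, 2))
plot(h_few,  main = "Few bins",  xlab = "Miles Per Gallon")
plot(h_many, main = "Many bins", xlab = "Miles Per Gallon")
par(op)
```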
KERNEL DENSITY PLOTS
Kernel density plots are usually a much more effective way to view the distribution of a variable. Create the plot using plot(density(x)) where x is a numeric vector.
# Kernel Density Plot
d <- density(mtcars$mpg) # returns the density data
plot(d) # plots the results
# Filled Density Plot
d <- density(mtcars$mpg)
plot(d, main="Kernel Density of Miles Per Gallon")
polygon(d, col="red", border="blue")
COMPARING GROUPS VIA KERNEL DENSITY
The sm.density.compare( ) function in the sm package allows you to superimpose the kernel density plots of two or more groups. The format is sm.density.compare(x, factor) where x is a numeric vector and factor is the grouping variable.
# Compare MPG distributions for cars with
# 4,6, or 8 cylinders
library(sm)       # provides sm.density.compare()
attach(mtcars)    # makes mpg and cyl available by name
# create value labels
cyl.f <- factor(cyl, levels= c(4,6,8),
  labels = c("4 cylinder", "6 cylinder", "8 cylinder"))
# plot densities
sm.density.compare(mpg, cyl, xlab="Miles Per Gallon")
title(main="MPG Distribution by Car Cylinders")
# add legend via mouse click
colfill <- c(2:(2+length(levels(cyl.f))))   # fill colors for the legend
legend(locator(1), levels(cyl.f), fill=colfill)
It is not as exhaustive as http://cran.r-project.org/doc/manuals/R-intro.html
but it is much simpler and easier to follow.
The site was created by Robert I. Kabacoff, Ph.D.,
and he is working on a book called “R in Action”. In his words:
“I have received numerous requests for a hardcopy version of this site, so over the past year I have been writing a book that takes the material here and significantly expands upon it. If you are interested, early access is available.”
If you have not been to that website, I recommend it highly (though the tagline or logo of R for SAS/SPSS/Stata users seems a bit familiar): http://www.statmethods.net/index.html