Home » Posts tagged 'r users'
Tag Archives: r users
3rd Meeting for New Delhi R users group #rstats
We continued the 3 rd month without a sponsor for the R User Meetup group. I wish analytics companies in Gurgaon and Mumbai or even Hyderabad can help sponsor R users meetups, as we need to have more knowledge sharing culture here.
I am a bit disappointed by people who RSVP yes, but cancel out without letting us know.
Some additional insights-
1) Never hold meetup on cricket match day
2) location should be very close to Metro Station.
3) Saturdays will be the only meetup day for future. If I am unavailable, Dr Amir will be leading the Meetup.
Some more action points-
1) Ajay and Dr Amir to work together on a blog / site called R for Medical Researchers focused on Indian issues.
2) For Web Application developers, please see shiny package. I have written on the latest version of Shiny- but I would also like you to click on each of the 18 links I have written in this article. Shiny is great for statistically enabling web apps.
http://decisionstats.com/2013/01/26/shiny-0-3-released-new-era-for-rstats/
3) We will be uploading files and creating a slideshare for Delhi R Users. This will be our digital archive.
4) To start using R immediately use Rattle ( http://rattle.togaware.com/) especially if you build models or need to use data mining
5) To start using R for GIS and Spatial use Deducer with Spatial Plot Builder Plugin
http://www.deducer.org/pmwiki/index.php?n=Main.SpatialPlotBuilder
6) To integrate R with C++ see the RCPP gallery with almost 100 plus packages dependent on it. This will be needed with Developers who are trying to use a faster version of R by using C++ . http://gallery.rcpp.org/
7) Web Analytics using R- is now updated for changes in Google APIs. http://decisionstats.com/2013/01/28/google-analytics-using-rstats-updated/
8) If you are new to programming or Ruby or Python please go to http://codeacdemy.com .
9) For a beginner in R use http://tryr.codeschool.com
Minutes of Meeting -courtesy Dr Amir
four people (Ajay, myself, Ashish and Nagesh) joined for the 3rd meetup today at around 4 pm at hauz khas village. Ashish and Nagesh are two new members who attended the meetup for the first time. Ashish is in web development with knowledge of python and perl. today he introduced the group about data scraping from web using Perl. Nagesh has a background in management and currently working with insurance company. he is planning to switch to R as it is free. he currently uses SAS. Ajay introduced GUIs like Rattle, Deducer; books (R for business analytics which he himself has authored and others like R for spss and sas users, data mining with r, rattle) and GADM for mapping. I introduced Coursera courses which is currently ongoing and which uses R and also the basics of R like packages, functions and strengths of R.Ashish has volunteered to find a place for meetup in Green Park near the metro station as we are still struggling to get a decent place for our meetups.
Also – here is the meetup group for New Delhi R Users
Data Frame in Python
Exploring some Python Packages and R packages to move /work with both Python and R without melting your brain or exceeding your project deadline
—————————————
If you liked the data.frame structure in R, you have some way to work with them at a faster processing speed in Python.
Here are three packages that enable you to do so-
(1) pydataframe http://code.google.com/p/pydataframe/
An implemention of an almost R like DataFrame object. (install via Pypi/Pip: “pip install pydataframe”)
Usage:
u = DataFrame( { "Field1": [1, 2, 3], "Field2": ['abc', 'def', 'hgi']}, optional: ['Field1', 'Field2'] ["rowOne", "rowTwo", "thirdRow"])
A DataFrame is basically a table with rows and columns.
Columns are named, rows are numbered (but can be named) and can be easily selected and calculated upon. Internally, columns are stored as 1d numpy arrays. If you set row names, they’re converted into a dictionary for fast access. There is a rich subselection/slicing API, see help(DataFrame.get_item) (it also works for setting values). Please note that any slice get’s you another DataFrame, to access individual entries use get_row(), get_column(), get_value().
DataFrames also understand basic arithmetic and you can either add (multiply,…) a constant value, or another DataFrame of the same size / with the same column names, like this:
#multiply every value in ColumnA that is smaller than 5 by 6.
my_df[my_df[:,'ColumnA'] < 5, 'ColumnA'] *= 6
#you always need to specify both row and column selectors, use : to mean everything
my_df[:, 'ColumnB'] = my_df[:,'ColumnA'] + my_df[:, 'ColumnC']
#let's take every row that starts with Shu in ColumnA and replace it with a new list (comprehension)
select = my_df.where(lambda row: row['ColumnA'].startswith('Shu'))
my_df[select, 'ColumnA'] = [row['ColumnA'].replace('Shu', 'Sha') for row in my_df[select,:].iter_rows()]
Dataframes talk directly to R via rpy2 (rpy2 is not a prerequiste for the library!)
(2) pandas http://pandas.pydata.org/
Library Highlights
- A fast and efficient DataFrame object for data manipulation with integrated indexing;
- Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
- Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
- Flexible reshaping and pivoting of data sets;
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
- Columns can be inserted and deleted from data structures for size mutability;
- Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets;
- High performance merging and joining of data sets;
- Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure;
- Time series-functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging. Even create domain-specific time offsets and join time series without losing data;
- The library has been ruthlessly optimized for performance, with critical code paths compiled to C;
- Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.
Why not R?
First of all, we love open source R! It is the most widely-used open source environment for statistical modeling and graphics, and it provided some early inspiration for pandas features. R users will be pleased to find this library adopts some of the best concepts of R, like the foundational DataFrame (one user familiar with R has described pandas as “R data.frame on steroids”). But pandas also seeks to solve some frustrations common to R users:
- R has barebones data alignment and indexing functionality, leaving much work to the user. pandas makes it easy and intuitive to work with messy, irregularly indexed data, like time series data. pandas also provides rich tools, like hierarchical indexing, not found in R;
- R is not well-suited to general purpose programming and system development. pandas enables you to do large-scale data processing seamlessly when developing your production applications;
- Hybrid systems connecting R to a low-productivity systems language like Java, C++, or C# suffer from significantly reduced agility and maintainability, and you’re still stuck developing the system components in a low-productivity language;
- The “copyleft” GPL license of R can create concerns for commercial software vendors who want to distribute R with their software under another license. Python and pandas use more permissive licenses.
(3) datamatrix http://pypi.python.org/pypi/datamatrix/0.8
datamatrix 0.8
A Pythonic implementation of R’s data.frame structure.
Latest Version: 0.9
This module allows access to comma- or other delimiter separated files as if they were tables, using a dictionary-like syntax. DataMatrix objects can be manipulated, rows and columns added and removed, or even transposed
—————————————————————–
Modeling in Python
UseR 2012 Early Registration #rstats
Early Registration Deadline Approaches for UseR 2012
http://biostat.mc.vanderbilt.edu/wiki/Main/UseR-2012
Registration
Deadlines
- Early Registration: Jan 23 24 – Feb 29
- Regular Registration: Mar 1 – May 12
- Late Registration: May 13 – June 4
- On-site Registration: June 12 – June 15
Fees
| Academic | Non-Academic | Student/Retiree | |
|---|---|---|---|
| Registration: Early | $290 | $440 | $145 |
| Registration: Regular | $365 | $560 | $185 |
| Registration: Late | $435 | $645 | $225 |
| Registration: On-site | $720 | $720 | $360 |
| Short Course | $200 | $300 | $100 |

Vanderbilt University; Nashville, Tennessee, USA
12th-15th June 2012
http://biostat.mc.vanderbilt.edu/wiki/Main/UseR-2012
Contact
Stephania McNeal-Goddard
Assistant to the Chair
stephania.mcneal-goddard@vanderbilt.edu
Phone: 615.322.2768
Fax: 615.343.4924
Vanderbilt University School of Medicine
Department of Biostatistics
S-2323 Medical Center North
Nashville, TN 37232-2158
Oracle launches its version of R #rstats
From-
http://www.oracle.com/us/corporate/press/1515738
Integrates R Statistical Programming Language into Oracle Database 11g
News Facts
Comprehensive In-Database Platform for Advanced Analytics
Supporting Quotes
| Oracle Advanced Analytics — an option to Oracle Database 11g Enterprise Edition – extends the database into a comprehensive advanced analytics platform through two major components: Oracle R Enterprise and Oracle Data Mining. With Oracle Advanced Analytics, customers have a comprehensive platform for real-time analytic applications that deliver insight into key business subjects such as churn prediction, product recommendations, and fraud alerting.
Oracle R Enterprise tightly integrates the open source R programming language with the database to further extend the database with Rs library of statistical functionality, and pushes down computations to the database. Oracle R Enterprise dramatically advances the capability for R users, and allows them to use their existing R development skills and tools, and scripts can now also run transparently and scale against data stored in Oracle Database 11g. Oracle Data Mining provides powerful data mining algorithms that run as native SQL functions for in-database model building and model deployment. It can be accessed through the SQL Developer extension Oracle Data Miner to build, evaluate, share and deploy predictive analytics methodologies. At the same time the high-performance Oracle-specific data mining algorithms are accessible from R. |
||
BENEFITS |
||
|
| Oracle R Hadoop Connector | Gives R users high performance native access to Hadoop Distributed File System (HDFS) and MapReduce programming framework. |
|---|
Interview JJ Allaire Founder, RStudio
Here is an interview with JJ Allaire, founder of RStudio. RStudio is the IDE that has overtaken other IDE within the R Community in terms of ease of usage. On the eve of their latest product launch, JJ talks to DecisionStats on RStudio and more.
Ajay- So what is new in the latest version of RStudio and how exactly is it useful for people?
JJ- The initial release of RStudio as well as the two follow-up releases we did last year were focused on the core elements of using R: editing and running code, getting help, and managing files, history, workspaces, plots, and packages. In the meantime users have also been asking for some bigger features that would improve the overall work-flow of doing analysis with R. In this release (v0.95) we focused on three of these features:
Projects. R developers tend to have several (and often dozens) of working contexts associated with different clients, analyses, data sets, etc. RStudio projects make it easy to keep these contexts well separated (with distinct R sessions, working directories, environments, command histories, and active source documents), switch quickly between project contexts, and even work with multiple projects at once (using multiple running versions of RStudio).
Version Control. The benefits of using version control for collaboration are well known, but we also believe that solo data analysis can achieve significant productivity gains by using version control (this discussion on Stack Overflow talks about why). In this release we introduced integrated support for the two most popular open-source version control systems: Git and Subversion. This includes changelist management, file diffing, and browsing of project history, all right from within RStudio.
Code Navigation. When you look at how programmers work a surprisingly large amount of time is spent simply navigating from one context to another. Modern programming environments for general purpose languages like C++ and Java solve this problem using various forms of code navigation, and in this release we’ve brought these capabilities to R. The two main features here are the ability to type the name of any file or function in your project and go immediately to it; and the ability to navigate to the definition of any function under your cursor (including the definition of functions within packages) using a keystroke (F2) or mouse gesture (Ctrl+Click).
Ajay- What’s the product road map for RStudio? When can we expect the IDE to turn into a full fledged GUI?
JJ- Linus Torvalds has said that “Linux is evolution, not intelligent design.” RStudio tries to operate on a similar principle—the world of statistical computing is too deep, diverse, and ever-changing for any one person or vendor to map out in advance what is most important. So, our internal process is to ship a new release every few months, listen to what people are doing with the product (and hope to do with it), and then start from scratch again making the improvements that are considered most important.
Right now some of the things which seem to be top of mind for users are improved support for authoring and reproducible research, various editor enhancements including code folding, and debugging tools.
What you’ll see is us do in a given release is to work on a combination of frequently requested features, smaller improvements to usability and work-flow, bug fixes, and finally architectural changes required to support current or future feature requirements.
While we do try to base what we work on as closely as possible on direct user-feedback, we also adhere to some core principles concerning the overall philosophy and direction of the product. So for example the answer to the question about the IDE turning into a full-fledged GUI is: never. We believe that textual representations of computations provide fundamental advantages in transparency, reproducibility, collaboration, and re-usability. We believe that writing code is simply the right way to do complex technical work, so we’ll always look for ways to make coding better, faster, and easier rather than try to eliminate coding altogether.
Ajay -Describe your journey in science from a high school student to your present work in R. I noticed you have been very successful in making software products that have been mostly proprietary products or sold to companies.
Why did you get into open source products with RStudio? What are your plans for monetizing RStudio further down the line?
JJ- In high school and college my principal areas of study were Political Science and Economics. I also had a very strong parallel interest in both computing and quantitative analysis. My first job out of college was as a financial analyst at a government agency. The tools I used in that job were SAS and Excel. I had a dim notion that there must be a better way to marry computation and data analysis than those tools, but of course no concept of what this would look like.
From there I went more in the direction of general purpose computing, starting a couple of companies where I worked principally on programming languages and authoring tools for the Web. These companies produced proprietary software, which at the time (between 1995 and 2005) was a workable model because it allowed us to build the revenue required to fund development and to promote and distribute the software to a wider audience.
By 2005 it was however becoming clear that proprietary software would ultimately be overtaken by open source software in nearly all domains. The cost of development had shrunken dramatically thanks to both the availability of high-quality open source languages and tools as well as the scale of global collaboration possible on open source projects. The cost of promoting and distributing software had also collapsed thanks to efficiency of both distribution and information diffusion on the Web.
When I heard about R and learned more about it, I become very excited and inspired by what the project had accomplished. A group of extremely talented and dedicated users had created the software they needed for their work and then shared the fruits of that work with everyone. R was a platform that everyone could rally around because it worked so well, was extensible in all the right ways, and most importantly was free (as in speech) so users could depend upon it as a long-term foundation for their work.
So I started RStudio with the aim of making useful contributions to the R community. We started with building an IDE because it seemed like a first-rate development environment for R that was both powerful and easy to use was an unmet need. Being aware that many other companies had built successful businesses around open-source software, we were also convinced that we could make RStudio available under a free and open-source license (the AGPLv3) while still creating a viable business. At this point RStudio is exclusively focused on creating the best IDE for R that we can. As the core product gets where it needs to be over the next couple of years we’ll then also begin to sell other products and services related to R and RStudio.
About-

JJ Allaire
JJ Allaire is a software engineer and entrepreneur who has created a wide variety of products including ColdFusion,Windows Live Writer, Lose It!, and RStudio.
From http://en.wikipedia.org/wiki/Joseph_J._Allaire
In 1995 Joseph J. (JJ) Allaire co-founded Allaire Corporation with his brother Jeremy Allaire, creating the web development tool ColdFusion.[1] In March 2001, Allaire was sold to Macromedia where ColdFusion was integrated into the Macromedia MX product line. Macromedia was subsequently acquired by Adobe Systems, which continues to develop and market ColdFusion.
After the sale of his company, Allaire became frustrated at the difficulty of keeping track of research he was doing using Google. To address this problem, he co-founded Onfolio in 2004 with Adam Berrey, former Allaire co-founder and VP of Marketing at Macromedia.
On March 8, 2006, Onfolio was acquired by Microsoft where many of the features of the original product are being incorporated into the Windows Live Toolbar. On August 13, 2006, Microsoft released the public beta of a new desktop blogging client called Windows Live Writer that was created by Allaire’s team at Microsoft.
Starting in 2009, Allaire has been developing a web-based interface to the widely used R technical computing environment. A beta version of RStudio was publicly released on February 28, 2011.
JJ Allaire received his B.A. from Macalester College (St. Paul, MN) in 1991.
RStudio-
RStudio is an integrated development environment (IDE) for R which works with the standard version of R available from CRAN. Like R, RStudio is available under a free software license. RStudio is designed to be as straightforward and intuitive as possible to provide a friendly environment for new and experienced R users alike. RStudio is also a company, and they plan to sell services (support, training, consulting, hosting) related to the open-source software they distribute.
Use R for Business- Competition worth $ 20,000 #rstats
All you contest junkies, R lovers and general change the world people, here’s a new contest to use R in a business application
REVOLUTION ANALYTICS LAUNCHES “APPLICATIONS OF R IN BUSINESS” CONTEST
$20,000 in Prizes for Users Solving Business Problems with R
PALO ALTO, Calif. – September 1, 2011 – Revolution Analytics, the leading commercial provider of R software, services and support, today announced the launch of its “Applications of R in Business” contest to demonstrate real-world uses of applying R to business problems. The competition is open to all R users worldwide and submissions will be accepted through October 31. The Grand Prize winner for the best application using R or Revolution R will receive $10,000.
The bonus-prize winner for the best application using features unique to Revolution R Enterprise – such as itsbig-data analytics capabilities or its Web Services API for R – will receive $5,000. A panel of independent judges drawn from the R and business community will select the grand and bonus prize winners. Revolution Analytics will present five honorable mention prize winners each with $1,000.
“We’ve designed this contest to highlight the most interesting use cases of applying R and Revolution R to solving key business problems, such as Big Data,” said Jeff Erhardt, COO of Revolution Analytics. “The ability to process higher-volume datasets will continue to be a critical need and we encourage the submission of applications using large datasets. Our goal is to grow the collection of online materials describing how to use R for business applications so our customers can better leverage Big Analytics to meet their analytical and organizational needs.”
To enter Revolution Analytics’ “Applications of R in Business” competition (more…)
