Tips for using R in a production analytics environment

1) read.csv is dead. Long live fread. Use fread from data.table to import data and get a speed-up factor of about 5x in the data import phase itself. Ignore the data.table package and languish in hell.
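A minimal sketch of that comparison, assuming the data.table package is installed. The CSV here is a throwaway file generated just for illustration, and the speed-up you actually see will depend on file size and machine.

```r
# Build a temporary CSV so the example is self-contained (illustrative data).
set.seed(42)
n <- 1e5
df <- data.frame(id = seq_len(n),
                 x  = rnorm(n),
                 g  = sample(letters, n, replace = TRUE))
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# Baseline: base R import.
t_base <- system.time(base_df <- read.csv(csv_path))
cat("read.csv elapsed:", t_base[["elapsed"]], "s\n")

# fread import, only attempted if data.table is available on this machine.
if (requireNamespace("data.table", quietly = TRUE)) {
  t_fread <- system.time(dt <- data.table::fread(csv_path))
  cat("fread elapsed:   ", t_fread[["elapsed"]], "s\n")
} else {
  message("data.table is not installed; try install.packages('data.table')")
}
```

Note that fread returns a data.table rather than a plain data.frame; it still behaves as a data.frame in most code.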

2) write.csv is boring. Write as an .Rda file. Use an .Rda file to get compression of up to 4x.
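Here is a rough sketch of how to check the size difference on your own data, using only base R. The data frame and its column names are made up for the example, and the compression ratio you get will vary with the data.

```r
# Illustrative data frame; the columns are invented for this example.
set.seed(1)
df <- data.frame(id    = 1:50000,
                 score = round(runif(50000), 3),
                 group = sample(c("a", "b", "c"), 50000, replace = TRUE))

csv_path <- tempfile(fileext = ".csv")
rda_path <- tempfile(fileext = ".Rda")

write.csv(df, csv_path, row.names = FALSE)
save(df, file = rda_path)   # save() compresses with gzip by default

# Compare the on-disk sizes in bytes.
sizes <- file.size(c(csv_path, rda_path))
names(sizes) <- c("csv", "rda")
print(sizes)

# load() restores objects under their original names, so load into a
# separate environment to avoid overwriting df in the workspace.
restored <- new.env()
load(rda_path, envir = restored)
print(all.equal(df, restored$df))
```

Unlike a CSV, the .Rda file also keeps column types and attributes, so nothing has to be re-parsed on load.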

3) Use the new project mode from RStudio. This helps keep workflow management clean.

4) Use GUIs like Deducer or the KMggplot2 plugin for R Commander for great data viz right now. For people who want to use ggplot2 straight away.

5) Avoid duplicates, remove prior copies and use gc(). Memory management is key to using R in production analytics.
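A small sketch of that habit in base R. The object names are hypothetical; the point is simply to drop what you no longer need and then let R reclaim the memory.

```r
# Illustrative large object: ten million doubles, roughly 80 MB.
big_vec <- numeric(1e7)
print(object.size(big_vec), units = "MB")

# Assigning to a new name does not copy yet; R copies on modification.
big_copy <- big_vec
big_copy[1] <- 1        # this write is what creates the real duplicate

# Remove the objects that are no longer needed, then collect garbage.
rm(big_vec, big_copy)
invisible(gc())         # gc() also returns a memory usage summary

print(exists("big_vec", inherits = FALSE))
```

R runs garbage collection on its own as well, so the explicit gc() call is mostly useful right after removing large objects in a long-running job.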

6) Think object oriented. Forget other languages. Think slice and dice, using $ and [], and using apply instead of for loops.
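To make the slice-and-dice idea concrete, here is a small sketch in base R. The sales data frame and its columns are invented for illustration.

```r
# Illustrative data frame; names and values are made up.
sales <- data.frame(region = c("N", "S", "E", "W"),
                    q1     = c(10, 20, 30, 40),
                    q2     = c(15, 25, 35, 45))

sales$q1                     # one column as a vector
sales[sales$q1 > 15, ]       # rows where q1 exceeds 15
sales[, c("q1", "q2")]       # a subset of columns

# Loop version: fill a preallocated result one element at a time.
totals_loop <- numeric(nrow(sales))
for (i in seq_len(nrow(sales))) {
  totals_loop[i] <- sales$q1[i] + sales$q2[i]
}

# apply version: one call over the rows of the numeric columns.
totals_apply <- apply(sales[, c("q1", "q2")], 1, sum)

# Fully vectorised version: add the columns directly.
totals_vec <- sales$q1 + sales$q2
```

All three give the same row totals; the apply and vectorised forms are shorter and say what is being computed rather than how.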

7) Use ? and ?? before you google or ask for help on Stack Overflow. Seriously dude, R has a lot of documentation! A lot! Use it. Also see the CRAN Task Views!
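A quick sketch of the built-in help tools. The interactive lookups are guarded so the snippet also runs as a script; the search terms are just examples.

```r
# Help pages open in a viewer or pager, so only call them interactively.
if (interactive()) {
  ?read.csv                 # help page for a function whose name you know
  ??"linear model"          # full-text search across installed packages
  help(package = "stats")   # index of a package's documentation
}

# These work in any session.
hits <- apropos("^read\\.")  # objects on the search path matching a pattern
print(hits)
args(read.csv)               # quick look at a function's arguments
```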

8) You are not too old to learn dplyr on DataCamp. Skilling up and reskilling is part of being a data science hacker.

9) Subscribe to R-bloggers and never miss out on a new package that helps solve your problems. R has 8000+ packages and 150000+ functions. All you need is one function to cut down your analysis time and go home early.

10) Profiling code, benchmarking functions and byte compilation separate the grown-ups from the kids among data scientists. Hadley says so. Enough said!
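A minimal sketch of all three, using only tools that ship with R (system.time, the compiler package and Rprof). The functions are made up for illustration. In recent R versions the JIT compiler is enabled by default, so the gain from an explicit cmpfun() call may be small.

```r
library(compiler)   # part of the standard R distribution

# Illustrative functions: a loop-heavy sum of squares and a vectorised one.
sum_sq_loop <- function(x) {
  total <- 0
  for (v in x) total <- total + v^2
  total
}
sum_sq_vec <- function(x) sum(x^2)

set.seed(7)
x <- runif(1e6)

# Benchmarking: crude timings with system.time().
print(system.time(sum_sq_loop(x)))
print(system.time(sum_sq_vec(x)))

# Byte compilation: cmpfun() returns a compiled version of the function.
sum_sq_cmp <- cmpfun(sum_sq_loop)
print(system.time(sum_sq_cmp(x)))

# Profiling: Rprof() samples the call stack while the code runs.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
for (i in 1:20) sum_sq_loop(x)
Rprof(NULL)

# summaryRprof() errors if no samples were recorded, so guard the call.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

For finer-grained timings, packages such as microbenchmark repeat each call many times, but system.time() is enough to spot the big wins.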

Author: Ajay Ohri
