Tips for using R in a production analytics environment

1) read.csv is dead. Long live fread. Use fread from data.table to import data and get a speed-up of around 5X in the data import phase itself. Ignore the data.table package and languish in hell.
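
A quick way to check the speed-up on your own machine is to time both readers on the same file. This is a minimal sketch, assuming data.table is already installed; the toy data frame and temp file are made up for illustration, and the ratio you see will depend on file size and hardware.

```r
library(data.table)  # assumes install.packages("data.table") was run earlier

# Hypothetical test file: 200,000 rows of toy data written to a temp file
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(id = 1:200000,
                  x  = rnorm(200000),
                  g  = sample(letters, 200000, replace = TRUE))
write.csv(toy, csv_path, row.names = FALSE)

# Time both readers on the same file
t_base  <- system.time(df_base <- read.csv(csv_path))
t_fread <- system.time(dt_fast <- fread(csv_path))

print(c(read.csv = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))

# fread returns a data.table, which is also a data.frame
class(dt_fast)
```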

2) write.csv is boring. Write an .Rda file. Use an .Rda file to get compression of up to 4X.
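
Here is a rough sketch of the comparison using base R only. The data frame is invented for the example and is deliberately repetitive, so treat the ratio it prints as an illustration and not a benchmark.

```r
# Hypothetical data: a repetitive data frame, which compresses well
toy <- data.frame(id    = 1:100000,
                  group = rep(c("north", "south", "east", "west"), 25000),
                  value = round(runif(100000), 2))

csv_path <- tempfile(fileext = ".csv")
rda_path <- tempfile(fileext = ".Rda")

write.csv(toy, csv_path, row.names = FALSE)
save(toy, file = rda_path)   # save() compresses by default

# Compare sizes on disk; the ratio depends entirely on the data
size_ratio <- file.size(csv_path) / file.size(rda_path)
print(size_ratio)

# load() restores the object under its original name
rm(toy)
load(rda_path)
dim(toy)
```

One thing to keep in mind: an .Rda file is only readable from R, so keep a CSV export around if other tools need the data.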

3) Use the new project mode from RStudio. This helps keep workflow management clean.

4) Use GUIs like Deducer or the KMggplot2 plugin for R Commander for great data viz right now. These are for people who want to use ggplot2 straight away.

5) Avoid duplicates, remove prior copies and use gc(). Memory management is key to using R in production analytics.
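
A small sketch of the habit in base R. The object names are made up, and the numbers gc() reports will differ from machine to machine.

```r
# Hypothetical large object
big <- matrix(rnorm(1e6), ncol = 100)
print(object.size(big), units = "MB")

# Keep only the summary you need, then drop the large object
col_means <- colMeans(big)
rm(big)

# gc() triggers garbage collection and reports memory in use
mem <- gc()
print(mem)

exists("big")   # the name is gone from the workspace
```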

6) Think object oriented. Forget other languages. Think slice and dice, using $ and [], and using apply versus for loops.
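
To make that concrete, here is a short sketch on the built-in mtcars data set. The variable names are my own; the point is that the for loop and the apply-family call give the same answer.

```r
# Built-in mtcars data set
cars <- mtcars

# Slice and dice with $ and []
heavy <- cars[cars$wt > 3.5, c("mpg", "wt", "cyl")]

# for-loop version: column means one at a time
means_loop <- numeric(ncol(cars))
for (i in seq_along(cars)) {
  means_loop[i] <- mean(cars[[i]])
}
names(means_loop) <- names(cars)

# apply-family version: same result in one call
means_apply <- sapply(cars, mean)

all.equal(means_loop, means_apply)
```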

7) Use ? and ?? before you google and ask for help on Stack Overflow. Seriously dude, R has a lot of documentation! A lot! Use it. Also see the CRAN Task Views!
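
A few of the built-in help calls, as a sketch. The search terms are just examples.

```r
?read.csv                      # help page for a function you know by name
help.search("linear model")    # same as ??"linear model": search installed docs

# apropos() lists objects on the search path whose names match a pattern
hits <- apropos("csv")
print(hits)
```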

8) You are not too old to learn dplyr on DataCamp. Skilling up and reskilling is part of being a data science hacker.

9) Subscribe to R-bloggers and never miss out on a new package that helps solve your problems. R has 8000+ packages and 150000+ functions. All you need is one function to cut down your analysis time and go home early.

10) Profiling code, benchmarking functions and byte compilation separate the grown-ups from the kids among data scientists. Hadley says so. Enough said!
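
All three can be tried with tools that ship with R. The sketch below uses a made-up function (sum_sq) as the thing to measure. Recent R versions already byte-compile functions automatically, so the compiled and plain timings may come out close.

```r
library(compiler)   # ships with R

# Hypothetical function to study: a deliberately loopy sum of squares
sum_sq <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# Byte-compile it explicitly
sum_sq_cmp <- cmpfun(sum_sq)

# Benchmark both versions with system.time
t_plain <- system.time(r1 <- sum_sq(1e6))
t_cmp   <- system.time(r2 <- sum_sq_cmp(1e6))
print(c(plain = t_plain[["elapsed"]], compiled = t_cmp[["elapsed"]]))

# Profile with Rprof to see where the time goes
prof_file <- tempfile()
Rprof(prof_file)
for (k in 1:20) sum_sq(1e6)
Rprof(NULL)
head(summaryRprof(prof_file)$by.self)
```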

Using RMySQL from Ubuntu

  • Install MySQL

sudo apt-get install mysql-server

  • Check if Server is Running

sudo netstat -tap | grep mysql

I use the MySQL command line to check it

To connect

mysql -h localhost -u root -p

To see databases

mysql> show databases;

To see tables


mysql> show tables from mysql;

To quit mysql


mysql> \q


  • Install and load RMySQL from within R
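
A minimal sketch of this step. On Ubuntu the package is built from source and, as far as I know, needs the MySQL client development libraries (for example libmysqlclient-dev) installed through apt-get first.

```r
# Install once, only if the package is missing
if (!requireNamespace("RMySQL", quietly = TRUE)) {
  install.packages("RMySQL")
}

# Loading RMySQL also attaches DBI, which provides dbConnect() and dbGetQuery()
library(RMySQL)
```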


  • I connect using this

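# The original call was cut off after port; every argument other than port is a placeholder
mydb = dbConnect(MySQL(),
port = 8018,
username = "root", password = "your_password", dbname = "mysql", host = "localhost")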

  • I write SQL queries using this

> dbGetQuery(mydb, "select * from  servers")
[1] Server_name Host        Db          Username    Password   
[6] Port        Socket      Wrapper     Owner      
<0 rows> (or 0-length row.names)
> dbGetQuery(mydb, "select * from  db")
 [1] Host                  Db                   
 [3] User                  Select_priv          
 [5] Insert_priv           Update_priv          
 [7] Delete_priv           Create_priv          
 [9] Drop_priv             Grant_priv           
[11] References_priv       Index_priv           
[13] Alter_priv            Create_tmp_table_priv
[15] Lock_tables_priv      Create_view_priv     
[17] Show_view_priv        Create_routine_priv  
[19] Alter_routine_priv    Execute_priv         
[21] Event_priv            Trigger_priv         
<0 rows> (or 0-length row.names)

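The connect and query steps can be wrapped in one small helper that always closes the connection. This is a sketch under assumptions: run_query and the MYSQL_PWD_R environment variable are names I made up, the user, host and database follow the command-line session earlier in the post, and the port is the one from my connect call.

```r
library(RMySQL)

# Hypothetical helper: connect, run one query, always disconnect.
# MYSQL_PWD_R is a made-up environment variable holding the password.
run_query <- function(sql, dbname = "mysql", port = 8018) {
  con <- dbConnect(MySQL(),
                   username = "root",
                   password = Sys.getenv("MYSQL_PWD_R"),
                   dbname   = dbname,
                   host     = "localhost",
                   port     = port)
  on.exit(dbDisconnect(con))   # runs even if the query fails
  dbGetQuery(con, sql)
}

# Usage (needs a running MySQL server):
# run_query("select Host, Db, User from db")
```

Keeping the password in an environment variable means it never ends up in a script or in a blog post.
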