How do you pick a modeling method? Spotlighted PAW session

Michael Berry, whom we interviewed here at https://decisionstats.com/2010/10/05/interview-michael-j-a-berry-data-miners-inc/, is giving a session at PAW on modeling techniques.

This is a featured post by our sponsor:


Spotlighted PAW Session: Michael Berry on Modeling Techniques

TripAdvisor's Michael Berry, a long-time expert, consultant, and instructor who is normally found in keynote sessions, will give PAW's audience invaluable insights in his highly rated, captivating speaking style. Mr. Berry is also the founder of the consultancy Data Miners and co-author of popular books, including Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management.

Witness this and 30 other sessions at Predictive Analytics World for Business, September 27 to October 1, 2015, in Boston.

Michael Berry, Analytics Director, TripAdvisor for Business

SESSION: Picking the Right Modeling Technique for the Problem

Decision Tree? Neural Network? Regression? Naive Bayes? Support Vector Machine? It is said that when your only tool is a hammer, every problem looks like a thumb. Modern data mining toolkits are full of tools, but how do you pick the right tool for a particular predictive analytics task?

Presenter: Michael Berry

Ten Thoughts on Giving Analytics Training

By now I have delivered training to hundreds of students and professionals, both online and offline. Mostly I get great reviews, but twice in a decade I have bombed too.


Here are some thoughts on this:

1) Prior preparation is essential. All the code should be run and tested the day before class.
Why do trainers not prepare every time? Well, time bandwidth is an issue.

2) Enterprise clients pay not only for the invoice, infrastructure, and transport but also for each employee's time. Errors during a training session may be acceptable with retail clients, but you have to be very well prepared with corporate clients. Corporate clients do not tolerate errors that cost even ten minutes.

3) You can lose the flow if you interrupt your training for questions and answers. Sometimes it is best to park questions for later.

4) Self-discipline is always necessary so that a few assertive students don't corner the session or hijack the agenda toward what they need. In analytics training, the needs of the many outweigh the needs of the few.

5) Even after numerous prior meetings, a client's requirements and intentions for the training can remain unclear. Keep some flexibility, but maintain pedagogical discipline.

6) Always state the prerequisites, such as prior reading, for joining the training.

7) Homework rarely works with adults taking training, whether the classes are online or offline. Classroom quizzes and live tasks work better.

8) R and Python change quickly as analytics technologies, so keep updating those slide decks. Sharing 10% of your content free online can bring you a lot of leads for repeat business.

9) Always ask for enough time to create, rehearse, and present the training, regardless of client urgency. Analytics training is a long-term investment, and hurrying can compromise your quality and reputation.

10) Always be proactive. If the client is very unhappy, be the first to offer a discount. If the client is happy, ask for a LinkedIn testimonial.

Tips for using R in a production analytics environment

1) read.csv is dead. Long live fread. Use fread from the data.table package to import data and get a speed-up factor of 5X in the data import phase alone. Ignore the data.table package and languish in hell.

2) write.csv is boring. Write a .Rda file instead. Using .Rda files gets you compression of up to 4X.

3) Use the New Project mode in RStudio. It helps keep workflow management clean.

4) Use GUIs like Deducer or the KMggplot2 plugin for R Commander for great data visualization right now. They suit people who want to use ggplot2 straight away.

5) Avoid duplicates: remove prior copies of objects and call gc(). Memory management is key to using R in production analytics.

6) Think object-oriented. Forget other languages. Think slice and dice: use $ and [], and prefer the apply family over for loops.

7) Use ? and ?? before you Google or ask for help on Stack Overflow. Seriously, dude, R has a lot of documentation. A lot! Use it. Also see the CRAN Task Views!

8) You are not too old to learn dplyr on DataCamp. Skilling up and reskilling are part of being a data science hacker.

9) Subscribe to R-bloggers and never miss a new package that helps solve your problems. R has 8,000+ packages and 150,000+ functions. All you need is one function that cuts down your analysis time so you can go home early.

10) Profiling code, benchmarking functions, and byte compilation separate the grown-up data scientists from the kids. Hadley says so at http://adv-r.had.co.nz/Rcpp.html and http://adv-r.had.co.nz/Profiling.html. Enough said!
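The script below tries tips 1, 2, 5, 6 and 10 on one synthetic data frame. It is a minimal sketch, not a benchmark: the data, object names and temporary files are hypothetical, data.table is used only if it is installed, and the timings and file sizes you see will depend on your data and machine rather than reproducing the 5X and 4X figures quoted above.

```r
# Sketch of tips 1, 2, 5, 6 and 10 on synthetic data (all names hypothetical).
set.seed(42)
n <- 1e5
df <- data.frame(id = seq_len(n),
                 x  = runif(n),
                 g  = sample(letters[1:5], n, replace = TRUE),
                 stringsAsFactors = FALSE)

csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# Tip 1: time base read.csv, and data.table::fread only if the package is installed
t_base <- system.time(df_base <- read.csv(csv_path))
cat("read.csv elapsed:", t_base[["elapsed"]], "s\n")
if (requireNamespace("data.table", quietly = TRUE)) {
  t_fread <- system.time(dt <- data.table::fread(csv_path))
  cat("fread elapsed:   ", t_fread[["elapsed"]], "s\n")
}

# Tip 2: save the same object as a compressed .Rda file and compare file sizes
rda_path <- tempfile(fileext = ".Rda")
save(df, file = rda_path, compress = "xz")
cat("CSV bytes:", file.size(csv_path), " Rda bytes:", file.size(rda_path), "\n")

# Reload into a separate environment and confirm the round trip is exact
restored <- new.env()
load(rda_path, envir = restored)
stopifnot(identical(restored$df, df))

# Tip 5: drop the duplicate copy, then run garbage collection
rm(df_base)
invisible(gc())

# Tip 6: slice with $ and [], and compare a for loop, vapply and vectorised code
sq_loop <- numeric(n)
for (i in seq_len(n)) sq_loop[i] <- df$x[i]^2
sq_apply <- vapply(df$x, function(v) v^2, numeric(1))
sq_vec <- df$x^2
stopifnot(isTRUE(all.equal(sq_loop, sq_apply)), isTRUE(all.equal(sq_apply, sq_vec)))
group_means <- tapply(df$x, df$g, mean)           # mean of x within each group g
print(head(df[df$g == "a", c("id", "x")], 3))     # rows of group "a", two columns

# Tip 10: benchmark a plain loop against a byte-compiled copy of it
loop_sum <- function(v) { s <- 0; for (e in v) s <- s + e; s }
compiled_sum <- compiler::cmpfun(loop_sum)
print(system.time(r1 <- loop_sum(df$x)))
print(system.time(r2 <- compiled_sum(df$x)))
stopifnot(isTRUE(all.equal(r2, sum(df$x))))
```

Since R 3.4 the just-in-time compiler byte-compiles closures automatically, so compiler::cmpfun() may show little extra gain on a current R; the vectorised df$x^2 and the built-in sum() are usually the bigger wins.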

Using RMySQL from Ubuntu

  • Install MySQL

sudo apt-get install mysql-server

  • Check whether the server is running

sudo netstat -tap | grep mysql

I use the MySQL command line to check it.

To connect

mysql -h localhost -u root -p

To see databases

mysql> show databases;

To see tables

 

mysql> show tables from mysql;

To quit mysql

 

mysql> \q


  • Install and load RMySQL from within R

install.packages("RMySQL")

library(RMySQL)
  • I connect using this

mydb <- dbConnect(MySQL(),
                  user = 'root',
                  password = 'XXX',     # placeholder; use your own password
                  host = 'localhost',
                  port = 8018,          # MySQL's default is 3306; use your server's port
                  dbname = 'mysql')

  • I run SQL queries using this
> dbGetQuery(mydb, "select * from  servers")
[1] Server_name Host        Db          Username    Password   
[6] Port        Socket      Wrapper     Owner      
<0 rows> (or 0-length row.names)
> dbGetQuery(mydb, "select * from  db")
 [1] Host                  Db                   
 [3] User                  Select_priv          
 [5] Insert_priv           Update_priv          
 [7] Delete_priv           Create_priv          
 [9] Drop_priv             Grant_priv           
[11] References_priv       Index_priv           
[13] Alter_priv            Create_tmp_table_priv
[15] Lock_tables_priv      Create_view_priv     
[17] Show_view_priv        Create_routine_priv  
[19] Alter_routine_priv    Execute_priv         
[21] Event_priv            Trigger_priv         
<0 rows> (or 0-length row.names)


Sources:

https://help.ubuntu.com/12.04/serverguide/mysql.html

RMySQL 0.10.0

https://mkmanu.wordpress.com/2014/07/24/r-and-mysql-a-tutorial-for-beginners/

Sponsored: PAW Boston keynotes, agenda and workshops announced

 

BOSTON

Seaport World Trade Center

Sept. 27 – Oct. 1, 2015

Keynote Speakers:

Dean Abbott, Co-Founder & Chief Data Scientist, SmarterHQ

Dr. Patrick Surry, Chief Data Scientist, Hopper

Christopher Wiggins, Chief Data Scientist, The New York Times

Co-located Events:

Predictive Analytics World for Business announces Boston program

Predictive Analytics World (PAW) for Business covers a wide range of business applications for predictive analytics across industry sectors, including marketing, credit scoring, insurance, fraud detection, web optimization, and much more.

What’s on the Agenda?

  • 30+ sessions focused on predictive analytics in the industries of business, insurance, retail, workforce, education and more
  • 30+ speakers from Elder Research, Inc., Merkle Inc., EMC, Talent Analytics Corp., Verizon, MetLife, and State Street Corp.
  • 20+ case studies from American Savings Bank, Telenor, Johnson Controls, Halliburton, Travelers, Fidelity, Paychex
  • 7 training workshops diving into big data, R, predictive modeling, hands-on methods, & supercharging prediction
  • 3 tracks to focus your learning: All Levels, Expert/Practitioner, Financial Services
  • Networking at the exhibit hall, reception, and breaks


Why Attend?
Improve your predictive analytics proficiency to achieve:

  • Bigger wins – Strengthen the impact of predictive analytics deployment
  • Broader capabilities – Establish new opportunities in data science
  • Big data – Leverage bigger data for prediction and drive bigger value

Super Early Bird Rates
Sign up by June 26th to keep up to $650 in your pocket with super early bird rates. Save an additional $200 with each additional attendee registering from the same company at the same time.