Installing the book (SVMono) class in LyX

  1. Start LyX.
  2. Find your user directory via Help (last tab, top right) > About LyX. On Windows this can be under AppData; to find AppData, type %appdata% at the command line, or press Windows+R and run it. It is something like C:\Users\dell\AppData\Roaming\LyX2.0.
  3. Download the SVMono class from SVMono.
  4. Unzip all the files into the layouts folder of your user directory (step 2).
  5. In LyX, go to Tools (second-last tab, top right) > Reconfigure.
  6. Close LyX.
  7. Start LyX again.
  8. In LyX, go to Document > Settings and select SVMono as the document class.

References

http://wiki.lyx.org/Layouts/Layouts

http://wiki.lyx.org/LyX/UserDir

Thoughts on Soft Ware and Soft Vapour

  1. Interfaces for software, hardware, design, and products evolve as technology enables better materials and more efficient consumption. The basic handicap, though, remains that humans do not evolve, at least not noticeably to themselves.

  2. Human psychology continues to play a key role in determining the success or failure of technological adoption. This includes the paradigms of loss aversion and our proneness to the logical fallacies in arguments put forward in advertising by fellow humans.

  3. Which is easier: to create a better machine for a man, or to train a man to be better at the machine? Often it is a mixture of both. Both have multiple costs and benefits for various players with agency, including incumbent corporations, challenger innovators, countries, regions and trade zones, and the environment.

  4. The practice of shipping code impatiently, propped up by made-up metrics, in fast-evolving industries breeds dishonesty among interested stakeholders.

  5. There are some things that cannot be explained unless the receiver is trained to think along different paradigms than those he is used to. The altered state of consciousness, however, is almost never credibly reversible.

  6. Information processing is the drug, not information itself.

  7. A feedback loop is simple to construct, with results and errors flowing into the inputs of the next iteration. A feedback loop is sometimes tough to sell to your own team! Feedback loops are also inevitably gamed sometimes, by people of course.
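Point 7's construction can be sketched in a few lines of R (a made-up numerical example, not from the post: refining an estimate of sqrt(2) by feeding each iteration's error back into the next input):

```r
# Feedback loop sketch: results and errors flow back into the next iteration
target <- 2
x <- 1                     # initial guess for sqrt(target)
for (i in 1:20) {
  error <- x^2 - target    # compare the result against the goal
  x <- x - 0.25 * error    # the error flows back into the next input
}
round(x, 3)                # converges near 1.414
```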


Life Cycle of a Data Science Project

It is best to use CRISP-DM, SEMMA, and/or KDD for a systematic approach.

1) Understanding Business Requirements from Client

2) Converting Business Problem to a Statistical Problem

  • what data to collect

  • what is the cost of data

  • how can I enhance the data

  • data quality issues

3) Solving Statistical Problem with Tools (R, SAS, Excel)

  • import

  • data quality

  • outlier and missing value treatment

  • exploratory analysis

  • data visualization

  • hypothesis and problem framing

  • data mining and pattern identification

  • create success parameters for statistical solution

4) Converting Statistical Solution to Business Solution

  • project report template

  • assumptions and caveats

  • feedback from stakeholders

5) Communicating Business Solution to Client

  • presentation

  • report

  • customer satisfaction

  • monitoring of results
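The step-3 tasks above (import, data quality, outlier and missing value treatment, exploratory analysis) can be sketched in base R. The data frame and thresholds below are made up for illustration:

```r
# Hypothetical example: the data frame and thresholds are made up
sales <- data.frame(
  region  = c("North", "South", "East", "West", "North"),
  revenue = c(120, NA, 95, 4000, 110)  # one missing value, one outlier
)

# Missing value treatment: impute with the median of observed values
sales$revenue[is.na(sales$revenue)] <- median(sales$revenue, na.rm = TRUE)

# Outlier treatment: cap values beyond 1.5 * IQR above the third quartile
q <- quantile(sales$revenue, c(0.25, 0.75))
cap <- q[[2]] + 1.5 * (q[[2]] - q[[1]])
sales$revenue <- pmin(sales$revenue, cap)

# Exploratory analysis
summary(sales$revenue)
```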

Using R and TwitteR together on Windows #rstats

Apparently you need to add the following on a Windows OS:

options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

twitCred$handshake(cainfo="cacert.pem")

 

I am still investigating this, to update the tutorial in my previous post into a complete standalone tutorial from beginning to end.

Using Twitter Data with R #rstats updated for API changes

Step 1

Install Package twitteR

install.packages("twitteR")
> install.packages("twitteR")
Installing package(s) into ‘/home/R/library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘ROAuth’, ‘rjson’

trying URL 'http://cran.rstudio.com/src/contrib/ROAuth_0.9.3.tar.gz'
Content type 'application/x-gzip' length 6202 bytes
opened URL
==================================================
downloaded 6202 bytes

trying URL 'http://cran.rstudio.com/src/contrib/rjson_0.2.13.tar.gz'
Content type 'application/x-gzip' length 98132 bytes (95 Kb)
opened URL
==================================================
downloaded 95 Kb

trying URL 'http://cran.rstudio.com/src/contrib/twitteR_1.1.7.tar.gz'
Content type 'application/x-gzip' length 121696 bytes (118 Kb)
opened URL
==================================================
downloaded 118 Kb

* installing *source* package ‘ROAuth’ ...
** package ‘ROAuth’ successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded

* DONE (ROAuth)
* installing *source* package ‘rjson’ ...
** package ‘rjson’ successfully unpacked and MD5 sums checked
** libs
g++ -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c dump.cpp -o dump.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c parser.c -o parser.o
g++ -shared -o rjson.so dump.o parser.o -L/usr/lib/R/lib -lR
installing to /home/R/library/rjson/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
   ‘json_rpc_server.Rnw’ 
** testing if installed package can be loaded

* DONE (rjson)
* installing *source* package ‘twitteR’ ...
** package ‘twitteR’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
Creating a generic function for ‘as.data.frame’ from package ‘base’ in package ‘twitteR’
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded

* DONE (twitteR)

The downloaded source packages are in
	‘/tmp/RtmpvY7yMN/downloaded_packages’

Step 2

Load Package twitteR

library(twitteR)
> library(twitteR)
Loading required package: ROAuth
Loading required package: RCurl
Loading required package: bitops
Loading required package: digest
Loading required package: rjson

Step 3

Log in to https://dev.twitter.com/apps/ with your Twitter username and password.

In case you forget your twitter.com username or password, click on Forgot Password to reset it.

Step 4

Create a new app for yourself by navigating to My Applications


Step 5

Your Apps are here

https://dev.twitter.com/apps

Click on New Application (button on top right)


Step 6

Fill in the options here, leaving the callback URL blank.

Name should be unique.

Description should be at least 10 characters.

Website can be a placeholder for now (or your blog address).

Agree to the Terms and Conditions.

Type the spam-check numbers and letters.

Step 7

Note these details from your new app:

Consumer Key

Consumer Secret

At the bottom, click on Create your OAuth Token.

Finally, your app page should look like this (don't worry, I will be deleting this app, so you can't hack my Twitter yet).

Step 8

Go to R

Type the following code after changing the two consumer keys (IMPORTANT: you will need to change the consumer key and consumer secret to the ones specific to YOUR app).

The final formatted code is at the very end of the post.

library(twitteR)
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "2uQlGBBMMXdDffcK2IkAsg"
consumerSecret <- "xrGr71kTfdT3ypWFURGxyJOC4Oqf46Rwu4qxyxoEfM"
twitCred <- OAuthFactory$new(consumerKey=consumerKey,
consumerSecret=consumerSecret,
requestURL=reqURL,
accessURL=accessURL,
authURL=authURL)

Step 9

Do the Twitter handshake by pasting this command in the R console:

twitCred$handshake()

You will see a message like this from R

> twitCred$handshake()
To enable the connection, please direct your web browser to: http://api.twitter.com/oauth/authorize?oauth_token=pJqojAg2gxmqip3SprJAyOckdcD1nB3MvlbP2dWUDGQ When complete, record the PIN given to you and provide it here:

Step 10

Go to the link above given by R

You will get this message

Click on the blue button to authorize the app.

Step 11 Entering the Pin

Now you will see a PIN. You can't copy and paste it; write it down and then type it into your R console.

Step 12

Now register the credentials using

registerTwitterOAuth(twitCred)

If done correctly, you will see this:

> registerTwitterOAuth(twitCred)
[1] TRUE

Step 13

Search Twitter using commands like the one below. Note that it returns only 499 tweets:

> a=searchTwitter("#rstats", n=2000)

Warning message: In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, : 2000 tweets were requested but the API can only return 499

Step 14

Now you can start analyzing the data.

Convert the data into a data frame:

tweets_df = twListToDF(a)

Install the packages tm and wordcloud (for text mining and word clouds):

> install.packages(c("tm", "wordcloud"))

Load the Packages

library(tm)

library(wordcloud)

A basic word cloud can be created using the code below:

b=Corpus(VectorSource(tweets_df$text), readerControl = list(language = "eng"))

b<- tm_map(b, tolower) #Changes case to lower case

b<- tm_map(b, stripWhitespace) #Strips White Space

b <- tm_map(b, removePunctuation) #Removes Punctuation

inspect(b)

tdm <- TermDocumentMatrix(b)

m1 <- as.matrix(tdm)

v1<- sort(rowSums(m1),decreasing=TRUE)

d1<- data.frame(word = names(v1),freq=v1)

wordcloud(d1$word,d1$freq)
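If common filler words like "the" and "for" dominate the cloud, one optional extension (not in the original tutorial; the sample texts below are made up so the example runs without the Twitter API, and content_transformer requires a recent tm) is to strip English stopwords before building the term-document matrix:

```r
library(tm)
# Made-up tweet texts, used instead of live API results
texts <- c("rstats is great for data analysis",
           "data analysis with rstats and ggplot2",
           "a quick demo of rstats")
b <- Corpus(VectorSource(texts))
b <- tm_map(b, content_transformer(tolower))
b <- tm_map(b, removePunctuation)
b <- tm_map(b, removeWords, stopwords("english"))  # drops "is", "for", "with", ...
tdm <- TermDocumentMatrix(b)
v <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
names(v)[1]  # the most frequent remaining term
```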

For more detailed analysis on what you can do with Twitter and R, read this http://cran.r-project.org/web/packages/twitteR/twitteR.pdf or this https://sites.google.com/site/miningtwitter/

Step 15

Keep your OAuth keys safe, and do your homework without bothering your instructor too much.

If you copy and paste code from a website, be sure to change the curly quotation marks to straight ones manually in your R console.

Also see on text mining https://decisionstats.com/2012/03/19/text-mining-barack-obama/

FINAL CODE

install.packages("twitteR")
library(twitteR)
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "rR16FxDLkTYmuVhqH4s4EQ"
consumerSecret <- "xrGr71kTfdT3ypWFURGxyJOC4Oqf46Rwu4qxyxoEfM"
twitCred <- OAuthFactory$new(consumerKey=consumerKey,
                             consumerSecret=consumerSecret,
                             requestURL=reqURL,
                             accessURL=accessURL,
                             authURL=authURL)
twitCred$handshake() #Pause here for the Handshake Pin Code
registerTwitterOAuth(twitCred) #Wait till you see True


a = searchTwitter("#rstats", n=2000) #Get the tweets
tweets_df = twListToDF(a) #Convert to data frame
install.packages(c("tm", "wordcloud"))
library(tm)
library(wordcloud)
b = Corpus(VectorSource(tweets_df$text), readerControl = list(language = "eng"))
b <- tm_map(b, tolower) #Changes case to lower case
b <- tm_map(b, stripWhitespace) #Strips white space
b <- tm_map(b, removePunctuation) #Removes punctuation
inspect(b)
tdm <- TermDocumentMatrix(b)
m1 <- as.matrix(tdm)
v1 <- sort(rowSums(m1), decreasing=TRUE)
d1 <- data.frame(word = names(v1), freq = v1)
wordcloud(d1$word, d1$freq)

 

The new RStudio Shiny Server – A bright Spark for hosted Stats Visualization #rstats

So I googled http://spark.rstudio.com/ and got a cool list of Shiny apps on the RStudio Shiny Server (experimental!).

I suppose the next stage is to have some pre-built themes to further enable or facilitate Business Intelligence-style visualization, so people do not have to build the ui.R from scratch.

Also, a gallery for the best of Shiny is here: http://www.rstudio.com/shiny/showcase/

http://spark.rstudio.com/trestletech/ShinyDash-Sample/


http://spark.rstudio.com/jkatz/SurveyMaps/


http://spark.rstudio.com/jkatz/DialectMap/


http://spark.rstudio.com/jbryer/gambler/


http://spark.rstudio.com/jbryer/timeline/


http://spark.rstudio.com/systematiced/MarketDashboard/


and others too

http://spark.rstudio.com/ram/shinySketch/

http://spark.rstudio.com/dgrapov/PCA/

http://spark.rstudio.com/dgrapov/1Dplots/
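Since the wish above is for pre-built themes so people do not have to write the UI from scratch, here is a minimal sketch of a Shiny app in the single-file style of current shiny versions (the dataset and labels are my own illustration, not from any of the apps listed):

```r
library(shiny)

# Minimal single-file Shiny app sketch; contents are illustrative only
ui <- fluidPage(
  titlePanel("Histogram demo"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

server <- function(input, output) {
  output$hist <- renderPlot({
    hist(faithful$eruptions, breaks = input$bins,
         main = "Old Faithful eruption times", xlab = "Minutes")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # launches the app in a browser
```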

Using Quandl for Datasets and Research #rstats

Graph of Currency Exchange Rates - INR vs USD

I love the above graph from Quandl; it is as easy as using a search engine for numerical datasets, and it gives me graphs, plus download and embed options. Nice work, Quandl!

I hope the R package Quandl at http://cran.r-project.org/web/packages/Quandl/ is used more often for searching for datasets. Rather than importing datasets by URL, as in RStudio Server, maybe we can have an import-from-Quandl feature too; URLs are so '90s. Or maybe a Shiny Server/Quandl mashup can bring some new ideas. After all, dashboard design is still relevant today! Something like a ggplot-based dashboard theme for analysis, for Shiny-server-based visualizations. I believe Shiny can have more pre-built themes (to be continued).

Quandl, though, must be applauded for the options they give, including the R code.

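As a sketch of the Quandl R package in use: the dataset code "CURRFX/USDINR" below is an assumption for the USD/INR series shown above, as is the column layout; substitute whatever code your Quandl search returns. The call requires network access.

```r
library(Quandl)
# Fetch a dataset by its Quandl code (assumed code for illustration)
usdinr <- Quandl("CURRFX/USDINR")
head(usdinr)
# First column is the date, second the rate (an assumption about this series)
plot(usdinr[[1]], usdinr[[2]], type = "l", xlab = "Date", ylab = "INR per USD")
```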

1) Easy search

2) Many datasets (and they respect no-indexing requests)

3) Final search results are embeddable, downloadable, and linkable.