
Tag Archives: rstats

Updates at StatAce: Early access to make your own R-in-the-browser GUI #rstats

The guys at StatAce have released major updates. I am particularly excited about the ability to create a custom GUI box for your own analysis, or for sharing with consulting clients or students.

What does that mean? Basically they are making it a bit like R Commander extensions: if you have a package or analysis you would rather run visually than in code, you can create a GUI module for it. The modular extension is quite cool in my opinion, but the proof of the pudding will be in how well it is designed.


Public sharing of results
Now you can share your analysis results for the world to see (example). Just click Share in the results pane.

Google Drive integration
We added integration with Google Drive. This makes collaboration and synchronization of large files even easier. Don’t forget we also support Dropbox. Just click the Connect to menu in the file manager.

Plots zoom and SVG export
Now you can open plots in a separate window that supports zoom in and zoom out. From it, you can also export to the SVG format which is ideal for printing. Just click the lens icon next to any plot.

Point-and-click PCA + data transformation without R knowledge
You can now carry out a PCA by just pointing and clicking through Analysis > Dimensional Analysis > Principal Components Analysis. We also added the Data menu which allows you to filter and sort datasets without any knowledge of R.

(Secret) Build your own visual dialog box to run R code
Do you have colleagues who don’t know R but need to use functionality you developed? Do you do consulting and want your customers to be able to run your models with point-and-click? Do you want to share a piece of R code with the world in an easy-to-use way?
StatAce now allows you to easily create a custom graphical interface for your R code. The process is entirely visual (no coding) and is what we use to build our own Data & Analysis menus (e.g. the bivariate correlation and linear regression dialog boxes). We are testing the functionality with a limited number of users, and their feedback has been great. Drop us a line at predict@statace.com to request early access.






The Amazing R-Fiddle truly brings #rstats to the browser

The people at Datamind.com, whom I interact with on and off and who are also the masterminds behind http://www.rdocumentation.org/, have finally created their platform for interactive and gamified R learning on the web. Take a look: it does look slightly better than Codecademy’s interface, doesn’t it? The platform is called http://www.r-fiddle.org/#/

More power to R for Cloud Computing!


Now if only they could collaborate with other players like Quandl, BigML and even StatAce for an even cooler offering. Even Revolution Analytics and RStudio, who have very expensive training modules, should be able to use this for self-paced online learning courses!


Quote- A software of beauty is a joy forever – Keats

Polyglots for Data Science #python #sas #r #stats #spss #matlab #julia #octave

In the future I think analysts will need to be polyglots: you will need to know more than one language for crunching data.

SAS, Python, R, Julia, SPSS, Matlab: pick any two ;) or any three.

No, you can’t count C or Java as a statistical language :) :)

Efforts to promote polyglots in statistical software include:

1) R for SAS and SPSS Users (free or book)

2) R for Stata Users (book)

3) SAS and R (blog and book)

4) Using Python and R together

We probably need a Python and R for Data Analysis book, just like the ones we have for SAS and R.

5) Matlab   and R

Reference (http://mathesaurus.sourceforge.net/matlab-python-xref.pdf ) includes Python

6) Octave and R

package http://cran.r-project.org/web/packages/RcppOctave/vignettes/RcppOctave.pdf includes Matlab

reference http://cran.r-project.org/doc/contrib/R-and-octave.txt

7) Julia and Python

  • PyPlot uses the Julia PyCall package to call Python’s matplotlib directly from Julia

8) SPSS and Python is here

9) SPSS and R is as below

  • The Essentials for R for Statistics versions 22, 21, 20, and 19 are available here.
  • This link will take you to the SourceForge site where the Version 18 Essentials and Plugins are hosted.


10) Using R from Clojure – Incanter

Use embedded R from Clojure and Incanter http://github.com/jolby/rincanter

Using R for random number creation from time stamps #rstats

Suppose, let us just suppose, you want to create random numbers that are reproducible and derived from time stamps.

Here is the code in R

> a=as.numeric(Sys.time())
> set.seed(a)
> rnorm(log(a))

Note: you can also use a custom function of the system time (I used log) to decide how many random numbers to generate. This creates a vector of pseudo-random numbers (since nothing machine-driven is purely random in the strict philosophical sense of the word). To reproduce the same draw later, you need to keep the value of a.


[1]  39621645  99451316 109889294 110275233 278994547   6554596  38654159  68748122   8920823  13293010
[11]  57664241  24533980 174529340 105304151 168006526  39173857  12810354 145341412 241341095  86568818
[21] 105672257
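The reproducibility part can be sketched as follows. This is a minimal illustration, not the code used for the output above: the names stamp, seed, first and second are made up for the example, and the modulo step is my own assumption, added only so the seed stays inside R's integer range.

```r
# Capture the time stamp once; storing it is what makes the draw replayable.
stamp = as.numeric(Sys.time())                       # seconds since 1970-01-01 UTC

# Assumption: fold the stamp into the valid integer range before seeding.
seed = as.integer(stamp %% .Machine$integer.max)

# Same idea as in the post: the log of the stamp decides how many numbers to draw.
n = floor(log(stamp))

set.seed(seed)
first = rnorm(n)

# Re-seeding with the stored value replays exactly the same numbers.
set.seed(seed)
second = rnorm(n)

identical(first, second)
```

As long as stamp (or seed) is logged next to the event, anyone can regenerate the same sequence later.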

Possible applications: things that need both random numbers (like encryption keys) and time stamps (like events, web or industrial logs, or pseudo-random pass codes in Google two-factor authentication).

Note: I used the rnorm function, but you could also pick the generating function itself at random (rnorm or rcauchy, for example).
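One way to pick the generating function at random is sketched below. The list name samplers, the variable picked and the fixed seed 2014 are all hypothetical choices for this illustration.

```r
# Arbitrary fixed seed so the example can be replayed; in the time-stamp
# setting this would be the seed derived from Sys.time().
set.seed(2014)

# Candidate generators, stored by name so we can report which one was used.
samplers = list(rnorm = rnorm, rcauchy = rcauchy)

# Draw the name of one generator at random, then draw 5 numbers from it.
picked = sample(names(samplers), 1)
draws = samplers[[picked]](5)

picked
draws
```

Because the choice of generator is made by sample(), it is covered by the same seed as the numbers themselves.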

Again, I would trust my own randomness more than one generated by an arm of the US Govt (see http://www.nist.gov/itl/csd/ct/nist_beacon.cfm )

Update- Random numbers in R



The currently available RNG kinds are given below. kind is partially matched to this list. The default is "Mersenne-Twister".

"Wichmann-Hill": The seed, .Random.seed[-1] == r[1:3] is an integer vector of length 3, where each r[i] is in 1:(p[i] - 1), where p is the length 3 vector of primes, p = (30269, 30307, 30323). The Wichmann–Hill generator has a cycle length of 6.9536e12 (= prod(p-1)/4, see Applied Statistics (1984) 33, 123 which corrects the original article).

"Marsaglia-Multicarry": A multiply-with-carry RNG is used, as recommended by George Marsaglia in his post to the mailing list ‘sci.stat.math’. It has a period of more than 2^60 and has passed all tests (according to Marsaglia). The seed is two integers (all values allowed).

"Super-Duper": Marsaglia’s famous Super-Duper from the 70’s. This is the original version which does not pass the MTUPLE test of the Diehard battery. It has a period of about 4.6*10^18 for most initial seeds. The seed is two integers (all values allowed for the first seed: the second must be odd).

We use the implementation by Reeds et al. (1982–84).

The two seeds are the Tausworthe and congruence long integers, respectively. A one-to-one mapping to S’s .Random.seed[1:12] is possible but we will not publish one, not least as this generator is not exactly the same as that in recent versions of S-PLUS.

"Mersenne-Twister": From Matsumoto and Nishimura (1998). A twisted GFSR with period 2^19937 - 1 and equidistribution in 623 consecutive dimensions (over the whole period). The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.

"Knuth-TAOCP-2002": A 32-bit integer GFSR using lagged Fibonacci sequences with subtraction. That is, the recurrence used is

X[j] = (X[j-100] - X[j-37]) mod 2^30

and the ‘seed’ is the set of the 100 last numbers (actually recorded as 101 numbers, the last being a cyclic shift of the buffer). The period is around 2^129.

"Knuth-TAOCP": An earlier version from Knuth (1997).

The 2002 version was not backwards compatible with the earlier version: the initialization of the GFSR from the seed was altered. R did not allow you to choose consecutive seeds, the reported ‘weakness’, and already scrambled the seeds.

Initialization of this generator is done in interpreted R code and so takes a short but noticeable time.

"L'Ecuyer-CMRG": A ‘combined multiple-recursive generator’ from L’Ecuyer (1999), each element of which is a feedback multiplicative generator with three integer elements: thus the seed is a (signed) integer vector of length 6. The period is around 2^191.

The 6 elements of the seed are internally regarded as 32-bit unsigned integers. Neither the first three nor the last three should be all zero, and they are limited to less than 4294967087 and 4294944443 respectively.

This is not particularly interesting of itself, but provides the basis for the multiple streams used in package parallel.

"user-supplied": Use a user-supplied generator.


Function RNGkind allows user-coded uniform and normal random number generators to be supplied.
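To see the generators described above in action, here is a small sketch that switches the RNG kind with RNGkind and looks at the length of .Random.seed. The variable names and the seed 123 are arbitrary choices for the illustration.

```r
# Remember the current settings so they can be restored at the end.
old = RNGkind()

# Wichmann-Hill: .Random.seed holds 1 code for the kind plus 3 seed integers.
RNGkind("Wichmann-Hill")
set.seed(123)
wh_seed_length = length(.Random.seed)
wh_draws = runif(3)

# Mersenne-Twister (the default): 1 code + 624 integers + the current position.
RNGkind("Mersenne-Twister")
set.seed(123)
mt_seed_length = length(.Random.seed)
mt_draws = runif(3)

wh_seed_length
mt_seed_length

# Same seed, different generators, different numbers.
identical(wh_draws, mt_draws)

# Restore whatever was in use before.
do.call(RNGkind, as.list(old))
```

The practical point: a seed only reproduces a sequence if the RNG kind is also the same, so both should be recorded.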

Iris for Big Data #rstats #bigdata

Quote of the Day-

it is impossible to be a data scientist without knowing iris 

#Anonymous #Quotes


Revolution Analytics has been nice enough to provide both datasets and code for analyzing Big Data in R.



Site was updated so here are the new links


While the Datasets collection is still elementary, as an R instructor I find this list extremely useful. However, I wish they would look at some other repositories and make .xdf and “tidy” csv versions. A little bit of RODBC usage should help, and so would some descriptions. Maybe they should partner with Quandl, DataMarket, or Infochimps on this initiative rather than do it alone.


Overall, there could be an R package for this (like a Big Data version of the famous datasets package in R).

Still, it is a nice and very useful effort.

Revolution R Datasets

More code-


Also, a recent project by a student of mine using the Revolution datasets and their blog posts.

Note how much better the above project is than using the mini and super-clean datasets within R (like Boston).


Hat tip: R’s very own Mr Smith
For more on IRIS


Using ifelse in R for creating new variables #rstats #data #manipulation

The ifelse function is simple and powerful and can help in data manipulation within R. Here I create a categorical variable from specific values in a numeric variable.

> data(iris)

> iris$Type=ifelse(iris$Sepal.Length<5.8,"Small Flower","Big Flower")
> table(iris$Type)
  Big Flower Small Flower
          77           73

The parameters of ifelse are quite simple:


ifelse(test, yes, no)

test: an object which can be coerced to logical mode.

yes: return values for true elements of test.

no: return values for false elements of test.
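Since ifelse works element by element, it can also be nested for more than two categories. The sketch below uses a small made-up vector (sepal) and made-up cut-offs of 5.0 and 6.0 purely for illustration.

```r
# Hypothetical measurements, including a missing value.
sepal = c(4.9, 5.8, 6.3, NA)

# Two categories, same rule as in the iris example.
size = ifelse(sepal < 5.8, "Small Flower", "Big Flower")
size    # "Small Flower" "Big Flower" "Big Flower" NA

# Three categories by nesting a second ifelse in the "no" argument.
size3 = ifelse(sepal < 5.0, "Small",
        ifelse(sepal < 6.0, "Medium", "Big"))
size3   # "Small" "Medium" "Big" NA
```

Note that a missing value in test stays missing in the result; it is not pushed into either category.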


Basics of Data Handling for R beginners #rstats

  • Assigning Objects

We can create new data objects and variables quite easily within R. We use the = or the <- operator to assign an object to its name. For the purposes of this article we will use = for assignment. This is very useful when we are doing data manipulation, as we can reuse the manipulated data as inputs for other steps in our analysis.
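A quick sketch of assignment and reuse; the object names x, y and total are made up for the example.

```r
# Both operators assign an object to a name.
x = c(10, 20, 30)
y <- x * 2          # reuse x as the input for the next step

# The result of one step becomes the input for another.
total = sum(y)
total               # 120
```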


Types of Data Objects in R

  • Lists

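A list, as the term is used here, is simply a one-dimensional collection of data. (Strictly speaking, this is what R calls a vector; R's own list type is created with list().) We create one using the c function.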

The following code creates a list named numlist from 6 input numeric data



The following code creates a list named charlist from 6 input character data



The following code creates a list named mixlist from both numeric and character data.

mixlist=c(1,2,3,4,"R language","Ajay")
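Here is a sketch of all three objects together. The values for numlist and charlist are hypothetical stand-ins of my own (only mixlist is taken from the text above), and reallist is an extra name added to show the contrast with a true R list.

```r
numlist = c(1, 2, 3, 4, 5, 6)                    # hypothetical values
charlist = c("a", "b", "c", "d", "e", "f")       # hypothetical values
mixlist = c(1, 2, 3, 4, "R language", "Ajay")    # as in the text above

class(numlist)     # "numeric"
class(charlist)    # "character"

# c() keeps a single type, so the numbers in mixlist are coerced to strings.
class(mixlist)     # "character"
mixlist[1]         # "1"

# A true R list keeps each element's own type.
reallist = list(1, 2, "R language")
class(reallist[[1]])   # "numeric"
```

This coercion is worth remembering: arithmetic on mixlist will fail until the numbers are converted back with as.numeric.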


  • Matrices

A matrix is a two-dimensional collection of data in rows and columns, unlike a list, which is basically one-dimensional. We can create a matrix using the matrix command while specifying the number of rows with the nrow parameter and the number of columns with the ncol parameter.

In the following code, we create a matrix named ajay. The data is input in 3 rows as specified, but it is entered into the first column, then the second column, and so on.



     [,1] [,2] [,3]
[1,]    1    4   12
[2,]    2    5   18
[3,]    3    6   24


However, please note the effect of using the byrow=T (TRUE) option. In the following code we create a matrix named ajay. The data is input in 3 rows as specified, but it is entered into the first row, then the second row, and so on.




     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]   12   18   24
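The two matrices printed above can be reproduced with calls along these lines. This is a sketch reconstructed from the printed output, so the exact form of the original calls is an assumption; the names values and ajay_byrow are introduced here only to keep both versions side by side.

```r
# The nine data values, in the order they are fed to matrix().
values = c(1, 2, 3, 4, 5, 6, 12, 18, 24)

# Default: fill column by column.
ajay = matrix(values, nrow = 3, ncol = 3)
print(ajay)

# byrow = TRUE: fill row by row instead.
ajay_byrow = matrix(values, nrow = 3, ncol = 3, byrow = TRUE)
print(ajay_byrow)

# With the same square input, the row-wise matrix is the transpose
# of the column-wise one.
identical(ajay_byrow, t(ajay))
```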

  • Data Frames

A data frame is a list of variables of the same number of rows with unique row names. The column names are the names of the variables.
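A minimal data frame sketch follows. The name flowers and the values in it are hypothetical, chosen only to show one character column and one numeric column of equal length.

```r
# Each argument becomes a column (a variable); all columns have 3 rows.
flowers = data.frame(
  name = c("setosa", "versicolor", "virginica"),
  sepal_length = c(5.1, 7.0, 6.3),
  stringsAsFactors = FALSE
)

names(flowers)      # the column names are the variable names
rownames(flowers)   # default row names "1" "2" "3", which are unique
nrow(flowers)       # 3

# Unlike a matrix, each column keeps its own type.
class(flowers$name)           # "character"
class(flowers$sepal_length)   # "numeric"
```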


