## Topic Models in R: search documents for similarity by frequency

From the marvelous Journal of Statistical Software, ignored by the mainstream corporate world but beloved by academia, here is one more interesting and very timely paper.

It can be used to grade students' homework, catch "terrorists" (as in plagiarists), and detect search engine link spammers. Enjoy!
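To make "similarity by frequency" concrete, here is a minimal Python sketch that compares documents by the cosine similarity of their raw term counts. This is a simpler stand-in, not the topic-model method of the paper, and the function names and sample sentences are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    shared_terms = set(freq_a) & set(freq_b)
    dot = sum(freq_a[term] * freq_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in freq_a.values()))
    norm_b = math.sqrt(sum(count * count for count in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents (assumptions, not from the paper).
original = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox leaps over the lazy dog"
unrelated = "survival analysis with multi state models"

score_copied = cosine_similarity(term_frequencies(original), term_frequencies(copied))
score_unrelated = cosine_similarity(term_frequencies(original), term_frequencies(unrelated))
print(score_copied > score_unrelated)
```

A near-copy scores close to 1 and an unrelated text scores close to 0, which is the basic signal a plagiarism check would threshold on. A topic model goes further by comparing documents through inferred topics rather than raw word counts.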

## Multi State Models

A special issue of the Journal of Statistical Software devoted to Multi State Models and Competing Risks has come out. It is a must-read for anyone with an interest in Pharma Analytics or Survival Analysis, even if you don't know much R.

Here is an extract from “mstate: An R Package for the Analysis of Competing Risks and Multi-State Models”:

Multi-state models are a very useful tool to answer a wide range of questions in survival analysis that cannot, or only in a more complicated way, be answered by classical models. They are suitable for both biomedical and other applications in which time-to-event variables are analyzed. However, they are still not frequently applied. So far, an important reason for this has been the lack of available software. To overcome this problem, we have developed the mstate package in R for the analysis of multi-state models. The package covers all steps of the analysis of multi-state models, from model building and data preparation to estimation and graphical representation of the results. It can be applied to non- and semi-parametric (Cox) models. The package is also suitable for competing risks models, as they are a special category of multi-state models.

## Special Issue of JSS on R GUIs

An announcement from the Journal of Statistical Software: a call for papers on R GUIs. The initial deadline is December 2010, with final versions published during 2011.

Announcement:

Special issue of the Journal of Statistical Software on Graphical User Interfaces for R

Editors: Pedro Valero-Mora and Ruben Ledesma

Since the original paper by Gentleman and Ihaka was published, R has managed to gain an ever-increasing percentage of academic and professional statisticians, but the spread of its use among novice and occasional users of statistics has not progressed at the same pace. Among the reasons for this relative lack of impact, the lack of a GUI or point-and-click interface is one of the most widely mentioned. However, in the last few years this situation has been quietly changing, and a number of projects have equipped R with a number of different GUIs, ranging from the very simple to the more advanced, and providing the casual user with what could be a new source of trouble: choosing the right GUI for them. We may have moved from the “too few” situation to the “too many” situation.

This special issue of the JSS intends, as one of its main goals, to offer a general overview of the different GUIs currently available for R. Thus, we think that somebody trying to find their way among different alternatives may find it useful as a starting point. However, we do not want to stop at a mere listing; we want to offer a more general discussion about what could be good GUIs for R (and how to build them). Therefore, we want to see papers submitted that discuss the whole concept of a GUI in R, what elements it should include (or not), how this could be achieved, and, why not, whether it is actually needed at all. Finally, the great success of R does not mean that other systems lack important features that we would like to see in R. Indeed, descriptions of these nice features that we do not have in R but that exist in other systems could be another way of driving the future progress of GUIs for R.

In summary, we envision papers for this special issue on GUIs for R in the following categories:

- General discussions on GUIs for statistics, and for R.

- Implementing GUI toolboxes for R so others can program GUIs with them.

- R GUI examples (with two subcategories: on the desktop or in the cloud).

- Is there life beyond R? What features do other systems have that R does not, and why does R need them?

Papers can be sent directly to Pedro Valero-Mora (valerop@uv.es) or Ruben Ledesma (rdledesma@gmail.com), and they will follow the usual JSS reviewing procedure. The initial deadline is December 2010, with final versions published during 2011.

Jan de Leeuw; Distinguished Professor and Chair, UCLA Department of Statistics; Director: UCLA Center for Environmental Statistics (CES);

Editor: Journal of Multivariate Analysis, Journal of Statistical Software; homepages: http://gifi.stat.ucla.edu and http://www.cuddyvalley.org

## Interview: Professor John Fox, Creator of R Commander

Here is an interview with Prof John Fox, creator of the very popular R-based GUI, R Commander (Rcmdr).

**Ajay- Describe your career in science from your high school days to the science books you have written. What do you think can be done to increase interest in science among young people?**

**John Fox-** I’m a sociologist and social statistician, so I don’t have a career in science, as that term is generally understood. I was interested in science as a child, however: I attended a science high school in New York City (Brooklyn Tech), and when I began university in 1964 at New York’s City College, I started in engineering. I moved subsequently through majors in philosophy and psychology, before finishing in sociology; had I not graduated in 1968 I probably would have moved on to something else. I took a statistics course during my last year as an undergraduate and found it fascinating. I enrolled in the sociology graduate program at the University of Michigan, where I specialized in social psychology and demography, and finished with a PhD in 1972 when I was 24 years old. I became interested in computers during my first year in graduate school, where I initially learned to program in Fortran. I also took quite a few courses in statistics and math.

I haven’t written any science books, but I have written and edited a number of books on social statistics, including, most recently, Applied Regression Analysis and Generalized Linear Models, Second Edition (Sage, 2008).

I’m afraid that I don’t know how to interest young people in science. Science seemed intrinsically interesting to me when I was young, and still does.

**Ajay- What prompted you to create R Commander? How would you describe R Commander as a tool, say for users of other languages who want to learn R but are afraid of the syntax?**

**John-** I originally programmed the R Commander so that I could use R to teach introductory statistics courses to sociology undergraduates. I previously taught this course with Minitab or SPSS, which were programs that I never used for my own work. I waited for someone to come up with a simple, portable, easily installed point-and-click interface to R, but nothing appeared on the horizon, and so I decided to give it a try myself.

I suppose that the R Commander can ease users into writing commands, inasmuch as the commands are displayed, but I suspect that most users don’t look at them. I think that serious prospective users of R should be encouraged to use the command-line interface along with a script editor of some sort. I wouldn’t exaggerate the difficulty of learning R: I came to R — actually S then — after having programmed in perhaps a dozen other languages, most recently at that point Lisp, and found the S language particularly easy to pick up.

**Ajay- I particularly like the R Cmdr plug-ins. Is it possible for anyone to extend R Commander with a customized plug-in package?**

**John-** That’s the basic idea, though the plug-in author has to be able to program in R and must learn a little Tcl/Tk.

**Ajay- Have you thought of using the R Commander GUI on Amazon EC2, thus making R high-performance computing available on demand (similar to Zementis model deployment using Amazon EC2)? What are your views on the future of statistical computing?**

**John-** I’m not sure whether or how an interface like the Rcmdr, which is Tcl/Tk-based, can be adapted to cloud computing. I also don’t feel qualified to predict the future of statistical computing.

I think that R is where the action is for the near future.

**Ajay- What are the best ways of using R Commander as a teaching tool (I noticed the help is a bit outdated)?**

**John-** Is the help a bit outdated? My intention is that the R Commander should be largely self-explanatory. Most people know how to use point-and-click interfaces. In the basic courses for which it is principally designed, my goals are to teach the essential ideas of statistical reasoning and some skills in data analysis. In this kind of course, statistical software should facilitate the basic goals of the course.

As I said, for serious data analysis, I believe that it’s a good idea to encourage use of the command-line interface.

**Ajay- What are your views on R being recognized by SAS Institute for its IML product? Do you think there can be a middle way for open source and proprietary software to coexist?**

**John-** I imagine that R is a challenge for producers of proprietary software like SAS, partly because R development moves more quickly, but also because R is giving away something that SAS and other vendors of proprietary statistical software are selling. For example, I once used SAS quite a bit but don’t anymore. I also have the sense that for some time SAS has directed its energies more toward business uses of its software than toward purely statistical applications.

**Ajay- Do people in the R Core team recognize the importance of GUIs? What does the rest of the R community feel? What feedback have you had from users? Are there any plans for corporate sponsors for R Commander? (Rattle, an R language data mining GUI, has a version called Rstat at http://www.informationbuilders.com/products/webfocus/predictivemodeling.html, while the free version and code are at rattle.togaware.com.)**

**John-** I feel that the R Commander GUI has been generally positively received, both by members of R Core who have said something about it to me and by others in the R community. Of course, a nice feature of the R package system is that people can simply ignore packages in which they have no interest. I noticed recently that a Journal of Statistical Software paper that I wrote several years ago on the Rcmdr package has been downloaded nearly 35,000 times.

Because I wouldn’t expect many students using the Rcmdr package in a course to read that paper, I expect that the package is being used fairly widely.

**Ajay- What does John Fox do for fun or as a hobby?**

**John-** I’m tempted to say that much of my work is fun, particularly doing research, writing programs, and writing papers and books. I used to be quite a serious photographer, but I haven’t done that in years, and the technology of photography has changed a great deal. I run and swim for exercise, but that’s not really fun. I like to read and to travel, but who doesn’t?

**Biography-**

Prof John Fox is a giant in his chosen fields and has edited or authored 13 books and written chapters for 12 more. He has also published almost 49 journal articles. He is also editor-in-chief of the R News newsletter. You can read more about Dr Fox at http://socserv.mcmaster.ca/jfox/

**On R Cmdr-**

R Cmdr has substantially lowered the barrier to entry for people wanting to learn R: they begin with the GUI and later transition to customization using the command line. It is so simple in its design that even undergraduates have started basic data analysis with R Cmdr after just one class. You can read more about it at http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/Getting-Started-with-the-Rcmdr.pdf

## Journal of Statistical Software

Here is a good open-content journal for people wanting to keep track of the latest in statistical software.

It is called the Journal of Statistical Software.

Citation: http://www.jstatsoft.org/

Established in 1996, the Journal of Statistical Software publishes articles, book reviews, code snippets, and software reviews on the subject of statistical software and algorithms. The contents are freely available on-line. For both articles and code snippets the source code is published along with the paper.

Implementations can use languages such as C, C++, S, Fortran, Java, PHP, Python and Ruby or environments such as Mathematica, MATLAB, R, S-PLUS, SAS, Stata, and XLISP-STAT.

For example, it carries book reviews of A Handbook of Statistical Analyses Using SAS (Third Edition) and Statistics and Data with R: An Applied Approach Through Examples.

It is really cutting-edge stuff for someone who wants to keep up with the latest fast-moving trends in statistical software, and it has convenient RSS feeds as well as email announcement alerts.

**Note- Various journals can be ranked using a quantitative index called the Impact Factor**

Citation: http://in-cites.com/research/2007/august_27_2007-2.html

E.g. for Statistics:

In these columns, total citations to a journal’s published papers are divided by the total number of papers that the journal published, producing a citations-per-paper impact score over a five-year period (middle column) and a 26-year period (right-hand column).

Journals Ranked by Impact: Statistics & Probability

| Rank | 2006 Impact Factor | Impact 2002-06 | Impact 1981-2006 |
|------|--------------------|----------------|------------------|
| 1 | Bioinformatics (4.89) | Bioinformatics (9.87) | Econometrica (52.93) |
| 2 | Biostatistics (3.01) | J. Royal Stat. Soc. B (6.75) | J. Royal Stat. Soc. B (27.32) |
| 3 | Chemom. Intell. Lab. (2.45) | Biostatistics (6.56) | J. Am. Stat. Assoc. (25.11) |
| 4 | Econometrica (2.40) | J. Computat. Biology (6.49) | Biometrika (22.75) |
| 5 | J. Royal Stat. Soc. B (2.32) | Econometrica (5.82) | Annals of Statistics (21.31) |
| 6 | IEEE ACM T Comp. Bi. (2.28) | J. Chemometrics (5.08) | Biometrics (20.32) |
| 7 | J. Am. Stat. Assoc. (2.17) | J. Am. Stat. Assoc. (4.95) | Technometrics (17.74) |
| 8 | Multivar. Behav. Res. (2.10) | Statistical Science (4.19) | Multivar. Behav. Res. (16.62) |
| 9 | J. Computat. Biology (2.00) | Annals of Statistics (3.94) | Bioinformatics (16.37) |
| 10 | Annals of Statistics (1.90) | Stat. in Medicine (3.62) | J. Royal Stat. Soc. A (14.46) |
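The citations-per-paper score described above is a simple ratio, sketched here in Python. The journal names and counts are hypothetical placeholders, not figures from the in-cites ranking, and the function name is an assumption made for illustration.

```python
def citations_per_paper(total_citations, total_papers):
    """Impact score: total citations divided by total papers published."""
    if total_papers <= 0:
        raise ValueError("total_papers must be positive")
    return total_citations / total_papers


# Hypothetical (citations, papers) counts over one period; not real data.
counts = {
    "Journal A": (1200, 150),
    "Journal B": (900, 300),
    "Journal C": (500, 50),
}

# Rank journals from highest to lowest citations-per-paper score.
ranked = sorted(counts, key=lambda name: citations_per_paper(*counts[name]), reverse=True)

for rank, name in enumerate(ranked, start=1):
    print(rank, name, round(citations_per_paper(*counts[name]), 2))
```

Running the same calculation over a five-year window and over a 26-year window gives the two kinds of impact column shown in the ranking.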