You can use
pip install git+git://github.com/yhat/ggplot.git
or
pip install --upgrade https://github.com/yhat/ggplot/tarball/master
You can use
pip install git+git://github.com/yhat/ggplot.git
or
pip install --upgrade https://github.com/yhat/ggplot/tarball/master
Some thoughts on R Packages
Here is an interview with Jeff Allen who works with R and the new package Shiny in his technology startup. We featured his RGL Demo in our list of Shiny Demos- here
Ajay- Describe how you started using R. What are some of the benefits you noticed on moving to R?
Jeff- I began using R in an internship while working on my undergraduate degree. I was provided with some unformatted R code and asked to modularize the code then wrap it up into an R package for distribution alongside a publication.
To be honest, as a Computer Science student with training more heavily emphasizing the big high-level languages, R took some getting used to for me. It wasn’t until after I concluded that initial project and began using R to do my own data analysis that I began to realize its potential and value. It was the first scripting language which really made interactive use appealing to me — the experience of exploring a dataset in R was unlike anything Continue reading “Interview Jeff Allen Trestle Technology #rstats #rshiny”
Message from Winston Cheng of R Studio.
—-
I noticed this article sometime back by the most excellent hacker, John Myles White ( author Machine learning for Hackers)
http://www.johnmyleswhite.com/notebook/2012/08/12/the-social-dynamics-of-the-r-core-team/
Professor John Fox, whom we have interviewed here as the creator of R Commander, talked on this at User 2008 http://www.statistik.uni-dortmund.de/useR-2008/slides/Fox.pdf
I also noticed that R Project is stuck on SVN ( yes or no??, comment please) while some part of the rest of the World has moved on to Git. See http://en.wikipedia.org/wiki/Git_%28software%29
Is Git really that good compared to SVN http://stackoverflow.com/questions/871/why-is-git-better-than-subversion
Maybe, I think with 5000 packages and more , R -project needs to have more presence on Github and atleast consider Git for the distributed and international project R is becoming.
Continue reading “Data Visualization for R packages at Github #rstats”
I loved this book. Only 75 pages and very lucidly written and available on Github for free. Nice job by Avril Coghlan a.coghlan@ucc.ie
.Of course My usual suspects for Time Series Readings are –
1) The seminal pdf (2008!!) by a certain Prof Hyndman
Click to access Rtimeseries-ohp.pdf
2) JSS Paper -Automatic Time Series Forecasting: The forecast
Package for R http://www.jstatsoft.org/v27/i03/paper
3) The CRAN View http://cran.r-project.org/web/views/TimeSeries.html
This is cluttered and getting more and more cluttered. Some help on helping recent converts to R, especially in the field of corporate forecasting or time series for business analytics would really help.
Avril does an awesome job with this curiously named ( 😉 ) booklet at http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html
Here is an interview with Charlie Parker, head of large scale online algorithms at http://bigml.com
Ajay- Describe your own personal background in scientific computing, and how you came to be involved with machine learning, cloud computing and BigML.com
Charlie- I am a machine learning Ph.D. from Oregon State University. Francisco Martin (our founder and CEO), Adam Ashenfelter (the lead developer on the tree algorithm), and myself were all studying machine learning at OSU around the same time. We all went our separate ways after that.
Francisco started Strands and turned it into a 100+ million dollar company building recommender systems. Adam worked for CleverSet, a probabilistic modeling company that was eventually sold to Cisco, I believe. I worked for several years in the research labs at Eastman Kodak on data mining, text analysis, and computer vision.
When Francisco left Strands to start BigML, he brought in Justin Donaldson who is a brilliant visualization guy from Indiana, and an ex-Googler named Jose Ortega who is responsible for most of our data infrastructure. They pulled in Adam and I a few months later. We also have Poul Petersen, a former Strands employee, who manages our herd of servers. He is a wizard and makes everyone else’s life much easier.
Ajay- You use clojure for the back end of BigML.com .Are there any other languages and packages you are considering? What makes clojure such a good fit for cloud computing ?
Charlie- Clojure is a great language because it offers you all of the benefits of Java (extensive libraries, cross-platform compatibility, easy integration with things like Hadoop, etc.) but has the syntactical elegance of a functional language. This makes our code base small and easy to read as well as powerful.
We’ve had occasional issues with speed, but that just means writing the occasional function or library in Java. As we build towards processing data at the Terabyte level, we’re hoping to create a framework that is language-agnostic to some extent. So if we have some great machine learning code in C, for example, we’ll use Clojure to tie everything together, but the code that does the heavy lifting will still be in C. For the API and Web layers, we use Python and Django, and Justin is a huge fan of HaXe for our visualizations.
Ajay- Current support is for Decision Trees. When can we see SVM, K Means Clustering and Logit Regression?
Charlie- Right now we’re focused on perfecting our infrastructure and giving you new ways to put data in the system, but expect to see more algorithms appearing in the next few months. We want to make sure they are as beautiful and easy to use as the trees are. Without giving too much away, the first new thing we will probably introduce is an ensemble method of some sort (such as Boosting or Bagging). Clustering is a little further away but we’ll get there soon!
Ajay- How can we use the BigML.com API using R and Python.
Charlie- We have a public github repo for the language bindings. https://github.com/bigmlcom/io Right now, there there are only bash scripts but that should change very soon. The python bindings should be there in a matter of days, and the R bindings in probably a week or two. Clojure and Java bindings should follow shortly after that. We’ll have a blog post about it each time we release a new language binding. http://blog.bigml.com/
Ajay- How can we predict large numbers of observations using a Model that has been built and pruned (model scoring)?
Charlie- We are in the process of refactoring our backend right now for better support for batch prediction and model evaluation. This is something that is probably only a few weeks away. Keep your eye on our blog for updates!
Ajay- How can we export models built in BigML.com for scoring data locally.
Charlie- This is as simple as a call to our API. https://bigml.com/developers/models The call gives you a JSON object representing the tree that is roughly equivalent to a PMML-style representation.
About-
You can read about Charlie Parker at http://www.linkedin.com/pub/charles-parker/11/85b/4b5 and the rest of the BigML team at