Home » Posts tagged 'python' (Page 2)
Tag Archives: python
Python with Friends
Wanted to learn Python? Stuck on a desk with no redemption. You have two very lucid options. One is use Google. I mean not the search engine, but their class on learning Python.
The videos are available on Youtube at http://www.youtube.com/user/GoogleDevelopers (starting at http://www.youtube.com/watch?v=tKTZoB2Vjuk&feature=plcp)
http://code.google.com/edu/languages/google-python-class/
The other is new module of Python at code academy. It is truly awesome even if you dont know any programming!
So learn some awesome python today and be an excellent hacker tommorow!
JSS launches special edition for GUI for #Rstats
I love GUIs (graphical user interfaces)- they might be TCL/TK based or GTK based or even QT based. As a researcher they help me with faster coding, as a consultant they help with faster transition of projects from startup to handover stage and as an R instructor helps me get people to learn R faster.
I wish Python had some GUIs though
from the open access journal of statistical software-
JSS Special Volume 49: Graphical User Interfaces for R
Pedro M. Valero-Mora, Ruben Ledesma
Vol. 49, Issue 1, Jun 2012
Submitted 2012-06-03, Accepted 2012-06-03
Ya-Shan Cheng, Chien-Yu Peng
Vol. 49, Issue 2, Jun 2012
Submitted 2010-12-31, Accepted 2011-06-29
Joris J. Snellenburg, Sergey Laptenok, Ralf Seger, Katharine M. Mullen, Ivo H. M. van Stokkum
Vol. 49, Issue 3, Jun 2012
Submitted 2011-01-20, Accepted 2011-09-16
Marcel Austenfeld, Wolfram Beyschlag
Vol. 49, Issue 4, Jun 2012
Submitted 2011-01-05, Accepted 2012-02-20
Byron C. Wallace, Issa J. Dahabreh, Thomas A. Trikalinos, Joseph Lau, Paul Trow, Christopher H. Schmid
Vol. 49, Issue 5, Jun 2012
Submitted 2010-11-01, Accepted 2012-12-20
Bei Huang, Dianne Cook, Hadley Wickham
Vol. 49, Issue 6, Jun 2012
Submitted 2011-01-20, Accepted 2012-04-16
John Fox, Marilia S. Carvalho
Vol. 49, Issue 7, Jun 2012
Submitted 2010-12-26, Accepted 2011-12-28
Ian Fellows
Vol. 49, Issue 8, Jun 2012
Submitted 2011-02-28, Accepted 2011-09-08
Stefan Rödiger, Thomas Friedrichsmeier, Prasenjit Kapat, Meik Michalke
Vol. 49, Issue 9, Jun 2012
Submitted 2010-12-28, Accepted 2011-05-06
John Verzani
Vol. 49, Issue 10, Jun 2012
Submitted 2010-12-17, Accepted 2011-05-11
Antony Unwin
Vol. 49, Issue 11, Jun 2012
Submitted 2010-12-08, Accepted 2011-07-15
Interview Alvaro Tejada Galindo, SAP Labs Montreal, Using SAP Hana with #Rstats
Here is a brief interview with Alvaro Tejada Galindo aka Blag who is a developer working with SAP Hana and R at SAP Labs, Montreal. SAP Hana is SAP’s latest offering in BI , it’s also a database and a computing environment , and using R and HANA together on the cloud can give major productivity gains in terms of both speed and analytical ability, as per preliminary use cases.
Ajay- What made the R language a fit for SAP HANA. Did you consider other languages? What is your view on Julia/Python/SPSS/SAS/Matlab languages
Blag- I think “R” is a must for SAP HANA. As the fastest database in the market, we needed a language that could help us shape the data in the best possible way. “R” filled that purpose very well. Right now, “R” is not the only language as “L” can be used as well (http://wiki.tcl.tk/17068) …not forgetting “SQLScript” which is our own version of SQL (http://goo.gl/x3bwh) . I have to admit that I tried Julia, but couldn’t manage to make it work. Regarding Python, it’s an interesting question as I’m going to blog about Python and SAP HANA soon. About Matlab, SPSS and SAS I haven’t used them, so I got nothing to say there.
Ajay- What is your view on some of the limitations of R that can be overcome with using it with SAP HANA.
Blag- I think mostly the ability of SAP HANA to work with big data. Again, SAP HANA and “R” can work very nicely together and achieve things that weren’t possible before.
Ajay- Have you considered other vendors of R including working with RStudio, Revolution Analytics, and even Oracle R Enterprise.
Blag- I’m not really part of the SAP HANA or the R groups inside SAP, so I can’t really comment on that. I can only say that I use RStudio every time I need to do something with R. Regarding Oracle…I don’t think so…but they can use any of our products whenever they want.
Ajay- Do you have a case study on an actual usage of R with SAP HANA that led to great results.
Blag- Right now the use of “R” and SAP HANA is very preliminary, I don’t think many people has start working on it…but as an example that it works, you can check this awesome blog entry from my friend Jitender Aswani “Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps “ (http://allthingsr.blogspot.com/#!/2012/04/big-data-r-and-hana-analyze-200-million.html)
Ajay- Does your group in SAP plan to give to the R ecosystem by attending conferences like UseR 2012, sponsoring meets, or package development etc
Blag- My group is in charge of everything developers, so sure, we’re planning to get more in touch with R developers and their ecosystem. Not sure how we’re going to deal with it, but at least I’m going to get myself involved in the Montreal R Group.
About-
http://scn.sap.com/people/alvaro.tejadagalindo3
| Name: | Alvaro Tejada Galindo |
| Email: | a.tejada.galindo@sap.com |
| Profession: | Development |
| Company: | SAP Canada Labs-Montreal |
| Town/City: | Montreal |
| Country: | Canada |
| Instant Messaging Type: | |
| Instant Messaging ID: | Blag |
| Personal URL: | http://blagrants.blogspot.com |
| Professional Blog URL: | http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/u/252210910 |
| My Relation to SAP: | employee |
| Short Bio: | Development Expert for the Technology Innovation and Developer Experience team.Used to be an ABAP Consultant for the last 11 years. Addicted to programming since 1997. |
http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx
and from
http://en.wikipedia.org/wiki/SAP_HANA
SAP HANA is SAP AG’s implementation of in-memory database technology. There are four components within the software group:[1]
- SAP HANA DB (or HANA DB) refers to the database technology itself,
- SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
- SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as anappliance. It also includes the modeling tools from HANA Studio as well replication and data transformation tools to move data into HANA DB,[2]
- SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).
R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation.[20] HANA developers can write R scripts in SQL and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in the SQLScript. R environments need to be deployed to use R within SQLScript
More blog posts on using SAP and R together
Dealing with R and HANAhttp://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
R meets HANA
http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/29/r-meets-hana
HANA meets R
http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/26/hana-meets-r
When SAP HANA met R – First kiss
http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r–first-kiss
Using RODBC with SAP HANA DB-
SAP HANA: My experiences on using SAP HANA with R
and of course the blog that started it all-
Jitender Aswani’s http://allthingsr.blogspot.in/
BigML meets R #rstats
I am just checking the nice new R package created by BigML.com co-founder Justin Donaldson. The name of the new package is bigml, which can confuse a bit since there do exist many big suffix named packages in R (including biglm)
The bigml package is available at CRAN http://cran.r-project.org/web/packages/bigml/index.html
I just tweaked the code given at http://blog.bigml.com/2012/05/10/r-you-ready-for-bigml/ to include the ssl authentication code at http://www.brocktibert.com/blog/2012/01/19/358/
so it goes
> library(bigml)
Loading required package: RJSONIO
Loading required package: RCurl
Loading required package: bitops
Loading required package: plyr
> setCredentials(“bigml_username”,”API_key”)
# download the file needed for authentication
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
# set the curl options
curl <- getCurlHandle()
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem",
package = "RCurl"),
ssl.verifypeer = FALSE))
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)
> iris.model = quickModel(iris, objective_field = ‘Species’)
Of course there are lots of goodies added here , so read the post yourself at http://blog.bigml.com/2012/05/10/r-you-ready-for-bigml/
Incidentally , the author of this R package (bigml) Justin Donalsdon who goes by name sudojudo at http://twitter.com/#!/sudojudo has also recently authored two other R packages including tsne at http://cran.r-project.org/web/packages/tsne/index.html (tsne: T-distributed Stochastic Neighbor Embedding for R (t-SNE) -A “pure R” implementation of the t-SNE algorithm) and a GUI toolbar http://cran.r-project.org/web/packages/sculpt3d/index.html (sculpt3d is a GTK+ toolbar that allows for more interactive control of a dataset inside the RGL plot window. Controls for simple brushing, highlighting, labeling, and mouseMode changes are provided by point-and-click rather than through the R terminal interface)
This along with the fact the their recently released python bindings for bigml.com was one of the top news at Hacker News- shows bigML.com is going for some traction in bringing cloud computing, better software interfaces and data mining together!
Interview BigML.com
Here is an interview with Charlie Parker, head of large scale online algorithms at http://bigml.com
Ajay- Describe your own personal background in scientific computing, and how you came to be involved with machine learning, cloud computing and BigML.com
Charlie- I am a machine learning Ph.D. from Oregon State University. Francisco Martin (our founder and CEO), Adam Ashenfelter (the lead developer on the tree algorithm), and myself were all studying machine learning at OSU around the same time. We all went our separate ways after that.
Francisco started Strands and turned it into a 100+ million dollar company building recommender systems. Adam worked for CleverSet, a probabilistic modeling company that was eventually sold to Cisco, I believe. I worked for several years in the research labs at Eastman Kodak on data mining, text analysis, and computer vision.
When Francisco left Strands to start BigML, he brought in Justin Donaldson who is a brilliant visualization guy from Indiana, and an ex-Googler named Jose Ortega who is responsible for most of our data infrastructure. They pulled in Adam and I a few months later. We also have Poul Petersen, a former Strands employee, who manages our herd of servers. He is a wizard and makes everyone else’s life much easier.
Ajay- You use clojure for the back end of BigML.com .Are there any other languages and packages you are considering? What makes clojure such a good fit for cloud computing ?
Charlie- Clojure is a great language because it offers you all of the benefits of Java (extensive libraries, cross-platform compatibility, easy integration with things like Hadoop, etc.) but has the syntactical elegance of a functional language. This makes our code base small and easy to read as well as powerful.
We’ve had occasional issues with speed, but that just means writing the occasional function or library in Java. As we build towards processing data at the Terabyte level, we’re hoping to create a framework that is language-agnostic to some extent. So if we have some great machine learning code in C, for example, we’ll use Clojure to tie everything together, but the code that does the heavy lifting will still be in C. For the API and Web layers, we use Python and Django, and Justin is a huge fan of HaXe for our visualizations.
Ajay- Current support is for Decision Trees. When can we see SVM, K Means Clustering and Logit Regression?
Charlie- Right now we’re focused on perfecting our infrastructure and giving you new ways to put data in the system, but expect to see more algorithms appearing in the next few months. We want to make sure they are as beautiful and easy to use as the trees are. Without giving too much away, the first new thing we will probably introduce is an ensemble method of some sort (such as Boosting or Bagging). Clustering is a little further away but we’ll get there soon!
Ajay- How can we use the BigML.com API using R and Python.
Charlie- We have a public github repo for the language bindings. https://github.com/bigmlcom/io Right now, there there are only bash scripts but that should change very soon. The python bindings should be there in a matter of days, and the R bindings in probably a week or two. Clojure and Java bindings should follow shortly after that. We’ll have a blog post about it each time we release a new language binding. http://blog.bigml.com/
Ajay- How can we predict large numbers of observations using a Model that has been built and pruned (model scoring)?
Charlie- We are in the process of refactoring our backend right now for better support for batch prediction and model evaluation. This is something that is probably only a few weeks away. Keep your eye on our blog for updates!
Ajay- How can we export models built in BigML.com for scoring data locally.
Charlie- This is as simple as a call to our API. https://bigml.com/developers/models The call gives you a JSON object representing the tree that is roughly equivalent to a PMML-style representation.
About-
You can read about Charlie Parker at http://www.linkedin.com/pub/charles-parker/11/85b/4b5 and the rest of the BigML team at
Interview Kelci Miclaus, SAS Institute Using #rstats with JMP
Here is an interview with Kelci Miclaus, a researcher working with the JMP division of the SAS Institute, in which she demonstrates examples of how the R programming language is a great hit with JMP customers who like to be flexible.
Ajay- How has JMP been using integration with R? What has been the feedback from customers so far? Is there a single case study you can point out where the combination of JMP and R was better than any one of them alone?
Kelci- Feedback from customers has been very positive. Some customers are using JMP to foster collaboration between SAS and R modelers within their organizations. Many are using JMP’s interactive visualization to complement their use of R. Many SAS and JMP users are using JMP’s integration with R to experiment with more bleeding-edge methods not yet available in commercial software. It can be used simply to smooth the transition with regard to sending data between the two tools, or used to build complete custom applications that take advantage of both JMP and R.
One customer has been using JMP and R together for Bayesian analysis. He uses R to create MCMC chains and has found that JMP is a great tool for preparing the data for analysis, as well as displaying the results of the MCMC simulation. For example, the Control Chart platform and the Bubble Plot platform in JMP can be used to quickly verify convergence of the algorithm. The use of both tools together can increase productivity since the results of an analysis can be achieved faster than through scripting and static graphics alone.
I, along with a few other JMP developers, have written applications that use JMP scripting to call out to R packages and perform analyses like multidimensional scaling, bootstrapping, support vector machines, and modern variable selection methods. These really show the benefit of interactive visual analysis of coupled with modern statistical algorithms. We’ve packaged these scripts as JMP add-ins and made them freely available on our JMP User Community file exchange. Customers can download them and now employ these methods as they would a regular JMP platform. We hope that our customers familiar with scripting will also begin to contribute their own add-ins so a wider audience can take advantage of these new tools.
(see http://www.decisionstats.com/jmp-and-r-rstats/)
Ajay- Are there plans to extend JMP integration with other languages like Python?
Kelci- We do have plans to integrate with other languages and are considering integrating with more based on customer requests. Python has certainly come up and we are looking into possibilities there.
Ajay- How is R a complimentary fit to JMP’s technical capabilities?
Kelci- R has an incredible breadth of capabilities. JMP has extensive interactive, dynamic visualization intrinsic to its largely visual analysis paradigm, in addition to a strong core of statistical platforms. Since our brains are designed to visually process pictures and animated graphs more efficiently than numbers and text, this environment is all about supporting faster discovery. Of course, JMP also has a scripting language (JSL) allowing you to incorporate SAS code, R code, build analytical applications for others to leverage SAS, R and other applications for users who don’t code or who don’t want to code.
JSL is a powerful scripting language on its own. It can be used for dialog creation, automation of JMP statistical platforms, and custom graphic scripting. In other ways, JSL is very similar to the R language. It can also be used for data and matrix manipulation and to create new analysis functions. With the scripting capabilities of JMP, you can create custom applications that provide both a user interface and an interactive visual back-end to R functionality. Alternatively, you could create a dashboard using statistical and/or graphical platforms in JMP to explore the data and with the click of a button, send a portion of the data to R for further analysis.
Another JMP feature that complements R is the add-in architecture, which is similar to how R packages work. If you’ve written a cool script or analysis workflow, you can package it into a JMP add-in file and send it to your colleagues so they can easily use it.
Ajay- What is the official view on R from your organization? Do you think it is a threat, or a complimentary product or another statistical platform that coexists with your offerings?
Kelci- Most definitely, we view R as complimentary. R contributors are providing a tremendous service to practitioners, allowing them to try a wide variety of methods in the pursuit of more insight and better results. The R community as a whole is providing a valued role to the greater analytical community by focusing attention on newer methods that hold the most promise in so many application areas. Data analysts should be encouraged to use the tools available to them in order to drive discovery and JMP can help with that by providing an analytic hub that supports both SAS and R integration.
Ajay- While you do use R, are there any plans to give back something to the R community in terms of your involvement and participation (say at useR events) or sponsoring contests.
Kelci- We are certainly open to participating in useR groups. At Predictive Analytics World in NY last October, they didn’t have a local useR group, but they did have a Predictive Analytics Meet-up group comprised of many R users. We were happy to sponsor this. Some of us within the JMP division have joined local R user groups, myself included. Given that some local R user groups have entertained topics like Excel and R, Python and R, databases and R, we would be happy to participate more fully here. I also hope to attend the useR! annual meeting later this year to gain more insight on how we can continue to provide tools to help both the JMP and R communities with their work.
We are also exploring options to sponsor contests and would invite participants to use their favorite tools, languages, etc. in pursuit of the best model. Statistics is about learning from data and this is how we make the world a better place.
About- Kelci Miclaus
Kelci is a research statistician developer for JMP Life Sciences at SAS Institute. She has a PhD in Statistics from North Carolina State University and has been using SAS products and R for several years. In addition to research interests in statistical genetics, clinical trials analysis, and multivariate analysis/visualization methods, Kelci works extensively with JMP, SAS, and R integration.
.
PMML Augustus
Here is a new-old system in open source for
for building and scoring statistical models designed to work with data sets that are too large to fit into memory.
http://code.google.com/p/augustus/
Augustus is an open source software toolkit for building and scoring statistical models. It is written in Python and its
most distinctive features are:
• Ability to be used on sets of big data; these are data sets that exceed either memory capacity or disk capacity, so
that existing solutions like R or SAS cannot be used. Augustus is also perfectly capable of handling problems
that can fit on one computer.
• PMML compliance and the ability to both:
– produce models with PMML-compliant formats (saved with extension .pmml).
– consume models from files with the PMML format.
Augustus has been tested and deployed on serveral operating systems. It is intended for developers who work in the
financial or insurance industry, information technology, or in the science and research communities.
Usage
Augustus produces and consumes Baseline, Cluster, Tree, and Ruleset models. Currently, it uses an event-based
approach to building Tree, Cluster and Ruleset models that is non-standard.
New to PMML ?
Read on http://code.google.com/p/augustus/wiki/PMML
The Predictive Model Markup Language or PMML is a vendor driven XML markup language for specifying statistical and data mining models. In other words, it is an XML language so that (more…)






