python – Page 2 – DECISION STATS

Polyglots for Data Science #python #sas #r #stats #spss #matlab #julia #octave

In the future, I think analysts will need to be polyglots: you will need to know more than one language for crunching data.

SAS, Python, R, Julia, SPSS, Matlab: pick any two 😉 or any three.

No, you can’t count C or Java as a statistical language 🙂 🙂

Efforts to promote polyglots in statistical software include:

1) R for SAS and SPSS Users (free or book)

SPSS and R reference
SAS/IML and R reference

JMP and R reference http://www.jmp.com/support/help/Working_with_R.shtml

2) R for Stata Users (book)

3) SAS and R (blog and book)

4) Using Python and R together

Accessing R from Python (Rpy2) http://www.bytemining.com/wp-content/uploads/2010/10/rpy2.pdf
Big Data with R and Python (though these have been made separately)

Python for Data Analysis by Wes McKinney is a book.

We probably need a Python and R for Data Analysis book, just as we have SAS and R books.

The RPy2 documentation is handy http://rpy.sourceforge.net/rpy2/doc-2.1/html/introduction.html
A nice tutorial, which was also the inspiration for writing this post, is here: http://files.meetup.com/1225993/Laurent%20Gautier_R_toPython_bridge_to_R.pdf#!

5) Matlab and R

Reference (http://mathesaurus.sourceforge.net/matlab-python-xref.pdf) includes Python

6) Octave and R

package http://cran.r-project.org/web/packages/RcppOctave/vignettes/RcppOctave.pdf includes Matlab

reference http://cran.r-project.org/doc/contrib/R-and-octave.txt

7) Julia and Python

Julia and IPython https://github.com/JuliaLang/IJulia.jl

PyPlot uses the Julia PyCall package to call Python’s matplotlib directly from Julia

8) SPSS and Python is here

9) SPSS and R is as below

The Essentials for R for Statistics versions 22, 21, 20, and 19 are available here.
This link will take you to the SourceForge site where the Version 18 Essentials and Plugins are hosted.
- Plugins for Version 18 for R

10) Using R from Clojure – Incanter

Use embedded R from Clojure and Incanter http://github.com/jolby/rincanter

NumFocus- The Python Statistical Community

I really liked the mature design and foundation of this charitable organization. While it is similar to FOAS in many ways (http://www.foastat.org/projects.html), I like the projects. They are excellent, and I think some of them should be featured in the Journal of Statistical Software (since there is a separate R Journal), unless it wants to be overtly R-focused.

In the same manner, I think some non-Python projects should try to reach out to NumFocus (if it does not want to be so PyFocus-ed).

Here is NumFocus:

NumFOCUS supports and promotes world-class, innovative, open source scientific software. Most individual projects, even the wildly successful ones, find the overhead of a non-profit to be too large for their community to bear. NumFOCUS provides a critical service as an umbrella organization which removes the burden from the projects themselves to raise money.

Money donated through NumFOCUS goes to sponsor things like:

Coding sprints (food and travel)
Technical fellowships (sponsored students and mentors to work on code)
Equipment grants (to developers and projects)
Conference attendance for students (to PyData, SciPy, and other conferences)
Fees for continuous integration and other software engineering tools
Documentation development
Web-page hosting and bandwidth fees for projects

Core Projects

NumPy

NumPy is the fundamental package needed for scientific computing with Python. Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. Repositories for NumPy binaries: http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy, a variety of versions – http://sourceforge.net/projects/numpy/files/NumPy/, version 1.6.1 – http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/.
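To make the "arbitrary data-types" point concrete, here is a minimal sketch of a NumPy structured array. The record layout (id, age, weight_kg) and the values are made up purely for illustration; only the NumPy calls themselves are real.

```python
import numpy as np

# Hypothetical record layout: the field names and values are illustrative only.
patient = np.dtype([("id", np.int32), ("age", np.int16), ("weight_kg", np.float64)])

records = np.array(
    [(1, 34, 70.5), (2, 51, 82.0), (3, 29, 64.25)],
    dtype=patient,
)

# Each field can be pulled out and used like an ordinary NumPy column.
mean_weight = records["weight_kg"].mean()

# Boolean masks work on whole records, much like filtering rows in a table.
over_30 = records[records["age"] > 30]["id"]
```

This is roughly why such arrays map well onto database rows: one array holds typed, named columns in a compact block of memory.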

SciPy

SciPy is open-source software for mathematics, science, and engineering. It is also the name of a very popular conference on scientific programming with Python. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization.

Matplotlib

2D plotting library for Python that produces high-quality figures that can be used in various hardcopy and interactive environments. matplotlib is compatible with Python scripts and the Python and IPython shells.

IPython

High-quality open source Python shell that includes tools for high-level and interactive parallel computing.

SymPy

SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and does not require any external libraries.

Other Projects

Cython

Cython is a language based on Pyrex that makes writing C extensions for Python as easy as writing them in Python itself. Cython supports calling C functions and declaring C types on variables and class attributes, allowing the compiler to generate very efficient C code from Cython code.

pandas

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

PyTables

PyTables is a package for managing hierarchical datasets, designed to cope efficiently and easily with extremely large amounts of data. PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features a Pythonic interface combined with C / Cython extensions for the performance-critical parts of the code. This makes it a fast, yet extremely easy to use tool for very large amounts of data. http://pytables.github.com/

scikit-image

Free, high-quality, peer-reviewed, volunteer-produced collection of algorithms for image processing.

scikit-learn

Module designed for scientific Python that provides accessible solutions to machine learning problems.

Scikits-Statsmodels

Statsmodels is a Python package that provides a complement to SciPy for statistical computations including descriptive statistics and estimation of statistical models.

Spyder

Interactive development environment for Python that features advanced editing, interactive testing, debugging and introspection capabilities, as well as a numerical computing environment made possible through the support of IPython, NumPy, SciPy, and matplotlib.

Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

Associated Projects

NumFOCUS is currently looking for representatives to enable us to promote the following projects. For information contact us at: info@NumFOCUS.org.

Sage

Open source mathematics software system that combines existing open-source packages into a Python-based interface.

NetworkX

NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

Python(X,Y)

Free scientific and engineering development software used for numerical computations, and analysis and visualization of data using the Python programming language.

How to help your government keep the world safe using statistics #rstats #python #sas

Big Data for Big Brother. Now playing. At a computer near you. How to help water the tree of liberty using statistics?

Use R

Use Python

or use SAS software

SAS/CIA, from the last paragraph of ET_CD_Mumbai_Jul12.pdf

Interview Jeff Allen Trestle Technology #rstats #rshiny

Here is an interview with Jeff Allen, who works with R and the new package Shiny in his technology startup. We featured his RGL demo in our list of Shiny demos.

Ajay- Describe how you started using R. What are some of the benefits you noticed on moving to R?

Jeff- I began using R in an internship while working on my undergraduate degree. I was provided with some unformatted R code and asked to modularize the code then wrap it up into an R package for distribution alongside a publication.

To be honest, as a Computer Science student with training more heavily emphasizing the big high-level languages, R took some getting used to for me. It wasn’t until after I concluded that initial project and began using R to do my own data analysis that I began to realize its potential and value. It was the first scripting language which really made interactive use appealing to me: the experience of exploring a dataset in R was unlike anything [...]

How to be a Happy Hacker

I write on and off about hackers (see http://bit.ly/VWxSvP) and have even written some poetry about them (http://bit.ly/11RznQl). I run into them at meetups, at conferences, and in online discussions; I have interviewed them, and I have trained some of them (in analytics). Based on a decade of observing hackers and two decades of hanging out with them, here are some thoughts on making you a better hacker, and a happier one, even if you are a hacker activist or a hacker in enterprise software.

1) Everybody can be a hacker, but you need to know the basic attitude first. Not every Python or Java coder is a hacker. Coding is not hacking. More details here: https://decisionstats.com/2012/02/12/how-to-learn-to-be-a-hacker-easily/

2) Use tools like Coursera, Udacity, and Codecademy to learn new languages. Even if you don’t have a natural gift for memorizing syntax, some of it helps. (I forget syntax quite often. I google.)

3) Learn tools like Metasploit if you want to learn the lucrative and romantic art of exploits hacking (http://www.offensive-security.com/metasploit-unleashed/Main_Page). The demand for information security is going to be huge. Hackers with jobs are happy hackers.

4) Develop a serious downtime hobby.

Let’s face it: your body was not designed to sit in front of a computer for 8 hours. But being a hacker will mean that commitment, and maybe more.

R Studio and Training

I really like the design, the course structure, and Hadley Wickham (in no particular order) as part of R Studio’s training suite, which may be new but is much better and more open. Again, I think Oracle’s training is awesome for online features, but somebody needs to step up and create a credible R certification here. More power to R 😉

Check it out:

http://www.rstudio.com/training/

BigML creates a marketplace for Predictive Models

BigML has created a marketplace for selling Datasets and Models. This is a first (?), as the closest market for Predictive Analytics until now was Rapid Miner’s marketplace for extensions (at http://rapidupdate.de:8180/UpdateServer/faces/index.xhtml).

From http://blog.bigml.com/2012/10/25/worlds-first-predictive-marketplace/

SELL YOUR DATA

You can make your Dataset public. Mind you: the Datasets we are talking about are BigML’s fancy histograms. This means that other BigML users can look at your Dataset details and create new models based on this Dataset. But they can not see individual records or columns or use it beyond the statistical summaries of the Dataset. Your Source will remain private, so there is no possibility of anyone accessing the raw data.
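The idea of sharing only a statistical summary while keeping the raw rows private can be sketched in a few lines of plain Python. This is not BigML’s actual format or API; the function name `summarize_column`, the equal-width binning, and the sample ages are all assumptions made for illustration.

```python
from collections import Counter

def summarize_column(values, n_bins=4):
    """Return (bin_edges, counts) for a numeric column, hiding individual records."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 if every value is identical.
    width = (hi - lo) / n_bins or 1.0
    counts = Counter()
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, [counts[i] for i in range(n_bins)]

# Hypothetical private column; only the edges and counts would be shared.
ages = [23, 25, 31, 38, 40, 47, 52, 63]
edges, counts = summarize_column(ages)
```

Someone who receives only `edges` and `counts` can see the shape of the distribution but cannot recover any single record, which is the spirit of the public Dataset described above.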

SELL YOUR MODEL

Now, once you have created a great model, you can share it with the rest of the world, for free or at any price you set. Predictions are paid for in BigML Prediction Credits. The minimum price is ‘Free’ and the maximum price indicated is 100 credits.

White Box Models

Clicking on the white open lock will open up your model to the rest of the world. Anyone can now buy your model, explore it, and use it to make predictions.

Black Box Models

If you choose the black box setting (the black open lock icon), other BigML users will NOT be able to view or clone your model, but they will be able to use it to make predictions.

——

DOWNLOAD YOUR MODEL

BigML.com has added downloads to its models. Simply choose the format you want and you can copy/paste the code or text. They currently offer a range of formats: JSON PML, PMML, Python, Ruby, Objective-C, Java, the rules of the decision tree in plain text, and a Summary overview of your model. Around the corner are MS Excel downloads and R (of course!).
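A decision tree exported as Python code can be little more than nested conditionals. The sketch below is hand-written to show the general shape, not actual BigML output; the function name `predict_churn`, the field names, and the thresholds are all hypothetical.

```python
def predict_churn(record):
    """Hand-written stand-in for an exported decision tree (all values hypothetical)."""
    # Each nested "if" corresponds to one split in the tree.
    if record.get("monthly_calls", 0) > 50:
        if record.get("support_tickets", 0) > 3:
            return "churn"
        return "stay"
    # Low-usage customers are split on contract length instead.
    if record.get("contract_months", 0) < 6:
        return "churn"
    return "stay"

# The plain-text rules of the same tree would read: IF monthly_calls > 50 AND
# support_tickets > 3 THEN churn, and so on down each branch.
prediction = predict_churn({"monthly_calls": 80, "support_tickets": 5})
```

Because such a function has no dependencies, it can be pasted into any script and run offline, which is what makes a code download useful.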

PUBLICIZE YOUR MODEL

There’s also an ‘embed’ function, so now you can embed the little poster of your model in your blog post or website, so it is easy to share it in your own environment.

————————————————————————————————————————–

It is nice to see Models and Data getting the APPY treatment, and hopefully it will encourage other vendors like Google Prediction API to spend further thought and effort on rewarding data mining individuals directly, without going through corporate intermediaries, while ensuring intellectual property safeguards.

An R package market for enterprises? For Python libraries? JMP add-ins? A market for SAS macros? Who knows what the future holds. But overall, this is a very positive step by the BigML.com team. The App marketplace has helped revolutionize mobile and desktop computing, and hopefully it will do the same for Business Analytics.
