As part of my research for “Python for R Users: A Data Science Approach” (Wiley 2016), I came across PyPy (http://pypy.org/). What is PyPy?
PyPy is a fast, compliant alternative implementation of the Python language (2.7.10 and 3.2.5). It has several advantages and distinct features:
Speed: thanks to its Just-in-Time compiler, Python programs often run faster on PyPy.
Memory usage: memory-hungry Python programs (several hundreds of MBs or more) might end up taking less space than they do in CPython.
Compatibility: PyPy is highly compatible with existing python code. It supports cffi and can run popular python libraries like twisted and django.
Stackless: PyPy comes by default with support for stackless mode, providing micro-threads for massive concurrency.
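The speed claim in the list above is easy to check on your own machine. The following is a minimal sketch, not an official benchmark: the function names `sum_of_squares` and `time_call` are made up for illustration, and the loop is just one example of the kind of pure-Python code a JIT compiler tends to help with. Run the same file once under CPython and once under PyPy and compare the timings.

```python
import platform
import time


def sum_of_squares(n):
    """Pure-Python hot loop (hypothetical workload chosen for illustration)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_call(func, *args):
    """Return (result, elapsed_seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    result, elapsed = time_call(sum_of_squares, 1000000)
    # platform.python_implementation() reports e.g. 'CPython' or 'PyPy'
    print("{0}: result={1}, {2:.4f}s".format(
        platform.python_implementation(), result, elapsed))
```

A single run like this is noisy, and a JIT needs some warm-up time, so treat the numbers as a rough indication only.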
Now R users might remember the debate around Renjin and pqR a few years ago. PyPy is an effort that has been around for some time, and it is currently at an interesting phase.
Here is an interview with Maciej Fijalkowski of PyPy.
Ajay Ohri- Why did you create PyPy? What need does it serve?
PyPy- I joined PyPy in 2006 or 2007, I don’t even remember, but it was about 2 years into the project’s existence. Shockingly enough, the very first idea was that there would be a Python-in-Python for educational purposes only. It later occurred to us that we could use the fact that PyPy is written in a high-level language and apply various transformations to it, including just-in-time compilation. Overall it was a very roundabout way, but we came to the conclusion that this is the right way to provide a high-performance Python virtual machine, after Armin’s experience writing Psyco, that likely only few people
Ajay Ohri- Describe the current state of PyPy, especially with regard to using NumPy. Can we use it for Pandas, matplotlib, seaborn, scikit-learn and statsmodels in the near future? What hinders your progress?
PyPy- We are right now in a state of flux. I’m almost inclined to say “talk to us in a few weeks/months”. I will describe the status right now as well as the possible near futures. Right now, we have a custom version of numpy that supports most of the existing numpy and can be used, although it does not pass all the tests. It has very fast array item access routines, so you can write your algorithms directly in Python without looking into custom solutions. However, it does not provide a C API and so does not support anything else from the numeric stack.
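To make “write your algorithms directly in Python” concrete, here is a small sketch of an algorithm written as an explicit element-by-element loop. It is an illustration only: it uses the standard library `array` module as a stand-in for a numpy array so that it runs anywhere, and the function name `moving_average` is an assumption, not something from PyPy or numpy.

```python
from array import array


def moving_average(values, window):
    """Simple moving average using plain indexed access in a Python loop."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    out = array("d")
    running = 0.0
    for i in range(len(values)):
        running += values[i]               # add the newest element
        if i >= window:
            running -= values[i - window]  # drop the element leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


data = array("d", [1.0, 2.0, 3.0, 4.0, 5.0])
print(list(moving_average(data, 2)))  # [1.5, 2.5, 3.5, 4.5]
```

On CPython, loops like this are usually the part people are told to vectorize or move to C; the point made in the interview is that fast item access under a JIT can make the straightforward version acceptable.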
We’re considering also supporting the original numpy with CPython C API, which will enable the whole numeric stack with some caveats. Currently, there are ongoing discussions and I can get back to you once this is resolved.
Our main problem is the CPython C API and the dependency of the entire numeric stack on that. It exposes a lot of CPython internals, like reference counting, the exact layout of lists and strings, etc. We have a layer that provides some sort of compatibility with that, but we need more work in order to make it more robust and faster. In the case of the C API, the main hindrance is funding. I wrote a blog post detailing the current situation: http://lostinjit.blogspot.co.za/2015/11/python-c-api-pypy-and-road-into-future.html We would love to support the entire numeric stack and we will look into ways that make it possible.
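Reference counting is a good example of an internal detail that leaks out. The sketch below shows it from the Python side using `sys.getrefcount`, which CPython provides. The helper name `refcount_delta` is hypothetical, and the guard reflects my assumption that other implementations may not offer this function or may not give it the same meaning.

```python
import sys


def refcount_delta():
    """Return how much an object's refcount grows when one more name
    is bound to it, or None if the interpreter has no sys.getrefcount."""
    getrefcount = getattr(sys, "getrefcount", None)
    if getrefcount is None:
        # Assumption: an interpreter without reference counts lands here.
        return None
    obj = []                  # fresh object, not shared with anything else
    before = getrefcount(obj)
    alias = obj               # bind one additional reference
    after = getrefcount(obj)
    return after - before


print(refcount_delta())
```

C extensions written against the CPython C API manipulate these counts directly, which is why an interpreter with a different memory management strategy needs a compatibility layer to run them.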
Ajay Ohri- A faster, more memory-efficient Python: will it be useful for analysis of large amounts of numeric data?
PyPy- Python owes much of its success to good integration with the C ecosystem. For years we’ve been told that no one needs a fast Python, because what is necessary to be fast is already in C and we can go away. That has proven to be blatantly false with projects like Apache Spark embedding Python as a way to do computations. There are also a lot of Python programmers and it’s a bit unfair to expect them to “write all the performance critical parts in C” or any of the other custom languages built around Python, like Cython. I personally think that there is a big place for a faster Python and we’re mostly fulfilling that role, except exactly for the case of integration with numeric libraries that is absolutely crucial for a lot of people. We need to improve that story if we are to fill in that gap completely and, while predicting the future is hard, we will do our best to support the numeric stack a lot better in the coming months.
Ajay Ohri- What are the day to day challenges you face while working on PyPy?
PyPy- That’s a tough question. There is no such thing in IT as “day to day challenges with technology” because if it’s really such a hindrance, you can usually automate it away. However, I don’t only do technical work these days; I deal a lot with people asking questions, looking at issues, trying to organize money for PyPy etc. This means that it’s very hard to pinpoint what a day-to-day activity is, let alone what its problems are.
The most frequently recurring challenges that we face are how to make sure there is funding for chronically underfunded open source projects and how to explain our unusual architecture to newcomers. We try hard to automate the technical issues away, so if a problem keeps recurring, we build more and more infrastructure to deal with it in a more systematic manner.
Ajay Ohri- You and your highly skilled team could probably make much more money per hour working for companies on consulting projects. Why devote time to open source coding tools? How can we get more people to donate or devote time?
PyPy- It is a very interesting question, probably exceeding the scope of this interview, but I will try to give it a go anyway. I think by now it’s pretty obvious that Open Source is just a better way to make software, at least as far as infrastructure goes. I can’t think of a single proprietary language platform that’s not tied to a specific architecture. Even Microsoft and .NET are moving slowly towards Open Source, with Apple owning so much of the platform that no one has a say there.
That means that locally, yes, we could very likely make far more money working for some corporations, but globally it’s pretty clear that both our impact and the value we bring is much higher than it would be working for a corporation looking for its short term gains.
Additionally, the problems we get to work on are much more interesting than the ones we would likely encounter in the corporate environment. Funding Open Source is a very tricky question here and I think we need to find answers to that.
Everyone uses Open Source software, directly or indirectly, and there is enough money made by companies profiting from using it to fund it. How to funnel this money is a problem that we’re trying to solve on a small scale, but it would be wonderful to see the solution on a bigger scale.
Ajay Ohri- How can we ensure automatic porting of algorithms between languages such as Java, Python and R, rather than manually creating packages? I mean, if we can have Google Translate for human languages, what can we do to enable automatic translation of code between computer languages?
PyPy- It would be very useful, but no one has managed to do it well; maybe that means something. However, it’s quite easy to translate between languages naively, without taking into account best practices, more efficient ways of achieving goals etc. There is a whole discussion to be had, but I don’t think I’m going to have much insight into this.
PyPy is a replacement for CPython. It is built using the RPython language that was co-developed with it. The main reason to use it instead of CPython is speed: it generally runs faster.
See more here: http://pypy.org/features.html