Interview: Jan de Leeuw, Founder of JSS

Here is an interview with one of the greatest statisticians and educators of this generation, Prof Jan de Leeuw. In this exclusive and freewheeling interview, Prof De Leeuw talks about the evolution of technology, education, and statistics, and generously shares nuggets of knowledge of interest to present and future statisticians.

DecisionStats (DS)- You have described the UCLA Dept of Statistics as your magnum opus. Name a couple of turning points in your career which helped in its creation.

Jan de Leeuw (JDL)- From about 1980 until 1987 I was head of the Department of Data Theory at Leiden University. Our work there produced a large number of dissertations which we published using our own publishing company. I also became president of the Psychometric Society in 1987. These developments resulted in an invitation from UCLA to apply for the position of director of the interdepartmental program in social statistics, with a joint appointment in Mathematics and Psychology. I accepted the offer in 1987. The program eventually morphed into the new Department of Statistics in 1998.

DS- Describe your work with the Gifi software and nonlinear multivariate analysis.

JDL- I started working on NLMVA and MDS in 1968, while I was a graduate student researcher in the new Department of Data Theory. Joe Kruskal and Doug Carroll invited me to spend a year at Bell Labs in Murray Hill in 1973-1974. At that time I also started working with Forrest Young and his student Yoshio Takane. This led to the sequence of “alternating least squares” papers, mainly in Psychometrika. After I returned to Leiden we set up a group of young researchers, supported by NWO (the Dutch equivalent of the NSF) and by SPSS, to develop a series of Fortran programs for NLMVA and MDS.
In 1980 the group had grown to about 10-15 people, and we gave a successful postgraduate course on the “Gifi methods”, which eventually became the 1990 Gifi book. By the time I left Leiden most people in the group had gone on to do other things, although I continued to work in the area with some graduate students from Leiden and Los Angeles. Then around 2010 I worked with Patrick Mair, visiting scholar at UCLA, to produce the R packages smacof, anacor, homals, aspect, and isotone. Also see https://www.youtube.com/watch?v=u733Mf7jX24

DS- You have presided over almost 5 decades of changes in statistics. Can you describe the effect of changes in computing and statistical languages over the years, and what you have learned from these changes?

JDL- I started in 1968 with PL/I. Card decks had to be flown to Paris to be compiled and executed on the IBM/360 mainframes. Around the same time APL came up and satisfied my personal development needs, although of course APL code was difficult to communicate. It was even difficult to understand your own code after a week. We had APL symbol balls on the Selectric typewriters and APL symbols on the character terminals. The basic model was there: you develop in an interpreted language (APL) and then for production you use a compiled language (FORTRAN). Over the years APL was replaced by XLISP and then by R. Fortran was largely replaced by C; I never switched to C++ or Java. We discouraged our students from using SAS or SPSS or MATLAB. UCLA Statistics promoted XLISP-STAT for quite a long time, but eventually we had to give it up. See http://www.stat.ucla.edu/~deleeuw/janspubs/2005/articles/deleeuw_A_05.pdf.

(In 1998 the UCLA Department of Statistics, which had been one of the major users of Lisp-Stat, and one of the main producers of Lisp-Stat code, decided to switch to S/R. This paper discusses why this decision was made, and what the pros and the cons were.)

Of course the WWW came up in the early nineties and we used a lot of CGI and PHP to write instructional software for browsers.

Generally, there has never been a computational environment like R: so integrated with statistical practice and development, and so enormous, accessible and democratic. I must admit I personally still prefer to use R as originally intended: as a convenient wrapper around and command line interface for compiled libraries and code. But it is also great for rapid prototyping, and in that role it has changed the face of statistics.
The fact that you cannot really propose statistical computations without providing R references and in many cases R code has contributed a great deal to reproducibility and open access.

DS- Do Big Data and Cloud Computing, in the era of the data deluge, require a new focus on creativity in statistics, or just better application in industry of statistical computing over naive models?

JDL- I am not really active in Big Data and Cloud Computing, mostly because I am more of a developer than a data analyst. That is of course a luxury.

The data deluge has been there for a long time (sensors in the road surface, satellites, weather stations, air pollution monitors, EEGs, MRIs) but until fairly recently there were no tools, in either hardware or software, to attack these data sets. Of course big data sets have changed the face of statistics once again, because in the context of big data the emphasis on optimality and precise models becomes laughable. What I see in the area is a lot of computer science, a lot of fads, a lot of ad-hoc work, and not much of a general rational approach. That may be unavoidable.

DS- What is your biggest failure in professional life?

JDL- I decided in 1963 to major in psychology, mainly because I wanted to discover big truths about myself. About a year later I discovered that psychology and philosophy do not produce big truths, and that my self was not a very interesting object of study anyway. I switched to physics for a while, and minored in math, but by that time I already had a research assistant job, was developing software, and was not interested any more in going to lectures and doing experiments. In a sense I dropped out. It worked out fairly well, but it sometimes gives rise to imposter syndrome.

DS- If you had to do it all over again, what are the things you would really look forward to doing?

JDL- I really don’t know how to answer this. A life cannot be corrected, repeated, or relived.

DS- What motivated you to start the Journal of Statistical Software and push for open access?

JDL- That’s basically in the useR! 2014 presentation. See http://gifi.stat.ucla.edu/janspubs/2014/notes/deleeuw_mullen_U_14.pdf

DS- How can we make departments of Statistics and departments of Computer Science work closely together on a better, industry-relevant syllabus, especially in data mining, business analytics and statistical computing?

JDL- That’s hard. The cultures are very different: CS is so much more aggressive and self-assured, as well as having more powerful tools and better trained students. We have tried joint appointments but they do not work very well. There are some interdisciplinary programs but generally CS dominates and provides the basic keywords such as neural nets, machine learning, data mining, cloud computing and so on. One problem is that in many universities statistics is the department that teaches the introductory statistics courses, and not much more. Statistics is forever struggling to define itself, to fight silly battles about foundations, and to try to control the centrifugal forces that push statistical work outside statistics departments.

DS- What are some of the work habits that have helped you be more productive in your writing and research?

JDL- Well, if one is not brilliant, one has to be a workaholic. It’s hard on the family. I decided around 1975 that my main calling was to gather and organize groups of researchers with varying skills and interests, and not to publish as much as possible. That helped.

About–http://en.wikipedia.org/wiki/Jan_de_Leeuw

Jan de Leeuw (born December 19, 1945) is a Dutch statistician, and Distinguished Professor of Statistics and Founding Chair of the Department of Statistics, University of California, Los Angeles. He is known as the founding editor and editor-in-chief of the Journal of Statistical Software, as well as editor-in-chief of the Journal of Multivariate Analysis.