Data Visualization: Central Banks

This post compares the transparency of two very different central banks by looking at the data visualizations they publish.

One is the Reserve Bank of India and the other is the Federal Reserve Bank of New York.

Here are some points:

1) The New York Fed gives you a huge clutter of charts to choose from, and some of those charts are very difficult to understand.

see http://www.newyorkfed.org/research/global_economy/usecon_charts.html

and http://www.newyorkfed.org/research/directors_charts/us18chart.pdf

2) The Reserve Bank of India chose Business Objects and gives you proper drill-down graphs and tables. (That's a lot of heavy metal and iron ore China needs from India 😉)

Foreign Trade – Export      Time-line: ALL

TIME LINE          COUNTRY  COMMODITY              AMOUNT (US $ MILLION)  EXPORT QUANTITY
2010:07 (JUL) – P  China    IRON ORE (Units: TON)  205.06                 1878456
2010:06 (JUN) – P  China    IRON ORE (Units: TON)  427.68                 6808528
2010:05 (MAY) – P  China    IRON ORE (Units: TON)  550.67                 5290450
2010:04 (APR) – P  China    IRON ORE (Units: TON)  922.46                 9931500
2010:03 (MAR) – P  China    IRON ORE (Units: TON)  829.75                 13177672
2010:02 (FEB) – P  China    IRON ORE (Units: TON)  706.04                 10141259
2010:01 (JAN) – P  China    IRON ORE (Units: TON)  577.13                 8498784
2009:12 (DEC) – P  China    IRON ORE (Units: TON)  545.68                 9264544
2009:11 (NOV) – P  China    IRON ORE (Units: TON)  508.17                 9509213
2009:10 (OCT) – P  China    IRON ORE (Units: TON)  422.6                  7691652
2009:09 (SEP) – P  China    IRON ORE (Units: TON)  278.04                 4577943
2009:08 (AUG) – P  China    IRON ORE (Units: TON)  276.96                 4371847
2009:07 (JUL)      China    IRON ORE (Units: TON)  266.11                 4642237
2009:06 (JUN)      China    IRON ORE (Units: TON)  241.08                 4584354

Source : DGCI & S, Ministry of Commerce & Industry, GoI
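
As a rough sanity check on the table above, the implied average export price per ton can be worked out by dividing the amount by the quantity. The sketch below uses three rows copied from the table. The function name `price_per_ton` is my own, and it assumes the amounts are in millions of US dollars and the quantities in tons, as the column headers state.

```python
# Rows copied from the export table above:
# (period, amount in US $ million, export quantity in tons)
rows = [
    ("2010:07", 205.06, 1878456),
    ("2010:04", 922.46, 9931500),
    ("2009:06", 241.08, 4584354),
]


def price_per_ton(amount_musd, quantity_tons):
    """Implied average price in US dollars per ton (hypothetical helper)."""
    return amount_musd * 1_000_000 / quantity_tons


for period, amount, quantity in rows:
    print(f"{period}: about ${price_per_ton(amount, quantity):.2f} per ton")
```

Read this way, the table shows value and volume moving separately: the implied price for July 2010 is roughly double that of June 2009, even though the tonnage is much lower.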

 

You can see screenshots of the various visualization tools of the New York Fed and the Reserve Bank of India. If the US Fed is serious about cutting the debt, maybe it should start publishing better visuals.

Mapping Health Statistics at CDC.gov

CDC.gov has a great tool for showing United States statistics on death and injury, drillable by various details.

The tool is hosted at http://wisqars.cdc.gov:8080/cdcMapFramework/

As a test I decided to map out injuries due to firearms, and to compare firearm deaths of white people versus the whole population (see the firearm deaths file).

White people are more likely than black people to own guns (also read http://www.ncbi.nlm.nih.gov/pubmed/9572612 ), but statistically they seem less likely to be injured by firearms, so support for gun control laws could differ along racial lines. That was my null hypothesis. No politics, just plain statistics. I don't know. Why don't you look at the data and decide?

Interview Luis Torgo Author Data Mining with R

Here is an interview with Prof Luis Torgo, author of the recent best seller “Data Mining with R-learning with case studies”.

Ajay- Describe your career in science. How do you think more young people can be made interested in science?

Luis- My interest in science only started after I finished my degree. I entered a research lab at the University of Porto and started working on Machine Learning around 1990. Since then I’ve been involved in data analysis topics generally, both from a research perspective and from a more applied point of view, through interactions with industry partners on several projects. I’ve spent most of my career at the Faculty of Economics of the University of Porto, but since 2008 I’ve been at the Department of Computer Science of the Faculty of Sciences of the same university. At the same time I’ve been a researcher at LIAAD / Inesc Porto LA (www.liaad.up.pt).

I like what I do a lot, and I like science and the “scientific way of thinking”, but I cannot say that I’ve always thought of this area as my “place”. Most of all I like solving challenging problems through data analysis. If that translates into some scientific outcome then I’m more satisfied, but that is not my main goal, though I’m kind of “forced” to think about it because of the constraints of an academic career.

That does not mean I’m not passionate about science, I just think there are many more ways of “doing science” than what is reflected in the usual “scientific indicators” that most institutions seem to be more and more obsessed about.

As regards interesting young people in science, that is a hard question that I’m not sure I’m qualified to answer. I do tend to think that young people are more sensitive to concrete examples of problems that they find interesting and that science helps to solve, as a way of finding the motivation to face the hard work they will encounter in a scientific career. I do believe in case studies as a nice way to learn and motivate, and thus my book 😉

Ajay- Describe your new book “Data Mining with R, learning with case studies”. Why did you choose a case study based approach? Who is the target audience? What is your favorite case study from the book?

Luis- This book is about learning how to use R for data mining. The book follows a “learn by doing it” approach to data mining instead of the more common theoretical description of the available techniques in this discipline. This is accomplished by presenting a series of illustrative case studies for which all necessary steps, code and data are provided to the reader. Moreover, the book has an associated web page (www.liaad.up.pt/~ltorgo/DataMiningWithR) where all code inside the book is given so that easy copy-paste is possible for the more lazy readers.

The language used in the book is very informal, without many theoretical details on the data mining techniques used. For these theoretical insights there are already many good data mining books, some of which are referred to in the “further readings” sections given throughout the book. The decision to follow this writing style had to do with the intended target audience of the book.

In effect, the objective was to write a monograph that could be used as a supplemental book for practical classes on data mining that exist in several courses, but at the same time that could be attractive to professionals working on data mining in non-academic environments, and thus the choice of this more practically oriented approach.

As regards my favorite case study, that is a hard question for an author… still, I would probably choose the “Predicting Stock Market Returns” case study (Chapter 3). Not only because I like this challenging problem, but mainly because the case study addresses all aspects of knowledge discovery in a real world scenario and not only the construction of predictive models. It tackles data collection, data pre-processing, model construction, transforming predictions into actions using different trading policies, using business-related performance metrics, implementing a trading simulator for “real-world” evaluation, and laying out grounds for constructing an online trading system.

Obviously, for all these steps there are far too many options to describe or evaluate all of them in one chapter. Still, I do believe that it is important for the reader to see the overall picture, and to read about the relevant questions on this problem and some possible paths that can be followed at these different steps.

In other words: do not expect to become rich with the solution I describe in the chapter!

Ajay- Apart from R, what other data mining software do you use or have used in the past? How would you compare their advantages and disadvantages with R?

Luis- I’ve played around with Clementine, Weka, RapidMiner and Knime, but really only for teaching purposes, with no serious use or evaluation in the context of data mining projects. For the latter I mainly use R or software developed by myself (either in R or other languages). In this context, I do not think it is fair to compare R with these or other tools, as I lack serious experience with them. I can, however, tell you what I see as the main pros and cons of R. The main reason for using R is not only the power of the tool, which never stops surprising me in terms of what already exists and keeps appearing as contributions from an ever growing community, but mainly the ability to rapidly transform ideas into prototypes. As for its drawbacks, I would probably mention the lack of efficiency when compared to other alternatives and the problem of data set sizes being limited by main memory.

I know that there are several efforts around to solve this latter issue, not only from the community (e.g. http://cran.at.r-project.org/web/views/HighPerformanceComputing.html) but also from industry (e.g. Revolution Analytics), but I would prefer that at this stage this were a standard feature of the language, so that the “normal” user need not worry about it. But then this is a community effort, and if I’m not happy with the current status, instead of complaining I should do something about it!

Ajay- Describe your writing habits. How did you set about writing the book? Did you write a fixed amount daily, or do you write in bursts?

Luis- Unfortunately, I write in bursts whenever I find some time for it. This is much more tiring and time consuming as I need to read back material far too often, but I cannot afford dedicating too much consecutive time to a single task. Actually, I frequently tease my PhD students when they “complain” about the lack of time for doing what they have to, that they should learn to appreciate the luxury of having a single task to complete because it will probably be the last time in their professional life!

Ajay- What do you do to relax or unwind when not working?

Luis- For me, the best way to relax from work is by playing sports. When I’m involved in some game I reset my mind and forget about all other things, and this is very relaxing for me. Apart from sports, I enjoy spending time with my family and friends a lot. A good, long dinner with friends over a good bottle of wine can do miracles when I’m too stressed with work! Finally, I do love traveling around with my family.

Luis Torgo

Short Bio: Luis Torgo has a degree in Systems and Informatics Engineering and a PhD in Computer Science. He is an Associate Professor of the Department of Computer Science of the Faculty of Sciences of the University of Porto. He is also a researcher at the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) belonging to INESC Porto LA. Luis Torgo has been an active researcher in Machine Learning and Data Mining for more than 20 years. He has led several academic and industrial Data Mining research projects. Luis Torgo has followed the R project almost since its beginning, using it in his research activities. He teaches R at different levels and has given several courses in different countries.

To read “Data Mining with R”, you can visit the site below, where you can also use a 20% discount the publishers have generously given (message below):

For more information and to place an order, visit us at http://www.crcpress.com.  Order online and apply 20% Off discount code 907HM at checkout.  CRC is pleased to offer free standard shipping on all online orders!

link to the book page  http://www.crcpress.com/product/isbn/9781439810187

Price: $79.95
Cat. #: K10510
ISBN: 9781439810187
ISBN 10: 1439810184
Publication Date: November 09, 2010
Number of Pages: 305
Availability: In Stock
Binding(s): Hardback 

Assumptions on Guns

Sitting in Delhi, India, I sometimes notice that there is one big newsworthy gun-related incident in the United States every six months or so (the latest being the Gabrielle Giffords shooting), along with talk of the mythical NRA (which seems just as powerful as the equally mythical Jewish American or Cuban American lobbies). I once trained to fire guns (.22 and SLR rifles, actually), I come from a gun-friendly culture (namely Punjabi, North Indian), and my dad sometimes carried a gun as a police officer during his 30-plus years of service, yet I don't really like guns (except when they are in a movie). My 3-year-old son likes guns a lot (for some peculiar genetic reason, even though we are careful not to show him any violent TV or movies at all).

So, to settle the whole “guns are good, guns are bad” thing, I turned to the one resource: the Internet.

Here are some findings:

1) A lot of hard statistical data on guns is biased by the perspective of the writer. It reminds me of the old saying: lies, damned lies and statistics.

2) There is not much hard data from broad, definitive research that can be quoted, unlike, say, the evidence that cigarettes cause lung cancer.

3) American, European and Asian attitudes on guns actually seem to be a function of historical availability, historic crime rates and cultural propensity for guns.

Switzerland and the United States are two extreme outliers in statistics on whether guns cause violence.

4) A lot of old and outdated data is quoted selectively.

It seems you can fudge data about guns in the following ways:

1) Use relative per capita numbers versus aggregate numbers

2) Compare and contrast gun numbers with crime numbers selectively

3) Remove the drill-down by type of firearm, such as handguns, rifles, automatic and semi-automatic
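
The first point is easy to demonstrate. The sketch below uses two hypothetical regions with invented figures (they are not real data) to show how the same numbers can point in opposite directions depending on whether you quote totals or per capita rates. The names `regions` and `rate_per_100k` are my own.

```python
# Hypothetical figures, invented purely for illustration (not real data).
regions = {
    "Region A": {"incidents": 9000, "population": 300_000_000},
    "Region B": {"incidents": 2000, "population": 20_000_000},
}


def rate_per_100k(incidents, population):
    """Incidents per 100,000 residents."""
    return incidents / population * 100_000


# Pick the "worst" region under each way of counting.
by_total = max(regions, key=lambda name: regions[name]["incidents"])
by_rate = max(regions, key=lambda name: rate_per_100k(**regions[name]))

print("Highest by aggregate count:", by_total)
print("Highest by per capita rate:", by_rate)
```

With these made-up numbers the aggregate count singles out Region A while the per capita rate singles out Region B, so a writer can pick whichever framing suits the argument.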

Maybe I am being simplistic, but I found it easier to list credible data sources on guns than to summarize all assumptions on guns. Are guns good or bad? I don't know; it depends. Any research you can quote is welcome.

Data Sources on Guns and Firearms and Crime:

1) http://www.justfacts.com/guncontrol.asp

Ownership

* As of 2009, the United States has a population of 307 million people.[5]

* Based on production data from firearm manufacturers,[6] there are roughly 300 million firearms owned by civilians in the United States as of 2010. Of these, about 100 million are handguns.[7]

* Based upon surveys, the following are estimates of private firearm ownership in the U.S. as of 2010:

            Households With a Gun  Adults Owning a Gun  Adults Owning a Handgun
Percentage  40-45%                 30-34%               17-19%
Number      47-53 million          70-80 million        40-45 million

[8]

* A 2005 nationwide Gallup poll of 1,012 adults found the following levels of firearm ownership:

Category     Percentage Owning a Firearm
Households   42%
Individuals  30%
Male         47%
Female       13%
White        33%
Nonwhite     18%
Republican   41%
Independent  27%
Democrat     23%

[9]

* In the same poll, gun owners stated they own firearms for the following reasons:

Protection Against Crime 67%
Target Shooting 66%
Hunting 41%

2) NationMaster.com

http://www.nationmaster.com/graph/cri_mur_wit_fir-crime-murders-with-firearms

Showing latest available data.

Rank Countries Amount
# 1 South Africa: 31,918
# 2 Colombia: 21,898
# 3 Thailand: 20,032
# 4 United States: 9,369
# 5 Philippines: 7,708
# 6 Mexico: 2,606
# 7 Slovakia: 2,356
# 8 El Salvador: 1,441
# 9 Zimbabwe: 598
# 10 Peru: 442
# 11 Germany: 269
# 12 Czech Republic: 181
# 13 Ukraine: 173
# 14 Canada: 144
# 15 Albania: 135
# 16 Costa Rica: 131
# 17 Azerbaijan: 120
# 18 Poland: 111
# 19 Uruguay: 109
# 20 Spain: 97
# 21 Portugal: 90
# 22 Croatia: 76
# 23 Switzerland: 68
# 24 Bulgaria: 63
# 25 Australia: 59
# 26 Sweden: 58
# 27 Bolivia: 52
# 28 Japan: 47
# 29 Slovenia: 39
= 30 Hungary: 38
= 30 Belarus: 38
# 32 Latvia: 28
# 33 Burma: 27
# 34 Macedonia, The Former Yugoslav Republic of: 26
# 35 Austria: 25
# 36 Estonia: 21
# 37 Moldova: 20
# 38 Lithuania: 16
= 39 United Kingdom: 14
= 39 Denmark: 14
# 41 Ireland: 12
# 42 New Zealand: 10
# 43 Chile: 9
# 44 Cyprus: 4
# 45 Morocco: 1
= 46 Iceland: 0
= 46 Luxembourg: 0
= 46 Oman: 0
Total: 100,693
Weighted average: 2,097.8

DEFINITION: Total recorded intentional homicides committed with a firearm. Crime statistics are often better indicators of prevalence of law enforcement and willingness to report crime, than actual prevalence.

SOURCE: The Eighth United Nations Survey on Crime Trends and the Operations of Criminal Justice Systems (2002) (United Nations Office on Drugs and Crime, Centre for International Crime Prevention)
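
A quick check of the summary rows in the table above: the figure labelled "Weighted average" matches the total divided by the number of countries listed (48), which is a plain mean and not a population-weighted figure. The sketch below uses only numbers copied from the table; the variable names are my own.

```python
# Figures copied from the NationMaster table above.
total_murders = 100_693   # "Total" row
countries_listed = 48     # ranks 1 through 46, counting tied entries
us_murders = 9_369        # United States row

# Plain arithmetic mean across the listed countries.
simple_mean = total_murders / countries_listed

# Share of the listed total accounted for by the United States.
us_share = us_murders / total_murders

print(f"Mean per listed country: {simple_mean:,.1f}")
print(f"US share of listed total: {us_share:.1%}")
```

Since these are totals, they say nothing about population size, which is exactly the per capita versus aggregate issue raised earlier.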

3) Bureau of Justice Statistics

See http://bjs.ojp.usdoj.gov/dataonline/Search/Homicide/State/RunHomTrendsInOneVar.cfm

or the brand new website (data till 2009), on which I CANNOT get gun crime figures but can get totals:

http://www.ucrdatatool.gov/

Estimated  murder rate *
Year United States-Total

1960 5.1
1961 4.8
1962 4.6
1963 4.6
1964 4.9
1965 5.1
1966 5.6
1967 6.2
1968 6.9
1969 7.3
1970 7.9
1971 8.6
1972 9.0
1973 9.4
1974 9.8
1975 9.6
1976 8.7
1977 8.8
1978 9.0
1979 9.8
1980 10.2
1981 9.8
1982 9.1
1983 8.3
1984 7.9
1985 8.0
1986 8.6
1987 8.3
1988 8.5
1989 8.7
1990 9.4
1991 9.8
1992 9.3
1993 9.5
1994 9.0
1995 8.2
1996 7.4
1997 6.8
1998 6.3
1999 5.7
2000 5.5
2001 5.6
2002 5.6
2003 5.7
2004 5.5
2005 5.6
2006 5.7
2007 5.6
2008 5.4
2009 5.0
Notes: National or state offense totals are based on data from all reporting agencies and estimates for unreported areas.
* Rates are the number of reported offenses per 100,000 population
  • United States-Total –
    • The 168 murder and nonnegligent homicides that occurred as a result of the bombing of the Alfred P. Murrah Federal Building in Oklahoma City in 1995 are included in the national estimate.
    • The 2,823 murder and nonnegligent homicides that occurred as a result of the events of September 11, 2001, are not included in the national estimates.
  • Sources: FBI, Uniform Crime Reports as prepared by the National Archive of Criminal Justice Data
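
To read the murder-rate table above a little more easily, here is a minimal sketch that finds the peak year, the change from the peak to 2009, and a rough implied count for 2009. The rates are copied from the table. The 307 million population figure is the 2009 number quoted in the justfacts.com section earlier in this post, so combining the two sources is my own assumption and the resulting count is only a ballpark.

```python
# Estimated US murder rates per 100,000 population, 1960-2009,
# copied from the table above.
rates = [
    5.1, 4.8, 4.6, 4.6, 4.9, 5.1, 5.6, 6.2, 6.9, 7.3,   # 1960-1969
    7.9, 8.6, 9.0, 9.4, 9.8, 9.6, 8.7, 8.8, 9.0, 9.8,   # 1970-1979
    10.2, 9.8, 9.1, 8.3, 7.9, 8.0, 8.6, 8.3, 8.5, 8.7,  # 1980-1989
    9.4, 9.8, 9.3, 9.5, 9.0, 8.2, 7.4, 6.8, 6.3, 5.7,   # 1990-1999
    5.5, 5.6, 5.6, 5.7, 5.5, 5.6, 5.7, 5.6, 5.4, 5.0,   # 2000-2009
]
rate_by_year = dict(zip(range(1960, 2010), rates))

# Year with the highest rate in the series.
peak_year = max(rate_by_year, key=rate_by_year.get)

# Relative change from the peak year to 2009.
change = (rate_by_year[2009] - rate_by_year[peak_year]) / rate_by_year[peak_year]

# Rough implied count for 2009, assuming a population of 307 million
# (figure quoted earlier in this post from a different source).
population_2009 = 307_000_000
estimated_2009 = rate_by_year[2009] * population_2009 / 100_000

print(f"Peak year: {peak_year} ({rate_by_year[peak_year]} per 100,000)")
print(f"Change from peak to 2009: {change:.0%}")
print(f"Rough implied murders in 2009: {estimated_2009:,.0f}")
```

Note that this is the overall murder rate, not the firearm-only rate, which is the figure I could not get from the new website.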


4) The United Nations statistics from 2002 are too old, in my opinion.
Wikipedia seems too broad-based to qualify as a research article, but it is easily accessible: http://en.wikipedia.org/wiki/Gun_violence_in_the_United_States
To actually buy a gun, or to see guns available for purchase in the United States, see
http://www.usautoweapons.com/

SAS Lawsuit against WPS- Application Dismissed

I saw that Phil Rack http://twitter.com/#!/PhilRack (whom I have interviewed before at https://decisionstats.com/2009/02/03/interview-phil-rack/ , and whom I haven't talked to since Obama won the election), creator of Bridge to R, the first SAS language to R language interface, mentioned this judgment and link.

Probably Phil should revise the documentation of Bridge to R, lest he be sued himself!

    Conclusion
    It was for these reasons that I decided to dismiss SAS’s application.

    From-

    http://www.bailii.org/cgi-bin/markup.cgi?doc=/ew/cases/EWHC/Ch/2010/3012.html

     

    Neutral Citation Number: [2010] EWHC 3012 (Ch)
    Case No: HC09C03293

    IN THE HIGH COURT OF JUSTICE
    CHANCERY DIVISION
    Royal Courts of Justice
    Strand, London, WC2A 2LL
    22 November 2010

    B e f o r e :

    THE HON MR JUSTICE ARNOLD
    ____________________
    Between:
    SAS INSTITUTE INC. Claimant
    – and –

    WORLD PROGRAMMING LIMITED Defendant

    ____________________

    Michael Hicks (instructed by Bristows) for the Claimant
    Martin Howe QC and Isabel Jamal (instructed by Speechly Bircham LLP) for the Defendant
    Hearing date: 18 November 2010
    ____________________

    HTML VERSION OF JUDGMENT
    ____________________

    Crown Copyright ©

    MR. JUSTICE ARNOLD :

    Introduction
    By order dated 28 July 2010 I referred certain questions concerning the interpretation of Council Directive 91/250/EEC of 14 May 1991 on the legal protection of computer programs, which was recently codified as European Parliament and Council Directive 2009/24/EC of 23 April 2009, and European Parliament and Council Directive 2001/29/EC of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society to the Court of Justice of the European Union under Article 267 of the Treaty on the Functioning of the European Union. The background to the reference is set out in full in my judgment dated 23 July 2010 [2010] EWHC 1829 (Ch). The reference is presently pending before the Court of Justice as Case C-406/10. By an application notice issued on 11 October 2010 SAS applied for the wording of the questions to be amended in a number of respects. I heard that application on 18 November 2010 and refused it for reasons to be given later. This judgment contains those reasons.

    The questions and the proposed amendments
    I set out below the questions referred with the amendments proposed by SAS shown by strikethrough and underlining:

    “A. On the interpretation of Council Directive 91/250/EEC of 14 May 1991 on the legal protection of computer programs and of Directive 2009/24/EC of the European Parliament and of the Council of 23 April 2009 (codified version):
    1. Where a computer program (‘the First Program’) is protected by copyright as a literary work, is Article 1(2) to be interpreted as meaning that it is not an infringement of the copyright in the First Program for a competitor of the rightholder without access to the source code of the First Program, either directly or via a process such as decompilation of the object code, to create another program (‘the Second Program’) which replicates by copying the functions of the First Program?
    2. Is the answer to question 1 affected by any of the following factors:
    (a) the nature and/or extent of the functionality of the First Program;
    (b) the nature and/or extent of the skill, judgment and labour which has been expended by the author of the First Program in devising and/or selecting the functionality of the First Program;
    (c) the level of detail to which the functionality of the First Program has been reproduced in the Second Program;
    (d) if, the Second Program includes the following matters as a result of copying directly or indirectly from the First Program:
    (i) the selection of statistical operations which have been implemented in the First Program;
    (ii) the selection of mathematical formulae defining the statistical operations which the First Program carries out;
    (iii) the particular commands or combinations of commands by which those statistical operations may be invoked;
    (iv) the options which the author of the First Program has provided in respect of various commands;
    (v) the keywords and syntax recognised by the First Program;
    (vi) the defaults which the author of the First Program has chosen to implement in the event that a particular command or option is not specified by the user;
    (vii) the number of iterations which the First Program will perform in certain circumstances;
    (e)(d) if the source code for the Second Program reproduces by copying aspects of the source code of the First Program to an extent which goes beyond that which was strictly necessary in order to produce the same functionality as the First Program?
    3. Where the First Program interprets and executes application programs written by users of the First Program in a programming language devised by the author of the First Program which comprises keywords devised or selected by the author of the First Program and a syntax devised by the author of the First Program, is Article 1(2) to be interpreted as meaning that it is not an infringement of the copyright in the First Program for the Second Program to be written so as to interpret and execute such application programs using the same keywords and the same syntax?
    4. Where the First Program reads from and writes to data files in a particular format devised by the author of the First Program, is Article 1(2) to be interpreted as meaning that it is not an infringement of the copyright in the First Program for the Second Program to be written so as to read from and write to data files in the same format?
    5. Does it make any difference to the answer to questions 1, 2, 3 and 4 if the author of the Second Program created the Second Program without access to the source code of the First Program, either directly or via decompilation of the object code by:
    (a) observing, studying and testing the functioning of the First Program; or
    (b) reading a manual created and published by the author of the First Program which describes the functions of the First Program (“the Manual”) and by implementing in the Second Program the functions described in the Manual; or
    (c) both (a) and (b)?
    6. Where a person has the right to use a copy of the First Program under a licence, is Article 5(3) to be interpreted as meaning that the licensee is entitled, without the authorisation of the rightholder, to perform acts of loading, running and storing the program in order to observe, test or study the functioning of the First Program so as to determine the ideas and principles which underlie any element of the program, if the licence permits the licensee to perform acts of loading, running and storing the First Program when using it for the particular purpose permitted by the licence, but the acts done in order to observe, study or test the First Program extend outside the scope of the purpose permitted by the licence and are therefore acts for which the licensee has no right to use the copy of the First Program under the licence?
    7. Is Article 5(3) to be interpreted as meaning that acts of observing, testing or studying of the functioning of the First Program are to be regarded as being done in order to determine the ideas or principles which underlie any element of the First Program where they are done:
    (a) to ascertain the way in which the First Program functions, in particular details which are not described in the Manual, for the purpose of writing the Second Program in the manner referred to in question 1 above;
    (b) to ascertain how the First Program interprets and executes statements written in the programming language which it interprets and executes (see question 3 above);
    (c) to ascertain the formats of data files which are written to or read by the First Program (see question 4 above);
    (d) to compare the performance of the Second Program with the First Program for the purpose of investigating reasons why their performances differ and to improve the performance of the Second Program;
    (e) to conduct parallel tests of the First Program and the Second Program in order to compare their outputs in the course of developing the Second Program, in particular by running the same test scripts through both the First Program and the Second Program;
    (f) to ascertain the output of the log file generated by the First Program in order to produce a log file which is identical or similar in appearance;
    (g) to cause the First Program to output data (in fact, data correlating zip codes to States of the USA) for the purpose of ascertaining whether or not it corresponds with official databases of such data, and if it does not so correspond, to program the Second Program so that it will respond in the same way as the First Program to the same input data.
    B. On the interpretation of Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society:
    8. Where the Manual is protected by copyright as a literary work, is Article 2(a) to be interpreted as meaning that it is an infringement of the copyright in the Manual for the author of the Second Program to reproduce or substantially reproduce in the Second Program any or all of the following matters described in the Manual:
    (a) the selection of statistical operations which have been described in the Manual as being implemented in the First Program;
    (b) the mathematical formulae used in the Manual to describe those statistical operations;
    (c) the particular commands or combinations of commands by which those statistical operations may be invoked;
    (d) the options which the author of the First Program has provided in respect of various commands;
    (e) the keywords and syntax recognised by the First Program;
    (f) the defaults which the author of the First Program has chosen to implement in the event that a particular command or option is not specified by the user;
    (g) the number of iterations which the First Program will perform in certain circumstances?
    9. Is Article 2(a) to be interpreted as meaning that it is an infringement of the copyright in the Manual for the author of the Second Program to reproduce or substantially reproduce in a manual describing the Second Program the keywords and syntax recognised by the First Program?”

    Jurisdiction
    It was common ground between counsel that, although there is no direct authority on the point, it appears that the Court of Justice would accept an amendment to questions which had previously been referred by the referring court. The Court of Justice has stated that “national courts have the widest discretion in referring matters”: see Case 166/73 Rheinmühlen Düsseldorf v Einfuhr-und Vorratstelle für Getreide under Futtermittel [1974] ECR 33 at [4]. If an appeal court substitutes questions for those referred by a lower court, the substituted questions will be answered: Case 65/77 Razanatsimba [1977] ECR 2229. Sometimes the Court of Justice itself invites the referring court to clarify its questions, as occurred in Interflora Inc v Marks & Spencer plc (No 2) [2010] EWHC 925 (Ch). In these circumstances, there does not appear to be any reason to think that, if the referring court itself had good reason to amend its questions, the Court of Justice would disregard the amendment.

    Counsel for WPL submitted, however, that, as a matter of domestic procedural law, this Court had no jurisdiction to vary an order for reference once sealed unless either there had been a material change of circumstances since the order (as in Interflora) or it had subsequently emerged that the Court had made the order on a false basis. He submitted that neither of those conditions was satisfied here. In those circumstances, the only remedy of a litigant in the position of SAS was to seek to appeal to the Court of Appeal.

    As counsel for WPL pointed out, CPR rule 3.1(7) confers on courts what appears to be a general power to vary or revoke their own orders. The proper exercise of that power was considered by the Court of Appeal in Collier v Williams [2006] EWCA Civ 20, [2006] 1 WLR 1945 and Roult v North West Strategic Health Authority [2009] EWCA Civ 444, [2010] 1 WLR 487.

    In Collier Dyson LJ (as he then was) giving the judgment of the Court of Appeal said:

    “39. We now turn to the third argument. CPR 3.1(7) gives a very general power to vary or revoke an order. Consideration was given to the circumstances in which that power might be used by Patten J in Lloyds Investment (Scandinavia) Limited v Christen Ager-Hanssen [2003] EWHC 1740 (Ch). He said at paragraph 7:
    ‘The Deputy Judge exercised a discretion under CPR Part 13.3. It is not open to me as a judge exercising a parallel jurisdiction in the same division of the High Court to entertain what would in effect be an appeal from that order. If the Defendant wished to challenge whether the order made by Mr Berry was disproportionate and wrong in principle, then he should have applied for permission to appeal to the Court of Appeal. I have been given no real reasons why this was not done. That course remains open to him even today, although he will have to persuade the Court of Appeal of the reasons why he should have what, on any view, is a very considerable extension of time. It seems to me that the only power available to me on this application is that contained in CPR Part 3.1(7), which enables the Court to vary or revoke an order. This is not confined to purely procedural orders and there is no real guidance in the White Book as to the possible limits of the jurisdiction. Although this is not intended to be an exhaustive definition of the circumstances in which the power under CPR Part 3.1(7) is exercisable, it seems to me that, for the High Court to revisit one of its earlier orders, the Applicant must either show some material change of circumstances or that the judge who made the earlier order was misled in some way, whether innocently or otherwise, as to the correct factual position before him. The latter type of case would include, for example, a case of material non-disclosure on an application for an injunction. If all that is sought is a reconsideration of the order on the basis of the same material, then that can only be done, in my judgment, in the context of an appeal. 
Similarly it is not, I think, open to a party to the earlier application to seek in effect to re-argue that application by relying on submissions and evidence which were available to him at the time of the earlier hearing, but which, for whatever reason, he or his legal representatives chose not to employ. It is therefore clear that I am not entitled to entertain this application on the basis of the Defendant’s first main submission, that Mr Berry’s order was in any event disproportionate and wrong in principle, although I am bound to say that I have some reservations as to whether he was right to impose a condition of this kind without in terms enquiring whether the Defendant had any realistic prospects of being able to comply with the condition.’
    We endorse that approach. We agree that the power given by CPR 3.1(7) cannot be used simply as an equivalent to an appeal against an order with which the applicant is dissatisfied. The circumstances outlined by Patten J are the only ones in which the power to revoke or vary an order already made should be exercised under 3.1(7).”
    In Roult Hughes LJ, with whom Smith and Carnwath LJJ agreed, said at [15]:

    “There is scant authority upon Rule 3.1(7) but such as exists is unanimous in holding that it cannot constitute a power in a judge to hear an appeal from himself in respect of a final order. Neuberger J said as much in Customs & Excise v Anchor Foods (No 3) [1999] EWHC 834 (Ch). So did Patten J in Lloyds Investment (Scandinavia) Ltd v Ager-Hanssen [2003] EWHC 1740 (Ch). His general approach was approved by this court, in the context of case management decisions, in Collier v Williams [2006] EWCA Civ 20. I agree that in its terms the rule is not expressly confined to procedural orders. Like Patten J in Ager-Hanssen I would not attempt any exhaustive classification of the circumstances in which it may be proper to invoke it. I am however in no doubt that CPR 3.1(7) cannot bear the weight which Mr Grime’s argument seeks to place upon it. If it could, it would come close to permitting any party to ask any judge to review his own decision and, in effect, to hear an appeal from himself, on the basis of some subsequent event. It would certainly permit any party to ask the judge to review his own decision when it is not suggested that he made any error. It may well be that, in the context of essentially case management decisions, the grounds for invoking the rule will generally fall into one or other of the two categories of (i) erroneous information at the time of the original order or (ii) subsequent event destroying the basis on which it was made. The exigencies of case management may well call for a variation in planning from time to time in the light of developments. There may possibly be examples of non-procedural but continuing orders which may call for revocation or variation as they continue – an interlocutory injunction may be one. But it does not follow that wherever one or other of the two assertions mentioned (erroneous information and subsequent event) can be made, then any party can return to the trial judge and ask him to re-open any decision…..”
    In the present case there has been no material change of circumstances since I made the Order dated 28 July 2010. Nor did counsel for SAS suggest that I had made the Order upon a false basis. Counsel for SAS did submit, however, that the Court of Appeal had left open the possibility that it might be proper to exercise the power conferred by rule 3.1(7) even if there had been no material change of circumstances and it was not suggested that the order in question had been made on a false basis. Furthermore, he relied upon paragraph 1.1 of the Practice Direction to CPR Part 68, which provides that “responsibility for settling the terms of the reference lies with the English court and not with the parties”. He suggested that this meant that orders for references were not subject to the usual constraints on orders made purely inter partes.

    In my judgment PD68 paragraph 1.1 does not justify exercising the power conferred by rule 3.1(7) in circumstances falling outside those identified in Collier and Roult. I am therefore very doubtful that it would be a proper exercise of the power conferred on me by CPR r. 3.1(7) to vary the Order dated 28 July 2010 in the present circumstances. I prefer, however, not to rest my decision on that ground.

    Discretion
    Counsel for WPL also submitted that, even if this Court had jurisdiction to amend the questions, I should exercise my discretion by refusing to do so for two reasons. First, because the application was made too late. Secondly, because there was no sufficient justification for the amendments anyway. I shall consider these points separately.

    Delay
    The relevant dates are as follows. The judgment was handed down on 23 July 2010, a draft having been made available to the parties a few days before that. There was a hearing to consider the form of the order, and in particular the wording of the questions to be referred, on 28 July 2010. Prior to that hearing both parties submitted drafts of the questions, and the respective drafts were discussed at the hearing. Following the hearing I settled the Order, and in particular the questions. The Order was sealed on 2 August 2010. The sealed Order was received by the parties between 3 and 5 August 2010. At around the same time the Senior Master of the Queen’s Bench Division transmitted the Order to the Court of Justice. On 15 September 2010 the Registry of the Court of Justice notified the parties, Member States and EU institutions of the reference. On 1 October 2010 the United Kingdom Intellectual Property Office advertised the reference on its website and invited comments by interested parties by 7 October 2010. The latest date on which written observations on the questions referred may be filed at the Court of Justice is 8 December 2010 (two months from the date of the notification plus 10 days extension on account of distance where applicable). This period is not extendable in any circumstances.

    As noted above, the application was not issued until 11 October 2010. No justification has been provided by SAS for the delay in making the application. The only explanation offered by counsel for SAS was that the idea of proposing the amendments had only occurred to those representing SAS when starting work on SAS’s written observations.

    Furthermore, the application notice requested that the matter be dealt with without a hearing. In my view that was not appropriate: the application was plainly one which was likely to require at least a short hearing. Furthermore, the practical consequence of proceeding in that way was to delay the hearing of the application. The paper application was put before me on 22 October 2010. On the same day I directed that the matter be listed for hearing. In the result it was not listed for hearing until 18 November 2010. If SAS had applied for the matter to be heard urgently, I am sure that it could have been dealt with sooner.

    As counsel for WPL submitted, it is likely that the parties, Member States and institutions who intend to file written observations are now at an advanced stage of preparing those observations. Indeed, it is likely that preparations would have been well advanced even on 11 October 2010. To amend the questions at this stage in the manner proposed by SAS would effectively require the Court of Justice to re-start the written procedure all over again. The amended questions would have to be translated into all the EU official languages; the parties, Member States and EU institutions would have to be notified of the amended questions; and the time for submitting written observations would have to be re-set. This would have two consequences. First, a certain amount of time, effort and money on the part of those preparing written observations would be wasted. Secondly, the progress of the case would be delayed. Those are consequences that could have been avoided if SAS had moved promptly after receiving the sealed Order.

    In these circumstances, it would not in my judgment be proper to exercise any discretion I may have in favour of amending the questions.

    No sufficient justification
    Counsel for WPL submitted that in any event SAS’s proposed amendments were not necessary in order to enable the Court of Justice to provide guidance on the issues in this case, and therefore there was no sufficient justification for making the amendments.

    Before addressing that submission directly, I think it is worth commenting more generally on the formulation of questions. As is common ground, and reflected in paragraph 1.1 of PD68, it is well established that the questions posed on a reference under Article 267 are the referring court’s questions, not the parties’. The purpose of the procedure is for the Court of Justice to provide the referring court with the guidance it needs in order to deal with the issues before it. It follows that it is for the referring court to decide how to formulate the questions.

    In my view it is usually helpful for the court to have the benefit of the parties’ comments on the wording of the proposed questions, as envisaged in paragraph 1.1 of PD68. There are two main reasons for this. The first is to try to ensure that the questions are sufficiently comprehensive to enable all the issues arising to be addressed by the Court of Justice, and thus avoid the need for a further reference at a later stage of the proceedings, as occurred in the Boehringer Ingelheim v Swingward litigation. In that case Laddie J referred questions to the Court of Justice, which were answered in Case C-143/00 [2002] ECR I-3759. The Court of Appeal subsequently concluded, with regret, that the answers to those questions did not suffice to enable it to deal with the case, and referred further questions to the Court of Justice: [2004] EWCA Civ 575, [2004] ETMR 65. Those questions were answered in Case C-348/04 [2007] ECR I-3391. The second main reason is to try to ensure that the questions are clear and free from avoidable ambiguity or obscurity.

    In my experience it is not uncommon for parties addressing the court on the formulation of the questions to attempt to ensure that the questions are worded in a leading manner, that is to say, in a way which suggests the desired answer. In my view that is neither proper nor profitable. It is not proper because the questions should so far as possible be impartially worded. It is not profitable because experience shows that the Court of Justice is usually not concerned with the precise wording of the questions referred, but with their legal substance. Thus the Court of Justice frequently reformulates the question in giving its answer.

    As counsel for WPL pointed out, and as I have already mentioned, in the present case the parties provided me with draft questions which were discussed at a hearing. In settling the questions I took into account the parties’ drafts and their comments on each other’s drafts, but the final wording is, for better or worse, my own.

    As counsel for WPL submitted, at least to some extent SAS’s proposed amendments to the questions appear designed to bring the wording closer to that originally proposed by SAS. This is particularly true of the proposed amendment to question 1. In my judgment it would not be a proper exercise of any discretion that I may have to permit such an amendment, both because it appears to be an attempt by SAS to have the question worded in a manner which it believes favours its case and because its proper remedy if it objected to my not adopting the wording it proposed was to seek to appeal to the Court of Appeal. In saying this, I do not overlook the fact that SAS proposes to move some of the words excised from question 1 to question 5.

    In any event, I am not satisfied that any of the amendments are necessary either to enable the parties to present their respective arguments to the Court of Justice or to enable the Court to give guidance on any of the issues arising in this case. On the contrary, I consider that the existing questions are sufficient for these purposes. By way of illustration, I will take the biggest single amendment, which is the proposed insertion of new paragraph (d) in question 2. In my view, the matters referred to in paragraph (d) are matters that are encompassed within paragraphs (b) and/or (c); or at least can be addressed by the parties, and hence the Court of Justice, in the context provided by paragraphs (b) and/or (c). When I put this to counsel for SAS during the course of argument, he accepted it.

    Other amendments counsel for SAS himself presented as merely being minor matters of clarification. In my view none of them amount to the elimination of what would otherwise be ambiguities or obscurities in the questions.

    It is fair to say that SAS have identified a small typographical error in question 2 (“interpreting” should read “interpreted”), but in my view this is an obvious error which will not cause any difficulty in the proceedings before the Court of Justice.

    Conclusion
    It was for these reasons that I decided to dismiss SAS’s application.

    Interview James Dixon Pentaho

    Here is an interview with James Dixon, the founder of Pentaho and its self-confessed Chief Geek and CTO. Pentaho has been growing very rapidly. It makes open source Business Intelligence solutions, in what is basically the biggest chunk of the enterprise software market currently.

    Ajay-  How would you describe Pentaho as a BI product for someone who is completely used to traditional BI vendors (read non open source). Do the Oracle lawsuits over Java bother you from a business perspective?

    James-

    Pentaho has a full suite of BI software:

    * ETL: Pentaho Data Integration

    * Reporting: Pentaho Reporting for desktop and web-based reporting

    * OLAP: Mondrian ROLAP engine, and Analyzer or JPivot for web-based OLAP clients

    * Dashboards: CDF and Dashboard Designer

    * Predictive Analytics: Weka

    * Server: Pentaho BI Server, which handles web access, security, scheduling, sharing, report bursting, etc.

    We have all of the standard BI functionality.

    The Oracle/Java issue does not bother me much. There are a lot of software companies dependent on Java. If Oracle abandons Java, a lot of resources will suddenly focus on OpenJDK. It would be good for OpenJDK and might be the best thing for Java in the long term.

    Ajay-  What parts of Pentaho’s technology do you personally like the best as having an advantage over other similar proprietary packages?

    Describe the latest Pentaho for Hadoop offering and the advantage of Hadoop/Hive over, say, MapReduce and SQL.

    James- The coolest thing is that everything is pluggable:

    * ETL: New data transformation steps can be added. New orchestration controls (job entries) can be added. New perspectives can be added to the design UI. New data sources and destinations can be added.

    * Reporting: New content types and report objects can be added. New data sources can be added.

    * BI Server: Every factory, engine, and layer can be extended or swapped out via configuration. BI components can be added. New visualizations can be added.

    This means it is very easy for Pentaho, partners, customers, and community members to extend our software to do new things.

    In addition, every engine and component can be fully embedded into a desktop or web-based application. I made a YouTube video about our philosophy: http://www.youtube.com/watch?v=uMyR-In5nKE

    Our Hadoop offerings allow ETL developers to work in a familiar graphical design environment, instead of having to code MapReduce jobs in Java or Python.

    90% of the Hadoop use cases we hear about are transformation/reporting/analysis of structured/semi-structured data, so an ETL tool is perfect for these situations.

    Using Pentaho Data Integration reduces implementation and maintenance costs significantly. The fact that our ETL engine is Java and is embeddable means that we can deploy the engine to the Hadoop data nodes and transform the data within the nodes.
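    (For readers who have not written one by hand, here is a rough idea of the kind of MapReduce job James says an ETL tool saves you from coding. It is only a minimal sketch in plain Python that imitates the map, shuffle, and reduce steps in memory; it does not use Hadoop or any Pentaho API. The function names and the "country,commodity,amount" record layout are my own assumptions for illustration.)

```python
from collections import defaultdict


def map_record(line):
    """Map step: turn one 'country,commodity,amount' line into (key, value) pairs.

    The record layout is a hypothetical example, not a Hadoop or Pentaho format.
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3:
        return  # skip malformed rows
    country, commodity, amount = parts
    try:
        value = float(amount)
    except ValueError:
        return  # skip rows whose amount is not numeric
    yield (country, commodity), value


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Run map -> shuffle -> reduce in memory over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_record(line))
    return dict(reduce_group(k, vs) for k, vs in shuffle(pairs).items())
```

    Calling run_job on a list of text lines returns a dictionary of totals keyed by (country, commodity). On a real cluster the same three steps would be spread across the data nodes, which is the part a graphical ETL tool hides from the developer.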

    Ajay-  Do you think the combination of recession, outsourcing, cost cutting, and unemployment is a suitable environment for companies to cut technology costs by going outside their usual vendor lists and trying open source for a change, or for test projects?

    James- Absolutely. Pentaho grew (downloads, installations, revenue) throughout the recession. We are on target to do 250% of what we did last year, while the established vendors are flat in terms of new license revenue.

    Ajay-  How would you compare the user interface of reports using Pentaho versus other reporting software? Please feel free to be as specific as you like.

    James- We have all of the everyday, standard reporting features covered.

    Over the years the old tools, like Crystal Reports, have become bloated and complicated.

    We don’t aim to have 100% of their features, because we’d end up just as complicated.

    The 80:20 rule applies here. 80% of the time people only use 20% of their features.

    We aim for 80% feature parity, which should cover 95-99% of typical use cases.

    Ajay-  Could you describe the Pentaho integration with R as well as your relationship with Weka? Jaspersoft already has a partnership with Revolution Analytics for RevoDeployR (R on a web server).

    Any R plans for Pentaho as well?

    James- The feature sets of R and Weka overlap to a small extent – both of them include basic statistical functions. Weka is focused on predictive models and machine learning, whereas R is focused on a full suite of statistical models. The creator and main Weka developer is a Pentaho employee. We have integrated R into our ETL tool. (makes me happy 🙂 )

    (probably not a good time to ask if SAS integration is done as well for a big chunk of legacy base SAS/ WPS users)

    About-

    As “Chief Geek” (CTO) at Pentaho, James Dixon is responsible for Pentaho’s architecture and technology roadmap. James has over 15 years of professional experience in software architecture, development and systems consulting. Prior to Pentaho, James held key technical roles at AppSource Corporation (acquired by Arbor Software which later merged into Hyperion Solutions) and Keyola (acquired by Lawson Software). Earlier in his career, James was a technology consultant working with large and small firms to deliver the benefits of innovative technology in real-world environments.