Troubleshooting Rattle Installation- Data Mining R GUI

Screenshot of Synaptic Package Manager running...
Image via Wikipedia

I really find the Rattle GUI very very nice and easy to do any data mining task. The software is available from http://rattle.togaware.com/

The only issue is Rattle can be quite difficult to install due to dependencies on GTK+

After fiddling for a couple of years- this is what I did

1) Created dual boot OS- Basically downloaded the netbook remix from http://ubuntu.com I created a dual boot OS so you can choose at the beginning whether to use Windows or Ubuntu Linux in that session.  Alternatively you can download VM Player www.vmware.com/products/player/ if you want to do both

2) Download R packages using Ubuntu packages and Install GTK+ dependencies before that.

GTK + Requires

  1. Libglade
  2. Glib
  3. Cairo
  4. Pango
  5. ATK

If  you are a Linux newbie like me who doesnt get the sudo apt get, tar, cd, make , install rigmarole – scoot over to synaptic software packages or just the main ubuntu software centre and download these packages one by one.

For R Dependencies, you need

  • PMML
  • XML
  • RGTK2

Again use r-cran as the prefix to these package names and simply install (almost the same way Windows does it easily -double click)

see http://packages.ubuntu.com/search?suite=lucid&searchon=names&keywords=r-cran

4) Install Rattle from source

http://rattle.togaware.com/rattle-download.html

Advanced users can download the Rattle source packages directly:

Save theses to your hard disk (e.g., to your Desktop) but don’t extract them. Then, on GNU/Linux run the install command shown below. This command is entered into a terminal window:

  • R CMD INSTALL rattle_2.6.0.tar.gz

After installation-

5) Type library(rattle) and rattle.info to get messages on what R packages to update for a proper functioning

</code>

> library(rattle)
Rattle: Graphical interface for data mining using R.
Version 2.6.0 Copyright (c) 2006-2010 Togaware Pty Ltd.
Type 'rattle()' to shake, rattle, and roll your data.
> rattle.info()
Rattle: version 2.6.0
R: version 2.11.1 (2010-05-31) (Revision 52157)

Sysname: Linux
Release: 2.6.35-23-generic
Version: #41-Ubuntu SMP Wed Nov 24 10:18:49 UTC 2010
Nodename: k1-M725R
Machine: i686
Login: k1ng
User: k1ng

Installed Dependencies
RGtk2: version 2.20.3
pmml: version 1.2.26
colorspace: version 1.0-1
cairoDevice: version 2.14
doBy: version 4.1.2
e1071: version 1.5-24
ellipse: version 0.3-5
foreign: version 0.8-41
gdata: version 2.8.1
gtools: version 2.6.2
gplots: version 2.8.0
gWidgetsRGtk2: version 0.0-69
Hmisc: version 3.8-3
kernlab: version 0.9-12
latticist: version 0.9-43
Matrix: version 0.999375-46
mice: version 2.4
network: version 1.5-1
nnet: version 7.3-1
party: version 0.9-99991
playwith: version 0.9-53
randomForest: version 4.5-36 upgrade available 4.6-2
rggobi: version 2.1.16
survival: version 2.36-2
XML: version 3.2-0
bitops: version 1.0-4.1

Upgrade the packages with:

 > install.packages(c("randomForest"))

<code>

Now upgrade whatever package rattle.info tells to upgrade.

This is much simpler and less frustrating than some of the other ways to install Rattle.

If all goes well, you will see this familiar screen popup when you type

>rattle()

 

SAS Lawsuit against WPS- Application Dismissed

I saw Phil Rack http://twitter.com/#!/PhilRack (whom I have interviewed before at https://decisionstats.com/2009/02/03/interview-phil-rack/ ) and whom I dont talk to since Obama won the election-

 

 

 

 

 

 

 

well Phil -creator of Bridge to R- first SAS language to R language interface- mentioned this judgment and link.

 

Probably Phil should revise the documentation of Bridge to R- lest he is sued himself!!!

Conclusion
It was for these reasons that I decided to dismiss SAS’s application.

From-

http://www.bailii.org/cgi-bin/markup.cgi?doc=/ew/cases/EWHC/Ch/2010/3012.html

 

Neutral Citation Number: [2010] EWHC 3012 (Ch)
Case No: HC09C03293

IN THE HIGH COURT OF JUSTICE
CHANCERY DIVISION
Royal Courts of Justice
Strand, London, WC2A 2LL
22 November 2010

B e f o r e :

THE HON MR JUSTICE ARNOLD
____________________
Between:
SAS INSTITUTE INC. Claimant
– and –

WORLD PROGRAMMING LIMITED Defendant

____________________

Michael Hicks (instructed by Bristows) for the Claimant
Martin Howe QC and Isabel Jamal (instructed by Speechly Bircham LLP) for the Defendant
Hearing date: 18 November 2010
____________________

HTML VERSION OF JUDGMENT
____________________

Crown Copyright ©

MR. JUSTICE ARNOLD :

Introduction
By order dated 28 July 2010 I referred certain questions concerning the interpretation of Council Directive 91/250/EEC of 14 May 1991 on the legal protection of computer programs, which was recently codified as European Parliament and Council Directive 2009/24/EC of 23 April 2009, and European Parliament and Council Directive 2001/29/EC of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society to the Court of Justice of the European Union under Article 267 of the Treaty on the Functioning of the European Union. The background to the reference is set out in full in my judgment dated 23 July 2010 [2010] EWHC 1829 (Ch). The reference is presently pending before the Court of Justice as Case C-406/10. By an application notice issued on 11 October 2010 SAS applied for the wording of the questions to be amended in a number of respects. I heard that application on 18 November 2010 and refused it for reasons to be given later. This judgment contains those reasons.

The questions and the proposed amendments
I set out below the questions referred with the amendments proposed by SAS shown by strikethrough and underlining:

“A. On the interpretation of Council Directive 91/250/EEC of 14 May 1991 on the legal protection of computer programs and of Directive 2009/24/EC of the European Parliament and of the Council of 23 April 2009 (codified version):
1. Where a computer program (‘the First Program’) is protected by copyright as a literary work, is Article 1(2) to be interpreted as meaning that it is not an infringement of the copyright in the First Program for a competitor of the rightholder without access to the source code of the First Program, either directly or via a process such as decompilation of the object code, to create another program (‘the Second Program’) which replicates by copying the functions of the First Program?
2. Is the answer to question 1 affected by any of the following factors:
(a) the nature and/or extent of the functionality of the First Program;
(b) the nature and/or extent of the skill, judgment and labour which has been expended by the author of the First Program in devising and/or selecting the functionality of the First Program;
(c) the level of detail to which the functionality of the First Program has been reproduced in the Second Program;
(d) if, the Second Program includes the following matters as a result of copying directly or indirectly from the First Program:
(i) the selection of statistical operations which have been implemented in the First Program;
(ii) the selection of mathematical formulae defining the statistical operations which the First Program carries out;
(iii) the particular commands or combinations of commands by which those statistical operations may be invoked;
(iv) the options which the author of the First Program has provided in respect of various commands;
(v) the keywords and syntax recognised by the First Program;
(vi) the defaults which the author of the First Program has chosen to implement in the event that a particular command or option is not specified by the user;
(vii) the number of iterations which the First Program will perform in certain circumstances;
(e)(d) if the source code for the Second Program reproduces by copying aspects of the source code of the First Program to an extent which goes beyond that which was strictly necessary in order to produce the same functionality as the First Program?
3. Where the First Program interprets and executes application programs written by users of the First Program in a programming language devised by the author of the First Program which comprises keywords devised or selected by the author of the First Program and a syntax devised by the author of the First Program, is Article 1(2) to be interpreted as meaning that it is not an infringement of the copyright in the First Program for the Second Program to be written so as to interpret and execute such application programs using the same keywords and the same syntax?
4. Where the First Program reads from and writes to data files in a particular format devised by the author of the First Program, is Article 1(2) to be interpreted as meaning that it is not an infringement of the copyright in the First Program for the Second Program to be written so as to read from and write to data files in the same format?
5. Does it make any difference to the answer to questions 1, 2, 3 and 4 if the author of the Second Program created the Second Program without access to the source code of the First Program, either directly or via decompilation of the object code by:
(a) observing, studying and testing the functioning of the First Program; or
(b) reading a manual created and published by the author of the First Program which describes the functions of the First Program (“the Manual”) and by implementing in the Second Program the functions described in the Manual; or
(c) both (a) and (b)?
6. Where a person has the right to use a copy of the First Program under a licence, is Article 5(3) to be interpreteding as meaning that the licensee is entitled, without the authorisation of the rightholder, to perform acts of loading, running and storing the program in order to observe, test or study the functioning of the First Program so as to determine the ideas and principles which underlie any element of the program, if the licence permits the licensee to perform acts of loading, running and storing the First Program when using it for the particular purpose permitted by the licence, but the acts done in order to observe, study or test the First Program extend outside the scope of the purpose permitted by the licence and are therefore acts for which the licensee has no right to use the copy of the First Program under the licence?
7. Is Article 5(3) to be interpreted as meaning that acts of observing, testing or studying of the functioning of the First Program are to be regarded as being done in order to determine the ideas or principles which underlie any element of the First Program where they are done:
(a) to ascertain the way in which the First Program functions, in particular details which are not described in the Manual, for the purpose of writing the Second Program in the manner referred to in question 1 above;
(b) to ascertain how the First Program interprets and executes statements written in the programming language which it interprets and executes (see question 3 above);
(c) to ascertain the formats of data files which are written to or read by the First Program (see question 4 above);
(d) to compare the performance of the Second Program with the First Program for the purpose of investigating reasons why their performances differ and to improve the performance of the Second Program;
(e) to conduct parallel tests of the First Program and the Second Program in order to compare their outputs in the course of developing the Second Program, in particular by running the same test scripts through both the First Program and the Second Program;
(f) to ascertain the output of the log file generated by the First Program in order to produce a log file which is identical or similar in appearance;
(g) to cause the First Program to output data (in fact, data correlating zip codes to States of the USA) for the purpose of ascertaining whether or not it corresponds with official databases of such data, and if it does not so correspond, to program the Second Program so that it will respond in the same way as the First Program to the same input data.
B. On the interpretation of Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society:
8. Where the Manual is protected by copyright as a literary work, is Article 2(a) to be interpreted as meaning that it is an infringement of the copyright in the Manual for the author of the Second Program to reproduce or substantially reproduce in the Second Program any or all of the following matters described in the Manual:
(a) the selection of statistical operations which have been described in the Manual as being implemented in the First Program;
(b) the mathematical formulae used in the Manual to describe those statistical operations;
(c) the particular commands or combinations of commands by which those statistical operations may be invoked;
(d) the options which the author of the First Program has provided in respect of various commands;
(e) the keywords and syntax recognised by the First Program;
(f) the defaults which the author of the First Program has chosen to implement in the event that a particular command or option is not specified by the user;
(g) the number of iterations which the First Program will perform in certain circumstances?
9. Is Article 2(a) to be interpreted as meaning that it is an infringement of the copyright in the Manual for the author of the Second Program to reproduce or substantially reproduce in a manual describing the Second Program the keywords and syntax recognised by the First Program?”

Jurisdiction
It was common ground between counsel that, although there is no direct authority on the point, it appears that the Court of Justice would accept an amendment to questions which had previously been referred by the referring court. The Court of Justice has stated that “national courts have the widest discretion in referring matters”: see Case 166/73 Rheinmühlen Düsseldorf v Einfuhr-und Vorratstelle für Getreide under Futtermittel [1974] ECR 33 at [4]. If an appeal court substitutes questions for those referred by a lower court, the substituted questions will be answered: Case 65/77 Razanatsimba [1977] ECR 2229. Sometimes the Court of Justice itself invites the referring court to clarify its questions, as occurred in Interflora Inc v Marks & Spencer plc (No 2) [2010] EWHC 925 (Ch). In these circumstances, there does not appear to be any reason to think that, if the referring court itself had good reason to amend its questions, the Court of Justice would disregard the amendment.

Counsel for WPL submitted, however, that, as a matter of domestic procedural law, this Court had no jurisdiction to vary an order for reference once sealed unless either there had been a material change of circumstances since the order (as in Interflora) or it had subsequently emerged that the Court had made the order on a false basis. He submitted that neither of those conditions was satisfied here. In those circumstances, the only remedy of a litigant in the position of SAS was to seek to appeal to the Court of Appeal.

As counsel for WPL pointed out, CPR rule 3.1(7) confers on courts what appears to be a general power to vary or revoke their own orders. The proper exercise of that power was considered by the Court of Appeal in Collier v Williams [2006] EWCA Civ 20, [2006] 1 WLR 1945 and Roult v North West Strategic Health Authority [2009] EWCA Civ 444, [2010] 1 WLR 487.

In Collier Dyson LJ (as he then was) giving the judgment of the Court of Appeal said:

“39. We now turn to the third argument. CPR 3.1(7) gives a very general power to vary or revoke an order. Consideration was given to the circumstances in which that power might be used by Patten J in Lloyds Investment (Scandinavia) Limited v Christen Ager-Hanssen [2003] EWHC 1740 (Ch). He said at paragraph 7:
‘The Deputy Judge exercised a discretion under CPR Part 13.3. It is not open to me as a judge exercising a parallel jurisdiction in the same division of the High Court to entertain what would in effect be an appeal from that order. If the Defendant wished to challenge whether the order made by Mr Berry was disproportionate and wrong in principle, then he should have applied for permission to appeal to the Court of Appeal. I have been given no real reasons why this was not done. That course remains open to him even today, although he will have to persuade the Court of Appeal of the reasons why he should have what, on any view, is a very considerable extension of time. It seems to me that the only power available to me on this application is that contained in CPR Part 3.1(7), which enables the Court to vary or revoke an order. This is not confined to purely procedural orders and there is no real guidance in the White Book as to the possible limits of the jurisdiction. Although this is not intended to be an exhaustive definition of the circumstances in which the power under CPR Part 3.1(7) is exercisable, it seems to me that, for the High Court to revisit one of its earlier orders, the Applicant must either show some material change of circumstances or that the judge who made the earlier order was misled in some way, whether innocently or otherwise, as to the correct factual position before him. The latter type of case would include, for example, a case of material non-disclosure on an application for an injunction. If all that is sought is a reconsideration of the order on the basis of the same material, then that can only be done, in my judgment, in the context of an appeal. Similarly it is not, I think, open to a party to the earlier application to seek in effect to re-argue that application by relying on submissions and evidence which were available to him at the time of the earlier hearing, but which, for whatever reason, he or his legal representatives chose not to employ. It is therefore clear that I am not entitled to entertain this application on the basis of the Defendant’s first main submission, that Mr Berry’s order was in any event disproportionate and wrong in principle, although I am bound to say that I have some reservations as to whether he was right to impose a condition of this kind without in terms enquiring whether the Defendant had any realistic prospects of being able to comply with the condition.’
We endorse that approach. We agree that the power given by CPR 3.1(7) cannot be used simply as an equivalent to an appeal against an order with which the applicant is dissatisfied. The circumstances outlined by Patten J are the only ones in which the power to revoke or vary an order already made should be exercised under 3.1(7).”
In Roult Hughes LJ, with whom Smith and Carnwath LJJ agreed, said at [15]:

“There is scant authority upon Rule 3.1(7) but such as exists is unanimous in holding that it cannot constitute a power in a judge to hear an appeal from himself in respect of a final order. Neuberger J said as much in Customs & Excise v Anchor Foods (No 3) [1999] EWHC 834 (Ch). So did Patten J in Lloyds Investment (Scandinavia) Ltd v Ager-Hanssen [2003] EWHC 1740 (Ch). His general approach was approved by this court, in the context of case management decisions, in Collier v Williams [2006] EWCA Civ 20. I agree that in its terms the rule is not expressly confined to procedural orders. Like Patten J in Ager-Hanssen I would not attempt any exhaustive classification of the circumstances in which it may be proper to invoke it. I am however in no doubt that CPR 3.1(7) cannot bear the weight which Mr Grime’s argument seeks to place upon it. If it could, it would come close to permitting any party to ask any judge to review his own decision and, in effect, to hear an appeal from himself, on the basis of some subsequent event. It would certainly permit any party to ask the judge to review his own decision when it is not suggested that he made any error. It may well be that, in the context of essentially case management decisions, the grounds for invoking the rule will generally fall into one or other of the two categories of (i) erroneous information at the time of the original order or (ii) subsequent event destroying the basis on which it was made. The exigencies of case management may well call for a variation in planning from time to time in the light of developments. There may possibly be examples of non-procedural but continuing orders which may call for revocation or variation as they continue – an interlocutory injunction may be one. But it does not follow that wherever one or other of the two assertions mentioned (erroneous information and subsequent event) can be made, then any party can return to the trial judge and ask him to re-open any decision…..”
In the present case there has been no material change of circumstances since I made the Order dated 28 July 2010. Nor did counsel for SAS suggest that I had made the Order upon a false basis. Counsel for SAS did submit, however, that the Court of Appeal had left open the possibility that it might be proper to exercise the power conferred by rule 3.1(7) even if there had no been material change of circumstances and it was not suggested that the order in question had been made on a false basis. Furthermore, he relied upon paragraph 1.1 of the Practice Direction to CPR Part 68, which provides that “responsibility for settling the terms of the reference lies with the English court and not with the parties”. He suggested that this meant that orders for references were not subject to the usual constraints on orders made purely inter partes.

In my judgment PD68 paragraph 1.1 does not justify exercising the power conferred by rule 3.1(7) in circumstances falling outside those identified in Collier and Roult. I am therefore very doubtful that it would be a proper exercise of the power conferred on me by CPR r. 3.1(7) to vary the Order dated 28 July 2010 in the present circumstances. I prefer, however, not to rest my decision on that ground.

Discretion
Counsel for WPL also submitted that, even if this Court had jurisdiction to amend the questions, I should exercise my discretion by refusing to do so for two reasons. First, because the application was made too late. Secondly, because there was no sufficient justification for the amendments anyway. I shall consider these points separately.

Delay
The relevant dates are as follows. The judgment was handed down on 23 July 2010, a draft having been made available to the parties a few days before that. There was a hearing to consider the form of the order, and in particular the wording of the questions to be referred, on 28 July 2010. Prior to that hearing both parties submitted drafts of the questions, and the respective drafts were discussed at the hearing. Following the hearing I settled the Order, and in particular the questions. The Order was sealed on 2 August 2010. The sealed Order was received by the parties between 3 and 5 August 2010. At around the same time the Senior Master of the Queen’s Bench Division transmitted the Order to the Court of Justice. On 15 September 2010 the Registry of the Court of Justice notified the parties, Member States and EU institutions of the reference. On 1 October 2010 the United Kingdom Intellectual Property Office advertised the reference on its website and invited comments by interested parties by 7 October 2010. The latest date on which written observations on the questions referred may be filed at the Court of Justice is 8 December 2010 (two months from the date of the notification plus 10 days extension on account of distance where applicable). This period is not extendable in any circumstances.

As noted above, the application was not issued until 11 October 2010. No justification has been provided by SAS for the delay in making the application. The only explanation offered by counsel for SAS was that the idea of proposing the amendments had only occurred to those representing SAS when starting work on SAS’s written observations.

Furthermore, the application notice requested that the matter be dealt with without a hearing. In my view that was not appropriate: the application was plainly one which was likely to require at least a short hearing. Furthermore, the practical consequence of proceeding in that way was to delay the hearing of the application. The paper application was put before me on 22 October 2010. On the same day I directed that the matter be listed for hearing. In the result it was not listed for hearing until 18 November 2010. If SAS had applied for the matter to be heard urgently, I am sure that it could have been dealt with sooner.

As counsel for WPL submitted, it is likely that the parties, Member States and institutions who intend to file written observations are now at an advanced stage of preparing those observations. Indeed, it is likely that preparations would have been well advanced even on 11 October 2010. To amend the questions at this stage in the manner proposed by SAS would effectively require the Court of Justice to re-start the written procedure all over again. The amended questions would have to be translated into all the EU official languages; the parties, Member States and EU institutions would have to be notified of the amended questions; and the time for submitting written observations would have to be re-set. This would have two consequences. First, a certain amount of time, effort and money on the part of those preparing written observations would be wasted. Secondly, the progress of the case would be delayed. Those are consequences that could have been avoided if SAS had moved promptly after receiving the sealed Order.

In these circumstances, it would not in my judgment be proper to exercise any discretion I may have in favour of amending the questions.

No sufficient justification
Counsel for WPL submitted that in any event SAS’s proposed amendments were not necessary in order to enable the Court of Justice to provide guidance on the issues in this case, and therefore there was no sufficient justification for making the amendments.

Before addressing that submission directly, I think it is worth commenting more generally on the formulation of questions. As is common ground, and reflected in paragraph 1.1 of PD68, it is well established that the questions posed on a reference under Article 267 are the referring court’s questions, not the parties’. The purpose of the procedure is for the Court of Justice to provide the referring court with the guidance it needs in order to deal with the issues before it. It follows that it is for the referring court to decide how to formulate the questions.

In my view it is usually helpful for the court to have the benefit of the parties’ comments on the wording of the proposed questions, as envisaged in paragraph 1.1 of PD68. There are two main reasons for this. The first is to try to ensure that the questions are sufficiently comprehensive to enable all the issues arising to be addressed by the Court of Justice, and thus avoid the need for a further reference at a later stage of the proceedings, as occurred in the Boehringer Ingelheim v Swingward litigation. In that case Laddie J referred questions to the Court of Justice, which were answered in Case C-143/00 [2002] ECR I-3759. The Court of Appeal subsequently concluded, with regret, that the answers to those questions did not suffice to enable it to deal with the case, and referred further questions to the Court of Justice: [2004] EWCA Civ 575, [2004] ETMR 65. Those questions were answered in Case C-348/04 [2007] ECR I-3391. The second main reason is to try to ensure that the questions are clear and free from avoidable ambiguity or obscurity.

In my experience it is not uncommon for parties addressing the court on the formulation of the questions to attempt to ensure that the questions are worded in a leading manner, that is to say, in a way which suggests the desired answer. In my view that is neither proper nor profitable. It is not proper because the questions should so far as possible be impartially worded. It is not profitable because experience shows that the Court of Justice is usually not concerned with the precise wording of the questions referred, but with their legal substance. Thus the Court of Justice frequently reformulates the question in giving its answer.

As counsel for WPL pointed out, and as I have already mentioned, in the present case the parties provided me with draft questions which were discussed at a hearing. In settling the questions I took into account the parties’ drafts and their comments on each other’s drafts, but the final wording is, for better or worse, my own.

As counsel for WPL submitted, at least to some extent SAS’s proposed amendments to the questions appear designed to bring the wording closer to that originally proposed by SAS. This is particularly true of the proposed amendment to question 1. In my judgment it would not be a proper exercise of any discretion that I may have to permit such an amendment, both because it appears to be an attempt by SAS to have the question worded in a manner which it believes favours its case and because its proper remedy if it objected to my not adopting the wording it proposed was to seek to appeal to the Court of Appeal. In saying this, I do not overlook the fact that SAS proposes to move some of the words excised from question 1 to question 5.

In any event, I am not satisfied that any of the amendments are necessary either to enable the parties to present their respective arguments to the Court of Justice or to enable the Court to give guidance on any of the issues arising in this case. On the contrary, I consider that the existing questions are sufficient for these purposes. By way of illustration, I will take the biggest single amendment, which is the proposed insertion of new paragraph (d) in question 2. In my view, the matters referred to in paragraph (d) are matters that are encompassed within paragraphs (b) and/or (c); or at least can be addressed by the parties, and hence the Court of Justice, in the context provided by paragraphs (b) and/or (c). When I put this to counsel for SAS during the course of argument, he accepted it.

Other amendments counsel for SAS himself presented as merely being minor matters of clarification. In my view none of them amount to the elimination of what would otherwise be ambiguities or obscurities in the questions.

It is fair to say that SAS have identified a small typographical error in question 2 (“interpreting” should read “interpreted”), but in my view this is an obvious error which will not cause any difficulty in the proceedings before the Court of Justice.

Conclusion
It was for these reasons that I decided to dismiss SAS’s application

Libre Office

Some ambiguity about Libre Office and why it needed to change from Open Office- just when Open Office seemed so threatening on the desktop

FROM- http://www.documentfoundation.org/faq/

Q: So is this a breakaway project?

A: Not at all. The Document Foundation will continue to be focused on developing, supporting, and promoting the same software, and it’s very much business as usual. We are simply moving to a new and more appropriate organisational model for the next decade – a logical development from Sun’s inspirational launch a decade ago.

Q: Why are you calling yourselves “The Document Foundation”?

A: For ten years we have used the same name – “OpenOffice.org” – for both the Community and the software. We’ve decided it removes ambiguity to have a different name for the two, so the Community is now “The Document Foundation”, and the software “LibreOffice”. Note: there are other examples of this usage in the free software community – e.g. the Mozilla Foundation with the Firefox browser.

Q: Does this mean you intend to develop other pieces of software?

A: We would like to have that possibility open to us in the future…

Q: And why are you calling the software “LibreOffice” instead of “OpenOffice.org”?

A: The OpenOffice.org trademark is owned by Oracle Corporation. Our hope is that Oracle will donate this to the Foundation, along with the other assets it holds in trust for the Community, in due course, once legal etc issues are resolved. However, we need to continue work in the meantime – hence “LibreOffice” (“free office”).

Q: Why are you building a new web infrastructure?

A: Since Oracle’s takeover of Sun Microsystems, the Community has been under “notice to quit” from our previous Collabnet infrastructure. With today’s announcement of a Foundation, we now have an entity which can own our emerging new infrastructure.

Q: What does this announcement mean to other derivatives of OpenOffice.org?

A: We want The Document Foundation to be open to code contributions from as many people as possible. We are delighted to announce that the enhancements produced by the Go-OOo team will be merged into LibreOffice, effective immediately. We hope that others will follow suit.

Q: What difference will this make to the commercial products produced by Oracle Corporation, IBM, Novell, Red Flag, etc?

A: The Document Foundation cannot answer for other bodies. However, there is nothing in the licence arrangements to stop companies continuing to release commercial derivatives of LibreOffice. The new Foundation will also mean companies can contribute funds or resources without worries that they may be helping a commercial competitor.

Q: What difference will The Document Foundation make to developers?

A: The Document Foundation sets out deliberately to be as developer friendly as possible. We do not demand that contributors share their copyright with us. People will gain status in our community based on peer evaluation of their contributions – not by who their employer is.

Q: What difference will The Document Foundation make to users of LibreOffice?

A: LibreOffice is The Document Foundation’s reason for existence. We do not have and will not have a commercial product which receives preferential treatment. We only have one focus – delivering the best free office suite for our users – LibreOffice.

—————————————————————————————————-

Non Microsoft and Non Oracle vendors are indeed going to find it useful the possiblities of bundling a free Libre Office that reduces the total cost of ownership for analytics software. Right now, some of the best free advertising for Microsoft OS and Office is done by enterprise software vendors who create Windows Only Products and enable MS Office integration better than  Open Office integration. This is done citing user demand- but it is a chicken egg dilemma- as functionality leads to enhanced demand. Microsoft on the other hand is aware of this dependence and has made SQL Server and SQL Analytics (besides investing in analytics startups like Revolution Analytics) along with it’s own infrastructure -Azure Cloud Platform/EC2 instances.

Interview Michael J. A. Berry Data Miners, Inc

Here is an interview with noted Data Mining practitioner Michael Berry, author of seminal books in data mining, noted trainer and consultantmjab picture

Ajay- Your famous book “Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management” came out in 2004, and an update is being planned for 2011. What are the various new data mining techniques and their application that you intend to talk about in that book.

Michael- Each time we do a revision, it feels like writing a whole new book. The first edition came out in 1997 and it is hard to believe how much the world has changed since then. I’m currently spending most of my time in the on-line retailing world. The things I worry about today–improving recommendations for cross-sell and up-sell,and search engine optimization–wouldn’t have even made sense to me back then. And the data sizes that are routine today were beyond the capacity of the most powerful super computers of the nineties. But, if possible, Gordon and I have changed even more than the data mining landscape. What has changed us is experience. We learned an awful lot between the first and second editions, and I think we’ve learned even more between the second and third.

One consequence is that we now have to discipline ourselves to avoid making the book too heavy to lift. For the first edition, we could write everything we knew (and arguably, a bit more!); now we have to remind ourselves that our intended audience is still the same–intelligent laymen with a practical interest in getting more information out of data. Not statisticians. Not computer scientists. Not academic researchers. Although we welcome all readers, we are primarily writing for someone who works in a marketing department and has a title with the word “analyst” or “analytics” in it. We have relaxed our “no equations” rule slightly for cases when the equations really do make things easier to explain, but the core explanations are still in words and pictures.

The third edition completes a transition that was already happening in the second edition. We have fully embraced standard statistical modeling techniques as full-fledged components of the data miner’s toolkit. In the first edition, it seemed important to make a distinction between old, dull, statistics, and new, cool, data mining. By the second edition, we realized that didn’t really make sense, but remnants of that attitude persisted. The third edition rectifies this. There is a chapter on statistical modeling techniques that explains linear and logistic regression, naive Bayes models, and more. There is also a brand new chapter on text mining, a curious omission from previous editions.

There is also a lot more material on data preparation. Three whole chapters are devoted to various aspects of data preparation. The first focuses on creating customer signatures. The second is focused on using derived variables to bring information to the surface, and the third deals with data reduction techniques such as principal components. Since this is where we spend the greatest part of our time in our work, it seemed important to spend more time on these subjects in the book as well.

Some of the chapters have been beefed up a bit. The neural network chapter now includes radial basis functions in addition to multi-layer perceptrons. The clustering chapter has been split into two chapters to accommodate new material on soft clustering, self-organizing maps, and more. The survival analysis chapter is much improved and includes material on some of our recent application of survival analysis methods to forecasting. The genetic algorithms chapter now includes a discussion of swarm intelligence.

Ajay- Describe your early career and how you came into Data Mining as a profession. What do you think of various universities now offering MS in Analytics. How do you balance your own teaching experience with your consulting projects at The Data Miners.

Michael- I fell into data mining quite by accident. I guess I always had a latent interest in the topic. As a high school and college student, I was a fan of Martin Gardner‘s mathematical games in in Scientific American. One of my favorite things he wrote about was a game called New Eleusis in which one players, God, makes up a rule to govern how cards can be played (“an even card must be followed by a red card”, say) and the other players have to figure out the rule by watching what plays are allowed by God and which ones are rejected. Just for my own amusement, I wrote a computer program to play the game and presented it at the IJCAI conference in, I think, 1981.

That paper became a chapter in a book on computer game playing–so my first book was about finding patterns in data. Aside from that, my interest in finding patterns in data lay dormant for years. At Thinking Machines, I was in the compiler group. In particular, I was responsible for the run-time system of the first Fortran Compiler for the CM-2 and I represented Thinking Machines at the Fortran 8X (later Fortran-90) standards meetings.

What changed my direction was that Thinking Machines got an export license to sell our first machine overseas. The machine went to a research lab just outside of Paris. The connection machine was so hard to program, that if you bought one, you got an applications engineer to go along with it. None of the applications engineers wanted to go live in Paris for a few months, but I did.

Paris was a lot of fun, and so, I discovered, was actually working on applications. When I came back to the states, I stuck with that applied focus and my next assignment was to spend a couple of years at Epsilon, (then a subsidiary of American Express) working on a database marketing system that stored all the “records of charge” for American Express card members. The purpose of the system was to pick ads to go in the billing envelope. I also worked on some more general purpose data mining software for the CM-5.

When Thinking Machines folded, I had the opportunity to open a Cambridge office for a Virginia-based consulting company called MRJ that had been a major channel for placing Connection Machines in various government agencies. The new group at MRJ was focused on data mining applications in the commercial market. At least, that was the idea. It turned out that they were more interested in data warehousing projects, so after a while we parted company.

That led to the formation of Data Miners. My two partners in Data Miners, Gordon Linoff and Brij Masand, share the Thinking Machines background.

To tell the truth, I really don’t know much about the university programs in data mining that have started to crop up. I’ve visited the one at NC State, but not any of the others.

I myself teach a class in “Marketing Analytics” at the Carroll School of Management at Boston College. It is an elective part of the MBA program there. I also teach short classes for corporations on their sites and at various conferences.

Ajay- At the previous Predictive Analytics World, you took a session on Forecasting and Predicting Subsciber levels (http://www.predictiveanalyticsworld.com/dc/2009/agenda.php#day2-6) .

It seems inability to forecast is a problem many many companies face today. What do you think are the top 5 principles of business forecasting which companies need to follow.

Michael- I don’t think I can come up with five. Our approach to forecasting is essentially simulation. We try to model the underlying processes and then turn the crank to see what happens. If there is a principal behind that, I guess it is to approach a forecast from the bottom up rather than treating aggregate numbers as a time series.

Ajay- You often partner your talks with SAS Institute, and your blog at http://blog.data-miners.com/ sometimes contain SAS code as well. What particular features of the SAS software do you like. Do you use just the Enterprise Miner or other modules as well for Survival Analysis or Forecasting.

Michael- Our first data mining class used SGI’s Mineset for the hands-on examples. Later we developed versions using Clementine, Quadstone, and SAS Enterprise Miner. Then, market forces took hold. We don’t market our classes ourselves, we depend on others to market them and then share in the revenue.

SAS turned out to be much better at marketing our classes than the other companies, so over time we stopped updating the other versions. An odd thing about our relationship with SAS is that it is only with the education group. They let us use Enterprise Miner to develop course materials, but we are explicitly forbidden to use it in our consulting work. As a consequence, we don’t use it much outside of the classroom.

Ajay- Also any other software you use (apart from SQL and J)

Michael- We try to fit in with whatever environment our client has set up. That almost always is SQL-based (Teradata, Oracle, SQL Server, . . .). Often SAS Stat is also available and sometimes Enterprise Miner.

We run into SPSS, Statistica, Angoss, and other tools as well. We tend to work in big data environments so we’ve also had occasion to use Ab Initio and, more recently, Hadoop. I expect to be seeing more of that.

Biography-

Together with his colleague, Gordon Linoff, Michael Berry is author of some of the most widely read and respected books on data mining. These best sellers in the field have been translated into many languages. Michael is an active practitioner of data mining. His books reflect many years of practical, hands-on experience down in the data mines.

Data Mining Techniques cover

Data Mining Techniques for Marketing, Sales and Customer Relationship Management

by Michael J. A. Berry and Gordon S. Linoff
copyright 2004 by John Wiley & Sons
ISB

Mining the Web cover

Mining the Web

by Michael J.A. Berry and Gordon S. Linoff
copyright 2002 by John Wiley & Sons
ISBN 0-471-41609-6

Non-English editions available in Traditional Chinese and Simplified Chinese

This book looks at the new opportunities and challenges for data mining that have been created by the web. The book demonstrates how to apply data mining to specific types of online businesses, such as auction sites, B2B trading exchanges, click-and-mortar retailers, subscription sites, and online retailers of digital content.

Mastering Data Mining

by Michael J.A. Berry and Gordon S. Linoff
copyright 2000 by John Wiley & Sons
ISBN 0-471-33123-6

Non-English editions available in JapaneseItalianTraditional Chinese , and Simplified Chinese

A case study-based guide to applying data mining techniques for solving practical business problems. These “warts and all” case studies are drawn directly from consulting engagements performed by the authors.

A data mining educator as well as a consultant, Michael is in demand as a keynote speaker and seminar leader in the area of data mining generally and the application of data mining to customer relationship management in particular.

Prior to founding Data Miners in December, 1997, Michael spent 8 years at Thinking Machines Corporation. There he specialized in the application of massively parallel supercomputing techniques to business and marketing applications, including one of the largest database marketing systems of the time.

IPSUR – A Free R Textbook

Here is a free R textbook called IPSUR-

http://ipsur.r-forge.r-project.org/book/index.php

IPSUR stands for Introduction to Probability and Statistics Using R, ISBN: 978-0-557-24979-4, which is a textbook written for an undergraduate course in probability and statistics. The approximate prerequisites are two or three semesters of calculus and some linear algebra in a few places. Attendees of the class include mathematics, engineering, and computer science majors.

IPSUR is FREE, in the GNU sense of the word. Hard copies are available for purchase here from Lulu and will be available (coming soon) from the other standard online retailers worldwide. The price of the book is exactly the manufacturing cost plus the retailers’ markup. You may be able to get it even cheaper by downloading an electronic copy and printing it yourself, but if you elect this route then be sure to get the publisher-quality PDF from theDownloads page. And double check the price. It was cheaper for my students to buy a perfect-bound paperback from Lulu and have it shipped to their door than it was to upload the PDF to Fed-Ex Kinkos and Xerox a coil-bound copy (and on top of that go pick it up at the store).

If you are going to buy from anywhere other than Lulu then be sure to check the time-stamp on the copyright page. There is a 6 to 8 week delay from Lulu to Amazon and you may not be getting the absolute latest version available.

Refer to the Installation page for instructions to install an electronic copy of IPSUR on your personal computer. See the Feedback page for guidance about questions or comments you may have about IPSUR.

Also see http://ipsur.r-forge.r-project.org/rcmdrplugin/index.php for the R Cmdr Plugin

This plugin for the R Commander accompanies the text Introduction to Probability and Statistics Using R by G. Jay Kerns. The plugin contributes functions unique to the book as well as specific configuration and functionality to R Commander, the pioneering work by John Fox of McMaster University.

RcmdrPlugin.IPSUR’s primary goal is to provide a user-friendly graphical user interface (GUI) to the open-source and freely available R statistical computing environment. RcmdrPlugin.IPSUR is equipped to handle many of the statistical analyses and graphical displays usually encountered by upper division undergraduate mathematics, statistics, and engineering majors. Available features are comparable to many expensive commercial packages such as Minitab, SPSS, and JMP-IN.

Since the audience of RcmdrPlugin.IPSUR is slightly different than Rcmdr’s, certain functionality has been added and selected error-checks have been disabled to permit the student to explore alternative regions of the statistical landscape. The resulting benefit of increased flexibility is balanced by somewhat increased vulnerability to syntax errors and misuse; the instructor should keep this and the academic audience in mind when usingRcmdrPlugin.IPSUR in the classroom

GNU PSPP- The Open Source SPSS

If you are SPSS user (for statistics/ not data mining) you can also try 0ut GNU PSPP- which is the open source equivalent and quite eerily impressive in performance. It is available at http://www.gnu.org/software/pspp/ or http://pspp.awardspace.com/ and you can also read more at http://en.wikipedia.org/wiki/PSPP

PSPP is a program for statistical analysis of sampled data. It is a Free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions.

[ Image of Variable Sheet ]The most important of these exceptions are, that there are no “time bombs”; your copy of PSPP will not “expire” or deliberately stop working in the future. Neither are there any artificial limits on the number of cases or variables which you can use. There are no additional packages to purchase in order to get “advanced” functions; all functionality that PSPP currently supports is in the core package.

PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands.

A brief list of some of the features of PSPP follows:

  • Supports over 1 billion cases.
  • Supports over 1 billion variables.
  • Syntax and data files are compatible with SPSS.
  • Choice of terminal or graphical user interface.
  • Choice of text, postscript or html output formats.
  • Inter-operates with GnumericOpenOffice.Org and other free software.
  • Easy data import from spreadsheets, text files and database sources.
  • Fast statistical procedures, even on very large data sets.
  • No license fees.
  • No expiration period.
  • No unethical “end user license agreements”.
  • Fully indexed user manual.
  • Free Software; licensed under GPLv3 or later.
  • Cross platform; Runs on many different computers and many different operating systems.

PSPP is particularly aimed at statisticians, social scientists and students requiring fast convenient analysis of sampled data.

and

Features

This software provides a basic set of capabilities: frequencies, cross-tabs comparison of means (T-tests and one-way ANOVA); linear regression, reliability (Cronbach’s Alpha, not failure or Weibull), and re-ordering data, non-parametric tests, factor analysis and more.

At the user’s choice, statistical output and graphics are done in asciipdfpostscript or html formats. A limited range of statistical graphs can be produced, such as histogramspie-charts and np-charts.

PSPP can import GnumericOpenDocument and Excel spreadsheetsPostgres databasescomma-separated values– and ASCII-files. It can export files in the SPSS ‘portable’ and ‘system’ file formats and to ASCII files. Some of the libraries used by PSPP can be accessed programmatically; PSPP-Perl provides an interface to the libraries used by PSPP.

Origins

The PSPP project (originally called “Fiasco”) is a free, open-source alternative to the proprietary statistics package SPSS. SPSS is closed-source and includes a restrictive licence anddigital rights management. The author of PSPP considered this ethically unacceptable, and decided to write a program which might with time become functionally identical to SPSS, except that there would be no licence expiry, and everyone would be permitted to copy, modify and share the program.

Release history

  • 0.7.5 June 2010 http://pspp.awardspace.com/
  • 0.6.2 October 2009
  • 0.6.1 October 2008
  • 0.6.0 June 2008
  • 0.4.0.1 August 2007
  • 0.4.0 August 2005
  • 0.3.0 April 2004
  • 0.2.4 January 2000
  • 0.1.0 August 1998

Third Party Reviews

In the book “SPSS For Dummies“, the author discusses PSPP under the heading of “Ten Useful Things You Can Find on the Internet” [1]. In 2006, the South African Statistical Association presented a conference which included an analysis of how PSPP can be used as a free replacement to SPSS [2].

Citation-

Please send FSF & GNU inquiries to gnu@gnu.org. There are also other ways to contact the FSF. Please send broken links and other corrections (or suggestions) to bug-gnu-pspp@gnu.org.

Copyright © 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc., 51 Franklin St – Suite 330, Boston, MA 02110, USA – Verbatim copying and distribution of this entire article are permitted worldwide, without royalty, in any medium, provided this notice, and the copyright notice, are preserved.

Protected: SAS Institute lawsuit against WPS Episode 2 The Clone Wars

This content is password protected. To view it please enter your password below:

%d bloggers like this: