Open Source and Software Strategy

Curt Monash at Monash Research pointed out some ongoing open source GPL issues around WordPress and the Thesis theme (also see http://ma.tt/2009/04/oracle-and-open-source/ and http://www.mattcutts.com/blog/switching-things-around/).

As a user of both for upwards of 2 years, I believe open source and GPL license enforcement are general parts of the software strategy of most software companies nowadays. Some thoughts on open source and software strategy follow below. For context, Thesis remains a very popular theme and has earned well upwards of $100,000 for its creator (20k-plus installs at a $60 average price works out to roughly $1.2 million).
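
The Thesis estimate is simple multiplication and easy to check. Here is a minimal sketch in Python. It assumes (my assumption, not a reported figure) that every one of the 20k installs was one sale at the $60 average price; the variable names are just for illustration.

```python
# Back-of-envelope check of the Thesis revenue estimate.
installs = 20_000      # "20k plus installs", the figure quoted in the post
avg_price_usd = 60     # "$60 avg price", the figure quoted in the post

# Assumption: each install corresponds to exactly one paid sale.
estimated_revenue_usd = installs * avg_price_usd

# More cautious assumption: only one install in ten was a paid sale.
cautious_revenue_usd = (installs // 10) * avg_price_usd

print(f"All installs paid:  ${estimated_revenue_usd:,}")
print(f"One in ten paid:    ${cautious_revenue_usd:,}")
```

Even under the cautious one-in-ten assumption the total stays above the $100,000 mark.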

  • Little guys like to give away code to get some satisfaction and recognition; big guys give away free code only when it’s necessary or when they are not making money in that product segment anyway.
  • As Ethan Hunt said, “Every hero needs a villain”. Every software (market share) war needs one big company holding most of the market share and another player pursuing an open source strategy: unable to create the code in house, it effectively outsources the work by creating an open source project. But the same open source proponent rarely gives away the secret to its own money-making product.
    • Examples: Google creates open source Android, but won’t reveal the secret search algorithm that drives its main profits,
    • Google again puts a paper for MapReduce but it’s Yahoo that champions Hadoop,
    • Apple creates open source projects (http://www.apple.com/opensource/) but won’t give away its operating system source code (why?), which helps people buy its more expensive hardware,
    • IBM, who helped kickstart the whole proprietary code thing (remember MS DOS), is the new champion of open source (http://www.ibm.com/developerworks/opensource/), and
    • Microsoft continues to spark open source debate but read http://blogs.technet.com/b/microsoft_blog/archive/2010/07/02/a-perspective-on-openness.aspx and  also http://www.microsoft.com/opensource/
    • SAS gives away a lot of open source code (read Jim Davis, CMO of SAS), but will stick to Base SAS code (even though it seems to be making more money through its focus on verticals and data mining).
    • SPSS was the first big analytics company to support R (open source stats software) but will cling to its own code in its software.
    • WordPress.org gives away its software as open source (and I like Akismet just as much as the blogging), but anyone who is on WordPress.com knows how locked in you can get by its (pricey) platform.
    • Vendor lock-in (wink wink, price escalation) is the elephant in the room for big proprietary software companies.
    • SLA quality, maintenance and IP safety are the main uh-ohs when going in for open source software.
  • Lack of IP protection for revenue models is the big bottleneck to open sourcing code for a lot of companies, even though very few software users know what to do with source code if you give it to them anyway.
    • If companies were confident that they would still be earning same revenue and there would be less leakage or theft, they would gladly give away the source code.
    • Derivative software or extensions help popularize the original software.
      • Halfway steps like Facebook Applications (Facebook being the original big company to create a platform for third-party creators),
      • iPhone apps and Android applications show the success of creating APIs that help protect IP and software control while still giving some freedom to developers,
      • Alternate user interfaces to R in both SAS/IML and JMP are a similar example.
  • Basically open source is mostly done by the underdog while the top dog mostly rakes in the money (and the envy).
  • There is yet to be a big commercial success in open source software, though there is very good open source software. Just as Google’s success helped establish advertising as an alternate (and now dominant) revenue source for online companies, open source needs a big example of a company that made billions while giving source code away and still retaining control and direction of software strategy.
  • Open source people love to hate proprietary packages, yet there are more shades of grey (than black and white) and hypocrisy (read lies) within the open source software movement than in the regulated world of big software. People will still be people. Software is just a piece of code. 😉

(Art citation: http://gapingvoid.com/about/ and http://gapingvoidgallery.com/)

Top 10 Graphical User Interfaces in Statistical Software

Here is a list of the top 10 GUIs in statistical software. The overall criteria are:

  • User-friendliness for a new user to begin to point, click and learn.
  • Cleanliness of the automated code or log generated.
  • Practical application in consulting and the corporate world.
  • Cost and ease of ownership (including purchase, install, training, maintainability, renewal)
  • Aesthetics (or just plain pretty)

However, this list is not in order of ranking (as beauty (of a GUI) lies in the eyes of the beholder). For a list of the top 10 GUIs for the R language only, please see:

https://rforanalytics.wordpress.com/graphical-user-interfaces-for-r/

This is only a GUI-based list, so it excludes notable software that is driven from the command line or by submitting commands from a text editor, which is also very powerful and user friendly.

1. JMP

While critics of SAS Institute often complain about the premium pricing of the basic model (especially AFTER the entry of another SAS language software, WPS, from http://www.teamwpc.co.uk/products/wps), they should try out JMP from http://jmp.com. It has a 1 month free evaluation, is much less expensive, and the GUI makes it very easy to do basic statistical analysis and testing. It is surprisingly quick to pick up (as it should be for a well designed interface) and it produces very good quality output graphics as well.

2. SPSS

The original GUI in this class of software, it has now expanded to a big portfolio of products. SPSS 18 is nice, with its increasing focus on Python, and SPSS was an early adopter of R compatible interfaces. It also offers a much more affordable solution, with a free evaluation. See especially http://www.spss.com/statistics/ and http://www.spss.com/software/modeling/modeler-pro/

The screenshot here is of SPSS Modeler.

3. WPS

While it offers an alternative to Base SAS and SAS/Access software, I really like the affordability (1 month free evaluation and overall lower cost, especially for multiple CPU servers), the speed (on the desktop but not on the IBM OS version) and the intuitive design as well as extensibility of the Workbench. It may look like an integrated development environment and not a proper GUI, but with all the menu features it does qualify as a GUI in my opinion.

Using Red R- R with a Visual Interface

For people complaining about the GUI on R, here is the, ah, Enterprise Version of R called Red R.

It is available at http://www.red-r.org/

 

You can read more there or just go through the short video created by them at

Basically it is a point-and-click method of using R, with the ability to store schemas, and thus very good for repeatable operations as well.


Not bad for epic software, huh?

Portrait of a Lady

That’s a screenshot of Danese Cooper’s Wikipedia page. Danese was fired without severance by the Intel Capital Series B investors at http://www.reolution-computing.com. If this is what you get after a lifetime of working in open source, maybe I should recommend that people get a job with Prof Jim Goodnight, PhD, who rarely fires people and has managed to steer his company profitably without an IPO or Series Z funding.

On the other hand, I kind of admire ladies trying to work in software companies. They are so few, and they look up to people like Danese to say that yes, they can make it big too.

Good bye Danese. May your big heart rest in peace on your blog http://danesecooper.blogs.com/.

Red R- A new beginning

Check out an interesting new interface to R.

Note: I haven’t tested it yet but plan to do so shortly, as I am using Ubuntu 9 almost exclusively nowadays.

R fans who are not quite overjoyed with the wonderful beauty and charm of the traditional R GUI may want to give it a try.

Citation-

http://code.google.com/p/r-orange/

Note: This website does not assume responsibility for any software glitches, as R comes with no warranty, unlike other software that comes loaded with both a warranty and then bug-fix patches.

R releases new version R 2.9.2

What is new in 2.9.2 (technical details, not marketing spit and shine),

what didn’t work in 2.9.1 (shockingly, bugs are fixed openly!!)

NEW FEATURES

    o   install.packages(NULL) now lists packages only once even if they
        occur in more than one repository (as the latest compatible
        version of those available will always be downloaded).

    o   approxfun() and approx() now accept a 'rule' of length two, for
        easy specification of different interpolation rules on left and
        right.

        They no longer segfault for invalid zero-length specification
        of 'yleft', 'yright', or 'f'.

    o   seq_along(x) is now equivalent to seq_len(length(x)) even where
        length() has an S3/S4 method; previously it (intentionally)
        always used the default method for length().

    o   PCRE has been updated to version 7.9 (for bug fixes).

    o   agrep() uses 64-bit ints where available on 32-bit platforms
        and so may do a better job with complex matches.
        (E.g. PR#13789, which failed only on 32-bit systems.)

DEPRECATED & DEFUNCT

    o   R CMD Rd2txt is deprecated, and will be removed in 2.10.0.
        (It is just a wrapper for R CMD Rdconv -t txt.)

    o   tools::Rd_parse() is deprecated and will be removed in 2.10.0
        (which will use only Rd version 2).

BUG FIXES

    o   parse_Rd() still did not handle source reference encodings
        properly.

    o   The C utility function PrintValue no longer attempts to print
        attributes for CHARSXPs as those attributes are used
        internally for the CHARSXP cache.  This fixes a segfault when
        calling it on a CHARSXP from C code.

    o   PDF graphics output was producing two instances of anything
        drawn with the symbol font face. (Report from Baptiste Auguie.)

    o   length(x) <- newval and grep() could cause memory corruption.
        (PR#13837)

    o   If model.matrix() was given too large a model, it could crash
        R. (PR#13838, fix found by Olaf Mersmann.)

    o   gzcon() (used by load()) would re-open an open connection,
        leaking a file descriptor each time. (PR#13841)

    o   The checks for inconsistent inheritance reported by setClass()
        now detect inconsistent superclasses and give better warning
        messages.

    o   print.anova() failed to recognize the column labelled
        P(>|Chi|) from a Poisson/binomial GLM anova as a p-value
        column in order to format it appropriately (and as a
        consequence it gave no significance stars).

    o   A missing PROTECT caused rare segfaults during calls to
        load().  (PR#13880, fix found by Bill Dunlap.)

    o   gsub() in a non-UTF-8 locale with a marked UTF-8 input
        could in rare circumstances overrun a buffer and so segfault.

    o   R CMD Rdconv --version was not working correctly.

    o   Missing PROTECTs in nlm() caused "random" errors. (PR#13381 by
        Adam D.I. Kramer, analysis and suggested fix by Bill Dunlap.)

    o   Some extreme cases of pbeta(log.p = TRUE) are more accurate
        (finite values < -700 rather than -Inf).  (PR#13786)

        pbeta() now reports on more cases where the asymptotic
        expansions lose accuracy (the underlying TOMS708 C code was
        ignoring some of these, including the PR#13786 example).

    o   new.env(hash = TRUE, size = NA) now works the way it has been
        documented to for a long time.

    o   tcltk::tk_choose.files(multi = TRUE) produces better-formatted
        output with filenames containing spaces.  (PR#13875)

    o   R CMD check --use-valgrind did not run valgrind on the package
        tests.

    o   The tclvalue() and the print() and as.xxx methods for class
        "tclObj" crashed R with an invalid object -- seen with an
        object saved from an earlier session.

    o   R CMD BATCH garbled options -d <debugger> (useful for
        valgrind, although --debugger=valgrind always worked)

    o   INSTALL with LazyData and Encoding declared in DESCRIPTION
        might have left options("encoding") set for the rest of the
        package installation.

And from www.r-project.org, the remaining updated news:
  • R version 2.9.2 has been released on 2009-08-24. The source code will first become available in this directory, and eventually via all of CRAN. Binaries will arrive in due course (see download instructions above).
  • The first issue of The R Journal is now available
  • The R Foundation has been awarded four slots for R projects in the Google Summer of Code 2009.
  • DSC 2009, The 6th workshop on Directions in Statistical Computing, has been held at the Center for Health and Society, University of Copenhagen, Denmark, July 13-14, 2009.
  • useR! 2009, the R user conference, has been held at Agrocampus Rennes, France, July 8-10, 2009.
  • useR! 2010, the R user conference, will be held at NIST, Gaithersburg, Maryland, USA, July 21-23, 2010.
  • We have started to collect information about local UseR Groups in the R Wiki.

Citation – http://www.r-project.org

Making Government Transparent Using R

Here is a terrific interview on O’Reilly Radar at http://radar.oreilly.com/2009/07/making-government-transparent.html

It actually talks about using open source statistics software like R to make government more transparent, for example by analyzing waste.

Some interesting extracts follow. For instance, I didn’t know S is being maintained by SAS (I thought Tibco had S Plus).

Citation-http://radar.oreilly.com/2009/07/making-government-transparent.html

James Turner: So switching gears, the other thing you’re talking about and a big part of your professional life is the R language. Now I will confess that like Erlang, R is something that is on my radar and I see and I look at it and I say, “Okay. When am I ever going to use it?” I mean Erlang is used some places, but R I guess has a very nichey type of audience, doesn’t it?

Danese Cooper: You know, interestingly enough that’s changing. I think that’s been true. R has been in production or in development, let’s say, for the last 20 years. It is patterned after the S language, which was developed in the ’60s at Bell Labs around the same time that UNIX and C were being developed. And it was S for statistics, right? R is sort of a, “If we had known then what we know now” version of S. They’ve been working on it for 20 years in an academic setting. So it has been very slow to grow. But just in the last couple of years, it’s really gotten to a place where it’s ready for enterprise use. And just this year, the people that maintain S, a company called SAS, S-a-s, in South America, south of this country, have announced that they’re going to have to support R, like it’s that widely used now, particularly in schools.

Danese Cooper works for Revolution Computing, which creates a wonderful and professional version of R called Revolution R. Some of the work on parallelization and on enabling 64 bit Windows R is great. Danese also has solid open source credentials, having worked with the Board and also with Apache. O’Reilly Media’s work in open source conferences is terrific as well.

That apart, the great stuff is in the rest of this must-read interview, which is available at http://radar.oreilly.com/2009/07/making-government-transparent.html