Calling #Rstats lovers and bloggers – to work together on “The R Programming wikibook”

So you think you like R, huh? Well, it is time to pay it forward.

Message from a dear R blogger, Tal G from Tel Aviv (creator of R-bloggers.com and SAS-X.com)

———————————————————————————————————-
Calling R lovers and bloggers – to work together on “The R Programming wikibook”
Posted: 20 Jun 2011 07:05 AM PDT

This post is a call for both R community members and R-bloggers, to come and help make The R Programming wikibook be amazing:

Dear R community member – please consider giving a visit to The R Programming wikibook. If you wish to contribute your knowledge and editing skills to the project, then you could learn how to write in wiki-markup here, and how to edit a wikibook here (you can even use R syntax highlighting in the wikibook). You could take information into the site from the (soon to be) growing list of available R resources for harvesting.

Dear R blogger, you can help The R Programming wikibook by doing the following:

  • Write to your readers about the project and invite them to join.
  • Add your blog’s R content as an available resource for other editors to use for the wikibook. Here is how to do that:
    • First, make a clear indication on your blog that your content is licensed under cc-by-sa copyrights (*see what it means at the end of the post). You can do this by adding it to the footer of your blog, or by writing a post that clearly states that this is the case (what a great opportunity to write to your readers about the project…).
    • Next, go and add a link, to where all of your R content is located on your site, to the resource page (also with a link to the license post, if you wrote one). For example, since I write about other things besides R, I would give a link to my R category page, and will also give a link to this post. If you do not know how to add it to the wiki, just e-mail me about it (tal.galili@gmail.com).
If you are an R blogger, besides living up to the spirit of the R community, you will benefit from joining this project in that every time someone uses your content on the wikibook, they will add your post as a resource. In the long run, this is likely to help visitors of the site get to know about you and strengthen your site’s SEO ranking. Which reminds me, if you write about this, I always appreciate a link back to my blog.

* Having a cc-by-sa license means that you agree that anyone may copy, distribute, display, and make derivative works based on your content, but only if they give the author (you) credit in the manner specified by you. It also means that the user may distribute derivative works only under a license identical to the license that governs the original work.

———-

Three more points:

1) This post is a result of being contacted by Paul (a.k.a: PAC2), asking if I could help promote “The R Programming wikibook” among R-bloggers and their readers. Paul has made many contributions to the book so far. So thank you Paul for both reaching out and helping all of us with your work on this free open source project.

2) I should also mention that the R wiki exists and is open for contribution. And naturally, everything that will help the R wikibook will help the R wiki as well.

3) Copyright notice: I hereby release all of the written content that is categorized in the R category page, under the cc-by-sa copyrights (date: 20.06.2011). Now it’s your turn!

———-

List of R bloggers who have joined: (This list will get updated as this “group writing” project will progress)

R-statistics blog (that’s Tal…)
Decisionstats.com (That’s me)
……………………………………………………………………………….
3) Copyright notice: I hereby release all of the writing material content of this website, under the cc-by-sa copyrights (date: 21.06.2011). Now it’s your turn!

https://decisionstats.com/privacy-3/

Content Licensing-
This website has all content licensed under
http://creativecommons.org/licenses/by-sa/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work

Heritage offers 3 million chump change for Monkeys

My perspective is that life is not fair, and if someone offers me 1 mill a year so that they can make 1 bill a year, I would still take it, especially if it leads to better human beings and better humanity on this planet. Health care isn’t toothpaste.

Unless there are even more fine-print changes involved, there exist several players in the pharma sector who do build and deploy models internally for denying claims or prospecting medical doctors with freebies, but they might just get caught by the new open data movement.

————————————————————————————————–

A note from KDNuggets-

Heritage Health Prize released a second set of data on May 4. They also recently modified their rules, which now demand complete exclusivity and seem to disallow use of other tools (emphasis mine – Gregory PS)

21. LICENSE
By registering for the Competition, each Entrant (a) grants to Sponsor and its designees a worldwide, exclusive (except with respect to Entrant) , sub-licensable (through multiple tiers), transferable, fully paid-up, royalty-free, perpetual, irrevocable right to use, not use, reproduce, distribute (through multiple tiers), create derivative works of, publicly perform, publicly display, digitally perform, make, have made, sell, offer for sale and import the entry and the algorithm used to produce the entry, as well as any other algorithm, data or other information whatsoever developed or produced at any time using the data provided to Entrant in this Competition (collectively, the “Licensed Materials”), in any media now known or hereafter developed, for any purpose whatsoever, commercial or otherwise, without further approval by or payment to Entrant (the “License”) and
(b) represents that he/she/it has the unrestricted right to grant the License. 
Entrant understands and agrees that the License is exclusive except with respect to Entrant: Entrant may use the Licensed Materials solely for his/her/its own patient management and other internal business purposes but may not grant or otherwise transfer to any third party any rights to or interests in the Licensed Materials whatsoever.

This has led to a call to boycott the competition by Tristan, who also notes that academics cannot publish their results without prior written approval of the Sponsor.

Anthony Goldbloom, CEO of Kaggle, emailed the HHP participants on May 4

HPN have asked me to pass on the following message: “The Heritage Provider Network is sponsoring the Heritage Health Prize to spur innovation and creative thinking in healthcare. HPN, however, is a medical group and must retain an exclusive license to the algorithms created using its data so as to ensure that the algorithms are used responsibly, and are only used to provide better health care to patients and not for improper purposes.
Put simply, while the competition hopes to spur innovation, this is not a competition regarding movie ratings or chess results. We hope that the clarifications we have made to the Rules and the FAQ adequately address your concerns and look forward to your participation in the competition.”

What do you think? Will the exclusive license prevent you from participating?

Analyzing Conversations on Twitter

If you are a marketing, analyst relations, public relations, or product manager who uses or abuses social media, you sometimes need to track what influencers and analysts are saying. A tool called Bettween allows you to capture public conversations between two influential (or interesting) tweeps.

See conversations between Neil Raden http://www.beyeblogs.com/raden/ and Curt Monash http://www.dbms2.com/, two noted BI gurus:

http://bettween.com/neilraden/curtmonash

  • @NEILRADEN: 66
  • @CURTMONASH: 61
  • TOTAL MESSAGES: 127


unless Google decides to license its Wave technology to Twitter for separate encrypted or public tweets. 🙂 They do share some history and employees (cough cough). Or Twitter waits to create or better its public/protected tweet mode to be more granular.

http://bettween.com/neilraden/curtmonash#statistics
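As a rough sketch of the kind of summary that statistics page gives, the two per-person counts above (66 and 61 messages, 127 in total) can be turned into shares of the conversation. The function name `message_shares` is made up for illustration; it is not part of Bettween or of any Twitter API.

```python
def message_shares(counts):
    """Return each participant's fraction of all messages in a conversation."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero for an empty conversation.
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


# Counts as shown on the Bettween page for this conversation.
counts = {"neilraden": 66, "curtmonash": 61}
shares = message_shares(counts)

for name, share in shares.items():
    print(f"@{name}: {share:.1%} of {sum(counts.values())} messages")
```

The same function works for any two (or more) handles once you have the raw counts, however you collect them.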

tools to analyze Twitter conversations in SAS

What to do if you see a possible GPL violation


Well, I have played with software, mostly (but not exclusively) analytical, and I admire the zeal and energy of both open source and closed source practitioners: all relatively decent people, either executing the strategies their investors or owners tell them to (closed source) or motivated by their own sense of cool, change-the-world openness (open source).

What I don’t get is people stealing open source code, repackaging it without adding major contributions, claiming patent-pending stuff, and basically making money by creating CLOSED source from open source software (as open source is yet to break the enterprise glass ceiling).

You are either open source or you aren’t.

Bisexuality is okay. Bi-codability is not.

Next time you see someone stealing some community’s open source code- refer to this excellent link.

 

http://www.gnu.org/licenses/gpl-violation.html

Violations of the GNU Licenses

If you think you see a violation of the GNU GPL, LGPL, AGPL, or FDL, the first thing you should do is double-check the facts:

  • Does the distribution contain a copy of the License?
  • Does it clearly state which software is covered by the License? Does it say anything misleading, perhaps giving the impression that something is covered by the License when in fact it is not?
  • Is source code included in the distribution?
  • Is a written offer for source code included with a distribution of just binaries?
  • Is the available source code complete, or is it designed for linking in other non-free modules?

If there seems to be a real violation, the next thing you need to do is record the details carefully:

  • the precise name of the product
  • the name of the person or organization distributing it
  • email addresses, postal addresses and phone numbers for how to contact the distributor(s)
  • the exact name of the package whose license is violated
  • how the license was violated:
    • Is the copyright notice of the copyright holder included?
    • Is the source code completely missing?
    • Is there a written offer for source that’s incomplete in some way? This could happen if it provides a contact address or network URL that’s somehow incorrect.
    • Is there a copy of the license included in the distribution?
    • Is some of the source available, but not all? If so, what parts are missing?

The more of these details that you have, the easier it is for the copyright holder to pursue the matter.

Once you have collected the details, you should send a precise report to the copyright holder of the packages that are being misused. The copyright holder is the one who is legally authorized to take action to enforce the license.

If the copyright holder is the Free Software Foundation, please send the report to <license-violation@gnu.org>. It’s important that we be able to write back to you to get more information about the violation or product. So, if you use an anonymous remailer, please provide a return path of some sort. If you’d like to encrypt your correspondence, just send a brief mail saying so, and we’ll make appropriate arrangements.

Note that the GPL, and other copyleft licenses, are copyright licenses. This means that only the copyright holders are empowered to act against violations. The FSF acts on all GPL violations reported on FSF copyrighted code, and we offer assistance to any other copyright holder who wishes to do the same.

But, we cannot act on our own if we do not hold copyright. Thus, be sure to find out who the copyright holders of the software are before reporting a violation.
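To make the checklist above a bit more concrete, here is a minimal sketch of a record for the details the GNU page asks you to collect before writing to the copyright holder. The class `ViolationReport`, its field names, and the sample values are all hypothetical and chosen only for illustration; the FSF does not prescribe any such format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ViolationReport:
    """Hypothetical record of the details listed on the GNU violations page."""
    product_name: str = ""
    distributor: str = ""
    distributor_contact: str = ""  # email, postal address or phone number
    package_name: str = ""
    copyright_holder: str = ""     # the party that can actually enforce the license
    # How the license was violated; None means "not checked yet".
    copyright_notice_included: Optional[bool] = None
    source_code_included: Optional[bool] = None
    written_offer_complete: Optional[bool] = None
    license_copy_included: Optional[bool] = None
    notes: str = ""                # free text, e.g. which parts of the source are missing

    def unanswered(self):
        """Return the names of the details that are still empty or unchecked."""
        return [
            f.name
            for f in fields(self)
            if f.name != "notes" and getattr(self, f.name) in ("", None)
        ]


# Made-up example values, not a real product or package.
report = ViolationReport(
    product_name="ExampleBox",
    package_name="examplepkg",
    source_code_included=False,
)
print("Still to find out:", ", ".join(report.unanswered()))
```

The more of these you can fill in, the shorter `unanswered()` gets, and per the page above the easier it is for the copyright holder to pursue the matter.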

 

PSPP – SPSS’s Open Source Counterpart


New website for Windows installers for PSPP – try it in your own time if you are dedicated to either SPSS or free statistical computing.

http://pspp.awardspace.com/

This page is intended to give a stable root for downloading the PSPP-for-Windows setup from free mirrors.

Highlights of the current PSPP-for-Windows setup
PSPP info:

Current version: Master version = 0.7.6
Release date: See filenames
Information about PSPP: http://www.gnu.org/software/pspp
PSPP Manual: PDF or HTML
(current version will be installed on your PC by the installer package)
Package info:

Windows version: Windows XP and newer
Package Size: 15 Mb
Size on disk: 34 Mb
Technical: MinGW based
Cross-compiled on openSUSE 11.3

Downloads:
There are issues with the latest build. Some users report crashes on their systems; on other systems it works fine.

Multi-user installer: administrator privileges required. Recommended version.
Single-user installer: no administrator privileges required.

Version                          Multi-user installer    Single-user installer
0.7.6-g38ba1e-blp-build20101116  PSPP-Master-2010-11-16  PSPP-Master-single-user-2010-11-16
0.7.5-g805e7e-blp-build20100908  PSPP-Master-2010-09-08  PSPP-Master-single-user-2010-09-08
0.7.5-g7803d3-blp-build20100820  PSPP-Master-2010-08-20  PSPP-Master-single-user-2010-08-20
0.7.5-g333ac4-blp-build20100727  PSPP-Master-2010-07-27  PSPP-Master-single-user-2010-07-27

 

Sources can be found here.

Also see http://en.wikipedia.org/wiki/PSPP

At the user’s choice, statistical output and graphics are done in ASCII, PDF, PostScript or HTML formats. A limited range of statistical graphs can be produced, such as histograms, pie-charts and np-charts.

PSPP can import Gnumeric, OpenDocument and Excel spreadsheets, Postgres databases, comma-separated values and ASCII files. It can export files in the SPSS ‘portable’ and ‘system’ file formats and to ASCII files. Some of the libraries used by PSPP can be accessed programmatically; PSPP-Perl provides an interface to the libraries used by PSPP.

and

http://www.gnu.org/software/pspp/

A brief list of some of the features of PSPP follows:

  • Supports over 1 billion cases.
  • Supports over 1 billion variables.
  • Syntax and data files are compatible with SPSS.
  • Choice of terminal or graphical user interface.
  • Choice of text, postscript or html output formats.
  • Inter-operates with Gnumeric, OpenOffice.Org and other free software.
  • Easy data import from spreadsheets, text files and database sources.
  • Fast statistical procedures, even on very large data sets.
  • No license fees.
  • No expiration period.
  • No unethical “end user license agreements”.
  • Fully indexed user manual.
  • Free Software; licensed under GPLv3 or later.
  • Cross platform; Runs on many different computers and many different operating systems.

 

WPS Version 2.5.1 Released – can still run SAS language/data and R

However, this is what Phil Rack, the reseller, is quoting on http://www.minequest.com/Pricing.html

Windows Desktop Price: $884 on 32-bit Windows and $1,149 on 64-bit Windows.

The Bridge to R is available on the Windows platforms and is available for free to customers who license WPS through MineQuest, LLC. Companies and organizations outside of North America may purchase a license for the Bridge to R which starts at $199 per desktop or $599 per server.

Windows Server Price: $1,903 per logical CPU for 32-bit and $2,474 for 64-bit.

Note that Linux server versions are available but do not yet support the Eclipse IDE and are
command line only
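To get a feel for what these quotes add up to, here is a back-of-the-envelope sketch using only the four list prices above. It assumes (and this is my assumption, not MineQuest’s stated policy) that cost scales linearly with the number of desktops or logical CPUs, with no volume discounts and without the Bridge to R; `wps_list_price` is a hypothetical helper name.

```python
# List prices in USD as quoted above: (product kind, Windows bitness) -> price per unit.
# A "unit" is one desktop for the desktop product and one logical CPU for the server.
PRICES_USD = {
    ("desktop", 32): 884,
    ("desktop", 64): 1149,
    ("server", 32): 1903,
    ("server", 64): 2474,
}


def wps_list_price(kind, bits, units=1):
    """Estimate the total list price, assuming simple linear scaling per unit."""
    if units < 1:
        raise ValueError("units must be at least 1")
    return PRICES_USD[(kind, bits)] * units


# Example: a 64-bit Windows server with 4 logical CPUs.
print(f"${wps_list_price('server', 64, units=4):,}")
```

Treat the result as a ceiling for a conversation with the reseller rather than a quote.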

WPS sure seems to be going well, but their pricing is no longer fixed, and on the home website you gotta fill out a form. Ditto for the 30-day free evaluation.

http://www.teamwpc.co.uk/products/wps/modules/core

Data File Formats

The table below provides a summary of data formats presently supported by the WPS Core module.

Data File Format | Un-Compressed Data (Read, Write) | Compressed Data (Read, Write)
SD2 (SAS version 6 data set)
SAS7BDAT (SAS version 7 data set)
SAS7BDAT (SAS version 8 data set)
SAS7BDAT (SAS version 9 data set)
SASSEQ (SAS version 8/9 sequential file)
V8SEQ (SAS version 8 sequential file)
V9SEQ (SAS version 9 sequential file)
WPD (WPS native data set)
WPDSEQ (WPS native sequential file)
XPORT (transport format)

Additional access to EXCEL, SPSS and dBASE files is supported by utilising the WPS Engine for DB Files module.

And they had a new product release on Valentine’s Day 2011 (oh, these Europeans!)

From the press release at http://www.teamwpc.co.uk/press/wps2_5_1_released

WPS Version 2.5.1 Released 

New language support, new data engines, larger datasets, improved scalability

LONDON, UK – 14 February 2011 – World Programming today released version 2.5.1 of their WPS software for workstations, servers and mainframes.

WPS is a competitively priced, high performance, highly scalable data processing and analytics software product that allows users to execute programs written in the language of SAS. WPS is supported on a wide variety of hardware and operating system platforms and can connect to and work with many types of data with ease. The WPS user interface (Workbench) is frequently praised for its ease of use and flexibility, with the option to include numerous third-party extensions.

This latest version of the software has the ability to manipulate even greater volumes of data, removing the previous 2^31 (2 billion) limit on number of observations.

Complementing extended data processing capabilities, World Programming has worked hard to boost the performance, scalability and reliability of the WPS software to give users the confidence they need to run heavy workloads whilst delivering maximum value from available computer power.

WPS version 2.5.1 offers additional flexibility with the release of two new data engines for accessing Greenplum and SAND databases. WPS now comes with eleven data engines and can access a huge range of commonly used and industry-standard file-formats and databases.

Support in WPS for the language of SAS continues to expand with more statistical procedures, data step functions, graphing controls and many other language items and options.

WPS version 2.5.1 is available as a free upgrade to all licensed users of WPS.

Summary of Main New Features:

  • Supporting Even Larger Datasets
    WPS is now able to process very large data sets by lifting completely the previous size limit of 2^31 observations.
  • Performance and Scalability Boosted
    Performance and scalability improvements across the board combine to ensure even the most demanding large and concurrent workloads are processed efficiently and reliably.
  • More Language Support
    WPS 2.5.1 continues the expansion of its language support with over 70 new language items, including new Procedures, Data Step functions and many other language items and options.
  • Statistical Analysis
    The procedure support in WPS Statistics has been expanded to include PROC CLUSTER and PROC TREE.
  • Graphical Output
    The graphical output from WPS Graphing has been expanded to accommodate more configurable graphics.
  • Hash Tables
    Support is now provided for hash tables.
  • Greenplum®
    A new WPS Engine for Greenplum provides dedicated support for accessing the Greenplum database.
  • SAND®
    A new WPS Engine for SAND provides dedicated support for accessing the SAND database.
  • Oracle®
    Bulk loading support now available in the WPS Engine for Oracle.
  • SQL Server®
    To enhance existing SQL Server database access, a new SQLSERVR (please note spelling) facility in the ODBC engine.

More Information:

Existing Users should visit www.teamwpc.co.uk/support/wps/release where you can download a readme file containing more information about all the new features and fixes in WPS 2.5.1.

New Users should visit www.teamwpc.co.uk/products/wps where you can explore in more detail all the features available in WPS or request a free evaluation.

and from http://www.teamwpc.co.uk/products/wps/data it seems they are going on the BIG DATA submarine as well-

Data Support 

Extremely Large Data Size Handling

WPS is now able to handle extremely large data sets now that the previous limit of 2^31 observations has been lifted.
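For a sense of scale, 2^31 is 2,147,483,648, which is where the "2 billion" figure comes from. The sketch below assumes (the press release does not say so) that the old ceiling came from counting observations in a signed 32-bit integer, whose largest value is 2^31 - 1. `fits_in_int32` is a hypothetical helper, not a WPS function.

```python
import struct

OLD_LIMIT = 2 ** 31  # the previous ceiling on the number of observations


def fits_in_int32(n):
    """Return True if n can be stored in a signed 32-bit integer."""
    try:
        struct.pack("<i", n)  # "<i" is a little-endian signed 32-bit int
        return True
    except struct.error:
        return False


print(f"2^31 = {OLD_LIMIT:,}")
print("2^31 - 1 fits in int32:", fits_in_int32(OLD_LIMIT - 1))
print("2^31 fits in int32:", fits_in_int32(OLD_LIMIT))
```

A 64-bit counter would move that ceiling far beyond any data set you are likely to load, which is consistent with the limit being described as lifted completely.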

Access Standard Databases

Use I/O Features in WPS Core

  • CLIPBOARD (Windows only)
  • DDE (Windows only)
  • EMAIL (via SMTP or MAPI)
  • FTP
  • HTTP
  • PIPE (Windows and UNIX only)
  • SOCKET
  • STDIO
  • URL

Use Standard Data File Formats

SAS to R Challenge: Unique benchmarking


An interesting announcement from Revolution Analytics promises to convert your legacy SAS-language code not only cheaper but faster. It’s a very, very interesting challenge, and I wonder how SAS users, corporates, customers, as well as the Institute itself will react.

http://www.revolutionanalytics.com/sas-challenge/

Take the SAS to R Challenge

Are you paying for expensive software licenses and hardware to run time-consuming statistical analyses on big data sets?

If you’re doing linear regressions, logistic regressions, predictions, or multivariate crosstabulations* there’s something you should know: Revolution Analytics can get the same results for a substantially lower cost and faster than SAS®.

For a limited time only, Revolution Analytics invites you to take the SAS to R Challenge. Let us prove that we can deliver on our promise of replicating your results in R, faster and cheaper than SAS.

Take the challenge

Here’s how it works:

Fill out the short form below, and one of our conversion experts will contact you to discuss the SAS code you want to convert. If we think Revolution R Enterprise can get the same results faster than SAS, we’ll convert your code to R free of charge. Our goal is to demonstrate that Revolution R Enterprise will produce the same results in less time. There’s no obligation, but if you choose to convert, we guarantee that your license cost for Revolution R Enterprise will be less than half what you’re currently paying for the equivalent SAS software.**

It’s that simple.

We’ll show you that you don’t need expensive hardware and software to do high quality statistical analysis of big data. And we’ll show that you don’t need to tie up your computing resources with long running operations. With Revolution R Enterprise, you can run analyses on commodity hardware using Linux or Windows, scale to terabyte-class data problems and do it at processing speeds you would never have thought possible.

Sign up now, and we will be in touch shortly.

 

—————————-

SAS is a registered trademark of the SAS Institute, Cary, NC, in the US and other countries.

*Additional statistical algorithms are being rapidly added to Revolution R Enterprise. Custom development services are also available.

**Revolution Analytics retains the right to determine eligibility for this offer. Offer available until March 31, 2011.