SAS to R Challenge: Unique benchmarking


An interesting announcement from Revolution Analytics promises to convert your legacy SAS code to R, at a lower cost and with faster run times. It’s a very interesting challenge, and I wonder how SAS users, corporate customers, and the Institute itself will react.

http://www.revolutionanalytics.com/sas-challenge/

Are you paying for expensive software licenses and hardware to run time-consuming statistical analyses on big data sets?

If you’re doing linear regressions, logistic regressions, predictions, or multivariate crosstabulations* there’s something you should know: Revolution Analytics can get the same results for a substantially lower cost and faster than SAS®.


For a limited time only, Revolution Analytics invites you to take the SAS to R Challenge. Let us prove that we can deliver on our promise of replicating your results in R, faster and cheaper than SAS.

Here’s how it works:

Fill out the short form below, and one of our conversion experts will contact you to discuss the SAS code you want to convert. If we think Revolution R Enterprise can get the same results faster than SAS, we’ll convert your code to R free of charge. Our goal is to demonstrate that Revolution R Enterprise will produce the same results in less time. There’s no obligation, but if you choose to convert, we guarantee that your license cost for Revolution R Enterprise will be less than half what you’re currently paying for the equivalent SAS software.**

It’s that simple.

We’ll show you that you don’t need expensive hardware and software to do high quality statistical analysis of big data. And we’ll show that you don’t need to tie up your computing resources with long running operations. With Revolution R Enterprise, you can run analyses on commodity hardware using Linux or Windows, scale to terabyte-class data problems and do it at processing speeds you would never have thought possible.

Sign up now, and we will be in touch shortly.

—————————-

SAS is a registered trademark of the SAS Institute, Cary, NC, in the US and other countries.

*Additional statistical algorithms are being rapidly added to Revolution R Enterprise. Custom development services are also available.

**Revolution Analytics retains the right to determine eligibility for this offer. Offer available until March 31, 2011.


LibreOffice Stable Release launched

The non-Oracle fork of OpenOffice.org has completed an important milestone. From the press release:

The Document Foundation launches LibreOffice 3.3

The first stable release of the free office suite is available for download

The Internet, January 25, 2011 – The Document Foundation launches LibreOffice 3.3, the first stable release of the free office suite developed by the community. In less than four months, the number of developers hacking LibreOffice has grown from less than twenty in late September 2010, to well over one hundred today. This has allowed us to release ahead of the aggressive schedule set by the project.

Not only does it ship a number of new and original features, LibreOffice 3.3 is also a significant achievement for a number of reasons:

– the developer community has been able to build their own and independent process, and get up and running in a very short time (with respect to the size of the code base and the project’s strong ambitions);

– thanks to the high number of new contributors having been attracted into the project, the source code is quickly undergoing a major clean-up to provide a better foundation for future development of LibreOffice;

– the Windows installer, which is going to impact the largest and most diverse user base, has been integrated into a single build containing all language versions, thus reducing the size for download sites from 75 to 11GB, making it easier for us to deploy new versions more rapidly and lowering the carbon footprint of the entire infrastructure.

Caolán McNamara from RedHat, one of the developer community leaders, comments, “We are excited: this is our very first stable release, and therefore we are eager to get user feedback, which will be integrated as soon as possible into the code, with the first enhancements being released in February. Starting from March, we will be moving to a real time-based, predictable, transparent and public release schedule, in accordance with Engineering Steering Committee’s goals and users’ requests”. The LibreOffice development roadmap is available at http://wiki.documentfoundation.org/ReleasePlan

LibreOffice 3.3 brings several unique new features. The 10 most-popular among community members are, in no particular order:

the ability to import and work with SVG files;
an easy way to format title pages and their numbering in Writer;
a more-helpful Navigator Tool for Writer;
improved ergonomics in Calc for sheet and cell management;
and Microsoft Works and Lotus Word Pro document import filters.

In addition, many great extensions are now bundled, providing PDF import, a slide-show presenter console, a much improved report builder, and more besides.

A more-complete and detailed list of all the new features offered by LibreOffice 3.3 is viewable on the following web page: http://www.libreoffice.org/download/new-features-and-fixes/

LibreOffice 3.3 also provides all the new features of OpenOffice.org 3.3, such as new custom properties handling; embedding of standard PDF fonts in PDF documents; new Liberation Narrow font; increased document protection in Writer and Calc; auto decimal digits for “General” format in Calc; 1 million rows in a spreadsheet; new options for CSV import in Calc; insert drawing objects in Charts; hierarchical axis labels for Charts; improved slide layout handling in Impress; a new easier-to-use print interface; more options for changing case; and colored sheet tabs in Calc. Several of these new features were contributed by members of the LibreOffice team prior to the formation of The Document Foundation.

LibreOffice hackers will be meeting at FOSDEM in Brussels on February 5 and 6, and will be presenting their work during a one-day workshop on February 6, with speeches and hacking sessions coordinated by several members of the project.

The home of The Document Foundation is at http://www.documentfoundation.org

The home of LibreOffice is at http://www.libreoffice.org where the download page has been redesigned by the community to be more user-friendly.

*** About The Document Foundation

The Document Foundation has the mission of facilitating the evolution of the OOo Community into a new, open, independent, and meritocratic organization within the next few months. An independent Foundation is a better reflection of the values of our contributors, users and supporters, and will enable a more effective, efficient and transparent community. TDF will protect past investments by building on the achievements of the first decade, will encourage wide participation within the community, and will co-ordinate activity across the community.

*** Media Contacts for TDF

Florian Effenberger (Germany)

Mobile: +49 151 14424108 – E-mail: floeff@documentfoundation.org

Olivier Hallot (Brazil)

Mobile: +55 21 88228812 – E-mail: olivier.hallot@documentfoundation.org

Charles H. Schulz (France)

Mobile: +33 6 98655424 – E-mail: charles.schulz@documentfoundation.org

Italo Vignoli (Italy)

Mobile: +39 348 5653829 – E-mail: italo.vignoli@documentfoundation.org


Comparing Bit Torrent Downloaders


I personally like µTorrent on Windows and KTorrent on Linux.

I am no expert on this, but I like anything that gets the data down faster while making the most of my bandwidth.

I also prefer torrenting to the sudo apt-get method of downloading software, or the zip/unzip, tar/untar, install/make routine.

Torrenting is a simpler way of sharing applications, but sadly it is not used much by the statistical computing community to share downloads.

Also, I think any dashboard or visualization should be sorted, not alphabetically but numerically or categorically.

SORT THE DASHBOARD. KEEP IT SORTED.

So, after sorting the data, I am partially recreating the comparison table from http://en.wikipedia.org/wiki/Comparison_of_BitTorrent_clients

BitTorrent client	Magnet URI	Super-seeding	Embedded tracker	UPnP	NAT Port Mapping Protocol	NAT traversal	DHT	Peer exchange	Encryption	UDP tracker	LPD
µTorrent	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
BitSpirit	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No
BitTorrent 6	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
OneSwarm	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No
qBittorrent	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
SoMud	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Vuze (formerly Azureus)	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No
BitComet	Yes	Yes	Separate download	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No
Tixati	Yes	Yes	No	Yes	No	No	Yes	Yes	Yes	Yes	Partial
Aria2	Yes	No	Yes	No	No	No	Yes	Yes	Yes	Yes	Yes
Tribler	Yes	No	Yes	Yes	Yes	No	Yes	Yes	Yes	No	No
Bitflu	Yes	No	No	No	No	No	Yes	Yes	No	Yes	No
Deluge	Yes	No	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Flush	Yes	No	No	Yes	Yes	No	Yes	Yes	No	No	Yes
KTorrent	Yes	No	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Partial
Shareaza	Yes	No	No	Yes	Yes	No	Yes	Yes	No	No	No
Transmission	Yes	No	No	Yes	Yes	Yes	Yes	Yes	Yes	No	Yes
LimeWire	Partial	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes	No
BitTyrant	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No	No
BitTornado	No	Yes	Yes	Yes	No	No	No	No	Yes	No	No
Torrent Swapper	No	Yes	Yes	Yes	No	No	No	Yes	No	No	No
Localhost	No	Yes	Yes	Yes	No	Yes	Yes	No	No	No	No
Meerkat Bittorrent Client	No	Yes	No	Yes	Yes	Yes	Yes	No	Yes	No	No
rTorrent	No	Yes	No	No	No	No	Yes	Yes	Yes	Yes	No
TorrentFlux	No	Yes	No	Yes	No	No	No	No	Yes	No	No
TorrentVolve	No	Partial	No	Partial	Partial	Partial	Partial	Partial	Partial	Partial	No
Opera	No	No	Yes	No	No	No	No	Yes	No	No	No
BitTorrent 5 / Mainline	No	No	Yes	Yes	Yes	No	Yes	Yes	Yes	No	No
ABC	No	No	Yes	Yes	No	No	No	No	No	No	No
Blog Torrent	No	No	Yes	No	No	No	No	No	No	No	No
MLDonkey	No	No	Yes	Yes	Yes	No	No	No	No	Yes	No
Tomato Torrent	No	No	Yes	No	No	No	Yes	No	No	No	No
Acquisition	No	No	No	No	Yes	No	No	No	No	No	No
Arctic Torrent	No	No	No	No	No	No	No	Yes	No	No	No
BitLet	No	No	No	Yes	No	No	No	No	No	No	No
BitLord	No	No	No	Yes	No	Yes	No	Yes	No	Yes	No
BitThief	No	No	No	No	No	No	No	No	No	No	No
Bits on Wheels	No	No	No	No	No	No	No	No	No	No	No
BTG	No	No	No	Yes	Yes	No	Yes	Yes	Yes	Yes	No
BTPD	No	No	No	No	No	No	No	No	No	No	No
FlashGet	No	No	No	No	No	No	Yes	No	Yes	No	No
Folx	No	No	No	Yes	Yes	No	Yes	Yes	No	Yes	No
Free Download Manager	No	No	No	No	No	No	Yes	Yes	No	No	No
G3 Torrent	No	No	No	No	No	No	No	No	No	No	No
Gnome BitTorrent	No	No	No	No	No	No	No	No	No	No	No
Halite	No	No	No	Yes	Yes	No	Yes	No	Yes	No	No
QTorrent	No	No	No	No	No	No	No	No	No	No	No
Rufus	No	No	No	No	No	No	No	No	No	No	No
SymTorrent	No	No	No	N/A	N/A	N/A	No	No	No	No	No
Tonido Torrent	No	No	No	Yes	Yes	Yes	Yes	No	No	No	No
Torium	No	No	No	Yes	No	No	Yes	No	No	No	No
ZipTorrent	No	No	No	Yes	Yes	No	No	Yes	No	No	No
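
The sorting idea behind the table above can be sketched in a few lines of Python. This is only a minimal sketch, not how the table was actually produced: the ranking (Yes before Partial before No), the five-row subset, and the names RANK, sort_key and sort_clients are all assumptions made for illustration.

```python
# Rank each cell value so that "Yes" sorts before "Partial" before "No".
# The ranking and the three-feature subset are assumptions for illustration.
RANK = {"Yes": 0, "Partial": 1, "No": 2, "N/A": 3}

# (client, Magnet URI, Super-seeding, Embedded tracker) copied from the table.
rows = [
    ("Deluge", "Yes", "No", "No"),
    ("rTorrent", "No", "Yes", "No"),
    ("qBittorrent", "Yes", "Yes", "Yes"),
    ("LimeWire", "Partial", "Yes", "Yes"),
    ("ABC", "No", "No", "Yes"),
]


def sort_key(row):
    """Sort by feature values column by column, then by name as a tie-breaker."""
    name, *features = row
    return ([RANK[value] for value in features], name.lower())


def sort_clients(table):
    """Return the rows ordered from most to least feature-complete."""
    return sorted(table, key=sort_key)


sorted_names = [row[0] for row in sort_clients(rows)]
print(sorted_names)
```

Sorting on the columns from left to right keeps clients with similar feature sets next to each other, which is the point of sorting a dashboard categorically rather than alphabetically.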


Windows Azure and Amazon Free offer

For high-performance computing folks: try out Azure for free.

http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-US&offer=MS-AZR-0001P#compute

Windows Azure Platform
Introductory Special

This promotional offer enables you to try a limited amount of the Windows Azure platform at no charge. The subscription includes a base level of monthly compute hours, storage, data transfers, a SQL Azure database, Access Control transactions and Service Bus connections at no charge. Please note that any usage over this introductory base level will be charged at standard rates.

Included each month at no charge:

Windows Azure
- 25 hours of a small compute instance
- 500 MB of storage
- 10,000 storage transactions
SQL Azure
- 1GB Web Edition database (available for first 3 months only)
Windows Azure platform AppFabric
- 100,000 Access Control transactions
- 2 Service Bus connections
Data Transfers (per region)
- 500 MB in
- 500 MB out

Any monthly usage in excess of the above amounts will be charged at the standard rates. This introductory special will end on March 31, 2011 and all usage will then be charged at the standard rates.

Standard Rates:

Windows Azure

Compute*
- Extra small instance**: $0.05 per hour
- Small instance (default): $0.12 per hour
- Medium instance: $0.24 per hour
- Large instance: $0.48 per hour
- Extra large instance: $0.96 per hour

http://aws.amazon.com/ec2/pricing/

Free Tier*

As part of AWS’s Free Usage Tier, new AWS customers can get started with Amazon EC2 for free. Upon sign-up, new AWS customers receive the following EC2 services each month for one year:

750 hours of EC2 running Linux/Unix Micro instance usage
750 hours of Elastic Load Balancing plus 15 GB data processing
10 GB of Amazon Elastic Block Storage (EBS) plus 1 million IOs, 1 GB snapshot storage, 10,000 snapshot Get Requests and 1,000 snapshot Put Requests
15 GB of bandwidth in and 15 GB of bandwidth out aggregated across all AWS services

Paid instances:

Standard On-Demand Instances	Linux/UNIX Usage	Windows Usage
Small (Default)	$0.085 per hour	$0.12 per hour
Large	$0.34 per hour	$0.48 per hour
Extra Large	$0.68 per hour	$0.96 per hour
Micro On-Demand Instances
Micro	$0.02 per hour	$0.03 per hour
High-Memory On-Demand Instances
Extra Large	$0.50 per hour	$0.62 per hour
Double Extra Large	$1.00 per hour	$1.24 per hour
Quadruple Extra Large	$2.00 per hour	$2.48 per hour
High-CPU On-Demand Instances
Medium	$0.17 per hour	$0.29 per hour
Extra Large	$0.68 per hour	$1.16 per hour
Cluster Compute Instances
Quadruple Extra Large	$1.60 per hour	N/A*
Cluster GPU Instances
Quadruple Extra Large	$2.10 per hour	N/A*
* Windows is not currently available for Cluster Compute or Cluster GPU Instances.
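
To put the hourly rates quoted above in perspective, here is a rough monthly cost calculation in Python. It is a back-of-the-envelope sketch only: the 30-day month, non-stop usage, and the helper name monthly_cost are assumptions, and it ignores storage and data-transfer charges.

```python
from decimal import Decimal

# Hourly on-demand rates quoted above (US dollars, early 2011).
RATES = {
    "Azure small": Decimal("0.12"),
    "EC2 small (Linux)": Decimal("0.085"),
    "EC2 small (Windows)": Decimal("0.12"),
}

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month, running non-stop


def monthly_cost(rate_per_hour, hours=HOURS_PER_MONTH, free_hours=0):
    """Cost of running one instance, after subtracting any free hours."""
    billable = max(hours - free_hours, 0)
    return rate_per_hour * billable


costs = {name: monthly_cost(rate) for name, rate in RATES.items()}
# The Azure introductory offer includes 25 free small-instance hours a month.
azure_with_offer = monthly_cost(RATES["Azure small"], free_hours=25)

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f} per month")
print(f"Azure small with the introductory offer: ${azure_with_offer:.2f}")
```

Note that the 750 free Micro instance hours in the AWS free tier cover a whole month of continuous use, while the 25 free Azure hours cover only about one day.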

Note: Amazon's instance definitions differ slightly from Azure's.

http://aws.amazon.com/ec2/instance-types/

Available Instance Types

Standard Instances

Instances of this family are well suited for most applications.

Small Instance – default*

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage
32-bit platform
I/O Performance: Moderate
API name: m1.small

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge

Micro Instances

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

Micro Instance

613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro

High-Memory Instances

Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.

High-Memory Extra Large Instance

17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

High-CPU Instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

High-CPU Medium Instance

1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge

Cluster Compute Instances

Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications.

Cluster Compute Quadruple Extra Large Instance

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge

Cluster GPU Instances

Instances of this family provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance for applications benefitting from highly parallelized processing, including HPC, rendering and media processing applications. While Cluster Compute Instances provide the ability to create clusters of instances connected by a low latency, high throughput network, Cluster GPU Instances provide an additional option for applications that can benefit from the efficiency gains of the parallel computing power of GPUs over what can be achieved with traditional processors.

Cluster GPU Quadruple Extra Large Instance

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge

Versus:

Windows Azure compute instances come in five unique sizes to enable complex applications and workloads.

Compute Instance Size	CPU	Memory	Instance Storage	I/O Performance
Extra Small	1 GHz	768 MB	20 GB*	Low
Small	1.6 GHz	1.75 GB	225 GB	Moderate
Medium	2 x 1.6 GHz	3.5 GB	490 GB	High
Large	4 x 1.6 GHz	7 GB	1,000 GB	High
Extra large	8 x 1.6 GHz	14 GB	2,040 GB	High

*There is a limitation on the Virtual Hard Drive (VHD) size if you are deploying a Virtual Machine role on an extra small instance. The VHD can only be up to 15 GB.
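
Since the two providers define instance sizes differently, one rough way to compare them is price per gigabyte of memory. The sketch below uses only the memory sizes and hourly prices quoted above; comparing on memory alone is an assumption made for illustration (EC2 Compute Units and Azure GHz figures are not directly comparable), and the name price_per_gb_hour is hypothetical.

```python
# Memory (GB) and hourly price (USD) copied from the tables above.
# EC2 prices are the Linux/UNIX on-demand rates; comparing on memory alone
# is an assumption made for illustration, since CPU units are not comparable.
INSTANCES = {
    "EC2 small": (1.7, 0.085),
    "EC2 large": (7.5, 0.34),
    "EC2 extra large": (15.0, 0.68),
    "Azure small": (1.75, 0.12),
    "Azure large": (7.0, 0.48),
    "Azure extra large": (14.0, 0.96),
}


def price_per_gb_hour(memory_gb, price_per_hour):
    """Hourly price divided by memory, in dollars per GB-hour."""
    return price_per_hour / memory_gb


per_gb = {
    name: price_per_gb_hour(memory, price)
    for name, (memory, price) in INSTANCES.items()
}
cheapest = min(per_gb, key=per_gb.get)

for name, value in sorted(per_gb.items(), key=lambda item: item[1]):
    print(f"{name}: ${value:.4f} per GB-hour")
```

On these figures the Azure sizes all come out at the same price per GB-hour, because each step up doubles both the memory and the price.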


R for Predictive Modeling: Workshop

A workshop on using R for predictive modeling, taught by the Director of Nonclinical Statistics at Pfizer. An interesting Bay Area event, part of the next edition of Predictive Analytics World.

Sunday, March 13, 2011 in San Francisco

R for Predictive Modeling:
A Hands-On Introduction

Intended Audience: Practitioners who wish to learn how to execute on predictive analytics by way of the R language; anyone who wants “to turn ideas into software, quickly and faithfully.”

Knowledge Level: Either hands-on experience with predictive modeling (without R) or hands-on familiarity with any programming language (other than R) is sufficient background and preparation to participate in this workshop.

Workshop Description

This one-day session provides a hands-on introduction to R, the well-known open-source platform for data analysis. Real examples are employed in order to methodically expose attendees to best practices driving R and its rich set of predictive modeling packages, providing hands-on experience and know-how. R is compared to other data analysis platforms, and common pitfalls in using R are addressed.

The instructor, a leading R developer and the creator of CARET, a core R package that streamlines the process for creating predictive models, will guide attendees on hands-on execution with R, covering:

A working knowledge of the R system
The strengths and limitations of the R language
Preparing data with R, including splitting, resampling and variable creation
Developing predictive models with R, including decision trees, support vector machines and ensemble methods
Visualization: Exploratory Data Analysis (EDA), and tools that persuade
Evaluating predictive models, including viewing lift curves, variable importance and avoiding overfitting

Hardware: Bring Your Own Laptop
Each workshop participant is required to bring their own laptop running Windows or OS X. The software used during this training program, R, is free and readily available for download.

Attendees receive an electronic copy of the course materials and related R code at the conclusion of the workshop.


Schedule

Workshop starts at 9:00am
Morning Coffee Break at 10:30am – 11:00am
Lunch provided at 12:30 – 1:15pm
Afternoon Coffee Break at 2:30pm – 3:00pm
End of the Workshop: 4:30pm

Instructor

Max Kuhn, Director, Nonclinical Statistics, Pfizer

Max Kuhn is a Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut. He has been applying models in the pharmaceutical industry for over 15 years.

He is a leading R developer and the author of several R packages including the CARET package that provides a simple and consistent interface to over 100 predictive models available in R.

Mr. Kuhn has taught courses on modeling within Pfizer and externally, including a class for the India Ministry of Information Technology.

http://www.predictiveanalyticsworld.com/sanfrancisco/2011/r_for_predictive_modeling.php


PySpread Magic


I have just been working with PySpread, on a 1 million by 1 million spreadsheet. Python sure looks promising as the way ahead for statistical computing. To install it on Ubuntu, first install the dependencies:

sudo apt-get install python-numpy python-rpy python-scipy python-gmpy wxpython*

Then cd into the directory untarred from the bz2 file at http://pyspread.sourceforge.net/download.html and run the installer, like this:

:~/Downloads$ cd pyspread-0.1.2

:~/Downloads/pyspread-0.1.2$ sudo python setup.py install

http://pyspread.sourceforge.net/

by Martin Manns


about

Pyspread is a cross-platform Python spreadsheet application. It is based on and written in the programming language Python.

Instead of spreadsheet formulas, Python expressions are entered into the spreadsheet cells. Each expression returns a Python object that can be accessed from other cells. These objects can represent anything including lists or matrices.

features

Three dimensional grid with up to 85,899,345 rows and 14,316,555 columns (64 bit systems, depends on row height and column width). Note that a million cells require about 500 MB of memory.
Complex data types such as lists, trees or matrices within a single cell.
Macros for functionalities that are too complex for a single Python expression.
Python module access from each cell, which allows:
- Arbitrary size rational numbers (via gmpy),
- Fixed point decimal numbers for business calculations, (via the decimal module from the standard library)
- Advanced statistics including plotting functions (via RPy)
- Much more via <your favourite module>.
CSV import and export
Clipboard access
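
The memory figure in the feature list makes for a quick sanity check on very large grids. The arithmetic below is a sketch that assumes memory grows linearly with the number of filled cells and that "MB" means decimal megabytes; neither assumption comes from the pyspread documentation.

```python
# Figures from the feature list above: about 500 MB per million cells.
MB = 10**6  # assumption: decimal megabytes
BYTES_PER_CELL = 500 * MB / 1_000_000  # roughly 500 bytes per filled cell


def estimated_memory_bytes(filled_cells):
    """Linear estimate; assumes every filled cell costs about the same."""
    return filled_cells * BYTES_PER_CELL


full_grid = 1_000_000 * 1_000_000  # a fully filled 1M x 1M sheet
full_grid_tb = estimated_memory_bytes(full_grid) / 10**12
one_column_mb = estimated_memory_bytes(1_000_000) / MB

print(f"Fully filled 1M x 1M grid: about {full_grid_tb:.0f} TB")
print(f"One filled column of 1M cells: about {one_column_mb:.0f} MB")
```

In other words, a grid of that size is only practical when almost all of its cells are empty.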

warning

The concept of pyspread allows doing everything from each cell that a Python script can do. This powerful feature has its drawbacks. A spreadsheet may very well delete your hard drive or send your data via the Internet. Of course this is a non-issue if you sandbox properly or if you only use self developed spreadsheets.

Since this is not the case for everyone (see discussion at lwn.net), a GPG signature based trust model for spreadsheet files has been introduced. It ensures that only your own trusted files are executed on loading. Untrusted files are displayed in safe mode. You can approve a file manually. Inspect carefully.
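
To make the idea of Python expressions in cells concrete, here is a toy model in plain Python. It is not pyspread's implementation: the class ToySheet and the use of the name S for the sheet inside cell expressions are assumptions of this sketch. It does show both the appeal (any Python object can live in a cell) and the risk described in the warning (evaluating a cell runs arbitrary code).

```python
class ToySheet:
    """Toy grid whose cells hold Python expressions (not pyspread's real code)."""

    def __init__(self):
        self.code = {}  # (row, col) -> expression source string

    def __setitem__(self, key, expression):
        self.code[key] = expression

    def __getitem__(self, key):
        # Evaluate on access; S refers back to this sheet so that
        # one cell's expression can read another cell's object.
        # eval runs arbitrary code, which is exactly the risk the
        # warning above describes.
        return eval(self.code[key], {"S": self})


sheet = ToySheet()
sheet[0, 0] = "[1, 2, 3]"  # a cell holding a list
sheet[1, 0] = "sum(S[0, 0])"  # a cell reading another cell
sheet[2, 0] = "[x * S[1, 0] for x in S[0, 0]]"

print(sheet[1, 0])
print(sheet[2, 0])
```

Because every cell is evaluated with eval, opening a spreadsheet from an untrusted source is equivalent to running an untrusted script, which is why a trust model for files matters.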


Ways to use both Windows and Linux together

Some ways for programmers to use both Windows and Linux:

1) Wubi

http://wubi.sourceforge.net/

Wubi only adds an extra option to boot into Ubuntu. Wubi does not require you to modify the partitions of your PC, or to use a different bootloader, and does not install special drivers.

2) Wine

Wine lets you run Windows software on other operating systems. With Wine, you can install and run these applications just like you would in Windows. Read more at http://wiki.winehq.org/Debunking_Wine_Myths

http://www.winehq.org/about/

3) Cygwin

http://www.cygwin.com/

Cygwin is a Linux-like environment for Windows. It consists of two parts:

A DLL (cygwin1.dll) which acts as a Linux API emulation layer providing substantial Linux API functionality.

A collection of tools which provide Linux look and feel.

What Isn’t Cygwin?

Cygwin is not a way to run native linux apps on Windows. You have to rebuild your application from source if you want it to run on Windows.

Cygwin is not a way to magically make native Windows apps aware of UNIX ® functionality, like signals, ptys, etc. Again, you need to build your apps from source if you want to take advantage of Cygwin functionality.
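
A script that has to run in more than one of these environments can check where it is running. The sketch below relies on Python's sys.platform, which reports "cygwin" for a Python built for Cygwin and "win32" for a native Windows Python; the function name describe_platform and the wording of the descriptions are assumptions for illustration.

```python
import sys


def describe_platform(platform_name=sys.platform):
    """Map a sys.platform string to a short description."""
    if platform_name == "cygwin":
        return "Cygwin on Windows"
    if platform_name == "win32":
        return "native Windows"
    if platform_name.startswith("linux"):
        return "Linux"
    return "other (" + platform_name + ")"


print(describe_platform())
```

The startswith check is there because older Python 2 interpreters report "linux2" rather than "linux".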

4) VMware Player

https://www.vmware.com/products/player/

VMware Player is the easiest way to run multiple operating systems at the same time on your PC. With its user-friendly interface, VMware Player makes it effortless for anyone to try out Windows 7, Chrome OS or the latest Linux releases, or create isolated virtual machines to safely test new software and surf the Web.

