Ways to use both Windows and Linux together

Some ways to use both Windows and Linux:

1) Wubi

http://wubi.sourceforge.net/

Wubi only adds an extra option to boot into Ubuntu. Wubi does not require you to modify the partitions of your PC, or to use a different bootloader, and does not install special drivers.

2) Wine

Wine lets you run Windows software on other operating systems. With Wine, you can install and run these applications just like you would in Windows. Read more at http://wiki.winehq.org/Debunking_Wine_Myths

http://www.winehq.org/about/

3) Cygwin

http://www.cygwin.com/

Cygwin is a Linux-like environment for Windows. It consists of two parts:

A DLL (cygwin1.dll) which acts as a Linux API emulation layer providing substantial Linux API functionality.

A collection of tools which provide Linux look and feel.

What Isn’t Cygwin?

Cygwin is not a way to run native Linux apps on Windows. You have to rebuild your application from source if you want it to run on Windows.

Cygwin is not a way to magically make native Windows apps aware of UNIX® functionality, like signals, ptys, etc. Again, you need to build your apps from source if you want to take advantage of Cygwin functionality.
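Since a Cygwin build is its own target, a script sometimes needs to know which of these environments it is running in. Here is a minimal Python sketch, assuming the usual sys.platform values ("win32" for native Windows, "cygwin" for a Cygwin build of Python, "linux" for Linux). The function name describe_platform is made up for this example.

```python
import sys


def describe_platform(platform_name=None):
    """Map a sys.platform string to a short human-readable label."""
    # Use the running interpreter's value unless one is passed in explicitly.
    name = sys.platform if platform_name is None else platform_name
    if name == "cygwin":
        return "Cygwin on Windows"
    if name == "win32":
        return "native Windows"
    if name.startswith("linux"):
        return "Linux"
    return "other (" + name + ")"


if __name__ == "__main__":
    print(describe_platform())
```

Passing the platform string in explicitly makes the function easy to check on any machine.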

4) VMware Player

https://www.vmware.com/products/player/

VMware Player is the easiest way to run multiple operating systems at the same time on your PC. With its user-friendly interface, VMware Player makes it effortless for anyone to try out Windows 7, Chrome OS or the latest Linux releases, or create isolated virtual machines to safely test new software and surf the Web.

Using SAS/IML with R

SAS recently released updated documentation for the SAS/IML language, with a special chapter devoted to using R.

Here is an example:

CALL EXPORTMATRIXTOR( IMLMatrix, RMatrix ) ;

CALL IMPORTMATRIXFROMR( IMLMatrix, RExpr ) ;

If you have existing SAS licences, existing hardware and lots of data, this may be the best of both worlds, without getting into the mess of learning the technicalities of MKL threads, BLAS, premium packages or the cloud.

Another thought: it's a good, professional-looking help book, which is something more R packages could aim for (by making their help easier to use and updating their vignettes).

Link: http://support.sas.com/documentation/cdl/en/imlug/63541/HTML/default/viewer.htm#r_toc.htm

Calling Functions in the R Language

Enterprise Linux rises rapidly: New Report

A new report from the Linux Foundation found significant growth in enterprise usage of Linux. This should be welcome news to software companies that have released Linux versions of their software, to service providers that offer Linux-based consulting (note: less competition, lower overheads), and to application creators.

From:

http://www.linuxfoundation.org/news-media/announcements/2010/10/new-linux-foundation-user-survey-shows-enterprise-linux-achieve-sig

Key Findings from the Report
• 79.4 percent of companies are adding more Linux relative to other operating systems in the next five years.

• More people are reporting that their Linux deployments are migrations from Windows than any other platform, including Unix migrations. 66 percent of users surveyed say that their Linux deployments are brand new (“Greenfield”) deployments.

• Among the early adopters who are operating in cloud environments, 70.3 percent use Linux as their primary platform, while only 18.3 percent use Windows.

• 60.2 percent of respondents say they will use Linux for more mission-critical workloads over the next 12 months.

• 86.5 percent of respondents report that Linux is improving and 58.4 percent say their CIOs see Linux as more strategic to the organization as compared to three years ago.

• Drivers for Linux adoption extend beyond cost: technical superiority is the primary driver, followed by cost and then security.

• The growth in Linux, as demonstrated by this report, is leading companies to increasingly seek Linux IT professionals, with 38.3 percent of respondents citing a lack of Linux talent as one of their main concerns related to the platform.

• Users participate in Linux development in three primary ways: testing and submitting bugs (37.5 percent), working with vendors (30.7 percent) and participating in The Linux Foundation activities (26.0 percent).

and from the report itself-

download here-

http://www.linuxfoundation.org/lp/page/download-the-free-linux-adoption-trends-report

Using Code Editors in R

Using Enhanced Code Editors

Advantages of using enhanced code editors

1) Readability- Features like syntax coloring help make the code more readable for documentation as well as for debugging and improvement. For example, functions may be colored in blue, input parameters in green, and simple default code syntax in black. This readability comes in handy especially for lengthy programs or when tweaking code auto-generated by a GUI.

2) Automatic syntax error checking- Enhanced editors can prompt you about certain syntax errors (like unclosed brackets or misplaced commas), and errors may be highlighted in color (mostly red). This helps a lot in correcting code, especially if you are new to R programming or your main focus is business insights and not just coding. Syntax debugging is thus simplified.

3) Speed of writing code- Most programmers report that they write code faster when using an enhanced editor.

4) Breakpoints- You can insert breaks at certain parts of the code to run some lines together or to debug a program. This is a big help, given that the default code editor makes this very cumbersome: you have to copy and paste lines of code again and again to run them selectively. In an enhanced editor you can submit single lines as well as whole paragraphs of code.

5) Auto-Completion- Auto-completion suggests options to complete the syntax once you have typed part of a function name.
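To give a feel for point 2, here is a rough Python sketch of the kind of bracket check an editor might run as you type. It is only an illustration, not how any particular editor works: the name find_bracket_error is made up, and the check deliberately ignores the fact that brackets can appear inside strings and comments.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def find_bracket_error(code):
    """Return the index of the first bracket problem, or None if balanced.

    Simplification: brackets inside strings and comments are not skipped.
    """
    stack = []  # (opening character, index) pairs still waiting to be closed
    for index, char in enumerate(code):
        if char in PAIRS.values():
            stack.append((char, index))
        elif char in PAIRS:
            if not stack or stack[-1][0] != PAIRS[char]:
                return index  # closing bracket with no matching opener
            stack.pop()
    if stack:
        return stack[0][1]  # earliest opener that was never closed
    return None
```

For the R line mean(c(1, 2, 3) with its last bracket missing, the function points at position 4, the opening bracket of mean( that never gets closed.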

Some commonly used code editors are:
Notepad++ - It supports R and also has a plugin called NppToR.
It can be used for a wide variety of other languages as well, and has all the features mentioned above.

Revolution R Productivity Environment (RPE) - While Revolution R has announced a new GUI to be launched in 2011, the existing enhancements to their software include a code editor called RPE.

Syntax color highlighting is already included. Code snippets work in a fairly simple way.
Right click-
Click on Insert Code Snippet.

You get a drop-down of tasks (like Analysis).
Selecting Analysis, we get another list of sub-tasks (like Clustering).
Once you click on Clustering you get various options.
For example, clicking clara will auto-insert the code for clara clustering.

Now even if you are averse to using a GUI, or GUI creators don’t have your particular analysis, you can type in code at an extremely fast pace.
It is useful even to experienced people, who do not have to type in the entire code, but it is a boon to beginners, as the parameters in the function inserted by a code snippet are automatically highlighted in multiple colors. It can also help you modify the code auto-generated by your R GUI at a much faster pace.

Tinn-R - The most popular and a very easy-to-use code editor. It is available at http://www.sciviews.org/Tinn-R/
Its disadvantage is that it supports the Windows operating system only.
Recommended as the beginner’s choice of code editor.

Eclipse with R plugin http://www.walware.de/goto/statet This is recommended especially to people working with Eclipse and on Unix systems. It enables you to do most of the productivity enhancements featured in other text editors, including submitting code to the R session.

Gvim (http://www.vim.org/) along with Vim-R-plugin2 (http://www.vim.org/scripts/script.php?script_id=2628) should also be mentioned. The Vim-R-plugin developer recently added Windows support to a lean cross-platform package that works well. It can suit people who want a niche text editor with fewer features. It is not as good as Eclipse or Notepad++ but is probably the simplest to use.

Running R on Amazon EC2: Windows

Running R on Amazon EC2 has the following benefits:

1) Elastic memory and number of processors for heavy computation
2) Affordable micro instances for smaller datasets (2 cents per hour for Linux/UNIX, 3 cents per hour for Windows)
3) An easy-to-use console for managing datasets as well as processes

Running R on a Windows instance on Amazon EC2 has the following additional benefits:

1) Remote Desktop makes operating R very easy
2) 64-bit R can be used
3) You can also use your evaluation copy of Revolution R Enterprise (which is free to academics and, for enterprise software, quite inexpensive for corporates)

You can thus combine R GUIs (like Rattle, R Commander or Deducer, depending on whether you need statistical analysis, data mining or graphical analysis) with a 64-bit OS and Revolution’s RevoScaleR package to manage huge datasets in a very easy-to-use analytics solution.

Pricing-for Computation on EC2

Standard On-Demand Instances	Linux/UNIX Usage	Windows Usage
Small (Default)	$0.085 per hour	$0.12 per hour
Large	$0.34 per hour	$0.48 per hour
Extra Large	$0.68 per hour	$0.96 per hour
Micro On-Demand Instances	Linux/UNIX Usage	Windows Usage
Micro	$0.02 per hour	$0.03 per hour
High-Memory On-Demand Instances
Extra Large	$0.50 per hour	$0.62 per hour
Double Extra Large	$1.00 per hour	$1.24 per hour
Quadruple Extra Large	$2.00 per hour	$2.48 per hour
High-CPU On-Demand Instances
Medium	$0.17 per hour	$0.29 per hour
Extra Large	$0.68 per hour	$1.16 per hour
Cluster Compute Instances
Quadruple Extra Large	$1.60 per hour	N/A*
`*` Windows is not currently available for Cluster Compute Instances.
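To turn the hourly prices into a budget, a small calculator helps. The Python sketch below simply copies the figures from the table above. The names HOURLY_PRICES and compute_cost are made up for this example, and the prices are the ones quoted here, so check the AWS pricing page before relying on them.

```python
# Hourly on-demand prices in USD, copied from the table above.
# Each value is (Linux/UNIX price, Windows price); None means not offered.
HOURLY_PRICES = {
    ("Standard", "Small"): (0.085, 0.12),
    ("Standard", "Large"): (0.34, 0.48),
    ("Standard", "Extra Large"): (0.68, 0.96),
    ("Micro", "Micro"): (0.02, 0.03),
    ("High-Memory", "Extra Large"): (0.50, 0.62),
    ("High-Memory", "Double Extra Large"): (1.00, 1.24),
    ("High-Memory", "Quadruple Extra Large"): (2.00, 2.48),
    ("High-CPU", "Medium"): (0.17, 0.29),
    ("High-CPU", "Extra Large"): (0.68, 1.16),
    ("Cluster Compute", "Quadruple Extra Large"): (1.60, None),
}


def compute_cost(family, size, hours, windows=True):
    """Return the on-demand cost in USD of running one instance for `hours`."""
    linux_price, windows_price = HOURLY_PRICES[(family, size)]
    price = windows_price if windows else linux_price
    if price is None:
        raise ValueError("Windows is not available for " + family + " " + size)
    return round(price * hours, 2)


if __name__ == "__main__":
    # A working week (40 hours) on a Large Windows instance.
    print(compute_cost("Standard", "Large", 40))
```

A 40-hour week on a Large Windows instance comes to $19.20, while 100 hours on a Linux micro instance is $2.00.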

Internet Data Transfer

The pricing below is based on data transferred “in” and “out” of Amazon EC2.

Data Transfer In	US & EU Regions	APAC Region
All Data Transfer	Free until Nov 1, 2010 `*`	Free until Nov 1, 2010 `*`

Data Transfer Out `**`	US & EU Regions	APAC Region
First 1 GB per Month	$0.00 per GB	$0.00 per GB
Up to 10 TB per Month	$0.15 per GB	$0.19 per GB

Amazon EBS Volumes- To store data

$0.10 per GB-month of provisioned storage
$0.10 per 1 million I/O requests

Amazon EBS Snapshots to Amazon S3 (priced the same as Amazon S3)

$0.15 per GB-month of data stored
$0.01 per 1,000 PUT requests (when saving a snapshot)
$0.01 per 10,000 GET requests (when loading a snapshot)
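Storage charges add up differently from compute, because they are billed per GB-month and per request. Here is a minimal Python sketch using only the EBS and snapshot prices listed above. The function name monthly_storage_cost and its parameters are assumptions for this example, and data transfer charges are left out.

```python
# Prices in USD, copied from the EBS and snapshot figures above.
EBS_PER_GB_MONTH = 0.10
EBS_PER_MILLION_IO = 0.10
SNAPSHOT_PER_GB_MONTH = 0.15
SNAPSHOT_PER_1000_PUT = 0.01
SNAPSHOT_PER_10000_GET = 0.01


def monthly_storage_cost(volume_gb, io_requests, snapshot_gb=0, puts=0, gets=0):
    """Estimate one month of EBS volume plus snapshot charges in USD."""
    volume = (volume_gb * EBS_PER_GB_MONTH
              + io_requests / 1_000_000 * EBS_PER_MILLION_IO)
    snapshot = (snapshot_gb * SNAPSHOT_PER_GB_MONTH
                + puts / 1_000 * SNAPSHOT_PER_1000_PUT
                + gets / 10_000 * SNAPSHOT_PER_10000_GET)
    return round(volume + snapshot, 2)


if __name__ == "__main__":
    # 100 GB volume, 5 million I/O requests, 20 GB of snapshots.
    print(monthly_storage_cost(100, 5_000_000, snapshot_gb=20))
```

With these numbers a 100 GB volume with 5 million I/O requests and 20 GB of snapshots costs $13.50 a month (10.00 + 0.50 + 3.00).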

Source: http://aws.amazon.com/ec2/#pricing. Other costs are optional, depending on your needs.

Based on the above, I set out to create a DIY how-to for running R (and Revolution R) on 64-bit Windows on EC2.

1) Log on to https://console.aws.amazon.com/ec2/home

2) Launch a Windows instance

Choose an AMI

Left margin: AMI

Top: Windows - select a Windows 64-bit AMI

(note that if you select SQL Server it will cost you extra)

Then go through the following steps and launch the instance

Select the EC2 compute type depending on number of cores, memory needs and budget

Create a key pair (a .pem file, which holds the private key used later to decrypt your password) and download it.
For tags, etc. just click through (or read and create some tags to help you remember and organize multiple instances)
In Configure Firewall, remember to enable access to RDP (Remote Desktop) and HTTP. You can choose to allow the whole internet or only your own IP address(es) to log in
Review and launch the instance
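If you would rather script the firewall step than click through it, the two rules can be written down as data. The Python sketch below only builds the rule dictionaries and does not call AWS. The layout follows the IpPermissions shape that, as far as I know, boto3's authorize_security_group_ingress accepts, but treat that as an assumption and check the boto3 documentation. The function name and the example address 203.0.113.5 are made up.

```python
import ipaddress

RDP_PORT = 3389  # Remote Desktop
HTTP_PORT = 80


def build_ingress_rules(allowed_cidr="0.0.0.0/0"):
    """Return one TCP ingress rule per port, restricted to allowed_cidr.

    The default "0.0.0.0/0" opens the ports to the whole internet;
    pass your own address (for example "203.0.113.5/32") to restrict access.
    """
    # Raises ValueError if the CIDR string is malformed.
    network = ipaddress.ip_network(allowed_cidr, strict=False)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": str(network)}],
        }
        for port in (RDP_PORT, HTTP_PORT)
    ]


if __name__ == "__main__":
    for rule in build_ingress_rules("203.0.113.5/32"):
        print(rule)
```

Restricting RDP to your own address is the safer of the two choices mentioned above, since Remote Desktop gives full control of the machine.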

Go to Instances (leftmost margin)

and see the status (yellow for pending)
Click on Instance Actions-Connect on the top bar to see the following
Download the .RDP shortcut file and
click on Instance Actions-Request Admin Password

Wait 15 minutes (burning a few cents for nothing) while Microsoft creates a password for you
Have coffee (or tea if you are health minded)
Click again on Instance Actions-Request Admin Password

Open the key pair file (the .pem file created earlier) using Notepad, copy and paste the private key (it looks like gibberish), and click Decrypt.

Retrieve the password for logging on.

Note the newly generated password- this is your Remote Desktop password.

Click on the .rdp file (the shortcut file created earlier)- it will connect to your Windows instance.

Enter the newly generated password in Remote Desktop

This looks like a new, clean machine with just the Windows OS installed on it.

Install Chrome (or any other browser) if you do not use Internet Explorer
Install Acrobat Reader (for documentation), Revolution R Enterprise (~490 MB; it will automatically ask to install the .NET Framework 4 files) and/or R

Install packages (I recommend installing R Commander, Rattle and Deducer). Apart from the fact that these GUIs are quite complementary, they will also install almost all the main packages that you need for analysis (as their dependencies). Revolution R installs parallel programming packages by default.

If you want to save your files for working later, you can make a snapshot: go to the Amazon console, EC2, left margin, EBS, Snapshots. You will see an attached volume (green light); click on Create Snapshot to save your files.
If you want to use my Windows snapshot, you can: when you start Amazon EC2, click on Snapshots and enter the details (see snapshot name below) to make a copy or work on it, whether for exploring 64-bit R, multi-core cloud computing, or just trying out Revolution R’s new packages for academic purposes.

Dryad - Microsoft's answer to MapReduce

While reading across the internet I came across Microsoft’s answer to MapReduce, called Dryad, which has been around for some time but has not generated quite the buzz that Hadoop or MapReduce have.

http://research.microsoft.com/en-us/projects/dryadlinq/

DryadLINQ

DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data parallel applications running on large PC clusters.

Overview

New! An academic release of Dryad/DryadLINQ is now available for public download.

The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for every programmer. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).

Dryad provides reliable, distributed computing on thousands of servers for large-scale data parallel applications. LINQ enables developers to write and debug their applications in a SQL-like query language, relying on the entire .NET library and using Visual Studio.

DryadLINQ translates LINQ programs into distributed Dryad computations:

C# and LINQ data objects become distributed partitioned files.

LINQ queries become distributed Dryad jobs.

C# methods become code running on the vertices of a Dryad job.

DryadLINQ has the following features:

Declarative programming: computations are expressed in a high-level language similar to SQL

Automatic parallelization: from sequential declarative code the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. For exploiting multi-core parallelism on each machine DryadLINQ relies on the PLINQ parallelization framework.

Integration with Visual Studio: programmers in DryadLINQ take advantage of the comprehensive VS set of tools: Intellisense, code refactoring, integrated debugging, build, source code management.

Integration with .Net: all .Net libraries, including Visual Basic, and dynamic languages are available.

and

Conciseness: the following line of code is a complete implementation of the Map-Reduce computation framework in DryadLINQ:
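To show the shape of the idea rather than DryadLINQ itself, here is a plain Python sketch of a map-reduce built from a flat-map, a group-by-key and a per-group reduce. It is not DryadLINQ code and runs on a single machine. The names map_reduce, mapper, key_selector and reducer are assumptions chosen for this illustration.

```python
from collections import defaultdict


def map_reduce(source, mapper, key_selector, reducer):
    """Flat-map every record, group the results by key, reduce each group."""
    groups = defaultdict(list)
    for record in source:
        for item in mapper(record):  # map: one record -> many items
            groups[key_selector(item)].append(item)  # group items by key
    # reduce: one result per key, computed from all items sharing that key
    return {key: reducer(key, items) for key, items in groups.items()}


def word_count(lines):
    """Classic example: count how often each word occurs."""
    return map_reduce(
        lines,
        mapper=lambda line: line.lower().split(),
        key_selector=lambda word: word,
        reducer=lambda word, words: len(words),
    )


if __name__ == "__main__":
    print(word_count(["Linux and Windows", "windows and wine"]))
```

The whole framework is the three steps inside map_reduce; everything specific to a problem (here, word counting) lives in the three functions passed in.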

and http://research.microsoft.com/en-us/projects/dryad/

Dryad

The Dryad Project is investigating programming models for writing parallel and distributed programs to scale from a small cluster to a large data-center.

Overview

New! An academic release of DryadLINQ is now available for public download.

Dryad is an infrastructure which allows a programmer to use the resources of a computer cluster or a data center for running data-parallel programs. A Dryad programmer can use thousands of machines, each of them with multiple processors or cores, without knowing anything about concurrent programming.

The Structure of Dryad Jobs

A Dryad programmer writes several sequential programs and connects them using one-way channels. The computation is structured as a directed graph: programs are graph vertices, while the channels are graph edges. A Dryad job is a graph generator which can synthesize any directed acyclic graph. These graphs can even change during execution, in response to important events in the computation.
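The graph structure described above can be mimicked in a few lines of Python. This is a single-process toy, not the Dryad API: vertices are ordinary functions, channels are (source, destination) pairs, and the names run_job, vertices and channels are made up for this sketch. It assumes the graph really is acyclic and does no cycle checking.

```python
def run_job(vertices, channels, outputs_wanted):
    """Run a DAG of sequential programs connected by one-way channels.

    vertices: name -> function taking the outputs of its upstream vertices
    channels: list of (source, destination) edges
    """
    upstream = {name: [] for name in vertices}
    for source, destination in channels:
        upstream[destination].append(source)

    results = {}

    def output_of(name):
        # Each vertex runs once; its result is reused by every outgoing edge.
        if name not in results:
            inputs = [output_of(parent) for parent in upstream[name]]
            results[name] = vertices[name](*inputs)
        return results[name]

    return {name: output_of(name) for name in outputs_wanted}


if __name__ == "__main__":
    vertices = {
        "read": lambda: [3, 1, 2],
        "sort": lambda values: sorted(values),
        "total": lambda values: sum(values),
        "report": lambda ordered, total: (ordered, total),
    }
    channels = [("read", "sort"), ("read", "total"),
                ("sort", "report"), ("total", "report")]
    print(run_job(vertices, channels, ["report"]))
```

Each function here is written as if it were the only program running, which is the point of the model: the graph, not the programmer, carries the parallel structure.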

Dryad is quite expressive. It completely subsumes other computation frameworks, such as Google’s map-reduce, or the relational algebra. Moreover, Dryad handles job creation and management, resource management, job monitoring and visualization, fault tolerance, re-execution, scheduling, and accounting.

The Dryad Software Stack

As a proof of Dryad’s versatility, a rich software ecosystem has been built on top of Dryad:

SSIS on Dryad executes many instances of SQL server, each in a separate Dryad vertex, taking advantage of Dryad’s fault tolerance and scheduling. This system is currently deployed in a live production system as part of one of Microsoft’s AdCenter log processing pipelines.

DryadLINQ generates Dryad computations from the LINQ Language-Integrated Query extensions to C#.

The distributed shell is a generalization of the pipe concept from the Unix shell. If Unix pipes allow the construction of one-dimensional (1-D) process structures, the distributed shell allows the programmer to build 2-D structures in a scripting language. The distributed shell generalizes Unix pipes in three ways:

It allows processes to easily connect multiple file descriptors of each process — hence the 2-D aspect.

It allows the construction of pipes spanning multiple machines, across a cluster.

It virtualizes the pipelines, allowing the execution of pipelines with many more processes than available machines, by time-multiplexing processors and buffering results.

Several languages are compiled to distributed shell processes. PSQL is an early version, recently replaced with Scope.

Publications

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007

Video of a presentation on Dryad at the Google Campus, given by Michael Isard, Nov 1, 2007.

Also interesting to read-

Why does Dryad use a DAG?

The basic computational model we decided to adopt for Dryad is the directed-acyclic graph (DAG). Each node in the graph is a computation, and each edge in the graph is a stream of data traveling in the direction of the edge. The amount of data on any given edge is assumed to be finite, the computations are assumed to be deterministic, and the inputs are assumed to be immutable. This isn’t by any means a new way of structuring a distributed computation (for example Condor had DAGMan long before Dryad came along), but it seemed like a sweet spot in the design space given our other constraints.

So, why is this a sweet spot? A DAG is very convenient because it induces an ordering on the nodes in the graph. That makes it easy to design scheduling policies, since you can define a node to be ready when its inputs are available, and at any time you can choose to schedule as many ready nodes as you like in whatever order you like, and as long as you always have at least one scheduled you will continue to make progress and never deadlock. It also makes fault-tolerance easy, since given our determinism and immutability assumptions you can backtrack as far as you want in the DAG and re-execute as many nodes as you like to regenerate intermediate data that has been lost or is unavailable due to cluster failures.

from

http://blogs.msdn.com/b/dryad/archive/2010/07/23/why-does-dryad-use-a-dag.aspx
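The scheduling argument in that quote (a node is ready once its inputs are available, and any ready node may run) can be tried out with Python's standard graphlib module. This is a minimal sketch of the idea, not Dryad's scheduler; the function name schedule_in_waves and the node names are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_in_waves(dependencies):
    """Return batches of nodes; every node in a batch has all inputs ready.

    dependencies: node -> set of nodes whose output it consumes
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # marking nodes done may make successors ready
    return waves


if __name__ == "__main__":
    # "b" and "c" both read from "a"; "d" needs both "b" and "c".
    print(schedule_in_waves({"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}))
```

For this diamond-shaped graph the waves are a, then b and c together, then d. Nodes within a wave do not depend on each other, so they could run on different machines at the same time.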

Compression Tips

1) Stuck with huge datasets in SAS?

Use this SAS code:

options compress=yes;

2) Stuck with huge datasets in UNIX?

Use compress filename.extension

3) Huge data in Windows? Use the following utility:

7-Zip, which is open source.

You don’t need to register or pay for 7-Zip.

www.7-zip.org/
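If you want one approach that works the same on Windows and UNIX, a scripting language can do the compression too. Here is a minimal Python sketch using the standard gzip module. Note that this swaps in gzip rather than the compress utility or 7-Zip named above, and that it keeps the original file instead of replacing it. The function names are made up for the example.

```python
import gzip
import os
import shutil


def gzip_file(path):
    """Write a gzip-compressed copy next to `path` and return its name."""
    compressed_path = path + ".gz"
    with open(path, "rb") as source, gzip.open(compressed_path, "wb") as target:
        shutil.copyfileobj(source, target)  # streams in chunks, not all in memory
    return compressed_path


def size_ratio(path, compressed_path):
    """Compressed size as a fraction of the original size."""
    return os.path.getsize(compressed_path) / os.path.getsize(path)
```

How much space you save depends on the data: repetitive text such as CSV exports usually shrinks a lot, while files that are already compressed barely shrink at all.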

SAVE SPACE ON YOUR SYSTEMS 🙂

It is an interesting report (and for some reason in a blue font-making it more like a blue paper than a white paper)