Workflows and MyExperiment.org

Here is a great website for sharing workflows – it is called MyExperiment.org and it can also include Work flows from many software.

myExperiment currently has 4742 members270 groups1842 workflows423 files and 173 packs

Could it also include workflow from Red-R from #rstats or Enterprise Miner

Continue reading “Workflows and MyExperiment.org”

Machine Learning Contest

New Contest at http://www.ecmlpkdd2011.org/dcOverview.php

 

 

Discovery Challenge Overview

Organization | Overview | Task and DatasetsTimeline

 

General description: tasks and dataset

VideoLectures.net is a free and open access multimedia repository of video lectures, mainly of research and educational character. The lectures are given by distinguished scholars and scientists at the most important and prominent events like conferences, summer schools, workshops and science promotional events from many fields of Science. The portal is aimed at promoting science, exchanging ideas and fostering knowledge sharing by providing high quality didactic contents not only to the scientific community but also to the general public. All lectures, accompanying documents, information and links are systematically selected and classified through the editorial process taking into account also users’ comments.

The ECML-PKDD 2011 Discovery Challenge is organized in order to improve the website’s current recommender system. The challenge consists of two main tasks and a “side-by” contest. The provided data is for both of the tasks, and it is up to the contestants how it will be used for learning (building up) a recommender.

Due to the nature of the problem, each of the tasks has its own merit: task 1 simulates new-user and new- item recommendation (cold-start mode), task 2 simulates clickstream based recommendation (normal mode). Continue reading “Machine Learning Contest”

Updated Interview Elissa Fink -VP Tableau Software

Here is an interview with Elissa Fink, VP Marketing of that new wonderful software called Tableau that makes data visualization so nice and easy to learn and work with.

Elissa Fink, VP, Marketing

Ajay-  Describe your career journey from high school to over 20 plus years in marketing. What are the various trends that you have seen come and go in marketing.

Elissa- I studied literature and linguistics in college and didn’t discover analytics until my first job selling advertising for the Wall Street Journal. Oddly enough, the study of linguistics is not that far from decision analytics: they both are about taking a structured view of information and trying to see and understand common patterns. At the Journal, I was completely captivated analyzing and comparing readership data. At the same time, the idea of using computers in marketing was becoming more common. I knew that the intersection of technology and marketing was going to radically change things – how we understand consumers, how we market and sell products, and how we engage with customers. So from that point on, I’ve always been focused on technology and marketing, whether it’s working as a marketer at technology companies or applying technology to marketing problems for other types of companies.  There have been so many interesting trends. Taking a long view, a key trend I’ve noticed is how marketers work to understand, influence and motivate consumer behavior. We’ve moved marketing from where it was primarily unpredictable, qualitative and aimed at talking to mass audiences, where the advertising agency was king. Now it’s a discipline that is more data-driven, quantitative and aimed at conversations with individuals, where the best analytics wins. As with any trend, the pendulum swings far too much to either side causing backlashes but overall, I think we are in a great place now. We are using data-driven analytics to understand consumer behavior. But pure analytics is not the be-all, end-all; good marketing has to rely on understanding human emotions, intuition and gut feel – consumers are far from rational so taking only a rational or analytical view of them will never explain everything we need to know.

Ajay- Do you think technology companies are still predominantly dominated by men . How have you seen diversity evolve over the years. What initiatives has Tableau taken for both hiring and retaining great talent.

Elissa- The thing I love about the technology industry is that its key success metrics – inventing new products that rapidly gain mass adoption in pursuit of making profit – are fairly objective. There’s little subjective nature to the counting of dollars collected selling a product and dollars spent building a product. So if a female can deliver a better product and bigger profits faster and better, then that female is going to get the resources, jobs, power and authority to do exactly that. That’s not to say that the technology industry is gender-blind, race-blind, etc. It isn’t – technology is far from perfect. For example, the industry doesn’t have enough diversity in positions of power. But I think overall, in comparison to a lot of other industries, it’s pretty darn good at giving people with great ideas the opportunities to realize their visions regardless of their backgrounds or characteristics.

At Tableau, we are very serious about bringing in and developing talented people – they are the key to our growth and success. Hiring is our #1 initiative so we’ve spent a lot of time and energy both on finding great candidates and on making Tableau a place that they want to work. This includes things like special recruiting events, employee referral programs, a flexible work environment, fun social events, and the rewards of working for a start-up. Probably our biggest advantage is the company itself – working with people you respect on amazing, cutting-edge products that delight customers and are changing the world is all too rare in the industry but a reality at Tableau. One of our senior software developers put it best when he wrote “The emphasis is on working smarter rather than longer: family and friends are why we work, not the other way around. Tableau is all about happy, energized employees executing at the highest level and delivering a highly usable, high quality, useful product to our customers.” People who want to be at a place like that should check out our openings at http://www.tableausoftware.com/jobs.

Ajay- What are most notable features in tableau’s latest edition. What are the principal software that competes with Tableau Software products and how would you say Tableau compares with them.

Elissa- Tableau 6.1 will be out in July and we are really excited about it for 3 reasons.

First, we’re introducing our mobile business intelligence capabilities. Our customers can have Tableau anywhere they need it. When someone creates an interactive dashboard or analytical application with Tableau and it’s viewed on a mobile device, an iPad in particular, the viewer will have a native, touch-optimized experience. No trying to get your fingertips to act like a mouse. And the author didn’t have to create anything special for the iPad; she just creates her analytics the usual way in Tableau. Tableau knows the dashboard is being viewed on an iPad and presents an optimized experience.

Second, we’ve take our in-memory analytics engine up yet another level. Speed and performance are faster and now people can update data incrementally rapidly. Introduced in 6.0, our data engine makes any data fast in just a few clicks. We don’t run out of memory like other applications. So if I build an incredible dashboard on my 8-gig RAM PC and you try to use it on your 2-gig RAM laptop, no problem.

And, third, we’re introducing more features for the international markets – including French and German versions of Tableau Desktop along with more international mapping options.  It’s because we are constantly innovating particularly around user experience that we can compete so well in the market despite our relatively small size. Gartner’s seminal research study about the Business Intelligence market reported a massive market shift earlier this year: for the first time, the ease-of-use of a business intelligence platform was more important than depth of functionality. In other words, functionality that lots of people can actually use is more important than having sophisticated functionality that only specialists can use. Since we focus so heavily on making easy-to-use products that help people rapidly see and understand their data, this is good news for our customers and for us.

Ajay-  Cloud computing is the next big thing with everyone having a cloud version of their software. So how would you run Cloud versions of Tableau Server (say deploying it on an Amazon Ec2  or a private cloud)

Elissa- In addition to the usual benefits espoused about Cloud computing, the thing I love best is that it makes data and information more easily accessible to more people. Easy accessibility and scalability are completely aligned with Tableau’s mission. Our free product Tableau Public and our product for commercial websites Tableau Digital are two Cloud-based products that deliver data and interactive analytics anywhere. People often talk about large business intelligence deployments as having thousands of users. With Tableau Public and Tableau Digital, we literally have millions of users. We’re serving up tens of thousands of visualizations simultaneously – talk about accessibility and scalability!  We have lots of customers connecting to databases in the Cloud and running Tableau Server in the Cloud. It’s actually not complex to set up. In fact, we focus a lot of resources on making installation and deployment easy and fast, whether it’s in the cloud, on premise or what have you. We don’t want people to have spend weeks or months on massive roll-out projects. We want it to be minutes, hours, maybe a day or 2. With the Cloud, we see that people can get started and get results faster and easier than ever before. And that’s what we’re about.

Ajay- Describe some of the latest awards that Tableau has been wining. Also how is Tableau helping universities help address the shortage of Business Intelligence and Big Data professionals.

Elissa-Tableau has been very fortunate. Lately, we’ve been acknowledged by both Gartner and IDC as the fastest growing business intelligence software vendor in the world. In addition, our customers and Tableau have won multiple distinctions including InfoWorld Technology Leadership awards, Inc 500, Deloitte Fast 500, SQL Server Magazine Editors’ Choice and Community Choice awards, Data Hero awards, CODiEs, American Business Awards among others. One area we’re very passionate about is academia, participating with professors, students and universities to help build a new generation of professionals who understand how to use data. Data analysis should not be exclusively for specialists. Everyone should be able to see and understand data, whatever their background. We come from academic roots, having been spun out of a Stanford research project. Consequently, we strongly believe in supporting universities worldwide and offer 2 academic programs. The first is Tableau For Teaching, where any professor can request free term-length licenses of Tableau for academic instruction during his or her courses. And, we offer a low-cost Student Edition of Tableau so that students can choose to use Tableau in any of their courses at any time.

Elissa Fink, VP Marketing,Tableau Software

 

Elissa Fink is Tableau Software’s Vice President of Marketing. With 20+ years helping companies improve their marketing operations through applied data analysis, Elissa has held executive positions in marketing, business strategy, product management, and product development. Prior to Tableau, Elissa was EVP Marketing at IXI Corporation, now owned by Equifax. She has also served in executive positions at Tele Atlas (acquired by TomTom), TopTier Software (acquired by SAP), and Nielsen/Claritas. Elissa also sold national advertising for the Wall Street Journal. She’s a frequent speaker and has spoken at conferences including the DMA, the NCDM, Location Intelligence, the AIR National Forum and others. Elissa is a graduate of Santa Clara University and holds an MBA in Marketing and Decision Systems from the University of Southern California.

Elissa first discovered Tableau late one afternoon at her previous company. Three hours later, she was still “at play” with her data. “After just a few minutes using the product, I was getting answers to questions that were taking my company’s programmers weeks to create. It was instantly obvious that Tableau was on a special mission with something unique to offer the world. I just had to be a part of it.”

To know more – read at http://www.tableausoftware.com/

and existing data viz at http://www.tableausoftware.com/learn/gallery

Storm seasons: measuring and tracking key indicators
What’s happening with local real estate prices?
How are sales opportunities shaping up?
Identify your best performing products
Applying user-defined parameters to provide context
Not all tech companies are rocket ships
What’s really driving the economy?
Considering factors and industry influencers
The complete orbit along the inside, or around a fixed circle
How early do you have to be at the airport?
What happens if sales grow but so does customer churn?
What are the trends for new retail locations?
How have student choices changed?
Do patients who disclose their HIV status recover better?
Closer look at where gas prices swing in areas of the U.S.
U.S. Census data shows more women of greater age
Where do students come from and how does it affect their grades?
Tracking customer service effectiveness
Comparing national and local test scores
What factors correlate with high overall satisfaction ratings?
Fund inflows largely outweighed outflows well after the bubble
Which programs are competing for federal stimulus dollars?
Oil prices and volatility
A classic candlestick chart
How do oil, gold and CPI relate to the GDP growth rate?

 

Interview – first Big Data Conference

From time to time, I revisit the past just to ensure I am not choking on the same spin or hype bullshit I resent in this world of marketing the worlds perfect software (everyone claims to be it, noone is it)

From the very first Big Data conference ever (and kudos to Steve Wooledge at AsterData for coining the buzz word of the decade- BIG DATA)- a two year old interview.

Is it relevant- or outdated.

watch?v=cqgChBV_JBs

https://www.youtube.com/v/cqgChBV_JBs?version=3&hl=en_US

Hacking Hackers

This is a ten step program to fight hacking attacks. You may or may not choose to ignore it, laugh at it, or ponder on it.

1) Internet security is a billion dollar business which will only grow in size as cloud computing approaches. Pioneers in providing security will earn considerable revenue like McAffee  , Norton did in the PC era. Incidentally it also means the consulting/partner group that is willing to work with virtual workers and virtual payments to offshore consultants.

2) Industrial espionage has existed from the days the West stole Gunpowder and Silk formula from China (and China is now doing the same to its software). The company and country will the best hackers will win. Keep your team motivated mate, or it is very easy for them to defect to the other side of the (cyber) wall.

3) When 2 billion people have access to internet the number of hackers will grow in number and quality much more rapidly than when only 100 million people across the world had access. Thanks to Google Translate, Paypal, Skype video Call, Tor Project, and Google Voice i can and have collaborative with hackers almost in all geographies. You can only imagine what the black hats are doing.

4) Analyzing hackers is like reading Chinese Tea Leaves. If you have experienced analysts, you will slip up. recruit the hackers in the dormitory before China recruits them using Lulz Security as a bogus cover. or USA recruits them as cover for spreading democracy in the Arab countries.

5) get your website audited for security breaches. sponsor a hack my website contest. before someone else does it for you.

6) Fighting hackers was always tough. But now we have part time hackers , people with perfectly respectable jobs who look like Mr Andersen and hack like Neo from the Matrix. Every kid once wanted to be a firefighter. Every geek dreams  of the one ultimate hack.

7) if you cant beat hackers, join them.

8) the more machine data is generated, the more you need external experts and newer software interfaces. Investing in open data, datasets is good. Keeping Bradley manning naked in his cell is bad. ignore the bad PR at your own cost.

9) Stop blaming China for every hack attack. You are a techie not a politician

10) Hack hard. Hack well. If someone hacks you, you will need to hack them off offensively unless you just want to be an easy mark for the rest of your lives. Counter -hacking expertise needs to be strengthened and groomed. hacking is an offense not just a defense game.

 

 

Amazon Ec2 goes Red Hat

message from Amazing Amazon’s cloud team- this will also help for #rstats users given that revolution Analytics full versions on RHEL.

—————————————————-

on-demand instances of Amazon EC2 running Red Hat Enterprise Linux (RHEL) for as little as $0.145 per instance hour. The offering combines the cost-effectiveness, scalability and flexibility of running in Amazon EC2 with the proven reliability of Red Hat Enterprise Linux.

Highlights of the offering include:

  • Support is included through subscription to AWS Premium Support with back-line support by Red Hat
  • Ongoing maintenance, including security patches and bug fixes, via update repositories available in all Amazon EC2 regions
  • Amazon EC2 running RHEL currently supports RHEL 5.5, RHEL 5.6, RHEL 6.0 and RHEL 6.1 in both 32 bit and 64 bit formats, and is available in all Regions.
  • Customers who already own Red Hat licenses will continue to be able to use those licenses at no additional charge.
  • Like all services offered by AWS, Amazon EC2 running Red Hat Enterprise Linux offers a low-cost, pay-as-you-go model with no long-term commitments and no minimum fees.

For more information, please visit the Amazon EC2 Red Hat Enterprise Linux page.

which is

Amazon EC2 Running Red Hat Enterprise Linux

Amazon EC2 running Red Hat Enterprise Linux provides a dependable platform to deploy a broad range of applications. By running RHEL on EC2, you can leverage the cost effectiveness, scalability and flexibility of Amazon EC2, the proven reliability of Red Hat Enterprise Linux, and AWS premium support with back-line support from Red Hat.. Red Hat Enterprise Linux on EC2 is available in versions 5.5, 5.6, 6.0, and 6.1, both in 32-bit and 64-bit architectures.

Amazon EC2 running Red Hat Enterprise Linux provides seamless integration with existing Amazon EC2 features including Amazon Elastic Block Store (EBS), Amazon CloudWatch, Elastic-Load Balancing, and Elastic IPs. Red Hat Enterprise Linux instances are available in multiple Availability Zones in all Regions.

Sign Up

Pricing

Pay only for what you use with no long-term commitments and no minimum fee.

On-Demand Instances

On-Demand Instances let you pay for compute capacity by the hour with no long-term commitments.

Region:US – N. VirginiaUS – N. CaliforniaEU – IrelandAPAC – SingaporeAPAC – Tokyo
Standard Instances Red Hat Enterprise Linux
Small (Default) $0.145 per hour
Large $0.40 per hour
Extra Large $0.74 per hour
Micro Instances Red Hat Enterprise Linux
Micro $0.08 per hour
High-Memory Instances Red Hat Enterprise Linux
Extra Large $0.56 per hour
Double Extra Large $1.06 per hour
Quadruple Extra Large $2.10 per hour
High-CPU Instances Red Hat Enterprise Linux
Medium $0.23 per hour
Extra Large $0.78 per hour
Cluster Compute Instances Red Hat Enterprise Linux
Quadruple Extra Large $1.70 per hour
Cluster GPU Instances Red Hat Enterprise Linux
Quadruple Extra Large $2.20 per hour

Pricing is per instance-hour consumed for each instance type. Partial instance-hours consumed are billed as full hours.

↑ Top

and

Available Instance Types

Standard Instances

Instances of this family are well suited for most applications.

Small Instance – default*

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage
32-bit platform
I/O Performance: Moderate
API name: m1.small

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge

Micro Instances

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

Micro Instance

613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro

High-Memory Instances

Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.

High-Memory Extra Large Instance

17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

High-CPU Instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

High-CPU Medium Instance

1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge

Cluster Compute Instances

Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications. Learn more about use of this instance type for HPC applications.

Cluster Compute Quadruple Extra Large Instance

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge

Cluster GPU Instances

Instances of this family provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance for applications benefitting from highly parallelized processing, including HPC, rendering and media processing applications. While Cluster Compute Instances provide the ability to create clusters of instances connected by a low latency, high throughput network, Cluster GPU Instances provide an additional option for applications that can benefit from the efficiency gains of the parallel computing power of GPUs over what can be achieved with traditional processors. Learn more about use of this instance type for HPC applications.

Cluster GPU Quadruple Extra Large Instance

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge

 


Getting Started

To get started using Red Hat Enterprise Linux on Amazon EC2, perform the following steps:

  • Open and log into the AWS Management Console
  • Click on Launch Instance from the EC2 Dashboard
  • Select the Red Hat Enterprise Linux AMI from the QuickStart tab
  • Specify additional details of your instance and click Launch
  • Additional details can be found on each AMI’s Catalog Entry page

The AWS Management Console is an easy tool to start and manage your instances. If you are looking for more details on launching an instance, a quick video tutorial on how to use Amazon EC2 with the AWS Management Console can be found here .
A full list of Red Hat Enterprise Linux AMIs can be found in the AWS AMI Catalog.

↑ Top


Support

All customers running Red Hat Enterprise Linux on EC2 will receive access to repository updates from Red Hat. Moreover, AWS Premium support customers can contact AWS to get access to a support structure from both Amazon and Red Hat.

↑ Top


Resources

↑ Top


About Red Hat

Red Hat, the world’s leading open source solutions provider, is headquartered in Raleigh, NC with over 50 satellite offices spanning the globe. Red Hat provides high-quality, low-cost technology with its operating system platform, Red Hat Enterprise Linux, together with applications, management and Services Oriented Architecture (SOA) solutions, including the JBoss Enterprise Middleware Suite. Red Hat also offers support, training and consulting services to its customers worldwide.

 

also from Revolution Analytics- in case you want to #rstats in the cloud and thus kill all that talk of RAM dependency, slow R than other softwares (just increase the RAM above in the instances to keep it simple)

,or Revolution not being open enough

http://www.revolutionanalytics.com/downloads/gpl-sources.php

GPL SOURCES

Revolution Analytics uses an Open-Core Licensing model. We provide open- source R bundled with proprietary modules from Revolution Analytics that provide additional functionality for our users. Open-source R is distributed under the GNU Public License (version 2), and we make our software available under a commercial license.

Revolution Analytics respects the importance of open source licenses and has contributed code to the open source R project and will continue to do so. We have carefully reviewed our compliance with GPLv2 and have worked with Mark Radcliffe of DLA Piper, the outside General Legal Counsel of the Open Source Initiative, to ensure that we fully comply with the obligations of the GPLv2.

For our Revolution R distribution, we may make some minor modifications to the R sources (the ChangeLog file lists all changes made). You can download these modified sources of open-source R under the terms of the GPLv2, using either the links below or those in the email sent to you when you download a specific version of Revolution R.

Download GPL Sources

Product Version Platform Modified R Sources
Revolution R Community 3.2 Windows R 2.10.1
Revolution R Community 3.2 MacOS R 2.10.1
Revolution R Enterprise 3.1.1 RHEL R 2.9.2
Revolution R Enterprise 4.0 Windows R 2.11.1
Revolution R Enterprise 4.0.1 RHEL R 2.11.1
Revolution R Enterprise 4.1.0 Windows R 2.11.1
Revolution R Enterprise 4.2 Windows R 2.11.1
Revolution R Enterprise 4.2 RHEL R 2.11.1
Revolution R Enterprise 4.3 Windows & RHEL R 2.12.2