Google AppInventor in Action

App Inventor is a GUI-based SDK for making Android mobile apps, which you can then publish in the Android Market.

Watch a 60 sec video on that!

PAW Reception and R Meetup

New DC meetup for R Users-

source- http://www.meetup.com/R-users-DC/calendar/14236478/

October’s R meet-up will be co-located with the Predictive Analytics World Conference (http://www.predictive…) taking place in Washington DC October 19-20. PAW is the premier business-focused event for predictive analytics professionals, managers and commercial practitioners.

Agenda:

6:30 – 7:30 PAW Reception (open to meet-up attendees)
7:30 – 9:00 DC-R Meetup

Talks:
“How to speak ggplot2 like a native”
Harlan D. Harris, PhD @HarlanH

“Saving the world with R”
Michael Milton @michaelmilton

Important Registration Instructions:
You are welcome to RSVP here at meetup. The PAW organizers have requested that we register in the PAW site for the R meetup so they can provide badges to members which will give you access to the reception. There is no charge to register using the PAW site. Please click here to register.


Speaker Bios

Harlan D. Harris, PhD, is a statistical data scientist working for Kaplan Test Prep and Admissions in New York City. He has degrees from the University of Wisconsin-Madison and the University of Illinois at Urbana-Champaign. Prior to turning to the private sector, he worked as a researcher and lecturer in various areas of Artificial Intelligence and Cognitive Science at the University of Illinois, Columbia University, the University of Connecticut, and New York University.

Harlan’s talk is titled “How to speak ggplot2 like a native.” One of the most innovative ideas in data visualization in recent years is that graphical images can be described using a grammar. Just as a fluent speaker of a language can talk more precisely and clearly than someone using a tourist phrasebook, graphics based on a grammar can yield more insights than graphics based on a limited set of templates (bar chart, pie graph, etc.). There are at least two implementations of the Grammar of Graphics idea in R, of which the most popular is the ggplot2 package written by Prof. Hadley Wickham. Just as with natural languages, ggplot2 has a surface structure made up of R vocabulary elements, as well as a deep structure that mediates the link between the vocabulary and the “semantic” representation of the data shown on a computer screen. In this introductory presentation, the links among these levels of representation are demonstrated, so that new ggplot2 users can build the mental models necessary for fluent and creative visualization of their data.

Michael Milton is a Client Manager at Blue State Digital. When he’s not saving the world by designing interactive marketing strategies that connect passionate users with causes and organizations, he writes about data and analytics. For O’Reilly Media, he wrote Head First Data Analysis and Head First Excel and has created the videos Great R: Level 1 and Getting the Most Out of Google Apps for Business.

Michael’s talk is called “How to Save the World Using R.” In this wide-ranging discussion, Michael will highlight individuals and organizations who are using R to help others as well as ways in which R can be used to promote good statistical thinking.

Windows Azure vs Amazon EC2 (and Google Storage)

Here is a comparison of Windows Azure compute instances and Amazon EC2 compute instances.

Compute Instance Sizes:

Developers have the ability to choose the size of VMs to run their application based on the application's resource requirements. Windows Azure compute instances come in four unique sizes to enable complex applications and workloads.

Compute Instance Size | CPU         | Memory  | Instance Storage | I/O Performance
Small                 | 1.6 GHz     | 1.75 GB | 225 GB           | Moderate
Medium                | 2 x 1.6 GHz | 3.5 GB  | 490 GB           | High
Large                 | 4 x 1.6 GHz | 7 GB    | 1,000 GB         | High
Extra large           | 8 x 1.6 GHz | 14 GB   | 2,040 GB         | High

Standard Rates:

Windows Azure

  • Compute
    • Small instance (default): $0.12 per hour
    • Medium instance: $0.24 per hour
    • Large instance: $0.48 per hour
    • Extra large instance: $0.96 per hour
  • Storage
    • $0.15 per GB stored per month
    • $0.01 per 10,000 storage transactions
  • Content Delivery Network (CDN)
    • $0.15 per GB for data transfers from European and North American locations*
    • $0.20 per GB for data transfers from other locations*
    • $0.01 per 10,000 transactions*

Source –

http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-US&offer=MS-AZR-0001P

and

http://www.microsoft.com/windowsazure/windowsazure/

Amazon EC2 has more options, though:

http://aws.amazon.com/ec2/pricing/

Instance                         | Linux/UNIX Usage | Windows Usage
Standard On-Demand Instances
  Small (Default)                | $0.085 per hour  | $0.12 per hour
  Large                          | $0.34 per hour   | $0.48 per hour
  Extra Large                    | $0.68 per hour   | $0.96 per hour
Micro On-Demand Instances
  Micro                          | $0.02 per hour   | $0.03 per hour
High-Memory On-Demand Instances
  Extra Large                    | $0.50 per hour   | $0.62 per hour
  Double Extra Large             | $1.00 per hour   | $1.24 per hour
  Quadruple Extra Large          | $2.00 per hour   | $2.48 per hour
High-CPU On-Demand Instances
  Medium                         | $0.17 per hour   | $0.29 per hour
  Extra Large                    | $0.68 per hour   | $1.16 per hour
Cluster Compute Instances
  Quadruple Extra Large          | $1.60 per hour   | N/A*

* Windows is not currently available for Cluster Compute Instances.
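
To put the hourly rates side by side, a rough Python sketch can convert them to a monthly figure. The always-on 30-day month (`HOURS_PER_MONTH`) and the helper name `monthly_cost` are assumptions made here for illustration. The rates are the ones quoted above. The Azure and EC2 sizes are not identical machines, so treat the comparison as indicative only.

```python
# Rough monthly cost comparison from the hourly rates quoted above.
# Assumption: one instance running nonstop for a 30-day month.
HOURS_PER_MONTH = 24 * 30

# (provider, instance size) -> USD per hour, copied from the pricing lists.
HOURLY_RATES = {
    ("Azure", "Small"): 0.12,
    ("Azure", "Medium"): 0.24,
    ("Azure", "Large"): 0.48,
    ("Azure", "Extra large"): 0.96,
    ("EC2 Linux/UNIX", "Small"): 0.085,
    ("EC2 Linux/UNIX", "Large"): 0.34,
    ("EC2 Linux/UNIX", "Extra Large"): 0.68,
    ("EC2 Windows", "Small"): 0.12,
    ("EC2 Windows", "Large"): 0.48,
    ("EC2 Windows", "Extra Large"): 0.96,
}


def monthly_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Return the cost in USD of running one instance for `hours` hours."""
    return round(hourly_rate * hours, 2)


if __name__ == "__main__":
    for (provider, size), rate in HOURLY_RATES.items():
        print(f"{provider:15} {size:12} ${monthly_cost(rate):8.2f} per month")
```

On these numbers, the Azure rates match the EC2 Windows rates for the same-named sizes, while EC2 Linux/UNIX is cheaper per hour.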

http://aws.amazon.com/ec2/instance-types/

Standard Instances

Instances of this family are well suited for most applications.

Small Instance – default*

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage (150 GB plus 10 GB root partition)
32-bit platform
I/O Performance: Moderate
API name: m1.small

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage (2×420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High
API name: m1.large

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage (4×420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High
API name: m1.xlarge

Micro Instances

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

Micro Instance

613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro

High-Memory Instances

Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.

High-Memory Extra Large Instance

17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

High-CPU Instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

High-CPU Medium Instance

1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge

Cluster Compute Instances

Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications.

Cluster Compute Quadruple Extra Large Instance

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge
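
One way to read the instance list against the price list is to work out the price per EC2 Compute Unit. The sketch below is only an illustration: pairing each Linux/UNIX price row with an API name is an assumption made here, the helper names are hypothetical, and the Micro instance is left out because its compute units are only available in short bursts.

```python
# Price per EC2 Compute Unit (ECU) per hour, using Linux/UNIX on-demand rates.
# Rates and ECU counts are copied from the listings above. Matching each
# price row to an API name is an assumption made for this sketch.
INSTANCES = {
    # api_name: (usd_per_hour, ecu)
    "m1.small": (0.085, 1),
    "m1.large": (0.34, 4),
    "m1.xlarge": (0.68, 8),
    "m2.xlarge": (0.50, 6.5),
    "m2.2xlarge": (1.00, 13),
    "m2.4xlarge": (2.00, 26),
    "c1.medium": (0.17, 5),
    "c1.xlarge": (0.68, 20),
    "cc1.4xlarge": (1.60, 33.5),
}


def usd_per_ecu_hour(usd_per_hour, ecu):
    """Hourly price divided by compute units, rounded to 4 decimals."""
    return round(usd_per_hour / ecu, 4)


def rank_by_ecu_price(instances=INSTANCES):
    """Return (api_name, usd_per_ecu_hour) pairs, cheapest compute first."""
    priced = [
        (name, usd_per_ecu_hour(rate, ecu))
        for name, (rate, ecu) in instances.items()
    ]
    return sorted(priced, key=lambda item: item[1])


if __name__ == "__main__":
    for name, price in rank_by_ecu_price():
        print(f"{name:12} ${price:.4f} per ECU-hour")
```

This ignores memory, storage and I/O, so it only says which instance is cheapest per unit of CPU, not which is the best fit for a workload.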

Also, http://www.microsoft.com/en-us/sqlazure/default.aspx offers SQL databases as a service, with a free trial offer.

If you are into .NET/SQL big time, or too dependent on MS, Azure is a nice alternative to EC2: http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-US&offer=COMPARE_PUBLIC

Updated- I just got approved for Google Storage, so I am adding their info, though they are in Preview (and it's free right now) 🙂

https://code.google.com/apis/storage/docs/overview.html

Functionality

Google Storage for Developers offers a rich set of features and capabilities:

Basic Operations

  • Store and access data from anywhere on the Internet.
  • Range-gets for large objects.
  • Manage metadata.

Security and Sharing

  • User authentication using secret keys or Google account.
  • Authenticated downloads from a web browser for Google account holders.
  • Secure access using SSL.
  • Easy, powerful sharing and collaboration via ACLs for individuals and groups.

Performance and scalability

  • Up to 100 gigabytes per object and 1,000 buckets per account during the preview.
  • Strong data consistency—read-after-write consistency for all upload and delete operations.
  • Namespace for your domain—only you can create bucket URIs containing your domain name.
  • Data replicated in multiple data centers across the U.S. and within the same data center.

Tools

  • Web-based storage manager.
  • GSUtil, an open source command line tool.
  • Compatible with many existing cloud storage tools and libraries.

Read the Getting Started Guide to learn more about the service.

Note: Google Storage for Developers does not support Google Apps accounts that use your company domain name at this time.

Pricing

Google Storage for Developers pricing is based on usage.

  • Storage—$0.17/gigabyte/month
  • Network
    • Upload data to Google
      • $0.10/gigabyte
    • Download data from Google
      • $0.15/gigabyte for Americas and EMEA
      • $0.30/gigabyte for Asia-Pacific
  • Requests
    • PUT, POST, LIST—$0.01 per 1,000 requests
    • GET, HEAD—$0.01 per 10,000 requests
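
The usage-based pricing above can be turned into a quick estimate. This is a minimal sketch, assuming requests are billed pro rata rather than in whole blocks of 1,000 or 10,000. The function and parameter names are hypothetical, and only the rates come from the list above.

```python
# Rates in USD, copied from the Google Storage for Developers pricing above.
STORAGE_PER_GB_MONTH = 0.17
UPLOAD_PER_GB = 0.10
DOWNLOAD_PER_GB = {"americas_emea": 0.15, "asia_pacific": 0.30}
PUT_POST_LIST_PER_1000 = 0.01
GET_HEAD_PER_10000 = 0.01


def monthly_bill(stored_gb, upload_gb=0.0, download_gb=0.0,
                 region="americas_emea", put_post_list=0, get_head=0):
    """Estimate one month's bill in USD.

    Assumption: request charges scale pro rata with the request count.
    """
    total = (
        stored_gb * STORAGE_PER_GB_MONTH
        + upload_gb * UPLOAD_PER_GB
        + download_gb * DOWNLOAD_PER_GB[region]
        + put_post_list / 1000 * PUT_POST_LIST_PER_1000
        + get_head / 10000 * GET_HEAD_PER_10000
    )
    return round(total, 2)


if __name__ == "__main__":
    # Example usage figures are made up for illustration.
    print(monthly_bill(100, upload_gb=10, download_gb=20,
                       put_post_list=1000, get_head=10000))
```

For 100 GB stored, 10 GB uploaded, 20 GB downloaded in the Americas/EMEA, 1,000 PUT requests and 10,000 GET requests, the parts are $17.00 + $1.00 + $3.00 + $0.01 + $0.01.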

A Google App for Sales- ERPLY

While not quite Salesforce.com, a promising start for the first ERP Google App at https://www.google.com/enterprise/marketplace/viewListing?productListingId=5759+8485502070963042532

An interesting development- maybe there could be some statistical or BI apps on the Google Apps Marketplace soon 😉

KXEN Update

Update from a very good data mining software company, KXEN –

  1. Longtime Chairman and founder Roger Haddad is retiring but will be a Board Member. See his interview with Decisionstats here https://decisionstats.wordpress.com/2009/01/05/interview-roger-haddad-founder-of-kxen-automated-modeling-software/ (note: images were hidden due to the migration from .com to .wordpress.com)
  2. The new members of the leadership team are as follows-
John Ball
Chief Executive Officer

John Ball brings 20 years of experience in enterprise software, deep expertise in business intelligence and CRM applications, and a proven track record of success driving rapid growth at highly innovative companies.

Prior to joining KXEN, Mr. Ball served in several executive roles at salesforce.com, the leading provider of SaaS applications. Most recently, John served as VP & General Manager, Analytics and Reporting Products, where he spearheaded salesforce.com’s foray into CRM analytics and business intelligence. John also served as VP & General Manager, Service and Support Applications at salesforce.com, where he successfully grew the business to become the second largest and fastest growing product line at salesforce.com. Before salesforce.com, Ball was founder and CEO of Netonomy, the leading provider of customer self-service solutions for the telecommunications industry. Ball also held a number of executive roles at Business Objects, including General Manager, Web Products, where he delivered to market the first 3 versions of WebIntelligence. Ball has a master’s degree in electrical engineering from Georgia Tech and a master’s degree in electric

I hope John at least helps build a KXEN Force.com application- there are only 2 data mining apps on the AppExchange. Also on the wish list: more social media presence, a Web SaaS/Amazon API for KXEN, a greater presence at American and Asian conferences, a solution for SMEs (which cannot afford the premium pricing of the flagship solution), and an alliance with bigger BI vendors like Oracle, SAP or IBM for selling the great social network analysis.

Bill Russell as Non-Executive Chairman-

Bill Russell is Non-executive Chairman of the Board, effective July 16, 2010. Russell has 30 years of operational experience in enterprise software, with a special focus on business intelligence, analytics, and databases. Russell held a number of senior-level positions in his more than 20 years at Hewlett-Packard, including Vice President and General Manager of the multi-billion dollar Enterprise Systems Group. He has served as Non-executive Chairman of the Board for Sylantro Systems Corporation, webMethods Inc., and Network Physics, Inc. and has served as a board director for Cognos Inc. In addition to KXEN, Russell currently serves on the boards of Saba, PROS Holdings Inc., Global 360, ParAccel Inc., and B.T. Mancini Company.

Xavier Haffreingue as senior vice president, worldwide professional services and solutions.
He has almost 20 years of international enterprise software experience gained in the CRM, BI, Web and database sectors. Haffreingue joins KXEN from software provider Axway where he was VP global support operations. Prior to Axway, he held various leadership roles in the software industry, including VP self service solutions at Comverse Technologies and VP professional services and support at Netonomy, where he successfully delivered multi-million dollar projects across Europe, Asia-Pacific and Africa. Before that he was with Business Objects and Sybase, where he ran support and services in southern Europe managing over 2,500 customers in more than 20 countries.

David Guercio as senior vice president, Americas field operations. Guercio brings to the role more than 25 years' experience of building and managing high-achieving sales teams in the data mining, business intelligence and CRM markets. Guercio comes to KXEN from product lifecycle management vendor Centric Software, where he was EVP sales and client services. Prior to Centric, he was SVP worldwide sales and client services at Inxight Software, where he was also Chairman and CEO of the company’s Federal Systems Group, a subsidiary of Inxight that saw success in the US Federal Government intelligence market. The success in sales growth and penetration into the federal government led to the acquisition of Inxight by Business Objects in 2007, where Guercio then led the Inxight sales organization until Business Objects was acquired by SAP. Guercio was also a key member of the management team and a co-founder at Neovista, an early pioneer in data mining and predictive analytics. Additionally, he held the positions of director of sales and VP of professional services at Metaphor Computer Systems, one of the first data extraction solutions companies, which was acquired by IBM. During his career, Guercio also held executive positions at Resonate and SiGen.

  3. Venture capital funding to fund expansion-

KXEN has closed $8 million in series D funding to further accelerate its growth and international expansion. The round was led by NextStage and included participation from existing investors XAnge Capital, Sofinnova Ventures, Saints Capital and Motorola Ventures.

This was done after John Ball had joined as CEO.

  4. Continued kudos from analysts and customers for its technical excellence.

KXEN was named a leader in predictive analytics and data mining by Forrester Research (1) and was rated highest for commercial deployments of social network analytics by Frost & Sullivan (2)

Also, it became an alliance partner of Accenture, which is also a prominent SAS partner.

In-Database Optimization-

In KXEN V5.1, a new data manipulation module (ADM) is provided in conjunction with scoring to optimize database workloads and provide full in-database model deployment. Some leading data mining vendors are only now beginning to offer this kind of functionality, and then with only one or two selected databases, giving KXEN a more than five-year head start. Some other vendors are only offering generic SQL generation, not optimized for each database, and do not provide the wealth of possible outputs for their scoring equations: For example, real operational applications require not only to generate scores, but decision probabilities, error bars, individual input contributions – used to derive reasons of decision and more, which are available in KXEN in-database scoring modules.

Since 2005, KXEN has leveraged databases as the data manipulation engine for analytical dataset generation. In 2008, the ADM (Analytical Data Management) module delivered a major enhancement by providing a very easy to use data manipulation environment with unmatched productivity and efficiency. ADM works as a generator of optimized database-specific SQL code and comes with an integrated layer for the management of meta-data for analytics.

KXEN Modeling Factory- (similar to SAS’s recent product Rapid Predictive Modeler http://www.sas.com/resources/product-brief/rapid-predictive-modeler-brief.pdf and http://jtonedm.com/2010/09/02/first-look-rapid-predictive-modeler/)

KXEN Modeling Factory (KMF) has been designed to automate the development and maintenance of predictive analytics-intensive systems, especially systems that include large numbers of models, vast amounts of data or require frequent model refreshes. Information about each project and model is monitored and disseminated to ensure complete management and oversight and to facilitate continual improvement in business performance.

Main Functions

Schedule: creation of the Analytic Data Set (ADS), setup of how and when to score, setup of when and how to perform model retraining and refreshes …

Report: Monitor model execution over time, track changes in model quality over time, see how useful one variable is by considering its multiple instances in models …

Notification: Rather than having to wade through pages of event logs, KMF Department allows users to manage by exception through notifications.

Other products from KXEN have been covered here before https://decisionstats.wordpress.com/tag/kxen/ , including Structural Risk Minimization- https://decisionstats.wordpress.com/2009/04/27/kxen-automated-regression-modeling/

That's all for the KXEN update. All the best to the new management team, and a splendid job done by Roger Haddad in creating what is France's and Europe's best-known data mining company.

Note- Source – http://www.kxen.com


Trrrouble in land of R…and Open Source Suggestions

Recently, some comments by Ross Ihaka, co-founder of the R statistical software, on Revolution Analytics, the leading commercial vendor of R, came to my attention-

http://www.stat.auckland.ac.nz/mail/archive/r-downunder/2010-May/000529.html

[R-downunder] Article on Revolution Analytics

Ross Ihaka ihaka at stat.auckland.ac.nz
Mon May 10 14:27:42 NZST 2010


On 09/05/10 09:52, Murray Jorgensen wrote:
> Perhaps of interest:
>
> http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/

Please note that R is "free software" not "open source".  These guys
are selling a GPLed work without disclosing the source to their part
of the work. I have complained to them and so far they have given me
the brush off. I am now considering my options.

Don't support these guys by buying their product. The are not feeding
back to the rights holders (the University of Auckland and I are rights
holders and they didn't even have the courtesy to contact us).

--
Ross Ihaka                         Email:  ihaka at stat.auckland.ac.nz
Department of Statistics           Phone:  (64-9) 373-7599 x 85054
University of Auckland             Fax:    (64-9) 373-7018
Private Bag 92019, Auckland
New Zealand

and from http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/

Open source purists probably won't be all too happy to learn that Revolution is going to be employing an "open core" strategy, which means the core R programs will remain open source and be given tech support under a license model, but the key add-ons that make R more scalable will be closed source and sold under a separate license fee. Because most of those 2,500 add-ons for R were built by academics and Revolution wants to supplant SPSS and SAS as the tools used by students, Revolution will be giving the full single-user version of the R Enterprise stack away for free to academics.

Conclusion-

1) So one co-founder of R is advocating not buying from Revolution Analytics, which has the other co-founder of R, Robert Gentleman, on its board.
Source- http://www.revolutionanalytics.com/aboutus/leadership.php

2) Revolution Analytics is using 2500 packages for free while insisting on getting paid AND closing the source of its own packages (which raises a technical point- how exactly can you prevent the source code of an R package from being seen?).

Maybe there can be a PACKAGE marketplace, just like Android Apps, Facebook Apps, and Salesforce.com Apps, so that at least some of the thousands of R package developers can earn. Sorry, but email lists do not pay mortgages, and no one is disputing the NEED for commercializing R or rewarding developers.

Though Barr created SAS, he gave up control to Goodnight and Sall (https://decisionstats.wordpress.com/2010/06/02/sas-early-days/), and Goodnight and Sall do pay their developers well, to the envy of their not so well paid counterparts.

3) I really liked the innovation of Revolution Analytics' RevoScaleR, and I wish that the default R dataset could be converted to the XDF dataset format, so that it basically kills off the criticism of R being slow on bigger datasets. But I also realize the need for creating an analytics marketplace for R developers and R students, so the academic version of R being free and Revolution R being paid seems like a trade-off.

Note- You can still get a job faster as a stats student if you mention SAS and not R as a statistical skill- not all stats students go into academia.

4) There can be more elegant ways of handling this than calling for ignoring each other, as REVOLUTION and Ihaka seem to be doing.

I can almost hear people in Cary, NC chuckling at Norman Nie, their longtime opponent at SPSS and now REVOLUTION CEO, antagonizing R’s academicians within 1 year of taking over- so I hope this ends well for all. The road to hell is paved with good intentions- so if REVOLUTION can share some source code with, say, R Core members (even Microsoft shares source code with partners), and R Core and Revolution agree on a licensing royalty between them, they can actually speed up R package creation rather than allow this 2-decade effort to end up the way S, S-PLUS and TIBCO did.

Maybe Richard Stallman can help- or maybe Ihaka has a better sense of where things will go in a couple of years. He must know something- he invented it, didn't he?


Google AppInventor -Android and Business Intelligence

Here is a great new tool for techies to start creating Android Apps right away- even if you have no knowledge of the platform. Of course, a great number of apps already exist, including my favorite Android data mining app in R, called AnalyticDroid http://analyticdroid.togaware.com/

Basically it calls the Rattle (R Analytical Tool To Learn Easily) Data Mining GUI -enabling data mining from an Android Mobile using remote computing.

I don't know if any other statistical application is available on Android mobiles, though SAS did have a presentation on using SAS on the iPhone:

http://www.wuss.org/proceedings09/09WUSSProceedings/papers/dpr/DPR-Truong.pdf



SAS Mobile -Iphone App

All you need to do is go to http://appinventor.googlelabs.com/about/index.html and request access (yes, there is a 2-week approval waiting line).

Because App Inventor provides access to a GPS-location sensor, you can build apps that know where you are. You can build an app to help you remember where you parked your car, an app that shows the location of your friends or colleagues at a concert or conference, or your own custom tour app of your school, workplace, or a museum.
You can write apps that use the phone features of an Android phone. You can write an app that periodically texts “missing you” to your loved ones, or an app “No Text While Driving” that responds to all texts automatically with “sorry, I’m driving and will contact you later”. You can even have the app read the incoming texts aloud to you (though this might lure you into responding).
App Inventor provides a way for you to communicate with the web. If you know how to write web apps, you can use App Inventor to write Android apps that talk to your favorite web sites, such as Amazon and Twitter.

Here is a not-so-statistical Android app I am trying to create, called Hang-Out. It uses the current GPS location of your phone to find the nearest pub, movie or diner, and to catch a bus or train based on your city, the GPS position, the time of the request and the schedule of that city's public transport. It is very much a work in progress.
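
The "find the nearest place" step of such an app can be sketched outside App Inventor. The Python below is not App Inventor code (App Inventor uses visual blocks) and is only an illustration of the logic: it assumes the phone's latitude and longitude are already known and that candidate places are given as hypothetical (name, latitude, longitude) tuples, and it uses the standard haversine great-circle formula for distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_place(lat, lon, places):
    """Return the (name, lat, lon) tuple closest to the given position."""
    return min(places, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


if __name__ == "__main__":
    # Hypothetical places, made up for illustration only.
    places = [("Pub A", 0.0, 0.0), ("Diner B", 0.0, 1.0)]
    print(nearest_place(0.0, 0.1, places)[0])
```

Matching the result against public transport schedules would need a separate data source for each city, which this sketch does not cover.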