BigML creates a marketplace for Predictive Models

BigML has created a marketplace for selling Datasets and Models. This is a first (?) as the closest market for Predictive Analytics till now was Rapid Miner’s marketplace for extensions (at http://rapidupdate.de:8180/UpdateServer/faces/index.xhtml)

From http://blog.bigml.com/2012/10/25/worlds-first-predictive-marketplace/

SELL YOUR DATA

You can make your Dataset public. Mind you: the Datasets we are talking about are BigML’s fancy histograms. This means that other BigML users can look at your Dataset details and create new models based on this Dataset. But they can not see individual records or columns or use it beyond the statistical summaries of the Dataset. Your Source will remain private, so there is no possibility of anyone accessing the raw data.

SELL YOUR MODEL

Now, once you have created a great model, you can share it with the rest of the world. For free or at any price you set.Predictions are paid for in BigML Prediction Credits. The minimum price is ‘Free’ and the maximum price indicated is 100 credits.

White Box Models

Clicking on the white open lock will open up your model to the rest of the world. Anyone can now buy your model, explore it, use it to make predictions

Black Box Models

If you choose the black box setting (the black open lock icon), other BigML users will NOT be able to view or clone your model, but they will be able to use it to make predictions.

——

DOWNLOAD YOUR MODEL

BigML.com have added downloads to our models. Simply choose the format you want and you can copy/paste the code or text. There is a range of formats that they offer currently: JSON PML, PMML, Python, Ruby, Objective-C, Java, the rules of the decision tree in plain text and a Summary overview of your model. Around the corner are MS Excel downloads and R (of course!).

PUBLICIZE YOUR MODEL

There’s also an ’embed’ function, so now you can embed the little poster of your model in your blog post or website, so it is easy to share it in your own environment.

————————————————————————————————————————–

It is nice to see Models and Data getting the APPY treatment and hopefully, it will encourage other vendors Iike Google Prediction API etc to further spend thought and effort to reward data mining individuals directly without going through corporate intermediaries while ensuring intellectual property safeguards .

An R package market for enterprises? for Python libraries? JMP addins? A market for SAS Macros- who knows what the future shall hold. But overall, this is a very positive step by the BigML.com team. The App marketplace has helped revolutionize mobile and desktop computing and hopefully it will do the same for Business Analytics.

 

Google Cloud is finally here

Amazon gets some competition, and customers should see some relief, unless Google withdraws commitment on these products after a few years of trying (like it often does now!)

 

http://cloud.google.com/products/index.html

Machine Type Pricing
Configuration Virtual Cores Memory GCEU * Local disk Price/Hour $/GCEU/hour
n1-standard-1-d 1 3.75GB *** 2.75 420GB *** $0.145 0.053
n1-standard-2-d 2 7.5GB 5.5 870GB $0.29 0.053
n1-standard-4-d 4 15GB 11 1770GB $0.58 0.053
n1-standard-8-d 8 30GB 22 2 x 1770GB $1.16 0.053
Network Pricing
Ingress Free
Egress to the same Zone. Free
Egress to a different Cloud service within the same Region. Free
Egress to a different Zone in the same Region (per GB) $0.01
Egress to a different Region within the US $0.01 ****
Inter-continental Egress At Internet Egress Rate
Internet Egress (Americas/EMEA destination) per GB
0-1 TB in a month $0.12
1-10 TB $0.11
10+ TB $0.08
Internet Egress (APAC destination) per GB
0-1 TB in a month $0.21
1-10 TB $0.18
10+ TB $0.15
Persistent Disk Pricing
Provisioned space $0.10 GB/month
Snapshot storage** $0.125 GB/month
IO Operations $0.10 per million
IP Address Pricing
Static IP address (assigned but unused) $0.01 per hour
Ephemeral IP address (attached to instance) Free
* GCEU is Google Compute Engine Unit — a measure of computational power of our instances based on industry benchmarks; review the GCEU definition for more information
** coming soon
*** 1GB is defined as 2^30 bytes
**** promotional pricing; eventually will be charged at internet download rates

Google Prediction API

Tap into Google’s machine learning algorithms to analyze data and predict future outcomes.

Leverage machine learning without the complexity
Use the familiar RESTful interface
Enter input in any format – numeric or text

Build smart apps

Learn how you can use Prediction API to build customer sentiment analysis, spam detection or document and email classification.

Google Translation API

Use Google Translate API to build multilingual apps and programmatically translate text in your webpage or application.

Translate text into other languages programmatically
Use the familiar RESTful interface
Take advantage of Google’s powerful translation algorithms

Build multilingual apps

Learn how you can use Translate API to build apps that can programmatically translate text in your applications or websites.

Google BigQuery

Analyze Big Data in the cloud using SQL and get real-time business insights in seconds using Google BigQuery. Use a fully-managed data analysis service with no servers to install or maintain.
Figure

Reliable & Secure

Complete peace of mind as your data is automatically replicated across multiple sites and secured using access control lists.
Scale infinitely

You can store up to hundreds of terabytes, paying only for what you use.
Blazing fast

Run ad hoc SQL queries on
multi-terabyte datasets in seconds.

Google App Engine

Create apps on Google’s platform that are easy to manage and scale. Benefit from the same systems and infrastructure that power Google’s applications.

Focus on your apps

Let us worry about the underlying infrastructure and systems.
Scale infinitely

See your applications scale seamlessly from hundreds to millions of users.
Business ready

Premium paid support and 99.95% SLA for business users

Software Review- BigML.com – Machine Learning meets the Cloud

I had a chance to dekko the new startup BigML https://bigml.com/ and was suitably impressed by the briefing and my own puttering around the site. Here is my review-

1) The website is very intutively designed- You can create a dataset from an uploaded file in one click and you can create a Decision Tree model in one click as well. I wish other cloud computing websites like  Google Prediction API make design so intutive and easy to understand. Also unlike Google Prediction API, the models are not black box models, but have a description which can be understood.

2) It includes some well known data sources for people trying it out. They were kind enough to offer 5 invite codes for readers of Decisionstats ( if you want to check it yourself, use the codes below the post, note they are one time only , so the first five get the invites.

BigML is still invite only but plan to get into open release soon.

3) Data Sources can only be by uploading files (csv) but they plan to change this hopefully to get data from buckets (s3? or Google?) and from URLs.

4) The one click operation to convert data source into a dataset shows a histogram (distribution) of individual variables.The back end is clojure , because the team explained it made the easiest sense and fit with Java. The good news (?) is you would never see the clojure code at the back end. You can read about it from http://clojure.org/

As cloud computing takes off (someday) I expect clojure popularity to take off as well.

Clojure is a dynamic programming language that targets the Java Virtual Machine (and the CLR, and JavaScript). It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language – it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection.

Clojure is a dialect of Lisp

 

5) As of now decision trees is the only distributed algol, but they expect to roll out other machine learning stuff soon. Hopefully this includes regression (as logit and linear) and k means clustering. The trees are created and pruned in real time which gives a slightly animated (and impressive effect). and yes model building is an one click operation.

The real time -live pruning is really impressive and I wonder why /how it can ever be replicated in other software based on desktop, because of the sheer interactive nature.

 

Making the model is just half the work. Creating predictions and scoring the model is what is really the money-earner. It is one click and customization is quite intuitive. It is not quite PMML compliant yet so I hope some Zemanta like functionality can be added so huge amounts of models can be applied to predictions or score data in real time.

 

If you are a developer/data hacker, you should check out this section too- it is quite impressive that the designers of BigML have planned for API access so early.

https://bigml.com/developers

BigML.io gives you:

  • Secure programmatic access to all your BigML resources.
  • Fully white-box access to your datasets and models.
  • Asynchronous creation of datasets and models.
  • Near real-time predictions.

 

Note: For your convenience, some of the snippets below include your real username and API key.

Please keep them secret.

REST API

BigML.io conforms to the design principles of Representational State Transfer (REST)BigML.io is enterely HTTP-based.

BigML.io gives you access to four basic resources: SourceDatasetModel and Prediction. You cancreatereadupdate, and delete resources using the respective standard HTTP methods: POSTGET,PUT and DELETE.

All communication with BigML.io is JSON formatted except for source creation. Source creation is handled with a HTTP PUT using the “multipart/form-data” content-type

HTTPS

All access to BigML.io must be performed over HTTPS

and https://bigml.com/developers/quick_start ( In think an R package which uses JSON ,RCurl  would further help in enhancing ease of usage).

 

Summary-

Overall a welcome addition to make software in the real of cloud computing and statistical computation/business analytics both easy to use and easy to deploy with fail safe mechanisms built in.

Check out https://bigml.com/ for yourself to see.

The invite codes are here -one time use only- first five get the invites- so click and try your luck, machine learning on the cloud.

If you dont get an invite (or it is already used, just leave your email there and wait a couple of days to get approval)

  1. https://bigml.com/accounts/register/?code=E1FE7
  2. https://bigml.com/accounts/register/?code=09991
  3. https://bigml.com/accounts/register/?code=5367D
  4. https://bigml.com/accounts/register/?code=76EEF
  5. https://bigml.com/accounts/register/?code=742FD

2011 Analytics Recap

Events in the field of data that impacted us in 2011

1) Oracle unveiled plans for R Enterprise. This is one of the strongest statements of its focus on in-database analytics. Oracle also unveiled plans for a Public Cloud

2) SAS Institute released version 9.3 , a major analytics software in industry use.

3) IBM acquired many companies in analytics and high tech. Again.However the expected benefits from Cognos-SPSS integration are yet to show a spectacular change in market share.

2011 Selected acquisitions

Emptoris Inc. December 2011

Cúram Software Ltd. December 2011

DemandTec December 2011

Platform Computing October 2011

 Q1 Labs October 2011

Algorithmics September 2011

 i2 August 2011

Tririga March 2011

 

4) SAP promised a lot with SAP HANA- again no major oohs and ahs in terms of market share fluctuations within analytics.

http://www.sap.com/india/news-reader/index.epx?articleID=17619

5) Amazon continued to lower prices of cloud computing and offer more options.

http://aws.amazon.com/about-aws/whats-new/2011/12/21/amazon-elastic-mapreduce-announces-support-for-cc2-8xlarge-instances/

6) Google continues to dilly -dally with its analytics and cloud based APIs. I do not expect all the APIs in the Google APIs suit to survive and be viable in the enterprise software space.  This includes Google Cloud Storage, Cloud SQL, Prediction API at https://code.google.com/apis/console/b/0/ Some of the location based , translation based APIs may have interesting spin offs that may be very very commercially lucrative.

7) Microsoft -did- hmm- I forgot. Except for its investment in Revolution Analytics round 1 many seasons ago- very little excitement has come from MS plans in data mining- The plugins for cloud based data mining from Excel remain promising yet , while Azure remains a stealth mode starter.

8) Revolution Analytics promised us a GUI and didnt deliver (till yet 🙂 ) . But it did reveal a much better Enterprise software Revolution R 5.0 is one of the strongest enterprise software in the R /Stat Computing space and R’s memory handling problem is now an issue of perception than actual stuff thanks to newer advances in how it is used.

9) More conferences, more books and more news on analytics startups in 2011. Big Data analytics remained a strong buzzword. Expect more from this space including creative uses of Hadoop based infrastructure.

10) Data privacy issues continue to hamper and impede effective analytics usage. So does rational and balanced regulation in some of the most advanced economies. We expect more regulation and better guidelines in 2012.

Interview Zach Goldberg, Google Prediction API

Here is an interview with Zach Goldberg, who is the product manager of Google Prediction API, the next generation machine learning analytics-as-an-api service state of the art cloud computing model building browser app.
Ajay- Describe your journey in science and technology from high school to your current job at Google.

Zach- First, thanks so much for the opportunity to do this interview Ajay!  My personal journey started in college where I worked at a startup named Invite Media.   From there I transferred to the Associate Product Manager (APM) program at Google.  The APM program is a two year rotational program.  I did my first year working in display advertising.  After that I rotated to work on the Prediction API.

Ajay- How does the Google Prediction API help an average business analytics customer who is already using enterprise software , servers to generate his business forecasts. How does Google Prediction API fit in or complement other APIs in the Google API suite.

Zach- The Google Prediction API is a cloud based machine learning API.  We offer the ability for anybody to sign up and within a few minutes have their data uploaded to the cloud, a model built and an API to make predictions from anywhere. Traditionally the task of implementing predictive analytics inside an application required a fair amount of domain knowledge; you had to know a fair bit about machine learning to make it work.  With the Google Prediction API you only need to know how to use an online REST API to get started.

You can learn more about how we help businesses by watching our video and going to our project website.

Ajay-  What are the additional use cases of Google Prediction API that you think traditional enterprise software in business analytics ignore, or are not so strong on.  What use cases would you suggest NOT using Google Prediction API for an enterprise.

Zach- We are living in a world that is changing rapidly thanks to technology.  Storing, accessing, and managing information is much easier and more affordable than it was even a few years ago.  That creates exciting opportunities for companies, and we hope the Prediction API will help them derive value from their data.

The Prediction API focuses on providing predictive solutions to two types of problems: regression and classification. Businesses facing problems where there is sufficient data to describe an underlying pattern in either of these two areas can expect to derive value from using the Prediction API.

Ajay- What are your separate incentives to teach about Google APIs  to academic or researchers in universities globally.

Zach- I’d refer you to our university relations page

Google thrives on academic curiosity. While we do significant in-house research and engineering, we also maintain strong relations with leading academic institutions world-wide pursuing research in areas of common interest. As part of our mission to build the most advanced and usable methods for information access, we support university research, technological innovation and the teaching and learning experience through a variety of programs.

Ajay- What is the biggest challenge you face while communicating about Google Prediction API to traditional users of enterprise software.

Zach- Businesses often expect that implementing predictive analytics is going to be very expensive and require a lot of resources.  Many have already begun investing heavily in this area.  Quite often we’re faced with surprise, and even skepticism, when they see the simplicity of the Google Prediction API.  We work really hard to provide a very powerful solution and take care of the complexity of building high quality models behind the scenes so businesses can focus more on building their business and less on machine learning.

 

 

Preview- Google Cloud SQL

From –http://code.google.com/apis/sql/

What is Google Cloud SQL?

Google Cloud SQL is web service that allows you to create, configure, and use relational databases with your App Engine applications. It is a fully-managed service that maintains, manages, and administers your databases, allowing you to focus on your applications and services.

By offering the capabilities of a MySQL database, the service enables you to easily move your data, applications, and services into and out of the cloud. This allows for high data portability and helps in faster time-to-market because you can quickly leverage your existing database (using JDBC and/or DB-API) in your App Engine application.

Here is where you can get an invite to the beta only Google Cloud SQL

Sign up for Limited Preview

Google Cloud SQL is available to a limited number of users. To sign up for the service:

  1. Visit the Google APIs Console. The console opens the All services pane.
  2. Find the SQL Service line in the Services table and click Request access…
  3. Fill out the enrollment form.
  4. Our team will review your enrollment information and respond by email to the address associated with your Google Account.
  5. Follow the link in the email to view the Terms of Service. Please read these carefully before accepting.
  6. Sign up for the google-cloud-sql-announce group to receive important announcements and product news. (NOTE- Members: 384)
and after all that violence and double talk, a walk in the clouds with SQL.
1. There are three kinds of instances in the beta view
2. Wait for the Instance to be created note- the Design of the Interface uptil now is much better than Amazon’s.  
Note you need to have an appspot application from Google Apps and can choose between the Python and Java versions. Quite clearly there is a play for other languages too. I think GO is also supported.
3. You can import your data from your Google Storage bucket
4. I am not that hot at coding or maybe the interface was too pretty. Anyways- the log tells me that import of the text file has failed from Google Storage to Google Cloud SQL 
5. Incidentally the Google Cloud Storage interface is also much better than the Amazon GUI for transferring data- Note I was using the classical statistical dataset Boston Housing Data as the test case. 
6. The SQL prompt is the weakest part of the design process of the Interphase. There is no Query builder and the SELECT FROM WHERE prompt is slightly amusing/ insulting . I mean guys either throw in a fully fledged GUI for query builder similar to the MYSQL Workbench , than create a pretty white command prompt.
7. You can also export your data back to your Google Storage bucket 
These are early days, and I am trying to see if there is a play for some cloud kind of ODBC action between R, Prediction API , and the cloud SQL… so try it out yourself at http://code.google.com/apis/sql/ and see if there is any juice you can build  here.

Using #Rstats for online data access

There are multiple packages in R to read data straight from online datasets.
These are as follows- Continue reading “Using #Rstats for online data access”