I had a chance to dekko the new startup BigML (https://bigml.com/) and was suitably impressed by the briefing and by my own puttering around the site. Here is my review.
1) The website is very intuitively designed. You can create a dataset from an uploaded file in one click, and you can create a decision tree model in one click as well. I wish other cloud computing services like Google Prediction API made their design so intuitive and easy to understand. Also, unlike Google Prediction API, the models are not black boxes but come with a description that can be understood.
2) It includes some well-known data sources for people trying it out. They were kind enough to offer 5 invite codes for readers of Decisionstats (if you want to check it out yourself, use the codes below the post; note that they are one-time only, so the first five get the invites).
BigML is still invite-only but plans to move to an open release soon.
3) Data sources can currently be created only by uploading files (csv), but they plan to change this, hopefully to pull data from buckets (S3? or Google?) and from URLs.
4) The one-click operation to convert a data source into a dataset shows a histogram (distribution) of each variable. The back end is Clojure because, as the team explained, it made the most sense and fits well with Java. The good news (?) is that you would never see the Clojure code at the back end. You can read about it at http://clojure.org/
As cloud computing takes off (someday), I expect Clojure's popularity to take off as well.
Clojure is a dialect of Lisp.
5) As of now, decision trees are the only distributed algorithm, but they expect to roll out other machine learning methods soon. Hopefully this includes regression (logit and linear) and k-means clustering. The trees are created and pruned in real time, which gives a slightly animated (and impressive) effect. And yes, model building is a one-click operation.
The real-time, live pruning is really impressive, and because of its sheer interactive nature I wonder whether (and how) it could ever be replicated in desktop-based software.
Making the model is just half the work. Creating predictions and scoring the model is what really earns the money. It is one click, and customization is quite intuitive. It is not quite PMML compliant yet, so I hope some Zemanta-like functionality can be added so that huge numbers of models can be applied to predict or score data in real time.
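To make the scoring step concrete, here is a minimal sketch of what applying a white-box tree model to new rows can look like. Everything in it is an assumption for illustration: the nested-dictionary layout, the field names, the thresholds and the labels are made up, and this is not BigML's actual model format or its prediction code.

```python
# Hypothetical tree: internal nodes test one field against a threshold,
# leaves carry a predicted label. Values are illustrative only.
TREE = {
    "field": "petal_length",
    "threshold": 2.5,
    "left": {"label": "setosa"},
    "right": {
        "field": "petal_width",
        "threshold": 1.7,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}


def predict(node, record):
    """Walk the tree from the root until a leaf is reached; return its label."""
    while "label" not in node:
        # Go left when the record's value is at or below the threshold.
        branch = "left" if record[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def score(tree, records):
    """Score a batch of records by predicting each one in turn."""
    return [predict(tree, record) for record in records]


if __name__ == "__main__":
    rows = [
        {"petal_length": 1.4, "petal_width": 0.2},
        {"petal_length": 4.5, "petal_width": 1.5},
        {"petal_length": 5.8, "petal_width": 2.2},
    ]
    print(score(TREE, rows))
```

Because the model is just a readable set of rules, scoring is a cheap walk down the tree, which is part of why near real-time predictions are plausible for this kind of model.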
If you are a developer/data hacker, you should check out this section too. It is quite impressive that the designers of BigML have planned for API access so early.
BigML.io gives you:
- Secure programmatic access to all your BigML resources.
- Fully white-box access to your datasets and models.
- Asynchronous creation of datasets and models.
- Near real-time predictions.
Note: For your convenience, some of the snippets below include your real username and API key.
Please keep them secret.
BigML.io conforms to the design principles of Representational State Transfer (REST). BigML.io is entirely HTTP-based.
BigML.io gives you access to four basic resources: Source, Dataset, Model and Prediction. You can create, read, update, and delete resources using the respective standard HTTP methods: POST, GET, PUT and DELETE.
All communication with BigML.io is JSON formatted except for source creation. Source creation is handled with an HTTP POST using the “multipart/form-data” content-type.
All access to BigML.io must be performed over HTTPS.
See also https://bigml.com/developers/quick_start (I think an R package that uses JSON and RCurl would further enhance ease of use).
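For the developers, the REST conventions quoted above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the base URL, the path layout, the `username`/`api_key` query parameters and the `dataset` payload field are my guesses for the sake of the example, not confirmed details of the BigML.io API, and the function only builds a request without sending it. Source creation (multipart upload) is deliberately left out.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL; check the developer documentation for the real one.
BASE_URL = "https://bigml.io"

# Create/read/update/delete mapped to the standard HTTP methods.
HTTP_METHODS = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}

# The four basic resources named in the documentation excerpt.
RESOURCES = {"source", "dataset", "model", "prediction"}


def build_request(action, resource, resource_id=None, payload=None,
                  username="my_username", api_key="my_api_key"):
    """Build (but do not send) an HTTPS request for a REST-style resource."""
    if action not in HTTP_METHODS:
        raise ValueError(f"unknown action: {action}")
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource}")

    # Hypothetical path layout: /resource or /resource/id.
    path = f"/{resource}" if resource_id is None else f"/{resource}/{resource_id}"
    # Hypothetical authentication via query parameters.
    query = urllib.parse.urlencode({"username": username, "api_key": api_key})
    url = f"{BASE_URL}{path}?{query}"

    data = None
    headers = {}
    if payload is not None:
        # JSON body for everything except source creation (not handled here).
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"

    return urllib.request.Request(
        url, data=data, headers=headers, method=HTTP_METHODS[action]
    )


if __name__ == "__main__":
    # "dataset/123" is a placeholder identifier, not a real resource.
    req = build_request("create", "model", payload={"dataset": "dataset/123"})
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send it; with real credentials those should of course be kept out of source code.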
Overall, a welcome addition that makes software in the realm of cloud computing and statistical computation/business analytics both easy to use and easy to deploy, with fail-safe mechanisms built in.
Check out https://bigml.com/ for yourself to see.
The invite codes are here. They are one-time use only and the first five get the invites, so click and try your luck at machine learning on the cloud.
If you don't get an invite (or it is already used), just leave your email there and wait a couple of days to get approval.