R in Oracle Java Cloud and Existing R – Java Integration #rstats

So I finally got my test plan accepted for a one-month trial of the Oracle Public Cloud at https://cloud.oracle.com/.

I am testing this for my next book, R for Cloud Computing (I have already covered Windows Azure and Amazon AWS, and am in the middle of testing Google Compute).

Some initial thoughts: this Java cloud seems more suitable for web apps than for data science (but I have to spend much more time on this).

I really liked the help, documentation, and tutorials; Oracle has invested a lot in making them friendly to enterprise users.

Hopefully the Oracle R Enterprise (ORE) guys can talk to the Oracle Cloud department and get some common use case projects going.

In the meantime, I did a roundup of all R-Java projects.

They include- Continue reading “R in Oracle Java Cloud and Existing R – Java Integration #rstats”

Software Review- BigML.com – Machine Learning meets the Cloud

I had a chance to dekko the new startup BigML https://bigml.com/ and was suitably impressed by the briefing and my own puttering around the site. Here is my review-

1) The website is very intuitively designed: you can create a dataset from an uploaded file in one click, and you can create a Decision Tree model in one click as well. I wish other cloud computing websites like Google Prediction API made their design as intuitive and easy to understand. Also, unlike Google Prediction API, the models are not black-box models but have a description which can be understood.

2) It includes some well-known data sources for people trying it out. They were kind enough to offer 5 invite codes for readers of Decisionstats (if you want to check it out yourself, use the codes below the post; note they are one-time only, so the first five get the invites).

BigML is still invite-only but plans to get into open release soon.

3) Data sources can currently only be created by uploading files (CSV), but they plan to change this, hopefully to get data from buckets (S3? or Google?) and from URLs.

4) The one-click operation to convert a data source into a dataset shows a histogram (distribution) of individual variables. The back end is Clojure, because, as the team explained, it made the easiest sense and fit with Java. The good news (?) is you would never see the Clojure code at the back end. You can read about it at http://clojure.org/

As cloud computing takes off (someday) I expect Clojure's popularity to take off as well.

Clojure is a dynamic programming language that targets the Java Virtual Machine (and the CLR, and JavaScript). It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language – it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection.

Clojure is a dialect of Lisp

 

5) As of now decision trees are the only distributed algorithm, but they expect to roll out other machine learning stuff soon. Hopefully this includes regression (both logit and linear) and k-means clustering. The trees are created and pruned in real time, which gives a slightly animated (and impressive) effect. And yes, model building is a one-click operation.

The real-time live pruning is really impressive, and I wonder how it could ever be replicated in desktop-based software, because of its sheer interactive nature.

 

Making the model is just half the work. Creating predictions and scoring the model is what really earns the money. It is one click, and customization is quite intuitive. It is not quite PMML compliant yet, so I hope some Zementis-like functionality can be added so that large numbers of models can be applied to predictions or to scoring data in real time.
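To make the idea of scoring a white-box tree concrete, here is a minimal sketch in Python. The nested-dictionary layout, the field names and the thresholds are all made up for illustration; this is not BigML's model format, just one plausible way a readable decision tree could be walked to score rows.

```python
# Hypothetical white-box tree (NOT BigML's format): internal nodes test one
# numeric field against a threshold, leaves carry the predicted label.
TREE = {
    "field": "petal_length", "threshold": 2.5,
    "left": {"label": "setosa"},
    "right": {
        "field": "petal_width", "threshold": 1.7,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    node = tree
    while "label" not in node:
        # Go left when the value is at or below the threshold, else right.
        branch = "left" if row[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def score(tree, rows):
    """Score a batch of rows by predicting each one in turn."""
    return [predict(tree, row) for row in rows]


if __name__ == "__main__":
    rows = [
        {"petal_length": 1.4, "petal_width": 0.2},
        {"petal_length": 4.5, "petal_width": 1.3},
        {"petal_length": 5.0, "petal_width": 2.0},
    ]
    print(score(TREE, rows))
```

Because the tree is plain data, the same structure can be printed, inspected or exported, which is the practical meaning of a model that is not a black box.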

 

If you are a developer/data hacker, you should check out this section too- it is quite impressive that the designers of BigML have planned for API access so early.

https://bigml.com/developers

BigML.io gives you:

  • Secure programmatic access to all your BigML resources.
  • Fully white-box access to your datasets and models.
  • Asynchronous creation of datasets and models.
  • Near real-time predictions.

 

Note: For your convenience, some of the snippets below include your real username and API key.

Please keep them secret.

REST API

BigML.io conforms to the design principles of Representational State Transfer (REST). BigML.io is entirely HTTP-based.

BigML.io gives you access to four basic resources: Source, Dataset, Model and Prediction. You can create, read, update, and delete resources using the respective standard HTTP methods: POST, GET, PUT and DELETE.

All communication with BigML.io is JSON formatted except for source creation. Source creation is handled with a HTTP PUT using the “multipart/form-data” content-type.

HTTPS

All access to BigML.io must be performed over HTTPS

and https://bigml.com/developers/quick_start (I think an R package that uses JSON and RCurl would further enhance ease of use).
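In the spirit of such a wrapper, here is a rough sketch (in Python rather than R) of how the four resources and the create/read/update/delete to POST/GET/PUT/DELETE mapping quoted above could be turned into request descriptions. The base URL, the URL path layout and the `username`/`api_key` query parameter names are assumptions for illustration only; check the developer pages for the real ones. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Operation -> HTTP method, as listed in the quoted documentation.
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}

# The four basic resources named in the quoted documentation.
RESOURCES = ("source", "dataset", "model", "prediction")

# Assumed base URL (HTTPS only, as the documentation requires).
BASE_URL = "https://bigml.io"


def describe_request(operation, resource, resource_id=None,
                     username="alice", api_key="SECRET"):
    """Return the HTTP method and URL for an operation, without sending it.

    The path layout and credential parameter names are hypothetical.
    """
    if operation not in CRUD_TO_HTTP:
        raise ValueError(f"unknown operation: {operation}")
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource}")
    path = f"/{resource}" if resource_id is None else f"/{resource}/{resource_id}"
    query = urlencode({"username": username, "api_key": api_key})
    return {"method": CRUD_TO_HTTP[operation], "url": f"{BASE_URL}{path}?{query}"}


if __name__ == "__main__":
    print(describe_request("create", "dataset"))
    print(describe_request("read", "model", resource_id="abc"))
```

A real client would pass these descriptions to an HTTP library and send JSON bodies, with source creation handled separately as a multipart upload.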

 

Summary-

Overall, a welcome addition that makes software in the realm of cloud computing and statistical computation/business analytics both easy to use and easy to deploy, with fail-safe mechanisms built in.

Check out https://bigml.com/ for yourself to see.

The invite codes are here. They are one-time use only and the first five get the invites, so click and try your luck at machine learning on the cloud.

If you don't get an invite (or it is already used), just leave your email there and wait a couple of days to get approval.

  1. https://bigml.com/accounts/register/?code=E1FE7
  2. https://bigml.com/accounts/register/?code=09991
  3. https://bigml.com/accounts/register/?code=5367D
  4. https://bigml.com/accounts/register/?code=76EEF
  5. https://bigml.com/accounts/register/?code=742FD

The Latest GUI for R- Bio7

Once more a spanking new shiny software –

Bio7 is an integrated development environment for ecological modelling based on the Rich-Client-Platform concept of the Java IDE Eclipse. The Bio7 platform contains several perspectives which arrange several views for a special purpose useful for the development and analysis of ecological models. One special perspective bundles a feature-rich GUI (Graphical User Interface) for the statistical software R.
For the bidirectional communication between Java and R the Rserve application is used (as a backend to evaluate R code and transfer data from and to Java).
The Bio7 R perspective (see figure below) is divided into an R-Shell view on the left side (conceptually the R side) and a Table view on the right side (conceptually the Java side).
Data can be imported to a spreadsheet, edited and then transferred to the R workspace. Vice versa data from R can be transferred to a sheet of the Table view and then exported e.g. to an Excel or OpenOffice file.

and

General:

Built upon Eclipse 3.6.1.

Now works with the latest Java version! (Windows version bundled with the latest JRE release).

Removed the Soil perspective (soils can now be modeled with ImageJ, in float precision). Active images can be displayed in the 3D discrete view (new example available).

Removed the database perspective and the plant layer. You can now build any discrete models without any plant layer.

Removed several controls in the Control view. Added the “Custom Controls” view. In addition ported the Swing component of the Time panel to SWT.

Deleted the avi to swf converter in the ImageJ menu.

Now patterns can be saved with opened Java editor source. If this file is reopened and dragged on Bio7 the pattern is loaded, the source is compiled and the setup method (if available) is executed. In this way model files can be used for presentations: drag, setup and run. The save actions are located in the Spreadsheet view toolbar.

More options available to disable panel painting and recording of values (if not needed for speed!).

New Setup button in the toolbar of Bio7 to trigger a compiled setup method if available.

Removed the load and save pattern buttons from the toolbar of Bio7. Discrete patterns can now be stored with the available action in the spreadsheet view menu.

New P2 Update Manager available in Bio7.

Updated the Janino Compiler.

New HTML perspective added with a view which embeds the TinyMCE editor.

New options to disable painting operations for the discrete panels.

New option to explicitly enable scripts at startup (for a faster startup).

Quadgrid (Hexgrid):

Only states are now available which can be created in the “Spreadsheet” view menu easily. Patterns can be stored and restored as usual but are now stored in an *.exml file.

New method to transfer the quadgrid pattern as a matrix to R.

New method to transfer the population data of all quadgrid states to R.

ImageJ:

Update to the latest version (with additional fixes).

Fixed a bug to rename the image.

Thumbnail browser can now open images recursively (limited to 1000 pics); the magnifying glass can be disabled, too.

Plugins can be installed dynamically with a drag and drop operation on the ImageJ view or toolbar (as known from ImageJ).

Installed plugins now extend the plugin menu as submenus or subsubmenus (not finished yet!).

Plugins can now be created with the Java editor. New Bio7 Wizard available to create a plugin template.

Compiled Java files can be added to a *.jar file with a new action in the Navigator view (right-click on the files in the Navigator). In this way ImageJ plugins can be packaged in a *.jar.

Floweditor:

Fixed a repaint bug in the debug mode of a flow (now draws correctly the active shape in the flow).

Resize with Ctrl+Scrollwheel works again.

Comments with more than one line work again.

New Test action to verify connections in a flow.

Debug mode now shows all executed Shapes.

Integrated more default tests (for the verification of a regular flow).

A mouse-click now deletes colored shapes in a flow (e.g. in debug mode).

Points panel:

Integrated (dynamic) Voronoi, Delaunay visualization (with area and clip to rectangle action).

Points coordinates can now be set in double precision.

Transfer of point coordinates to R now in double precision.

Bio7 Table:

New import and export of Excel 2007 OOXML.

Row headers can now be resized with the mouse device.

R:

Updated R (2.12.1) and Rserve (0.6.3) to the latest versions.

New help action in the R-Shell view.

New action to display help for R specific commands in the embedded Bio7 browser (which opens automatically).

New Key actions to copy the selected variable names to the expression dialog (c=concatenate (+), a=add (,)).

New action to transfer character or numeric vectors horizontally or vertically in an opened spread (Table view) at selection coordinates.

Empty spaces in the file path are now allowed under Windows if Rserve is started with a system shell or the RGUI is started (for the tempfile, select a writable location in the Preferences dialog). This also works for the RGUI action.

Improved the search for the “Install packages” action (option “Case Sensitive” added).

API:

New API methods available!

And:

Many fixes since the last version!

 

Installation

Important information:

A certain firewall software can corrupt the Bio7 *.zip file (as well as other files).
Please ensure that you have downloaded a functioning Bio7 1.5 version. In addition it is also reported that a certain antivirus software detects the bundled R software (on Windows) as malware. Often the R specific “open.exe” is detected as malware. Please use a different scanner to make sure that the software is not infected if you have any doubts. For more details see:

http://r.789695.n4.nabble.com/trojan-at-current-development-version-td3244348.html