New Plotters in Rapid Miner 5.2

I almost missed this because of my vacation and traveling

Rapid Miner has a tonne of new stuff (Statuary Ethics Declaration- Rapid Miner has been an advertising partner for Decisionstats – see the right margin)

see

http://rapid-i.com/component/option,com_myblog/Itemid,172/lang,en/

Great New Graphical Plotters

and some flashy work

and a great series of educational lectures

A Simple Explanation of Decision Tree Modeling based on Entropies

Link: http://www.simafore.com/blog/bid/94454/A-simple-explanation-of-how-entropy-fuels-a-decision-tree-model

Description of some of the basics of decision trees. Simple and hardly any math, I like the plots explaining the basic idea of the entropy as splitting criterion (although we actually calculate gain ratio differently than explained…)

Logistic Regression for Business Analytics using RapidMiner

Link: http://www.simafore.com/blog/bid/57924/Logistic-regression-for-business-analytics-using-RapidMiner-Part-2

Same as above, but this time for modeling with logistic regression.
Easy to read and covering all basic ideas together with some examples. If you are not familiar with the topic yet, part 1 (see below) might help.

Part 1 (Basics): http://www.simafore.com/blog/bid/57801/Logistic-regression-for-business-analytics-using-RapidMiner-Part-1

Deploy Model: http://www.simafore.com/blog/bid/82024/How-to-deploy-a-logistic-regression-model-using-RapidMiner

Advanced Information: http://www.simafore.com/blog/bid/99443/Understand-3-critical-steps-in-developing-logistic-regression-models

and lastly a new research project for collaborative data mining

http://www.e-lico.eu/

e-LICO Architecture and Components

The goal of the e-LICO project is to build a virtual laboratory for interdisciplinary collaborative research in data mining and data-intensive sciences. The proposed e-lab will comprise three layers: the e-science and data mining layers will form a generic research environment that can be adapted to different scientific domains by customizing the application layer.

  1. Drag a data set into one of the slots. It will be automatically detected as training data, test data or apply data, depending on whether it has a label or not.
  2. Select a goal. The most frequent one is probably “Predictive Modelling”. All goals have comments, so you see what they can be used for.
  3. Select “Fetch plans” and wait a bit to get a list of processes that solve your problem. Once the planning completes, select one of the processes (you can see a preview at the right) and run it. Alternatively, select multiple (selecting none means selecting all) and evaluate them on your data in a batch.

The assistant strives to generate processes that are compatible with your data. To do so, it performs a lot of clever operations, e.g., it automatically replaces missing values if missing values exist and this is required by the learning algorithm or performs a normalization when using a distance-based learner.

You can install the extension directly by using the Rapid-I Marketplace instead of the old update server. Just go to the preferences and enter http://rapidupdate.de:8180/UpdateServer as the update URL

Of course Rapid Miner has been of the most professional open source analytics company and they have been doing it for a long time now. I am particularly impressed by the product map (see below) and the graphical user interface.

http://rapid-i.com/content/view/186/191/lang,en/

Product Map

Just click on the products in the overview below in order to get more information about Rapid-I products.

 

Rapid-I Product Overview 

 

Google Cloud SQL

Another xing bang API from the boyz in Mountain View. (entry by invite only) But it is free and you can test your stuff on a MySQL db =10 GB

Database as a service ? (Maybe)— while Amazon was building fires (and Fire)

—————————————————————–

https://code.google.com/apis/sql/index.html

What is Google Cloud SQL?

Google Cloud SQL is a web service that provides a highly available, fully-managed, hosted SQL storage solution for your App Engine applications.

What are the benefits of using Google Cloud SQL?

You can access a familiar, highly available SQL database from your App Engine applications, without having to worry about provisioning, management, and integration with other Google services.

How much does Google Cloud SQL cost?

We will not be billing for this service in 2011. We will give you at least 30 days’ advance notice before we begin billing in the future. Other services such as Google App Engine, Google Cloud Storage etc. that you use with Google Cloud SQL may have their own payment terms, and you need to pay for them. Please consult their documentation for details.

Currently you are limited to the three instance sizes. What if I need to store more data or need better performance?

In the Limited Preview period, we only have three sizes available. If you have specific needs, we would like to hear from you on our google-cloud-sqldiscussion board.

When is Google Cloud SQL be out of Limited Preview?

We are working hard to make the service generally available.We don’t have a firm date that we can announce right now.

Do you support all the features of MySQL?

In general, Google Cloud SQL supports all the features of MySQL. The following are lists of all the unsupported features and notable differences that Google Cloud SQL has from MySQL.

Unsupported Features:

  • User defined functions
  • MySql replication

Unsupported MySQL statements:

  • LOAD DATA INFILE
  • SELECT ... INTO OUTFILE
  • SELECT ... INTO DUMPFILE
  • INSTALL PLUGIN .. SONAME ...
  • UNINSTALL PLUGIN
  • CREATE FUNCTION ... SONAME ...

Unsupported SQL Functions:

  • LOAD_FILE()

Notable Differences:

  • If you want to import databases with binary data into your Google Cloud SQL instance, you must use the --hex-blob option with mysqldump.Although this is not a required flag when you are using a local MySQL server instance and the MySQL command line, it is required if you want to import any databases with binary data into your Google Cloud SQL instance. For more information, see Importing Data.
How large a database can I use with Google Cloud SQL?
Currently, in this limited preview period, your database instance must be no larger than 10GB.
How can I be notified when there are any changes to Google Cloud SQL?
You can sign up for the sql-announcements forum where we post announcements and news about the Google Cloud SQL.
How can I cancel my Google Cloud SQL account?
To remove all data from your Google Cloud SQL account and disable the service:

  1. Delete all your data. You can remove your tables, databases, and indexes using the drop command. For more information, see SQL DROP statement.
  2. Deactivate the Google Cloud SQL by visiting the Services pane and clicking the On button next to Google Cloud SQL. The button changes from Onto Off.
How do I report a bug, request a feature, or ask a question?
You can report bugs and request a feature on our project page.You can ask a question in our discussion forum.

Getting Started

Can I use languages other than Java or Python?
Only Java and Python are supported for Google Cloud SQL.
Can I use Google Cloud SQL outside of Google App Engine?
The Limited Preview is primarily focused on giving Google App Engine customers the ability to use a familiar relational database environment. Currently, you cannot access Google Cloud SQL from outside Google App Engine.
What database engine are we using in the Google Cloud SQL?
MySql Version 5.1.59
Do I need to install a local version of MySQL to use the Development Server?
Yes.

Managing Your Instances

Do I need to use the Google APIs Console to use Google Cloud SQL?
Yes. For basic tasks like granting access control to applications, creating instances, and deleting instances, you need to use the Google APIs Console.
Can I import or export specific databases?
No, currently it is not possible to export specific databases. You can only export your entire instance.
Do I need a Google Cloud Storage account to import or export my instances?
Yes, you need to sign up for a Google Cloud Storage account or have access to a Google Cloud Storage account to import or export your instances. For more information, see Importing and Exporting Data.
If I delete my instance, can I reuse the instance name?
Yes, but not right away. The instance name is reserved for up to two months before it can be reused.

Tools & Resources

Can I use Django with Google Cloud SQL?
No, currently Google Cloud SQL is not compatible with Django.
What is the best tool to use for interacting with my instance?
There are a variety of tools available for Google Cloud SQL. For executing simple statements, you can use the SQL prompt. For executing more complicated tasks, you might want to use the command line tool. If you want to use a tool with a graphical interface, the SQuirrel SQL Client provides an interface you can use to interact with your instance.

Common Technical Questions

Should I use InnoDB for my tables?
Yes. InnoDB is the default storage engine in MySQL 5.5 and is also the recommended storage engine for Google Cloud SQL. If you do not need any features that require MyISAM, you should use InnoDB. You can convert your existing tables using the following SQL command, replacing tablename with the name of the table to convert:

ALTER tablename ENGINE = InnoDB;

If you have a mysqldump file where all your tables are in MyISAM format, you can convert them by piping the file through a sed script:

mysqldump --databases database_name [-u username -p  password] --hex-blob database_name | sed 's/ENGINE=MyISAM/ENGINE=InnoDB/g' > database_file.sql

Warning: You should not do this if your mysqldump file contains the mysql schema. Those files must remain in MyISAM.

Are there any size or QPS limits?
Yes, the following limits apply to Google Cloud SQL:

Resource Limits from External Requests Limits from Google App Engine
Queries Per Second (QPS) 5 QPS No limit
Maximum Request Size 16 MB
Maximum Response Size 16 MB

Google App Engine Limits

Google App Engine applications are also subject to additional Google App Engine quotas and limits. Requests from Google App Engine applications to Google Cloud SQL are subject to the following time limits:

  • All database requests must finish within the HTTP request timer, around 60 seconds.
  • Offline requests like cron tasks have a time limit of 10 minutes.
  • Backend requests to Google Cloud SQL have a time limit of 10 minutes.

App Engine-specific quotas and access limits are discussed on the Google App Engine Quotas page.

Should I use Google Cloud SQL with my non-High Replication App Engine application?
We recommend that you use Google Cloud SQL with High Replication App Engine applications. While you can use use Google Cloud SQL with applications that do not use high replication, doing so might impact performance.
Source-
https://code.google.com/apis/sql/faq.html#supportmysqlfeatures

Google Takeout rocks!

I love the new Google Takeout. Just take out all the data stored within Google and download.

and unlike Facebook – it kind of builds this almost within a minute thus showing they are serious on privacy. Facebook takes too long to build the archive.

I would also prefer to download the data as a torrent or anything which is resumable if the connection breaks

 

Coming up-

yet another preview of Google +

 

Jaspersoft 4.1 launched

http://www.jaspersoft.com/events/jaspersoft-41-webinar-north-america

one more webinar, but a newer software and a changing paradigm.

———————————————————————————————————————————————

Webinar: Introducing Jaspersoft 4.1- North America

 Date: June 8, 2011

Time: 10:00 AM PT/1:00 PM ET
Duration: 60 Minutes
Language: English

Self-service Business Intelligence helps individuals and organizations respond quickly to operational and strategic decisions.  A driving factor for the success of self-service BI is an intuitive and integrated UI that spans reporting, dashboards, and data analysis. Additionally, the rise of the cloud and big-data environments presents new opportunities for organizations that can analyze and respond to these emerging data sources.

In this webinar, see how the new Jaspersoft Business Intelligence Suite 4.1 release achieves all this and more.
Register now and learn how you can now take advantage of:

  • Modern self-service BI for reporting and analysis: powerful, unified capabilities for both ad hoc reporting and analysis tasks.
  • Flexible insight into any data source: a seamless analytics UI for both data stored within relational, OLAP, and big data stores.
  • Maximum performance—now designed to take advantage of 64-bit processors.

    Join the webinar and explore the world’s most advanced and affordable BI suite through live demos, interactive Q&A and more.

the upcoming Jaspersoft 4.1 Webinar and Demo. Here’s a preview of what you’ll see:

  • Unified drag-and-drop UI for reporting and analytics. Jaspersoft 4.1 delivers powerful, unified drag-and-drop usability to both ad hoc reporting and analysis tasks while supporting both in-memory and OLAP engines. That means easier, more powerful analysis for business users and a faster route to self-service BI.
  • Insight against any data source. The analytics UI provides easier, more powerful analysis across relational, OLAP, and Big Data stores.
  • High performance with native 64-bit installer support. With Jaspersoft 4.1, you can also scale BI applications to new performance heights, thanks to full support for today’s powerful 64-bit processors.

Join us for the webinar and see how quickly and easily BI Builders like you can deliver true, self-service BI, combining reports, dashboards, and analytics, against virtually any data source — all through a single, web-based UI.

Google releases V1.2 of Google Prediction API

Diagram showing overview of cloud computing in...
Image via Wikipedia

To join the preview group, go to the APIs Console and click the Prediction API slider to “ON,” and then sign up for a Google Storage account.

For the past several months, I have been member of a semi-public beta test/group/forum – that is headed by Travis Green of the Google Prediction API Team (not the hockey player). Basically in helping the Google guys more feedback on the feature list for model building via cloud computing. I couldn’t talk about it much , because it was all NDA hush hush.

Anyways- as of today the version 1.2 of Google Prediction API has been launched. What does this do to the ordinary Joe Modeler? Well it helps gives your models -thats right your plain vanilla logistic regression,arima, arimax, models an added ensemble option of using Google’s Machine Learning Continue reading “Google releases V1.2 of Google Prediction API”