open source – Page 4 – DECISION STATS

PMML Augustus

Here is a new-old system in open source for

for building and scoring statistical models designed to work with data sets that are too large to fit into memory.

http://code.google.com/p/augustus/

Augustus is an open source software toolkit for building and scoring statistical models. It is written in Python and its
most distinctive features are:
• Ability to be used on sets of big data; these are data sets that exceed either memory capacity or disk capacity, so
that existing solutions like R or SAS cannot be used. Augustus is also perfectly capable of handling problems
that can fit on one computer.
• PMML compliance and the ability to both:
– produce models with PMML-compliant formats (saved with extension .pmml).
– consume models from files with the PMML format.
Augustus has been tested and deployed on serveral operating systems. It is intended for developers who work in the
financial or insurance industry, information technology, or in the science and research communities.
Usage
Augustus produces and consumes Baseline, Cluster, Tree, and Ruleset models. Currently, it uses an event-based
approach to building Tree, Cluster and Ruleset models that is non-standard.

New to PMML ?

Read on http://code.google.com/p/augustus/wiki/PMML

The Predictive Model Markup Language or PMML is a vendor driven XML markup language for specifying statistical and data mining models. In other words, it is an XML language so that Continue reading “PMML Augustus”

Revolution Webinar Series #Rstats

Revolution Analytics Webinar-

Featured Webinar


	David Champagne CTO, Revolution Analytics
	Tuesday, December 20th
	11:00AM – 11:30AM Pacific Click here for the webinar time in your local time zone

Big Data Starts with R

Traditional IT infrastructure is simply unable to meet

the demands of the new “Big Data Analytics” landscape. Many enterprises are turning to the “R” statistical programming language and Hadoop (both open source projects) as a potential solution. This webinar will introduce the statistical capabilities of R within the Hadoop ecosystem. We’ll cover:

An introduction to new packages developed by Revolution Analytics to facilitate interaction with the data stores HDFS and HBase so that they can be leveraged from the R environment
An overview of how to write Map Reduce jobs in R using Hadoop
Special considerations that need to be made when working with R and Hadoop.

We’ll also provide additional resources that are available to people interested in integrating R and Hadoop.

Upcoming Webinars

Wed, Dec 14th
11:00AM – 11:30AM PT

Revolution R Enterprise – 100% R and MoreR users already know why the R language is the lingua franca of statisticians today: because it’s the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this webinar, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.

Archived Webinars-

Revolution Webinar: New Features in Revolution R Enterprise 5.0 (including RevoScaleR) to Support Scalable Data AnalysisRevolution R Enterprise 5.0 is Revolution Analytics’ scalable analytics platform. At its core is Revolution Analytics’ enhanced Distribution of R, the world’s most widely-used project for statistical computing. In this webinar, Dr. Ranney will discuss new features and show examples of the new functionality, which extend the platform’s usability, integration and scalability

Webinar: Using R within Oracle #rstats

Webinar: Using R within Oracle — Nov 30, noon EST

==========================================

Oracle now supports the R open source statistical programming language. Come to this webinar to learn more about using R within an Oracle environment.

— URL for TechCast: https://stbeehive.oracle.com/bconf/confDetails?confID=334B:3BF0:owch:38893C00F42F38A1E0404498C8A6612B0004075AECF7&guest=true&confKey=608880
— Web Conference ID: 303397
— Web Conference Key: 608880
— Dialup: 1-866-682-4770 , ID 5548204, passcode 1234

After a steady rise in the past few years, in 2010 the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other (http://www.rexeranalytics.com/Data-Miner-Survey-Results-2010.html).

Several analytic tool vendors have added R-integration to their software. However, Oracle is the largest company to throw their weight behind R. On October 3, Oracle unveiled their integration of R: Oracle R Enterprise (http://www.oracle.com/us/corporate/features/features-oracle-r-enterprise-498732.html) as part of their Oracle Big Data Appliance announcement (http://www.oracle.com/us/corporate/press/512001).

Oracle R Enterprise allows users to perform statistical analysis with advanced visualization on data stored in Oracle Database. Oracle R Enterprise enables scalable R solutions, while facilitating production deployment of R scripts and Hadoop based solutions, as well as integration of R results with Oracle BI Publisher and OBIEE dashboards.

This TechCast introduces the various Oracle R Enterprise components and features, along with R script demonstrations that interface with Oracle Database.

TechCast presenter: Mark Hornick, Senior Manager, Oracle Advanced Analytics Development.
This TechCast is part of the ongoing TechCasts series coordinated by Oracle BIWA: The BI, Warehousing and Analytics SIG (http://www.oracleBIWA.org).

Secure Browsing from Mobile and PC ( Tor ,PeerNet, WasteAgain)

While Tor remains the tool of choice with pseudo-techie hacker wannabes , there is enough juice and smoke and mirrors on the market to confuse your average Joe.

For a secure browsing experience on Mobile – do NOT use either Apple or Windows OS

Use Android and this app called Orbot in particular

Installing Tor with a QR code

Orbot is easy to install by simply scanning the following QR code with your Android Barcode scanner.

Installing Tor from the Android Market

Orbot is available in the Android Market.

ENTER PEERNET

If you have a Dell PC, well just use PeerNet to configure and set up your own network around the neighbourhood. This is particularly applicable if you are in country that is both repressive and not so technologically advanced. Wont work in China or USA.

http://support.dell.com/support/edocs/network/p70008/EN/vista_7/peernet.htm

What is a peer network?

A peer network is a network in which one computer can connect directly to another computer. This capability is accomplished by enabling access point (AP) functionality on one of the computers. Other computers can then connect to this computer in the same way that they would connect to a physical AP. If Internet Connection Sharing is enabled on the computer that has the AP functionality, computers that connect to that computer have Internet connectivity as well.

A basic peer network, which requires no networking knowledge or experience to set up, should meet the needs of most home users and small businesses. By default, a basic peer network is configured with the strongest available security (see How do I set up a basic peer network?).

For users who are familiar with wireless networking technology, advanced configuration features are available to do the following:

• Change security settings (see How do I configure my peer network?)

• Choose which method (push button or PIN) computers with Wi-Fi Protected Setup™ capability can join your peer network (see How do I allow peer devices to join my peer network using Wi-Fi Protected Setup technology?)

• Change the DHCP Server IP address (see How do I configure my peer network?).

• Change the channel on which to operate your peer network (see How do I configure my peer network?)

If you are really really in a need for secure browsing (like you are maybe a big hot shot in the tech world), I suggest go over to VMWare

http://www.vmware.com/products/player/

create a seperate Linux (Ubuntu for ease) virtual disc, then download the Tor Browser Bundle from

https://www.torproject.org/projects/torbrowser.html.en for surfing and a Peernet (above) or a prepaid one time use disposable mobile pre-paid wireless card. It is also quite easy to delete your virtual disc in times of emergencies (but it is best to use encryption even when in Ubuntu https://help.ubuntu.com/community/EncryptedHome)

IRC chat is less secure than you think it is thanks to BOT Trawlers- so I am hoping someone in the open source community updates Waste Again for encrypted chats http://wasteagain.sourceforge.net/

What is “WASTE again”?

“WASTE again” enables you to create a decentralized and secure private mesh network using an unsecure network, such as the internet. Once the public encryption keys are exchanged, sending messages, creating groupchats and transferring files is easy and secure.

Creating a mesh

To create a mesh you need at least two computers with “WASTE again” installed. During installation, a unique pair of public and private keys for each computer is being generated. Before the first connection can be established, you need to exchange these public keys. These keys enable “WASTE again” to authenticate every connection to other “WASTE again” clients.

After exchanging the keys, you simply type in the computers IP address to connect to. If that computer is located behind a firewall or a NAT-router, you have to create a portmap first to enable incoming connections.

At least one computer in your mesh has to be able to accept incoming connections, making it a “public node”. If no direct connection between two firewalled computers can be made, “WASTE again” automatically routes your traffic through one or more of the available public nodes.

Every new node simply has to exchange keys with one of the connected nodes and then connect to it. All the other nodes will exchange their keys automatically over the mesh.

Google Cloud SQL

Another xing bang API from the boyz in Mountain View. (entry by invite only) But it is free and you can test your stuff on a MySQL db =10 GB

Database as a service ? (Maybe)— while Amazon was building fires (and Fire)

—————————————————————–

https://code.google.com/apis/sql/index.html

What is Google Cloud SQL?

Google Cloud SQL is a web service that provides a highly available, fully-managed, hosted SQL storage solution for your App Engine applications.

What are the benefits of using Google Cloud SQL?

You can access a familiar, highly available SQL database from your App Engine applications, without having to worry about provisioning, management, and integration with other Google services.

How much does Google Cloud SQL cost?

We will not be billing for this service in 2011. We will give you at least 30 days’ advance notice before we begin billing in the future. Other services such as Google App Engine, Google Cloud Storage etc. that you use with Google Cloud SQL may have their own payment terms, and you need to pay for them. Please consult their documentation for details.

Currently you are limited to the three instance sizes. What if I need to store more data or need better performance?

In the Limited Preview period, we only have three sizes available. If you have specific needs, we would like to hear from you on our google-cloud-sqldiscussion board.

When is Google Cloud SQL be out of Limited Preview?

We are working hard to make the service generally available.We don’t have a firm date that we can announce right now.

Do you support all the features of MySQL?

In general, Google Cloud SQL supports all the features of MySQL. The following are lists of all the unsupported features and notable differences that Google Cloud SQL has from MySQL.

Unsupported Features:

User defined functions
MySql replication

Unsupported MySQL statements:

LOAD DATA INFILE
SELECT ... INTO OUTFILE
SELECT ... INTO DUMPFILE
INSTALL PLUGIN .. SONAME ...
UNINSTALL PLUGIN
CREATE FUNCTION ... SONAME ...

Unsupported SQL Functions:

LOAD_FILE()

Notable Differences:

If you want to import databases with binary data into your Google Cloud SQL instance, you must use the --hex-blob option with mysqldump.Although this is not a required flag when you are using a local MySQL server instance and the MySQL command line, it is required if you want to import any databases with binary data into your Google Cloud SQL instance. For more information, see Importing Data.

How large a database can I use with Google Cloud SQL?

Currently, in this limited preview period, your database instance must be no larger than 10GB.

How can I be notified when there are any changes to Google Cloud SQL?

You can sign up for the sql-announcements forum where we post announcements and news about the Google Cloud SQL.

How can I cancel my Google Cloud SQL account?

To remove all data from your Google Cloud SQL account and disable the service:

Delete all your data. You can remove your tables, databases, and indexes using the drop command. For more information, see SQL DROP statement.
Deactivate the Google Cloud SQL by visiting the Services pane and clicking the On button next to Google Cloud SQL. The button changes from Onto Off.

How do I report a bug, request a feature, or ask a question?

You can report bugs and request a feature on our project page.You can ask a question in our discussion forum.

Getting Started

Can I use languages other than Java or Python?: Only Java and Python are supported for Google Cloud SQL.
Can I use Google Cloud SQL outside of Google App Engine?: The Limited Preview is primarily focused on giving Google App Engine customers the ability to use a familiar relational database environment. Currently, you cannot access Google Cloud SQL from outside Google App Engine.
What database engine are we using in the Google Cloud SQL?: MySql Version 5.1.59
Do I need to install a local version of MySQL to use the Development Server?: Yes.

Managing Your Instances

Do I need to use the Google APIs Console to use Google Cloud SQL?: Yes. For basic tasks like granting access control to applications, creating instances, and deleting instances, you need to use the Google APIs Console.
Can I import or export specific databases?: No, currently it is not possible to export specific databases. You can only export your entire instance.
Do I need a Google Cloud Storage account to import or export my instances?: Yes, you need to sign up for a Google Cloud Storage account or have access to a Google Cloud Storage account to import or export your instances. For more information, see Importing and Exporting Data.
If I delete my instance, can I reuse the instance name?
: Yes, but not right away. The instance name is reserved for up to two months before it can be reused.

Tools & Resources

Can I use Django with Google Cloud SQL?: No, currently Google Cloud SQL is not compatible with Django.
What is the best tool to use for interacting with my instance?: There are a variety of tools available for Google Cloud SQL. For executing simple statements, you can use the SQL prompt. For executing more complicated tasks, you might want to use the command line tool. If you want to use a tool with a graphical interface, the SQuirrel SQL Client provides an interface you can use to interact with your instance.

Common Technical Questions

Should I use InnoDB for my tables?

Yes. InnoDB is the default storage engine in MySQL 5.5 and is also the recommended storage engine for Google Cloud SQL. If you do not need any features that require MyISAM, you should use InnoDB. You can convert your existing tables using the following SQL command, replacing tablename with the name of the table to convert:

ALTER tablename ENGINE = InnoDB;

If you have a mysqldump file where all your tables are in MyISAM format, you can convert them by piping the file through a sed script:

mysqldump --databases database_name [-u username -p  password] --hex-blob database_name | sed 's/ENGINE=MyISAM/ENGINE=InnoDB/g' > database_file.sql

Warning: You should not do this if your mysqldump file contains the mysql schema. Those files must remain in MyISAM.

Are there any size or QPS limits?

Yes, the following limits apply to Google Cloud SQL:

Resource	Limits from External Requests	Limits from Google App Engine
Queries Per Second (QPS)	5 QPS	No limit
Maximum Request Size	16 MB
Maximum Response Size	16 MB

Google App Engine Limits

Google App Engine applications are also subject to additional Google App Engine quotas and limits. Requests from Google App Engine applications to Google Cloud SQL are subject to the following time limits:

All database requests must finish within the HTTP request timer, around 60 seconds.
Offline requests like cron tasks have a time limit of 10 minutes.
Backend requests to Google Cloud SQL have a time limit of 10 minutes.

App Engine-specific quotas and access limits are discussed on the Google App Engine Quotas page.

Should I use Google Cloud SQL with my non-High Replication App Engine application?

We recommend that you use Google Cloud SQL with High Replication App Engine applications. While you can use use Google Cloud SQL with applications that do not use high replication, doing so might impact performance.

Source-

https://code.google.com/apis/sql/faq.html#supportmysqlfeatures

Knowledge Discovery in Databases -KDD using PostgreSQL and #Rstats

Here is a small brief primer for beginners on configuring an open source database and using an open source analytics package.

All you need to know – is to read!

1. download PostgreSQL from
http://www.postgresql.org/download/windowsInstall PostgreSQL

Remember to store /memorize the password for the user postgres!

Create a connection using pgAdmin feature in Start Menu

2. download ODBC driver from
http://www.postgresql.org/ftp/odbc/versions/msi/
and the Win 64 edition from
http://wwwmaster.postgresql.org/download/mirrors-ftp/odbc/versions/msi/psqlodbc_09_00_0310-x64.zip

install ODBC driver

3. Go to

Start Menu\Control Panel\All Control Panel Items\Administrative Tools\Data Sources (ODBC)

4. Configure the following details in System DSN and User DSN using the ADD tabs .Test connection to check if connection is working

5. Start R and install and load library RODBC

6. Use following initial code for R- if you know SQL you can do the rest
> library(RODBC)

> odbcDataSources(type = c(“all”, “user”, “system”))
SQLServer PostgreSQL30 PostgreSQL35W
“SQL Server” “PostgreSQL ANSI(x64)” “PostgreSQL Unicode(x64)”

> ajay=odbcConnect(“PostgreSQL30”, uid = “postgres”, pwd = “XX”)

> sqlTables(ajay)
TABLE_QUALIFIER TABLE_OWNER TABLE_NAME TABLE_TYPE REMARKS
1 postgres public names TABLE

> crimedat <- sqlFetch(ajay, “names”)

The all new Blogging in Blogger

I had given up on Blogspot ever having a makeover in favor of the nice themes at

wordpress, but man, the new CEO at google is really shaking some stuff here.

Check out the nice features for customizing the themes at Blogspot

Continue reading “The all new Blogging in Blogger”

Please share:

Please share:

Please share:

What is a peer network?

What is “WASTE again”?

Creating a mesh

Please share:

Getting Started

Managing Your Instances

Tools & Resources

Common Technical Questions

Please share:

Please share:

Please share: