Teradata Analytics

A recent announcement showing Teradata partnering with KXEN and Revolution Analytics for Teradata Analytics.

http://www.teradata.com/News-Releases/2012/Teradata-Expands-Integrated-Analytics-Portfolio/

The Latest in Open Source Emerging Software Technologies
Teradata provides customers with two additional open source technologies – “R” technology from Revolution Analytics for analytics and GeoServer technology for spatial data offered by the OpenGeo organization – both of which are able to leverage the power of Teradata in-database processing for faster, smarter answers to business questions.

In addition to the existing world-class analytic partners, Teradata supports the use of the evolving “R” technology, an open source language for statistical computing and graphics. “R” technology is gaining popularity with data scientists who are exploiting its new and innovative capabilities, which are not readily available. The enhanced “R add-on for Teradata” has a 50 percent performance improvement, it is easier to use, and its capabilities support large data analytics. Users can quickly profile, explore, and analyze larger quantities of data directly in the Teradata Database to deliver faster answers by leveraging embedded analytics.

Teradata has partnered with Revolution Analytics, the leading commercial provider of “R” technology, because of customer interest in high-performing R applications that deliver superior performance for large-scale data. “Our innovative customers understand that big data analytics takes a smart approach to the entire infrastructure and we will enable them to differentiate their business in a cost-effective way,” said David Rich, chief executive officer, Revolution Analytics. “We are excited to partner with Teradata, because we see great affinity between Teradata and Revolution Analytics – we embrace parallel computing and the high performance offered by multi-core and multi-processor hardware.”

and

The Teradata Data Lab empowers business users and leading analytic partners to start building new analytics in less than five minutes, as compared to waiting several weeks for the IT department’s assistance.

“The Data Lab within the Teradata database provides the perfect foundation to enable self-service predictive analytics with KXEN InfiniteInsight,” said John Ball, chief executive officer, KXEN. “Teradata technologies, combined with KXEN’s automated modeling capabilities and in-database scoring, put the power of predictive analytics and data mining directly into the hands of business users. This powerful combination helps our joint customers accelerate insight by delivering top-quality models in orders of magnitude faster than traditional approaches.”

Read more at

http://www.sacbee.com/2012/03/06/4315500/teradata-expands-integrated-analytics.html

Chrome Extension- MafiaaFire

The chrome extension MafiaaWire basically gives you an updated list of redirected websites. So the next time , your evil highness shuts down your favorite website- the list promises to give you an update.  While obviously entertainment intellectual property is a very obvious site category for such redirects, in some cases these extensions can be used for simple things like hosting dissents or protesters against govt corruption in non US countries .

Basically under the new SOPA act (an oline version of pepper spray http://en.wikipedia.org/wiki/Stop_Online_Piracy_Act) even browsers like Firefox and Chrome would be liable for any such extension that can be used to download American Intellectual property illegally.

In the meantime – this is an interesting and creative use case of technology and sociology merging in the brave new world.

You can read about it here-

http://en.wikipedia.org/wiki/MAFIAAFire_Redirector

MAFIAAFire works by downloading a list which contains the names of the “blocked” sites as well as the sites to redirect to. This list is downloaded every time Firefox starts up or every two days on the Chrome version (although the user has the choice to force an update on the Chrome version instead of waiting for two days).

When a user types in a domain name from the list of blocked domains, the add-on recognizes this and automatically redirects the user to the secondary site. Since this happens before the browser connects to the DNS server, this renders any DNS blocks useless.

Although the add-on checks for which sites are entered into the address bar every time (as it needs to check if that site is on its block list), it does not log these requests nor send these requests to any central server. In other words: it does not track the user.

or

Download it from

https://chrome.google.com/webstore/detail/hnifiobpjihmmjgiokkaalgomddebhng

Interesting times indeed!

Related-

Encryption

http://poemsforkush.wordpress.com/2011/12/17/encryption/

 

Interview Eberhard Miethke and Dr. Mamdouh Refaat, Angoss Software

Here is an interview with Eberhard Miethke and Dr. Mamdouh Refaat, of Angoss Software. Angoss is a global leader in delivering business intelligence software and predictive analytics solutions that help businesses capitalize on their data by uncovering new opportunities to increase sales and profitability and to reduce risk.

Ajay-  Describe your personal journey in software. How can we guide young students to pursue more useful software development than just gaming applications.

 Mamdouh- I started using computers long time ago when they were programmed using punched cards! First in Fortran, then C, later C++, and then the rest. Computers and software were viewed as technical/engineering tools, and that’s why we can still see the heavy technical orientation of command languages such as Unix shells and even in the windows Command shell. However, with the introduction of database systems and Microsoft office apps, it was clear that business will be the primary user and field of application for software. My personal trip in software started with scientific applications, then business and database systems, and finally statistical software – which you can think of it as returning to the more scientific orientation. However, with the wide acceptance of businesses of the application of statistical methods in different fields such as marketing and risk management, it is a fast growing field that in need of a lot of innovation.

Ajay – Angoss makes multiple data mining and analytics products. could you please introduce us to your product portfolio and what specific data analytics need they serve.

a- Attached please find our main product flyers for KnowledgeSTUDIO and KnowledgeSEEKER. We have a 3rd product called “strategy builder” which is an add-on to the decision tree modules. This is also described in the flyer.

(see- Angoss Knowledge Studio Product Guide April2011  and http://www.scribd.com/doc/63176430/Angoss-Knowledge-Seeker-Product-Guide-April2011  )

Ajay-  The trend in analytics is for big data and cloud computing- with hadoop enabling processing of massive data sets on scalable infrastructure. What are your plans for cloud computing, tablet based as well as mobile based computing.

a- This is an area where the plan is still being figured out in all organizations. The current explosion of data collected from mobile phones, text messages, and social websites will need radically new applications that can utilize the data from these sources. Current applications are based on the relational database paradigm designed in the 70’s through the 90’s of the 20th century.

But data sources are generating data in volumes and formats that are challenging this paradigm and will need a set of new tools and possibly programming languages to fit these needs. The cloud computing, tablet based and mobile computing (which are the same thing in my opinion, just different sizes of the device) are also two technologies that have not been explored in analytics yet.

The approach taken so far by most companies, including Angoss, is to rely on new xml-based standards to represent data structures for the particular models. In this case, it is the PMML (predictive modelling mark-up language) standard, in order to allow the interoperability between analytics applications. Standardizing on the representation of models is viewed as the first step in order to allow the implementation of these models to emerging platforms, being that the cloud or mobile, or social networking websites.

The second challenge cited above is the rapidly increasing size of the data to be analyzed. Angoss has already identified this challenge early on and is currently offering in-database analytics drivers for several database engines: Netezza, Teradata and SQL Server.

These drivers allow our analytics products to translate their routines into efficient SQL-based scripts that run in the database engine to exploit its performance as well as the powerful hardware on which it runs. Thus, instead of copying the data to a staging format for analytics, these drivers allow the data to be analyzed “in-place” within the database without moving it.

Thus offering performance, security and integrity. The performance is improved because of the use of the well tuned database engines running on powerful hardware.

Extra security is achieved by not copying the data to other platforms, which could be less secure. And finally, the integrity of the results are vastly improved by making sure that the results are always obtained by analyzing the up-to-date data residing in the database rather than an older copy of the data which could be obsolete by the time the analysis is concluded.

Ajay- What are the principal competing products to your offerings, and what makes your products special or differentiated in value to them (for each customer segment).

a- There are two major players in today’s market that we usually encounter as competitors, they are: SAS and IBM.

SAS offers a data mining workbench in the form of SAS Enterprise Miner, which is closely tied to SAS data mining methodology known as SEMMA.

On the other hand, IBM has recently acquired SPSS, which offered its Clementine data mining software. IBM has now rebranded Clementine as IBM SPSS Modeller.

In comparison to these products, our KnowledgeSTUDIO and KnowledgeSEEKER offer three main advantages: ease of use; affordability; and ease of integration into existing BI environments.

Angoss products were designed to look-and-feel-like popular Microsoft office applications. This makes the learning curve indeed very steep. Typically, an intermediate level analyst needs only 2-3 days of training to become proficient in the use of the software with all its advanced features.

Another important feature of Angoss software products is their integration with SAS/base product, and SQL-based database engines. All predictive models generated by Angoss can be automatically translated to SAS and SQL scripts. This allows the generation of scoring code for these common platforms. While the software interface simplifies all the tasks to allow business users to take advantage of the value added by predictive models, the software includes advanced options to allow experienced statisticians to fine-tune their models by adjusting all model parameters as needed.

In addition, Angoss offers a unique product called StrategyBuilder, which allows the analyst to add key performance indicators (KPI’s) to predictive models. KPI’s such as profitability, market share, and loyalty are usually required to be calculated in conjunction with any sales and marketing campaign. Therefore, StrategyBuilder was designed to integrate such KPI’s with the results of a predictive model in order to render the appropriate treatment for each customer segment. These results are all integrated into a deployment strategy that can also be translated into an execution code in SQL or SAS.

The above competitive features offered by the software products of Angoss is behind its success in serving over 4000 users from over 500 clients worldwide.

Ajay -Describe a major case study where using Angoss software helped save a big amount of revenue/costs by innovative data mining.

a-Rogers Telecommunications Inc. is one of the largest Canadian telecommunications providers, serving over 8.5 million customers and a revenue of 11.1 Billion Canadian Dollars (2009). In 2008, Rogers engaged Angoss in order to help with the problem of ballooning accounts receivable for a period of 18 months.

The problem was approached by improving the efficiency of the call centre serving the collections process by a set of predictive models. The first set of models were designed to find accounts likely to default ahead of time in order to take preventative measures. A second set of models were designed to optimize the call centre resources to focus on delinquent accounts likely to pay back most of the outstanding balance. Accounts that were identified as not likely to pack quickly were good candidates for “Early-out” treatment, by forwarding them directly to collection agencies. Angoss hosted Rogers’ data and provided on a regular interval the lists of accounts for each treatment to be deployed by the call centre dialler. As a result of this Rogers estimated an improvement of 10% of the collected sums.

Biography-

Mamdouh has been active in consulting, research, and training in various areas of information technology and software development for the last 20 years. He has worked on numerous projects with major organizations in North America and Europe in the areas of data mining, business analytics, business analysis, and engineering analysis. He has held several consulting positions for solution providers including Predict AG in Basel, Switzerland, and as ANGOSS Corp. Mamdouh is the Director of Professional services for EMEA region of ANGOSS Software. Mamdouh received his PhD in engineering from the University of Toronto and his MBA from the University of Leeds, UK.

Mamdouh is the author of:

"Credit Risk Scorecards: Development and Implmentation using SAS"
 "Data Preparation for Data Mining Using SAS",
 (The Morgan Kaufmann Series in Data Management Systems) (Paperback)
 and co-author of
 "Data Mining: Know it all",Morgan Kaufmann



Eberhard Miethke  works as a senior sales executive for Angoss

 

About Angoss-

Angoss is a global leader in delivering business intelligence software and predictive analytics to businesses looking to improve performance across sales, marketing and risk. With a suite of desktop, client-server and in-database software products and Software-as-a-Service solutions, Angoss delivers powerful approaches to turn information into actionable business decisions and competitive advantage.

Angoss software products and solutions are user-friendly and agile, making predictive analytics accessible and easy to use.

Privacy Browsing Extensions in Google Chrome

coat of arms of the Palaiologos dynasty, the l...
Image via Wikipedia

Using two Chrome Extensions, Disconnect and AdBlock you can be sure of having a vary very clean browsing experience-it is recommended especially if you dont like the auto sharing of your personal preferences and cannot be bothered by the Byzantine maze of social media privacy fineprint.

https://chrome.google.com/extensions/detail/jeoacafpbcihiomhlakheieifhpjdfeo

Disconnect by Brian Kennish

(184) – 44,284 users – Weekly installs: 24,086

Stop major third parties and search engines from tracking the webpages you go to and searches you do.

* Search depersonalization is now optional and off by default. Click the “d” button then the “Depersonalize searches” checkbox to turn this feature on (or back off in case you have trouble getting to Google or Yahoo services). For help with anything else, see the known issues below and ask questions at http://j.mp/dnewgroup.

§

If you’re a typical web user, you’re unintentionally sending your browsing and search history with your name and other personal information to third parties and search engines whenever you’re online.

Take control of the data you share with Disconnect!

From the developer of the top-10-rated Facebook Disconnect extension, Disconnect lets you:

• Disable tracking by third parties like Digg, Facebook, Google, Twitter, and Yahoo, without requiring any setup or significantly degrading the usability of the web.

• Truly depersonalize searches on search engines like Google and Yahoo (by blocking identifying cookies not just changing the appearance of results pages), while staying logged into other services — e.g., so you can search anonymously on Google and access iGoogle at once.

• See how many resource and cookie requests are blocked, in real time

and
https://chrome.google.com/extensions/detail/gighmmpiobklfepjocnamgkkbiglidom
ExtensionsAdBlock

AdBlock

(6937) - 1,615,373 users - Weekly installs: 153,032
The most popular Chrome extension, with over 1.5 million users! Blocks ads all over the web.
Verified author: chromeadblock.com
=================

New in version 2.1: Translated into dozens of languages!
New in version 2.0: Ads are blocked from downloading, instead of just being removed after the fact!

=======================

The official AdBlock For Chrome!  Block all advertisements on all web pages.  Your browser is automatically updated with additions to the filter: just click Install, then visit your favorite website and see the ads disappear!

FAQs:1. This is the official AdBlock extension: the original ad blocker written from the ground up to be optimized in Chrome.  There's an unrelated, older Firefox project called Adblock Plus, and they're working on making a Chrome version out of the old AdThwart codebase.  At the moment AdBlock blocks some ads that AdThwart only hides, but they're working to improve it.  It's available at bit.ly/id2Gqx; if you have trouble with AdBlock, they're good guys and a fine alternative!

 

Cloud Computing with R

Illusion of Depth and Space (4/22) - Rotating ...
Image by Dominic's pics via Flickr

Here is a short list of resources and material I put together as starting points for R and Cloud Computing It’s a bit messy but overall should serve quite comprehensively.

Cloud computing is a commonly used expression to imply a generational change in computing from desktop-servers to remote and massive computing connections,shared computers, enabled by high bandwidth across the internet.

As per the National Institute of Standards and Technology Definition,
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

(Citation: The NIST Definition of Cloud Computing

Authors: Peter Mell and Tim Grance
Version 15, 10-7-09
National Institute of Standards and Technology, Information Technology Laboratory
http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc)

R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

From http://cran.r-project.org/doc/FAQ/R-FAQ.html#R-Web-Interfaces

R Web Interfaces

Rweb is developed and maintained by Jeff Banfield. The Rweb Home Page provides access to all three versions of Rweb—a simple text entry form that returns output and graphs, a more sophisticated JavaScript version that provides a multiple window environment, and a set of point and click modules that are useful for introductory statistics courses and require no knowledge of the R language. All of the Rweb versions can analyze Web accessible datasets if a URL is provided.
The paper “Rweb: Web-based Statistical Analysis”, providing a detailed explanation of the different versions of Rweb and an overview of how Rweb works, was published in the Journal of Statistical Software (http://www.jstatsoft.org/v04/i01/).

Ulf Bartel has developed R-Online, a simple on-line programming environment for R which intends to make the first steps in statistical programming with R (especially with time series) as easy as possible. There is no need for a local installation since the only requirement for the user is a JavaScript capable browser. See http://osvisions.com/r-online/ for more information.

Rcgi is a CGI WWW interface to R by MJ Ray. It had the ability to use “embedded code”: you could mix user input and code, allowing the HTMLauthor to do anything from load in data sets to enter most of the commands for users without writing CGI scripts. Graphical output was possible in PostScript or GIF formats and the executed code was presented to the user for revision. However, it is not clear if the project is still active.

Currently, a modified version of Rcgi by Mai Zhou (actually, two versions: one with (bitmap) graphics and one without) as well as the original code are available from http://www.ms.uky.edu/~statweb/.

CGI-based web access to R is also provided at http://hermes.sdu.dk/cgi-bin/go/. There are many additional examples of web interfaces to R which basically allow to submit R code to a remote server, see for example the collection of links available from http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/StatCompCourse.

David Firth has written CGIwithR, an R add-on package available from CRAN. It provides some simple extensions to R to facilitate running R scripts through the CGI interface to a web server, and allows submission of data using both GET and POST methods. It is easily installed using Apache under Linux and in principle should run on any platform that supports R and a web server provided that the installer has the necessary security permissions. David’s paper “CGIwithR: Facilities for Processing Web Forms Using R” was published in the Journal of Statistical Software (http://www.jstatsoft.org/v08/i10/). The package is now maintained by Duncan Temple Lang and has a web page athttp://www.omegahat.org/CGIwithR/.

Rpad, developed and actively maintained by Tom Short, provides a sophisticated environment which combines some of the features of the previous approaches with quite a bit of JavaScript, allowing for a GUI-like behavior (with sortable tables, clickable graphics, editable output), etc.
Jeff Horner is working on the R/Apache Integration Project which embeds the R interpreter inside Apache 2 (and beyond). A tutorial and presentation are available from the project web page at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RApacheProject.

Rserve is a project actively developed by Simon Urbanek. It implements a TCP/IP server which allows other programs to use facilities of R. Clients are available from the web site for Java and C++ (and could be written for other languages that support TCP/IP sockets).

OpenStatServer is being developed by a team lead by Greg Warnes; it aims “to provide clean access to computational modules defined in a variety of computational environments (R, SAS, Matlab, etc) via a single well-defined client interface” and to turn computational services into web services.

Two projects use PHP to provide a web interface to R. R_PHP_Online by Steve Chen (though it is unclear if this project is still active) is somewhat similar to the above Rcgi and Rweb. R-php is actively developed by Alfredo Pontillo and Angelo Mineo and provides both a web interface to R and a set of pre-specified analyses that need no R code input.

webbioc is “an integrated web interface for doing microarray analysis using several of the Bioconductor packages” and is designed to be installed at local sites as a shared computing resource.

Rwui is a web application to create user-friendly web interfaces for R scripts. All code for the web interface is created automatically. There is no need for the user to do any extra scripting or learn any new scripting techniques. Rwui can also be found at http://rwui.cryst.bbk.ac.uk.

Finally, the R.rsp package by Henrik Bengtsson introduces “R Server Pages”. Analogous to Java Server Pages, an R server page is typically HTMLwith embedded R code that gets evaluated when the page is requested. The package includes an internal cross-platform HTTP server implemented in Tcl, so provides a good framework for including web-based user interfaces in packages. The approach is similar to the use of the brew package withRapache with the advantage of cross-platform support and easy installation.

Also additional R Cloud Computing Use Cases
http://wwwdev.ebi.ac.uk/Tools/rcloud/

ArrayExpress R/Bioconductor Workbench

Remote access to R/Bioconductor on EBI’s 64-bit Linux Cluster

Start the workbench by downloading the package for your operating system (Macintosh or Windows), or via Java Web Start, and you will get access to an instance of R running on one of EBI’s powerful machines. You can install additional packages, upload your own data, work with graphics and collaborate with colleagues, all as if you are running R locally, but unlimited by your machine’s memory, processor or data storage capacity.

  • Most up-to-date R version built for multicore CPUs
  • Access to all Bioconductor packages
  • Access to our computing infrastructure
  • Fast access to data stored in EBI’s repositories (e.g., public microarray data in ArrayExpress)

Using R Google Docs
http://www.omegahat.org/RGoogleDocs/run.pdf
It uses the XML and RCurl packages and illustrates that it is relatively quick and easy
to use their primitives to interact with Web services.

Using R with Amazon
Citation
http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/

Amazon’s EC2 is a type of cloud that provides on demand computing infrastructures called an Amazon Machine Images or AMIs. In general, these types of cloud provide several benefits:

  • Simple and convenient to use. An AMI contains your applications, libraries, data and all associated configuration settings. You simply access it. You don’t need to configure it. This applies not only to applications like R, but also can include any third-party data that you require.
  • On-demand availability. AMIs are available over the Internet whenever you need them. You can configure the AMIs yourself without involving the service provider. You don’t need to order any hardware and set it up.
  • Elastic access. With elastic access, you can rapidly provision and access the additional resources you need. Again, no human intervention from the service provider is required. This type of elastic capacity can be used to handle surge requirements when you might need many machines for a short time in order to complete a computation.
  • Pay per use. The cost of 1 AMI for 100 hours and 100 AMI for 1 hour is the same. With pay per use pricing, which is sometimes called utility pricing, you simply pay for the resources that you use.

Connecting to R on Amazon EC2- Detailed tutorials
Ubuntu Linux version
https://decisionstats.com/2010/09/25/running-r-on-amazon-ec2/
and Windows R version
https://decisionstats.com/2010/10/02/running-r-on-amazon-ec2-windows/

Connecting R to Data on Google Storage and Computing on Google Prediction API
https://github.com/onertipaday/predictionapirwrapper
R wrapper for working with Google Prediction API

This package consists in a bunch of functions allowing the user to test Google Prediction API from R.
It requires the user to have access to both Google Storage for Developers and Google Prediction API:
see
http://code.google.com/apis/storage/ and http://code.google.com/apis/predict/ for details.

Example usage:

#This example requires you had previously created a bucket named data_language on your Google Storage and you had uploaded a CSV file named language_id.txt (your data) into this bucket – see for details
library(predictionapirwrapper)

and Elastic R for Cloud Computing
http://user2010.org/tutorials/Chine.html

Abstract

Elastic-R is a new portal built using the Biocep-R platform. It enables statisticians, computational scientists, financial analysts, educators and students to use cloud resources seamlessly; to work with R engines and use their full capabilities from within simple browsers; to collaborate, share and reuse functions, algorithms, user interfaces, R sessions, servers; and to perform elastic distributed computing with any number of virtual machines to solve computationally intensive problems.
Also see Karim Chine’s http://biocep-distrib.r-forge.r-project.org/

R for Salesforce.com

At the point of writing this, there seem to be zero R based apps on Salesforce.com This could be a big opportunity for developers as both Apex and R have similar structures Developers could write free code in R and charge for their translated version in Apex on Salesforce.com

Force.com and Salesforce have many (1009) apps at
http://sites.force.com/appexchange/home for cloud computing for
businesses, but very few forecasting and statistical simulation apps.

Example of Monte Carlo based app is here
http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016cT9EAI#

These are like iPhone apps except meant for business purposes (I am
unaware if any university is offering salesforce.com integration
though google apps and amazon related research seems to be on)

Force.com uses a language called Apex  and you can see
http://wiki.developerforce.com/index.php/App_Logic and
http://wiki.developerforce.com/index.php/An_Introduction_to_Formulas
Apex is similar to R in that is OOPs

SAS Institute has an existing product for taking in Salesforce.com data.

A new SAS data surveyor is
available to access data from the Customer Relationship Management
(CRM) software vendor Salesforce.com. at
http://support.sas.com/documentation/cdl/en/whatsnew/62580/HTML/default/viewer.htm#datasurveyorwhatsnew902.htm)

Personal Note-Mentioning SAS in an email to a R list is a big no-no in terms of getting a response and love. Same for being careless about which R help list to email (like R devel or R packages or R help)

For python based cloud see http://pi-cloud.com