Apps for Google Drive

I kind of like the fact that Google Drive already has a lot of apps, even though it is quite young.

The mechanical engineer in me especially liked the AutoCAD app, the video editing apps, the online bitcoin wallet, the free project scheduling app, and the cloud’s first (?) open office document reader, among others.

Developers would especially like playing with the OAuth Playground app for Google Drive on the Google Chrome platform.

Check them out for yourself.

https://chrome.google.com/webstore/category/collection/drive_apps

Possible Digital Disruptions by Cyber Actors in USA Electoral Cycle

Some possible electronic disruptions that threaten the electoral cycle currently underway in the United States of America are:

1) Limited Denial of Service attacks (lasting, say, 5-8 minutes) on fund-raising websites, trying to fly under the radar of network administrators while denying the targeted website a small percentage of its funds. Money remains critical in the world’s most expensive political market. Even a 5% drop in online fund-raising capacity can cripple a candidate.

2) Limited Man-in-the-Middle attacks on ground volunteers to disrupt, intercept and manipulate communication flows. Basically, cyber attacks on vulnerable ground volunteers in critical counties and battleground/swing states (like Florida).

3) Electro-magnetic disruptions of electronic voting machines in critical counties/swing states (like Florida), either to disrupt or manipulate voting or to create an impression that some manipulation has been done.

4) Search engine flooding (for search engine de-optimization of rival candidates’ keywords) and social media flooding to disrupt the listening capabilities of sentiment analysis.

5) Selected leaks (including using digital means to create authentic, fake or edited collateral) timed to embarrass rivals or influence voters; these can be geo-coded and mass deployed.

6) Using Internet communications to selectively spam or influence independent or opinionated voters through emails, short messaging service, chat channels and social media.

7) Disruption of the Hillary for President 2016 campaign by hacktivists sympathetic to Anonymous and Wikileaks.

 

 

Revolution R Enterprise 6.0 launched!

Just got the email. More software is good news!

Revolution R Enterprise 6.0 for 32-bit and 64-bit Windows and 64-bit Red Hat Enterprise Linux (RHEL 5.x and RHEL 6.x) features an updated release of the RevoScaleR package that provides fast, scalable data management and data analysis: the same code scales from data frames to local, high-performance .xdf files to data distributed across a Windows HPC Server cluster or IBM Platform Computing LSF cluster.  RevoScaleR also allows distribution of the execution of essentially any R function across cores and nodes, delivering the results back to the user.

Detailed information on what’s new in 6.0 and known issues:
http://www.revolutionanalytics.com/doc/README_RevoEnt_Windows_6.0.0.pdf

And from the manual, lots of function goodies for Big Data:

 

  • IBM Platform LSF Cluster support [Linux only]. The new RevoScaleR function, RxLsfCluster, allows you to create a distributed compute context for the Platform LSF workload manager.
  • Azure Burst support added for Microsoft HPC Server [Windows only]. The new RevoScaleR function, RxAzureBurst, allows you to create a distributed compute context to have computations performed in the cloud using Azure Burst.
  • The rxExec function allows distributed execution of essentially any R function across cores and nodes, delivering the results back to the user.
  • The functions RxLocalParallel and RxLocalSeq allow you to create compute context objects for local parallel and local sequential computation, respectively.
  • RxForeachDoPar allows you to create a compute context using the currently registered foreach parallel backend (doParallel, doSNOW, doMC, etc.). To execute rxExec calls, simply register the parallel backend as usual, then set your compute context as follows: rxSetComputeContext(RxForeachDoPar())
  • rxSetComputeContext and rxGetComputeContext simplify management of compute contexts.
  • rxGlm provides a fast, scalable, distributable implementation of generalized linear models. This expands the list of full-featured high performance analytics functions already available: summary statistics (rxSummary), cubes and cross tabs (rxCube, rxCrossTabs), linear models (rxLinMod), covariance and correlation matrices (rxCovCor), binomial logistic regression (rxLogit), and k-means clustering (rxKmeans). Example: a Tweedie family with 1 million observations and 78 estimated coefficients (categorical data) took 17 seconds with rxGlm compared with 377 seconds for glm on a quadcore laptop.
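The baseline in that comparison is base R’s glm. As a rough sketch of how such a fit can be timed, the snippet below fits a Gamma GLM with a log link on a small simulated data set and wraps the call in system.time. This is not the Tweedie benchmark from the manual, and the data, variable names and sizes are all made up for illustration.

```r
# Simulated data (hypothetical names and sizes), base R only
set.seed(42)
n <- 10000
sim <- data.frame(
  age  = runif(n, 18, 80),
  type = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

# Gamma-distributed cost whose mean depends on age and type
mu <- exp(5 + 0.01 * sim$age + 0.2 * (sim$type == "B"))
sim$cost <- rgamma(n, shape = 2, rate = 2 / mu)

# Time the base R fit; rxGlm would be timed the same way on the same data
timing <- system.time(
  fit <- glm(cost ~ age + type, family = Gamma(link = "log"), data = sim)
)

print(timing["elapsed"])
print(coef(fit))
```

Timings depend heavily on hardware and data size, so any comparison should be run on the same machine and the same data.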

     

And easier working with R’s big brother, the SAS language:

     

RevoScaleR high-performance analysis functions will now conveniently work directly with a variety of external data sources (delimited and fixed format text files, SAS files, SPSS files, and ODBC data connections). New functions are provided to create data source objects to represent these data sources (RxTextData, RxOdbcData, RxSasData, and RxSpssData), which in turn can be specified for the ‘data’ argument for these RevoScaleR analysis functions: rxHistogram, rxSummary, rxCube, rxCrossTabs, rxLinMod, rxCovCor, rxLogit, and rxGlm.


For example, you can analyze a SAS file directly as follows:


    # Create a SAS data source with information about variables and
    # rows to read in each chunk
    sasDataFile <- file.path(rxGetOption("sampleDataDir"), "claims.sas7bdat")
    sasDS <- RxSasData(sasDataFile, stringsAsFactors = TRUE,
                       colClasses = c(RowNum = "integer"), rowsPerRead = 50)

    # Compute and draw a histogram directly from the SAS file
    rxHistogram(~cost | type, data = sasDS)
    # Compute summary statistics
    rxSummary(~., data = sasDS)
    # Estimate a linear model
    linModObj <- rxLinMod(cost ~ age + car_age + type, data = sasDS)
    summary(linModObj)
    # Import a subset into a data frame for further inspection
    subData <- rxImport(inData = sasDS, rowSelection = cost > 400,
                        varsToKeep = c("cost", "age", "type"))
    subData

 

The installation instructions and instructions for getting started with Revolution R Enterprise & RevoDeployR for Windows: http://www.revolutionanalytics.com/downloads/instructions/windows.php

Interview Alvaro Tejada Galindo, SAP Labs Montreal, Using SAP Hana with #Rstats

Here is a brief interview with Alvaro Tejada Galindo, aka Blag, who is a developer working with SAP Hana and R at SAP Labs, Montreal. SAP Hana is SAP’s latest offering in BI; it is also a database and a computing environment, and using R and HANA together on the cloud can give major productivity gains in terms of both speed and analytical ability, as per preliminary use cases.

Ajay- Describe how you got involved with databases and the R language.
Blag- I worked as an ABAP Consultant for 11 years, and I have been involved with programming for the last 13 years, so I was in touch with SQLServer, Oracle, MySQL and SQLite. When I joined SAP, I heard that SAP HANA was going to use a statistical programming language called “R”. The next day I started my “R” learning.

Ajay- What made the R language a fit for SAP HANA? Did you consider other languages? What is your view on the Julia/Python/SPSS/SAS/Matlab languages?

Blag- I think “R” is a must for SAP HANA. As the fastest database in the market, we needed a language that could help us shape the data in the best possible way. “R” filled that purpose very well. Right now, “R” is not the only language as “L” can be used as well (http://wiki.tcl.tk/17068) …not forgetting “SQLScript” which is our own version of SQL (http://goo.gl/x3bwh) . I have to admit that I tried Julia, but couldn’t manage to make it work. Regarding Python, it’s an interesting question as I’m going to blog about Python and SAP HANA soon. About Matlab, SPSS and SAS I haven’t used them, so I got nothing to say there.

Ajay- What is your view on some of the limitations of R that can be overcome by using it with SAP HANA?

Blag-  I think mostly the ability of SAP HANA to work with big data. Again, SAP HANA and “R” can work very nicely together and achieve things that weren’t possible before.

Ajay- Have you considered other vendors of R, including working with RStudio, Revolution Analytics, and even Oracle R Enterprise?

Blag-  I’m not really part of the SAP HANA or the R groups inside SAP, so I can’t really comment on that. I can only say that I use RStudio every time I need to do something with R. Regarding Oracle…I don’t think so…but they can use any of our products whenever they want.

Ajay- Do you have a case study on an actual usage of R with SAP HANA that led to great results?

Blag- Right now the use of “R” and SAP HANA is very preliminary, and I don’t think many people have started working on it…but as an example that it works, you can check this awesome blog entry from my friend Jitender Aswani, “Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps” (http://allthingsr.blogspot.com/#!/2012/04/big-data-r-and-hana-analyze-200-million.html)

Ajay- Does your group in SAP plan to give back to the R ecosystem by attending conferences like UseR 2012, sponsoring meets, package development, etc.?

Blag- My group is in charge of everything developers, so sure, we’re planning to get more in touch with R developers and their ecosystem. Not sure how we’re going to deal with it, but at least I’m going to get myself involved in the Montreal R Group.

 

About-

http://scn.sap.com/people/alvaro.tejadagalindo3

Name: Alvaro Tejada Galindo
Email: a.tejada.galindo@sap.com
Profession: Development
Company: SAP Canada Labs-Montreal
Town/City: Montreal
Country: Canada
Instant Messaging Type: Twitter
Instant Messaging ID: Blag
Personal URL: http://blagrants.blogspot.com
Professional Blog URL: http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/u/252210910
My Relation to SAP: employee
Short Bio: Development Expert for the Technology Innovation and Developer Experience team. Used to be an ABAP Consultant for the last 11 years. Addicted to programming since 1997.

http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx

and from

http://en.wikipedia.org/wiki/SAP_HANA

SAP HANA is SAP AG’s implementation of in-memory database technology. There are four components within the software group:[1]

  • SAP HANA DB (or HANA DB) refers to the database technology itself,
  • SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
  • SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as an appliance. It also includes the modeling tools from HANA Studio as well as replication and data transformation tools to move data into HANA DB,[2]
  • SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).

R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation.[20] HANA developers can write R scripts in SQL and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in the SQLScript. R environments need to be deployed to use R within SQLScript

More blog posts on using SAP and R together

Dealing with R and HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana

R meets HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/29/r-meets-hana

HANA meets R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/26/hana-meets-r

When SAP HANA met R – First kiss

http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r–first-kiss

 

Using RODBC with SAP HANA DB-

SAP HANA: My experiences on using SAP HANA with R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/02/21/sap-hana-my-experiences-on-using-sap-hana-with-r

and of course the blog that started it all-

Jitender Aswani’s http://allthingsr.blogspot.in/

 

 

How big is R on CRAN #rstats

3.87 GB and 3786 packages. That’s what you need to install the whole of R as on CRAN.

(Note: Many IT administrators/compliance policies in enterprises forbid installing from the Internet in work offices, which is where the analytics, $$, and people are.)

As downloaded from the CRAN Mirror at UCLA.

Takes 3 hours to download at 1 mbps (I was on an Amazon Ec2 instance)

See screenshot.

Next question: who in the R project is responsible for deleting old/deprecated/redundant packages if the authors don’t do it?

 

Data Quality in R #rstats

Many data formats cause problems when importing into your statistical software. Statistical software is quite unable to distinguish between $1,000, 1000%, 1,000 and 1000, and will by default treat the first three as character variables and the fourth as a numeric variable. This issue is further compounded by the numerous ways we can represent date-time variables.

The good thing is that for specific domains like finance and web analytics, even these weird data input formats are fixed, so we can put together a list of handy data quality conversion functions in R for reference.
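A small illustration of the problem, using made-up values: reading the four representations above from CSV text shows which of them R treats as numbers. The column names are hypothetical.

```r
# Made-up CSV text; the fields are quoted so that commas inside values
# are not taken as delimiters
csv_text <- 'dollars,percent,with_comma,plain
"$1,000","1000%","1,000",1000
"$2,500","15%","2,500",2500'

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Only the last column comes in as a number (integer); the rest stay text
col_classes <- sapply(df, class)
print(col_classes)
```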

 

After much muddling about with converting internet formats (data used in web analytics, mostly time formats without a date, like 00:35:23) into numeric data frame formats, I found that the way to handle date-time conversions in R is

dataset$Var2= strptime(as.character(dataset$Var1),"%H:%M:%S")

The problem with this approach is that you get the value in a date-time format (by default R adds today’s date, so you see today’s date followed by 04:00:45), while you are interested only in time durations (4:00:45, or actually just the equivalent in seconds).

This can be handled using the as.difftime function:

dataset$Var2=as.difftime(paste(dataset$Var1))

or, to get purely numeric values so we can do numeric analysis (like summary):

dataset$Var2=as.numeric(as.difftime(paste(dataset$Var1)))

(Maybe there is a more elegant way here, but I don’t know.)
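One possibly tidier route, sketched here on the assumption that the durations are stored as HH:MM:SS text, is to give as.difftime an explicit format and explicit units, so the numeric result is always in seconds. The data frame and column names below are made up.

```r
# Hypothetical web-analytics durations stored as HH:MM:SS text
visits <- data.frame(time_on_site = c("00:35:23", "04:00:45", "00:00:59"),
                     stringsAsFactors = FALSE)

# An explicit format and units avoid relying on the locale-specific "%X"
# default format and on automatically chosen units
dur <- as.difftime(visits$time_on_site, format = "%H:%M:%S", units = "secs")
visits$seconds <- as.numeric(dur)

summary(visits$seconds)
```

Durations of 24 hours or more will not parse with %H, so those would need to be split and converted by hand.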

This kind of data is what we usually get in web analytics, for metrics like average time on site.

For factor variables, convert through character first:

dataset$Var2= as.numeric(as.character(dataset$Var1))

or

dataset$Var2= as.numeric(paste(dataset$Var1))

 

A slight problem: suppose there is data like 1,504. It will be converted to NA instead of 1504.

The way to solve this is to use the nice gsub function ONLY on that variable. Since the comma is also the most commonly used delimiter, you don’t want to replace all the commas in the file, just the ones in that variable.

dataset$Variable2=as.numeric(paste(gsub(",","",dataset$Variable)))

Now let’s assume we have data in the form of percentages, like 0.00%, 1.23%, 3.5%.

Again we use the gsub function to replace the % character in the string with "" (nothing).

 

dataset$Variable2=as.numeric(paste(gsub("%","",dataset$Variable)))
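The two gsub calls above can be folded into one small helper. This is only a sketch: the name to_number is made up here, and it assumes the only decorations in the data are a dollar sign, thousands commas and a percent sign.

```r
# Hypothetical helper: strip "$", "," and "%", then convert to numeric
to_number <- function(x) {
  as.numeric(gsub("[$,%]", "", as.character(x)))
}

cleaned <- to_number(c("1,504", "$1,000", "3.5%", "0.00%", "1000"))
print(cleaned)
```

Percentages stay on the 0-100 scale here; divide by 100 afterwards if proportions are needed.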

 

 

If you simply do the following for a factor variable, you get the internal level codes, not the values. This can cause silent errors when you read in CSV data, which may come in as character or factor type.

dataset$Var2= as.numeric(dataset$Var1)
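A short demonstration of the pitfall, with a made-up factor:

```r
# A factor whose labels happen to look like numbers
f <- factor(c("10", "250", "10", "1504"))

codes  <- as.numeric(f)                # internal level codes, not the data
values <- as.numeric(as.character(f))  # the numbers the labels represent

print(levels(f))
print(codes)
print(values)
```

The levels are sorted as text, so the codes bear no relation to the size of the numbers they stand for.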

An additional approach is to take substrings (using substr) and concatenate (using paste) when manipulating string/character variables.

 

iris$sp=substr(iris$Species,1,3) reduces the famous Iris species names to three characters each, without losing any analytical value.
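A minimal sketch of both functions on the built-in iris data; the label column is a made-up example of what paste might be used for.

```r
data(iris)

# Shorten the species names to their first three characters
iris$sp <- substr(as.character(iris$Species), 1, 3)
sp_counts <- table(iris$sp)
print(sp_counts)

# Concatenate the short code with the row number to build a label
iris$label <- paste(iris$sp, seq_len(nrow(iris)), sep = "_")
head(iris$label)
```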

The other issue is missing values. na.rm=T helps with getting summaries of numeric variables that have missing values, but we need to investigate further how suitable the na.omit functions are for domains that have large amounts of missing data needing treatment.
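A small made-up example of the difference: na.rm skips missing values within a single summary, while na.omit drops every row that has a missing value in any column, which can discard a lot of data.

```r
# Made-up data with missing values in different rows
scores <- data.frame(visits = c(10, NA, 30, 40),
                     cost   = c(1.5, 2.5, NA, 4.0))

raw_mean    <- mean(scores$visits)                # NA, because of the missing value
mean_visits <- mean(scores$visits, na.rm = TRUE)  # ignores the missing value

# na.omit keeps only the rows that are complete in every column
complete   <- na.omit(scores)
share_lost <- 1 - nrow(complete) / nrow(scores)

print(mean_visits)
print(share_lost)
```

Here two scattered missing values cost half the rows, which is why na.omit needs care on data with many gaps.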

 

 

Analytics 2012 Conference

A nice conference from the grand old institution of analytics: SAS Institute’s annual analytics pow-wow.

I especially like some of the trainings, and wonder if they could be stored as e-learning modules for students/academics to review in SAS’s extensive and generous Online Education Program.

Sunday Morning Workshop

SAS Sentiment Analysis Studio: Introduction to Building Models

This course provides an introduction to SAS Sentiment Analysis Studio. It is designed for system designers, developers, analytical consultants and managers who want to understand techniques and approaches for identifying sentiment in textual documents.
Sunday, Oct. 7; 8:30 a.m.-12 p.m. – $250

Sunday Afternoon Workshops

Business Analytics Consulting Workshops

This workshop is designed for the analyst, statistician, or executive who wants to discuss best-practice approaches to solving specific business problems, in the context of analytics. The two-hour workshop will be customized to discuss your specific analytical needs and will be designed as a one-on-one session for you, including up to five individuals within your company sharing your analytical goal. This workshop is specifically geared for an expert tasked with solving a critical business problem who needs consultation for developing the analytical approach required. The workshop can be customized to meet your needs, from a deep-dive into modeling methods to a strategic plan for analytic initiatives. In addition to the two hours at the conference location, this workshop includes some advanced consulting time over the phone, making it a valuable investment at a bargain price.
Sunday, Oct. 7; 1-3 p.m. or 3:30-5:30 p.m. – $200

Demand-Driven Forecasting: Sensing Demand Signals, Shaping and Predicting Demand

This half-day lecture teaches students how to integrate demand-driven forecasting into the consensus forecasting process and how to make the current demand forecasting process more demand-driven.
Sunday, Oct. 7; 1-5 p.m.

Forecast Value Added Analysis

Forecast Value Added (FVA) is the change in a forecasting performance metric (such as MAPE or bias) that can be attributed to a particular step or participant in the forecasting process. FVA analysis is used to identify those process activities that are failing to make the forecast any better (or might even be making it worse). This course provides step-by-step guidelines for conducting FVA analysis – to identify and eliminate the waste, inefficiency, and worst practices from your forecasting process. The result can be better forecasts, with fewer resources and less management time spent on forecasting.
Sunday, Oct. 7; 1-5 p.m.

SAS Enterprise Content Categorization: An Introduction

This course gives an introduction to methods of unstructured data analysis, document classification and document content identification. The course also uses examples as the basis for constructing parse expressions and resulting entities.
Sunday, Oct. 7; 1-5 p.m.

 

 
You can see more on this yourself at –

http://www.sas.com/events/analytics/us/