Hacker Alert- Darpa project 10$ K for summer

If you bleed red,white and blue and know some geo-spatial analysis ,social network analysis and some supervised and unsupervised learning (and unlearning)- here is a chance for you to put your skills for an awesome project


from wired-



For this challenge, Darpa will lodge a selected six to eight teams at George Mason University and provide them with an initial $10,000 for equipment and access to unclassified data sets including “ground-level video of human activity in both urban and rural environments; high-resolution wide-area LiDAR of urban and mountainous terrain, wide-area airborne full motion video; and unstructured amateur photos and videos, such as would be taken from an adversary’s cell phone.” However, participants are encouraged to use any open sourced, legal data sets they want. (In the hackathon spirit, we would encourage the consumption of massive quantities of pizza and Red Bull, too.)


DARPA Innovation House Project

Home | Data Access | Awards | Team Composition | Logisitics | Deliverables | Proposals | Evaluation Criteria | FAQ


Proposals must be one to three pages. Team resumes of any length must be attached and do not count against the page limit. Proposals must have 1-inch margins, use a font size of at least 11, and be delivered in Microsoft Word or Adobe PDF format.

Proposals must be emailed to InnovationHouse@c4i.gmu.edu by 4:00PM ET on Tuesday, July 31, 2012.

Proposals must have a Title and contain at least the following sections with the following contents.

  1. Team Members

Each team member must be listed with name, email and phone.
The Lead Developer should be indicated.
The statement “All team members are proposed as Key Personnel.” must be included.

  1. Capability Description

The description should clearly explain what capability the software is designed to provide the user, how it is proposed to work, and what data it will process.

In addition, a clear argument should be made as to why it is a novel approach that is not incremental to existing methods in the field.

  1. Proposed Phase 1 Demonstration

This section should clearly explain what will be demonstrated at the end of Session I. The description should be expressive, and as concrete as possible about the nature of the designs and software the team intends to produce in Session I.

  1. Proposed Phase 2 Demonstration

This section should clearly explain how the final software capability will be demonstrated as quantitatively as possible (for example, positing the amount of data that will be processed during the demonstration), how much time that will take, and the nature of the results the processing aims to achieve.

In addition, the following sections are optional.

  1. Technical Approach

The technical approach section amplifies the Capability Description, explaining proposed algorithms, coding practices, architectural designs and/or other technical details.

  1. Team Qualifications

Team qualifications should be included if the team?s experience base does not make it obvious that it has the potential to do this level of software development. In that case, this section should make a credible argument as to why the team should be considered to have a reasonable chance of completing its goals, especially under the tight timelines described.

Other sections may be included at the proposers? discretion, provided the proposal does not exceed three pages.







Revolution R Enterprise 6.0 launched!

Just got the email-more software is good news!

Revolution R Enterprise 6.0 for 32-bit and 64-bit Windows and 64-bit Red Hat Enterprise Linux (RHEL 5.x and RHEL 6.x) features an updated release of the RevoScaleR package that provides fast, scalable data management and data analysis: the same code scales from data frames to local, high-performance .xdf files to data distributed across a Windows HPC Server cluster or IBM Platform Computing LSF cluster.  RevoScaleR also allows distribution of the execution of essentially any R function across cores and nodes, delivering the results back to the user.

Detailed information on what’s new in 6.0 and known issues:

and from the manual-lots of function goodies for Big Data


  • IBM Platform LSF Cluster support [Linux only]. The new RevoScaleR function, RxLsfCluster, allows you to create a distributed compute context for the Platform LSF workload manager.
  •  Azure Burst support added for Microsoft HPC Server [Windows only]. The new RevoScaleR function, RxAzureBurst, allows you to create a distributed compute context to have computations performed in the cloud using Azure Burst
  • The rxExec function allows distributed execution of essentially any R function across cores and nodes, delivering the results back to the user.
  • functions RxLocalParallel and RxLocalSeq allow you to create compute context objects for local parallel and local sequential computation, respectively.
  • RxForeachDoPar allows you to create a compute context using the currently registered foreach parallel backend (doParallel, doSNOW, doMC, etc.). To execute rxExec calls, simply register the parallel backend as usual, then set your compute context as follows: rxSetComputeContext(RxForeachDoPar())
  • rxSetComputeContext and rxGetComputeContext simplify management of compute contexts.
  • rxGlm, provides a fast, scalable, distributable implementation of generalized linear models. This expands the list of full-featured high performance analytics functions already available: summary statistics (rxSummary), cubes and cross tabs (rxCube,rxCrossTabs), linear models (rxLinMod), covariance and correlation matrices (rxCovCor),
    binomial logistic regression (rxLogit), and k-means clustering (rxKmeans)example: a Tweedie family with 1 million observations and 78 estimated coefficients (categorical data)
    took 17 seconds with rxGlm compared with 377 seconds for glm on a quadcore laptop


    and easier working with R’s big brother SAS language


    RevoScaleR high-performance analysis functions will now conveniently work directly with a variety of external data sources (delimited and fixed format text files, SAS files, SPSS files, and ODBC data connections). New functions are provided to create data source objects to represent these data sources (RxTextData, RxOdbcData, RxSasData, and RxSpssData), which in turn can be specified for the ‘data’ argument for these RevoScaleR analysis functions: rxHistogramrxSummary, rxCube, rxCrossTabs, rxLinMod, rxCovCor, rxLogit, and rxGlm.


    you can analyze a SAS file directly as follows:

    # Create a SAS data source with information about variables and # rows to read in each chunk

    sasDataFile <- file.path(rxGetOption(“sampleDataDir”),”claims.sas7bdat”)
    sasDS <- RxSasData(sasDataFile, stringsAsFactors = TRUE,colClasses = c(RowNum = “integer”),rowsPerRead = 50)

    # Compute and draw a histogram directly from the SAS file
    rxHistogram( ~cost|type, data = sasDS)
    # Compute summary statistics
    rxSummary(~., data = sasDS)
    # Estimate a linear model
    linModObj <- rxLinMod(cost~age + car_age + type, data = sasDS)
    # Import a subset into a data frame for further inspection
    subData <- rxImport(inData = sasDS, rowSelection = cost > 400,
    varsToKeep = c(“cost”, “age”, “type”))


The installation instructions and instructions for getting started with Revolution R Enterprise & RevoDeployR for Windows: http://www.revolutionanalytics.com/downloads/instructions/windows.php

Facebook Search- The fall of the machines

Increasingly I am beginning to search more and more on Facebook. This is for the following reasons-

1) Facebook is walled off to Google (mostly). While within Facebook , I get both people results and content results (from Bing).

Bing is an okay alternative , though not as fast as Google Instant.

2) Cleaner Web Results When Facebook increases the number of results from 3 top links to say 10 top links, there should be more outbound traffic from FB search to websites.For some reason Google continues to show 14 pages of results… Why? Why not limit to just one page.

3) Better People Search than  Pipl and Google. But not much (or any) image search. This is curious and I am hoping the Instagram results would be added to search results.

4) I am hoping for any company Facebook or Microsoft to challenge Adsense . Adwords already has rivals. Adsense is a de facto monopoly and my experiences in advertising show that content creators can make much more money from a better Adsense (especially ) if Adsense and Adwords do not have a conflict of interest from same advertisers.

Adwords should have been a special case of Adsense for Google.com but it is not.

5) Machine learning can only get you from tau to delta tau. When ad click behavior is inherently dependent on humans who behave mostly on chaotic , or genetic models than linear CPC models. I find FB has an inherent advantage in the quantity and quality of data collected on people behavior rather than click behavior. They are also more aggressive and less apologetic about behavorially targeted  ads.

Additional point- Analytics for Google Analytics is not as rich as analytics from Facebook pages in terms of demographic variables. This can be tested by anyone.