#Rstats gets into Enterprise Cloud Software

Here is an excellent example of how websites should help, rather than hinder, new customers who want to take a demo of the software without being overwhelmed by sweet-talking marketing guys who don't know the difference between heteroskedasticity, probability, odds and likelihood.

It is made by Zementis (Dr Michael Zeller has been a frequent guest here) and Revolution Analytics, which is still the best shot in enterprise software for #Rstats.

Now if only Revo could get into the lucrative Department of Energy or Department of Defense business, they could change the world AND earn some more revenue than they have been doing. But seriously.

Check out http://deployr.revolutionanalytics.com/zementis/ and play with it. Or better still, mash it up with some data viz and ROC curves, or extend it with some APIs 😉

Jim Goodnight for US Senate: Op Ed

This is NOT an April fool joke or a publicity stunt. It is also not meant to provoke discussion for the sake of provocation.

For some time I have studied, in both the US and India, what makes governments, academia and businesses work or fail. A common thread is the quality of the people involved. Someone who is a wasteful businessman will be a wasteful politician. Someone who is a flamboyant businessman with more flair than substance will continue that in public life.
Accordingly I have created a Facebook cause-

Jim Goodnight for the US Senate

http://www.causes.com/causes/600220-jim-goodnight-for-the-us-senate

If Donald Trump can run for President, I can think of no one who has done more for the American South. Unlike the tech-heavy, Stanford-dominated boom in California, the Midwest and South have been declining centers of influence. Cities like Austin, Texas or Raleigh, North Carolina are the exception rather than the norm there. A friend who went to Duke once told me that the worst thing in America is to be born a poor, rural white male. There are no groups lobbying for education or blazing internet speeds for you. Socially you are expected to walk and thrive alone.

The Southern Baptist Church has managed to infiltrate and influence young minds there, and the average conservative American seemed better off and happier in his moderated social behaviour. But the Church exacts a 10% tithe, and it is efficient in stretching every dollar and every cent of church donations. Government works with the best intentions, but spending someone else's money (your tax money, spent by a bureaucrat) is always more inefficient than the actual owner spending it alone. Taxes are higher than the 10% tithe and seem to accomplish much less social change. Would you rather go to work or go to war?

Accordingly I find that on the West Coast there are very few tech-savvy leaders with a track record of fiscal pragmatism, educational reform and job creation. Certainly the industry lobbyist is smarter at evading taxes than the average Joe, and campaign financing is still dependent on deep pockets despite the innovations of internet retail fund raising.

Would you like your Senator to be as considerate of creating jobs as entrepreneurs are? Jim Goodnight here is a metaphor for all entrepreneurs who don't believe in reckless hire-and-fire or outsourcing, and who take long-term views on people.

Click here to spread this cause- perhaps it will make existing politicians more efficient just by the threat of new competition.

http://www.causes.com/causes/600220-jim-goodnight-for-the-us-senate?recruiter_id=8347178



Save the Data

I just read an online cause here-

http://sunlightfoundation.com/savethedata/

Some of the most important technology programs that keep Washington accountable are in danger of being eliminated. Data.gov, USASpending.gov, the IT Dashboard and other federal data transparency and government accountability programs are facing a massive budget cut, despite only being a tiny fraction of the national budget. Help save the data and make sure that Congress doesn’t leave the American people in the dark.

I wonder why the federal government or non-profit agencies cannot help create a SPARQL database and, in these days of cloud computing, why a tech major cannot donate storage space to it. After all, despite the US corporate tax rate being high, US technology companies do end up paying a lower rate thanks to tax breaks and routing revenue overseas.

In the new age data is power, and the US has led in its mission to use technology to further its own values, especially in the Middle East. The datasets should be made public and transitioned to the private sector or academia for research and redesign for data augmentation, without adding strain to a budget already carrying a massive deficit, borrowing and the cost of fighting 3 wars. Of particular interest would be datasets of campaign finances and donors, especially given the large number of retail and small donors and the internet marketing in elections, as these would also help serve as an example of democracy and change. Even countries like China could create an internal dashboard for tracking corruption and expense efficiency, with restricted rights, to help with rural and urban governance.

Norway Supreme Court orders SAS to pay damages in data espionage case

Check out the details from

Norway Supreme Court orders SAS to pay damages in data espionage case

SAS said the Supreme Court of Norway ordered it Thursday to pay NOK160 million ($27.4 million) to Norwegian Air Shuttle, likely bringing to a conclusion the corporate espionage case in which SAS Norge was found to have improperly accessed and used data in Norwegian’s reservation system. Earlier this year…

http://atwonline.com/international-aviation-regulation/news/norway-supreme-court-orders-sas-pay-damages-data-espionage-ca

Unbelievable stuff!

Also check out Jim Goodnight's remarks

http://www.businessleader.com/raleighdurham/Index.aspx?page=impact&PID=387&impactTitle=Business+Leader+of+the+Year

Midway Airlines

When Goodnight spots a problem, he fixes it, in the most direct way possible. So when he heard that Midway Airlines was in trouble, he didn’t hesitate. Especially when he learned that an investment group was interested in buying the airline and moving the hub to another location. He led the investment group that bailed it out for $22 million.

“I just felt it would be a blow to our area to lose its major airline,” Goodnight says. “I looked back to when American had its hub here and we could get anywhere pretty easily. I really wanted that to continue. So we stepped up to the plate.”

They brought in a new CEO, Robert Ferguson, who was responsible, says Goodnight, for bringing Continental Airlines out of bankruptcy. They then took the airline to Wall Street, where public investors kicked in $75 million, $42 million of it to Midway, through an initial public offering.

As of mid-November, Midway Airlines and its commuter partner will operate 218 daily departures between Raleigh-Durham and 25 destinations in 14 states and the District of Columbia. The fleet includes 15 new CRJ aircraft and eight Fokker F100s, and averages less than three years of age ranking it among the youngest in the industry. In addition, Midway recently announced firm orders for 17 Boeing 737-700 aircraft. The first delivery will take place in December 1999.

Computer Education grants from Google

A message from the official Google blog-

http://googleblog.blogspot.com/2011/01/supporting-computer-science-education.html

With programs like Computer Science for High School (CS4HS), we hope to increase the number of CS majors —and therefore the number of people entering into careers in CS—by promoting computer science curriculum at the high school level.

For the fourth consecutive year, we’re funding CS4HS to invest in the next generation of computer scientists and engineers. CS4HS is a workshop for high school and middle school computer science teachers that introduces new and emerging concepts in computing and provides tips, tools and guidance on how to teach them. The ultimate goals are to “train the trainer,” develop a thriving community of high school CS teachers and spread the word about the awe and beauty of computing.

If you’re a university, community college, or technical school in the U.S., Canada, Europe, Middle East or Africa and are interested in hosting a workshop at your institution, please visit www.cs4hs.com to submit an application for grant funding. Applications will be accepted between January 18, 2011 and February 18, 2011.

In addition to submitting your application, on the CS4HS website you’ll find info on how to organize a workshop, as well as websites and agendas from last year’s participants to give you an idea of how the workshops were structured in the past. There’s also a collection of CS4HS curriculum modules that previous participating schools have shared for future organizers to use in their own program.

Challenges of Analyzing a dataset (with R)

Analyzing data can have many challenges associated with it. In the case of business analytics data, these challenges or constraints can have a marked effect on the quality and timeliness of the analysis as well as the expected versus actual payoff from the analytical results.

Challenges of Analytical Data Processing-

1) Data Formats- Reading in complete data, without losing any part (or metadata), or adding in superfluous details (that increase the scope). Technical constraints of data formats are relatively easy to navigate thanks to ODBC and well documented, easily searchable syntax and language.

The costs of additional data augmentation (should we pay for additional credit bureau data to be appended?) and the time needed to store and process the data (every extra column adds as many values as the dataset has rows, which becomes a time-consuming problem if you are considering an extra 100 variables with a few million rows), but above all business relevance and quality guidelines, ensure that basic data input and massaging take up a considerable part of the whole analytical project timeline.

2) Data Quality- Perfect data exists in a perfect world. The price of perfect information is one that a business will mostly never budget or wait for. Delivering inferences and results based on summaries of data that has missing, invalid or outlier values embedded within it makes the role of the analyst just as important as whichever tool is chosen to remove outliers, replace missing values, or treat invalid data.

3) Project Scope-

How much data? How much Analytical detail versus High Level Summary? Timelines for delivery as well as refresh of data analysis? Checks (statistical as well as business)?

How easy is it to load and implement the new analysis in the existing Information Technology infrastructure? These are some of the outer parameters that can limit your analytical project scope, your analytical tool choice, and your processing methodology.

4) Output Results vis-a-vis stakeholder expectation management-

Stakeholders like to see results, not constraints, hypotheses, assumptions, p-values, or chi-square values. Output results need to be streamlined into a decision management process to justify the investment of human time and effort in an analytical project; choosing an analytical tool, training on it, and navigating its complexities and constraints are a subset of that investment. Optimum use of graphical display is part of aligning results to a form more palatable to stakeholders, provided the graphics are done nicely.

E.g. Marketing wants to get more sales, so they need a clear campaign to target certain customers via specific channels with specified collateral. To support that business judgement, business analytics needs to validate, cross-validate and sometimes invalidate the decision with clear, transparent methods and processes.
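The data quality step in point 2 above can be sketched in a few lines of base R. This is only an illustration on a made-up `sales` data frame (the column names and values are assumptions, not real data); median imputation and the 1.5 x IQR rule are common defaults rather than the only choices, and the right treatment depends on the business context.

```r
# Hypothetical toy data: one missing value and one obvious outlier
sales <- data.frame(
  customer = c("A", "B", "C", "D", "E", "F"),
  spend    = c(125, 135, NA, 128, 5000, 131)
)

# Count missing values per column
missing_counts <- colSums(is.na(sales))

# Replace missing spend with the median of the observed values
spend_median <- median(sales$spend, na.rm = TRUE)
sales$spend_clean <- ifelse(is.na(sales$spend), spend_median, sales$spend)

# Flag outliers with the 1.5 * IQR rule (one common convention)
q   <- unname(quantile(sales$spend_clean, c(0.25, 0.75)))
iqr <- q[2] - q[1]
sales$outlier <- sales$spend_clean < q[1] - 1.5 * iqr |
                 sales$spend_clean > q[2] + 1.5 * iqr

print(missing_counts)
print(sales)
```

Flagging rather than deleting the outlier keeps the decision with the analyst, who can then check with the business whether the value is an error or a genuinely large customer.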

Given a dataset- the basic analytical steps that an analyst will do with R are as follows. This is meant as a note for analysts at a beginner level with R.

Package-specific syntax

update.packages() #This updates all installed packages
install.packages("package1") #This installs a package locally, a one-time event (the package name must be quoted)
library(package1) #This loads a specified package in the current R session, which needs to be done every R session

CRAN________LOCAL HARD DISK_________R SESSION is the top to bottom hierarchy of package storage and invocation.
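To see where a given package currently sits in that hierarchy, a small helper along these lines may be useful. `package_status` is a hypothetical name introduced here for illustration; it relies only on the base functions `search()` and `system.file()`, and it does not contact CRAN or install anything.

```r
# Report where a package sits in the CRAN -> local disk -> session chain.
# package_status() is a hypothetical helper, not part of base R.
package_status <- function(pkg) {
  if (paste0("package:", pkg) %in% search()) {
    "attached in this session"          # library(pkg) has already been called
  } else if (nzchar(system.file(package = pkg))) {
    "installed locally, not attached"   # needs library(pkg)
  } else {
    "not installed locally"             # needs install.packages("pkg") first
  }
}

print(package_status("base"))
print(package_status("tools"))                 # ships with R, usually not attached
print(package_status("notARealPackage123"))    # made-up name
```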

ls() #This lists all objects or datasets currently active in the R session

> names(assetsCorr)  #This gives the names of variables within a dataframe
[1] "AssetClass"            "LargeStocksUS"         "SmallStocksUS"
[4] "CorporateBondsUS"      "TreasuryBondsUS"       "RealEstateUS"
[7] "StocksCanada"          "StocksUK"              "StocksGermany"
[10] "StocksSwitzerland"     "StocksEmergingMarkets"

> str(assetsCorr) #gives complete structure of dataset
'data.frame':    12 obs. of  11 variables:
$ AssetClass           : Factor w/ 12 levels "CorporateBondsUS",..: 4 5 2 6 1 12 3 7 11 9 ...
$ LargeStocksUS        : num  15.3 16.4 1 0 0 ...
$ SmallStocksUS        : num  13.49 16.64 0.66 1 0 ...
$ CorporateBondsUS     : num  9.26 6.74 0.38 0.46 1 0 0 0 0 0 ...
$ TreasuryBondsUS      : num  8.44 6.26 0.33 0.27 0.95 1 0 0 0 0 ...
$ RealEstateUS         : num  10.6 17.32 0.08 0.59 0.35 ...
$ StocksCanada         : num  10.25 19.78 0.56 0.53 -0.12 ...
$ StocksUK             : num  10.66 13.63 0.81 0.41 0.24 ...
$ StocksGermany        : num  12.1 20.32 0.76 0.39 0.15 ...
$ StocksSwitzerland    : num  15.01 20.8 0.64 0.43 0.55 ...
$ StocksEmergingMarkets: num  16.5 36.92 0.3 0.6 0.12 ...

> dim(assetsCorr) #gives dimensions observations and variable number
[1] 12 11

str(Dataset) – This gives the structure of the dataset (note structure gives both the names of variables within dataset as well as dimensions of the dataset)

head(dataset,n1) gives the first n1 rows of dataset while
tail(dataset,n2) gives the last n2 rows of a dataset where n1,n2 are numbers and dataset is the name of the object (here a data frame that is being considered)

summary(dataset) gives you a brief summary of all variables while

library(Hmisc)
describe(dataset) gives a detailed description on the variables
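For readers who do not have the `assetsCorr` data, the same inspection steps can be tried on `mtcars`, a data frame that ships with R. This is just a runnable sketch of the commands described above; `describe()` is left out because it needs the Hmisc package to be installed.

```r
# mtcars is a built-in data frame, used here only as a stand-in dataset
data(mtcars)

dims <- dim(mtcars)              # number of observations and variables
print(dims)
print(names(mtcars))             # variable names
str(mtcars)                      # names, types and dimensions in one view

first_rows <- head(mtcars, 3)    # first 3 rows
last_rows  <- tail(mtcars, 2)    # last 2 rows
print(first_rows)
print(last_rows)

print(summary(mtcars))           # brief summary of every variable
```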

simple graphics can be given by

hist(Dataset1)
and
plot(Dataset1)
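A small runnable sketch of these two graphics calls, again on the built-in `mtcars` data. Base R's `hist()` expects a numeric vector, so the example passes a single column; `plot()` accepts either two vectors or a whole data frame (which gives a scatterplot matrix). The `pdf(NULL)` line is only there so the script runs without opening a window and can be dropped in an interactive session.

```r
# Discard graphics output so this runs in a non-interactive script
pdf(NULL)

# Histogram of one numeric column; hist() also returns the bin counts
h <- hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")

# Scatter plot of two variables
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "mpg")

# Passing a data frame to plot() draws a scatterplot matrix
plot(mtcars[, c("mpg", "wt", "hp")])

invisible(dev.off())
print(h$counts)
```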

As you can see in the above cases, there are multiple ways to get even basic analysis about data in R. However, most of the syntax commands are intuitively understood (like hist for histogram, t.test for t test, plot for plot).

For detailed analysis throughout the scope of a project, a business analytics user is recommended to use multiple GUIs and multiple packages. Even for highly specific and specialized analytical tasks, it is worth checking for a GUI that incorporates the required package.

America's Data Book: Census Abstract 2011

An excellent summary of the 2011 Census Statistical Abstract was given by the NYTimes at

http://www.nytimes.com/interactive/2011/01/07/us/CENSUS.html?hp

For example, more white people than black people now enjoy jazz (the latter presumably have rap music), but there is not enough detail on, ahem, country music.

The Data book is at http://www.census.gov/compendia/statab/

What is the Statistical Abstract?

The Statistical Abstract of the United States, published since 1878, is the authoritative and comprehensive summary of statistics on the social, political, and economic organization of the United States.

Use the Abstract as a convenient volume for statistical reference, and as a guide to sources of more information both in print and on the Web.

Sources of data include the Census Bureau, Bureau of Labor Statistics, Bureau of Economic Analysis, and many other Federal agencies and private organizations.

Tables of Interest

1060 – Shopping Centers–Number and Gross Leasable Area [Excel 31K] | [PDF 59K]

1170 – Flow of Funds Accounts-Liabilities of Households and Nonprofit Organizations [Excel 41K] | [PDF 66K]

1172 – Amount of Debt Held by Families-Percent Distribution [Excel 29K] | [PDF 66K]

1173 – Ratios of Debt Payments to Family Income [Excel 857K] | [PDF 64K]

327 – Law Enforcement Officers Killed and Assaulted [Excel 34k] | [PDF 468k]

From the last table you can see that, while the number of officers killed or feloniously killed decreased by 20% in the past five years, the number of officers assaulted with firearms grew by 20% in the same period.