There are multiple packages in R to read data straight from online datasets.
These are as follows- Continue reading “Using #Rstats for online data access”
Tag: Time series
England rule India- again
If you type the words “business intelligence expert” in Google, you may get the top ranked result as http://goo.gl/pCqUh or Peter James Thomas, as profound a name as can be, since it spans three of the most important saints in the church.
His current post is on a very non-business-intelligence topic called Wager. http://peterjamesthomas.com/2011/07/20/wager/
It details how Peter, a virtual friend whom I have never met and who looks suspiciously like Hugh Grant with the hair, and Ajay Ohri (myself) made a wager on which cricket team would emerge victorious in the ongoing Test series. It was a 4 match series, and India needed at least to win the series, or avoid losing it by a difference of 2, to retain their world cricket ranking (in Tests) as number 1.
Sadly, at the end of the third Test, the Indian cricket team has lost the series 3-0, along with the world number 1 ranking and some serious respect.
What is a Test Match? It is a game of cricket played over 5 days.
Why was Ajay so confident India would win? Because India won the one day world championship in April 2011. The one day format is a match of cricket completed in a single day.
There lies the problem. From an analytic point of view, I had been lulled into thinking that past performance was an indicator of future performance, indeed the basis of most analytical assumptions. Quite critically, I managed to overlook the following cricketing points-
1) Cricket performance is different from credit performance. It is the people and their fitness.
India’s strike bowler Zaheer Khan was out due to injury, and we did not have any adequate replacement for him. India’s best opener Virender Sehwag was out due to a shoulder injury in the first two Tests.
Moral – Statistics can be misleading if you do not apply recent knowledge coupled with domain expertise (in this case cricket)
2) What goes up must come down. Indeed if a team has performed its best two months back, it is a good sign that cyclicality will ensure performance will go down.
Moral- Do not depend on regression or time series while ignoring cyclical trends.
3) India’s cricket team is aging. England’s cricket team is youthful.
I should have gotten this one right. One of the big and understated reasons that the Indian economy is booming is that we have the youngest population in the world, with a median age of 28.
or as per http://en.wikipedia.org/wiki/Demographics_of_India
India has more than 50% of its population below the age of 25 and more than 65% hovers below the age of 35. It is expected that, in 2020, the average age of an Indian will be 29 years, compared to 37 for China and 48 for Japan; and, by 2030, India’s dependency ratio should be just over 0.4
India’s population is 1.21 billion people, so potentially a much larger pool of athletes, once we put away our laptops that is.
http://en.wikipedia.org/wiki/Demographics_of_UK
the total population of the United Kingdom was 58,789,194 (I don't have numbers for average age)
Paradoxically, India has the oldest cricket team in the world. This calls for detailed investigation, and some old timers should give way to newcomers after this drubbing.
Moral- Demographics matters. It is the people who vary more than any variable.
4) The Indian cricket team has played much less Test cricket and much more 20:20 and one day matches. 20:20 is a format in which only twenty overs are bowled per side. In Test Matches 90 overs are bowled every day for 5 days.
Stamina is critical in sports.
Moral- Context is important in extrapolating forecasts.
Everything said and done- the English cricket team played hard and fair and deserve to be number ones. I would love to say more on the Indian cricket team, but I now intend to watch Manchester United play soccer.
Credit Downgrade of USA and Triple A Whining
As a person trained, deployed and often asked to comment on macroeconomic shenanigans, I have the following observations to make on the downgrade of US debt by S&P-
1) Credit rating is both a mathematical exercise of debt versus net worth as well as intention to repay. Given the recent deadlock in United States legislature on debt ceiling, it is natural and correct to assume that holding US debt is slightly more risky in 2011 as compared to 2001. That means if the US debt was AAA in 2001 it sure is slightly more risky in 2011.
2) Politicians are criticized the world over in democracies including India, UK and US. This is natural, healthy and enforced by checks and balances in the constitution of each country. At the time of writing this, there are protests in India on corruption, in UK on economic disparities, in US on debt vs tax vs spending, and in Israel on inflation. It is the maturity of the media as well as the average educational level of the citizenry that amplifies and inflames or dampens sentiment regarding policy and business.
3) Conspicuous consumption has failed both at an environmental and economic level. Cheap debt to buy things you do not need may have made good macro economic sense as long as the things were made by people locally but that is no longer the case. Outsourcing is not all evil, but it sure is not a perfect solution to economics and competitiveness. Outsourcing is good or outsourcing is bad- well it depends.
4) In 1944, the US took on debt to fight Nazism, build atomic power and generally wage a lot of war and fund lots of dual use inventions. In 2004-2010 the US took on debt to fight wars in Iraq and Afghanistan and to bail out banks and automobile companies. Some erosion in the values represented by a free democracy has taken place, much to the delight of authoritarian regimes (who have managed to survive Google and Facebook).
5) A Double A rating is still quite a good rating. No one is moving out of US Treasuries. I mean, seriously, what are your alternative financial resources to park your government or central bank assets: euro, gold, oil, rare earth futures, metals or yen??
6) Income disparity as a trigger for social unrest in UK, France and other parts is an ominous looming threat that may lead to more action than the poor maths of S&P. It has been some time since riots occurred in the United States, and I believe in time series and cycles, especially given the rising Gini coefficients.
Gini indices for the United States at various times, according to the US Census Bureau:
- 1929: 45.0 (estimated)
- 1947: 37.6 (estimated)
- 1967: 39.7 (first year reported)
- 1968: 38.6 (lowest index reported)
- 1970: 39.4
- 1980: 40.3
- 1990: 42.8
- (Recalculations made in 1992 added a significant upward shift for later values)
- 2000: 46.2
- 2005: 46.9
- 2006: 47.0 (highest index reported)
- 2007: 46.3
- 2008: 46.69
- 2009: 46.8
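Since I keep going on about time series, here is a minimal sketch in R that plots the figures listed above as a line chart. The numbers are copied from the list as quoted; the data frame name `gini` and the dashed marker at 1992 (for the recalculation noted in the list) are my own choices for illustration, not anything from the Census Bureau.

```r
# Gini indices copied from the list above (as quoted from the US Census Bureau)
gini <- data.frame(
  year  = c(1929, 1947, 1967, 1968, 1970, 1980, 1990,
            2000, 2005, 2006, 2007, 2008, 2009),
  index = c(45.0, 37.6, 39.7, 38.6, 39.4, 40.3, 42.8,
            46.2, 46.9, 47.0, 46.3, 46.69, 46.8)
)

# The years are unevenly spaced, so plot against the actual year
# instead of treating the rows as equally spaced observations
plot(gini$year, gini$index, type = "o", pch = 20,
     xlab = "Year", ylab = "Gini index",
     main = "US Gini index, 1929-2009")

# Dashed line at the 1992 recalculation: values before and after
# are not strictly comparable
abline(v = 1992, lty = 2, col = "grey")

# Year with the highest value in the list
peak_year <- gini$year[which.max(gini$index)]
```

With only thirteen unevenly spaced points, this is a picture of the trend and no more; it is far too little data to fit a cycle to.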
7) Again, I am slightly suspicious of an American corporation downgrading American government debt when it failed to reconcile its numbers by 2 trillion and famously managed to avoid downgrading Lehman Brothers. What are the political affiliations of the S&P board? What are their backgrounds? Check the facts, Watson.
The Chinese government should be concerned if it is holding >1000 tonnes of gold and >1 trillion of US Treasuries, lest we have a third opium war (as either gold or US Treasuries will burst). Opium in 1850, like US Treasuries in 2010, has no inherent value except for those addicted to it.
8) Ron Paul and Paul Krugman are the two extremes of economic ideology in the US.
Reminds me of the old saying- Robbing Peter to pay Paul. Both the Pauls seem equally unhappy and biased.
I have to read both WSJ and NYT to make sense of what actually is happening in the US as opinionated journalism has managed to elbow out fact based journalism. Do we need analytics in journalism education/ reporting?
9) Panic buying and selling would lead to short term arbitrage positions. People like Warren Buffett made more money in the crash of 2008 than people did in the boom years of 2006-7.
If stocks are cheap, buy on the dips. Acquire companies before they go for IPOs. Go buy your own stock if you are sitting on a pile of cash. Buy some technology patents in cloud, mobile, tablet and statistical computing if you have a lot of cash and need to buy some long term assets.
10) Follow all advice above at own risk and no liability to this author 😉
Why open source companies don't dance?
I have been pondering on this seemingly logical paradox for some time now-
1) Why are open source solutions considered technically better but not customer friendly?
2) Why do startups and app creators in social media or mobile get much more press coverage than profitable startups in enterprise software?
3) How does tech journalism differ in covering open source projects in enterprise versus retail software?
4) What are the hidden rules of the game of enterprise software?
Some observations-
1) Open source companies often focus much more on technical community management and crowd sourcing code. Traditional software companies focus much more on managing the marketing community of customers and influencers. Accordingly the balance of power is skewed in favor of techies and R and D in open source companies, and in favor of marketing and analyst relations in traditional software companies.
Traditional companies also spend much more on hiring top notch press release/public relations agencies, while open source companies are both financially and sometimes ideologically opposed to older methods of marketing software. The flip side of this is that you are much more likely to see videos and tutorials by an open source company than by a traditional company. You can compare the websites of Cloudera, DataStax, Hadapt, Appistry and MapR and contrast them with Teradata or Oracle (which has a much bigger and very different marketing strategy).
Social media for marketing is also more efficiently utilized by smaller companies (open source) while bigger companies continue to pay influential analysts for expensive white papers that help present the brand.
Lack of budgets is a major factor that limits access to influential marketing for open source companies particularly in enterprise software.
2 and 3) Retail software is priced at $2-100 and sells by volume. Accordingly technology coverage of this software is based on volume.
Enterprise software is much more expensively priced and has much more discrete volume or sales points. Accordingly the technology coverage of enterprise software is more discrete, in terms of a white paper coming every quarter, a webinar every month and a press release every week. Retail software is covered non stop, but these journalists typically do not charge for “briefings”.
Journalists covering retail software generally earn money by ads or hosting conferences. So they have an interest in covering new stuff or interesting disruptive stuff. Journalists or analysts covering enterprise software generally earn money by white papers, webinars, attending rather than hosting conferences, and writing books. They thus have a much stronger economic incentive to cover the existing landscape and technologies than smaller startups.
4) What are the hidden rules of the game of enterprise software?
- It is mostly a white man’s world. This can be proved by statistical demographic analysis.
- There is incestuous intermingling between influencers, marketers, and PR people. This can be proved by simple social network analysis of who talks to whom and how much. A simple time series of sponsorship against analyst coverage will also prove this (I am working on quantifying this).
- There are much larger switching costs to enterprise software than retail software. This leads to shoddy legacy software getting many more chances than would have been allowed in an efficient marketplace.
- Enterprise software is a less efficient marketplace than retail software in all definitions of the term “efficient markets”
- Cloud computing, SaaS and open source threaten to disrupt the jobs and careers of a large number of people. In the long term, they will create many more jobs, but in the short term, people used to the comfortable living of enterprise software (making, selling, or writing) will actively and passively resist these changes to the paradigms in the current software status quo.
- Open source companies don't dance and don't play ball. They prefer to hire 4 more college grads than commission 2 more white papers.
and the following with slight changes from a comment I made on a fellow blog-
- While the paradigm on how to create new software has evolved from primarily silo-driven R and D departments to a broader collaborative effort, the biggest drawback is software marketing has not evolved.
- If you want your own version of the open source community editions to be more popular, some standardization is necessary for the corporate decision makers, and we need better marketing paradigms.
- While code creation is crowdsourced, solution implementation cannot be crowdsourced. Customers want solutions to a problem not code.
- Just as open source as a production and licensing paradigm threatens to disrupt enterprise software, it will lead to newer ways of marketing software, given the hostility of the existing status quo.
Lovely forecasting blog
I really loved this simple, smart and yet elegant explanation of forecasting. Even a high school quarterback could understand it, and maybe get an internship job building and running and re-running code for a Mars shot.
Despite my plea that you remain svelte in real life, I implore you to be naïve in business forecasting – and use a naïve forecasting model early and often. A naïve forecasting model is the most important model you will ever use in business forecasting.
and now the killer line
Purists may argue that the only true naïve forecast is the “no-change” forecast, meaning either a random walk (forecast = last known actual) or a seasonal random walk (e.g. forecast = actual from corresponding period last year). These are referred to as NF1 and NF2 in the Makridakis text (where NF = Naïve Forecast). In our 2006 SAS webseries Finding Flaws in Forecasting, an attendee asked “What about using a simple time series forecast with no intervention as the naïve forecast?” Is that allowed?
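To make NF1 and NF2 concrete, here is a minimal sketch in R of the two “no-change” forecasts described in the quote. The function names `nf1` and `nf2` are my own labels borrowed from the quote's terminology, and the built-in `AirPassengers` series is only a convenient monthly example; neither comes from the SAS post.

```r
# Monthly example series that ships with R (an assumption for illustration)
y <- AirPassengers

# NF1: random walk, forecast = last known actual
nf1 <- function(y, h) {
  rep(as.numeric(tail(y, 1)), h)
}

# NF2: seasonal random walk, forecast = actual from the
# corresponding period last year
nf2 <- function(y, h) {
  m <- frequency(y)                    # periods per year (12 for monthly data)
  last_cycle <- as.numeric(tail(y, m)) # most recent full seasonal cycle
  rep(last_cycle, length.out = h)      # repeat the cycle h periods ahead
}

# Three-month-ahead forecasts from each naive model
f1 <- nf1(y, 3)
f2 <- nf2(y, 3)
```

The point of running these first is to have a baseline: any fancier model should beat `f1` and `f2` on held-out data before it earns its keep.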
I did write a blog article on forecasting some time back, but back then I was a little blogger, with the website name being http://iwannacrib.com
Great work in helping make forecasting easier to understand for people who have flower shops and don't have a bee to help them with the forecasts, nor a geeky email list, nor $4000.
Make it easier for the little guy to forecast his sales, so he cuts down on his supply chain inventory, lowering his carbon footprint.
Blogs.sas.com take a bow, on Labour Day, for helping workers with easy to understand models.
http://blogs.sas.com/forecasting/index.php?/archives/68-Which-Naive-Model-to-Use.html
Broad Guidelines for Graphs
Here are some broad guidelines for graphs from EIA.gov, so you can say these are the official graphical guidelines of the US government.
They can be really useful for sites planning to get into the Tableau Software/NYT /Guardian Infographic mode- or even for communities of blogs that have recurrent needs to display graphical plots- particularly since communication, statistical and design specialists are different areas/expertise/people.
Energy Information Administration Standard
http://www.eia.gov/about/eia_standards.cfm#Standard25
Energy Information Administration Standard 2009-25
Title: Statistical Graphs
Superseded Version: Standard 2002-25
Purpose: To ensure the utility (usefulness to intended users) and objectivity (accuracy, clarity, completeness, and lack of bias) of energy information presented in statistical graphs.
Applicability: All EIA information products.
Required Actions:
- Graphs should be used to show and compare changes, trends and/or relationships, and to assist users in visualizing the conclusions drawn from the data represented.
- A graph should contain sufficient Continue reading “Broad Guidelines for Graphs”
Top Ten Business Analytics Graphs-Line Charts (2/10)
Variations on the line graph include fan charts in time series, which join a line chart of historic data with ranges of future projections. Another common variation is to fit a linear regression or trend line between the two variables and superimpose it on the graph.
The slope of the line chart shows the rate of change at that particular point, and can also be used to highlight areas of discontinuity or irregular change between two variables.
A basic line graph is created by first using the plot() function to plot the points and then the lines() function to draw the lines between the points.
> str(cars)
'data.frame': 50 obs. of 2 variables:
$ speed: num 4 4 7 7 8 9 10 10 10 11 ...
$ dist : num 2 10 4 22 16 10 18 26 34 17 ...
> plot(cars)
> lines(cars, type="o", pch=20, lty=2, col="green")
> title(main="Example Automobiles", col.main="blue", font.main=2)
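The trend line variation mentioned earlier can be sketched on the same `cars` data. This is one simple way to do it with base R's `lm()` and `abline()`; the object name `fit` and the colours are arbitrary choices of mine.

```r
# Scatter of stopping distance against speed from the built-in cars data
plot(cars, main = "Example Automobiles with trend line")

# Fit a simple linear regression of dist on speed
fit <- lm(dist ~ speed, data = cars)

# Superimpose the fitted regression line on the existing plot
abline(fit, col = "red", lwd = 2)

# The speed coefficient is the slope: estimated change in dist
# per unit increase in speed
slope <- coef(fit)["speed"]
```

`abline()` accepts the fitted model directly and draws its intercept and slope, which saves computing predicted values by hand.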
An example of Time Series Forecasting graph or fan chart is http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=51
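For readers who want to try a fan chart without leaving base R, here is a rough sketch. It is not the code behind the gallery graph linked above: the series (`AirPassengers`), the seasonal ARIMA specification and the band widths are all assumptions picked only to get a plausible looking fan.

```r
# Log of the built-in monthly airline passenger series (illustrative choice)
y <- log(AirPassengers)

# A seasonal ARIMA model; the orders here are an assumption, not a recommendation
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast 24 months ahead; predict() returns point forecasts ($pred)
# and their standard errors ($se)
fc <- predict(fit, n.ahead = 24)

# Historic data as a line, with room on the axes for the forecast fan
plot(y, xlim = c(1949, 1963),
     ylim = range(y, fc$pred + 2 * fc$se),
     ylab = "log(passengers)", main = "Fan chart sketch")

# Overlay nested bands from widest to narrowest, so the overlap
# makes the centre of the fan darker
t_fc <- as.numeric(time(fc$pred))
for (z in c(1.96, 1.28, 0.67)) {
  polygon(c(t_fc, rev(t_fc)),
          c(fc$pred + z * fc$se, rev(fc$pred - z * fc$se)),
          col = adjustcolor("blue", alpha.f = 0.2), border = NA)
}

# Point forecast through the middle of the fan
lines(fc$pred, col = "blue")
```

The bands widen with the horizon because the forecast standard error grows the further ahead you look, which is exactly what a fan chart is meant to show.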
Related Articles
- Top Ten Graphs for Business Analytics -Pie Charts (1/10) (decisionstats.com)


