News on R Commercial Development -Rattle- R Data Mining Tool

R RANT- While the European R Core leadership, led by the Great Dane Peter Dalgaard, focuses on the small picture and virtually hands the whole commercial side to Prof Nie and David Smith at Revolution Computing, other smaller package developers have refused to be treated as cheap R and D developers for enterprise software. How are the book sales coming along, Prof Peter? Any plans to write another R book, or are you done with writing your version of Mathematica (Ref-Newton)? Running the R Core project team must be so hard that I recommend the Tarantino movie “Inglorious B…” for Herr Doktors. -END

I believe that individual R package creators like Prof Harrell (Hmisc) or Hadley Wickham (plyr) deserve a share of the royalties or REVENUE that Revolution Computing, or ANY software company that uses R, earns from it.

On this note- some updated news on Rattle, the data mining tool created by Dr Graham Williams. Once again R development is being taken ahead by the Down Under chaps while the Big Guys thrash out the road map across the Pond.

Data Mining Resources

Citation –http://datamining.togaware.com/

Rattle is a free and open source data mining toolkit written in the statistical language R using the Gnome graphical interface. It runs under GNU/Linux, Macintosh OS X, and MS/Windows. Rattle is being used in business, government, research and for teaching data mining in Australia and internationally. Rattle can be purchased on DVD (or made available as a downloadable CD image) as a standalone installation for $450USD ($560AUD).

The free and open source book, The Data Mining Desktop Survival Guide (ISBN 0-9757109-2-3) simply explains the otherwise complex algorithms and concepts of data mining, with examples to illustrate each algorithm using the statistical language R. The book is being written by Dr Graham Williams, based on his 20 years of research and consulting experience in machine learning and data mining. An electronic PDF version is available for a small fee from Togaware ($40AUD/$35USD to cover costs and ongoing development).

Other Resources

  • The Data Mining Software Repository makes available a collection of free (as in libre) open source software tools for data mining
  • The Data Mining Catalogue lists many of the free and commercial data mining tools that are available on the market.
  • The Australasian Data Mining Conferences are supported by Togaware, which also hosts the web site.
  • Information about the Pacific Asia Knowledge Discovery and Data Mining series of conferences is also available.
  • A Data Mining course is taught at the Australian National University.
  • See also the Canberra Analytics Practise Group.
  • A Data Mining Course was held at the Harbin Institute of Technology Shenzhen Graduate School, China, 6 December – 13 December 2006. This course introduced the basic concepts and algorithms of data mining from an applications point of view and introduced the use of R and Rattle for data mining in practise.
  • A Data Mining Workshop was held over two days at the University of Canberra, 27-28 November, 2006. This course introduced the basic concepts and algorithms for data mining and the use of R and Rattle.

Using R for Data Mining

The open source statistical programming language R (based on S) is in daily use in academia and in business and government. We use R for data mining within the Australian Taxation Office. Rattle is used by those wishing to interact with R through a GUI.

R is memory based so that on 32bit CPUs you are limited to smaller datasets (perhaps 50,000 up to 100,000, depending on what you are doing). Deploying R on 64bit multiple CPU (AMD64) servers running GNU/Linux with 32GB of main memory provides a powerful platform for data mining.
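As a rough, back-of-the-envelope illustration of why memory is the constraint here, the sketch below estimates the footprint of an all-numeric dataset held in memory. It assumes 8 bytes per value (R stores numeric vectors as doubles). The column count and the number of working copies are hypothetical values picked for illustration, not figures from Togaware.

```python
def estimate_numeric_bytes(rows, cols, bytes_per_value=8, copies=1):
    """Approximate bytes needed to hold `copies` in-memory copies
    of a rows x cols table of numeric values."""
    return rows * cols * bytes_per_value * copies


def to_megabytes(n_bytes):
    """Convert bytes to decimal megabytes."""
    return n_bytes / 1_000_000


# Hypothetical dataset: 100,000 rows and 100 numeric columns.
one_copy = estimate_numeric_bytes(100_000, 100)
# Assumption: a modelling run keeps about five copies alive at once.
five_copies = estimate_numeric_bytes(100_000, 100, copies=5)

print(f"one copy:    {to_megabytes(one_copy):.0f} MB")
print(f"five copies: {to_megabytes(five_copies):.0f} MB")
```

A 32-bit process can address at most 4 GiB, and usually less in practice, so a few hundred megabytes of working copies per model already uses a noticeable share of what is available.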

R is open source, thus providing assurance that there will always be the opportunity to fix and tune things that suit our specific needs, rather than rely on having to convince a vendor to fix or tune their product to suit our needs.

Also, by being open source, we can be sure that the code will always be available, unlike some of the data mining products that have disappeared (e.g., IBM’s Intelligent Miner).

See earlier interview-

https://decisionstats.wordpress.com/2009/01/13/interview-dr-graham-williams/

Holiday Fun: Analyzing Facebook Privacy for Ads

So you got a Facebook ID, ticked it in a hurry AND added in your work info. Bad choice. Even small advertisers like me (with 225 fans for Decisionstats) can see aggregate numbers of work info BEFORE even advertising.
This can lead to hilarious results-

See Screenshots below- AND note the numbers

1) 400 US females > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Women

2) 2940 US females or males > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Women

3) 480 US females > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Men AND are married

4) 440 US males > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Men

5) 40 US males > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Men AND are married


Interested in males/females while giving out your work info AND your marital status. I hope these are, ahem, False Positives, but seriously, do you think these are violations of privacy or not?

PS- I decided not to advertise after seeing the, err, statistics.
PPS- This is meant to showcase lax ad-related privacy for professionals rather than any individual preference or judgment.

Born in the USA?

Here is some econometric search-ing I did, using Google Public Data, Wolfram Alpha and the Bureau of Labor Statistics.

United States

United States – Monthly Data

Data Series                       May 2009    June 2009   July 2009   Aug 2009    Sept 2009   Oct 2009
Unemployment Rate (1)             9.4         9.5         9.4         9.7         9.8         10.2
Change in Payroll Employment (2)  -303        -463        -304        -154        (P) -219    (P) -190
Average Hourly Earnings (3)       18.53       18.54       18.59       18.66       (P) 18.67   (P) 18.72
Consumer Price Index (4)          0.1         0.7         0.0         0.4         0.2         0.3
Producer Price Index (5)          0.2         1.7         (P) -1.0    (P) 1.7     (P) -0.6    (P) 0.3
U.S. Import Price Index (6)       1.7         2.7         (R) -0.6    (R) 1.5     (R) 0.2     (R) 0.7

Footnotes
(1) In percent, seasonally adjusted. Annual averages are available for Not Seasonally Adjusted data.
(2) Number of jobs, in thousands, seasonally adjusted.
(3) For production and nonsupervisory workers on private nonfarm payrolls, seasonally adjusted.
(4) All items, U.S. city average, all urban consumers, 1982-84=100, 1-month percent change, seasonally adjusted.
(5) Finished goods, 1982=100, 1-month percent change, seasonally adjusted.
(6) All imports, 1-month percent change, not seasonally adjusted.
(R) Revised
(P) Preliminary

United States – Quarterly Data

Data Series                 3rd Qtr 2008  4th Qtr 2008  1st Qtr 2009  2nd Qtr 2009  3rd Qtr 2009
Employment Cost Index (1)   0.6           0.6           0.3           0.4           0.4
Productivity (2)            -0.1          0.8           0.3           6.9           9.5

Footnotes
(1) Compensation, all civilian workers, quarterly data, 3-month percent change, seasonally adjusted.
(2) Output per hour, nonfarm business, quarterly data, percent change from previous quarter at annual rate, seasonally adjusted.

Also included are the average salaries of teachers and the average hourly wages of some offshoring-prone occupations.

http://www.bls.gov/oes/2008/may/oes_nat.htm#b25-0000

http://www.bls.gov/oes/2008/may/oes_nat.htm#b11-0000

and

http://www.google.com/publicdata?ds=usunemployment&met=unemployment_rate&idim=state:ST370000:ST540000:ST510000&tdim=true

WHAT THEY PAY TEACHERS (MAY 2008)

Education, Training, and Library Occupations: Wage Estimates

Code     Occupation Title                               Employment (1)  Median Hourly  Mean Hourly  Mean Annual (2)  Mean RSE (3)
25-0000  Education, Training, and Library Occupations   8,451,250       $21.26         $23.30       $48,460          0.5 %
25-1011  Business Teachers, Postsecondary               69,690          (4)            (4)          $77,340          1.0 %
25-1021  Computer Science Teachers, Postsecondary       32,520          (4)            (4)          $74,050          1.0 %
25-1022  Mathematical Science Teachers, Postsecondary   45,710          (4)            (4)          $68,130          0.9 %
25-1031  Architecture Teachers, Postsecondary           6,430           (4)            (4)          $75,450          1.9 %
25-1032  Engineering Teachers, Postsecondary            32,070          (4)            (4)          $90,070          1.1 %
25-1041  Agricultural Sciences Teachers, Postsecondary  10,000          (4)            (4)          $77,770          1.6 %
25-1042  Biological Science Teachers, Postsecondary     51,930          (4)            (4)          $83,270          2.7 %

WHAT THEY PAY THEMSELVES

Management Occupations: Wage Estimates

Code     Occupation Title                 Employment (1)  Median Hourly  Mean Hourly  Mean Annual (2)  Mean RSE (3)
11-0000  Management Occupations           6,152,650       $42.15         $48.23       $100,310         0.2 %
11-1011  Chief Executives                 301,930         $76.23         $77.13       $160,440         0.5 %
11-1021  General and Operations Managers  1,697,690       $44.02         $51.91       $107,970         0.2 %
11-1031  Legislators                      64,650          (4)            (4)          $37,980          1.1 %

and JOBS PRONE TO SHORTAGE /OFFSHORING

Computer and Mathematical Science Occupations: Wage Estimates

Code     Occupation Title                                  Employment (1)  Median Hourly  Mean Hourly  Mean Annual (2)  Mean RSE (3)
15-0000  Computer and Mathematical Science Occupations     3,308,260       $34.26         $35.82       $74,500          0.3 %
15-1011  Computer and Information Scientists, Research     26,610          $47.10         $48.51       $100,900         1.1 %
15-1021  Computer Programmers                              394,230         $33.47         $35.32       $73,470          0.6 %
15-1031  Computer Software Engineers, Applications         494,160         $41.07         $42.26       $87,900          0.4 %
15-1032  Computer Software Engineers, Systems Software     381,830         $44.44         $45.44       $94,520          0.5 %
15-1041  Computer Support Specialists                      545,520         $20.89         $22.29       $46,370          0.3 %
15-1051  Computer Systems Analysts                         489,890         $36.30         $37.90       $78,830          0.4 %
15-1061  Database Administrators                           115,770         $33.53         $35.05       $72,900          0.8 %
15-1071  Network and Computer Systems Administrators       327,850         $31.88         $33.45       $69,570          0.3 %
15-1081  Network Systems and Data Communications Analysts  230,410         $34.18         $35.50       $73,830          0.4 %
15-1099  Computer Specialists, All Other                   191,780         $36.13         $36.54       $76,000          0.5 %
15-2011  Actuaries                                         18,220          $40.77         $46.14       $95,980          1.4 %
15-2021  Mathematicians                                    2,770           $45.75         $45.65       $94,960          1.7 %
15-2031  Operations Research Analysts                      60,860          $33.17         $35.68       $74,220          0.8 %
15-2041  Statisticians                                     20,680          $34.91         $35.96       $74,790          1.5 %
15-2091  Mathematical Technicians                          1,100           $18.46         $20.24       $42,100          2.7 %
15-2099  Mathematical Science Occupations, All Other       6,600           $26.44         $31.55       $65,630          4.3 %
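
To put the three group-level rows side by side, here is a small sketch that compares mean annual wages across the groups, using only the figures quoted in the tables. The 2,080-hour work year (40 hours times 52 weeks) used to cross-check hourly against annual wages is an assumption on my part, since footnote (2) is not reproduced here.

```python
# Group-level rows copied from the May 2008 wage tables quoted above:
# name -> (mean hourly wage, mean annual wage), in US dollars.
groups = {
    "Education, Training, and Library": (23.30, 48_460),
    "Management": (48.23, 100_310),
    "Computer and Mathematical Science": (35.82, 74_500),
}

# Assumption: a full-time year of 40 hours x 52 weeks.
HOURS_PER_YEAR = 2080

baseline_annual = groups["Education, Training, and Library"][1]

# Ratio of each group's mean annual wage to the education group's.
ratios = {name: annual / baseline_annual for name, (_, annual) in groups.items()}

# Gap between the annual wage implied by the hourly figure and the quoted one.
implied_gap = {
    name: hourly * HOURS_PER_YEAR - annual
    for name, (hourly, annual) in groups.items()
}

for name in groups:
    print(f"{name}: {ratios[name]:.2f}x education pay, "
          f"hourly-implied gap ${implied_gap[name]:.0f}")
```

On these numbers the management group's mean annual wage is roughly twice the education group's, and the computer and mathematical group's is roughly one and a half times.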

 

UNEMPLOYED IN THE USA (above)

BY STATE (below)

16 million people out of work. Give or take a million.

How can America pay 5.6 million people UNEMPLOYMENT BENEFITS

Keep another 10 million unemployed,

another 10 million only partially employed.


and still claim aggregate cost savings from offshoring jobs?

Analytics and BI for small biz

I saw a story on Warren B and Goldman S creating a $500 million pool for small business owners.

  • The program will contribute $200 million to community colleges, universities and other institutions to provide small-business owners with practical business education.

  • Goldman Sachs repaid the $10 billion it was given last year under the taxpayer-funded Troubled Asset Relief Program, plus dividends. The firm continues to benefit from federal guarantees on about $21 billion of long-term debt.

  • Buffett, known as the “Oracle of Omaha” for his investing prowess, is the second-richest American. Berkshire, which invests in companies ranging from retailers to insurers, paid $5 billion in September 2008 to acquire preferred stock in Goldman Sachs that pays a 10 percent dividend. Berkshire, based in Omaha, Nebraska, also gained five-year warrants to buy $5 billion of common stock at $115 per share.

  • (NOTE: The current price of GS shares is $172; that's a 50% profit on $5 billion, roughly $2.5 billion, for Mr Buffett, but he is probably waiting for long-term capital gains tax rates to kick in before encashing his patriotic “Buy American. I am” warrants; see the NYT op-ed by him at http://www.nytimes.com/2008/10/17/opinion/17buffett.html )
  • A better analysis of the above Bloomberg story was given on Bloomberg itself at http://www.bloomberg.com/apps/news?pid=20601039&sid=asjp51YPDwJU
  • A small thought- could smaller businesses gain from the efficiencies of programs like SPSS, SAS and R? Or would they be better off with customized GUIs linked to their POS data?
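
The back-of-the-envelope sum on the Goldman warrants in the note above can be checked with a few lines. This is a minimal sketch using only the figures quoted in this post (a $115 strike, $5 billion of common stock, and a share price of $172); it ignores taxes, dilution and the time value of the warrants.

```python
strike_price = 115.0      # warrant exercise price per share, from the story
market_price = 172.0      # GS share price quoted in the note
notional = 5_000_000_000  # dollars of common stock the warrants can buy

# Number of shares purchasable at the strike price.
shares = notional / strike_price

# Paper profit if exercised and valued at the quoted market price.
paper_profit = shares * (market_price - strike_price)
return_pct = (market_price - strike_price) / strike_price * 100

print(f"shares:       {shares:,.0f}")
print(f"paper profit: ${paper_profit / 1e9:.2f} billion")
print(f"return:       {return_pct:.1f}%")
```

The result lands just under 50% and just under $2.5 billion, in line with the rounded figures in the note.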

Anyway, analytics for inventory management and sales planning could help small businesses. Joe the Plumber could do with some ETS and regression models as well.
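
For a flavour of what that could look like, here is a minimal sketch in plain Python. It uses simple exponential smoothing, the most basic member of the exponential smoothing family rather than a full ETS model, plus an ordinary least-squares trend line. The monthly sales figures and the smoothing weight are made up for illustration.

```python
def simple_exponential_smoothing(series, alpha=0.3):
    """Return the smoothed level after the last observation,
    which doubles as the one-step-ahead forecast."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def linear_trend(series):
    """Least-squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


# Hypothetical monthly sales (units sold) for a small shop.
monthly_sales = [120, 132, 128, 141, 150, 147, 158, 163]

ses_forecast = simple_exponential_smoothing(monthly_sales, alpha=0.3)
slope, intercept = linear_trend(monthly_sales)
trend_forecast = intercept + slope * len(monthly_sales)

print(f"smoothing forecast for next month: {ses_forecast:.1f}")
print(f"trend forecast for next month:     {trend_forecast:.1f}")
```

The smoothing forecast lags behind a rising series while the trend line extrapolates it, which is the kind of trade-off a shop owner would want a tool to surface when planning stock.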

However, apart from Salesforce.com applications, this field seems to be totally vacant for analytics. What are IBM SPSS, SAS, or even other stats packages doing for small businesses, or even to develop Salesforce.com applications for their own equivalent software?

The market could be an interesting one to at least do a test in. Unless you don’t believe in test and control.

See below the IBM Cognos app by IBM itself and the third-party app by Pervasive for SAP integration-

Citation-

http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016YGYEA2

and

http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016am1EAA