The Great Game: How Social Media Changes the Intelligence Industry

Since time immemorial, countries and corporations have used spies to upset existing equilibria in the balance of power or in market share. Technology has always been an integral part of that. From the pox-infested rugs given to natives, to the plague rats, to the smuggling of the secrets of silk and gunpowder from China to the West, to the latest research on cloud seeding by China and on melting glaciers by India, technology espionage has been integral to keeping up with each other.

For the first time in history, technology has evolved to the point where the tools for communicating securely and storing data have become cheap: a small iPhone 3GS with applications for secure transmission is enough. On the analytical side, the need to separate signal from noise and to map chatter to events (like Major Hasan’s online activities) has created an opportunity for social media as well as a headache for the people involved. With citizen journalism, foreign relations offices and ambassadors with their bully pulpits have been reduced to defending against news leaked through Twitter (Iran), YouTube (Thailand/Burma/Tibet) and blogs (Russia/Georgia). The use of botnets and dark clouds to create disruptions, and to hack into accounts in order to amplify favourable noise and suppress unfavourable signals, has only increased. Blogs have the potential to influence customer behavior because they are seen as more credible than public relations, which is mostly public and rarely about relations.

Techniques like sentiment analysis, social network analysis, text mining and the correlation of keywords to triggers remain active research areas.
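To make two of these techniques concrete, here is a minimal Python sketch of lexicon-based sentiment scoring and keyword-to-trigger counting. The word lists, function names and the scoring formula are my own toy assumptions for illustration; real systems use far larger lexicons and statistical models.

```python
import re
from collections import Counter

# Hypothetical toy lexicons (assumption); real ones hold thousands of words.
POSITIVE = {"good", "great", "credible", "love"}
NEGATIVE = {"bad", "leak", "hack", "spam"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive hits - negative hits) / token count, 0.0 if empty."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    positive_hits = sum(counts[word] for word in POSITIVE)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    return (positive_hits - negative_hits) / len(tokens)


def trigger_hits(text, triggers):
    """Count how often each trigger keyword occurs in the text."""
    counts = Counter(tokenize(text))
    return {trigger: counts[trigger] for trigger in triggers}
```

A score above zero suggests favourable chatter and a score below zero unfavourable chatter; the trigger counts are the raw material you would then line up against dated events.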


The United States remains the leader, because you can only think creatively outside the box if you are permitted to behave outside the box. The remaining countries are torn between a mix of admiration, envy and plain old copycat techniques. The rising importance of communities that act more like tribes than like the loyal technology user lists of old is the reason almost all major corporations actively cultivate social media communities. The market for blogs and Twitter in China, Iran or Russia will affect those governments’ efforts to manage growth in line with their national strategic interests. Like the title of an old and quaint novel, the “Brave New World” of social media, and its convergence with the increasing amounts of text data generated on customers or citizens, is evolving to create new boundaries and space for itself. A fascinating Great Game in itself.

News on R Commercial Development: Rattle, the R Data Mining Tool

R RANT: While the European R Core leadership, led by the Great Dane Peter Dalgaard, focuses on the small picture and virtually hands the whole commercial side to Prof Nie and David Smith at Revolution Computing, other, smaller package developers have refused to be treated as cheap R&D developers for enterprise software. How are the book sales coming along, Prof Peter? Any plans to write another R book, or are you done with writing your version of Mathematica (Ref: Newton)? Running the R Core project team must be so hard that I recommend the Tarantino movie “Inglorious B…” for Herr Doktors. END

I believe that individual R package creators like Prof Harrell (Hmisc) or Hadley Wickham (plyr) deserve a share of the royalties or REVENUE that Revolution Computing, or ANY software company that uses R, earns.

On this note, some updated news on Rattle, the data mining tool created by Dr Graham Williams. Once again R development is taken ahead by the Down Under chaps while the Big Guys thrash out the road map across the Pond.

Data Mining Resources

Citation: http://datamining.togaware.com/

Rattle is a free and open source data mining toolkit written in the statistical language R using the Gnome graphical interface. It runs under GNU/Linux, Macintosh OS X, and MS/Windows. Rattle is being used in business, government, research and for teaching data mining in Australia and internationally. Rattle can be purchased on DVD (or made available as a downloadable CD image) as a standalone installation for $450USD ($560AUD).

The free and open source book, The Data Mining Desktop Survival Guide (ISBN 0-9757109-2-3), simply explains the otherwise complex algorithms and concepts of data mining, with examples to illustrate each algorithm using the statistical language R. The book is being written by Dr Graham Williams, based on his 20 years of research and consulting experience in machine learning and data mining. An electronic PDF version is available for a small fee from Togaware ($40AUD/$35USD to cover costs and ongoing development).

Other Resources

  • The Data Mining Software Repository makes available a collection of free (as in libre) open source software tools for data mining
  • The Data Mining Catalogue lists many of the free and commercial data mining tools that are available on the market.
  • The Australasian Data Mining Conferences are supported by Togaware, which also hosts the web site.
  • Information about the Pacific Asia Knowledge Discovery and Data Mining series of conferences is also available.
  • A Data Mining course is taught at the Australian National University.
  • See also the Canberra Analytics Practise Group.
  • A Data Mining Course was held at the Harbin Institute of Technology Shenzhen Graduate School, China, 6 December – 13 December 2006. This course introduced the basic concepts and algorithms of data mining from an applications point of view and introduced the use of R and Rattle for data mining in practise.
  • A Data Mining Workshop was held over two days at the University of Canberra, 27-28 November, 2006. This course introduced the basic concepts and algorithms for data mining and the use of R and Rattle.

Using R for Data Mining

The open source statistical programming language R (based on S) is in daily use in academia and in business and government. We use R for data mining within the Australian Taxation Office. Rattle is used by those wishing to interact with R through a GUI.

R is memory based, so on 32-bit CPUs you are limited to smaller datasets (perhaps 50,000 up to 100,000 rows, depending on what you are doing). Deploying R on 64-bit multiple-CPU (AMD64) servers running GNU/Linux with 32GB of main memory provides a powerful platform for data mining.
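A rough back-of-the-envelope check of that limit, sketched in Python. It relies on the fact that R stores numeric values as 8-byte doubles; reading the 50,000 to 100,000 figure as rows, the 100-column width and the number of working copies are my own assumptions, and the function name is hypothetical.

```python
BYTES_PER_DOUBLE = 8  # R stores numeric vectors as 8-byte doubles


def numeric_frame_mib(rows, cols, working_copies=1):
    """Approximate memory, in MiB, of a rows x cols all-numeric table.

    working_copies is an assumed multiplier for the temporary copies
    that modelling functions tend to make while they run.
    """
    return rows * cols * BYTES_PER_DOUBLE * working_copies / 2**20


# Assumed example: 100,000 rows by 100 numeric columns.
single_copy = numeric_frame_mib(100_000, 100)
three_copies = numeric_frame_mib(100_000, 100, working_copies=3)
print(f"one copy: {single_copy:.1f} MiB, three copies: {three_copies:.1f} MiB")
```

One copy is about 76 MiB, which sounds small, but a few working copies plus the model objects eat into the 2 to 4 GB address space of a 32-bit process quickly, which is why the 64-bit server matters.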

R is open source, thus providing assurance that there will always be the opportunity to fix and tune things that suit our specific needs, rather than rely on having to convince a vendor to fix or tune their product to suit our needs.

Also, by being open source, we can be sure that the code will always be available, unlike some of the data mining products that have disappeared (e.g., IBM’s Intelligent Miner).

See the earlier interview:

https://decisionstats.wordpress.com/2009/01/13/interview-dr-graham-williams/

Holiday Fun: Analyzing Facebook Privacy for Ads

So you got a Facebook ID, filled in the profile in a hurry AND added your work info. Bad choice. Even small advertisers like me (with 225 fans for Decisionstats) can see aggregate numbers by work info BEFORE even advertising.
This can lead to hilarious results.

See the screenshots below AND note the numbers:

1) 400 US females > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Women

2) 2940 US females or males > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Women

3) 480 US females > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Men AND are married

4) 440 US males > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Men

5) 40 US males > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Men AND are married


Interested in males/females, while giving out your work info AND your marital status. I hope these are, ahem, false positives, but seriously, do you think these are violations of privacy or not?

PS: I decided not to advertise after seeing the, err, statistics.
PPS: This is meant to showcase lax ad-related privacy for professionals rather than any individual preference or judgment.

PAWS goes to SF

Conference: message on the LinkedIn group of Decisionstats

 


Predictive Analytics World, Feb 16-17 in San Francisco

The agenda for Predictive Analytics World – Feb. 16-17 2010 in San Francisco – has been posted: http://www.pawcon.com/sanfrancisco/2010/agenda_overview.php

February’s PAW covers hot topics and advanced methods such as social data, uplift modeling (net lift), text mining, massively parallel analytics, in-cloud deployment, and innovative applications that benefit organizations in new and creative ways.

Be sure to register by December 18 for the Super Early Bird to save $400 off the Regular Price:
http://www.predictiveanalyticsworld.com/register.php

And take an additional $50 off the Super Early Bird with discount code: LIN150

Below is some more info – let me know if you have any questions.

-Eric Siegel, Conference Chair

———–

PAW-2010 includes 25 sessions across two tracks, so you can witness how predictive analytics is applied at 1-800-FLOWERS, Amazon.com, AT&T, BBC, Canadian Automobile Association, Charles Schwab, Continental Airlines, Deutsche Postbank, Google, Group RCI, IBM, PASSUR Aerospace, PayPal (eBay), Sun Microsystems, U.S. Army, Visa, Walmart Financial Services, and Younoodle, plus special examples from the U.S. government agencies CBP, NCMI, NGIC, NSA, and SSA.

Keynote speakers include Kim Larsen, Director Advanced Analytics at Charles Schwab, Andreas S. Weigend, Ph.D., Former Chief Scientist at Amazon.com, and Program Chair Eric Siegel, Ph.D., President of Prediction Impact and former Columbia University professor.

Predictive Analytics World is the business-focused event for predictive analytics professionals, managers and commercial practitioners, covering today’s commercial deployment of predictive analytics, across industries and across software vendors.

For more information, including three pre- and post-event workshops:
http://www.predictiveanalyticsworld.com

How to be a BAD blogger?

Here are some tips on being a BAD blogger. This assumes that:


  • you are intelligent enough to know what you speak of (NO-STUPID CLAUSE),
  • are otherwise an interesting person in your offline life,
  • have a good story to tell about yourself, your product or your company (NO-BORING CLAUSE),
  • can spell-check (mostly) (NOT-LAZY CLAUSE),
  • can create a free account on wordpress.com or have access to a website where you can post material (NOT-LAZY AND NO-STUPID CLAUSES),
  • AND otherwise have a desire to try and be a good blogger.

Step 1

Credibility

On the Internet everyone is an experienced expert in something.

Ways to wreck credibility:

  1. Offer ads from AdSense before your blog traffic crosses an average of 100 and a maximum of 200 visitors a day (not views).
  2. Take offers like free travel, books and software from people, products and companies, don’t disclose that, and pump them up with flattering reviews.
  3. Scratch the back of a fellow blog monkey. Also known as: you praise me in your blog, I will praise you in mine, and we think we fooled everyone that we are just networking.
  4. Use shock words and images to differentiate.
  5. Offer ads from non-AdSense advertisers before your traffic crosses an average of 500 and a maximum of 1000 visitors a day (not views).
  6. Have only ONE advertiser, offer PRIME placement to news of it AND IGNORE corporate rivals completely.
  7. Claim to know people intimately whom you only know via Facebook Mafia Wars.
  8. Offer stuff to a guest blogger and forget to follow up on the promise.
  9. Spam people by email and tell them how you are spamming them to HELP them with NEW stuff.
  10. Take money from sponsors and free content from people. Call it aggregation and community. Pocket all the money.
  11. Accept advertising from pornography. Claim you did not know what it was.
  12. Give tips on hacking websites. What goes around will never come around, right?

That should wreck your credibility completely. To build up your credibility, do the reverse of the above.

Hard Work

Hard work never killed anyone, but try to blog on boring stuff. Or on politics, guns, gays and religion (preferably at the same time).

  1. Post a stupid picture of yourself on the About page and tell yourself people don’t care about photos anyway.
  2. Touch up your photo with Adobe Photoshop, or post an image 10 years younger (or 10 pounds thinner).
  3. Choose a bad theme. Like a violet background and a yellow font.
  4. Post images of your kids or your vacation in a professional blog AND/OR post images of your computer or conferences in a personal blog.
  5. DO NOT SPELL CHECK.
  6. Use HTML 4.0. Pretend that CSS is a hit TV show.
  7. Pretend SEO, tags and categories are for others. DO NOT make it easy to search your blog.

WRITING

Coleridge was a drug addict. Poe was an alcoholic. Marlowe was killed by a man whom he was treacherously trying to stab. Pope took money to keep a woman’s name out of a satire then wrote a piece so that she could still be recognized anyhow. Chatterton killed himself. Byron was accused of incest. Do you still want to be a writer – and if so, why?

Bennett Cerf (from http://koti.mbnet.fi/pasenka/quotes/q-writ.htm#Writing%20is%20hell)

  1. Write on politics and guns on a tech blog, or on technology on a politics blog.
  2. Write disjointed sentences in a hurry and claim it’s okay, people won’t notice anyway.
  3. Write only in text without ANY images.
  4. Write 5 posts a day. Or write once in 5 weeks.
  5. Never explore VIDEO or AUDIO in your blog. Podcasts are for frozen peas.
  6. Have an ego bigger than your talent. Write about it.
  7. Be an expert in social media without crossing 1.5 years of blogging, or 25,000 unique visitors, or 100,000 views on the Internet. Twitter followers and LinkedIn connections don’t count. Facebook fans don’t count either.
  8. Generally make an ass of yourself by not editing or not proofreading your posts.

This should generally make sure that you become a BAD blogger, your blog traffic never crosses into two digits a day and you get back to work on your day job which you are probably good at.

If you do that, tell everyone blogs don’t matter in the 2010’s just as websites never mattered in the 1990’s, or Novels in the 1980’s, or TV in the 1950’s or Talking Pictures in the 1930’s.

Yup.

Born in the USA?

Here is some econometric searching I did, using Google Public Data, Wolfram Alpha and the Bureau of Labor Statistics.

United States

United States: Monthly Data

Data Series                          May 2009  June 2009  July 2009   Aug 2009  Sept 2009   Oct 2009
Unemployment Rate (1)                     9.4        9.5        9.4        9.7        9.8       10.2
Change in Payroll Employment (2)         -303       -463       -304       -154   -219 (P)   -190 (P)
Average Hourly Earnings (3)             18.53      18.54      18.59      18.66  18.67 (P)  18.72 (P)
Consumer Price Index (4)                  0.1        0.7        0.0        0.4        0.2        0.3
Producer Price Index (5)                  0.2        1.7   -1.0 (P)    1.7 (P)   -0.6 (P)    0.3 (P)
U.S. Import Price Index (6)               1.7        2.7   -0.6 (R)    1.5 (R)    0.2 (R)    0.7 (R)

Footnotes
(1) In percent, seasonally adjusted. Annual averages are available for Not Seasonally Adjusted data.
(2) Number of jobs, in thousands, seasonally adjusted.
(3) For production and nonsupervisory workers on private nonfarm payrolls, seasonally adjusted.
(4) All items, U.S. city average, all urban consumers, 1982-84=100, 1-month percent change, seasonally adjusted.
(5) Finished goods, 1982=100, 1-month percent change, seasonally adjusted.
(6) All imports, 1-month percent change, not seasonally adjusted.
(R) Revised
(P) Preliminary

United States: Quarterly Data

Data Series                         3rd Qtr 2008  4th Qtr 2008  1st Qtr 2009  2nd Qtr 2009  3rd Qtr 2009
Employment Cost Index (1)                    0.6           0.6           0.3           0.4           0.4
Productivity (2)                            -0.1           0.8           0.3           6.9           9.5

Footnotes
(1) Compensation, all civilian workers, quarterly data, 3-month percent change, seasonally adjusted.
(2) Output per hour, nonfarm business, quarterly data, percent change from previous quarter at annual rate, seasonally adjusted.
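
To put the monthly numbers in one place, here is a small Python sketch that adds up the payroll changes and the rise in the unemployment rate from May to October 2009. The figures are copied from the BLS monthly data above (September and October are preliminary); the variable names are mine.

```python
# Monthly BLS figures, May to October 2009, as quoted above.
months = ["May", "June", "July", "Aug", "Sept", "Oct"]
unemployment_rate = [9.4, 9.5, 9.4, 9.7, 9.8, 10.2]    # percent
payroll_change = [-303, -463, -304, -154, -219, -190]  # thousands of jobs

# Net payroll jobs lost over the six months, in thousands.
total_jobs_lost = -sum(payroll_change)

# Rise in the unemployment rate from May to October, in percentage points.
rate_change = round(unemployment_rate[-1] - unemployment_rate[0], 1)

# Month-over-month moves in the unemployment rate.
monthly_moves = [
    round(later - earlier, 1)
    for earlier, later in zip(unemployment_rate, unemployment_rate[1:])
]

print(f"Payroll jobs lost, {months[0]} to {months[-1]}: {total_jobs_lost} thousand")
print(f"Unemployment rate change: {rate_change} points")
```

That is roughly 1.6 million payroll jobs gone in six months, with the unemployment rate up 0.8 points over the same stretch.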

Also included are the average wages of teachers and the average hourly wages of some offshoring-prone occupations:

http://www.bls.gov/oes/2008/may/oes_nat.htm#b25-0000

http://www.bls.gov/oes/2008/may/oes_nat.htm#b11-0000

and

http://www.google.com/publicdata?ds=usunemployment&met=unemployment_rate&idim=state:ST370000:ST540000:ST510000&tdim=true

WHAT THEY PAY TEACHERS (MAY 2008)

Education, Training, and Library Occupations
Wage Estimates

Code     Occupation Title                                  Employment (1)  Median Hourly  Mean Hourly  Mean Annual (2)  Mean RSE (3)
25-0000  Education, Training, and Library Occupations           8,451,250         $21.26       $23.30          $48,460         0.5 %
25-1011  Business Teachers, Postsecondary                          69,690            (4)          (4)          $77,340         1.0 %
25-1021  Computer Science Teachers, Postsecondary                  32,520            (4)          (4)          $74,050         1.0 %
25-1022  Mathematical Science Teachers, Postsecondary              45,710            (4)          (4)          $68,130         0.9 %
25-1031  Architecture Teachers, Postsecondary                       6,430            (4)          (4)          $75,450         1.9 %
25-1032  Engineering Teachers, Postsecondary                       32,070            (4)          (4)          $90,070         1.1 %
25-1041  Agricultural Sciences Teachers, Postsecondary             10,000            (4)          (4)          $77,770         1.6 %
25-1042  Biological Science Teachers, Postsecondary                51,930            (4)          (4)          $83,270         2.7 %

WHAT THEY PAY THEMSELVES

Management Occupations
Wage Estimates

Code     Occupation Title                                  Employment (1)  Median Hourly  Mean Hourly  Mean Annual (2)  Mean RSE (3)
11-0000  Management Occupations                                 6,152,650         $42.15       $48.23         $100,310         0.2 %
11-1011  Chief Executives                                         301,930         $76.23       $77.13         $160,440         0.5 %
11-1021  General and Operations Managers                        1,697,690         $44.02       $51.91         $107,970         0.2 %
11-1031  Legislators                                               64,650            (4)          (4)          $37,980         1.1 %

and JOBS PRONE TO SHORTAGE/OFFSHORING

Computer and Mathematical Science Occupations
Wage Estimates

Code     Occupation Title                                  Employment (1)  Median Hourly  Mean Hourly  Mean Annual (2)  Mean RSE (3)
15-0000  Computer and Mathematical Science Occupations          3,308,260         $34.26       $35.82          $74,500         0.3 %
15-1011  Computer and Information Scientists, Research             26,610         $47.10       $48.51         $100,900         1.1 %
15-1021  Computer Programmers                                     394,230         $33.47       $35.32          $73,470         0.6 %
15-1031  Computer Software Engineers, Applications                494,160         $41.07       $42.26          $87,900         0.4 %
15-1032  Computer Software Engineers, Systems Software            381,830         $44.44       $45.44          $94,520         0.5 %
15-1041  Computer Support Specialists                             545,520         $20.89       $22.29          $46,370         0.3 %
15-1051  Computer Systems Analysts                                489,890         $36.30       $37.90          $78,830         0.4 %
15-1061  Database Administrators                                  115,770         $33.53       $35.05          $72,900         0.8 %
15-1071  Network and Computer Systems Administrators              327,850         $31.88       $33.45          $69,570         0.3 %
15-1081  Network Systems and Data Communications Analysts         230,410         $34.18       $35.50          $73,830         0.4 %
15-1099  Computer Specialists, All Other                          191,780         $36.13       $36.54          $76,000         0.5 %
15-2011  Actuaries                                                 18,220         $40.77       $46.14          $95,980         1.4 %
15-2021  Mathematicians                                             2,770         $45.75       $45.65          $94,960         1.7 %
15-2031  Operations Research Analysts                              60,860         $33.17       $35.68          $74,220         0.8 %
15-2041  Statisticians                                             20,680         $34.91       $35.96          $74,790         1.5 %
15-2091  Mathematical Technicians                                   1,100         $18.46       $20.24          $42,100         2.7 %
15-2099  Mathematical Science Occupations, All Other                6,600         $26.44       $31.55          $65,630         4.3 %
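
A quick Python sanity check on how the hourly and annual columns relate, and how the three groups compare. As far as I know, BLS derives the annual mean by multiplying the hourly mean by a full-time year of 2,080 hours and rounds the result; the 2,080 figure and the $10 tolerance are my assumptions here, since the table footnotes are not reproduced in this post. The numbers are the three major-group rows from the May 2008 tables above.

```python
FULL_TIME_HOURS = 2080  # assumed full-time year: 40 hours x 52 weeks

# Major-group rows from the May 2008 BLS tables quoted above.
mean_hourly = {
    "Education, Training, and Library": 23.30,
    "Management": 48.23,
    "Computer and Mathematical Science": 35.82,
}
mean_annual = {
    "Education, Training, and Library": 48_460,
    "Management": 100_310,
    "Computer and Mathematical Science": 74_500,
}


def implied_annual(hourly_wage):
    """Annual wage implied by an hourly wage over an assumed full-time year."""
    return hourly_wage * FULL_TIME_HOURS


# Gap between the implied and the published annual figure, per group.
gaps = {
    group: implied_annual(mean_hourly[group]) - mean_annual[group]
    for group in mean_hourly
}

# How many education-sector mean salaries fit into one management salary.
management_to_education = round(
    mean_annual["Management"] / mean_annual["Education, Training, and Library"], 2
)

for group, gap in gaps.items():
    print(f"{group}: implied minus published = ${gap:,.2f}")
print(f"Management / Education mean annual ratio: {management_to_education}")
```

The implied and published figures agree to within ten dollars, and the average manager earns a little over twice what the average education worker does.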

 

UNEMPLOYED IN THE USA (above)

BY STATE (below)

16 million people out of work. Give or take a million.

How can America pay 5.6 million people UNEMPLOYMENT BENEFITS

Keep another 10 million unemployed,

another 10 million only partially employed.


and still claim aggregate cost savings from offshoring jobs?