The Great Game: How Social Media Changes the Intelligence Industry

Since time immemorial, countries and corporations have used spies to shift the existing balance of power or market share. Technology has always been an integral part of that. From the pox-infested rugs given to natives, to the plague rats, to the smuggling of the secrets of silk and gunpowder from China to the West, to the latest research on cloud seeding by China and on melting glaciers by India, technology espionage has been integral to keeping up with each other.

For the first time in history, technology has evolved to the point where the tools for communicating securely and storing data are cheap: a small iPhone 3GS with applications for secure transmission is enough. On the analytical side, the need to separate signal from noise and the criticality of mapping chatter to events (like Major Hasan’s online activities) have created an opportunity for social media as well as a headache for the people involved. With citizen journalism, foreign relations offices and ambassadors with their bully pulpits have been reduced to defending against news leaked through Twitter (Iran), YouTube (Thailand/Burma/Tibet) and blogs (Russia/Georgia). The use of botnets and dark clouds to create disruptions, and to hack into accounts to amplify favourable noise and suppress unfavourable signals, has only increased. Blogs have the potential to influence customer behavior because they are seen as more credible than public relations, which is mostly public and rarely about relations.

Techniques like sentiment analysis, social network analysis, text mining and the correlation of keywords to triggers remain active research areas.
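To make the first and last of those techniques a little more concrete, here is a minimal Python sketch of lexicon-based sentiment scoring plus a simple count of watched keywords. The word lists, the sample posts and the function names are all made up for illustration; real systems use far larger lexicons or trained models, so treat this as a toy under those assumptions.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, invented for this illustration only.
POSITIVE = {"good", "great", "love", "credible", "win"}
NEGATIVE = {"bad", "hate", "leak", "hack", "fail"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive hits - negative hits) / total words, or 0.0 for empty text."""
    words = tokenize(text)
    if not words:
        return 0.0
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return (pos - neg) / len(words)


def keyword_counts(posts, keywords):
    """Count how often each watched keyword appears across a list of posts."""
    counts = Counter()
    for post in posts:
        for word in tokenize(post):
            if word in keywords:
                counts[word] += 1
    return counts


# Made-up sample posts.
posts = [
    "Great news, I love this blog",
    "Another leak and another hack, bad day",
]
scores = [sentiment_score(p) for p in posts]
watched = keyword_counts(posts, {"leak", "hack"})
```

Here the first post scores positive and the second negative, and `watched` tallies how often the trigger words show up. Mapping those tallies to real-world events is the hard, still-open part.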

The United States remains a leader, as you can only think creatively outside the box if you are permitted to behave outside the box. The remaining countries are torn between admiration, envy and plain old copycat techniques. The rising importance of communities that act more tribal than the hitherto loyal technology user lists is the reason almost all major corporates actively seek to cultivate social media communities. The market for blogs and Twitter in China, Iran or Russia will affect those governments’ efforts to manage their growth as per their national strategic interests. Just like the title of an old and quaint novel, the “Brave New World” of social media, and its convergence with the increasing amounts of text data generated on customers or citizens, is evolving to create new boundaries and space for itself. A fascinating Great Game in itself.

News on R Commercial Development -Rattle- R Data Mining Tool

R RANT: While the European R Core leadership, led by the Great Dane, Peter Dalgaard, focuses on the small picture and virtually hands the whole commercial side to Prof Nie and David Smith at Revo Computing, other smaller package developers have refused to be treated as cheap R and D developers for enterprise software. How are the book sales coming along, Prof Peter? Any plans to write another R book, or are you done with writing your version of Mathematica (Ref: Newton)? Running the R Core project team must be so hard. I recommend the Tarantino movie “Inglorious B…” for Herr Doktors. -END

I believe that individual R package creators like Prof Harrell (Hmisc) or Hadley Wickham (plyr) deserve a share of the royalties or REVENUE that Revolution Computing, or ANY software company that uses R, earns.

On this note, some updated news on Rattle, the data mining tool created by Dr Graham Williams. Once again, R development is taken ahead by the Down Under chaps while the Big Guys thrash out the road map across the Pond.

Data Mining Resources

Citation –http://datamining.togaware.com/

Rattle is a free and open source data mining toolkit written in the statistical language R using the Gnome graphical interface. It runs under GNU/Linux, Macintosh OS X, and MS/Windows. Rattle is being used in business, government, research and for teaching data mining in Australia and internationally. Rattle can be purchased on DVD (or made available as a downloadable CD image) as a standalone installation for $450USD ($560AUD).

The free and open source book, The Data Mining Desktop Survival Guide (ISBN 0-9757109-2-3) simply explains the otherwise complex algorithms and concepts of data mining, with examples to illustrate each algorithm using the statistical language R. The book is being written by Dr Graham Williams, based on his 20 years of research and consulting experience in machine learning and data mining. An electronic PDF version is available for a small fee from Togaware ($40AUD/$35USD to cover costs and ongoing development).

Other Resources

The Data Mining Software Repository makes available a collection of free (as in libre) open source software tools for data mining

The Data Mining Catalogue lists many of the free and commercial data mining tools that are available on the market.

The Australasian Data Mining Conferences are supported by Togaware, which also hosts the web site.

Information about the Pacific Asia Knowledge Discovery and Data Mining series of conferences is also available.

A Data Mining course is taught at the Australian National University.

See also the Canberra Analytics Practise Group.

A Data Mining Course was held at the Harbin Institute of Technology Shenzhen Graduate School, China, 6 December – 13 December 2006. This course introduced the basic concepts and algorithms of data mining from an applications point of view and introduced the use of R and Rattle for data mining in practice.

A Data Mining Workshop was held over two days at the University of Canberra, 27-28 November, 2006. This course introduced the basic concepts and algorithms for data mining and the use of R and Rattle.

Using R for Data Mining

The open source statistical programming language R (based on S) is in daily use in academia and in business and government. We use R for data mining within the Australian Taxation Office. Rattle is used by those wishing to interact with R through a GUI.

R is memory based, so on 32-bit CPUs you are limited to smaller datasets (perhaps 50,000 up to 100,000, depending on what you are doing). Deploying R on 64-bit multiple-CPU (AMD64) servers running GNU/Linux with 32GB of main memory provides a powerful platform for data mining.
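A rough back-of-the-envelope check shows why memory is the constraint. The sketch below assumes the figures above are row counts, that every column is numeric (R stores numeric values as 8-byte doubles), and that an analysis may hold a few copies of the data at once. The 100-column width and the copy multiplier of 3 are illustrative assumptions of mine, not figures from Togaware.

```python
BYTES_PER_DOUBLE = 8  # R stores numeric values as 8-byte doubles


def dataset_megabytes(rows, cols, bytes_per_value=BYTES_PER_DOUBLE):
    """Rough in-memory size of an all-numeric table, in MB (1 MB = 1024**2 bytes)."""
    return rows * cols * bytes_per_value / 1024**2


def working_megabytes(rows, cols, copies=3):
    """Size if the data is held `copies` times at once (`copies` is an assumed multiplier)."""
    return dataset_megabytes(rows, cols) * copies


# Assumed shape: 100,000 rows by 100 numeric columns.
raw_mb = dataset_megabytes(100_000, 100)
working_mb = working_megabytes(100_000, 100)
print(f"raw: {raw_mb:.1f} MB, with copies: {working_mb:.1f} MB")
```

Under these assumptions the raw table is well under 100 MB, but a few working copies quickly add up, which is consistent with the advice to move to 64-bit servers with plenty of main memory for anything larger.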

R is open source, thus providing assurance that there will always be the opportunity to fix and tune things that suit our specific needs, rather than rely on having to convince a vendor to fix or tune their product to suit our needs.

Also, by being open source, we can be sure that the code will always be available, unlike some of the data mining products that have disappeared (e.g., IBM’s Intelligent Miner).

See the earlier interview:

https://decisionstats.wordpress.com/2009/01/13/interview-dr-graham-williams/

Holiday Fun: Analyzing Facebook Privacy for Ads

So you got a Facebook ID, ticked the boxes in a hurry AND added your work info. Bad choice. Even small advertisers like me (with 225 fans for Decisionstats) can see aggregate numbers by work info BEFORE even advertising.
This can lead to hilarious results.

See the screenshots below AND note the numbers:

1) 400 US females > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Women

2) 2940 US females or males > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Women

3) 480 US females > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Men AND are married

4) 440 US males > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Men

5) 40 US males > age 18 work at IBM, SAP, Oracle or Microsoft AND are interested in Men AND are married

Interested in males/females, while giving out your work info AND your marital status. I hope these are, ahem, false positives, but seriously, do you think these are violations of privacy or not?

PS: I decided not to advertise after seeing the, err, statistics.
PPS: This is meant to showcase lax ad-related privacy for professionals rather than any individual preference or judgment.

PAWS goes to SF

Conference: message on the LinkedIn group of Decisionstats

Predictive Analytics World, Feb 16-17 in San Francisco

The agenda for Predictive Analytics World – Feb. 16-17 2010 in San Francisco – has been posted: http://www.pawcon.com/sanfrancisco/2010/agenda_overview.php

February’s PAW covers hot topics and advanced methods such as social data, uplift modeling (net lift), text mining, massively parallel analytics, in-cloud deployment, and innovative applications that benefit organizations in new and creative ways.

Be sure to register by December 18 for the Super Early Bird to save $400 off the Regular Price:
http://www.predictiveanalyticsworld.com/register.php

And take an additional $50 off the Super Early Bird with discount code: LIN150

Below is some more info – let me know if you have any questions.

-Eric Siegel, Conference Chair

———–

PAW-2010 includes 25 sessions across two tracks, so you can witness how predictive analytics is applied at 1-800-FLOWERS, Amazon.com, AT&T, BBC, Canadian Automobile Association, Charles Schwab, Continental Airlines, Deutsche Postbank, Google, Group RCI, IBM, PASSUR Aerospace, PayPal (eBay), Sun Microsystems, U.S. Army, Visa, Walmart Financial Services, and Younoodle, plus special examples from the U.S. government agencies CBP, NCMI, NGIC, NSA, and SSA.

Keynote speakers include Kim Larsen, Director Advanced Analytics at Charles Schwab, Andreas S. Weigend, Ph.D., Former Chief Scientist at Amazon.com, and Program Chair Eric Siegel, Ph.D., President of Prediction Impact and former Columbia University professor.

Predictive Analytics World is the business-focused event for predictive analytics professionals, managers and commercial practitioners, covering today’s commercial deployment of predictive analytics, across industries and across software vendors.

For more information, including three pre- and post-event workshops:
http://www.predictiveanalyticsworld.com

How to be a BAD blogger?

Here are some tips to being a BAD blogger. This assumes that –

you are intelligent enough to know what you speak of (NO-STUPID CLAUSE),
are otherwise an interesting person in your offline life,
have a good story to tell about yourself, your product or your company (NO-BORING CLAUSE),
can spell-check (mostly) (NOT-LAZY CLAUSE),
can create a free account on wordpress.com or have access to a website where you can post material (NOT-LAZY AND NO-STUPID CLAUSES),
AND otherwise have a desire to try and be a good blogger.

Step 1

Credibility

On the Internet everyone is an experienced expert in something.

Ways to wreck credibility-

Offer ads from Adsense before your blog traffic crosses an average of 100 and a maximum of 200 visitors a day (not views).
Take offers like free travel, books and software from people, products and companies, don’t disclose them, and pump them up with flattering reviews.
Scratch the back of a fellow blog monkey. Also known as: you praise me in your blog, I will praise you in mine, and we think we fooled everyone that we are just networking.
Use shock words and images to differentiate.
Offer ads from non-Adsense advertisers before your traffic crosses an average of 500 and a maximum of 1000 visitors a day (not views).
Have only ONE advertiser, offer PRIME placement to news of it AND IGNORE corporate rivals completely.
Claim to know people intimately whom you only know via Facebook Mafia Wars.
Offer stuff to a guest blogger and forget to follow up on the promise.
Spam people by email and tell them how you are spamming them to HELP them with NEW stuff.
Take money from sponsors, and free content from people. Call it aggregation and community. Pocket all the money.
Accept advertising from pornography. Claim you did not know what it was.
Give tips on hacking websites. What goes around will never come around, right?

That should wreck your credibility completely. To build up your credibility, do the reverse of the above.

Hard Work

Hard work never killed anyone, but try to blog on boring stuff. Or on politics, guns, gays and religion (preferably at the same time).

Post a stupid picture of yourself on the about page and tell yourself people don’t care about photos anyway.
Touch up your photo with Adobe Photoshop or post an image 10 years younger (or 10 pounds thinner).
Choose a bad theme. Like a violet background and yellow font.
Post images of your kids or your vacation in a professional blog AND/OR post images of your computer or conferences in a personal blog.
DO NOT SPELL CHECK.
Use HTML 4.0. Pretend that CSS is a hit TV show.
Pretend SEO, tags and categories are for others. DO NOT make it easy to search your blog.

WRITING

Coleridge was a drug addict. Poe was an alcoholic. Marlowe was killed by a man whom he was treacherously trying to stab. Pope took money to keep a woman’s name out of a satire then wrote a piece so that she could still be recognized anyhow. Chatterton killed himself. Byron was accused of incest. Do you still want to be a writer – and if so, why?

Bennett Cerf (from http://koti.mbnet.fi/pasenka/quotes/q-writ.htm#Writing%20is%20hell)

Write on politics and guns on a tech blog, or technology on a politics blog.
Write disjointed sentences in a hurry and claim it’s okay, people won’t notice anyway.
Write only in text without ANY images.
Write 5 posts a day. Or write once in 5 weeks.
Never explore VIDEO or AUDIO in your blog. Podcasts are for frozen peas.
Have an ego bigger than your talent. Write about it.
Be an expert in social media without crossing 1.5 years of blogging, 25,000 unique visitors or 100,000 views on the Internet. Twitter followers and LinkedIn connections don’t count. Facebook fans don’t count either.
Generally make an ass of yourself by not editing or proofreading your posts.

This should generally make sure that you become a BAD blogger, your blog traffic never crosses into two digits a day and you get back to work on your day job which you are probably good at.

If you do that, tell everyone blogs don’t matter in the 2010’s just as websites never mattered in the 1990’s, or Novels in the 1980’s, or TV in the 1950’s or Talking Pictures in the 1930’s.

Yup.

Born in the USA?

Here is some econometric search-ing I did using Google Public Data, Wolfram Alpha and the Bureau of Labor Statistics.

United States

United States – Monthly Data
Data Series	May 2009	June 2009	July 2009	Aug 2009	Sept 2009	Oct 2009
Unemployment Rate (1)	9.4	9.5	9.4	9.7	9.8	10.2
Change in Payroll Employment (2)	-303	-463	-304	-154	(P) -219	(P) -190
Average Hourly Earnings (3)	18.53	18.54	18.59	18.66	(P) 18.67	(P) 18.72
Consumer Price Index (4)	0.1	0.7	0.0	0.4	0.2	0.3
Producer Price Index (5)	0.2	1.7	(P) -1.0	(P) 1.7	(P) -0.6	(P) 0.3
U.S. Import Price Index (6)	1.7	2.7	(R) -0.6	(R) 1.5	(R) 0.2	(R) 0.7
Footnotes:
(1) In percent, seasonally adjusted. Annual averages are available for Not Seasonally Adjusted data.
(2) Number of jobs, in thousands, seasonally adjusted.
(3) For production and nonsupervisory workers on private nonfarm payrolls, seasonally adjusted.
(4) All items, U.S. city average, all urban consumers, 1982-84=100, 1-month percent change, seasonally adjusted.
(5) Finished goods, 1982=100, 1-month percent change, seasonally adjusted.
(6) All imports, 1-month percent change, not seasonally adjusted.
(R) Revised. (P) Preliminary.
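To see what the monthly numbers add up to, here is a small Python sketch that totals the payroll changes and the rise in the unemployment rate over the six months shown. The figures are copied from the table above (including the preliminary values); the variable names are my own.

```python
# Monthly figures copied from the BLS table above (May to Oct 2009).
months = ["May 2009", "June 2009", "July 2009", "Aug 2009", "Sept 2009", "Oct 2009"]
payroll_change_thousands = [-303, -463, -304, -154, -219, -190]  # Sept and Oct are preliminary
unemployment_rate = [9.4, 9.5, 9.4, 9.7, 9.8, 10.2]  # percent, seasonally adjusted

# Net payroll jobs lost over the six months (the table is in thousands of jobs).
total_jobs_lost = -sum(payroll_change_thousands) * 1000

# Change in the unemployment rate, in percentage points, rounded to one decimal.
rate_change_points = round(unemployment_rate[-1] - unemployment_rate[0], 1)

# Month with the largest single payroll drop.
worst_month = months[payroll_change_thousands.index(min(payroll_change_thousands))]

print(total_jobs_lost, rate_change_points, worst_month)
```

On these figures, payrolls shrank by about 1.6 million jobs in six months while the unemployment rate rose by 0.8 percentage points.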

United States – Quarterly Data
Data Series	3rd Qtr 2008	4th Qtr 2008	1st Qtr 2009	2nd Qtr 2009	3rd Qtr 2009
Employment Cost Index (1)	0.6	0.6	0.3	0.4	0.4
Productivity (2)	-0.1	0.8	0.3	6.9	9.5
Footnotes:
(1) Compensation, all civilian workers, quarterly data, 3-month percent change, seasonally adjusted.
(2) Output per hour, nonfarm business, quarterly data, percent change from previous quarter at annual rate, seasonally adjusted.

Also included are the average wages of teachers and the average hourly wages of some offshoring-prone occupations:

http://www.bls.gov/oes/2008/may/oes_nat.htm#b25-0000

http://www.bls.gov/oes/2008/may/oes_nat.htm#b11-0000

and

http://www.google.com/publicdata?ds=usunemployment&met=unemployment_rate&idim=state:ST370000:ST540000:ST510000&tdim=true

WHAT THEY PAY TEACHERS (MAY 2008)

Education, Training, and Library Occupations
			Wage Estimates
Occupation Code	Occupation Title	Employment (1)	Median Hourly	Mean Hourly	Mean Annual (2)	Mean RSE (3)
25-0000	Education, Training, and Library Occupations	8,451,250	$21.26	$23.30	$48,460	0.5 %
25-1011	Business Teachers, Postsecondary	69,690	(4)	(4)	$77,340	1.0 %
25-1021	Computer Science Teachers, Postsecondary	32,520	(4)	(4)	$74,050	1.0 %
25-1022	Mathematical Science Teachers, Postsecondary	45,710	(4)	(4)	$68,130	0.9 %
25-1031	Architecture Teachers, Postsecondary	6,430	(4)	(4)	$75,450	1.9 %
25-1032	Engineering Teachers, Postsecondary	32,070	(4)	(4)	$90,070	1.1 %
25-1041	Agricultural Sciences Teachers, Postsecondary	10,000	(4)	(4)	$77,770	1.6 %
25-1042	Biological Science Teachers, Postsecondary	51,930	(4)	(4)	$83,270	2.7 %

WHAT THEY PAY THEMSELVES

Management Occupations
			Wage Estimates
Occupation Code	Occupation Title	Employment (1)	Median Hourly	Mean Hourly	Mean Annual (2)	Mean RSE (3)
11-0000	Management Occupations	6,152,650	$42.15	$48.23	$100,310	0.2 %
11-1011	Chief Executives	301,930	$76.23	$77.13	$160,440	0.5 %
11-1021	General and Operations Managers	1,697,690	$44.02	$51.91	$107,970	0.2 %
11-1031	Legislators	64,650	(4)	(4)	$37,980	1.1 %

and JOBS PRONE TO SHORTAGE/OFFSHORING

Computer and Mathematical Science Occupations
			Wage Estimates
Occupation Code	Occupation Title	Employment (1)	Median Hourly	Mean Hourly	Mean Annual (2)	Mean RSE (3)
15-0000	Computer and Mathematical Science Occupations	3,308,260	$34.26	$35.82	$74,500	0.3 %
15-1011	Computer and Information Scientists, Research	26,610	$47.10	$48.51	$100,900	1.1 %
15-1021	Computer Programmers	394,230	$33.47	$35.32	$73,470	0.6 %
15-1031	Computer Software Engineers, Applications	494,160	$41.07	$42.26	$87,900	0.4 %
15-1032	Computer Software Engineers, Systems Software	381,830	$44.44	$45.44	$94,520	0.5 %
15-1041	Computer Support Specialists	545,520	$20.89	$22.29	$46,370	0.3 %
15-1051	Computer Systems Analysts	489,890	$36.30	$37.90	$78,830	0.4 %
15-1061	Database Administrators	115,770	$33.53	$35.05	$72,900	0.8 %
15-1071	Network and Computer Systems Administrators	327,850	$31.88	$33.45	$69,570	0.3 %
15-1081	Network Systems and Data Communications Analysts	230,410	$34.18	$35.50	$73,830	0.4 %
15-1099	Computer Specialists, All Other	191,780	$36.13	$36.54	$76,000	0.5 %
15-2011	Actuaries	18,220	$40.77	$46.14	$95,980	1.4 %
15-2021	Mathematicians	2,770	$45.75	$45.65	$94,960	1.7 %
15-2031	Operations Research Analysts	60,860	$33.17	$35.68	$74,220	0.8 %
15-2041	Statisticians	20,680	$34.91	$35.96	$74,790	1.5 %
15-2091	Mathematical Technicians	1,100	$18.46	$20.24	$42,100	2.7 %
15-2099	Mathematical Science Occupations, All Other	6,600	$26.44	$31.55	$65,630	4.3 %
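For a quick comparison across the three tables, the sketch below divides each group's mean annual wage by the mean for all Education, Training, and Library Occupations. The wage figures are copied from the tables above (May 2008); the choice of baseline and the names are my own.

```python
# Mean annual wages (USD, May 2008) copied from the three tables above.
mean_annual_wage = {
    "Education, Training, and Library Occupations": 48_460,
    "Management Occupations": 100_310,
    "Chief Executives": 160_440,
    "Legislators": 37_980,
    "Computer and Mathematical Science Occupations": 74_500,
}

# Baseline for the comparison: the all-education group.
baseline = mean_annual_wage["Education, Training, and Library Occupations"]

# Ratio of each group's mean annual wage to the education baseline.
ratio_to_education = {
    title: wage / baseline for title, wage in mean_annual_wage.items()
}

for title, ratio in sorted(ratio_to_education.items(), key=lambda kv: -kv[1]):
    print(f"{title}: {ratio:.2f}x")
```

By this measure chief executives average a little over three times the education group and management as a whole about twice. Note that the all-education average includes many non-postsecondary jobs, so the postsecondary teachers listed above sit well above that baseline.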

UNEMPLOYED IN THE USA (above)

BY STATE (below)

16 million people out of work. Give or take a million.

How can America pay 5.6 million people UNEMPLOYMENT BENEFITS,

keep another 10 million unemployed

and another 10 million only partially employed,

and still claim aggregate cost savings from offshoring jobs?
