C4ISTAR for Hacking and Cyber Conflict

As per http://en.wikipedia.org/wiki/C4ISTAR

C2I stands for command, control, and intelligence.

C3I stands for command, control, communications, and intelligence.

C4I stands for command, control, communications, computers, and (military) intelligence.

C4ISTAR is the British acronym used to represent the group of the military functions designated by C4 (command, control, communications, computers), I (military intelligence), and STAR (surveillance, target acquisition, and reconnaissance) in order to enable the coordination of operations

I increasingly believe that cyber conflict will develop its own terminology and theory and paradigms in due time. In the meantime, it will adopt paradigms from existing military literature and adapt it to the unique sub culture of cyber conflict for both offensive, defensive as well as pre-emptive actions. Here I am theorizing for a case of targeted hacking attacks rather than massive attacks that bring down a website for a few hours and achieve nothing but a few press headlines . I would also theorize on countering such attacks.

So what would be the C4ISTAR for –

1) Media company supporting SOPA/PIPA/Take down Mega Upload-

Command and Control refers to the ability of commanders to direct forces-

This will be the senior executives including the members of board, legal officers, and public relationship/marketing people. Their name is available from corporate websites, and social media scraping can ensure both a list of contact addresses (online) as well as biases for phishing /malware attacks. This could also include phone (flooding or voicemail hacking ) attacks , and attacks against the email server of the company rather than the corporate website.

Communications- This will include all online and social media channels including websites of the media company , but also  those of the press relations firms handling communications , phones,websites- anything which the target is likely to communicate externally (and if possible internal communication)

Timing is everything- coordinating attacks immediately is juevenile, but it might be more mature to attack on vulnerable days like product launches or just before a board of directors meeting


Most corporates have an in-house research team, they can be easily targeted using social media channels, but also offline research and digging deep. Targeting intelligence corps of the target corporate is likely to produce a much better disruption. Eventually they can be persuaded to stop working for that corporate.

Computers- Anything that runs on electricity and can be disabled – should be disabled. This might require much more creativity than just flooding.

 surveillance-  This can be both online as well as offline, and would be of electronic assets, likely responses for the attack, and the key people who are to be disrupted.

target acquisition-  at least ten people within each corporate can and should be ideally disrupted, rather than just the website. this would call for social media scraping, and prior planning. even email in-boxes can be disrupted (if all else fails)

and reconnaissance-

study your target companies, target employees, and their strategies.

Then segment and prioritize in a list of  matrix of 10  to 10, who is more vulnerable and who is more valuable to attack.

the C4ISTAR for -a hacker activist organization is much more complicated but forensics reveal that most hackers tend to leave a signature style (in terms of computers,operating systems,machine ids,communication, tools, or even port numbers used)

the best defense for a media rich company to prevent hacking attacks is to first identify its own C4ISTAR structure for its digital content strategy and then fortify as well as scrub vulnerabilities (including from online information regarding its own employees)

(to be continued)


The Hacker Attitude

Quantitative Modeling for Arbitrage Positions in Ad KeyWords Internet Marketing

Assume you treat an ad keyword as an equity stock. There are slight differences in the cost for advertising for that keyword across various locations (Zurich vs Delhi) and various channels (Facebook vs Google) . You get revenue if your website ranks naturally in organic search for the keyword, and you have to pay costs for getting traffic to your website for that keyword.
An arbitrage position is defined as a riskless profit when cost of keyword is less than revenue from keyword. We take examples of Adsense  and Adwords primarily.
There are primarily two types of economic curves on the foundation of which commerce of the  internet  resides-
1) Cost Curve- Cost of Advertising to drive traffic into the website  (Google Adwords, Twitter Ads, Facebook , LinkedIn ads)
2) Revenue Curve - Revenue from ads clicked by the incoming traffic on website (like Adsense, LinkAds, Banner Ads, Ad Sharing Programs , In Game Ads)
The cost and revenue curves are primarily dependent on two things
1) Type of KeyWord-Also subdependent on
a) Location of Prospective Customer, and
b) Net Present Value of Good and Service to be eventually purchased
For example , keyword for targeting sales of enterprise “business intelligence software” should ideally be costing say X times as much as keywords for “flower shop for birthdays” where X is the multiple of the expected payoffs from sales of business intelligence software divided by expected payoff from sales of flowers (say in Location, Daytona Beach ,Florida or Austin, Texas)
2) Traffic Volume – Also sub-dependent on Time Series and
a) Seasonality -Annual Shoppping Cycle
b) Cyclicality- Macro economic shifts in time series
The cost and revenue curves are not linear and ideally should be continuous in a definitive exponential or polynomial manner, but in actual reality they may have sharp inflections , due to location, time, as well as web traffic volume thresholds
Type of Keyword – For example ,keywords for targeting sales for Eminem Albums may shoot up in a non linear manner after the musician dies.
The third and not so publicly known component of both the cost and revenue curves is factoring in internet industry dynamics , including relative market share of internet advertising platforms, as well as percentage splits between content creator and ad providing platforms.
For example, based on internet advertising spend, people belive that the internet advertising is currently heading for a duo-poly with Google and Facebook are the top two players, while Microsoft/Skype/Yahoo and LinkedIn/Twitter offer niche options, but primarily depend on price setting from Google/Bing/Facebook.
It is difficut to quantify  the elasticity and efficiency of market curves as most literature and research on this is by in-house corporate teams , or advisors or mentors or consultants to the primary leaders in a kind of incesteous fraternal hold on public academic research on this.
It is recommended that-
1) a balance be found in the need for corporate secrecy to protest shareholder value /stakeholder value maximization versus the need for data liberation for innovation and grow the internet ad pie faster-
2) Cost and Revenue Curves between different keywords, time,location, service providers, be studied by quants for hedging inetrent ad inventory or /and choose arbitrage positions This kind of analysis is done for groups of stocks and commodities in the financial world, but as commerce grows on the internet this may need more specific and independent quants.
3) attention be made to how cost and revenue curves mature as per level of sophistication of underlying economy like Brazil, Russia, China, Korea, US, Sweden may be in different stages of internet ad market evolution.
For example-
A study in cost and revenue curves for certain keywords across domains across various ad providers across various locations from 2003-2008 can help academia and research (much more than top ten lists of popular terms like non quantitative reports) as well as ensure that current algorithmic wightings are not inadvertently given away.
Part 2- of this series will explore the ways to create third party re-sellers of keywords and measuring impacts of search and ad engine optimization based on keywords.

Topic Models

Some stuff on Topic Models-


In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. An early topic model was probabilistic latent semantic indexing (PLSI), created by Thomas Hofmann in 1999.[1] Latent Dirichlet allocation (LDA), perhaps the most common topic model currently in use, is a generalization of PLSI developed by David Blei, Andrew Ng, and Michael Jordan in 2002, allowing documents to have a mixture of topics.[2] Other topic models are generally extensions on LDA, such as Pachinko allocation, which improves on LDA by modeling correlations between topics in addition to the word correlations which constitute topics. Although topic models were first described and implemented in the context of natural language processing, they have applications in other fields such as bioinformatics.


In statistics, latent Dirichlet allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word’s creation is attributable to one of the document’s topics. LDA is an example of a topic model

David M Blei’s page on Topic Models-


The topic models mailing list is a good forum for discussing topic modeling.

In R,

Some resources I compiled on Slideshare based on the above-

Continue reading