I would be very surprised if, out of 260,000 documents, not even one was a counter-intelligence disinformation move. Why was ALL the information stored in one place? Maybe Wikileaks will leak the launch codes of the missiles next.
One more data visualization for Tableau. R watchers can note how jjplot (by Facebook Analytics) and Tableau are replacing ggplot2 as visualization standards. (ggplot2 needs a better GUI, maybe using PyQt, than Deducer currently offers; maybe they can create ggplot2 extensions for Red R yet.)
And yes, stranger, stupider things have happened in diplomacy and intelligence (like India exploding the nuclear bomb on exactly the same date and place, surprising the CIA, though we are supposed to be on the same side at least for the next decade), but it would be wrong not to cross-reference the cables with some verification.
Tableau gives great data viz, though, but I don't think all 260,000 cables are valid data points (and boy, they must really be regretting creating the internet at DARPA and DoD; but you can always blame Al Gore for that).
Jill Dyché is a partner and co-founder of Baseline Consulting. She is responsible for key client strategies and market analysis in the areas of data governance, business intelligence, master data management, and customer relationship management.
Jill counsels boards of directors on the strategic importance of their information investments.
Jill is the author of three books on the business value of IT. Jill’s first book, e-Data (Addison Wesley, 2000) has been published in eight languages. She is a contributor to Impossible Data Warehouse Situations: Solutions from the Experts (Addison Wesley, 2002), and her book, The CRM Handbook (Addison Wesley, 2002), is the bestseller on the topic.
Complex Event Processing (CEP, not to be confused with Circular Error Probable) is defined as processing many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time.
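As a toy illustration of that filter-correlate-act loop, here is a minimal Python sketch (the event kinds, user field, and 60-second window are invented for the example, not taken from any particular CEP product):

```python
from collections import deque

WINDOW = 60  # seconds: how far apart correlated events may be

def detect(events):
    """Scan a time-ordered event stream and flag every 'error' event
    that follows a 'login' event from the same user within WINDOW."""
    recent_logins = deque()  # (timestamp, user) pairs still in window
    alerts = []
    for ts, kind, user in events:
        # drop logins that have aged out of the correlation window
        while recent_logins and ts - recent_logins[0][0] > WINDOW:
            recent_logins.popleft()
        if kind == "login":
            recent_logins.append((ts, user))
        elif kind == "error" and any(u == user for _, u in recent_logins):
            alerts.append((ts, user))  # meaningful event: act on it
    return alerts

stream = [(0, "login", "alice"), (30, "error", "alice"), (200, "error", "bob")]
print(detect(stream))  # [(30, 'alice')]
```

A real CEP engine does this declaratively, over millions of events per second, across many such patterns at once; the sketch just shows the core idea of correlating events inside a time window.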
Oracle CEP is a Java application server for the development and deployment of high-performance event-driven applications. It can detect patterns in the flow of events and message payloads, often based on filtering, correlation, and aggregation across event sources, and includes industry-leading temporal and ordering capabilities. It supports ultra-high throughput (1 million events/sec and up) and microsecond latency.
TIBCO is also trying to get into this market (it claims a 40% market share in the public CEP market 😉 ), though probably they have not counted the DoE and DoD as worthy of market share yet.
What it is: Methods 1 through 3 look at historical data and traditional architectures with information stored in the warehouse. In this environment, it often takes months of data cleansing and preparation to get the data ready to analyze. Now, what if you want to make a decision or determine the effect of an action in real time, as a sale is made, for instance, or at a specific step in the manufacturing process? With streaming data architectures, you can look at data in the present and make immediate decisions. The larger flood of data coming from smartphones, online transactions, and smart-grid houses will continue to increase the amount of data that you might want to analyze but not keep. Real-time streaming, complex event processing (CEP), and analytics will all come together here to let you decide on the fly which data is worth keeping and which data to analyze in real time and then discard.
When you use it: Radio-frequency identification (RFID) offers a good use case for this type of architecture. RFID tags provide a lot of information, but unless the state of the item changes, you don't need to keep warehousing the data about that object every day. You only keep data when it moves through the door and out of the warehouse.
The same concept applies to a customer who does the same thing over and over. You don’t need to keep storing data for analysis on a regular pattern, but if they change that pattern, you might want to start paying attention.
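The keep-only-on-change idea can be sketched in a few lines of Python (a toy illustration; the tag and location fields are made up):

```python
def changed_events(events):
    """Yield only the events whose state differs from the last seen
    state for that tag -- everything else is discarded on the fly."""
    last_state = {}  # tag id -> last observed location
    for event in events:
        tag, location = event["tag"], event["location"]
        if last_state.get(tag) != location:
            last_state[tag] = location
            yield event  # state changed: worth keeping

# Example: an RFID reader reports the same pallet over and over.
readings = [
    {"tag": "pallet-7", "location": "warehouse"},
    {"tag": "pallet-7", "location": "warehouse"},     # duplicate, dropped
    {"tag": "pallet-7", "location": "loading-dock"},  # kept: it moved
]
kept = list(changed_events(readings))
print(len(kept))  # 2 of the 3 readings survive
```

The same filter works for the repeat-behavior customer: store nothing while the pattern holds, and start paying attention the moment it breaks.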
Figure 4: Traditional architecture vs. streaming architecture
In academia there is something called the SASE language.
The query below retrieves the total trading volume of Google stock in the four-hour period after some bad news occurred.
PATTERN SEQ(News a, Stock+ b[])
WHERE [symbol] AND a.type = 'bad' AND b[i].symbol = 'GOOG'
WITHIN 4 hours
HAVING b[b.LEN].volume < 80% * b.volume
RETURN sum(b[].volume)
The next query reports a one-hour period in which the price of a stock increased from 10 to 20 and its trading volume stayed relatively stable.
PATTERN SEQ(Stock+ a)
WHERE [symbol] AND a.price = 10 AND a[i].price > a[i-1].price AND a[a.LEN].price = 20
WITHIN 1 hour
HAVING avg(a.volume) ≥ a.volume
RETURN a.symbol, a.price
The third query detects a more complex trend: in an hour, the volume of a stock started high, but after a period of price increasing or staying relatively stable, the volume plummeted.
PATTERN SEQ(Stock+ a, Stock b)
WHERE [symbol] AND a.volume > 1000 AND a[i].price > avg(a[…i-1].price) AND b.volume < 80% * a[a.LEN].volume
WITHIN 1 hour
RETURN a.symbol, a.(price,volume), b.(price,volume)
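For readers without a SASE engine, the intent of the first query can be approximated imperatively in Python (the field names mirror the query; the event dictionaries and timestamps are my own assumptions for illustration):

```python
FOUR_HOURS = 4 * 3600  # window length in seconds

def volume_after_bad_news(events):
    """Sum GOOG trading volume in the 4-hour window after a 'bad'
    news event -- an imperative sketch of the first SASE query."""
    bad_news_time = None
    total = 0
    for e in events:
        if e["kind"] == "news" and e["type"] == "bad":
            bad_news_time = e["time"]
            total = 0  # restart the aggregation at each bad-news event
        elif (e["kind"] == "stock" and e["symbol"] == "GOOG"
              and bad_news_time is not None
              and e["time"] - bad_news_time <= FOUR_HOURS):
            total += e["volume"]
    return total

events = [
    {"kind": "news", "type": "bad", "time": 0},
    {"kind": "stock", "symbol": "GOOG", "time": 600, "volume": 500},
    {"kind": "stock", "symbol": "GOOG", "time": 5 * 3600, "volume": 900},  # outside window
]
print(volume_after_bad_news(events))  # 500
```

What the declarative PATTERN/WHERE/WITHIN clauses buy you is that the engine, not the programmer, handles the windowing and sequencing logic, and it does so for many concurrent patterns over high-rate streams.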
(Note from Ajay: I was not really happy about the depth of resources on CEP available online; there seem to be missing bits and pieces in open source, academic, and corporate information alike. One reason for this is the obvious military dual use of this technology, like feeds from satellites, audio scans, etc.)
You can't DELETE a Facebook account: it gets deactivated, NOT DELETED.
You have to delete photo albums one by one, and for folders like Profile Photos, Wall Photos, or Mobile Uploads (you can't delete these folders themselves), you have to delete the photos one by one.
So I had to delete 1,100 friends, delete all the Facebook Pages I created, and then download the account (photos), which was now an easier-to-download zip file of 92 MB. And I deleted all the 250+ Likes I had given to things I had flippantly liked. It was horrifying, because if you accumulate all that info, it basically gives you a big lead in estimating my psychological profile, and that's not stuff I want to be used for selling.
Then I deactivated it. No, like Lord Voldemort's horcruxes, you can't delete it all.
And Facebook shows you ads even if you clean out your profile and your friends and it can no longer see any preference for any product.
Facebook treats data like prisoners: even if you are released, they WILL maintain your record.
Twenty years later, with that much info, they would be able to blackmail people in every country in the WORLD.
And LinkedIn is still getting deleted. I got this email from them:
basically, if you have an active group for which you are the only owner, you can't delete yourself; you have to delete the group or find another owner.
If it took me two days to download all my info and wipe my social media for just three years of using it (albeit at an expert enough level to act as a social media consultant to some companies), I am not sure what today's generation of young people, who jump to Twitter and Facebook at early ages, will face after, say, 5-10 years of data is collected on them. Lots of ads, I guess!
The YouTube Promoted Videos program (basically a video form of AdSense) can really help companies like Oracle, SAP, IBM, Netezza, SAS Institute, AsterData, Rapid Miner, Pentaho, JasperSoft, Teradata, and Revolution, who create either corporate/training videos or upload their seminar, webinar, and conference videos to YouTube.
Making a video is hard work in itself. Doing an A/B test with YouTube Promoted Videos might just get a better ROI for your video marketing budget, and IMHO embeddable videos from YouTube are much better and easier to share than videos that can be seen only after registration on a company website. Do you want to get the word out about your software, or do you just want website views?
Here is a brief one question interview with James Kobielus, Senior Analyst, Forrester.
Ajay: Describe the five most important events in predictive analytics you saw in 2010, and your top three trends for 2011.
Five most important developments in 2010:
Continued emergence of enterprise-grade Hadoop solutions as the core of the future cloud-based platforms for advanced analytics
Development of the market for analytic solution appliances that incorporate several key features for advanced analytics: massively parallel EDW appliance, in-database analytics and data management function processing, embedded statistical libraries, prebuilt logical domain models, and integrated modeling and mining tools
Integration of advanced analytics into core BI platforms with user-friendly, visual, wizard-driven tools for quick, exploratory predictive modeling, forecasting, and what-if analysis by nontechnical business users
Convergence of predictive analytics, data mining, content analytics, and CEP in integrated tools geared to real-time social media analytics
Emergence of CRM and other line-of-business applications that support continuously optimized “next-best action” business processes through embedding of predictive models, orchestration engines, business rules engines, and CEP agility
Three top trends I see in the coming year, above and beyond deepening and adoption of the above-bulleted developments:
All-in-memory, massively parallel analytic architectures will begin to gain a foothold in complex EDW environments in support of real-time elastic analytics
Further crystallization of a market for general-purpose “recommendation engines” that, operating inline to EDWs, CEP environments, and BPM platforms, enable “next-best action” approaches to emerge from today’s application siloes
Incorporation of social network analysis functionality into a wider range of front-office business processes to enable fine-tuned behavioral-based customer segmentation to drive CRM optimization
James serves Business Process & Applications professionals. He is a leading expert on data warehousing, predictive analytics, data mining, and complex event processing. In addition to his core coverage areas, James contributes to Forrester’s research in business intelligence, data integration, data quality, and master data management.
PREVIOUS WORK EXPERIENCE
James has a long history in IT research and consulting and has worked for both vendors and research firms. Most recently, he was at Current Analysis, an IT research firm, where he was a principal analyst covering topics ranging from data warehousing to data integration and the Semantic Web. Prior to that position, James was a senior technical systems analyst at Exostar (a hosted supply chain management and eBusiness hub for the aerospace and defense industry). In this capacity, James was responsible for identifying and specifying product/service requirements for federated identity, PKI, and other products. He also worked as an analyst for the Burton Group and was previously employed by LCC International, DynCorp, ADEENA, International Center for Information Technologies, and the North American Telecommunications Association. He is both well versed and experienced in product and market assessments. James is a widely published business/technology author and has spoken at many industry events.