The Gospel as per WikiLeaks


– First, Assume Nothing –

I would be very surprised if, out of 260,000 documents, not even one was a counter-intelligence disinformation move. Why was ALL the information stored in one place? Maybe WikiLeaks will leak missile launch codes next.

One more data visualization win for Tableau. R watchers can note how jjplot (from Facebook Analytics) and Tableau are replacing ggplot2 as visualization standards. (ggplot2 needs a better GUI, perhaps using PyQt rather than the current Deducer; maybe someone can yet create ggplot2 extensions for Red-R.)

And yes, stranger, stupider things have happened in diplomacy and intelligence (like India exploding the nuclear bomb at exactly the same date and place, surprising the CIA, though we are supposed to be on the same side at least for the next decade), but it would be wrong not to cross-reference the cables with some verification.

Tableau gives great data viz, though, but I don't think all 260,000 cables are valid data points (and boy, they must really be regretting creating the Internet at DARPA and DoD, but you can always blame Al Gore for that).

Jobs in Analytics

Here are some jobs from Vincent Granville, founder of AnalyticBridge. Please contact him directly; I just thought the Season of Joy should have better jobs than it currently does.

————————————————————————————–

Several job ads recently posted on DataShaping / AnalyticBridge, across the United States and in Europe. Use the DataShaping search box to find more opportunities.

Job ads are posted at:

 

Selected opportunities:

Quantitative Modeling Consultants – Agilex (Alexandria, VA)
Sr. Software Development Engineers – Agilex (Alexandria, VA)
Actuary – FBL Financial Group (Des Moines, IA)
Relevance scientist – Yandex Labs (Palo Alto, CA)
Research Engineer, Search Ranking – Chomp (San Francisco, CA)
Mathematical Modeling and Optimization – Exxon (Clinton, NJ)
Data Analyst – DISH Network (Englewood, CO)
Sr Aviation Planning Research & Data Analyst – Port of Seattle (Seattle, WA)
Statistician / Quantitative Analyst – Indeed (Austin, TX)
Statistician – Pratt & Whitney (East Hartford, CT)
Biostatistician – The J. David Gladstone Institutes (San Francisco, CA)
Customer Service Representative (Oklahoma, OK)
Program Associate – Cambridge Systematics (Washington D.C., DC)
Sr Risk Analyst – Paypal (Omaha, NE)
Sr. Actuarial Analyst – Farmers (Simi Valley, CA)
Senior Statistician, Data Services – Equifax (Alpharetta, GA)
Business Intelligence Analyst – Burberry (NYC, NY)
Fact Extraction – Amazon (Seattle, WA)
Senior Researcher – Bing (Bellevue, WA)
Senior Statistical Research Analyst – Walt Disney (Lake Buena Vista, FL)
Statistician – Capital One (Nottingham, NH)
Lead Data Analyst – Barclays (Northampton, UK)
Analytical Data Scientist – Aviagen (Huntsville, AL or Edinburgh, UK)
VP of Engineering for Analytics (Bay Area, CA)
Senior Software Engineer – Numenta (Redwood City, CA)
Numenta Internship Program – Numenta (Redwood City, CA)
Director of Analytics – Mozilla Corporation (Mountain View, CA)
Senior Sales Engineer – StatSoft (NY, NY)

Short Interview: Jill Dyché

Here is a brief one-question interview with Jill Dyché, founder of Baseline Consulting.

 

In 2010:

 

It was more about consciousness-raising in the executive suite: getting C-level managers to understand the ongoing value proposition of BI, why MDM isn’t their father’s database, and how data governance can pay for itself over time. Some companies succeeded with these consciousness-raising efforts. Some didn’t.

 

But three big ones in 2011 would be:

  1. Predictive analytics in the cloud. The technology is now ready, and so is the market—and that includes SMB companies.
  2. Enterprise search being baked into (commoditized) BI software tools. (The proliferation of static reports is SO 2006!)
  3. Data governance will begin paying dividends. Until now it was all about common policies for data. In 2011, it will be about ROI.

I do a “Predictions for the coming year” article every January for TDWI.

Note: Jill’s January TDWI article seems worth waiting for in this case.

About – http://www.baseline-consulting.com/pages/page.asp?page_id=49125

Partner and Co-Founder

Jill Dyché is a partner and co-founder of Baseline Consulting.  She is responsible for key client strategies and market analysis in the areas of data governance, business intelligence, master data management, and customer relationship management. 

Jill counsels boards of directors on the strategic importance of their information investments.

Author

Jill is the author of three books on the business value of IT. Jill’s first book, e-Data (Addison Wesley, 2000), has been published in eight languages. She is a contributor to Impossible Data Warehouse Situations: Solutions from the Experts (Addison Wesley, 2002), and her book The CRM Handbook (Addison Wesley, 2002) is the bestseller on the topic.

Jill’s work has been featured in major publications such as Computerworld, Information Week, CIO Magazine, the Wall Street Journal, the Chicago Tribune and Newsweek.com. Jill’s latest book, Customer Data Integration (John Wiley and Sons, 2006) was co-authored with Baseline partner Evan Levy, and shows the business breakthroughs achieved with integrated customer data.

Industry Expert

Jill is a featured speaker at industry conferences, university programs, and vendor events. She serves as a judge for several IT best practice awards. She is a member of the Society of Information Management and Women in Technology, a faculty member of TDWI, and serves as a co-chair for the MDM Insight conference. Jill is a columnist for DM Review, and a blogger for BeyeNETWORK and Baseline Consulting.

 

Complex Event Processing- SASE Language


Complex Event Processing (CEP, not to be confused with Circular Error Probability) is defined as processing the many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time.

Software supporting CEP includes:

Oracle http://www.oracle.com/us/technologies/soa/service-oriented-architecture-066455.html

Oracle CEP is a Java application server for the development and deployment of high-performance event-driven applications. It can detect patterns in the flow of events and message payloads, often based on filtering, correlation, and aggregation across event sources, and includes industry-leading temporal and ordering capabilities. It supports ultra-high throughput (1 million/sec++) and microsecond latency.
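To make the filtering-correlation-aggregation idea concrete, here is a minimal toy sketch in Python (my own illustration, not Oracle CEP's actual Java API; the class name and event fields are hypothetical). It correlates events by a key and aggregates them over a sliding time window:

from collections import defaultdict, deque
from time import time

class SlidingWindowCEP:
    """Toy CEP engine: correlates events by key and aggregates
    over a sliding time window. A sketch, not Oracle CEP."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> deque of (ts, value)

    def on_event(self, key, value, timestamp=None):
        ts = time() if timestamp is None else timestamp
        q = self.events[key]
        q.append((ts, value))
        # Evict events that have fallen out of the time window.
        while q and q[0][0] < ts - self.window:
            q.popleft()
        return self.aggregate(key)

    def aggregate(self, key):
        values = [v for _, v in self.events[key]]
        return {"count": len(values), "sum": sum(values)}

# Feed two trade events and watch the windowed aggregate grow.
engine = SlidingWindowCEP(window_seconds=4 * 3600)
print(engine.on_event("GOOG", 500, timestamp=0))     # {'count': 1, 'sum': 500}
print(engine.on_event("GOOG", 300, timestamp=1800))  # {'count': 2, 'sum': 800}

A real engine adds declarative pattern detection on top and runs at millions of events per second; the sketch only shows the windowing-and-aggregation half of the story.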

TIBCO is also trying to get into this market (it claims a 40% market share of the public CEP market 😉 though they probably have not counted the DoE and DoD as worthy of market share yet).

See the webcast by TIBCO’s head here: http://www.tibco.com/products/business-optimization/complex-event-processing/default.jsp

and product info here: http://www.tibco.com/products/business-optimization/complex-event-processing/businessevents/default.jsp

TIBCO is the undisputed leader in complex event processing (CEP) software with over 40 percent market share, according to a recent IDC Study.

A good explanation of how social media itself can be used as an analogy for CEP is given in this SAS Global Forum paper:

http://support.sas.com/resources/papers/proceedings10/040-2010.pdf

You can see a report on Predictive Analytics and Data Mining in Q1 2010, also from SAS’s website, at http://www.sas.com/news/analysts/forresterwave-predictive-analytics-dm-104388-0210.pdf

A very good explanation of the architecture involved is given by SAS CTO Keith Collins on SAS’s Knowledge Exchange site:

http://www.sas.com/knowledge-exchange/risk/four-ways-divide-conquer.html

What it is: Methods 1 through 3 look at historical data and traditional architectures with information stored in the warehouse. In this environment, it often takes months of data cleansing and preparation to get the data ready to analyze. Now, what if you want to make a decision or determine the effect of an action in real time, as a sale is made, for instance, or at a specific step in the manufacturing process? With streaming data architectures, you can look at data in the present and make immediate decisions. The larger flood of data coming from smart phones, online transactions and smart-grid houses will continue to increase the amount of data that you might want to analyze but not keep. Real-time streaming, complex event processing (CEP) and analytics will all come together here to let you decide on the fly which data is worth keeping and which data to analyze in real time and then discard.

When you use it: Radio-frequency identification (RFID) offers a good use case for this type of architecture. RFID tags provide a lot of information, but unless the state of the item changes, you don’t need to keep warehousing the data about that object every day. You only keep data when it moves through the door and out of the warehouse.

The same concept applies to a customer who does the same thing over and over. You don’t need to keep storing data for analysis on a regular pattern, but if they change that pattern, you might want to start paying attention.
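Here is a minimal sketch of that keep-only-on-change filter in Python (my own illustration; the event field names are hypothetical):

def state_change_filter(events):
    """Yield an event only when the tracked item's state differs
    from the last state seen; steady-state readings are discarded."""
    last_state = {}
    for event in events:
        item, state = event["item"], event["state"]
        if last_state.get(item) != state:
            last_state[item] = state
            yield event  # worth keeping: the state changed

readings = [
    {"item": "pallet-17", "state": "in_warehouse"},
    {"item": "pallet-17", "state": "in_warehouse"},  # discarded
    {"item": "pallet-17", "state": "shipped"},       # kept
]
for kept in state_change_filter(readings):
    print(kept)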

Figure 4: Traditional architecture vs. streaming architecture

 

In academia, there is something called the SASE language, which offers:

  • A rich declarative event language
  • Formal semantics of the event language
  • Theoretical underpinnings of CEP
  • An efficient automata-based implementation

http://sase.cs.umass.edu/

and

http://avid.cs.umass.edu/sase/index.php?page=navleft_1col

Financial Services

The query below retrieves the total trading volume of Google stock in the 4-hour period after some bad news occurred.

PATTERN SEQ(News a, Stock+ b[ ])
WHERE   [symbol]
    AND a.type = 'bad'
    AND b[i].symbol = 'GOOG'
WITHIN  4 hours
HAVING  b[b.LEN].volume < 80% * b[1].volume
RETURN  sum(b[ ].volume)
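For readers new to the syntax, here is a rough Python rendering of the same logic (my own sketch over in-memory lists of event dicts sorted by timestamp; the field names are my assumptions, and this is not how SASE's automata-based engine actually evaluates patterns):

def goog_volume_after_bad_news(news_events, stock_events, window=4 * 3600):
    """After each 'bad' news event, sum GOOG trading volume over the
    next 4 hours, but only if volume tailed off (last reading below
    80% of the first). Assumes stock_events is sorted by timestamp."""
    results = []
    for news in news_events:
        if news["type"] != "bad":
            continue
        run = [s for s in stock_events
               if s["symbol"] == "GOOG"
               and news["ts"] <= s["ts"] <= news["ts"] + window]
        if len(run) >= 2 and run[-1]["volume"] < 0.8 * run[0]["volume"]:
            results.append(sum(s["volume"] for s in run))
    return results

news = [{"type": "bad", "ts": 0}]
ticks = [{"symbol": "GOOG", "ts": 600, "volume": 1000},
         {"symbol": "GOOG", "ts": 7200, "volume": 700}]
print(goog_volume_after_bad_news(news, ticks))  # [1700]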

The next query reports a one-hour period in which the price of a stock increased from 10 to 20 and its trading volume stayed relatively stable.

PATTERN SEQ(Stock+ a[])
WHERE   [symbol]
    AND a[1].price = 10
    AND a[i].price > a[i-1].price
    AND a[a.LEN].price = 20
WITHIN  1 hour
HAVING  avg(a[].volume) ≥ a[1].volume
RETURN  a[1].symbol, a[].price

The third query detects a more complex trend: in an hour, the volume of a stock started high, but after a period of price increasing or staying relatively stable, the volume plummeted.

PATTERN SEQ(Stock+ a[], Stock b)
WHERE   [symbol]
    AND a[1].volume > 1000
    AND a[i].price > avg(a[…i-1].price)
    AND b.volume < 80% * a[a.LEN].volume
WITHIN  1 hour
RETURN  a[1].symbol, a[].(price, volume), b.(price, volume)

(Note from Ajay: I was not really happy about the depth of resources on CEP available online. There seem to be missing bits and pieces across open-source, academic, and corporate information alike; one reason for this is the obvious military dual use of this technology, like feeds from satellites, audio scans, etc.)

Why social media is a one-way street- you can't close accounts

Update to https://decisionstats.com/2010/11/24/deleting-twitter-facebooklinkedin-accepting-life/

You can’t DELETE a Facebook account; it gets deactivated, NOT DELETED.

You have to delete photo albums one by one, and for folders like Profile Photos, Wall Photos, or Mobile Uploads, you can’t delete the folder itself; you have to delete those photos one by one.

So I had to delete 1,100 friends, delete all the Facebook Pages I had created, and then download the account (the photos), which were now an easier-to-download zip file of 92 MB. And I deleted all the 250+ Likes I had given to things I had flippantly liked. It was horrifying, because if you accumulate all that info, it basically gives you a big lead in estimating my psychological profile, and that’s not stuff I want used for selling.

Then I deactivated it. No, like Lord Voldemort’s horcruxes, you can’t delete it all.

And Facebook shows you ads even after you clean out your profile and your friends and it can no longer see any preference for any product.

Facebook treats data like prisoners: even if you are released, they WILL maintain your record.

Twenty years later, they would be able to blackmail people in all the countries of the WORLD with that much info.

And LinkedIn is still getting deleted. I got this email from them:

Basically, if you have an active group for which you are the only owner, you can’t delete yourself; you have to delete the group or find another owner.

Sigh!

If it took me two days to download all my info and wipe my social media after just three years of using it (albeit at an expert enough level to act as a social media consultant to some companies), I am not sure what today’s generation of young people, who jump onto Twitter and Facebook at early ages, will face after, say, 5-10 years of data is collected on them. Lots of ads, I guess!

Increasing Views of YouTube Videos


YouTube Promoted Videos (basically a video form of AdSense) can really help companies like Oracle, SAP, IBM, Netezza, SAS Institute, AsterData, RapidMiner, Pentaho, JasperSoft, Teradata, and Revolution, which either create corporate/training videos or upload their seminar, webinar, and conference videos to YouTube.

Making a video is hard work in itself. Doing an A/B test with YouTube Promoted Videos might just get a better ROI for your video marketing budget, and IMHO embeddable videos from YouTube are much better and easier to share than videos that can be seen only after registration on a company website. Do you want to get the word out about your software, or do you want website views?
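If you do run such an A/B test, a standard two-proportion z-test is a quick way to check whether the promoted variant’s click-through rate really differs from the baseline; here is a minimal sketch in Python (the numbers are made up for illustration):

from math import sqrt
from statistics import NormalDist

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test: did variant B click through at a
    different rate than variant A?"""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 80 clicks from 4,000 organic views vs. 120 clicks
# from 4,000 promoted views.
z, p = two_proportion_z(80, 4000, 120, 4000)
print(round(z, 2), round(p, 4))  # a small p-value favors the promoted variant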

Brief Interview with James G. Kobielus

Here is a brief one-question interview with James Kobielus, Senior Analyst, Forrester.

Ajay: Describe the five most important events in Predictive Analytics you saw in 2010 and the top three trends in 2011, as per you.

Jim:

Five most important developments in 2010:

  • Continued emergence of enterprise-grade Hadoop solutions as the core of the future cloud-based platforms for advanced analytics
  • Development of the market for analytic solution appliances that incorporate several key features for advanced analytics: massively parallel EDW appliance, in-database analytics and data management function processing, embedded statistical libraries, prebuilt logical domain models, and integrated modeling and mining tools
  • Integration of advanced analytics into core BI platforms with user-friendly, visual, wizard-driven, tools for quick, exploratory predictive modeling, forecasting, and what-if analysis by nontechnical business users
  • Convergence of predictive analytics, data mining, content analytics, and CEP in integrated tools geared  to real-time social media analytics
  • Emergence of CRM and other line-of-business applications that support continuously optimized “next-best action” business processes through embedding of predictive models, orchestration engines, business rules engines, and CEP agility

Three top trends I see in the coming year, above and beyond deepening and adoption of the above-bulleted developments:

  • All-in-memory, massively parallel analytic architectures will begin to gain a foothold in complex EDW environments in support of real-time elastic analytics
  • Further crystallization of a market for general-purpose “recommendation engines” that, operating inline to EDWs, CEP environments, and BPM platforms, enable “next-best action” approaches to emerge from today’s application siloes
  • Incorporation of social network analysis functionality into a wider range of front-office business processes to enable fine-tuned behavioral-based customer segmentation to drive CRM optimization

About – http://www.forrester.com/rb/analyst/james_kobielus

James G. Kobielus
Senior Analyst, Forrester Research

RESEARCH FOCUS

James serves Business Process & Applications professionals. He is a leading expert on data warehousing, predictive analytics, data mining, and complex event processing. In addition to his core coverage areas, James contributes to Forrester’s research in business intelligence, data integration, data quality, and master data management.

PREVIOUS WORK EXPERIENCE

James has a long history in IT research and consulting and has worked for both vendors and research firms. Most recently, he was at Current Analysis, an IT research firm, where he was a principal analyst covering topics ranging from data warehousing to data integration and the Semantic Web. Prior to that position, James was a senior technical systems analyst at Exostar (a hosted supply chain management and eBusiness hub for the aerospace and defense industry). In this capacity, James was responsible for identifying and specifying product/service requirements for federated identity, PKI, and other products. He also worked as an analyst for the Burton Group and was previously employed by LCC International, DynCorp, ADEENA, International Center for Information Technologies, and the North American Telecommunications Association. He is both well-versed and experienced in product and market assessments. James is a widely published business/technology author and has spoken at many industry events.