Interview- Top Data Mining Blogger on Earth, Sandro Saitta

If you do a Google search for "data mining blog", one blog has come out on top for the past several years (data mining blog, Google Search: http://bit.ly/kEdPlE).

To honor 5 years of Sandro Saitta’s blog (yes, that’s 5 years!), we bring you an exclusive interview in which he reveals his unique sauce for cool techie blogging.

Ajay- Describe your journey as a scientist and data miner, from early experiences, to schooling to your work/research/blogging.

Sandro- My first experience with data mining was my master’s project. I used a decision tree to predict pollen concentration for the following week using input data such as wind, temperature and rain. The fact that an algorithm can make a computer learn from experience was really amazing to me. I found it so interesting that I started a PhD in data mining. This time, the field of application was civil engineering. Civil engineers put a lot of sensors on their structures in order to understand how they behave. With all these sensors they generate a lot of data. To interpret these data, I used data mining techniques such as feature selection and clustering. I started my blog, Data Mining Research, during my PhD, to share with other researchers.

I then started applying data mining to the stock market in my first job in industry. I realized the difference between image recognition, where a 99% correct classification rate is state of the art, and the stock market, where you’re happy with 55%. However, the company atmosphere was not as good as I had expected, so I moved to consulting. There, I applied data mining in behavioral targeting to increase click-through rates. When you compare the number of customers who click with the ones who don’t, then you really understand what class imbalance means. A few months ago, I accepted a very good opportunity at SICPA. I’m looking forward to tackling new challenges there.
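To make the class imbalance point concrete, here is a minimal Python sketch. It is not from the interview: the counts (5 clicks in 1,000 impressions) and the helper names `accuracy` and `recall` are assumptions chosen purely for illustration. It shows that a model which always predicts "no click" scores very high accuracy while finding none of the clickers.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the predictions recover."""
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == positive]
    if not positives:
        return 0.0
    return sum(p == positive for _, p in positives) / len(positives)


# Hypothetical data: 1,000 ad impressions, only 5 of which led to a click.
y_true = [1] * 5 + [0] * 995

# A "model" that ignores its input and always predicts "no click".
always_no_click = [0] * 1000

baseline_accuracy = accuracy(y_true, always_no_click)
baseline_recall = recall(y_true, always_no_click)

print(f"accuracy: {baseline_accuracy:.3f}")
print(f"recall on clickers: {baseline_recall:.3f}")
```

With labels this skewed, accuracy alone says little; a measure focused on the rare class, such as recall, is one common alternative.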

Ajay- Your blog is the top-ranked blog for “data mining blog”. Could you share some tips on better blogging for analytics and technical people?

Sandro- It’s always difficult to start a blog, since at the beginning you have no readers. Writing for nobody may seem stupid, but it is not. By writing my first posts during my PhD I was reorganizing my ideas. I was expressing concepts which were not always clear to me. I thus learned a lot and also improved my English. Of course, it’s still not perfect, but I hope most people can understand me.

Next come the readers. A few dozen each week at first. To increase this number, I started to learn SEO (Search Engine Optimization) by reading books and blogs. I tested many techniques that increased the visibility of Data Mining Research in the blogosphere. I think SEO is worthwhile once you already have some content published (which means not at the very beginning of your blog). After a while, once your blog is nicely ranked, the main task is to work on the content of the blog. To be of interest, your content must stand out: original, informative or provocative, for example. I was also lucky to gain good visibility thanks to well-known people in the field like Kevin Hillstrom, Gregory Piatetsky-Shapiro, Will Dwinnell / Dean Abbott, Vincent Granville, Matthew Hurst and many others.

Ajay- What’s your favorite statistical software, and which software packages have you worked with?
Could you compare and contrast them as well?

Sandro- My favorite software at this point is SAS. I worked with it for two years. Once you know the language, you can perform ETL and data mining very easily. It’s also very fast compared to others. There are a lot of tools for data mining, but I cannot think of a tool that is as powerful as SAS and, at the same time, has a high-level programming language behind it.

I also worked with R and Matlab. R is very nice since you have all the up-to-date data mining algorithms implemented. However, working in memory is not always a good choice, especially for ETL. Matlab is an excellent tool for prototyping. It’s not so fast and certainly not made for ETL, but the price is low considering all the possibilities for data mining. In my opinion, SAS is the best choice for ETL and a good choice for data mining. Of course, there is the price.

Ajay- What are your favorite techniques and training resources for teaching the basics of data mining to, say, statisticians or business management graduates?

Sandro- I’m the kind of guy who likes to read books. I read data mining books one after the other. The fact that the same concepts are explained differently (and by different people) helps a lot in learning a topic like data mining. Of course, nothing replaces experience in the field. You can read hundreds of books, but you will still not be a good practitioner until you really apply data mining in specific fields. My second choice after books is blogs. By reading data mining blogs, you will really see the issues and challenges in the field. It’s still not experience, but it gets closer. Finally, there are web resources and networks such as KDnuggets of course, but also AnalyticBridge and LinkedIn.

Ajay- Describe your hobbies and how they help you, if at all, in your professional life.

Sandro- One of my hobbies is reading. I read a lot of books about data mining, SEO, Google as well as Sci-Fi and Fantasy. I’m a big fan of Asimov by the way. My other hobby is playing tennis. I think I simply use my hobbies as a way to find equilibrium in my life. I always try to find the best balance between work, family, friends and sport.

Ajay- What are your plans for your website for 2011-2012?

Sandro- I will continue to publish guest posts and interviews. I think it is important to let other people express themselves about data mining topics. I will not write about my current applications due to the policies of my current employer. But don’t worry, I still have a lot to write, whether it is technical or not. I will also put more emphasis on my experience with data mining, advice for data miners, tips and tricks, and of course book reviews!

Standard Disclosure of Blogging- Sandro awarded me the People’s Choice award for his blog for 2010 and carried out my interview. There is a lot of love between our respective WordPress blogs, but to reassure our puritan American readers- it is platonic and intellectual.

About Sandro S-



Sandro Saitta is a Data Mining Research Engineer at SICPA Security Solutions. He is also a blogger at Data Mining Research (www.dataminingblog.com). His interests include data mining, machine learning, search engine optimization and website marketing.

You can contact Mr Saitta at his Twitter address- 

https://twitter.com/#!/dataminingblog

Who writes white papers?

There are four main types of commercial white papers:

  • Business benefits: Makes a business case for a certain technology or methodology.
  • Technical: Describes how a certain technology works.
  • Hybrid: Combines business benefits with technical details in a single document.
  • Policy: Makes a case for a certain political solution to a societal or economic challenge.
What is the best white paper you have ever read? (Name it in the comment field.)
Which category of white papers is the best?
Do you think white papers are too expensive, or do they give adequate ROI?
To be continued- including

  1. Demographic and social network analysis of analysts and white paper sponsors to measure interaction effects.
  2. White papers segmented by type of software company.
  3. PROC FREQ analysis and data visualization of word frequencies in white papers written by the same analysts for different companies on the same topics.
  4. Race and ethnic analysis of influencers and analysts in Business Analysts and Business Intelligence. Null hypothesis: it is not a white man’s world; women, Hispanics and other minorities are adequately represented.
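As a rough illustration of the word-frequency comparison planned in the list above, here is a minimal Python sketch that uses `collections.Counter` in place of SAS PROC FREQ. The two text snippets and the helper name `word_frequencies` are hypothetical stand-ins; a real analysis would load the full text of each white paper and would probably filter out common stop words.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word occurrences in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical excerpts from two papers by the same analyst on the same
# topic, written for two different sponsors.
paper_for_vendor_a = "Predictive analytics delivers value. Analytics drives revenue."
paper_for_vendor_b = "Predictive analytics reduces risk. Analytics reduces cost."

freq_a = word_frequencies(paper_for_vendor_a)
freq_b = word_frequencies(paper_for_vendor_b)

# Vocabulary both papers share, and words used only for vendor A.
shared = sorted(set(freq_a) & set(freq_b))
only_a = sorted(set(freq_a) - set(freq_b))

print("most common in A:", freq_a.most_common(3))
print("shared words:", shared)
print("only in A:", only_a)
```

Comparing the shared vocabulary with the sponsor-specific words is one simple way to see how much of the text is reused and how much is tailored.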
Why am I doing this?
I am writing a white paper on WHO writes a white paper.
Sponsorships are invited, but academics and startups in analytics may be preferred.

What is a White Paper?

As per Jimmy Wales and his merry band at Wiki (pedia not leaky-ah)- The emphasis is mine

What is the best white paper you have read in the past 15 years?

Categories are-

  • Business benefits: Makes a business case for a certain technology or methodology.
  • Technical: Describes how a certain technology works.
  • Hybrid: Combines business benefits with technical details in a single document.
  • Policy: Makes a case for a certain political solution to a societal or economic challenge.
——————————————————————————————————————————————————



A white paper is an authoritative report or guide that helps solve a problem. White papers are used to educate readers and help people make decisions, and are often requested and used in politics, policy, business, and technical fields. In commercial use, the term has also come to refer to documents used by businesses as a marketing or sales tool. Policy makers frequently request white papers from universities or academic personnel to inform policy developments with expert opinions or relevant research.

Government white papers

In the Commonwealth of Nations, “white paper” is an informal name for a parliamentary paper enunciating government policy; in the United Kingdom these are mostly issued as “Command papers”. White papers are issued by the government and lay out policy, or proposed action, on a topic of current concern. Although a white paper may on occasion be a consultation as to the details of new legislation, it does signify a clear intention on the part of a government to pass new law. White Papers are a “… tool of participatory democracy … not [an] unalterable policy commitment.”[1] “White Papers have tried to perform the dual role of presenting firm government policies while at the same time inviting opinions upon them.”[2]

In Canada, a white paper “is considered to be a policy document, approved by Cabinet, tabled in the House of Commons and made available to the general public.”[3] A Canadian author notes that the “provision of policy information through the use of white and green papers can help to create an awareness of policy issues among parliamentarians and the public and to encourage an exchange of information and analysis. They can also serve as educational techniques”.[4]

“White Papers are used as a means of presenting government policy preferences prior to the introduction of legislation”; as such, the “publication of a White Paper serves to test the climate of public opinion regarding a controversial policy issue and enables the government to gauge its probable impact”.[5]

By contrast, green papers, which are issued much more frequently, are more open ended. These green papers, also known as consultation documents, may merely propose a strategy to be implemented in the details of other legislation or they may set out proposals on which the government wishes to obtain public views and opinion.

White papers published by the European Commission are documents containing proposals for European Union action in a specific area. They sometimes follow a green paper released to launch a public consultation process.

Commercial white papers

Since the early 1990s, the term white paper has also come to refer to documents used by businesses and so-called think tanks as marketing or sales tools. White papers of this sort argue that the benefits of a particular technology, product or policy are superior for solving a specific problem.

These types of white papers are almost always marketing communications documents designed to promote a specific company’s or group’s solutions or products. As a marketing tool, these papers will highlight information favorable to the company authorizing or sponsoring the paper. Such white papers are often used to generate sales leads, establish thought leadership, make a business case, or to educate customers or voters.

There are four main types of commercial white papers:

  • Business benefits: Makes a business case for a certain technology or methodology.
  • Technical: Describes how a certain technology works.
  • Hybrid: Combines business benefits with technical details in a single document.
  • Policy: Makes a case for a certain political solution to a societal or economic challenge.

Resources

  • Stelzner, Michael (2007). Writing White Papers: How to capture readers and keep them engaged. Poway, California: WhitePaperSource Publishing. pp. 214. ISBN 9780977716937.
  • Bly, Robert W. (2006). The White Paper Marketing Handbook. Florence, Kentucky: South-Western Educational Publishing. pp. 256. ISBN 9780324300826.
  • Kantor, Jonathan (2009). Crafting White Paper 2.0: Designing Information for Today’s Time and Attention Challenged Business Reader. Denver, Colorado: Lulu Publishing. pp. 167. ISBN 9780557163243.

Tom Davenport to Keynote at PAW New York

A message from Predictive Analytics World. If you are NY-based, you may want to drop in and listen.

Tom Davenport to Keynote at Predictive Analytics World New York

Take advantage of Super Early Bird Pricing by May 20th and realize savings of $400. Additional savings when you bring the team.

Announcing Tom Davenport Keynote:
Every Day Analytics: Making Leading Edge Commonplace
Thomas Davenport
President’s Distinguished Prof, Babson College
Author, Competing on Analytics & Analytics at Work

Join your peers October 17-21, 2011 at the Hilton New York for Predictive Analytics World, the business event for predictive analytics professionals, managers and commercial practitioners, covering today’s commercial deployment of predictive analytics, across industries and across software vendors.

PAW NYC promises to once again break records as the biggest cross-vendor predictive analytics event ever. The conference program is packed with the top predictive analytics experts, practitioners, authors and business thought leaders, including keynote addresses from Thomas Davenport, author of Competing on Analytics: The New Science of Winning, and PAW Program Chair Eric Siegel, plus special sessions from industry heavy-weights Usama Fayyad and John Elder.

RAVE REVIEWS:

I came to PAW because it provides case studies relevant to my industry. It has lived up to the expectation and I think it’s the best analytics conference I’ve ever attended!

Shaohua Zhang, Senior Data Mining Analyst
Rogers Telecommunications

Hands down, best applied analytics conference I have ever attended. Great exposure to cutting-edge predictive techniques and I was able to turn around and apply some of those learnings to my work immediately. I’ve never been able to say that after any conference I’ve attended before!

Jon Francis, Senior Statistician
T-Mobile

PAW NYC’s agenda covers black box trading, churn modeling, crowdsourcing, demand forecasting, ensemble models, fraud detection, healthcare, insurance applications, law enforcement, litigation, market mix modeling, mobile analytics, online marketing, risk management, social data, supply chain management, targeting direct marketing, uplift modeling (net lift), and other innovative applications that benefit organizations in new and creative ways.


Take advantage of Super Early Bird Pricing and realize
$400 in savings before May 20, 2011.

Note:  Each additional attendee from the same company registered at the same time receives an extra $200 off the Conference Pass.

Chromebooks for enterprise BI

From-

http://googleblog.blogspot.com/

Chromebooks will be available online June 15 in the U.S., U.K., France, Germany, Netherlands, Italy and Spain. More countries will follow in the coming months. In the U.S., Chromebooks will be available from Amazon and Best Buy and internationally from leading retailers.

Even with dedicated IT departments, businesses and schools struggle with the same complex, costly and insecure computers as the rest of us. To address this, we’re also announcing Chromebooks for Business and Education.

and

http://www.google.com/chromebook/business-education.html#

Chromebooks: work better.

Crashes, long boot times, application conflicts, endless updates, viruses, security issues and obsolete hardware all frustrate IT managers and end users – and most users don’t need or want the complexity and annoyance of their current PCs.

Increasingly the browser is the only tool users need, making a new and better computing model possible. Chromebooks can instantly run your browser-based apps, whether in the cloud or behind your firewall, and apps virtualized through technologies like Citrix®. And an entire fleet of Chromebooks can be managed from one web-based console – making life better for users and IT admins alike.

Heritage offers 3 million chump change for Monkeys

My perspective is that life is not fair, and if someone offers me 1 million a year so that they can make 1 billion a year, I would still take it, especially if it leads to better human beings and better humanity on this planet. Health care isn’t toothpaste.

Unless there are even more fine-print changes involved, there are several players in the pharma sector who do build and deploy models internally for denying claims or for prospecting medical doctors with freebies, but they might just get caught out by the new open data movement.

————————————————————————————————–

A note from KDnuggets-

Heritage Health Prize released a second set of data on May 4. They also recently modified their rules, which now demand complete exclusivity and seem to disallow use of other tools (emphasis mine – Gregory PS)

21. LICENSE
By registering for the Competition, each Entrant (a) grants to Sponsor and its designees a worldwide, exclusive (except with respect to Entrant), sub-licensable (through multiple tiers), transferable, fully paid-up, royalty-free, perpetual, irrevocable right to use, not use, reproduce, distribute (through multiple tiers), create derivative works of, publicly perform, publicly display, digitally perform, make, have made, sell, offer for sale and import the entry and the algorithm used to produce the entry, as well as any other algorithm, data or other information whatsoever developed or produced at any time using the data provided to Entrant in this Competition (collectively, the “Licensed Materials”), in any media now known or hereafter developed, for any purpose whatsoever, commercial or otherwise, without further approval by or payment to Entrant (the “License”) and
(b) represents that he/she/it has the unrestricted right to grant the License. 
Entrant understands and agrees that the License is exclusive except with respect to Entrant: Entrant may use the Licensed Materials solely for his/her/its own patient management and other internal business purposes but may not grant or otherwise transfer to any third party any rights to or interests in the Licensed Materials whatsoever.

This has led to a call to boycott the competition by Tristan, who also notes that academics cannot publish their results without prior written approval of the Sponsor.

Anthony Goldbloom, CEO of Kaggle, emailed the HHP participants on May 4:

HPN have asked me to pass on the following message: “The Heritage Provider Network is sponsoring the Heritage Health Prize to spur innovation and creative thinking in healthcare. HPN, however, is a medical group and must retain an exclusive license to the algorithms created using its data so as to ensure that the algorithms are used responsibly, and are only used to provide better health care to patients and not for improper purposes.
Put simply, while the competition hopes to spur innovation, this is not a competition regarding movie ratings or chess results. We hope that the clarifications we have made to the Rules and the FAQ adequately address your concerns and look forward to your participation in the competition.”

What do you think? Will the exclusive license prevent you from participating?

Forecasting World Events Team

a large and diverse panel of forecasters, including substantial representation from government, academia, “think tanks,” and industry. Here are a few other details concerning your fellow participants:
  • At this time, over 600 people are being invited to participate. Please note that we expect that new participants will be joining the panel on a rolling basis for years to come.
  • Around 85% of these 600+ participants have at least a Bachelor’s degree, and over 60% of them have advanced degrees.
  • In terms of background training, participants represent a range of academic fields. Around 40% report a Social-Behavioral Science background, but there is also significant representation from those with backgrounds in Business (15%), the Humanities (13%), Engineering (12%), and the Natural Sciences (10%), among others.
  • The average participant age is 43 years old, with a standard deviation of 15 years.
  • The panel’s gender composition is 75% men / 25% women, and this closely mirrors the gender ratio for all FWE registrants.
  • In addition to participation from individuals overseas, we are pleased to have eligible participants representing 44 of the 50 United States.
We are currently scheduled to begin the core forecasting study in late summer, a few months later than we initially anticipated. In the meantime, we will be readying our web-based forecasting environment and assembling our initial set of forecasting questions. As our formal launch date approaches, we will be contacting you with a link to the forecasting website and any other information you’ll need to get started. Between now and then we may reach out to you with other related announcements.
Finally, registration remains open, and we encourage you to “spread the word” by sharing our registration homepage link with your friends and colleagues.
Thanks once again for your interest in Forecasting World Events. We look forward to you joining us this summer.
Sincerely,
The Forecasting World Events Team

P.S.- The above message is from this new contest. Enter at your own initiative. Buyer beware!