The Big Data Summit Agenda

Here is the agenda for the Big Data Summit in NY, which I am planning to attend. Registration is free so feel …er free to drop in.

The agenda for the evening follows. The event will begin promptly at 6 p.m.

  • Welcome Reception: Snacks and drinks
  • A Plan for Large Scale Data Analytics: How to Utilize Aster nCluster and Hadoop in a Symbiotic Relationship to Support Processing in Excess of 100 Billion Rows Per Month
    – Michael Brown, EVP, Software Engineering, and Will Duckworth, Director, Software Engineering, comScore, Inc.
  • Making Sense of Hadoop – Its Fit With Data Warehouses
    – Colin White, President and Founder of BI Research
  • MapReduce Inside a Database System – When and How
    Case Studies from ShareThis, Specific Media, and Others
    – Tasso Argyros, Chief Technology Officer and Co-Founder of Aster Data
  • Large-Scale Analytics at LinkedIn
    – Jonathan Goldman, Former Principal Scientist at LinkedIn
  • Networking Mixer: Beer, wine, hot hors d’oeuvres

Location:
The Roosevelt Hotel
45 East 45th Street
New York, NY 10017

For more information click here: Big Data Summit

For additional information, contact:

Ryan
ryan.garrett@asterdata.com
(cell) 415-609-4745


Interview: Thomas C. Redman, Author of Data Driven

Here is an interview with Tom Redman, author of Data Driven. Among the first to recognize the need for high-quality data in the information age, Dr. Redman established the AT&T Bell Laboratories Data Quality Lab in 1987 and led it until 1995. He is the author of four books, holds two patents, and leads his own consulting group. In many respects the “Data Doc”, as he is nicknamed, is also the father of data quality evangelism.

Ajay- Describe your career journey from science student to author of science and strategy books.
Redman: I took the usual biology, chemistry, and physics classes in college. And I worked closely with oceanographers in graduate school. More importantly, I learned directly from two masters. First was Dr. Basu, who was at Florida State when I was. He thought more deeply and clearly about the nature of data and what we can learn from them than anyone I’ve met since. Second were the people in the Bell Labs community, who were passionate about making communications better. What I learned there was that you don’t always need “scientific proof” to move forward.


Ajay- What kind of bailout do you think the government can give to science education in this country?

Redman: I don’t think the government should bail out science education per se. Science departments should compete for students just like the English and anthropology departments do. At the same time, I do think the government should support some audacious goals, such as slowing global warming or energy independence. These could well have the effect of increasing demand for scientists and science education.

Ajay- Describe your motivations for writing your book Data Driven: Profiting from Your Most Important Business Asset.

Redman: Frankly, I was frustrated. I’ve spent the last twenty years on data quality, and organizations that improve it gain enormous benefit. But so few do. I set out to figure out why that was and what to do about it.

Ajay- What can various segments of readers learn from this book: a college student, a manager, a CTO, a financial investor, and a business intelligence vendor?

Redman: I narrowed my focus to the business leader and I want him or her to take away three points.  First, data should be managed as aggressively and professionally as your other assets.  Second, they are unlike other assets in some really important ways and you’ll have to learn how to manage them.  Third, improving quality is a great place to start.

Ajay- Garbage in, garbage out: how much money and time do you believe is given to data quality in data projects?

Redman: By this I assume you mean data warehouse, BI, and other tech projects. And the answer is “not near enough.” And it shows in the low success rate of those projects.

Ajay- Consider a hypothetical scenario: instead of creating and selling fancy algorithms, a business intelligence vendor uses the simple Pareto principle to focus on data quality and design during data projects. How successful do you think that would be?

Redman: I can’t speak to the market, but I do know that organizations are loaded with problems and opportunities. They could make great progress on the most important ones if they could clearly state the problem and bring high-quality data and simple techniques to bear. But there are a few that require high-powered algorithms. Unfortunately, those require high-quality data as well.

Ajay- How and when did you first earn the nickname “Data Doc”? Who gave it to you, and would you rather be known by some other name?

Redman: One of my clients started calling me that about a dozen years ago.  But I felt uncomfortable and didn’t put it on my business card until about five years ago.  I’ve grown to really like it.

Ajay- Considering the pioneering work at AT&T Bell Laboratories and at the Palo Alto laboratory, who do you think are the 21st-century successors of these laboratories? Do you think lab work has become too commercialized, even in respected laboratories like Microsoft Research and Google’s research in mathematics?

Redman: I don’t know. It may be that the circumstances of the 20th century were conducive to such labs and they’ll never happen again. You have to remember two things about Bell Labs. First was the cross-fertilization that stemmed from having leading-edge work in dozens of areas. Second, the goal was not just invention, but innovation, the end-to-end process which starts with invention and ends with products in the market. AT&T, Bell Labs’ parent, was quite good at turning invention into product. These points lead me to think that the commercial aspect of laboratory work is so much the better.

Ajay- What does “the Data Doc” do to relax and maintain a work-life balance? How important do you think work-life balance is for creative people and researchers?

Redman: I think everyone needs a balance, not just creative people.  Two things have made this easier for me.  First, I like what I do.  A lot of days it is hard to distinguish “work” from “play.”  Second is my bride of thirty-three years, Nancy.  She doesn’t let me go overboard too often.

Biography-

Dr. Thomas C. Redman is President of Navesink Consulting Group, based in Little Silver, NJ.  Known by many as “the Data Doc” (though “Tom” works too), Dr. Redman was the first to extend quality principles to data and information.  By advancing the body of knowledge, his innovations have raised the standard of data quality in today’s information-based economy.

Dr. Redman conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995.  There he and his team developed the first methods for improving data quality and applied them to important business problems, saving AT&T tens of millions of dollars. He started Navesink Consulting Group in 1996 to help other organizations improve their data, while simultaneously lowering operating costs, increasing revenues, and improving customer satisfaction and business relationships.

Since then – armed with proven, repeatable tools, techniques and practical advice – Dr. Redman has helped clients in fields ranging from telecommunications, financial services, and dot coms, to logistics, consumer goods, and government agencies. His work has helped organizations understand the importance of high-quality data, start their data quality programs, and also save millions of dollars per year.

Dr. Redman holds a Ph.D. in statistics from Florida State University.  He is an internationally renowned lecturer and the author of numerous papers, including “Data Quality for Competitive Advantage” (Sloan Management Review, Winter 1995) and “Data as a Resource: Properties, Implications, and Prescriptions” (Sloan Management Review, Fall 1998). He has written four books: Data Driven (Harvard Business School Press, 2008), Data Quality: The Field Guide (Butterworth-Heinemann, 2001), Data Quality for the Information Age (Artech, 1996) and Data Quality: Management and Technology (Bantam, 1992). He was also invited to contribute two chapters to Juran’s Quality Handbook, Fifth Edition (McGraw Hill, 1999). Dr. Redman holds two patents.

About Navesink Consulting Group (http://www.dataqualitysolutions.com/)

Navesink Consulting Group was formed in 1996 and was the first company to focus on data quality.  Led by Dr. Thomas Redman, “the Data Doc” and former AT&T Bell Labs director, we have helped clients understand the importance of high-quality data, start their data quality programs, and save millions of dollars per year.

Our approach is not a cobbling together of ill-fitting ideas and assertions – it is based on rigorous scientific principles that have been field-tested in many industries, including financial services (see more under “Our clients”).  We offer no silver bullets; we don’t even offer shortcuts. Improving data quality is hard work.

But with a dedicated effort, you should expect order-of-magnitude improvements and, as a direct result, an enormous boost in your ability to manage risk, steer a course through the crisis, and get back on the growth curve.

Ultimately, Navesink Consulting brings tangible, sustainable improvement in your business performance as a result of superior quality data.

SPSS gets Directions

Here is a link to the Predictive Analytics conference by SPSS (the first after the Big Blue announcement): http://www.spss.com/spssdirections/na/index.htm

It should be interesting for existing clients and SPSS watchers.

WordPress.com and Facebook.com Web Stats

WordPress.com stats are quite nice and easy for a blogger to maintain, as there are no plugins to manage and, most importantly, the system ignores your own logins and spurious traffic from web spiders or crawlers.

So while the daily average for the site is 140 views (or ~100 unique people), adding the 600 in the daily newsletter (or around 200 reads) makes that a readership of almost 300 per day. No wonder my old 13-dollars-a-month server could not cope.

If you have a blog and use WordPress, WordPress.com is thus both a cheaper and a more traffic-generating option.

I love the Facebook.com stats for the FB page though: the segmentations are quite neat, and interactions have a chance to spill over into personal networking between people with common interests.

Big Data in the Big Apple

I am all set to go to the Big Data Summit on October 1 in New York.

It covers Aster Data and their passion for success in crunching data faster than the speed of thought. Interesting stuff includes MapReduce, Hadoop, Big Data, and people who have experience with them.

As a graduate student who is about to start his thesis on statistical (regression, EM algorithm, k-means clustering) computation (chunking and aggregation) of mixed data:

  • structured data (rows and columns of numbers) and
  • unstructured text, using enterprise-specific search

in a computing environment that

  • uses HPC nodes and a mixed grid of desktops and idle web servers on the Internet

I find that seminars like these are the only way to learn cutting-edge stuff.

I will also be at the SAS Data Mining 2009 conference in Las Vegas in October. Hope to see you there.

At both conferences I will be interviewing people (preferably using video, with someone else asking the questions, since my spoken accent is very bad). Also, rather sheepishly, I will be giving an interview at the Big Data Summit. I have given only two interviews till now:

One was for my mentor Vincent Granville, founder of AnalyticBridge, here in April 2008, and the other was on my poetry.

So I guess the interview is on the other side 🙂

StiMulating Conversation

Stimulating conversation is the bait

Lure the curious monkey to his zoo like fate

Come stimulate the conversation for a while

Amuse us O Exotic one, with your pungent style.

We are all egalitarian, at least we have to pretend

This is the American South, don’t you comprehend.

Stay quiet and keep shut, do your job, move your nut

Our patience is as deep as the color in our skin,

Go ahead and slave for us, lest we begin

There are trees in Tennessee , tall enough to hang you

Curiosity killed the cat- it will noose the monkey too.

(Inspired by a real-life incident)


Analyzing Monkeys

I once promised a reader a long time back that I would not get into politics, but something unexpected hit me like a big truck.

At what point do you decide your boss is a racist? How do you analyze the difference between jokes and racial insults?

Another interesting analysis

Citation Emerald