Interview Zach Goldberg, Google Prediction API

Here is an interview with Zach Goldberg, who is the product manager of Google Prediction API, the next generation machine learning analytics-as-an-api service state of the art cloud computing model building browser app.
Ajay- Describe your journey in science and technology from high school to your current job at Google.

Zach- First, thanks so much for the opportunity to do this interview Ajay!  My personal journey started in college where I worked at a startup named Invite Media.   From there I transferred to the Associate Product Manager (APM) program at Google.  The APM program is a two year rotational program.  I did my first year working in display advertising.  After that I rotated to work on the Prediction API.

Ajay- How does the Google Prediction API help an average business analytics customer who is already using enterprise software , servers to generate his business forecasts. How does Google Prediction API fit in or complement other APIs in the Google API suite.

Zach- The Google Prediction API is a cloud based machine learning API.  We offer the ability for anybody to sign up and within a few minutes have their data uploaded to the cloud, a model built and an API to make predictions from anywhere. Traditionally the task of implementing predictive analytics inside an application required a fair amount of domain knowledge; you had to know a fair bit about machine learning to make it work.  With the Google Prediction API you only need to know how to use an online REST API to get started.

You can learn more about how we help businesses by watching our video and going to our project website.

Ajay-  What are the additional use cases of Google Prediction API that you think traditional enterprise software in business analytics ignore, or are not so strong on.  What use cases would you suggest NOT using Google Prediction API for an enterprise.

Zach- We are living in a world that is changing rapidly thanks to technology.  Storing, accessing, and managing information is much easier and more affordable than it was even a few years ago.  That creates exciting opportunities for companies, and we hope the Prediction API will help them derive value from their data.

The Prediction API focuses on providing predictive solutions to two types of problems: regression and classification. Businesses facing problems where there is sufficient data to describe an underlying pattern in either of these two areas can expect to derive value from using the Prediction API.

Ajay- What are your separate incentives to teach about Google APIs  to academic or researchers in universities globally.

Zach- I’d refer you to our university relations page

Google thrives on academic curiosity. While we do significant in-house research and engineering, we also maintain strong relations with leading academic institutions world-wide pursuing research in areas of common interest. As part of our mission to build the most advanced and usable methods for information access, we support university research, technological innovation and the teaching and learning experience through a variety of programs.

Ajay- What is the biggest challenge you face while communicating about Google Prediction API to traditional users of enterprise software.

Zach- Businesses often expect that implementing predictive analytics is going to be very expensive and require a lot of resources.  Many have already begun investing heavily in this area.  Quite often we’re faced with surprise, and even skepticism, when they see the simplicity of the Google Prediction API.  We work really hard to provide a very powerful solution and take care of the complexity of building high quality models behind the scenes so businesses can focus more on building their business and less on machine learning.

 

 

How to make an analytics project?

Some of the process methodologies I have used and been exposed to while making analytics projects are-1) DMAIC/Six Sigma

While Six Sigma was initially a quality control system, it has also been very succesful in managing projects. The various stages of an analytical project can be divided using the DMAIC methodology.

DMAIC stands for

  • Define
  • Measure
  • Analyze
  • Improve
  • Control

Related to this is DMADV, ( “Design For Six Sigma”)

  • Define
  • Measure and identify CTQs
  • Analyze
  • Design
  • Verify

2) CRISP
CRISP-DM stands for Cross Industry Standard Process for Data Mining

CRISP-DM breaks the process of data mining into six major phases- and these can be used for business analytics projects as well.

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment

3) SEMMA
SEMMA  stands for

  • sample
  • explore
  • modify
  • model
  • assess

4) ISO 9001

ISO 9001 is a certification as well as a philosophy for making a Quality Management System to measure , reduce and eliminate error and customer complaints. Any customer complaint or followup has to be treated as an error, logged, and investigated for control.

5) LEAN
LEAN is a philosophy to eliminate Wastage in a process. Applying LEAN principles to analytics projects helps a lot in eliminating project bottlenecks, technology compatibility issues and data quality resolution. I think LEAN would be great in data quality issues, and IT infrastructure design because that is where the maximum waste is observed in analytics projects.

6) Demings Plan Do Check Act cycle.

Interview Dan Steinberg Founder Salford Systems

Here is an interview with Dan Steinberg, Founder and President of Salford Systems (http://www.salford-systems.com/ )

Ajay- Describe your journey from academia to technology entrepreneurship. What are the key milestones or turning points that you remember.

 Dan- When I was in graduate school studying econometrics at Harvard,  a number of distinguished professors at Harvard (and MIT) were actively involved in substantial real world activities.  Professors that I interacted with, or studied with, or whose software I used became involved in the creation of such companies as Sun Microsystems, Data Resources, Inc. or were heavily involved in business consulting through their own companies or other influential consultants.  Some not involved in private sector consulting took on substantial roles in government such as membership on the President’s Council of Economic Advisors. The atmosphere was one that encouraged free movement between academia and the private sector so the idea of forming a consulting and software company was quite natural and did not seem in any way inconsistent with being devoted to the advancement of science.

 Ajay- What are the latest products by Salford Systems? Any future product plans or modification to work on Big Data analytics, mobile computing and cloud computing.

 Dan- Our central set of data mining technologies are CART, MARS, TreeNet, RandomForests, and PRIM, and we have always maintained feature rich logistic regression and linear regression modules. In our latest release scheduled for January 2012 we will be including a new data mining approach to linear and logistic regression allowing for the rapid processing of massive numbers of predictors (e.g., one million columns), with powerful predictor selection and coefficient shrinkage. The new methods allow not only classic techniques such as ridge and lasso regression, but also sub-lasso model sizes. Clear tradeoff diagrams between model complexity (number of predictors) and predictive accuracy allow the modeler to select an ideal balance suitable for their requirements.

The new version of our data mining suite, Salford Predictive Modeler (SPM), also includes two important extensions to the boosted tree technology at the heart of TreeNet.  The first, Importance Sampled learning Ensembles (ISLE), is used for the compression of TreeNet tree ensembles. Starting with, say, a 1,000 tree ensemble, the ISLE compression might well reduce this down to 200 reweighted trees. Such compression will be valuable when models need to be executed in real time. The compression rate is always under the modeler’s control, meaning that if a deployed model may only contain, say, 30 trees, then the compression will deliver an optimal 30-tree weighted ensemble. Needless to say, compression of tree ensembles should be expected to be lossy and how much accuracy is lost when extreme compression is desired will vary from case to case. Prior to ISLE, practitioners have simply truncated the ensemble to the maximum allowable size.  The new methodology will substantially outperform truncation.

The second major advance is RULEFIT, a rule extraction engine that starts with a TreeNet model and decomposes it into the most interesting and predictive rules. RULEFIT is also a tree ensemble post-processor and offers the possibility of improving on the original TreeNet predictive performance. One can think of the rule extraction as an alternative way to explain and interpret an otherwise complex multi-tree model. The rules extracted are similar conceptually to the terminal nodes of a CART tree but the various rules will not refer to mutually exclusive regions of the data.

 Ajay- You have led teams that have won multiple data mining competitions. What are some of your favorite techniques or approaches to a data mining problem.

 Dan- We only enter competitions involving problems for which our technology is suitable, generally, classification and regression. In these areas, we are  partial to TreeNet because it is such a capable and robust learning machine. However, we always find great value in analyzing many aspects of a data set with CART, especially when we require a compact and easy to understand story about the data. CART is exceptionally well suited to the discovery of errors in data, often revealing errors created by the competition organizers themselves. More than once, our reports of data problems have been responsible for the competition organizer’s decision to issue a corrected version of the data and we have been the only group to discover the problem.

In general, tackling a data mining competition is no different than tackling any analytical challenge. You must start with a solid conceptual grasp of the problem and the actual objectives, and the nature and limitations of the data. Following that comes feature extraction, the selection of a modeling strategy (or strategies), and then extensive experimentation to learn what works best.

 Ajay- I know you have created your own software. But are there other software that you use or liked to use?

 Dan- For analytics we frequently test open source software to make sure that our tools will in fact deliver the superior performance we advertise. In general, if a problem clearly requires technology other than that offered by Salford, we advise clients to seek other consultants expert in that other technology.

 Ajay- Your software is installed at 3500 sites including 400 universities as per http://www.salford-systems.com/company/aboutus/index.html What is the key to managing and keeping so many customers happy?

 Dan- First, we have taken great pains to make our software reliable and we make every effort  to avoid problems related to bugs.  Our testing procedures are extensive and we have experts dedicated to stress-testing software . Second, our interface is designed to be natural, intuitive, and easy to use, so the challenges to the new user are minimized. Also, clear documentation, help files, and training videos round out how we allow the user to look after themselves. Should a client need to contact us we try to achieve 24-hour turn around on tech support issues and monitor all tech support activity to ensure timeliness, accuracy, and helpfulness of our responses. WebEx/GotoMeeting and other internet based contact permit real time interaction.

 Ajay- What do you do to relax and unwind?

 Dan- I am in the gym almost every day combining weight and cardio training. No matter how tired I am before the workout I always come out energized so locating a good gym during my extensive travels is a must. I am also actively learning Portuguese so I look to watch a Brazilian TV show or Portuguese dubbed movie when I have time; I almost never watch any form of video unless it is available in Portuguese.

 Biography-

http://www.salford-systems.com/blog/dan-steinberg.html

Dan Steinberg, President and Founder of Salford Systems, is a well-respected member of the statistics and econometrics communities. In 1992, he developed the first PC-based implementation of the original CART procedure, working in concert with Leo Breiman, Richard Olshen, Charles Stone and Jerome Friedman. In addition, he has provided consulting services on a number of biomedical and market research projects, which have sparked further innovations in the CART program and methodology.

Dr. Steinberg received his Ph.D. in Economics from Harvard University, and has given full day presentations on data mining for the American Marketing Association, the Direct Marketing Association and the American Statistical Association. After earning a PhD in Econometrics at Harvard Steinberg began his professional career as a Member of the Technical Staff at Bell Labs, Murray Hill, and then as Assistant Professor of Economics at the University of California, San Diego. A book he co-authored on Classification and Regression Trees was awarded the 1999 Nikkei Quality Control Literature Prize in Japan for excellence in statistical literature promoting the improvement of industrial quality control and management.

His consulting experience at Salford Systems has included complex modeling projects for major banks worldwide, including Citibank, Chase, American Express, Credit Suisse, and has included projects in Europe, Australia, New Zealand, Malaysia, Korea, Japan and Brazil. Steinberg led the teams that won first place awards in the KDDCup 2000, and the 2002 Duke/TeraData Churn modeling competition, and the teams that won awards in the PAKDD competitions of 2006 and 2007. He has published papers in economics, econometrics, computer science journals, and contributes actively to the ongoing research and development at Salford.

Interview Scott Gidley CTO and Founder, DataFlux

Here is an interview with Scott Gidley, CTO and co-founder of leading data quality ccompany DataFlux . DataFlux is a part of SAS Institute and in 2011 acquired Baseline Consulting besides launching the latest version of their Master Data Management  product. Continue reading “Interview Scott Gidley CTO and Founder, DataFlux”

Interview Beth Schultz Editor AllAnalytics.com

Here is an interview with Beth Scultz Editor in Chief, AllAnalytics.com .

Allanalytics.com http://www.allanalytics.com/ is the new online community on Predictive Analytics, and its a bit different in emphasizing quality more than just quantity. Beth is veteran in tech journalism and communities.

Ajay-Describe your journey in technology journalism and communication. What are the other online communities that you have been involved with?

Beth- I’m a longtime IT journalist, having begun my career covering the telecommunications industry at the brink of AT&T’s divestiture — many eons ago. Over the years, I’ve covered the rise of internal corporate networking; the advent of the Internet and creation of the Web for business purposes; the evolution of Web technology for use in building intranets, extranets, and e-commerce sites; the move toward a highly dynamic next-generation IT infrastructure that we now call cloud computing; and development of myriad enterprise applications, including business intelligence and the analytics surrounding them. I have been involved in developing online B2B communities primarily around next-generation enterprise IT infrastructure and applications. In addition, Shawn Hessinger, our community editor, has been involved in myriad Web sites aimed at creating community for small business owners.

 Ajay- Technology geeks get all the money while journalists get a story. Comments please

Beth- Great technology geeks — those being the ones with technology smarts as well as business savvy — do stand to make a lot of money. And some pursue that to all ends (with many entrepreneurs gunning for the acquisition) while others more or less fall into it. Few journalists, at least few tech journalists, have big dollars in mind. The gratification for journalists comes in being able to meet these folks, hear and deliver their stories — as appropriate — and help explain what makes this particular technology geek developing this certain type of product or service worth paying attention to.

 Ajay- Describe what you are trying to achieve with the All Analytics community and how it seeks to differentiate itself with other players in this space.

 Beth- With AllAnaltyics.com, we’re concentrating on creating the go-to site for CXOs, IT professionals, line-of-business managers, and other professionals to share best practices, concrete experiences, and research about data analytics, business intelligence, information optimization, and risk management, among many other topics. We differentiate ourself by featuring excellent editorial content from a top-notch group of bloggers, access to industry experts through weekly chats, ongoing lively and engaging message board discussions, and biweekly debates.

We’re a new property, and clearly in rapid building mode. However, we’ve already secured some of the industry’s most respected BI/analytics experts to participate as bloggers. For example, a small sampling of our current lineup includes the always-intrigueing John Barnes, a science fiction novelist and statistics guru; Sandra Gittlen, a longtime IT journalist with an affinity for BI coverage; Olivia Parr-Rud, an internationally recognized expert in BI and organizational alignment; Tom Redman, a well-known data-quality expert; and Steve Williams, a leading BI strategy consultant. I blog daily as well, and in particular love to share firsthand experiences of how organizations are benefiting from the use of BI, analytics, data warehousing, etc. We’ve featured inside looks at analytics initiatives at companies such as 1-800-Flowers.com, Oberweis Dairy, the Cincinnati Zoo & Botanical Garden, and Thomson Reuters, for example.

In addition, we’ve hosted instant e-chats with Web and social media experts Joe Stanganelli and Pierre DeBois, and this Friday, Aug. 26, at 3 p.m. ET we’ll be hosting an e-chat with Marshall Sponder, Web metrics guru and author of the newly published book, Social Media Analytics: Effective Tools for Building, Interpreting, and Using Metrics. (Readers interested in participating in the chat do need to fill out a quick registration form, available here http://www.allanalytics.com/register.asp . The chat is available here http://www.allanalytics.com/messages.asp?piddl_msgthreadid=241039&piddl_msgid=439898#msg_439898 .

Experts participating in our biweekly debate series, called Point/Counterpoint, have broached topics such as BI in the cloud, mobile BI and whether an analytics culture is truly possible to build.

Ajay-  What are some tips you would like to share about writing tech stories to aspiring bloggers.

Beth- I suppose my best advice is this: Don’t write about technology for technology’s sake. Always strive to tell the audience why they should care about a particular technology, product, or service. How might a reader use it to his or her company’s advantage, and what are the potential benefits? Improved productivity, increased revenue, better customer service? Providing anecdotal evidence goes a long way toward delivering that message, as well.

Ajay- What are the other IT world websites that have made a mark on the internet.

Beth- I’d be remiss if I didn’t give a shout out to UBM TechWeb sites, including InformationWeek, which has long charted the use of IT within the enterprise; Dark Reading, a great source for folks interested in securing an enterprise’s information assets; and Light Reading, which takes the pulse of the telecom industry.

 Biography- 

Beth Schultz has more than two decades of experience as an IT writer and editor. Most recently, she brought her expertise to bear writing thought-provoking editorial and marketing materials on a variety of technology topics for leading IT publications and industry players. Previously, she oversaw multimedia content development, writing and editing for special feature packages at Network World. Beth has a keen ability to identify business and technology trends, developing expertise through in-depth analysis and early-adopter case studies. Over the years, she has earned more than a dozen national and regional editorial excellence awards for special issues from American Business Media, American Society of Business Press Editors, Folio.net, and others.

 

Interview Jaime Fitzgerald President Fitzgerald Analytics

Here is an interview with noted analytics expert Jaime Fitzgerald, of Fitzgerald Analytics.

Ajay-Describe your career journey from being a Harvard economist to being a text analytics thought leader.

 Jaime- I was attracted to economics because of the logic, the structured and systematic approach to understanding the world and to solving problems. In retrospect, this is the same passion for logic in problem solving that drives my business today.

About 15 years ago, I began working in consulting and initially took a traditional career path. I worked for well-known strategy consulting firms including First Manhattan Consulting Group, Novantas LLC, Braun Consulting, and for the former Japan-focused division of Deloitte Consulting, which had spun off as an independent entity. I was the only person in their New York City office for whom Japanese was not the first language.

While I enjoyed traditional consulting, I was especially passionate about the role of data, analytics, and process improvement. In traditional strategy consulting, these are important factors, but I had a vision for a “next generation” approach to strategy consulting that would be more transparent, more robust, and more focused on the role that information, analysis, and process plays in improving business results. I often explain that while my firm is “not your father’s consulting model,” we have incorporated key best practices from traditional consulting, and combined them with an approach that is more data-centric, technology-centric, and process-centric.

At the most fundamental level, I was compelled to found Fitzgerald Analytics more than six years ago by my passion for the role information plays in improving results, and ultimately improving lives. In my vision, data is an asset waiting to be transformed into results, including profit as well as other results that matter deeply to people. For example,one of the most fulfilling aspects of our work at Fitzgerald Analytics is our support of non-profits and social entrepreneurs, who we help increase their scale and their success in achieving their goals.

Ajay- How would you describe analytics as a career option to future students. What do you think are the most essential qualities an analytics career requires.

Jaime- My belief is that analytics will be a major driver of job-growth and career growth for decades. We are just beginning to unlock the full potential of analytics, and already the demand for analytic talent far exceeds the supply.

To succeed in analytics, the most important quality is logic. Many people believe that math or statistical skills are the most important quality, but in my experience, the most essential trait is what I call “ThoughtStyle” — critical thinking, logic, an ability to break down a problem into components, into sub-parts.

Ajay -What are your favorite techniques and methodologies in text analytics. How do you see social media and Big Data analytics as components of text analytics

 Jaime-We do a lot of work for our clients measuring Customer Experience, by which I mean the experience customers have when interacting with our clients. For example, we helped a major brokerage firm to measure 12 key “Moments that Matter,” including the operational aspects of customer service, customer satisfaction and sentiment, and ultimately customer behavior. Clients care about this a lot, because customer experience drives customer loyalty, which in turn drives customer behavior, customer loyalty, and customer profitability.

Text analytics plays a key role in these projects because much of our data on customer sentiment comes via unstructured text data. For example, we have access to call center transcripts and notes, to survey responses, and to social media comments.

We use a variety of methods, some of which I’m not in a position to describe in great detail. But at a high level, I would say that our favorite text analytics methodologies are “hybrid solutions” which use a two-step process to answer key questions for clients:

Step 1: convert unstructured data into key categorical variables (for example, using contextual analysis to flag users who are critical vs. neutral vs. advocates)

Step 2: linking sentiment categories to customer behavior and profitability (for example, linking customer advocacy and loyalty with customer profits as well as referral volume, to define the ROI that clients accrue for customer satisfaction improvements)

Ajay- Describe your consulting company- Fitzgerald Analytics and some of the work that you have been engaged in.

 Jaime- Our mission is to “illuminate reality” using data and to convert Data to Dollars for our clients. We have a track record of doing this well, with concrete and measurable results in the millions of dollars. As a result, 100% of our clients have engaged us for more than one project: a 100% client loyalty rate.

Our specialties–and most frequent projects–include customer profitability management projects, customer segmentation, customer experience management, balanced scorecards, and predictive analytics. We are often engaged to address high-stakes analytic questions, including issues that help to set long-term strategy. In other cases, clients hire us to help them build their internal capabilities. We have helped build several brand new analytic teams for clients, which continue to generate millions of dollars of profits with their fact-based recommendations.

Our methodology is based on Steven Covey’s principle: “begin with the end in mind,” the concept of starting with the client’s goal and working backwards from there. I often explain that our methods are what you would have gotten if Steven Covey had been a data analyst…we are applying his principles to the world of data analytics.

Ajay- Analytics requires more and more data while privacy requires the least possible data. What do you think are the guidelines that need to be built in sharing internet browsing and user activity data and do we need regulations just like we do for sharing financial data.

 Jaime- Great question. This is an essential challenge of the big data era. My perspective is that firms who depend on user data for their analysis need to take responsibility for protecting privacy by using data management best practices. Best practices to adequately “mask” or remove private data exist…the problem is that these best practices are often not applied. For example, Facebook’s practice of sharing unique user IDs with third-party application companies has generated a lot of criticism, and could have been avoided by applying data management best practices which are well known among the data management community.

If I were able to influence public policy, my recommendation would be to adopt a core set of simple but powerful data management standards that would protect consumers from perhaps 95% of the privacy risks they face today. The number one standard would be to prohibit sharing of static, personally identifiable user IDs between companies in a manner that creates “privacy risk.” Companies can track unique customers without using a static ID…they need to step up and do that.

Ajay- What are your favorite text analytics software that you like to work with.

 Jaime- Because much of our work in deeply embedded into client operations and systems, we often use the software our clients already prefer. We avoid recommending specific vendors unless our client requests it. In tandem with our clients and alliance partners, we have particular respect for Autonomy, Open Text, Clarabridge, and Attensity.

Biography-

http://www.fitzgerald-analytics.com/jaime_fitzgerald.html

The Founder and President of Fitzgerald Analytics, Jaime has developed a distinctively quantitative, fact-based, and transparent approach to solving high stakes problems and improving results.  His approach enables translation of Data to Dollars™ using methodologies clients can repeat again and again.  He is equally passionate about the “human side of the equation,” and is known for his ability to link the human and the quantitative, both of which are needed to achieve optimal results.

Experience: During more than 15 years serving clients as a management strategy consultant, Jaime has focused on customer experience and loyalty, customer profitability, technology strategy, information management, and business process improvement.  Jaime has advised market-leading banks, retailers, manufacturers, media companies, and non-profit organizations in the United States, Canada, and Singapore, combining strategic analysis with hands-on implementation of technology and operations enhancements.

Career History: Jaime began his career at First Manhattan Consulting Group, specialists in financial services, and was later a Co-Founder at Novantas, the strategy consultancy based in New York City.  Jaime was also a Manager for Braun Consulting, now part of Fair Isaac Corporation, and for Japan-based Abeam Consulting, now part of NEC.

Background: Jaime is a graduate of Harvard University with a B.A. in Economics.  He is passionate and supportive of innovative non-profit organizations, their effectiveness, and the benefits they bring to our society.

Upcoming Speaking Engagements:   Jaime is a frequent speaker on analytics, information management strategy, and data-driven profit improvement.  He recently gave keynote presentations on Analytics in Financial Services for The Data Warehousing Institute, the New York Technology Council, and the Oracle Financial Services Industry User Group. A list of Jaime’s most interesting presentations on analyticscan be found here.

He will be presenting a client case study this fall at Text Analytics World re:   “New Insights from ‘Big Legacy Data’: The Role of Text Analytics” 

Connecting with Jaime:  Jaime can be found at Linkedin,  and Twitter.  He edits the Fitzgerald Analytics Blog.

Google Plus Games : Crime City or Fun with Funzio on G+

Probably the best designed game on Google Plus right now is Funzio’s Crime City at Google Plus

Funzio which has Zynga alumni http://www.funzio.com/games/ creates a mix of the best games in social game history Farmville and Mafia Wars (with some ideas from the classic Dope Wars) to make https://plus.google.com/u/0/games/865772480172

CRIME CITY

Zynga better hurry up with Farmville on this new G+ platform and the new platform needs to sort some teeny quality issues (which I shall elaborate later)