Some ways to test and use cloud computing for free, yourself:
- Windows Azure
- Amazon EC2
- Google Storage
The folks at Microsoft Azure have announced a 90-day free trial.
Here is an interview with Elissa Fink, VP of Marketing at Tableau, the wonderful new software company that makes data visualization so easy to learn and work with.
Ajay- Describe your career journey from high school through 20-plus years in marketing. What trends have you seen come and go in marketing?
Elissa- I studied literature and linguistics in college and didn’t discover analytics until my first job selling advertising for the Wall Street Journal. Oddly enough, the study of linguistics is not that far from decision analytics: they both are about taking a structured view of information and trying to see and understand common patterns. At the Journal, I was completely captivated analyzing and comparing readership data. At the same time, the idea of using computers in marketing was becoming more common. I knew that the intersection of technology and marketing was going to radically change things – how we understand consumers, how we market and sell products, and how we engage with customers. So from that point on, I’ve always been focused on technology and marketing, whether it’s working as a marketer at technology companies or applying technology to marketing problems for other types of companies.

There have been so many interesting trends. Taking a long view, a key trend I’ve noticed is how marketers work to understand, influence and motivate consumer behavior. We’ve moved marketing from where it was primarily unpredictable, qualitative and aimed at talking to mass audiences, where the advertising agency was king. Now it’s a discipline that is more data-driven, quantitative and aimed at conversations with individuals, where the best analytics wins.

As with any trend, the pendulum swings far too much to either side, causing backlashes, but overall, I think we are in a great place now. We are using data-driven analytics to understand consumer behavior. But pure analytics is not the be-all, end-all; good marketing has to rely on understanding human emotions, intuition and gut feel – consumers are far from rational, so taking only a rational or analytical view of them will never explain everything we need to know.
Ajay- Do you think technology companies are still predominantly dominated by men? How have you seen diversity evolve over the years? What initiatives has Tableau taken for both hiring and retaining great talent?
Elissa- The thing I love about the technology industry is that its key success metrics – inventing new products that rapidly gain mass adoption in pursuit of making profit – are fairly objective. There’s little subjective nature to the counting of dollars collected selling a product and dollars spent building a product. So if a female can deliver a better product and bigger profits faster and better, then that female is going to get the resources, jobs, power and authority to do exactly that. That’s not to say that the technology industry is gender-blind, race-blind, etc. It isn’t – technology is far from perfect. For example, the industry doesn’t have enough diversity in positions of power. But I think overall, in comparison to a lot of other industries, it’s pretty darn good at giving people with great ideas the opportunities to realize their visions regardless of their backgrounds or characteristics.
At Tableau, we are very serious about bringing in and developing talented people – they are the key to our growth and success. Hiring is our #1 initiative so we’ve spent a lot of time and energy both on finding great candidates and on making Tableau a place that they want to work. This includes things like special recruiting events, employee referral programs, a flexible work environment, fun social events, and the rewards of working for a start-up. Probably our biggest advantage is the company itself – working with people you respect on amazing, cutting-edge products that delight customers and are changing the world is all too rare in the industry but a reality at Tableau. One of our senior software developers put it best when he wrote “The emphasis is on working smarter rather than longer: family and friends are why we work, not the other way around. Tableau is all about happy, energized employees executing at the highest level and delivering a highly usable, high quality, useful product to our customers.” People who want to be at a place like that should check out our openings at http://www.tableausoftware.com/jobs.
Ajay- What are the most notable features in Tableau’s latest edition? What are the principal software products that compete with Tableau, and how would you say Tableau compares with them?
Elissa- Tableau 6.1 will be out in July and we are really excited about it for 3 reasons.
First, we’re introducing our mobile business intelligence capabilities. Our customers can have Tableau anywhere they need it. When someone creates an interactive dashboard or analytical application with Tableau and it’s viewed on a mobile device, an iPad in particular, the viewer will have a native, touch-optimized experience. No trying to get your fingertips to act like a mouse. And the author didn’t have to create anything special for the iPad; she just creates her analytics the usual way in Tableau. Tableau knows the dashboard is being viewed on an iPad and presents an optimized experience.
Second, we’ve taken our in-memory analytics engine up yet another level. Speed and performance are better, and people can now update data incrementally and rapidly. Introduced in 6.0, our data engine makes any data fast in just a few clicks. We don’t run out of memory like other applications. So if I build an incredible dashboard on my 8-gig RAM PC and you try to use it on your 2-gig RAM laptop, no problem.
And, third, we’re introducing more features for the international markets – including French and German versions of Tableau Desktop along with more international mapping options. It’s because we are constantly innovating particularly around user experience that we can compete so well in the market despite our relatively small size. Gartner’s seminal research study about the Business Intelligence market reported a massive market shift earlier this year: for the first time, the ease-of-use of a business intelligence platform was more important than depth of functionality. In other words, functionality that lots of people can actually use is more important than having sophisticated functionality that only specialists can use. Since we focus so heavily on making easy-to-use products that help people rapidly see and understand their data, this is good news for our customers and for us.
Ajay- Cloud computing is the next big thing, with everyone offering a cloud version of their software. How would you run cloud versions of Tableau Server (say, deploying it on Amazon EC2 or a private cloud)?
Elissa- In addition to the usual benefits espoused about cloud computing, the thing I love best is that it makes data and information more easily accessible to more people. Easy accessibility and scalability are completely aligned with Tableau’s mission. Our free product Tableau Public and our product for commercial websites, Tableau Digital, are two cloud-based products that deliver data and interactive analytics anywhere. People often talk about large business intelligence deployments as having thousands of users. With Tableau Public and Tableau Digital, we literally have millions of users. We’re serving up tens of thousands of visualizations simultaneously – talk about accessibility and scalability! We have lots of customers connecting to databases in the cloud and running Tableau Server in the cloud. It’s actually not complex to set up. In fact, we focus a lot of resources on making installation and deployment easy and fast, whether it’s in the cloud, on premise or what have you. We don’t want people to have to spend weeks or months on massive roll-out projects. We want it to be minutes, hours, maybe a day or two. With the cloud, we see that people can get started and get results faster and more easily than ever before. And that’s what we’re about.
Ajay- Describe some of the latest awards that Tableau has been winning. Also, how is Tableau helping universities address the shortage of business intelligence and big data professionals?
Elissa- Tableau has been very fortunate. Lately, we’ve been acknowledged by both Gartner and IDC as the fastest growing business intelligence software vendor in the world. In addition, our customers and Tableau have won multiple distinctions including InfoWorld Technology Leadership awards, Inc 500, Deloitte Fast 500, SQL Server Magazine Editors’ Choice and Community Choice awards, Data Hero awards, CODiEs, and American Business Awards, among others. One area we’re very passionate about is academia, participating with professors, students and universities to help build a new generation of professionals who understand how to use data. Data analysis should not be exclusively for specialists. Everyone should be able to see and understand data, whatever their background. We come from academic roots, having been spun out of a Stanford research project. Consequently, we strongly believe in supporting universities worldwide and offer two academic programs. The first is Tableau for Teaching, where any professor can request free term-length licenses of Tableau for academic instruction during his or her courses. And we offer a low-cost Student Edition of Tableau so that students can choose to use Tableau in any of their courses at any time.
Elissa Fink is Tableau Software’s Vice President of Marketing. With 20+ years helping companies improve their marketing operations through applied data analysis, Elissa has held executive positions in marketing, business strategy, product management, and product development. Prior to Tableau, Elissa was EVP Marketing at IXI Corporation, now owned by Equifax. She has also served in executive positions at Tele Atlas (acquired by TomTom), TopTier Software (acquired by SAP), and Nielsen/Claritas. Elissa also sold national advertising for the Wall Street Journal. She’s a frequent speaker and has spoken at conferences including the DMA, the NCDM, Location Intelligence, the AIR National Forum and others. Elissa is a graduate of Santa Clara University and holds an MBA in Marketing and Decision Systems from the University of Southern California.
Elissa first discovered Tableau late one afternoon at her previous company. Three hours later, she was still “at play” with her data. “After just a few minutes using the product, I was getting answers to questions that were taking my company’s programmers weeks to create. It was instantly obvious that Tableau was on a special mission with something unique to offer the world. I just had to be a part of it.”
To know more – read at http://www.tableausoftware.com/
and existing data viz at http://www.tableausoftware.com/learn/gallery
Some example visualizations from the gallery:

- Storm seasons: measuring and tracking key indicators
- What’s happening with local real estate prices?
- How are sales opportunities shaping up?
- Identify your best performing products
- Applying user-defined parameters to provide context
- Not all tech companies are rocket ships
- What’s really driving the economy?
- Considering factors and industry influencers
- The complete orbit along the inside, or around a fixed circle
- How early do you have to be at the airport?
- What happens if sales grow but so does customer churn?
- What are the trends for new retail locations?
- How have student choices changed?
- Do patients who disclose their HIV status recover better?
- Closer look at where gas prices swing in areas of the U.S.
- U.S. Census data shows more women of greater age
- Where do students come from and how does it affect their grades?
- Tracking customer service effectiveness
- Comparing national and local test scores
- What factors correlate with high overall satisfaction ratings?
- Fund inflows largely outweighed outflows well after the bubble
- Which programs are competing for federal stimulus dollars?
- Oil prices and volatility
- A classic candlestick chart
- How do oil, gold and CPI relate to the GDP growth rate?
However, this is what Phil Rack, the reseller, is quoting at http://www.minequest.com/Pricing.html:
- Windows Desktop Price: $884 on 32-bit Windows and $1,149 on 64-bit Windows.
- Windows Server Price: $1,903 per logical CPU for 32-bit and $2,474 for 64-bit.
- The Bridge to R is available on the Windows platforms and is free to customers who license WPS through MineQuest, LLC. Companies and organizations outside of North America may purchase a license for the Bridge to R, which starts at $199 per desktop or $599 per server.
- Linux server versions are available, but they do not yet support the Eclipse IDE and are command-line only.
WPS certainly seems to be doing well, but its pricing is no longer published; on the home website, you have to fill in a form. Ditto for the 30-day free evaluation.
http://www.teamwpc.co.uk/products/wps/modules/core
The table below provides a summary of data formats presently supported by the WPS Core module.
Data File Format | Uncompressed Read | Uncompressed Write | Compressed Read | Compressed Write
---|---|---|---|---
SD2 (SAS version 6 data set) | ✓ | ✓ | |
SAS7BDAT (SAS version 7 data set) | ✓ | ✓ | ✓ |
SAS7BDAT (SAS version 8 data set) | ✓ | ✓ | ✓ |
SAS7BDAT (SAS version 9 data set) | ✓ | ✓ | ✓ |
SASSEQ (SAS version 8/9 sequential file) | ✓ | ✓ | ✓ |
V8SEQ (SAS version 8 sequential file) | ✓ | ✓ | ✓ |
V9SEQ (SAS version 9 sequential file) | ✓ | ✓ | ✓ |
WPD (WPS native data set) | ✓ | ✓ | ✓ | ✓
WPDSEQ (WPS native sequential file) | ✓ | ✓ | |
XPORT (transport format) | ✓ | ✓ | |
Additional access to EXCEL, SPSS and dBASE files is supported by utilising the WPS Engine for DB Files module.
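For readers who also use R alongside WPS, here is a minimal sketch of reading two of the same formats from R. The haven and foreign packages and both file names are my own illustrative assumptions for the sketch, not part of WPS:

```r
# Illustrative only: reading two of the same formats from R rather than WPS.
library(haven)     # read_sas() reads SAS7BDAT files
library(foreign)   # read.xport() reads XPORT transport files

sales  <- read_sas("sales.sas7bdat")   # hypothetical file name
legacy <- read.xport("legacy.xpt")     # hypothetical file name
head(sales)
```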
And they have a new product release on Valentine’s Day 2011 (oh, these Europeans!).
From the press release at http://www.teamwpc.co.uk/press/wps2_5_1_released
WPS Version 2.5.1 Released
New language support, new data engines, larger datasets, improved scalability

LONDON, UK – 14 February 2011 – World Programming today released version 2.5.1 of their WPS software for workstations, servers and mainframes.
WPS is a competitively priced, high performance, highly scalable data processing and analytics software product that allows users to execute programs written in the language of SAS. WPS is supported on a wide variety of hardware and operating system platforms and can connect to and work with many types of data with ease. The WPS user interface (Workbench) is frequently praised for its ease of use and flexibility, with the option to include numerous third-party extensions.
This latest version of the software has the ability to manipulate even greater volumes of data, removing the previous 2^31 (2 billion) limit on number of observations.
Complementing the extended data processing capabilities, World Programming has worked hard to boost the performance, scalability and reliability of the WPS software to give users the confidence they need to run heavy workloads whilst delivering maximum value from available computer power.
WPS version 2.5.1 offers additional flexibility with the release of two new data engines for accessing Greenplum and SAND databases. WPS now comes with eleven data engines and can access a huge range of commonly used and industry-standard file-formats and databases.
Support in WPS for the language of SAS continues to expand with more statistical procedures, data step functions, graphing controls and many other language items and options.
WPS version 2.5.1 is available as a free upgrade to all licensed users of WPS.
Summary of Main New Features:
- Supporting Even Larger Datasets: WPS is now able to process very large data sets, the previous size limit of 2^31 observations having been lifted completely.
- Performance and Scalability Boosted: Performance and scalability improvements across the board combine to ensure even the most demanding large and concurrent workloads are processed efficiently and reliably.
- More Language Support: WPS 2.5.1 continues the expansion of its language support with over 70 new language items, including new procedures, data step functions and many other language items and options.
- Statistical Analysis: The procedure support in WPS Statistics has been expanded to include PROC CLUSTER and PROC TREE.
- Graphical Output: The graphical output from WPS Graphing has been expanded to accommodate more configurable graphics.
- Hash Tables: Support is now provided for hash tables.
- Greenplum®: A new WPS Engine for Greenplum provides dedicated support for accessing the Greenplum database.
- SAND®: A new WPS Engine for SAND provides dedicated support for accessing the SAND database.
- Oracle®: Bulk loading support is now available in the WPS Engine for Oracle.
- SQL Server®: To enhance existing SQL Server database access, there is a new SQLSERVR (please note spelling) facility in the ODBC engine.

More Information:
Existing Users should visit www.teamwpc.co.uk/support/wps/release where you can download a readme file containing more information about all the new features and fixes in WPS 2.5.1.
New Users should visit www.teamwpc.co.uk/products/wps where you can explore in more detail all the features available in WPS or request a free evaluation.
And from http://www.teamwpc.co.uk/products/wps/data it seems they are jumping on the big data bandwagon as well:
WPS is able to handle extremely large data sets now that the previous limit of 2^31 observations has been lifted.
Often I am asked by clients, friends and industry colleagues about the suitability of particular software for analytical needs. My answer is mostly:
It depends on-
1) The cost of a Type 1 error in the purchase decision versus the cost of a Type 2 error. (Forgive me if I mix up Type 1 with Type 2 errors- I do have some weird childhood learning disabilities which crop up now and then.)
Here I define a Type 1 error as paying more for software when equivalent functionality was available at a lower price, or buying components you do not need, like SPSS Trends (when only SPSS Base is required) or SAS ETS (when only SAS/STAT would do).
The first kind is of course due to the presence of free tools with GUIs like R, R Commander and Deducer (Rattle does have a $500 commercial version).
The emergence of software vendors like WPS (for SAS language aficionados) offering similar functionality to Base SAS, along with the increasing convergence of business analytics (read: predictive analytics) and business intelligence (read: reporting), has led to a certain brand clutter in which every software product promises to do everything, at all different prices- though they all have specific strengths and weaknesses. To add to this, there are comparatively fewer independent business analytics analysts than, say, independent business intelligence analysts.
2) Type 2 error- In this case, the opportunity cost of delayed projects, business models or lower accuracy- the consequences of buying lower-priced software that had less functionality than you required.
To compound the magnitude of a Type 2 error, you are probably in some kind of vendor lock-in, your software budget is exhausted from buying too much or inappropriate software and hardware, and you could still do with some added help in business analytics. The fear of making a business-critical error is a substantial reason why open source software has to work harder at proving itself competent. Writing great software is not enough: we need great marketing to sell it and great customer support to sustain it.
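To make the trade-off concrete, here is a toy expected-cost comparison. Every probability and dollar figure below is hypothetical, chosen only to illustrate the arithmetic:

```r
# A toy expected-cost comparison of the two purchase errors.
# All numbers are hypothetical, purely to illustrate the idea.
p_type1    <- 0.30     # chance of over-buying (Type 1)
cost_type1 <- 50000    # licence fees sunk into unused functionality
p_type2    <- 0.10     # chance the cheaper tool falls short (Type 2)
cost_type2 <- 200000   # delayed projects, lost accuracy, re-purchase

p_type1 * cost_type1   # expected loss from Type 1 = $15,000
p_type2 * cost_type2   # expected loss from Type 2 = $20,000
# Here the rarer but costlier Type 2 error dominates, which is exactly
# the fear that makes buyers over-specify their software purchases.
```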
As business decisions are made within the constraints of time, information and money, I will try to create a software purchase matrix based on my knowledge of known software (and its unknown strengths and weaknesses), pricing (versus budgets), and ranges of data handling. I will take a basically optimum approach based on known constraints, and add in flexibility for unknown operational constraints.
I will restrict this matrix to analytics software, though you could certainly extend it to other classes of enterprise software, including big data databases, infrastructure and computing.
Noted assumptions- 1) I am vendor neutral and do not suffer from subjective bias or affection for any particular software (based on conferences, books, relationships, consulting etc.).
2) All software has bugs, so all software needs customer support.
3) All software has particular advantages, strengths and weaknesses in terms of functionality.
4) Cost includes total cost of ownership and opportunity cost of business analytics enabled decision.
5) All software marketing people will praise their own software- sometimes over-selling and mis-selling product bundles.
Software compared will include SPSS, KXEN, R, SAS, WPS, Revolution R and SQL Server, plus the various flavors and sub-components within these. The optimized approach will include parallel programming, cloud computing, hardware costs, and dependent software costs. A skeletal sketch of such a matrix follows.
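As a placeholder for the full analysis, here is a minimal sketch of what such a purchase matrix might look like in R. Every score and weight below is a hypothetical placeholder, not an actual evaluation:

```r
# A skeletal purchase matrix. All scores (1 = worst, 5 = best) and
# weights are hypothetical placeholders, purely to show the mechanics.
software <- data.frame(
  product   = c("SPSS", "KXEN", "R", "SAS", "WPS", "Revolution R", "SQL Server"),
  cost      = c(2, 3, 5, 1, 4, 3, 3),   # 5 = cheapest
  features  = c(4, 3, 4, 5, 3, 4, 3),
  data_size = c(3, 3, 2, 5, 4, 5, 4)    # ability to handle large data
)

# Weights for a hypothetical budget-constrained buyer
software$score <- 0.5 * software$cost +
                  0.3 * software$features +
                  0.2 * software$data_size

software[order(-software$score), c("product", "score")]
```

Changing the weights to match your own constraints (a data-volume-driven buyer might weight data_size highest) reorders the ranking, which is the whole point of the matrix.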
To be continued-
There is some ambiguity about LibreOffice and why it needed to change from OpenOffice.org- just when OpenOffice seemed so threatening on the desktop.
FROM- http://www.documentfoundation.org/faq/
A: Not at all. The Document Foundation will continue to be focused on developing, supporting, and promoting the same software, and it’s very much business as usual. We are simply moving to a new and more appropriate organisational model for the next decade – a logical development from Sun’s inspirational launch a decade ago.
A: For ten years we have used the same name – “OpenOffice.org” – for both the Community and the software. We’ve decided it removes ambiguity to have a different name for the two, so the Community is now “The Document Foundation”, and the software “LibreOffice”. Note: there are other examples of this usage in the free software community – e.g. the Mozilla Foundation with the Firefox browser.
A: We would like to have that possibility open to us in the future…
A: The OpenOffice.org trademark is owned by Oracle Corporation. Our hope is that Oracle will donate this to the Foundation, along with the other assets it holds in trust for the Community, in due course, once legal etc issues are resolved. However, we need to continue work in the meantime – hence “LibreOffice” (“free office”).
A: Since Oracle’s takeover of Sun Microsystems, the Community has been under “notice to quit” from our previous Collabnet infrastructure. With today’s announcement of a Foundation, we now have an entity which can own our emerging new infrastructure.
A: We want The Document Foundation to be open to code contributions from as many people as possible. We are delighted to announce that the enhancements produced by the Go-OOo team will be merged into LibreOffice, effective immediately. We hope that others will follow suit.
A: The Document Foundation cannot answer for other bodies. However, there is nothing in the licence arrangements to stop companies continuing to release commercial derivatives of LibreOffice. The new Foundation will also mean companies can contribute funds or resources without worries that they may be helping a commercial competitor.
A: The Document Foundation sets out deliberately to be as developer friendly as possible. We do not demand that contributors share their copyright with us. People will gain status in our community based on peer evaluation of their contributions – not by who their employer is.
A: LibreOffice is The Document Foundation’s reason for existence. We do not have and will not have a commercial product which receives preferential treatment. We only have one focus – delivering the best free office suite for our users – LibreOffice.
—————————————————————————————————-
Non-Microsoft and non-Oracle vendors will indeed find useful the possibility of bundling a free LibreOffice that reduces the total cost of ownership of analytics software. Right now, some of the best free advertising for Microsoft’s OS and Office is done by enterprise software vendors who create Windows-only products and enable better MS Office integration than OpenOffice integration. This is done citing user demand, but it is a chicken-and-egg dilemma, as functionality leads to enhanced demand. Microsoft, on the other hand, is aware of this dependence and has built up SQL Server and SQL analytics (besides investing in analytics startups like Revolution Analytics) along with its own infrastructure, the Azure cloud platform, as an alternative to EC2 instances.
Running R on an Amazon EC2 has following benefits-
1) Elastic Memory and Number of Processors for heavy computation
2) Affordable micro instances for smaller datasets (2 cents per hour for Linux/UNIX usage, 3 cents per hour for Windows).
3) An easy to use interface console for managing datasets as well as processes
Running R on an Amazon EC2 on Windows Instance has following additional benefits-
1) Remote Desktop makes operation of R very easy
2) 64 Bit R can be used
3) You can also use your evaluation copy of Revolution R Enterprise, which is free to academics and quite inexpensive as enterprise software for corporates.
You can thus combine R GUIs (like Rattle, R Commander or Deducer, based on your need for data mining, statistical analysis or graphical analysis) with a 64-bit OS and Revolution’s RevoScaleR package to manage very large datasets in a very easy-to-use analytics solution. A sketch follows.
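Here is a minimal sketch of the RevoScaleR part of that workflow. The function names are from Revolution R Enterprise’s RevoScaleR package as I understand them, and the file names are hypothetical:

```r
# Sketch: RevoScaleR on a 64-bit Windows EC2 instance.
library(RevoScaleR)   # ships with Revolution R Enterprise

# Import a large CSV into the chunked on-disk .xdf format, so the
# data never has to fit into RAM all at once
rxImport(inData = "bigdata.csv", outFile = "bigdata.xdf")  # hypothetical files

# Summary statistics computed chunk-by-chunk over the .xdf file
rxSummary(~ ., data = "bigdata.xdf")
```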
Pricing for Computation on EC2

On-Demand Instances | Linux/UNIX Usage | Windows Usage
---|---|---
Standard: Small (Default) | $0.085 per hour | $0.12 per hour
Standard: Large | $0.34 per hour | $0.48 per hour
Standard: Extra Large | $0.68 per hour | $0.96 per hour
Micro | $0.02 per hour | $0.03 per hour
High-Memory: Extra Large | $0.50 per hour | $0.62 per hour
High-Memory: Double Extra Large | $1.00 per hour | $1.24 per hour
High-Memory: Quadruple Extra Large | $2.00 per hour | $2.48 per hour
High-CPU: Medium | $0.17 per hour | $0.29 per hour
High-CPU: Extra Large | $0.68 per hour | $1.16 per hour
Cluster Compute: Quadruple Extra Large | $1.60 per hour | N/A*

* Windows is not currently available for Cluster Compute Instances.
Internet Data Transfer
The pricing below is based on data transferred “in” and “out” of Amazon EC2.
Data Transfer In | US & EU Regions | APAC Region
---|---|---
All Data Transfer | Free until Nov 1, 2010* | Free until Nov 1, 2010*

Data Transfer Out** | US & EU Regions | APAC Region
---|---|---
First 1 GB per Month | $0.00 per GB | $0.00 per GB
Up to 10 TB per Month | $0.15 per GB | $0.19 per GB
Amazon EBS volumes can be used to store data, and Amazon EBS snapshots are stored in Amazon S3 (priced the same as Amazon S3). See http://aws.amazon.com/ec2/#pricing for the full price list; other costs are optional, depending on your needs.
Based on the above, I set out to create a how-to DIY guide for running R (and Revolution R on 64-bit Windows) on EC2.
1) Logon to https://console.aws.amazon.com/ec2/home
2) Launch Windows Instance
In the left margin, select AMIs, and at the top choose a Windows 64-bit AMI (note: if you select an AMI bundled with SQL Server, it will cost you extra). Then go through the remaining steps and launch the instance.
To retrieve your password, open your private key file in Notepad, copy and paste the private key (it looks like gibberish) into the console’s password retrieval dialog, and click Decrypt. Note the newly generated password: this is your Remote Desktop password.
Click on the .rdp file (or the shortcut file created earlier); it will connect to your Windows instance. Enter the newly generated password into Remote Desktop.
This looks like a new clean machine with just Windows OS installed on it.
Install Chrome (or any other browser) if you do not want to use Internet Explorer.
Install Acrobat Reader (for documentation) and Revolution R Enterprise, a roughly 490 MB download (it will automatically ask to install the .NET Framework 4 files), and/or plain R.
Install packages: I recommend installing R Commander, Rattle and Deducer. Apart from the fact that these GUIs are quite complementary, they will also install almost all the main packages you need for analysis (as their dependencies). Revolution R installs parallel programming packages by default. A sketch of the commands is below.
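The installation step above, as commands typed into R inside the Remote Desktop session (a minimal sketch):

```r
# dependencies = TRUE pulls in the analysis packages these GUIs need
install.packages(c("Rcmdr", "rattle", "Deducer"), dependencies = TRUE)

library(Rcmdr)   # opens the R Commander GUI on load
library(rattle)
rattle()         # launches the Rattle data mining GUI
```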
If you want to save your files for later, you can make a snapshot: go to the Amazon console- EC2- left margin- EBS- Snapshots. You will see an attached volume (green light); click Create Snapshot to save your files for working on later.
If you want to use my Windows snapshot, you can work on it: when you start your Amazon EC2, click on Snapshots and enter the details (see snapshot name below) to make a copy or work on it, whether for exploring 64-bit R, multi-core cloud computing, or just trying out Revolution R’s new packages for academic purposes.
Here is an interview with Donald Farmer of Microsoft, talking about his passion for the exciting business intelligence projects at MS.
Q Describe your career from high school to your current job responsibilities at Microsoft. How can technology companies in America work together to grow the home pool of American science students (irrespective of market share battles)?
A My background is relatively unusual for a technology professional, although at Microsoft one meets people with a very wide range of backgrounds. I had little interest in studying Computer Science formally. For me, software was always a means to an end: a way of solving what were, for me, “more interesting” problems. Of course, I cannot deny that computer science is a compelling subject in itself, just not for me. Yet, from my early teens in Scotland, I had computers to try (starting with the justly famous Sinclair range) and I used them to store, classify and analyze the data I needed for my other work. So, as I studied philosophy and languages, and as I worked in history, archaeology, forestry, fish-farming and so on (through many variations) before I became more completely involved in Business Intelligence, I used database techniques extensively.
I spent some years as a consultant, building all sorts of applications. My first predictive application enabled fish-farmers with private water supplies to balance the needs of fish production and hydro-electric generation based on past, present and predicted rainfall. I believe that application is still in use today, 15 years later!
Later, I joined an excellent group of developers and analysts at AppsMart, building a data mart rapid-development application. That brought me into the Microsoft sphere, as we built on the SQL Server platform and were actively involved in the SQL Server Data Warehouse ecosystem.
With the dot-com bust of 2000, I happily found an opportunity to work with Microsoft. There I started working on Analysis Services, later leading a team of program managers in Integration Services. In that time, we did some really interesting work along with Zhaohui Tang’s team, integrating Data Mining capabilities with our ETL tool, to enable predictive analytics in the flow of data. The implications of this technique are still only being realized: we have used it for imputing missing data, and have an interesting patent on how to use this technique for detecting outliers in streaming data. In addition, we included fuzzy matching techniques from Surajit Chaudhuri’s team, to give even more flexibility.
More recently I have been working in Data Mining, with a marvelous and energetic team under Jamie MacLennan, and then in the last couple of years I have been managing a super team of Program Managers building the client interfaces for our new PowerPivot application.
My current role is not focused on a single product, but rather I look across all the business intelligence products to see how we can engage our engineering knowledge ever more effectively with customers, partners, analysts and, of course, with other teams across Microsoft.
So, as you can see my background is very varied. In some ways, that means that I am not well placed to speak to how the USA can better grow a pool of science students, as I was never one myself. Yet, I do think there are some lessons I can share. Firstly, we should not make the mistake of focusing only on science and technology as an end in itself. We do need to encourage the use of information science techniques in all appropriate fields, including liberal arts, and also “power professions” such as medicine and law. The USA provides wonderful educational opportunities in these fields, but all too often young people have to choose between science and arts. Many of the best talents I have met in the world of analytics have backgrounds which are very diverse.
Q) Describe the current status of SQL Server and Microsoft Data Mining. What are the areas in Business Intelligence we can see much more excitement and innovation in the coming few months from you guys.
A) Data Mining remains one of the most popular technologies in the SQL Server stack. I have presented recently in China, Germany, the Netherlands and the UK, and at every conference the data mining sessions were among the most popular and the most successful. This speaks volumes about the interest in this field. It also reflects how successfully Microsoft has broadened our user base by shipping the Excel Data Mining Add-ins.
Q) How is Microsoft’s cloud computing venture Azure going? How is SharePoint doing? What do you personally feel about the remote sharing and computing model?
A) Azure and SharePoint are, of course, very different beasts. Windows Azure, and especially SQL Azure which we launched at PDC in November, are proving to be very popular. In particular, SQL Azure is really succeeding with its strong development and management story – you design and manage cloud databases with the same tools and techniques as you do for on-premise databases. There has been a fantastic response to this, especially from emerging economies where the idea of having Microsoft manage your data infrastructure at any scale is very attractive. At TechEd South Africa, for example, David Robinson from the SQL Azure team got a tremendous reception. However, there are difficulties in emerging economies because of poor bandwidth. Shortly after David and I were in South Africa, local businesses held a race: they tied a USB stick with files to the leg of a carrier pigeon and set it off home from Pietermaritzburg to Durban, simultaneously trying to download the same files between the same locations online. The pigeon won!
So, I do think the cloud offers tremendous opportunities for business to scale and manage their resources effectively, but it’s early days.
Q And when can I start doing data mining from within my Excel workbook? I remember working on a SQL Server Analysis plugin for a cloud Excel prototype last year.
A You should be using Excel for data mining right now. Just go to http://www.sqlserverdatamining.com and look for the links, on the right hand side of the page. These are released products. You can also go to http://www.sqlserverdatamining.com/cloud to try an experimental cloud service – but it is only experimental and could be up or down at any time.
For more conventional, OLAP-like, analytics you should also try out PowerPivot in beta. See http://www.powerpivot.com . PowerPivot is an application that plugs into Excel and enables business users to build quite complex models, over basically unlimited data volumes, quickly and easily. It’s proving to be hugely popular already. I am sure it will dominate much of the BI news in 2010.
Q) What are the risks, and challenges in creating new technology when working for an Industry leader like Microsoft where the spotlight is on every step you take and the competition is brutal.
A) I simply don’t think about brutal competition. Even in nature I see far more symbiosis than competition. I personally think competition is a very negative mindset although the term “competitor” is the common shorthand for another vendor in the space and I do use it that way myself – but more from habit than conviction.
In the database world, you might say Oracle are our competitors. Yet most of the Oracle customers I know (and I was an Oracle customer myself once) are also SQL Server customers. Often they use Reporting Services, or Analysis Services. Integration Services had to ship a fast-loading Oracle destination, because so many customers want to use SQL Server tools to load Oracle databases. I see far more cases like that, where the picture is complex and symbiotic, than I do of outright competition.
In the analytic space, almost every tool out there has one feature in common – one feature which everyone uses. Export to Excel.
I genuinely love working with our partners, and I am lucky to have good friends throughout the industry: at SAP, Oracle, IBM, SAS … you name it. We all benefit from empowering businesses with better tools. As the old saying goes, “the rising tide lifts all boats.”
Q) In terms of lines of code, Microsoft may have given away the largest number of shared libraries and code, yet it sometimes suffers from a perception problem because of its vintage. Do you think all cool tech companies become not-so-cool after some years, even if they don’t fundamentally change?
A) I think the idea of a company being “cool” is itself just a phase we’re going through as an industry as we’re growing up. As the tech industry matures, you’ll see more emphasis on value and net contribution. In many ways, Microsoft, and IBM I think, are ahead of the curve, as companies which are valued for their stability, resources and ability to continually provide compelling new solutions and services. I travel a lot, and I see classrooms in western China, and emerging businesses in Africa, and women starting to work in new careers in the Middle East, and I don’t see them prioritizing cool. But I do see them doing amazing things with Microsoft technology.
Q) Describe your blogging style and what best tips would you give to technology bloggers.
A) I don’t blog enough, sadly, although I do try.
I have two blogs. One, at http://blogs.technet.com/sqlserverexperts/ is a shared “SQL Server Experts” blog. It’s very focussed on Microsoft technologies, of course. I especially like to blog about trends that I am seeing in my work with customers. My other blog, at http://beyeblogs.com/donaldfarmer/ is more personal, and includes gleanings from my other interests. I especially like doing my first blog of April there – that’s always fun.
My advice to bloggers should probably be “do what I say, not what I do.” However, most important, I think, is to be authentic in your voice. My favourite business intelligence bloggers are Jill Dyche, Evan Levy, David Loshin, William McKnight and Neil Raden – all of them blog quite regularly and are always great to read. There are others out there who are just as interesting, but don’t quite have the same rhythm to their blogging. I admire, but sadly fail to emulate, those who blog regularly and effectively.
Q) What do you do when not at work.
A) My wife is an artist, and she keeps me busy helping out with events and projects. We live on a wild couple of acres in Washington and caring for that is a lot of fun too. Otherwise, I mostly read, cook and play the piano. I love cooking, although I’m not sure how good I am – my son is now a professional chef, so perhaps I had some influence. I play the piano badly, but I can lose myself in that. I read very well. I love to read poetry – and I struggle to read Chinese poetry in the original. It’s such a fascinating language, and the poetry is so complex and yet so simple. That will be a lifetime study.
Biography-
Donald Farmer is the Principal Program Manager, SQL Server Data Mining, at Microsoft Corp.