Home » Posts tagged 'books' (Page 2)
Tag Archives: books
In part 3 of the series for predictions for 2012, here is Jill Dyche, Baseline Consulting/DataFlux.
Part 2 was Timo Elliot, SAP at http://www.decisionstats.com/timo-elliott-on-2012/ and Part 1 was Jim Kobielus, Forrester at http://www.decisionstats.com/jim-kobielus-on-2012/
Ajay: What are the top trends you saw happening in 2011?
Well, I hate to say I saw them coming, but I did. A lot of managers committed some pretty predictable mistakes in 2011. Here are a few we witnessed in 2011 live and up close:
1. In the spirit of “size matters,” data warehouse teams continued to trumpet the volumes of stored data on their enterprise data warehouses. But a peek under the covers of these warehouses reveals that the data isn’t integrated. Essentially this means a variety of heterogeneous virtual data marts co-located on a single server. Neat. Big. Maybe even worthy of a magazine article about how many petabytes you’ve got. But it’s not efficient, and hardly the example of data standardization and re-use that everyone expects from analytical platforms these days.
2. Development teams still didn’t factor data integration and provisioning into their project plans in 2011. So we saw multiple projects spawn duplicate efforts around data profiling, cleansing, and standardization, not to mention conflicting policies and business rules for the same information. Bummer, since IT managers should know better by now. The problem is that no one owns the problem. Which brings me to the next mistake…
3. No one’s accountable for data governance. Yeah, there’s a council. And they meet. And they talk. Sometimes there’s lunch. And then nothing happens because no one’s really rewarded—or penalized for that matter—on data quality improvements or new policies. And so the reports spewing from the data mart are still fraught and no one trusts the resulting decisions.
But all is not lost since we’re seeing some encouraging signs already in 2012. And yes, I’d classify some of them as bona-fide trends.
Ajay: What are some of those trends?
Job descriptions for data stewards, data architects, Chief Data Officers, and other information-enabling roles are becoming crisper, and the KPIs for these roles are becoming more specific. Data management organizations are being divorced from specific lines of business and from IT, becoming specialty organizations—okay, COEs if you must—in their own rights. The value proposition for master data management now includes not just the reconciliation of heterogeneous data elements but the support of key business strategies. And C-level executives are holding the data people accountable for improving speed to market and driving down costs—not just delivering cleaner data. In short, data is becoming a business enabler. Which, I have to just say editorially, is better late than never!
Ajay: Anything surprise you, Jill?
I have to say that Obama mentioning data management in his State of the Union speech was an unexpected but pretty powerful endorsement of the importance of information in both the private and public sector.
I’m also sort of surprised that data governance isn’t being driven more frequently by the need for internal and external privacy policies. Our clients are constantly asking us about how to tightly-couple privacy policies into their applications and data sources. The need to protect PCI data and other highly-sensitive data elements has made executives twitchy. But they’re still not linking that need to data governance.
I should also mention that I’ve been impressed with the people who call me who’ve had their “aha!” moment and realize that data transcends analytic systems. It’s operational, it’s pervasive, and it’s dynamic. I figured this epiphany would happen in a few years once data quality tools became a commodity (they’re far from it). But it’s happening now. And that’s good for all types of businesses.
Jill Dyché has written three books and numerous articles on the business value of information technology. She advises clients and executive teams on leveraging technology and information to enable strategic business initiatives. Last year her company Baseline Consulting was acquired by DataFlux Corporation, where she is currently Vice President of Thought Leadership. Find her blog posts on www.dataroundtable.com.
Events in the field of data that impacted us in 2011
1) Oracle unveiled plans for R Enterprise. This is one of the strongest statements of its focus on in-database analytics. Oracle also unveiled plans for a Public Cloud
2) SAS Institute released version 9.3 , a major analytics software in industry use.
3) IBM acquired many companies in analytics and high tech. Again.However the expected benefits from Cognos-SPSS integration are yet to show a spectacular change in market share.
2011 Selected acquisitions
Q1 Labs October 2011
Algorithmics September 2011i2 August 2011
Tririga March 2011
4) SAP promised a lot with SAP HANA- again no major oohs and ahs in terms of market share fluctuations within analytics.
5) Amazon continued to lower prices of cloud computing and offer more options.
6) Google continues to dilly -dally with its analytics and cloud based APIs. I do not expect all the APIs in the Google APIs suit to survive and be viable in the enterprise software space. This includes Google Cloud Storage, Cloud SQL, Prediction API at https://code.google.com/apis/console/b/0/ Some of the location based , translation based APIs may have interesting spin offs that may be very very commercially lucrative.
7) Microsoft -did- hmm- I forgot. Except for its investment in Revolution Analytics round 1 many seasons ago- very little excitement has come from MS plans in data mining- The plugins for cloud based data mining from Excel remain promising yet , while Azure remains a stealth mode starter.
8) Revolution Analytics promised us a GUI and didnt deliver (till yet ) . But it did reveal a much better Enterprise software Revolution R 5.0 is one of the strongest enterprise software in the R /Stat Computing space and R’s memory handling problem is now an issue of perception than actual stuff thanks to newer advances in how it is used.
9) More conferences, more books and more news on analytics startups in 2011. Big Data analytics remained a strong buzzword. Expect more from this space including creative uses of Hadoop based infrastructure.
10) Data privacy issues continue to hamper and impede effective analytics usage. So does rational and balanced regulation in some of the most advanced economies. We expect more regulation and better guidelines in 2012.
You can go to https://code.google.com/apis/console/b/0/
Unlike Android and other free stuff these APIs are very promising for revenue generation as some of them are very unique to Google itself, and already some are being offered on a Pricing Tier. There are 18 APIs in total with 3 APIs having Pricing while the rest are in beta stages.
I am just listing down all the APIs in one place – (more…)
If you do a Google search for Data Mining Blog- for the past several years one Blog will come on top. data mining blog – Google Search http://bit.ly/kEdPlE
To honor 5 years of Sandro Saitta’s blog (yes thats 5 years!) , we cover an exclusive interview with him where he reveals his unique sauce for cool techie blogging.
Sandro- My first experience with data mining was my master project. I used decision tree to predict pollen concentration for the following week using input data such as wind, temperature and rain. The fact that an algorithm can make a computer learn from experience was really amazing to me. I found it so interesting that I started a PhD in data mining. This time, the field of application was civil engineering. Civil engineers put a lot of sensors on their structure in order to understand how they behave. With all these sensors they generate a lot of data. To interpret these data, I used data mining techniques such as feature selection and clustering. I started my blog, Data Mining Research, during my PhD, to share with other researchers.
I then started applying data mining in the stock market as my first job in industry. I realized the difference between image recognition, where 99% correct classification rate is state of the art, and stock market, where you’re happy with 55%. However, the company ambiance was not as good as I thought, so I moved to consulting. There, I applied data mining in behavioral targeting to increase click-through rates. When you compare the number of customers who click with the ones who don’t, then you really understand what class imbalance mean. A few months ago, I accepted a very good opportunity at SICPA. I’m looking forward to resolving new challenges there.
Ajay- Your blog is the top ranked blog for “data mining blog”. Could you share some tips on better blogging for analytics and technical people
Sandro- It’s always difficult to start a blog, since at the beginning you have no reader. Writing for nobody may seem stupid, but it is not. By writing my first posts during my PhD I was reorganizing my ideas. I was expressing concepts which were not always clear to me. I thus learned a lot and also improved my English level. Of course, it’s still not perfect, but I hope most people can understand me.
Next come the readers. A few dozen each week first. To increase this number, I then started to learn SEO (Search Engine Optimization) by reading books and blogs. I tested many techniques that increased Data Mining Research visibility in the blogosphere. I think SEO is interesting when you already have some content published (which means not at the very beginning of your blog). After a while, once your blog is nicely ranked, the main task is to work on the content of the blog. To be of interest, your content must be particular: original, informative or provocative for example. I also had the chance to have a good visibility thanks to well-known people in the field like Kevin Hillstrom, Gregory Piatetsky-Shapiro, Will Dwinnell / Dean Abbott, Vincent Granville, Matthew Hurst and many others.
Ajay- Whats your favorite statistical software and what are the various softwares that you have worked with.
Could you compare and contrast these software as well.
Sandro- My favorite software at this point is SAS. I worked with it for two years. Once you know the language, you can perform ETL and data mining so easily. It’s also very fast compared to others. There are a lot of tools for data mining, but I cannot think of a tool that is as powerful as SAS and, in the same time, has a high-level programming language behind it.
I also worked with R and Matlab. R is very nice since you have all the up-to-date data mining algorithms implemented. However, working in the memory is not always a good choice, especially for ETL. Matlab is an excellent tool for prototyping. It’s not so fast and certainly not done for ETL, but the price is low regarding all the possibilities for data mining. According to me, SAS is the best choice for ETL and a good choice for data mining. Of course, there is the price.
Ajay- What are your favorite techniques and training resources for learning basics of data mining to say statisticians or business management graduates.
Sandro- I’m the kind of guy who likes to read books. I read data mining books one after the other. The fact that the same concepts are explained differently (and by different people) helps a lot in learning a topic like data mining. Of course, nothing replaces experience in the field. You can read hundreds of books, you will still not be a good practitioner until you really apply data mining in specific fields. My second choice after books is blogs. By reading data mining blogs, you will really see the issues and challenges in the field. It’s still not experience, but we are closer. Finally, web resources and networks such as KDnuggets of course, but also AnalyticBridge and LinkedIn.
Ajay- Describe your hobbies and how they help you ,if at all in your professional life.
Sandro- One of my hobbies is reading. I read a lot of books about data mining, SEO, Google as well as Sci-Fi and Fantasy. I’m a big fan of Asimov by the way. My other hobby is playing tennis. I think I simply use my hobbies as a way to find equilibrium in my life. I always try to find the best balance between work, family, friends and sport.
Ajay- What are your plans for your website for 2011-2012.
Sandro- I will continue to publish guest posts and interviews. I think it is important to let other people express themselves about data mining topics. I will not write about my current applications due to the policies of my current employer. But don’t worry, I still have a lot to write, whether it is technical or not. I will also emphasis more on my experience with data mining, advices for data miners, tips and tricks, and of course book reviews!
Standard Disclosure of Blogging- Sandro awarded me the Peoples Choice award for his blog for 2010 and carried out my interview. There is a lot of love between our respective wordpress blogs, but to reassure our puritan American readers- it is platonic and intellectual.
About Sandro S-
Sandro Saitta is a Data Mining Research Engineer at SICPA Security Solutions. He is also a blogger at Data Mining Research (www.dataminingblog.com). His interests include data mining, machine learning, search engine optimization and website marketing.
You can contact Mr Saitta at his Twitter address-
- HIGHLIGHTS from REXER Survey :R gives best satisfaction (decisionstats.com)
- Participate in the 2011 Rexer Data Mining Survey (r-bloggers.com)
- KDNuggets Survey on R (decisionstats.com)
- New book on BigData Analytics and Data mining using #Rstats with a GUI (decisionstats.com)
- Solmentum: Solar Meets Data Mining (gigaom.com)
- KDnuggets: R used in 1 in 4 analytics projects (revolutionanalytics.com)
Just updated Decisionstats.com
Note the new D logo on top of web page.
Also redesigned the following pages-
- About DecisionStats The page contains motivations and story behind creating it. I also cleaned up the privacy declarations and the flat rate for ad /sponsors.
- Ajay @ arts Updated my art /design portfolio. Apparently I wrote a lot of poems (11) in 2010. Also added my 3 poetry books for e downloads, 1 cartoon mini book, many more videos, 1 play, some photography.
- R Package Creating- Plans for me to write some code. By the time I get down to actually writing some stuff, someone has already created a package for it. Next I plan to create a GUI for BI dashboards, and a Web analytics package (thats your hint to start writing….;)
- Some general SEO stuff. Meta Tags. Traffic Analysis. Linking. Blinking.
- HIRE ME
Heres a new contest for SAS users. The prizes are books, so students should be interested as well.
|New Points for Prizes Contest|
Win SAS books!
|Contribute content or SAS code to sasCommunity.org for your chance to WIN! To qualify, simply add or edit articles between April 11, 2011 and May 9, 2011 (GMT). Creation of a first-time profile on sasCommunity.org gives you 1,000 points. For each valid article creation or edit, 100 points will be earned. The user with the most points collected during this time wins SAS Press Books!
|Become a sasCommunity Guru|
|The sasCommunity support team has been hard at work adding new features and is pleased to announce a points system that recognizes each user’s contributions to the site. Every time you contribute by creating a page, updating it, or just doing a little wiki gardening, you earn points.Earning points is automatic and simple – all you have to do is contribute! Creating your account starts you with 1000 points and all the current users have been credited with points dating back to the site coming online in April 2007.
- Greenplum & SAS Pair on ‘Big Analytics’ (java.sys-con.com)
- Nordstrom and the SAS Institute (sarahlynnwheeler.wordpress.com)
- SAS programming (bravecricket.wordpress.com)
- The Popularity of Data Analysis Software (R vs SAS vs SPSS, etc.) (r-bloggers.com)