Home » Posts tagged 'how to' (Page 4)
Tag Archives: how to
Jill Dyche on 2012
In part 3 of the series for predictions for 2012, here is Jill Dyche, Baseline Consulting/DataFlux.
Part 2 was Timo Elliot, SAP at
http://www.decisionstats.com/timo-elliott-on-2012/
and Part 1 was Jim Kobielus, Forrester at
http://www.decisionstats.com/jim-kobielus-on-2012/
Ajay: What are the top trends you saw happening in 2011?
Well, I hate to say I saw them coming, but I did. A lot of managers committed some pretty predictable mistakes in 2011. Here are a few we witnessed in 2011 live and up close:
1. In the spirit of “size matters,” data warehouse teams continued to trumpet the volumes of stored data on their enterprise data warehouses. But a peek under the covers of these warehouses reveals that the data isn’t integrated. Essentially this means a variety of heterogeneous virtual data marts co-located on a single server. Neat. Big. Maybe even worthy of a magazine article about how many petabytes you’ve got. But it’s not efficient, and hardly the example of data standardization and re-use that everyone expects from analytical platforms these days.
2. Development teams still didn’t factor data integration and provisioning into their project plans in 2011. So we saw multiple projects spawn duplicate efforts around data profiling, cleansing, and standardization, not to mention conflicting policies and business rules for the same information. Bummer, since IT managers should know better by now. The problem is that no one owns the problem. Which brings me to the next mistake…
3. No one’s accountable for data governance. Yeah, there’s a council. And they meet. And they talk. Sometimes there’s lunch. And then nothing happens because no one’s really rewarded—or penalized for that matter—on data quality improvements or new policies. And so the reports spewing from the data mart are still fraught and no one trusts the resulting decisions.
But all is not lost since we’re seeing some encouraging signs already in 2012. And yes, I’d classify some of them as bona-fide trends.
Ajay: What are some of those trends?
Job descriptions for data stewards, data architects, Chief Data Officers, and other information-enabling roles are becoming crisper, and the KPIs for these roles are becoming more specific. Data management organizations are being divorced from specific lines of business and from IT, becoming specialty organizations—okay, COEs if you must—in their own rights. The value proposition for master data management now includes not just the reconciliation of heterogeneous data elements but the support of key business strategies. And C-level executives are holding the data people accountable for improving speed to market and driving down costs—not just delivering cleaner data. In short, data is becoming a business enabler. Which, I have to just say editorially, is better late than never!
Ajay: Anything surprise you, Jill?
I have to say that Obama mentioning data management in his State of the Union speech was an unexpected but pretty powerful endorsement of the importance of information in both the private and public sector.
I’m also sort of surprised that data governance isn’t being driven more frequently by the need for internal and external privacy policies. Our clients are constantly asking us about how to tightly-couple privacy policies into their applications and data sources. The need to protect PCI data and other highly-sensitive data elements has made executives twitchy. But they’re still not linking that need to data governance.
I should also mention that I’ve been impressed with the people who call me who’ve had their “aha!” moment and realize that data transcends analytic systems. It’s operational, it’s pervasive, and it’s dynamic. I figured this epiphany would happen in a few years once data quality tools became a commodity (they’re far from it). But it’s happening now. And that’s good for all types of businesses.
About-
Jill Dyché has written three books and numerous articles on the business value of information technology. She advises clients and executive teams on leveraging technology and information to enable strategic business initiatives. Last year her company Baseline Consulting was acquired by DataFlux Corporation, where she is currently Vice President of Thought Leadership. Find her blog posts on www.dataroundtable.com.
R Concerto- Computer Adaptive Tests
A really nice use for R is education
http://www.psychometrics.cam.ac.uk/page/300/concerto-testing-platform.htm
Concerto: R-Based Online Adaptive Testing Platform
Concerto is a web based, adaptive testing platform for creating and running rich, dynamic tests. It combines the flexibility of HTML presentation with the computing power of the R language, and the safety and performance of the MySQL database. It’s totally free for commercial and academic use, and it’s open source. If you have any questions, you feel like generously supporting the project, or you want to develop a commerical test on the platform, feel free to email Michal Kosinski.
We rely as much as possible on popular open source packages in order to maximize the safety and reliability of the system, and to ensure that its elements are kept up-to-date.
Why choose Concerto?
- Simple to use: Check our Step-by-Step tutorial to see how to create a test in minutes.
- Flexibility: You can use the R engine to apply virtually any IRT or CAT models.
- Scalability: Modular design, MySQL tables, and low system requirements allow the testing of thousands for pennies.
- Reliability: Concerto relies on popular, constantly updated, and reliable elements used by millions of users world-wide.
- Elegant feedback and items: The flexibility of the HTML layer and the power of R allow you to use (or generate on the fly!) polished multi-media items, as well as feedback full of graphs and charts generated by R for each test taker.
- Low costs: It’s free and open-source!

Demonstration tests:
Concerto explained:
- Project’s website at Google Code
- Tutorial: Create a simple test step-by-step
- Tutorial: Create an adaptive test with Concerto
- Concerto Manual
- CAT Concerto
Get Concerto:
Before installing concerto you may prefer to test it using a demo account on our server.Email Michal Kosinski in order to get demo account.
- Project hosted at: Google Code
Training in Concerto:
Next session 9th Dec 2011: book early!
Commercial tests and Concerto:
Concerto is an open-source project so anyone can use it free of charge, even for commercial purposes. However, it might be faster and less expensive to hire our experienced team to develop your test, provide support and maintenance, and take responsibility for its smooth and reliable operation. Contact us!
Pune Hackathon
message from Jimmy Wales and friends-
Pune Wikimedia hackathon.
Date: 10-12 February 2012
Venue: Symbiosis Institute of Computer Studies & Research (SICSR) at
Symbiosis International University, Pune,India
Extremely rough event page, soon to get more details:
https://www.mediawiki.org/wiki/Pune_Hackathon_Feb_2012
As you know, a Wikimedia hackathon is a chance to learn how to develop
using MediaWiki, Phonegap, and our other technologies, and to work
alongside experts. Software engineers, designers, and translators are
welcome. We’re tentatively planning to focus on internationalisation and
localisation, mobile Wikipedia access, and the JavaScript-based gadgets
framework.
Registration link:
http://is.gd/rjpNOA
If you’re interested, please register to request an invitation, and feel
free to publicize. Thanks!
Revolution Webinar Series #Rstats
Revolution Analytics Webinar-
|
Featured Webinar
|
![]() |
||
| David Champagne CTO, Revolution Analytics |
||
| Tuesday, December 20th | ||
| 11:00AM – 11:30AM Pacific Click here for the webinar time in your local time zone |
||
Traditional IT infrastructure is simply unable to meet
the demands of the new “Big Data Analytics” landscape. Many enterprises are turning to the “R” statistical programming language and Hadoop (both open source projects) as a potential solution. This webinar will introduce the statistical capabilities of R within the Hadoop ecosystem. We’ll cover:
- An introduction to new packages developed by Revolution Analytics to facilitate interaction with the data stores HDFS and HBase so that they can be leveraged from the R environment
- An overview of how to write Map Reduce jobs in R using Hadoop
- Special considerations that need to be made when working with R and Hadoop.
We’ll also provide additional resources that are available to people interested in integrating R and Hadoop.
|
Upcoming Webinars
|
| Wed, Dec 14th 11:00AM – 11:30AM PT |
Revolution R Enterprise – 100% R and MoreR users already know why the R language is the lingua franca of statisticians today: because it’s the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this webinar, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise. |
Free Machine Learning at Stanford
One of the cornerstones of the technology revolution, Stanford now offers some courses for free via distance learning. One of the more exciting courses is of course- machine learning
About The Course
This course provides a broad introduction to machine learning, datamining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
The Instructor
Professor Andrew Ng is Director of the Stanford Artificial Intelligence Lab, the main AI research organization at Stanford, with 20 professors and about 150 students/post docs. At Stanford, he teaches Machine Learning, which with a typical enrollment of 350 Stanford students, is among the most popular classes on campus. His research is primarily on machine learning, artificial intelligence, and robotics, and most universities doing robotics research now do so using a software platform (ROS) from his group.
- When does the class start?The class will start in January 2012 and will last approximately ten weeks.
- What is the format of the class?The class will consist of lecture videos, which are broken into small chunks, usually between eight and twelve minutes each. Some of these may contain integrated quiz questions. There will also be standalone quizzes that are not part of video lectures, and programming assignments.
- Will the text of the lectures be available?We hope to transcribe the lectures into text to make them more accessible for those not fluent in English. Stay tuned.
- Do I need to watch the lectures live?No. You can watch the lectures at your leisure.
- Can online students ask questions and/or contact the professor?Yes, but not directly There is a Q&A forum in which students rank questions and answers, so that the most important questions and the best answers bubble to the top. Teaching staff will monitor these forums, so that important questions not answered by other students can be addressed.
- Will other Stanford resources be available to online students?No.
- How much programming background is needed for the course?The course includes programming assignments and some programming background will be helpful.
- Do I need to buy a textbook for the course?No.
- How much does it cost to take the course?Nothing: it’s free!
- Will I get university credit for taking this course?No.Interested in learning machine learning-
Well here is the website to enroll
http://jan2012.ml-class.org/
Interview Zach Goldberg, Google Prediction API
Here is an interview with Zach Goldberg, who is the product manager of Google Prediction API, the next generation machine learning analytics-as-an-api service state of the art cloud computing model building browser app.
Ajay- Describe your journey in science and technology from high school to your current job at Google.
Zach- First, thanks so much for the opportunity to do this interview Ajay! My personal journey started in college where I worked at a startup named Invite Media. From there I transferred to the Associate Product Manager (APM) program at Google. The APM program is a two year rotational program. I did my first year working in display advertising. After that I rotated to work on the Prediction API.
Ajay- How does the Google Prediction API help an average business analytics customer who is already using enterprise software , servers to generate his business forecasts. How does Google Prediction API fit in or complement other APIs in the Google API suite.
Zach- The Google Prediction API is a cloud based machine learning API. We offer the ability for anybody to sign up and within a few minutes have their data uploaded to the cloud, a model built and an API to make predictions from anywhere. Traditionally the task of implementing predictive analytics inside an application required a fair amount of domain knowledge; you had to know a fair bit about machine learning to make it work. With the Google Prediction API you only need to know how to use an online REST API to get started.
You can learn more about how we help businesses by watching our video and going to our project website.
Ajay- What are the additional use cases of Google Prediction API that you think traditional enterprise software in business analytics ignore, or are not so strong on. What use cases would you suggest NOT using Google Prediction API for an enterprise.
Zach- We are living in a world that is changing rapidly thanks to technology. Storing, accessing, and managing information is much easier and more affordable than it was even a few years ago. That creates exciting opportunities for companies, and we hope the Prediction API will help them derive value from their data.
The Prediction API focuses on providing predictive solutions to two types of problems: regression and classification. Businesses facing problems where there is sufficient data to describe an underlying pattern in either of these two areas can expect to derive value from using the Prediction API.
Ajay- What are your separate incentives to teach about Google APIs to academic or researchers in universities globally.
Zach- I’d refer you to our university relations page-
Google thrives on academic curiosity. While we do significant in-house research and engineering, we also maintain strong relations with leading academic institutions world-wide pursuing research in areas of common interest. As part of our mission to build the most advanced and usable methods for information access, we support university research, technological innovation and the teaching and learning experience through a variety of programs.
Ajay- What is the biggest challenge you face while communicating about Google Prediction API to traditional users of enterprise software.
Zach- Businesses often expect that implementing predictive analytics is going to be very expensive and require a lot of resources. Many have already begun investing heavily in this area. Quite often we’re faced with surprise, and even skepticism, when they see the simplicity of the Google Prediction API. We work really hard to provide a very powerful solution and take care of the complexity of building high quality models behind the scenes so businesses can focus more on building their business and less on machine learning.




