As you look down the road, what are the three major challenges you see for vendors who keep trying to solve big data and other “now” problems with old tools?
Old tools and traditional architectures cannot scale effectively to handle massive data volumes that reach hundreds of terabytes, nor can they process large data volumes with high performance. Further, they are restricted to what SQL querying allows. The three challenges I have noted are:
First, performance: specifically, poor performance on large data volumes and heavy workloads. Pre-existing systems rely on storing data in a traditional DBMS or data warehouse and then extracting a sample of data to a separate processing tier. This greatly restricts data insights and analytics, because only a sample of the data is analyzed and understood. As more data is stored in these systems, they suffer from performance degradation as more users try to access the system concurrently. Additionally, moving large volumes of data out of the traditional DBMS to a separate processing tier adds latency and slows down analytics and response times. This pre-existing architecture greatly limits performance, especially as data sizes grow.
Second, limited analytics: pre-existing systems rely mostly on SQL for data querying and analysis. SQL poses several limitations and is not well suited to ad hoc querying, deep data exploration and a range of other analytics. MapReduce overcomes the limitations of SQL, and SQL-MapReduce in particular opens up a new class of analytics that cannot be achieved with SQL alone.
Third, limitations on the types of data that can be stored and analyzed: traditional systems are not designed for non-relational or unstructured data. New solutions such as Aster Data’s are designed from the ground up to handle both relational and non-relational data. Organizations want to store and process a range of data types, and to do so on a single platform. New solutions allow different data types to be handled on a single platform, whereas pre-existing architectures and solutions are specialized around a single data type or format; this restricts the diversity of analytics that can be performed on these systems.
Here is an interview with Karen Lopez, who has worked in data modeling for almost three decades and is a renowned data management expert.
Data professionals need to know about the data domain in addition to the data structure domain – Karen Lopez
Ajay- Describe your career in science. How would you persuade younger students to take more science courses?
Karen- I’ve always had an interest in science, and I attribute that to the great science teachers I had. I studied information systems at Purdue University through a unique program that focuses on systems analysis and computer technologies. I’m one of the few who studied data and process modeling in an undergraduate program 25+ years ago.
I believe that it is very important that we find a way of attracting more scientists to teach. In both the natural and computer sciences, it’s difficult for institutions to tempt scientists away from professional positions that offer much greater compensation. So I support programs that find ways to make that happen.
Ajay- If you could give a young person starting their career in BI just three pieces of advice, what would they be?
Karen- Wow. It’s tough to think of just three things, but these are recommendations that I make often:
– Remember that every design decision should be made based on cost, benefit, and risk. If you can’t clearly describe these for every side of a decision, then you aren’t doing design; you are guessing.
– No one besides you is responsible for advancing your skills and keeping an eye on emerging practices. Don’t expect your employer to lay out a career plan that is in your best interest. That’s not their job. Data professionals need to know about the data domain in addition to the data structure domain. The best database or data warehouse design in the world is worse than useless if the data is processed the wrong way. Remember to expand your knowledge about data, not just the data structures and tools.
– All real-world work involves collaboration and negotiation. There is no one right answer that works for every situation. Building your skills in these areas will pay off significantly.
Ajay- What do you think is the best way for a technical consultant and a client to be on the same page regarding requirements? Which methodologies or templates have you used, and which has given you the most success?
Karen- While I’m a huge fan of modeling (data modeling and other modeling), I still think that giving clients a prototype or mockup of something that looks real to them goes a long way. We need to build tools and competencies to develop these prototypes quickly. It’s a lost art in the data world.
Ajay- What special incentives make Canada a great place for tech entrepreneurs, compared with, say, the United States? (Disclaimer: I have family in Canada and study in the US.)
Karen- I prefer not to think of this as an either-or decision. I immigrated to Canada from the US about 15 years ago, but most of our business is outside of Canada. I have enjoyed special incentives here in Canada for small businesses as well as special programs that allowed me to work in Canada as a technical professional before I moved here permanently.
Overall, I have found Canadian employers more open to sponsoring foreign workers, and it is easier for them to do so than it is for my US clients. Having said that, a significant portion of my work over the last few years has been on global projects where we leverage online collaboration tools to meet our goals. The advent of these tools has made it much easier to work from wherever I am and to work with others regardless of their visa status.
Where a company forms is less tied to where one lives or works these days.
Ajay- Could you tell us more about the Zachman Framework (apart from the Wikipedia reference)? A practical example of how you used it on an actual project would be great.
Karen- There are many misunderstandings about John Zachman’s intent, such as the myth that he requires big upfront modeling (he doesn’t), that the Framework is a methodology (it isn’t), or that it can only be used to build computer systems (it can be used for more than that).
I have used the Zachman Framework to develop a joint Business-IT Strategic Information Systems Plan as well as to inventory and track progress of multi-project programs. One interesting use was a paper I authored for the Canadian Information Processing Society (CIPS) on how various educational programs, specializations, and certifications map to the Zachman Framework. I later developed a presentation about this mapping for a Zachman conference.
For a specific project, the Zachman Framework allows the business to understand where its enterprise assets are being managed, and how well they are managed. It’s not an IT thing; it’s an enterprise architecture thing.
Ajay- What does Karen Lopez do for fun when not at work, traveling, speaking or blogging?
Karen- Sometimes it seems that’s all I do. I enjoy volunteering for IT-related organizations such as DAMA and CIPS. I participate in the accreditation of college and university educational programs in Canada and abroad. As a member of data-related standards bodies, namely the Association for Retail Technology Standards and the American Dental Association, I help develop industry standard data models. I’ve also been a spokesperson for a CIPS program to encourage girls to take more math and science courses throughout their student careers so that they may have access to great opportunities in the future.
I like to think of myself as a runner; last year I completed my first half marathon, which I never thought was possible. I am studying Hindi and Sanskrit. I’m also addicted to reading, and I am thankful that I actually get paid to do some of it.
Karen López is a Senior Project Manager at InfoAdvisors, Inc. Karen is a frequent speaker at DAMA conferences and DAMA Chapters. She has 20+ years of experience in project and data management on large, multi-project programs. Karen specializes in the practical application of data management principles. Karen is also the ListMistress and moderator of the InfoAdvisors Discussion Groups at http://www.infoadvisors.com. You can reach her at www.twitter.com/datachick