What's my website traffic, Dude?

Here are some website traffic numbers for potential server cost sharers. The server is now slow despite caching being enabled and the server RAM being ramped up, and the cost is way beyond my student budget. Please bear with me and do continue to visit.

Thanks to you, I have managed to ramp up to over 20,000 visits a month (the Aug figure is incomplete and runs only until the 28th). This feels quite good considering that I decided to move to full-time blogging just 3 months ago from the earlier consulting-blog mode. I am now going back to school as a student, and the blog is expected to get better in quality (and lower in quantity) as I hope to get some homework done after finding some money to buy textbooks.

Month      Unique visitors   Number of visits   Pages     Hits      Bandwidth
May 2009   1527              3775               20903     42640     1000.02 MB
Jun 2009   4097              9082               55946     139918    6.39 GB
Jul 2009   17364             24532              126613    293489    9.18 GB
Aug 2009   11788             21909              114317    255289    9.25 GB
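
For anyone who wants to play with these numbers, here is a minimal Python sketch that derives a few ratios from the table. The figures are copied from the table above; the names TRAFFIC, ratios and growth_percent are made up for this illustration, and the Aug row is incomplete, so treat its growth figure with care.

```python
# Monthly figures copied from the traffic table above (Aug 2009 is partial).
TRAFFIC = {
    "May 2009": {"unique": 1527, "visits": 3775, "pages": 20903, "hits": 42640},
    "Jun 2009": {"unique": 4097, "visits": 9082, "pages": 55946, "hits": 139918},
    "Jul 2009": {"unique": 17364, "visits": 24532, "pages": 126613, "hits": 293489},
    "Aug 2009": {"unique": 11788, "visits": 21909, "pages": 114317, "hits": 255289},
}


def ratios(row):
    """Return simple engagement ratios for one month of traffic."""
    return {
        "visits_per_visitor": row["visits"] / row["unique"],
        "pages_per_visit": row["pages"] / row["visits"],
    }


def growth_percent(previous, current):
    """Percentage change from one month's figure to the next."""
    return (current - previous) / previous * 100


if __name__ == "__main__":
    months = list(TRAFFIC)
    for name in months:
        r = ratios(TRAFFIC[name])
        print(f"{name}: {r['visits_per_visitor']:.2f} visits/visitor, "
              f"{r['pages_per_visit']:.2f} pages/visit")
    for prev, curr in zip(months, months[1:]):
        change = growth_percent(TRAFFIC[prev]["visits"], TRAFFIC[curr]["visits"])
        print(f"{prev} -> {curr}: {change:+.1f}% visits")
```

Pages per visit is the more stable of these ratios; the visit counts themselves swing a lot from month to month.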

Interview Dylan Jones DataQualityPro.com

Here is an interview with Dylan Jones, the founder/editor of Dataqualitypro.com, the site to go to for anything related to Data Quality discussions. Dylan is a great, charming person and in this interview talks candidly about his views.

Ajay: Describe your career in science and in business intelligence. How would you convince young students to take more maths and science courses for scientific careers?

Dylan: My main education for the profession was a degree in Information Technology and Software Development. No surprises what my first job entailed – software development for an IT company!

That role took me straight into the trials and tribulations of business intelligence and data quality. After a couple of years I went freelance and have pretty much worked for myself ever since. There has been a constant thread of data quality, business intelligence and data migration throughout my career which culminated in me setting up the more recent social media initiatives to try and pull professionals together in this space.

In all honesty, I’m probably the worst person to give career advice Ajay as I’m a hopeless dreamer. I’ve never really structured my career. I fell into data quality early on and it has led me to work in some wonderful places and with some great people, largely by accident and fate.

I have a simple philosophy, do what you love doing. I’m incredibly lucky to wake up every day with an absolute passion for what I do. In the past, whenever I have found myself working in a situation that I find soul destroying (and in our profession that can happen regularly) I move on to something new.

So, my advice for people starting out would be to first question what makes them happy in life. Don’t simply follow the herd. The internet has totally transformed the rules of the game in terms of finding an outlet for your skills so follow your heart, not conventional wisdom.

That said, I think there are some core skills that will always provide a springboard. Maths is obviously one of those skills that can open many doors but I would also advise people to learn about marketing, sales and other business fundamentals. From a business intelligence perspective it really adds an attractive dimension to your skills if you can link technical ability with a deeper understanding of how businesses operate.

Ajay- You are a top expert and publisher on BI topics. Tell us something about

a) http://www.datamigrationpro.com/

b) http://www.dataqualitypro.com/

c) Involvement with the DataFlux community of experts

d) Your latest venture http://www.dqvote.com

Dylan- Data Migration Pro was my first foray into the social media space. I realised that very few people were talking about the challenges and techniques of data migration. On average, large organisations implement around 4 migration projects a year and most end in failure. A lot of this is due to a lack of awareness. Having worked for so long in this space I felt it was time to create a social media site to bring the wider community together. So we now have forums, regular articles, tools and techniques on the site with about 1400 members worldwide plus lots of plans in the pipeline for 2010.

Data Quality Pro followed on from the success of Data Migration Pro and our speed of growth really demonstrates how important data quality is right now. Again, awareness of the basic techniques and best-practices is key. I think many organisations are really starting to recognise the importance of better data quality management practices so a lot of our focus is on giving people practical advice and tools to get started. We are a community publishing platform, I do write regularly but we’ve always had a significant community contribution from expert practitioners and authors.

I didn’t just want to take a corporate viewpoint with these communities. As a result they are very much focused on the individual. That is why we post so many features on how to promote your skills, search for work, gain personal skills and generally get ahead in the profession. Data Quality Pro has just under 2,000 members and about 6,000 regular visitors a month so it demonstrates just how many people are really committed to learning about this discipline as it impacts practically every part of the business. I also think it is an excellent career choice as so many projects are dependent on good quality data there will always be demand.

The DataFlux community of experts is a great resource that I’ve actually admired for some time. I am a big fan of Jill Dyche who used to write on the community and of course there is a great line-up on there now with experts like David Loshin, Joyce Norris-Montanari and Mike Ferguson so I was delighted to be invited to participate. DataFlux have sponsored our sites from the very beginning and without their support we wouldn’t have grown to our current size. So although I’m vendor independent, it’s great to be sharing my thoughts and ideas with people who visit their site.

DQVote.com is a relatively new initiative. I noticed that there was some great data quality content being linked through platforms like Twitter but it would essentially become hard to find after several days. Also, there was no way for the community to vote on what content they found especially useful. DQVote.com allows people to promote their own content but also to vote and share other useful data quality articles, blogs, presentations, videos, tutorials – anything that adds value to the data quality community. It is also a great springboard for emerging data quality bloggers and publishers of useful content.

Ajay- Do you think BI projects can be more successful if we reward data entry people, or at least pay more for better quality data rather than ask them to fill in database tables as fast as they can? Especially in offshore call centres.

Dylan- Data entry is a pet frustration of mine. I regularly visit companies who are investing hundreds of thousands of pounds in data quality technology and consultants but nothing in grass-roots education and cultural change. They would rather create cleansing factories than resolve the issues at source.

So, yes I completely agree, the reward system has to change. I personally suffer from this all the time – call centre staff record incorrect or incomplete information about my service or account and it leads to billing errors, service problems, annoyance and eventually lost business. Call centre staff are not to blame, they are simply rewarded on the volume of customer service calls they can make, they are not encouraged to enter good quality data. The fault ultimately lies with the corporations that use these services and I don’t think offshore or onshore makes a difference. I’ve witnessed terrible data quality in-house also. The key is to have service level agreements on what quality of data is acceptable. I also think a reward structure as opposed to a penalty structure can be a much more progressive way of improving the quality of call-centre data.

Ajay- What are the top 5 points that summarize your views on Business Intelligence? Assume you are speaking to a class of freshmen statisticians.

Dylan- Business intelligence is wholly dependent on data quality. Accessibility, timeliness, accuracy, completeness, duplication – data quality dimensions like these can dramatically change the value of business intelligence to the organisation. Take nothing for granted with data, assume nothing. I have never, ever, assessed a dataset in a large business that did not have some serious data defects that were impacting decision making.

As statisticians, they therefore possess the tools to help organisations discover and measure these defects. They can find ways to continuously improve and ensure that future decisions are based on reliable data.

I would also add that business intelligence is not just about technology, it is about interpreting data to determine trends that will enable a company to improve their competitive advantage. Statistics are important but freshmen must also understand how organisations really create value for their customers.

My advice is to therefore step away from the tools and learn how the business operates on the ground. Really listen to workers and customers as they can bring the data to life. You will be able to create far more accurate dashboards and reports of where the issues and opportunities lie within a business if you immerse yourself with the people who create the data and the senior management who depend on the quality of your business intelligence platforms.
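
(Note from Ajay- As a rough illustration of two of the dimensions Dylan lists, completeness and duplication, here is a minimal Python sketch. It is not Dylan's method or any vendor's tool. The sample records, the field names and the helper names completeness and duplicate_rate are all invented for this example, and real matching of duplicates is far fuzzier than the exact comparison used here.)

```python
from collections import Counter


def completeness(records, fields):
    """Share of records that have a non-empty value for each field."""
    total = len(records)
    result = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result


def duplicate_rate(records, key_fields):
    """Share of records that repeat another record on the key fields.

    Values are trimmed and lower-cased first, which is only a crude
    stand-in for real record matching.
    """
    keys = Counter(
        tuple(str(r.get(f) or "").strip().lower() for f in key_fields)
        for r in records
    )
    extras = sum(count - 1 for count in keys.values())
    return extras / len(records) if records else 0.0


# Invented sample data: one near-duplicate, one missing email, one empty phone.
SAMPLE = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": "555-0100"},
    {"name": "ann lee ", "email": "ANN@example.com", "phone": ""},
    {"name": "Bob Ray", "email": None, "phone": "555-0101"},
    {"name": "Cy Dow", "email": "cy@example.com", "phone": "555-0102"},
]

if __name__ == "__main__":
    print(completeness(SAMPLE, ["name", "email", "phone"]))
    print(duplicate_rate(SAMPLE, ["name", "email"]))
```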

Ajay- Which software have you personally coded or implemented? Which one did you like the best and why?

Dylan- I’ve used most of the BI and DQ tools out there, all have strengths and weaknesses so it is very subjective. I have my favourites but I try to remain vendor neutral so I’ll have to gracefully decline on this one Ajay!

However, I did build a data profiling and data quality assessment tool several years ago. To be honest, that is the tool I like best because it had a range of features I still haven’t seen implemented so far in any other tools. If I ever get the chance, and if no other vendor comes up with the same concept, I may yet take it to market. For now though, two young kids, two communities and a 12 hour day mean it is something of a pipedream.

Ajay- What does Dylan Jones do when not helping the world's data quality get better?

Dylan- I’ve recently had another baby boy so kids take up most of whatever free time I have left. When we do get a break though I like to head to my home town and just hang out on the beach or go up into the mountains. I love travelling and as I effectively work completely online now, we’re really trying to figure out a way of combining travel and work.

Biography-

Dylan Jones is the founder and editor of Data Quality Pro and Data Migration Pro, the leading online expert community resources. Since the early nineties he has been helping large organisations tackle major information management challenges. He now devotes his time to fostering greater awareness, community and education in the fields of data quality and data migration via the use of social media channels. Dylan can be contacted via his profile page at http://www.dataqualitypro.com/data-quality-dylan-jones/ or at http://www.twitter.com/dataqualitypro

SAP caught stealing patents: Pays $139 Million

Curt Monash was right. SAP does have questionable business ethics. It has been caught stealing the ideas of as many as 5 patents and has been told to pay $139 million.

How many more patents does SAP have in its closet (wink wink)? By funding blogs and blog communities, how much time is SAP trying to buy, while raising prices arbitrarily for locked-in customers and using the one-time gain to buy companies with better decision management pedigrees?

Don’t believe me, huh? Here is PC World, or just google/bing for SAP, patent lawsuit

http://www.pcworld.com/businesscenter/article/170899/versata_wins_139m_damages_in_sap_patent_lawsuit.html

Software HIStory: Bass Institute Part 1

or How SAS Institute needs to take competition from WPS (SAS language compiler) in an alliance with IBM, and from R (open source predictive analytics with tremendous academic support), and financial pressure from Microsoft and SAP, more seriously.

Over the weekend, I ran into Jeff Bass, owner of BASS Institute. BASS Institute provided a SAS-like compiler in the 1980s that was very light compared to the then clunky SAS (which used multiple floppies) and sold many copies. It ran out of money when the shift to PCs happened and SAS Institute managed to get there first.

Today the shift is happening to cloud computing, and though SAS has invested 70 million in it, it still continues to SUPPORT Microsoft by NOT supporting, or even offering financial incentives for, customers who use Ubuntu Linux server and Ubuntu Linux desktop. For academic students it charges $25 per Windows license, and thus helps sell many more copies of Windows Vista. Why does it not give the Ubuntu Linux version free to students? Why does SAS Institute continue to give the online doc free to people who use its language and undercut it? More importantly, why does SAS charge LESS money for excellent software in the BI space? It is one of the best and cheapest BI software packages, yet the most expensive desktop software. Why does SAS Institute not support Hadoop and Map/Reduce database systems instead of focusing on Oracle and Teradata relationships and feelings?

Anyways, back to Jeff Bass- This is part 1 of the interview.

Ajay- Jeff, tell us all about the BASS Institute?

Jeff-

The BASS system has been off the market for about 20 years and is an example of old, command line, DOS based software that has been far surpassed by modern products – including SAS for the PC platform.  It was fun providing a “SAS like” language for people on PCs – running MS DOS – but I scrapped the product when PC SAS became a reasonably useable product and PC’s got enough memory and hard disk space.
 
BASS was a SAS “work alike”…it would run many (but certainly not all) SAS programs with few modifications.  It required a DOS PC with 640K of RAM and a hard disk with 1MB of available space.  We used to demo it on a Toshiba laptop with NO hard disk and only a floppy drive.  It was a true compiler that parsed the data / proc step input code and generated 8086 assembly language that went through mild optimization, and then executed.
 
I no longer have the source code…it was saved to an ancient Irwin RS-232 tape drive onto tapes that no longer exist…it is fun how technology has moved on in 20 years!  The BASS system was written in Microsoft Pascal and the code for the compiler was similar to the code that would be generated by the Unix YACC “compiler compiler” when fed the syntax of the SAS data step language.  BASS included the “DATA Step” and the most basic PROCS, like MEANS, FREQ, REG, TTEST, PRINT, SORT and others.  Parts of the system were written in 8086 assembler (I have to smile when I remember that).  If I was to recreate it today, I would probably use YACC and have it produce R source code…but that is an idea I am never likely to spend any time on.
 
We sold quite a few copies of the software and BASS Institute, Incorporated was a going concern until PC SAS became debugged and reliable.  Then there was no point in continuing it.  But I think it would be fun for someone to write a modern open source version of a SAS compiler (the data step and basic procs were developed in the public domain at NC State University before Sall and Goodnight took the company private, so as long as no copyrighted code was used in any way, an open source compiler would probably be legal).
 
I still use SAS (my company has an enterprise license), but only very rarely.  I use R more often and am a big fan of free software (sometimes called open source software, but I like the free software foundation’s distinction at fsf.org).  I appreciated your recommendation of the book “R for SAS and SPSS Users” on your website.  I bought it for my Kindle immediately upon reading about it on your website.

I no longer work in the software world; I’m a reimbursement and health policy director for the biotech firm Amgen, where I have worked since 1990 or so…  I also serve on the boards of a couple of non-profit organizations in the health care field.
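
(Note from Ajay- Jeff mentions that a modern rewrite might translate SAS-like code into R source. Below is a toy Python sketch of that idea, and nothing more. It is not how BASS worked, it uses a single regular expression rather than a YACC-generated parser, and it only handles a bare PROC MEANS with a VAR statement, emitting R code for the means alone. The function name proc_means_to_r is made up for this example.)

```python
import re

# Matches only: PROC MEANS DATA=<name>; VAR <v1> <v2> ...; RUN;
PROC_MEANS = re.compile(
    r"proc\s+means\s+data\s*=\s*(\w+)\s*;\s*var\s+([\w\s]+?)\s*;\s*run\s*;",
    re.IGNORECASE,
)


def proc_means_to_r(sas_source):
    """Translate one simple PROC MEANS step into a line of R source."""
    match = PROC_MEANS.search(sas_source)
    if match is None:
        raise ValueError("only a bare PROC MEANS with a VAR statement is handled")
    dataset = match.group(1)
    variables = match.group(2).split()
    columns = ", ".join(f'"{v}"' for v in variables)
    # Emit R that computes the column means, ignoring missing values.
    return f"sapply({dataset}[, c({columns}), drop = FALSE], mean, na.rm = TRUE)"


if __name__ == "__main__":
    print(proc_means_to_r("PROC MEANS DATA=class; VAR height weight; RUN;"))
```

A real translator would need a proper grammar for the data step and each proc, which is where a parser generator earns its keep.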


Ajay- Any comments on WPS?

Jeff- I’m glad WPS is out there.  I think alternatives help keep the SAS folks aware that they have to care about competition, at least a little 😉

(Note from Ajay-

You can see more on WPS at http://www.teamwpc.co.uk/home

and on SAS at http://www.sas.com/)


MS Smacks Google Docs with Slideshare

Our favorite dropouts from the PhD program just learned that they should not moon the giant. The company founded in the Paul Allen building at Stanford, also known as Googol/Google, announced with much fanfare that they would create a Cloud OS, only to find their own cloud productivity offering, Google Docs, bested by SlideShare.

Now you can import your Gmail attachments and Google Docs into SlideShare, for much better professional sharing within your office.

Here is an embedded SlideShare ppt called Google Hacks. Note the much better visual appeal of this vis-a-vis your Google Docs.

Well, as for the Stanford dropouts, this is what happens when you don't complete your PhD education.

Citation- http://www.slideshare.net/rickdog/google-hacks

As far as cloud computing and office productivity go,

Harvard Dropouts (Microsoft) 1- Stanford Dropouts ( Google) 0

Unless Google creates a cloud version of Open Office, but who needs that anyway?

Who needs search? Just Ctrl+F.

Disclaimer- The author uses Google Docs extensively. If you are from Google, please do not block his Gmail ID, guys.
Academic Disclaimer- The author intends to complete his PhD. These are his personal views only.

OT-The Dude of Data

I am creating a new website, www.dudeofdata.com, which will be more irreverent and more sarcastic about corporate hypocrisy in analytics and statistics. It will feature the hack kings of mashing software and would be expunged for lots of stuff. The strategic intention is to make statistics “cool” for my fellow hosts in America, as I see fewer American faces in science blocks than East Asian faces.

You can follow the twitter page www.twitter.com/dudeofdata in the meantime for the launch. It should take me less than an hour to set up the WordPress install and CMS, but my hours are spent on classes, homework, washing dishes and cooking food (and also learning rock climbing and playing soccer with the people).

In the meantime here is a salute to Airlines- I recently flew 30 hours from India to USA and can relate.