WordPress.com and Facebook.com Web Stats

WordPress.com stats are quite nice and easy for a blogger to maintain, as there are no plugins to manage and, most importantly, the system ignores your own logins and spurious traffic from web spiders or crawlers.

So while the daily average for the site is 140 views (or ~100 unique people), adding the 600 daily newsletter subscribers (around 200 reads) gives a readership of almost 300 per day. No wonder my old 13-dollars-a-month server could not cope.
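The arithmetic behind that estimate, as a quick sketch. The figures are the ones quoted above; the variable names are just mine for illustration.

```python
# Figures quoted in the post; the names are illustrative only.
site_unique_readers = 100      # ~100 unique people behind 140 daily views
newsletter_subscribers = 600   # daily newsletter recipients
newsletter_reads = 200         # roughly 200 of those actually read it

# Readership counts people who read, not mails sent or pages served.
daily_readership = site_unique_readers + newsletter_reads
newsletter_read_rate = newsletter_reads / newsletter_subscribers

print(daily_readership)
print(round(newsletter_read_rate, 2))
```

This assumes site readers and newsletter readers do not overlap, which is probably why "almost 300" is the safer way to put it.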

If you have a blog and use WordPress, WordPress.com is thus both a cheaper and a more traffic-generating option.

I love the Facebook.com stats for the FB page though – the segmentations are quite neat, and interactions have a chance to spill into personal networking between people with common interests.

Big Data in the Big Apple

I am all set to go to the Big Data Summit on October 1 in New York.

It focuses on Aster Data and their passion and success in crunching data faster than the speed of thought. Interesting stuff includes Map/Reduce, Hadoop, Big Data and people who have experience with them.

As a graduate student who is about to start his thesis on statistical (regression, EM algorithm, k-means clustering) computation (chunking and aggregation) of mixed data:

  • structured (row and column numbers) and
  • unstructured (text) using enterprise-specific search

in a computing environment that

  • uses HPC nodes and a mixed grid of desktops and idle web servers on the Internet

I find seminars like these are the only way to learn cutting-edge stuff.
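To give a flavour of what "chunking and aggregation" means in practice, here is a minimal sketch in Python. It is only an illustration and not the thesis method: the data is split into chunks, each chunk (think of it as one node on the grid) returns a small partial summary, and the summaries are combined into a global mean and variance. The function names and the sample numbers are made up for the example.

```python
from typing import Iterable, Iterator, List, Tuple

Partial = Tuple[int, float, float]  # (count, sum, sum of squares)


def chunks(values: List[float], size: int) -> Iterator[List[float]]:
    """Split the data into consecutive chunks of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def summarize(chunk: List[float]) -> Partial:
    """Work done independently on each chunk (e.g. on one grid node)."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def combine(partials: Iterable[Partial]) -> Tuple[int, float, float]:
    """Aggregate the partial summaries into count, mean and population variance."""
    n, total, total_sq = 0, 0.0, 0.0
    for count, s, sq in partials:
        n += count
        total += s
        total_sq += sq
    mean = total / n
    variance = total_sq / n - mean ** 2
    return n, mean, variance


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
result = combine(summarize(c) for c in chunks(data, size=3))
print(result)
```

The sum-of-squares formula is the simplest one to explain but can lose precision on large or badly scaled data, so a real implementation would likely use a more stable merge of per-chunk means and variances.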

I will also be at the SAS Data Mining 2009 Conference in Las Vegas in October. Hope to see you there.

At both conferences I will be interviewing people (preferably using video, with someone else to ask the questions, as my spoken accent is very bad). Also, rather sheepishly, I will be giving an interview at the Big Data Summit. I have given only two interviews till now:

One for my mentor Vincent Granville, founder of Analyticbridge, here in April 2008, and the other on my poetry.

So I guess the interview is on the other side 🙂

StiMulating Conversation

Stimulating conversation is the bait

Stimulating conversation is the bait

Lure the curious monkey to his zoo like fate

Come stimulate the conversation for a while

Amuse us O Exotic one, with your pungent style.

We are all egalitarian, at least we have to pretend

This is the American south, don't you comprehend.

Stay quiet and keep shut, do your job, move your nut

Our patience is as deep as the color in our skin,

Go ahead and slave for us, lest we begin

There are trees in Tennessee, tall enough to hang you

Curiosity killed the cat- it will noose the monkey too.

(Inspired by a real-life incident)


Analyzing Monkeys

I once promised a reader, a long time back, that I would not get into politics, but something unexpected hit me like a big truck.

At what point do you decide your boss is a racist? How do you analyze the difference between jokes and racial insults?

Another interesting analysis

Citation Emerald

Interview Augusto Albeghi (Straycat) —Founder Straysoft

An interview with Augusto (StrayCat), a startup entrepreneur with an interesting technology, StraySoft.

Ajay- Describe your career as a BI consultant.

Straycat- I’m an aerospace engineer who had to turn to IT right after graduation because of the Italian aerospace industry crisis in the first half of the 90’s. My first job was with the company now called Accenture, as a simple developer. I was part of a large project for a large US food corporation.

We built an enterprise level reporting and budgeting system based on what was later to become Hyperion. After that I had various experiences, always as an IT professional, always focusing on BI or related subjects. I worked for the Milan Airport Authority, the l’Oreal group and a couple of local software houses. Now I’m a project manager at a large Italian consulting firm but, most of all, I’m a bootstrapping entrepreneur.

Ajay- How do you think we can teach BI at an early stage to young students?

Straycat- I think that the main problem resides in the naïve university approach toward business data analysis. Collecting data is considered trivial compared to other related subjects.

Data availability is often taken for granted, and then equations are written on top of the data. Needless to say, it is not trivial at all, and there is an entire class of problems which students are not aware of.

A few lessons spent focusing on data quality, aggregations, measure definitions etc. are enough to create the necessary awareness of the problem. It’s no longer cool to claim ignorance of the subject!

Ajay- Describe the most challenging project you ever did. Name a project which led to the biggest dollar impact.

Straycat- About three years ago we signed a contract with a large fashion firm here in Italy to reengineer their entire business intelligence setup. It has been a project ranging from sales to production, from accounting to human resources.

It impacted almost a thousand users in six different time zones. The main challenge we had to tackle was the fragmentation of their legacy BI systems, which produced different jargon and practices across the corporation. We changed the database and the presentation layer, built a modern datawarehouse, and worked relentlessly on change management.

I can’t disclose figures but the new unified system shed light on some bad practices, revealed inefficiencies and provided a whole new set of analytics that increased market awareness.

Ajay- Describe your start-up StraySoft and what it is hoping to accomplish.

Straycat-

StraySoft is a small and fresh startup devoted to building Business Intelligence applications.

It produces Viney@rd, an Excel/SQL Server based spreadsheet automation and BI tool.

I have personal reasons for embarking on such a project but the kick off came from a sudden realization. Despite the terrific sophistication level provided by current BI tools, the one thing each and every user wants is to have data in MS Excel.

This is simply a fact: users get data, work on it and make Excel reports. It’s not a matter of features; people feel in full control only when they have an Excel file.

Why? Because Excel is able to address a single cell, and the figures within can be adjusted at will and saved in a familiar place like C:.

So, the original idea (two and a half years ago) was to create a tool to refresh a complex layout without disrupting it, i.e. a tool which could address query results into single cells.

This can be done by Excel alone but it’s far too difficult even for an advanced user.
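The idea of addressing query results into single cells can be sketched in a few lines of Python. This is only my illustration of the concept, not how Viney@rd actually works: the sheet is modelled as a plain dictionary of cell addresses, and `cell_map`, `refresh_layout` and the sample figures are hypothetical names and data invented for the example.

```python
from typing import Any, Dict, List, Tuple

Sheet = Dict[str, Any]                   # cell address -> value or formula text
CellMap = Dict[Tuple[str, str], str]     # (customer, field) -> cell address


def refresh_layout(sheet: Sheet, cell_map: CellMap,
                   query_rows: List[Dict[str, Any]]) -> Sheet:
    """Return a copy of the sheet where only the mapped cells are overwritten."""
    refreshed = dict(sheet)
    for row in query_rows:
        for field, value in row.items():
            address = cell_map.get((row["customer"], field))
            if address is not None:      # unmapped results are simply ignored
                refreshed[address] = value
    return refreshed


# A hand-built layout: labels and formulas sit wherever the user put them.
sheet = {"A1": "Customer A revenue", "B1": 0, "D7": "=B1*1.2", "B9": 0}
cell_map = {("A", "revenue"): "B1", ("B", "revenue"): "B9"}
query_rows = [
    {"customer": "A", "revenue": 1200},
    {"customer": "B", "revenue": 300},
    {"customer": "C", "revenue": 50},    # no cell mapped, so nothing changes
]

new_sheet = refresh_layout(sheet, cell_map, query_rows)
print(new_sheet)
```

The point of the design is that the refresh never touches a cell the user did not explicitly map, so labels, formulas and formatting around the numbers survive every reload.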

Viney@rd features this but I soon realized that, if I wanted to go down this path, I had to tackle a second issue: the data provided by the systems are never the data required by the user. I’m not talking about bad analysis or wrong KPIs; even if the architects did everything fine, the human brain works according to categories that often are not saved within a database.

Example: you are a salesman and you have 4 customers who make up 75% of your business. Plus you have 40 customers who make up the other 25% of your business.

Question: “how many customers do you have?”, reply: “I have 4, customer A, B, C and D. A is bla bla bla, B is bla bla bla, C etc. etc. Oh, by the way I have some others but they are marginal.”

The salesman needs any kind of information about the 4, and just a few hints about the rest; any detailed information about the rest is perceived as clutter. He needs a screen with 4 ultra detailed sheets, one per customer, not a customer ABC report with 44 rows.

So far nothing revolutionary; what is revolutionary is that the user himself must be able to tag the 4 main customers according to his own perception of the customer importance.

If one of the small customers is going to place a large order, then it must become important as well and should immediately take the fifth place, to be automatically demoted when the opportunity expires.

The point is that these rules are defined heuristically by the human brain and have so many exceptions that they can be handled only by a human brain. This consideration led to implementing the unique feature of letting the users change their data directly from an Excel table.
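A rough Python sketch of the tagging rule described above, under my own assumptions: the salesman's tags are a plain set of names, and a small customer with an open opportunity counts as a key customer until the opportunity's expiry date passes. The `Customer` class, `key_customers` function and the dates are hypothetical and are not taken from Viney@rd.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set


@dataclass
class Customer:
    name: str
    # Set by the user when a large order is expected; None means no open opportunity.
    opportunity_expires: Optional[date] = None


def key_customers(customers: List[Customer], user_tags: Set[str],
                  today: date) -> List[str]:
    """Names of customers that deserve a detailed sheet on the given day."""
    key = []
    for customer in customers:
        tagged = customer.name in user_tags
        promoted = (customer.opportunity_expires is not None
                    and today <= customer.opportunity_expires)
        if tagged or promoted:
            key.append(customer.name)
    return key


customers = [Customer("A"), Customer("B"), Customer("C"), Customer("D"),
             Customer("E", opportunity_expires=date(2009, 10, 31)),
             Customer("F")]
user_tags = {"A", "B", "C", "D"}   # the salesman's own judgement, not a formula

during = key_customers(customers, user_tags, today=date(2009, 10, 1))
after = key_customers(customers, user_tags, today=date(2009, 11, 1))
print(during)
print(after)
```

Note that the tags themselves stay in the user's hands: the code only handles the automatic demotion, which is the one part of the rule that is mechanical.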

The Viney@rd database is easy to feed using traditional techniques, but Excel sheet data can be saved in it as well. This gives the best of both worlds: a central repository for “conventional” data, so no more “spreadsheet hell” nightmares, plus the ability to classify and adjust the data while still working in Excel.

This approach has limits, specifically when we talk about large amounts of data, I’m the first to admit it, but I still think that it’s the one thing that can popularize BI among business users.

When large vendors embrace this, I’ll remind them of this interview! :o) Viney@rd is now in its infancy but already implements these two core features.

There are a lot of things to do, and many features to add to take it to a full corporate level, but I enjoy the process so much that I can’t stop working on it!

I’ve been asked “What if I buy from you and you go belly up next year?”. My reply is that you must shoot me to stop me from working on it! I still have a long list of features to implement and I’m not going to miss out on the fun!

For example, did you ever notice that people think naturally in terms of information streams ….?

I’ll consider myself successful when three conditions are met:

a) I’ll have a body of satisfied users whose working lives have been improved by my products

b) I’ll make a living out of StraySoft together with the employees, once I have some

c) people will think of me as the Business Intelligence “enfant terrible”.

Ajay- What do you do in your spare time?

StrayCat- Sorry? What’s spare time? Jokes apart, I devote time to my wife, who’s really supportive in this effort. Late at night, before falling asleep, I usually read for half an hour: I’m passionate about history; but the events I really never miss are the Italian National Rugby Union Team matches.

Ajay- Why do you tweet using the name Stray Cat?

Augusto– I named the company StraySoft after the adoption of a stray cat; the full story is told here http://www.straysoft.com/dblog/articolo.asp?id=30.

The Twitter name came as naturally as naming the company. I know that someone may find it awkward but I feel like going upstream on that! Secondly, I want to keep my consulting activity and StraySoft totally separate as a matter of convenience. I have not proposed, and never will propose, my product to my consulting customers.

Ajay- What visible trends in Business Intelligence do you foresee for the next two to three years?

Augusto- The #1 trend is that all the main vendors (excluding Microsoft, which already did) finally realized that there’s a midrange market which needs BI more than ever.

What they’re doing wrong is targeting this segment with the same enterprise class tools which miss the few key features required by this market.

The #2 trend is the rise of workgroup BI and the new dignity given to informal analysis. This is a whole new approach I do not share completely but I admit it has its strengths.

The #3 trend is at the opposite side of the spectrum; unconventional databases (columnar stores, appliances etc.) are becoming increasingly popular to manage very large amounts of data.

There are two fake trends: Clouds and SaaS. They’ll get a share of the market but will not become, in the foreseeable future, the reference architecture. Thank you again for giving me a voice. All the best. Augusto Albeghi

Ajay- To know more about Augusto’s startup and Viney@rd, please see www.straysoft.com

R and SAS- Together again at PAWS

Two of my favorite speakers (though maybe not favorites of each other) speak at PAWS:

Anne Milley from SAS and David Smith from REvolution Computing. Also a great author and writer, Stephen Baker of The Numerati (that mathematical equivalent of The Godfather). More events at the link below.

Hmmmm- I hope they attend each other’s sessions just to keep up, but is that asking too much?

Citation- http://www.predictiveanalyticsworld.com/dc/2009/agenda.php#day1-22

7:30pm-10:00pm
useR Meeting
Room: Magnolia
– Sponsored by  Please join the group at www.meetup.com/R-users-DC/

R is an open source programming language for statistical computing, data analysis, and graphical visualization. R has an estimated one million users worldwide, and its user base is growing. While most commonly used within academia, in fields such as computational biology and applied statistics, it is gaining currency in commercial areas such as quantitative finance and business intelligence.

Among R’s strengths as a language are its powerful built-in tools for inferential statistics, its compact modeling syntax, its data visualization capabilities, and its ease of connectivity with persistent data stores (from databases to flatfiles).

In addition, R is open source in nature and extensible via add-on “packages”, allowing it to keep up with the leading edge in academic research.

For all its strengths, though, R has an admittedly steep learning curve; the first steps towards learning and using R can be challenging.

This DC R Users Group is dedicated to bringing together area practitioners of R to exchange knowledge, inspire new users, and spur the adoption of R for innovative research and commercial applications.


Wednesday October 21, 2009

8:00am-9:00am
Registration & Continental Breakfast


9:00am-9:50am
Keynote
Room: Magnolia
Opportunities and Pitfalls:
What the World Does and Doesn’t Want from Predictive Analytics

Mathematicians and statisticians are churning through mountains of data in their efforts to model and predict human behavior. The goal is to optimize every function possible, from sales and marketing to the enterprise itself. These Numerati are guided by the two dominant models of the late 20th century, the modeling of financial markets and of industrial systems. How do humans fit into these systems? And what will their response be when the analytic systems appear to misunderstand them or invade their privacy?

Stephen Baker joins PAW to directly address the Numerati. In his keynote presentation, Mr. Baker will guide us toward the untapped goldmines where predictive analytics will be embraced and thrive, and teach us to anticipate and maneuver around two central pitfalls: Consumer misperception of us, and our inadvertent mistreatment of them.

Moderator: Eric Siegel, Program Chair, Predictive Analytics World

Speaker: Stephen Baker, BusinessWeek – author, The Numerati


9:50am-10:10am
Platinum Sponsor Presentation
Room: Magnolia
Strength in Numbers: ACE!

As more organizations are beginning their analytical journey or reinvigorating their existing efforts, Analytic Centers of Excellence (ACEs) are helping them along the way. The interest in ACEs is growing across industries as organizations seek better ways to tap into their analytic infrastructure, most importantly scarce high-end analytic expertise, to improve results. We will highlight valuable best practices for achieving greater analytic bandwidth, realizing more and better evidence-based decisions.

Moderator: Eric Siegel, Program Chair, Predictive Analytics World

Speaker: Anne Milley, Senior Director of Tech. Product Marketing, SAS

Red R- A new beginning

Check out an interesting new interface to R.

Note: I haven’t tested it yet but plan to do so shortly, as I am using Ubuntu 9 almost exclusively nowadays.

R fans who are not quite overjoyed with the wonderful beauty and charm of the traditional R GUI may want to give it a try.

Citation-

http://code.google.com/p/r-orange/

Note- This website does not assume responsibility for any software glitches, as R comes with no warranty, unlike other software that comes loaded with both a warranty and then bug-fix patches.