Statistics on Social Media

Some official statistics on social media from the owners themselves

1) Facebook-

Date -17 Nov 2011


People on Facebook

Interview James Dixon Pentaho

Here is an interview with James Dixon the founder of Pentaho, self confessed Chief Geek and CTO. Pentaho has been growing very rapidly and it makes open source Business Intelligence solutions- basically the biggest chunk of enterprise software market currently.

Ajay-  How would you describe Pentaho as a BI product for someone who is completely used to traditional BI vendors (read non open source). Do the Oracle lawsuits over Java bother you from a business perspective?


Pentaho has a full suite of BI software:

* ETL: Pentaho Data Integration

* Reporting: Pentaho Reporting for desktop and web-based reporting

* OLAP: Mondrian ROLAP engine, and Analyzer or Jpivot for web-based OLAP client

* Dashboards: CDF and Dashboard Designer

* Predictive Analytics: Weka

* Server: Pentaho BI Server, handles web-access, security, scheduling, sharing, report bursting etc

We have all of the standard BI functionality.

The Oracle/Java issue does not bother me much. There are a lot of software companies dependent on Java. If Oracle abandons Java a lot resources will suddenly focus on OpenJDK. It would be good for OpenJDK and might be the best thing for Java in the long term.

Ajay-  What parts of Pentaho’s technology do you personally like the best as having an advantage over other similar proprietary packages.

Describe the latest Pentaho for Hadoop offering and Hadoop/HIVE ‘s advantage over say Map Reduce and SQL.

James- The coolest thing is that everything is pluggable:

* ETL: New data transformation steps can be added. New orchestration controls (job entries) can be added. New perspectives can be added to the design UI. New data sources and destinations can be added.

* Reporting: New content types and report objects can be added. New data sources can be added.

* BI Server: Every factory, engine, and layer can be extended or swapped out via configuration. BI components can be added. New visualizations can be added.

This means it is very easy for Pentaho, partners, customers, and community member to extend our software to do new things.

In addition every engine and component can be fully embedded into a desktop or web-based application. I made a youtube video about our philosophy:

Our Hadoop offerings allow ETL developers to work in a familiar graphical design environment, instead of having to code MapReduce jobs in Java or Python.

90% of the Hadoop use cases we hear about are transformation/reporting/analysis of structured/semi-structured data, so an ETL tool is perfect for these situations.

Using Pentaho Data Integration reduces implementation and maintenance costs significantly. The fact that our ETL engine is Java and is embeddable means that we can deploy the engine to the Hadoop data nodes and transform the data within the nodes.

Ajay-  Do you think the combination of recession, outsourcing,cost cutting, and unemployment are a suitable environment for companies to cut technology costs by going out of their usual vendor lists and try open source for a change /test projects.

Jamie- Absolutely. Pentaho grew (downloads, installations, revenue) throughout the recession. We are on target to do 250% of what we did last year, while the established vendors are flat in terms of new license revenue.

Ajay-  How would you compare the user interface of reports using Pentaho versus other reporting software. Please feel free to be as specific.

James- We have all of the everyday, standard reporting features covered.

Over the years the old tools, like Crystal Reports, have become bloated and complicated.

We don’t aim to have 100% of their features, because we’d end us just as complicated.

The 80:20 rule applies here. 80% of the time people only use 20% of their features.

We aim for 80% feature parity, which should cover 95-99% of typical use cases.

Ajay-  Could you describe the Pentaho integration with R as well as your relationship with Weka. Jaspersoft already has a partnership with Revolution Analytics for RevoDeployR (R on a web server)-

Any  R plans for Pentaho as well?

James- The feature set of R and Weka overlap to a small extent – both of them include basic statistical functions. Weka is focused on predictive models and machine learning, whereas R is focused on a full suite of statistical models. The creator and main Weka developer is a Pentaho employee. We have integrated R into our ETL tool. (makes me happy 🙂 )

(probably not a good time to ask if SAS integration is done as well for a big chunk of legacy base SAS/ WPS users)


As “Chief Geek” (CTO) at Pentaho, James Dixon is responsible for Pentaho’s architecture and technology roadmap. James has over 15 years of professional experience in software architecture, development and systems consulting. Prior to Pentaho, James held key technical roles at AppSource Corporation (acquired by Arbor Software which later merged into Hyperion Solutions) and Keyola (acquired by Lawson Software). Earlier in his career, James was a technology consultant working with large and small firms to deliver the benefits of innovative technology in real-world environments.

Creating an Anonymous Bot

or Surfing the Net Anonmously and Having some Fun.

On the weekend, while browsing through I came across an intriguing offer-

Basically projects asking for increasing Youtube Views-


So this is one way I though it could be done-

1) Create an IP Address Anonymizer

Thats pretty simple- I used the Tor Project at

Basically it uses a peer to peer network to  connect to the internet and you can reset the connection as you want-so it hides your IP address.

Also useful for sending hatemail- limitation uses Firefox browser only.And also your webpage default keeps changing languages as the ip address changes.


The Tor Project is a 501(c)(3) non-profit based in the United States. The official address of the organization is:

The Tor Project
969 Main Street, Suite 206
Walpole, MA 02081 USA
Check your IP address at

2) Creating a Bot or an automatic clicking code ( without knowing code)

Go to

Remember when you could create an Excel Macro by just recording the Macro (in Excel 2003)

So while surfing if you need to do something again and again (like go the same Youtube video and clicking Like 5000 times) you can press record Macro

  • Do the action you want repeated again and again.
  • Click save Macro
  • Now run the Macro in a loop using the iMacro extension.

see screenshot below-

Note I have added two lines of code -WAIT SECONDS= 6

This means everytime the code runs in a loop it will wait for 6 seconds and then reload.

However I recommend you create a random number of wait seconds using Google Spreadsheet and the function RANDBETWEEN(5,400) (to limit between 5 and 400 seconds) and also use CONCATENATE with click and drag to create RANDOM wait times (instead of typing it say 500 times yourself)


That’s it – Your Anonymous Bot is ready.

See the  analytical results for my personal favourite Streaming Poetry video

Easy isn’t it. Lines of code written= 0 , Number of Views =335 (before I grew bored)

Note- Officially it is against Youtube Terms to  use scripts or Bots so I did it for Research Purposes only. And the needs to look into the activities underway at and also and

The final word on these activities is by or

Using Red R- R with a Visual Interface

For people complaining about the GUI on R, here is the ah Enterprise Version of R called Red R.

It is available at the website at


You can read more there or just go through the short video created by them at

Basically it is a click and point method of using R with the ability to store schemas and thus very good for repeatable operations as well.

Not bad for epic software, huh?