Ah! The Internet.

On the Internet I am not brown or black or white. I am Anonymous and yet myself. I am free to choose whatever identity I wish, free to drink from whatever pools of knowledge my local government wishes to forbid. The Internet does not care how rich or poor I may be. It has ways to track exactly where I am, but it has tools to disguise that as well. On the Internet the strongest government, the richest corporation and the deepest pockets can tremble before the bits and bytes of a talented and motivated hacker working from the basement of his parents’ house.
There are no losers on the Internet: only winners. Except for those who seek to covet and control the uncontrollable: the human desire to seek knowledge beyond the confines of whatever cave they may find themselves born in.
There are no countries to wage war on the Internet: there is nothing to kill and die for. The Internet allowed a million writers to write and publish without the interference of brokers and intermediaries. It allowed a billion people to download a trillion songs that were locked away in some rich man’s virtual vault. It allowed a dozen countries to overthrow their dictators without wasting a billion worth of goods and treasure.

On the Internet, everyone is equal, free and true to the nature they choose, not the fate that is chosen by corporation, country or circumstance.
Ah! The Internet- it will set you free.

Interview Rapid-I - Ingo Mierswa and Simon Fischer

Here is an interview with Dr Ingo Mierswa, CEO of Rapid-I, and Dr Simon Fischer, Head of R&D. Rapid-I makes the very popular software Rapid Miner – perhaps one of the earliest leading open source packages in business analytics and business intelligence. It is quite easy to use and deploy, and with its extensions and innovations (including compatibility with R) it has continued to grow tremendously through the years.

In an extensive interview Ingo and Simon talk about an algorithms marketplace, extensions, big data analytics, Hadoop, mobile computing and the use of the graphical user interface in analytics.

Special thanks to Nadja from the Rapid-I communication team for helping coordinate this interview. (Statutory Blogging Disclosure- Rapid-I is a marketing partner with Decisionstats as per the terms in https://decisionstats.com/privacy-3/)

Ajay- Describe your background in science. What are the key lessons that you have learnt as a scientific researcher, and what advice would you give to new students today?

Ingo: My time as researcher really was a great experience which has influenced me a lot. I have worked at the AI lab of Prof. Dr. Katharina Morik, one of the persons who brought machine learning and data mining to Europe. Katharina always believed in what we are doing, encouraged us and gave us the space for trying out new things. Funnily enough, I never managed to use my own scientific results in any real-life project so far but I consider this as a quite common gap between science and the “real world”. At Rapid-I, however, we are still heavily connected to the scientific world and try to combine the best of both worlds: solving existing problems with leading-edge technologies.

Simon: In fact, during my academic career I have not worked in the field of data mining at all. I worked on a field some of my colleagues would probably even consider boring, and that is theoretical computer science. To be precise, my research was in the intersection of game theory and network theory. During that time, I have learnt a lot of exciting things, none of which had any business use. Still, I consider that a very valuable experience. When we at Rapid-I hire people coming to us right after graduating, I don’t care whether they know the latest technology with a fancy three-letter acronym – that will be forgotten more quickly than it came. What matters is the way you approach new problems and challenges. And that is also my recommendation to new students: work on whatever you like, as long as you are passionate about it and it brings you forward.

Ajay- How is the Rapid Miner Extensions marketplace moving along? Do you think there is scope for people to, say, create algorithms in a platform like R, and then offer that algorithm as an app for sale just like iTunes or Android apps?

 Simon: Well, of course it is not going to be exactly like iTunes or Android apps are, because of the more business-orientated character. But in fact there is a scope for that, yes. We have talked to several developers, e.g., at our user conference RCOMM, and several people would be interested in such an opportunity. Companies using data mining software need supported software packages, not just something they downloaded from some anonymous server, and that is only possible through a platform like the new Marketplace. Besides that, the marketplace will not only host commercial extensions. It is also meant to be a platform for all the developers that want to publish their extensions to a broader community and make them accessible in a comfortable way. Of course they could just place them on their personal Web pages, but who would find them there? From the Marketplace, they are installable with a single click.

Ingo: What I like most about the new Rapid-I Marketplace is the fact that people can now get something back for their efforts. Developing a new algorithm is a lot of work, in some cases even more than developing a nice app for your mobile phone. It is completely accepted that people buy apps from a store for a couple of dollars and I foresee the same for sharing and selling algorithms instead of apps. Right now, people can already share algorithms and extensions for free; one of the next versions will also support selling of those contributions. Let’s see what’s happening next, maybe we will add the option to sell complete RapidMiner workflows or even some data pools…

Ajay- What are the recent features in Rapid Miner that support cloud computing, mobile computing and tablets? How do you think the landscape for Big Data (over 1 TB) is changing, and how is Rapid Miner adapting to it?

Simon: These are areas we are very active in. For instance, we have an In-Database-Mining Extension that allows the user to run their modelling algorithms directly inside the database, without ever loading the data into memory. Using analytic databases like Vectorwise or Infobright, this technology can really boost performance. Our data mining server, RapidAnalytics, already offers functionality to send analysis processes into the cloud. In addition to that, we are currently preparing a research project dealing with data mining in the cloud. A second project is targeted towards the other aspect you mention: the use of mobile devices. This is certainly a growing market, of course not for designing and running analyses, but for inspecting reports and results. But even that is tricky: When you have a large screen you can display fancy and comprehensive interactive dashboards with drill downs and the like. On a mobile device, that does not work, so you must bring your reports and visualizations very much to the point. And this is precisely what data mining can do – and what is hard to do for classical BI.

Ingo: Then there is Radoop, which you may have heard of. It uses the Apache Hadoop framework for large-scale distributed computing to execute RapidMiner processes in the cloud. Radoop has been presented at this year’s RCOMM and people are really excited about the combination of RapidMiner with Hadoop and the scalability this brings.

Ajay- Describe the Rapid Miner analytics certification program and the steps you are taking to partner with universities.

Ingo: The Rapid-I Certification Program was created to recognize professional users of RapidMiner or RapidAnalytics. The idea is that certified users have demonstrated a deep understanding of the data analysis software solutions provided by Rapid-I and how they are used in data analysis projects. Taking part in the Rapid-I Certification Program offers a lot of benefits for IT professionals as well as for employers: professionals can demonstrate their skills and employers can make sure that they hire qualified professionals. We started our certification program only about 6 months ago and about 100 professionals have been certified so far.

Simon: During our annual user conference, the RCOMM, we have plenty of opportunities to talk to people from academia. We’re also present at other conferences, e.g. at ECML/PKDD, and we are sponsoring data mining challenges and grants. We maintain strong ties with several universities all over Europe and the world, which is something that I would not want to miss. We are also cooperating with institutes like the ITB in Dublin during their training programmes, e.g. by giving lectures, etc. Also, we are leading or participating in several national or EU-funded research projects, so we are still close to academia. And we offer an academic discount on all our products 🙂

Ajay- Describe the global efforts in making Rapid Miner a truly international software including spread of developers, clients and employees.

Simon: Our clients already are very international. We have a partner network in America, Asia, and Australia, and, while I am responding to these questions, we have a training course in the US. Developers working on the core of RapidMiner and RapidAnalytics, however, are likely to stay in Germany for the foreseeable future. We need specialists for that, and it would be pointless to spread the development team over the globe. That is also owed to the agile philosophy that we are following.

Ingo: Simon is right, Rapid-I already is acting on an international level. Rapid-I now has more than 300 customers from 39 countries in the world, which is a great result for a young company like ours. We are of course very strong in Germany and also the rest of Europe, but also concentrate on more countries by means of our very successful partner network. Rapid-I continues to build this partner network and will keep recruiting dynamic and knowledgeable partners in the future. Extending and acting globally is definitely part of our strategic roadmap.

Biography

Dr. Ingo Mierswa is working as Chief Executive Officer (CEO) of Rapid-I. He has several years of experience in project management, human resources management, consulting, and leadership including eight years of coordinating and leading the multi-national RapidMiner developer team with about 30 developers and contributors world-wide. He wrote his PhD titled “Non-Convex and Multi-Objective Optimization for Numerical Feature Engineering and Data Mining” at the University of Dortmund under the supervision of Prof. Morik.

Dr. Simon Fischer is heading the research & development at Rapid-I. His interests include game theory and networks, the theory of evolutionary algorithms (e.g. on the Ising model), and theoretical and practical aspects of data mining. He wrote his PhD in Aachen where he worked in the project “Design and Analysis of Self-Regulating Protocols for Spectrum Assignment” within the excellence cluster UMIC. Before, he was working on the vtraffic project within the DFG Programme 1126 “Algorithms for large and complex networks”.

http://rapid-i.com/content/view/181/190/ tells you more on the various types of Rapid Miner licensing for enterprise, individual and developer versions.

(Note from Ajay- to receive an early edition invite to Radoop, click here http://radoop.eu/z1sxe)

 

How to surf anonymously on the mobile- Use Orbot

This is an interesting use case of anonymous surfing through mobile by using Tor Project on the Android Mobile OS.

Source- https://guardianproject.info/apps/orbot/
 

Orbot requires different configuration depending on the Android operating system version it is used on.

For standard Android 1.x devices (G1, MyTouch3G, Hero, Droid Eris, Cliq, Moment)

  • WEB BROWSING: You can use the Orweb Privacy Browser which we offer, which only works via Orbot and Tor.
  • For Instant Messaging, please try Gibberbot which provides integrated, optional support for Orbot and Tor.

For Android 2.x devices: Droid, Nexus, Evo, Galaxy

  • WEB BROWSING: Non-rooted devices should use Firefox for Android with our ProxyMob Add-On to browse via the Tor network. Rooted devices can take advantage of transparent proxying (see below) and do not need an additional app installed.
  • Transparent Proxying: You must root your device in order for Orbot to work transparently for all web and DNS traffic. If you root your device, whether it is 1.x or 2.x based, Orbot will automatically, transparently proxy all web traffic on port 80 and 443 and all DNS requests. This includes the built-in Browser, Gmail, YouTube, Maps and any other application that uses standard web traffic.
  • For Instant Messaging, please try Gibberbot which provides integrated, optional support for Orbot and Tor.
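
The transparent proxying rule quoted above can be restated as a small decision function. This is only an illustrative sketch in Python, not Orbot's actual code: the function name and constants are hypothetical, and port 53 is assumed here as the standard DNS port.

```python
# Hypothetical illustration of the rule quoted above, not Orbot's implementation.
WEB_PORTS = {80, 443}   # standard web traffic ports named in the text
DNS_PORT = 53           # assumption: the standard DNS port


def transparently_proxied(rooted: bool, dest_port: int) -> bool:
    """Return True if traffic to dest_port would be routed via Tor transparently."""
    if not rooted:
        # Non-rooted devices need a Tor-aware app instead (e.g. a browser add-on).
        return False
    return dest_port in WEB_PORTS or dest_port == DNS_PORT
```

For example, `transparently_proxied(True, 443)` is true, while traffic from a non-rooted device, or to a port outside the listed ones, is not covered by this rule.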


Contribution to #Rstats by Revolution

I have been watching Revolution Analytics’ products almost since the inception of the company. It has managed to sail over storms, naysayers and critics with a simple and effective strategy of launching good software, making good partnerships and keeping up media visibility with white papers, joint webinars, blogs, conferences and events.

This, however, is a listing of the technical contributions made by Revolution Analytics products to the #rstats project.

1) Useful Packages mostly in parallel processing or more efficient computing like

 

2) RevoScaleR package to beat R’s memory problem (this is probably the best in my opinion, as it is yet to be replicated by the open source version and is a clear-cut reason for going in for the paid version)

http://www.revolutionanalytics.com/products/enterprise-big-data.php

  • Efficient XDF File Format designed to efficiently handle huge data sets.
  • Data Step Functionality to quickly clean, transform, explore, and visualize huge data sets.
  • Data selection functionality to store huge data sets out of memory, and select subsets of rows and columns for in-memory operation with all R functions.
  • Visualize Large Data sets with line plots and histograms.
  • Built-in Statistical Algorithms for direct analysis of huge data sets:
    • Summary Statistics
    • Linear Regression
    • Logistic Regression
    • Crosstabulation
  • On-the-fly data transformations to include derived variables in models without writing new data files.
  • Extend Existing Analyses by writing user-defined R functions to “chunk” through huge data sets.
  • Direct import of fixed-format text data files and SAS data sets into .xdf format
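
To make the idea of “chunking” through data that does not fit in memory concrete, here is a minimal sketch in Python. It is not RevoScaleR code and does not use the XDF format; `RunningStats`, `read_in_chunks` and `summarize` are hypothetical names, and the running mean and variance use Welford's online update so that only one chunk is held in memory at a time.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List


@dataclass
class RunningStats:
    """Accumulates count, mean and variance one chunk at a time."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, chunk: List[float]) -> None:
        # Welford's online update, applied to each value in the chunk.
        for x in chunk:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; not defined for fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size values."""
    it = iter(values)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def summarize(values: Iterable[float], chunk_size: int = 1000) -> RunningStats:
    """Compute summary statistics without holding all values in memory."""
    stats = RunningStats()
    for chunk in read_in_chunks(values, chunk_size):
        stats.update(chunk)
    return stats
```

In practice `values` would be a generator reading rows from disk; a call such as `summarize(range(1, 11), chunk_size=3)` processes the numbers in four chunks and still arrives at the same mean and variance as a single in-memory pass.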

 

3) RevoDeployR for an API-based R solution – I somehow think this feature will get more important as time goes on, but it seems a lower-visibility offering right now.

http://www.revolutionanalytics.com/products/enterprise-deployment.php

  • Collection of Web services implemented as a RESTful API.
  • JavaScript and Java client libraries, allowing users to easily build custom Web applications on top of R.
  • .NET Client library — includes a COM interoperability to call R from VBA
  • Management Console for securely administrating servers, scripts and users through HTTP and HTTPS.
  • XML and JSON format for data exchange.
  • Built-in security model for authenticated or anonymous invocation of R Scripts.
  • Repository for storing R objects and R Script execution artifacts.

 

4) Revolution’s IDE (or Productivity Environment) for a faster coding environment than the command line. The GUI by Revolution Analytics is in the works. – Having used this, I find only the Code Snippets function to be a clear differentiator from newer IDEs and GUIs. The code snippets feature is awesome though, and even someone who doesn’t know much R can get an analysis set up quite fast and accurately.

http://www.revolutionanalytics.com/products/enterprise-productivity.php

  • Full-featured Visual Debugger for debugging R scripts, with call stack window and step-in, step-over, and step-out capability.
  • Enhanced Script Editor with hover-over help, word completion, find-across-files capability, automatic syntax checking, bookmarks, and navigation buttons.
  • Run Selection, Run to Line and Run to Cursor evaluation
  • R Code Snippets to automatically generate fill-in-the-blank sections of R code with tooltip help.
  • Object Browser showing available data and function objects (including those in packages), with context menus for plotting and editing data.
  • Solution Explorer for organizing, viewing, adding, removing, rearranging, and sourcing R scripts.
  • Customizable Workspace with dockable, floating, and tabbed tool windows.
  • Version Control Plug-in available for the open source Subversion version control software.

 

Marketing contributions from Revolution Analytics-

1) Sponsoring R sessions and user meets

2) Evangelizing R at conferences and partnering with corporate partners including JasperSoft, Microsoft, IBM and others at http://www.revolutionanalytics.com/partners/

3) Helping with online initiatives like http://www.inside-r.org/ (which is curiously dormant and now largely superseded by R-Bloggers.com) and the syntax highlighting tool at http://www.inside-r.org/pretty-r. In addition, Revolution has been proactive in reaching out to the community.

4) Helping pioneer blogging about R and Twitter hashtag discussions, and contributing to Stack Overflow discussions. Within a short while, the #rstats online community has overtaken a lot of more established names, partly due to the decentralized nature of its working.

 

Did I miss something out? Yes, they share their code under the GPL.

 

Let me know by feedback

What to do if you see a possible GPL violation


Well, I have played with software (mostly but not exclusively analytical), and I admire the zeal and energy of both open source and closed source practitioners, all relatively decent people executing strategies their investors or owners tell them to follow (closed source) or motivated by their own sense of cool, change-the-world openness (open source).

What I don’t get is people stealing open source code, repackaging it without adding major contributions, claiming patent-pending stuff, and basically making money by creating CLOSED source from the open source software (as open source is yet to break the enterprise glass ceiling).

You are either open source or you aren’t.

Bi-sexuality is okay. Bi-codability is not.

Next time you see someone stealing some community’s open source code- refer to this excellent link.

 


http://www.gnu.org/licenses/gpl-violation.html

Violations of the GNU Licenses

If you think you see a violation of the GNU GPL, LGPL, AGPL, or FDL, the first thing you should do is double-check the facts:

  • Does the distribution contain a copy of the License?
  • Does it clearly state which software is covered by the License? Does it say anything misleading, perhaps giving the impression that something is covered by the License when in fact it is not?
  • Is source code included in the distribution?
  • Is a written offer for source code included with a distribution of just binaries?
  • Is the available source code complete, or is it designed for linking in other non-free modules?

If there seems to be a real violation, the next thing you need to do is record the details carefully:

  • the precise name of the product
  • the name of the person or organization distributing it
  • email addresses, postal addresses and phone numbers for how to contact the distributor(s)
  • the exact name of the package whose license is violated
  • how the license was violated:
    • Is the copyright notice of the copyright holder included?
    • Is the source code completely missing?
    • Is there a written offer for source that’s incomplete in some way? This could happen if it provides a contact address or network URL that’s somehow incorrect.
    • Is there a copy of the license included in the distribution?
    • Is some of the source available, but not all? If so, what parts are missing?

The more of these details that you have, the easier it is for the copyright holder to pursue the matter.
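
One way to keep track of the details listed above before writing to the copyright holder is a simple record with a completeness check. The sketch below is only an illustration in Python; the class and field names are hypothetical and are not part of any FSF tool or form.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ViolationReport:
    """Details to collect before reporting a suspected license violation."""
    product_name: str = ""          # the precise name of the product
    distributor: str = ""           # person or organization distributing it
    distributor_contact: str = ""   # email, postal address or phone number
    package_name: str = ""          # exact name of the package whose license is violated
    copyright_holder: str = ""      # who is legally able to act on the report
    how_violated: List[str] = field(default_factory=list)  # e.g. "source code missing"

    def missing_details(self) -> List[str]:
        # Any field left empty is something still to be found out.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


report = ViolationReport(product_name="ExampleBox 2000", package_name="examplelib")
print(report.missing_details())
```

Here `ExampleBox 2000` and `examplelib` are made-up placeholders; the printed list names the fields that are still empty.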

Once you have collected the details, you should send a precise report to the copyright holder of the packages that are being misused. The copyright holder is the one who is legally authorized to take action to enforce the license.

If the copyright holder is the Free Software Foundation, please send the report to <license-violation@gnu.org>. It’s important that we be able to write back to you to get more information about the violation or product. So, if you use an anonymous remailer, please provide a return path of some sort. If you’d like to encrypt your correspondence, just send a brief mail saying so, and we’ll make appropriate arrangements.

Note that the GPL, and other copyleft licenses, are copyright licenses. This means that only the copyright holders are empowered to act against violations. The FSF acts on all GPL violations reported on FSF copyrighted code, and we offer assistance to any other copyright holder who wishes to do the same.

But, we cannot act on our own if we do not hold copyright. Thus, be sure to find out who the copyright holders of the software are before reporting a violation.

 

Bringing Poetry to Life

Here is a new poetry book.

———————————————————————————————–

I’m excited to let you know about Carol Calkins, who is releasing her first book of poetry, entitled Bring Poetry to Life. This book is a powerful compilation of poetry touching on the most important moments in our everyday lives, from new beginnings, to special people and events, to endings and saying goodbye. Carol, who found her life purpose through poetry, is excited to release the first of a series of poetry books on Amazon. Grab your copy of Bring Poetry to Life today on Amazon.com – Find out more about Carol and her new book at http://www.bringpoetrytolife.com

We Said Goodbye a Thousand Times

 

Don’t be sad about my parting

Don’t feel like you never said goodbye

For you and I both know deep in our hearts

That We Said Goodbye a Thousand Times

And shared so much love and joy every day

 

Be happy that I am now at peace

Be joyful that I have lived a wonderful life

Be happy that we have shared so much together

 

And remember I am always with you in a thought and a sigh

Every day when you see the beauty in nature think of me

Every day when you see the colorful flowers think of me

Every day when you see a frisky animal prancing around think of me

Every day when you look into the eyes of someone you love think of me

 

And know beyond a doubt that I am with you in everything you do

And know beyond a doubt that I am with you in everything you say

And know beyond a doubt that I am with you in every quiet moment of your life

 

Don’t be sad about my parting

Don’t feel like you never said goodbye

For you and I both know deep in our hearts

That We Said Goodbye a Thousand Times

And shared so much love and joy every day

 

 

Creating an Anonymous Bot

or Surfing the Net Anonymously and Having some Fun.

On the weekend, while browsing through http://freelancer.com I came across an intriguing offer-

http://www.freelancer.com/projects/by-job/YouTube.html

Basically, projects asking to increase YouTube views-

Hmm.Hmm.Hmm

So this is one way I thought it could be done-

1) Create an IP Address Anonymizer

That’s pretty simple- I used the Tor Project at http://www.torproject.org/easy-download.html.en

Basically it routes your connection to the internet through a network of volunteer-run relays, and you can reset the connection whenever you want, so it hides your IP address.

Also useful for sending hatemail. The limitation is that it works with the Firefox browser only. Also, your default webpage keeps changing languages as the IP address changes.

Note-

The Tor Project is a 501(c)(3) non-profit based in the United States. The official address of the organization is:

The Tor Project
969 Main Street, Suite 206
Walpole, MA 02081 USA
Check your IP address at http://www.whatismyip.com/

2) Creating a Bot or an automatic clicking code ( without knowing code)

Go to https://addons.mozilla.org/en-US/firefox/addon/3863/

Remember when you could create an Excel Macro by just recording the Macro (in Excel 2003)?

So while surfing, if you need to do something again and again (like going to the same YouTube video and clicking Like 5000 times), you can press record Macro

  • Do the action you want repeated again and again.
  • Click save Macro
  • Now run the Macro in a loop using the iMacro extension.

see screenshot below-

Note I have added two lines of code- WAIT SECONDS=6

This means every time the code runs in a loop it will wait for 6 seconds and then reload.

However I recommend you create a random number of wait seconds using Google Spreadsheet and the function RANDBETWEEN(5,400) (to limit between 5 and 400 seconds) and also use CONCATENATE with click and drag to create RANDOM wait times (instead of typing it say 500 times yourself)

see https://spreadsheets.google.com/ccc?key=tr18JVEE2TmAuH5V8fzJLRA#gid=0

That’s it – Your Anonymous Bot is ready.

See the  analytical results for my personal favourite Streaming Poetry video http://www.youtube.com/watch?v=a5yReaKRHOM

Easy, isn’t it? Lines of code written = 0, number of views = 335 (before I grew bored)

Note- Officially it is against the YouTube Terms http://www.youtube.com/t/terms to use scripts or bots, so I did it for research purposes only. And http://Freelancer.com needs to look into the activities underway at http://www.freelancer.com/projects/by-job/YouTube.html and also http://www.freelancer.com/projects/by-job/Facebook.html and http://www.freelancer.com/projects/by-job/Social-Networking.html

The final word on these activities is by http://xkcd.com or