Interesting webinar from the leader in corporate analytics, SAS
Month: September 2010
Trrrouble in land of R…and Open Source Suggestions
Recently some comments by Ross Ihaka, co-founder of the R statistical software, on Revolution Analytics, the leading commercial vendor of R, came to my attention:
http://www.stat.auckland.ac.nz/mail/archive/r-downunder/2010-May/000529.html
[R-downunder] Article on Revolution Analytics
Ross Ihaka ihaka at stat.auckland.ac.nz
Mon May 10 14:27:42 NZST 2010
On 09/05/10 09:52, Murray Jorgensen wrote:
> Perhaps of interest:
>
> http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/

Please note that R is "free software" not "open source". These guys are selling a GPLed work without disclosing the source to their part of the work. I have complained to them and so far they have given me the brush off. I am now considering my options.

Don't support these guys by buying their product. The are not feeding back to the rights holders (the University of Auckland and I are rights holders and they didn't even have the courtesy to contact us).

--
Ross Ihaka
Email: ihaka at stat.auckland.ac.nz
Department of Statistics, University of Auckland
Phone: (64-9) 373-7599 x 85054
Fax: (64-9) 373-7018
Private Bag 92019, Auckland, New Zealand

and from http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/

Open source purists probably won't be all too happy to learn that Revolution is going to be employing an "open core" strategy, which means the core R programs will remain open source and be given tech support under a license model, but the key add-ons that make R more scalable will be closed source and sold under a separate license fee. Because most of those 2,500 add-ons for R were built by academics and Revolution wants to supplant SPSS and SAS as the tools used by students, Revolution will be giving the full single-user version of the R Enterprise stack away for free to academics.

Conclusion- So one co-founder of R is advocating not to buy from Revolution Analytics, which has the other co-founder of R, Gentleman, on its board.
Source- http://www.revolutionanalytics.com/aboutus/leadership.php
2) Revolution Analytics is using 2,500 packages for free but insisting on getting paid AND closing the source of its own packages (which raises a technical point: how exactly can you prevent the source code of an R package from being seen?).
Maybe there can be a PACKAGE marketplace just like Android Apps, Facebook Apps, and Salesforce.com Apps, so at least some of the thousands of R package developers can earn. Sorry, but email lists do not pay mortgages, and no one is disputing the NEED for commercializing R or rewarding developers.
Though Barr created SAS, he gave up control to Goodnight and Sall https://decisionstats.wordpress.com/2010/06/02/sas-early-days/
and Goodnight and Sall do pay their developers well- to the envy of not so well paid counterparts.
3) I really liked the innovation of Revolution Analytics’ RevoScaleR, and I wish the default R dataset could be converted to the XDF format so that it basically kills off the criticism of R being slow on bigger datasets. But I also realize the need for creating an analytics marketplace for R developers and R students, so the academic version of R being free and Revolution R being paid seems like a trade-off.
Note- You can still get a job faster as a stats student if you mention SAS and not R as a statistical skill- not all stats students go into academics.
4) There can be more elegant ways of handling this than calling for ignoring each other, as REVOLUTION and Ihaka seem to be doing.
I can almost hear people in Cary, NC chuckling at Norman Nie, long time SPSS opponent and now REVOLUTION CEO, and his antagonizing R’s academicians within 1 year of taking over- so I hope this ends well for all. The road to hell is paved with good intentions- so if REVOLUTION can share some source code with say R Core members (even Microsoft shares source code with partners)- and R Core and Revolution agree on a licensing royalty from each other, they can actually speed up R package creation rather than allow this 2 decade effort to end up like S and S plus and TIBCO did.
Maybe Richard Stallman can help, or maybe Ihaka has a better sense of where things will go in a couple of years. He must know something. He invented it, didn’t he?
Google AppInventor -Android and Business Intelligence
Here is a great new tool for techies to start creating Android apps right away, even if you have no knowledge of the platform. Of course a great number of apps already exist, including my favorite Android data mining app in R, called AnalyticDroid http://analyticdroid.togaware.com/
Basically it calls the Rattle (R Analytical Tool To Learn Easily) data mining GUI, enabling data mining from an Android mobile using remote computing.
I don’t know if any other statistical application is available on Android mobiles, though SAS did have a presentation on using SAS on the iPhone
http://www.wuss.org/proceedings09/09WUSSProceedings/papers/dpr/DPR-Truong.pdf
All you need to do is go to http://appinventor.googlelabs.com/about/index.html and request access (yes there is a 2 week approval waiting line)
Because App Inventor provides access to a GPS-location sensor, you can build apps that know where you are. You can build an app to help you remember where you parked your car, an app that shows the location of your friends or colleagues at a concert or conference, or your own custom tour app of your school, workplace, or a museum.

You can write apps that use the phone features of an Android phone. You can write an app that periodically texts “missing you” to your loved ones, or an app “No Text While Driving” that responds to all texts automatically with “sorry, I’m driving and will contact you later”. You can even have the app read the incoming texts aloud to you (though this might lure you into responding).

App Inventor provides a way for you to communicate with the web. If you know how to write web apps, you can use App Inventor to write Android apps that talk to your favorite web sites, such as Amazon and Twitter.
Here is a not-so-statistical Android app I am trying to create, called Hang-Out.
It uses the current GPS location of your phone to find the nearest pub, movie or diner, and to catch a bus or train based on your city, the time of the request and that city’s public transport schedule. Very much WIP.
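The "find the nearest place" part of Hang-Out can be sketched outside App Inventor. Below is a minimal Python sketch, assuming the phone hands over a (latitude, longitude) pair and that the places are already known. The place list, its field names and the function names are all hypothetical, made up for illustration, and the distance is the standard haversine great-circle formula rather than anything App Inventor provides.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_place(here, places, kind=None):
    """Return the place closest to `here`, optionally restricted to one kind."""
    candidates = [p for p in places if kind is None or p["kind"] == kind]
    if not candidates:
        return None
    return min(candidates,
               key=lambda p: haversine_km(here[0], here[1], p["lat"], p["lon"]))

# Hypothetical sample data; a real app would pull this from a places service.
PLACES = [
    {"name": "Pub A", "kind": "pub", "lat": 28.6315, "lon": 77.2167},
    {"name": "Cinema B", "kind": "movie", "lat": 28.5562, "lon": 77.1000},
    {"name": "Diner C", "kind": "diner", "lat": 28.5500, "lon": 77.2500},
]

if __name__ == "__main__":
    phone_gps = (28.6139, 77.2090)  # assumed reading from the phone's GPS sensor
    print(nearest_place(phone_gps, PLACES))
    print(nearest_place(phone_gps, PLACES, kind="movie"))
```

The bus and train part would need the schedule data of each city, which this sketch does not attempt.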
Movie Review- Dabangg
This movie falls in the must-see category. Not for cinematic excellence or great action choreography, not for the terrific Bollywood song and dance, an excellent debut by Arbaaz Khan (as producer) and Abhinav Kashyap (Anurag Kashyap’s brother) as director, or even the charming looks of Shotgun Sinha’s daughter, the lovely Sonakshi Sinha. But for great clean wholesome entertainment, Dabangg tells us why we loved movies in the first place.
Salman Khan, the muscular good-looking hunk, turns in his best performance, a tour de force. Watch it right away; it’s currently breaking all movie turnout records in India.
Google moving on from MapReduce: rest of world still catching up
Apparently it is true, as per The Register, with details to come in a paper next month. It is called Google Caffeine.
http://www.theregister.co.uk/2010/09/09/google_caffeine_explained/
Caffeine expands on BigTable to create a kind of database programming model that lets the company make changes to its web index without rebuilding the entire index from scratch. “[Caffeine] is a database-driven, Big Table–variety indexing system,” Lipkovitz tells The Reg, saying that Google will soon publish a paper discussing the system. The paper, he says, will be delivered next month at the USENIX Symposium on Operating Systems Design and Implementation (OSDI).
and interestingly
MapReduce, he says, isn’t suited to calculations that need to occur in near real-time.
MapReduce is a sequence of batch operations, and generally, Lipkovitz explains, you can’t start your next phase of operations until you finish the first. It suffers from “stragglers,” he says. If you want to build a system that’s based on series of map-reduces, there’s a certain probability that something will go wrong, and this gets larger as you increase the number of operations. “You can’t do anything that takes a relatively short amount of time,” Lipkovitz says, “so we got rid of it.”
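The point about the probability of something going wrong growing with the number of operations is easy to put numbers on. Here is a small Python sketch, assuming each step fails independently with the same probability. That is a simplification of my own and the 1% figure is made up; neither comes from the article.

```python
def chance_of_any_failure(p_fail_per_step, n_steps):
    """Probability that at least one of n independent steps fails."""
    return 1.0 - (1.0 - p_fail_per_step) ** n_steps

if __name__ == "__main__":
    # Assumed 1% failure (or straggler) chance per step, purely illustrative.
    for n in (1, 10, 100):
        print(n, "steps:", round(chance_of_any_failure(0.01, n), 3))
```

Under that assumption, a step that almost never fails on its own becomes a likely failure somewhere in a long chain.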
With Caffeine, Google can update its index by making direct changes to the web map already stored in BigTable. This includes a kind of framework that sits atop BigTable, and Lipkovitz compares it to old-school database programming and the use of “database triggers.”
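To make the batch versus "database trigger" contrast concrete, here is a toy Python sketch of a word index kept in two ways: rebuilt from scratch over every document, and updated in place when a single page changes. It is only an illustration of the idea. The class and function names are hypothetical and it says nothing about how Caffeine or BigTable actually work.

```python
from collections import defaultdict

def build_index(docs):
    """Batch style: scan every document and rebuild the whole index."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

class IncrementalIndex:
    """Incremental style: a page change touches only that page's entries."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)

    def upsert(self, url, text):
        # Behaves like a trigger on one row: undo the old entries for this
        # page, store the new text, then add the new entries.
        for word in self.docs.get(url, "").lower().split():
            self.index[word].discard(url)
        self.docs[url] = text
        for word in text.lower().split():
            self.index[word].add(url)

if __name__ == "__main__":
    idx = IncrementalIndex()
    idx.upsert("a.example", "big data")
    idx.upsert("b.example", "big table")
    idx.upsert("a.example", "small data")  # only a.example is re-indexed
    print(sorted(idx.index["big"]))
```

The batch version costs a full pass over all pages for every refresh, while the incremental version costs work proportional to the page that changed.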
but most importantly
In 2004, Google published research papers on GFS and MapReduce that became the basis for the open source Hadoop platform now used by Yahoo!, Facebook, and, yes, Microsoft. But as Google moves beyond GFS and MapReduce, Lipkovitz stresses that he is “not claiming that the rest of the world is behind us.”
But oh no!
“We’re in business of making searches useful,” he says. “We’re not in the business of selling infrastructure.”
But I say why not- Search is good and advertising is okay
There is more (not evil) money in infrastructure (of big data) than there is in advertising. But the advertising guys disagree.
Tale of Two Apps
Whom to follow on Twitter- Google Follow Finder vs Twitter’s own Twitter Suggests
http://followfinder.googlelabs.com/search?user=decisionstats
vs
http://twitter.com/invitations/twitter_suggests
(Twitter Suggests thinks I like following celebrities, cricketers and Bollywood stars, while Google Follow Finder, a Google Labs app, thinks I like to follow data techies.)
Google Wins!
Creating an Anonymous Bot
or Surfing the Net Anonymously and Having some Fun.
On the weekend, while browsing through http://freelancer.com I came across an intriguing offer-
http://www.freelancer.com/projects/by-job/YouTube.html
Basically projects asking for increasing Youtube Views-
Hmm.Hmm.Hmm
So this is one way I thought it could be done-
1) Create an IP Address Anonymizer
That’s pretty simple- I used the Tor Project at http://www.torproject.org/easy-download.html.en
Basically it routes your traffic through a network of volunteer-run relays, and you can reset the connection whenever you want, so it hides your IP address.
Also useful for sending hatemail. One limitation: it works with the Firefox browser only. Also, your default webpage keeps changing languages as the IP address changes.
Note-
Check your IP address at http://www.whatismyip.com/
The Tor Project is a 501(c)(3) non-profit based in the United States. The official address of the organization is:
The Tor Project
969 Main Street, Suite 206
Walpole, MA 02081 USA
2) Creating a Bot or an automatic clicking code ( without knowing code)
Go to https://addons.mozilla.org/en-US/firefox/addon/3863/
Remember when you could create an Excel Macro by just recording the Macro (in Excel 2003)?
So while surfing, if you need to do something again and again (like going to the same Youtube video and clicking Like 5000 times), you can press record Macro
- Do the action you want repeated again and again.
- Click save Macro
- Now run the Macro in a loop using the iMacro extension.
see screenshot below-
Note I have added two lines of code- WAIT SECONDS=6
This means every time the code runs in a loop it will wait for 6 seconds and then reload.
However I recommend you create a random number of wait seconds using Google Spreadsheet and the function RANDBETWEEN(5,400) (to limit between 5 and 400 seconds) and also use CONCATENATE with click and drag to create RANDOM wait times (instead of typing it say 500 times yourself)
see https://spreadsheets.google.com/ccc?key=tr18JVEE2TmAuH5V8fzJLRA#gid=0
That’s it – Your Anonymous Bot is ready.
See the analytical results for my personal favourite Streaming Poetry video http://www.youtube.com/watch?v=a5yReaKRHOM
Easy, isn’t it? Lines of code written = 0, Number of Views = 335 (before I grew bored)
Note- Officially it is against Youtube Terms http://www.youtube.com/t/terms to use scripts or Bots so I did it for Research Purposes only. And the http://Freelancer.com needs to look into the activities underway at http://www.freelancer.com/projects/by-job/YouTube.html and also http://www.freelancer.com/projects/by-job/Facebook.html and http://www.freelancer.com/projects/by-job/Social-Networking.html
The final word on these activities is by http://xkcd.com or