Dataists shake up R community with a rocking contest

Image by Johan Larsson via Flickr

The newly formed Dataists are making waves on Hacker News and beyond with their innovative contest: a recommendation engine for R packages.

Not only is the contest useful, it is also likely to teach R users some data hacking skills, as well as the basics of creating a GitHub project.

Read more here: http://www.dataists.com/2010/10/using-data-tools-to-find-data-tools-the-yo-dawg-of-data-hacking/

For that reason, we’ve settled on the more manageable question, “which packages are most often installed by normal R users?”

This last question could potentially be answered in a variety of ways. Our current approach uses a convenience sample of installation data that we’ve collected from volunteers in the R community, who kindly agreed to send us a list of the packages they have on their systems. We’ve anonymized this data and compiled a set of metadata-based predictors that allow us to predict the installation probabilities quite well. We’re releasing all of our current work, including the data we have and all of the code we’ve used so far for our exploratory analyses. The contest itself will go live on Kaggle on Sunday and will end four months from Sunday on February 10, 2011. The rules, prizes and official data sets are all described below.

Rules and Prizes

To win the contest, you need to predict the probability that a user U has a package P installed on their system for every pair, (U, P). We’ll assess your performance using ROC methods, which will be evaluated against a held out test data set. The winning team will receive 3 UseR! books of their choosing. In order to win the contest, you’ll have to provide your analysis code to us by creating a fork of our GitHub repository. You’ll also be required to provide a written description of your approach. We’re asking for so much openness from the winning team because we want this contest to serve as a stepping stone for the R community. We’re also hoping that enterprising data hackers will extend the lessons learned through this contest to other programming languages.
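The excerpt only says that entries will be assessed "using ROC methods", so the exact metric is not stated here. A common choice is the area under the ROC curve (AUC), and the following is a minimal sketch of how it could be computed for (U, P) predictions, assuming AUC is the metric. The function name `roc_auc` and the user and package identifiers are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) statistic.

    labels: 1 if the package is installed for that (user, package) pair, else 0.
    scores: predicted installation probabilities, in the same order.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative pair")

    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical held-out pairs: (user id, package name) -> installed or not.
truth = {(1, "ggplot2"): 1, (1, "zoo"): 0, (2, "ggplot2"): 1, (2, "zoo"): 0}
# Hypothetical predicted probabilities for the same pairs.
predicted = {(1, "ggplot2"): 0.9, (1, "zoo"): 0.8, (2, "ggplot2"): 0.3, (2, "zoo"): 0.1}

keys = sorted(truth)
auc = roc_auc([truth[k] for k in keys], [predicted[k] for k in keys])
print(f"AUC on the toy hold-out set: {auc:.2f}")
```

An AUC of 1.0 means every installed pair was scored above every non-installed pair, while 0.5 is what random guessing would achieve.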

Extract from http://www.dataists.com/2010/10/using-data-tools-to-find-data-tools-the-yo-dawg-of-data-hacking/

Read the full article there.

Backed Up or Hacked Up

Decisionstats.com has been under hacking attacks for the past two weeks.

Backing up a WordPress blog:

  • Use the export feature in WordPress to create backup files.
  • If you have more than 70 articles, create multiple usernames, use the bulk "change author" action to split the articles among them, and export each author to a separate WordPress XML file.
  • Then, on your wordpress.com site, import the various XML files. (Note: the multiple-file method avoids corruption, and the bulk change-author action makes it very fast.)

Hacked Up

  1. Once your wordpress.com blog is updated, use a wildcard redirect to preserve your search engine traffic. Your backup is now online even if your original site is hacked.
  2. Log in to the server and check your .htaccess file for rogue redirects.
  3. Use the server access logs (painful but true) to pinpoint the IP addresses of the attacks. (Note that hackers are likely to use relay servers to disguise their IP addresses.)
  4. To prevent domain name hijacking, make sure your WHOIS information is private.
  5. Change your email passwords, security questions, and server passwords. Use random password generators to create secure passwords.
  6. To prevent rogue malware from infecting your laptop, create a dual-boot Ubuntu/Windows laptop using a 10-minute tutorial. Use the Ubuntu Linux boot to do all of the above operations.
  7. Inform the federal cyber crime authorities, providing the server logs and a SPECIFIC complaint (no rambling sob stories).
  8. Pray to God and Matt (both Cutts and Mullenweg), and if all of the above steps fail, ask Donncha O Caoimh at http://ocaoimh.ie/about/ to step in.
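
Steps 3 and 5 above can be partly automated. The following is a rough sketch, not a hardened tool: it assumes the access log uses the Apache common or combined format, where the client IP address is the first field on each line, and it uses Python's standard `secrets` module for password generation. The function names and the sample log lines (which use reserved documentation IP ranges) are hypothetical.

```python
import re
import secrets
import string
from collections import Counter

# Assumption: each log line starts with the client IPv4 address,
# as in Apache's common and combined log formats.
IP_AT_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def top_client_ips(log_lines, n=5):
    """Return the n most frequent client IPs as (ip, hit_count) tuples."""
    counts = Counter()
    for line in log_lines:
        match = IP_AT_START.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


def random_password(length=20):
    """Build a password from letters, digits and punctuation using a CSPRNG."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, iterate over the real log file.
    sample_log = [
        '203.0.113.7 - - [10/Oct/2010:13:55:36 +0000] "POST /wp-login.php HTTP/1.1" 200 512',
        '198.51.100.4 - - [10/Oct/2010:13:55:40 +0000] "GET / HTTP/1.1" 200 2326',
        '203.0.113.7 - - [10/Oct/2010:13:55:41 +0000] "POST /wp-login.php HTTP/1.1" 200 512',
    ]
    for ip, hits in top_client_ips(sample_log):
        print(ip, hits)
    print(random_password())
```

A high hit count only marks an address as worth a closer look. As noted in step 3, attackers are likely to route through relay servers, so the addresses found this way may not be their own.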