March 2013 – DECISION STATS

Book Promotion- Click, Buy, Lie , Die

To build awareness of Eric Siegel’s new, acclaimed book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (published by Wiley Feb. 19), an offer ya can’t refuse.
Order the book on April 3 via Amazon ($15) for:

1. Free access to the first of 4 modules of the author’s online training program, Predictive Analytics Applied

2. A 35% discount off the full training ($495), or its in-person version, Predictive Analytics for Business, Marketing & Web ($1,495 – Apr 25-26 in NYC)

3. Automatic entrance into a drawing to receive a pass for any Predictive Analytics World this year (San Francisco, Chicago, DC, Boston, London, or Berlin).

—

Ajay- at $15 a pop, and quite a nice book, it’s a steal! See book review here–

https://decisionstats.com/2013/02/25/book-review-predictive-analytics-the-power-to-predict-who-will-click-buy-lie-or-die/

How to learn SQL injection

In my previous post in the hacker series https://decisionstats.com/2013/03/20/hacking-for-beginners-top-website-hacks/ , we noted that SQL Injection remains a top method for security vulnerabilities. Accordingly- here is a list of resources to learn SQL Injection

Definition

SQL injection is a code injection technique that exploits a security vulnerability in an application’s software. The vulnerability happens when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements or user input is not strongly typed and unexpectedly executed. SQL injection is mostly known as an attack vector for websites but can be used to attack any type of SQL database.

Basic Tools

SQL Inject Me

https://addons.mozilla.org/en-us/firefox/addon/sql-inject-me/

SQL Inject Me is the Exploit-Me tool used to test for SQL Injection vulnerabilities.

The tool works by submitting your HTML forms and substituting the form value with strings that are representative of an SQL Injection attack.The tool works by sending database escape strings through the form fields. It then looks for database error messages that are output into the rendered HTML of the page.

The tool does not attempting to compromise the security of the given system. It looks for possible entry points for an attack against the system. There is no port scanning, packet sniffing, password hacking or firewall attacks done by the tool.

Hackbar

https://addons.mozilla.org/en-US/firefox/addon/hackbar/

and http://code.google.com/p/hackbar/

This toolbar will help you in testing sql injections, XSS holes and site security. It is NOT a tool for executing standard exploits and it will NOT teach you how to hack a site

SQLMap

http://sqlmap.org/

sqlmap is an open source penetration testing tool that automates the process of detecting and exploiting SQL injection flaws and taking over of database servers.

Basic Tutorials ( in order of learning)

http://sqlzoo.net/hack/

A site for testing SQL Injection attacks. It is a test system and can be used for honing your SQL Skills.

Intermediate Tutorials on End to End SQL Injection

Step 1: Finding Vulnerable Website:

Step 2: Checking the Vulnerability:

To check the vulnerability , add the single quotes(‘) at the end of the url and hit enter.

If you got an error message , then it means that the site is vulnerable

Step 3: Finding Number of columns:

Step 4: Find the Vulnerable columns:

Step 5: Finding version,database,user

Step 6: Finding the Table Name

Step 8: Finding the Admin Panel:

from http://www.breakthesecurity.com/2010/12/hacking-website-using-sql-injection.html

Next Tutorial uses an automated tool called Havij from

http://www.itsecteam.com/products/havij-v116-advanced-sql-injection/

and the tutorial is at

http://cybersucks.blogspot.in/2013/01/hacking-website-using-sql-injectionfull.html

Google’s Product Strategy

Copy an idea from existing product. Make worse interface, but give more freebies. Do not charge money ( or charge vastly reduced ). Launch without warning.
Write a blog post every three month on the launched product
Watch product lose money as they did not charge any/some money to begin with.
Close the product down without warning.
Repeat.

Hacking for Beginners- Top Website Hacks

I really liked this 2002 presentation on Website Hacks at blackhat.com/presentations/bh-asia-02/bh-asia-02-shah.pdf . It explains in a easy manner some common fundamentals in hacking websites. Take time to go through this- its a good example of how hacking tutorials need to be created if you want to expand the number of motivated hackers.

However a more recent list of hacks is here-

https://blog.whitehatsec.com/top-ten-web-hacking-techniques-of-2012/

The Top Ten

Honorable Mention

11. Using WordPress as a intranet and internet port scanner

12. .Net Cross Site Scripting – Request Validation Bypassing (1)

13. Bruteforcing/Abusing search functions with no-rate checks to collect data

14. Browser Event Hijacking (2, 3)

But a more widely used ranking method for Website Hacking is here. Note it is a more formal but probably a more recent document than the pdf above. If only it could be made into an easier to read tutorial, it would greatly improve website exploit security strength.

https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project

The Release Candidate for the OWASP Top 10 for 2013 is now available here: OWASP Top 10 – 2013 – Release Candidate

The OWASP Top 10 – 2013 Release Candidate includes the following changes as compared to the 2010 edition:

A1 Injection
A2 Broken Authentication and Session Management (was formerly A3)
A3 Cross-Site Scripting (XSS) (was formerly A2)
A4 Insecure Direct Object References
A5 Security Misconfiguration (was formerly A6)
A6 Sensitive Data Exposure (merged from former A7 Insecure Cryptographic Storage and former A9 Insufficient Transport Layer Protection)
A7 Missing Function Level Access Control (renamed/broadened from former A8 Failure to Restrict URL Access)
A8 Cross-Site Request Forgery (CSRF) (was formerly A5)
A9 Using Known Vulnerable Components (new but was part of former A6 – Security Misconfiguration)
A10 Unvalidated Redirects and Forwards

—
Once again, I am presenting this as an example of how lucid documentation can help spread technological awareness to people affected by technical ignorance and lacking the savvy and chops for self-learning. If you need better cyber security, you need better documentation and tutorials on hacking for improving the quantity and quality of the pool of available hackers and bringing in young blood to enhance your cyber security edge.

New Delhi UseRs March 2013 MeetUp #rstats

The fifth New Delhi UseRs Meet Up happened at Mimir Tech’s premises in Green Park, New Delhi. I presented on using GUIs for easier transitioning to R from other software but limited it to Deducer (for data visualization -specifically templates and facets in GGPLOT) and Rattle (for Data Mining). We also discussed a couple of things including how to apply R in other business domains, and open source alternatives to Meetup.com .

March meet up new delhi users from Ajay Ohri

Interview Jeroen Ooms OpenCPU #rstats

Below an interview with Jeroen Ooms, a pioneer in R and web development. Jeroen contributes to R by developing packages and web applications for multiple projects.

Ajay- What are you working on these days?
Jeroen- My research revolves around challenges and opportunities of using R in embedded applications and scalable systems. After developing numerous web applications, I started the OpenCPU project about 1.5 year ago, as a first attempt at a complete framework for proper integration of R in web services. As I work on this, I run into challenges that shape my research, and sometimes become projects in their own. For example, the RAppArmor package provides the security framework for OpenCPU, but can be used for other purposes as well. RAppArmor interfaces to some methods in the Linux kernel, related to setting security and resource limits. The github page contains the source code, installation instructions, video demo’s, and a draft of a paper for the journal of statistical software. Another example of a problem that appeared in OpenCPU is that applications that used to work were breaking unexpectedly later on due to changes in dependency packages on CRAN. This is actually a general problem that affects almost all R users, as it compromises reliability of CRAN packages and reproducibility of results. In a paper (forthcoming in The R Journal), this problem is discussed in more detail and directions for improvement are suggested. A preprint of the paper is available on arXiv: http://arxiv.org/abs/1303.2140.

I am also working on software not directly related to R. For example, in project Mobilize we teach high school students in Los Angeles the basics of collecting and analyzing data. They use mobile devices to upload surveys with questions, photos, gps, etc using the ohmage software. Within Mobilize and Ohmage, I am in charge of developing web applications that help students to visualize the data they collaboratively collected. One public demo with actual data collected by students about snacking behavior is available at: http://jeroenooms.github.com/snack. The application allows students to explore their data, by filtering, zooming, browsing, comparing etc. It helps students and teachers to access and learn from their data, without complicated tools or programming. This approach would easily generalize to other fields, like medical data or BI. The great thing about this application is that it is fully client side; the backend is simply a CSV file. So it is very easy to deploy and maintain.

Ajay-What’s your take on difference between OpenCPU and RevoDeployR ?
Jeroen- RevoDeployR and OpenCPU both provide a system for development of R web applications, but in a fairly different context. OpenCPU is open source and written completely in R, whereas RevoDeployR is proprietary and written in Java. I think Revolution focusses more on a complete solution in a corporate environment. It integrates with the Revolution Enterprise suite and their other big data products, and has built-in functionality for authentication, managing privileges, server administration, support for MS Windows, etc. OpenCPU on the other hand is much smaller and should be seen as just a computational backend, analogous to a database backend. It exposes a clean HTTP api to call R functions to be embedded in larger systems, but is not a complete end-product in itself.

OpenCPU is designed to make it easy for a statistician to expose statistical functionality that will used by web developers that do not need to understand or learn R. One interesting example is how we use OpenCPU inside OpenMHealth, a project that designs an architecture for mobile applications in the health domain. Part of the architecture are so called “Data Processing Units”, aka DPU’s. These are simple, modular I/O units that do various sorts of data processing, similar to unix tools, but then over HTTPS. For example, the mobility dpu is used to calculate distances between gps coordinates via a simple http call, which OpenCPU maps to the corresponding R function implementing the harversine formula.

Ajay- What are your views on Shiny by RStudio?
Jeroen- RStudio seems very promising. Like Revolution, they deliver a more full featured product than any of my projects. However, RStudio is completely open source, which is great because it allows anyone to leverage the software and make it part of their projects. I think this is one of the reasons why the product has gotten a lot of traction in the community, which has in turn provided RStudio with great feedback to further improve the product. It illustrates how open source can be a win-win situation. I am currently developing a package to run OpenCPU inside RStudio, which will make developing and running OpenCPU apps much easier.

Ajay- Are you still developing excellent RApache web apps (which IMHO could be used for visualization like business intelligence tools?)
Jeroen– The OpenCPU framework was a result of those webapps (including ggplot2 for graphical exploratory analysis, lme4 for online random effects modeling, stockplot for stock predictions and irttool.com, an R web application for online IRT analysis). I started developing some of those apps a couple of years ago, and realized that I was repeating a large share of the infrastructure for each application. Based on those experiences I extracted a general purpose framework. Once the framework is done, I’ll go back to developing applications 🙂

Ajay- You have helped build web apps, openCPU, RAppArmor, Ohmage , Snack , mobility apps .What’s your thesis topic on?
Jeroen- My thesis revolves around all of the technical and social challenges of moving statistical computing beyond the academic and private labs, into more public, accessible and social places. Currently statistics is still done to mostly manually by specialists using software to load data, perform some analysis, and produce results that end up in a report or presentation. There are great opportunities to leverage the open source analysis and visualization methods that R has to offer as part of open source stacks, services, systems and applications. However, several problems need to be addressed before this can actually be put in production. I hope my doctoral research will contribute to taking a step in that direction.

Ajay- R is RAM constrained but the cloud offers lots of RAM. Do you see R increasing in usage on the cloud? why or why not?
Jeroen- Statistical computing can greatly benefit from the resources that the cloud has to offer. Software like OpenCPU, RStudio, Shiny and RevoDeployR all provide some approach of moving computation to centralized servers. This is only the beginning. Statisticians, researchers and analysts will continue to increasingly share and publish data, code and results on social cloud-based computing platforms. This will address some of the hardware challenges, but also contribute towards reproducible research and further socialize data analysis, i.e. improve learning, collaboration and integration.

That said, the cloud is not going to solve all problems. You mention the need for more memory, but that is only one direction to scale in. Some of the issues we need to address are more fundamental and require new algorithms, different paradigms, or a cultural change. There are many exciting efforts going on that are at least as relevant as big hardware. Gelman’s mc-stan implements a new MC method that makes Bayesian inference easier and faster while supporting more complex models. This is going to make advanced Bayesian methods more accessible to applied researchers, i.e. scale in terms of complexity and applicability. Also Javascript is rapidly becoming more interesting. Performance of Google’s javascript engine V8 outruns any other scripting language at this point, and the huge Javascript community provides countless excellent software libraries. For example D3 is a graphics library that is about to surpass R in terms of functionality, reliability and user base. The snack viz that I developed for Mobilize is based largely on D3. Finally, Julia is another young language for technical computing with lots of activity and very smart people behind it. These developments are just as important for the future of statistical computing as big data solutions.

About-
You can read more on Jeroen and his work at http://jeroenooms.github.com/ and reach out to him here http://www.linkedin.com/in/datajeroen

Running R and RStudio Server on Red Hat Linux RHEL #rstats

Installing R

sudo rpm -ivh http://dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

(OR sudo rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm )

THEN

sudo yum install R

THEN

sudo R

(and to paste in Linux Window- just use Shift + Insert)

To Install RStudio (from http://www.rstudio.com/ide/download/server)

32-bit

wget http://download2.rstudio.org/rstudio-server-0.97.320-i686.rpm

sudo yum install --nogpgcheck rstudio-server-0.97.320-i686.rpm

OR 64-bit

wget http://download2.rstudio.org/rstudio-server-0.97.320-x86_64.rpm

sudo yum install --nogpgcheck rstudio-server-0.97.320-x86_64.rpm

Then

sudo rstudio-server verify-installation

Changing Firewalls in your RHEL

-Change to Root

```
sudo bash 
```

-Change directory

cd etc/sysconfig

-Read Iptables ( or firewalls file)

vi iptables

( to quite vi , press escape, then colon : then q )

-Change Iptables to open port 8787

/sbin/iptables -A INPUT -p tcp --dport 8787 -j ACCEPT

Add new user name (here newuser1)

sudo useradd newuser1

Change password in new user name

sudo passwd newuser1

Now just login to IPADDRESS:8787 with user name and password above

(credit- IBM SmartCloud Support ,http://www.youtube.com/watch?v=woVjq83gJkg&feature=player_embedded, Rstudio help, David Walker http://datamgmt.com/installing-r-and-rstudio-on-redhat-or-centos-linux/, www.google.com ,Michael Grieb)

3. Automatic entrance into a drawing to receive a pass for any Predictive Analytics World this year (San Francisco, Chicago, DC, Boston, London, or Berlin).

Please share:

Please share:

Please share:

The Top Ten

Honorable Mention

Please share:

Please share:

Please share:

Please share: