July 2012 – Page 4 – DECISION STATS

Machine Learning to Translate Code between Programming Languages

Google Translate has been a pioneer in using machine learning to translate human languages (and so has the awesome Google Transliterate).

I wonder if they can expand it to programming languages and not just human languages.

Issues in ~~converting~~ translating programming language code

1) Paths that refer to stored objects

2) Object names should remain the same and not be translated

3) Many functions have multiple uses, so translating a function is not always straightforward

I think all these issues are doable, solvable and, more importantly, profitable.

I look forward to the day an iOS developer can convert his code to Android app code by a simple upload and download.

Anonymous grows up and matures…Anonanalytics.com

I liked the design, user interfaces and the conceptual ideas behind the latest Anonymous hacktivist websites (much better than the shabby graphic design of Wikileaks, or Friends of Wikileaks, though I guess they have been busy what with Julian’s escapades and Syrian emails).

I disagree (and let us agree to disagree some of the time) with the complete lack of respect for graphical user interfaces for tools. If DDoS really took off due to LOIC, why not build a GUI for SQL injection (or at least for testing the top 25 vulnerabilities, as per this list: http://www.sans.org/top25-software-errors/)?

Shouldn't Tor be embedded within the next generation of LOIC?

Automated testing tools are used by companies like Adobe (and others), so why not create a simple GUI for the existing tools? I may be completely off track here, but I think hacker education has been a critical misstep that has undermined Western democracies' preparedness for cyber tactics by hostile regimes. How do we create the next generation of hackers? By easy tutorials (see Codecademy and build appropriate modules).

-A slick website to be funded by Bitcoins (Money can buy everything including Mastercard and Visa, but Bitcoins are an innovative step towards an internet economy currency)

-A collaborative wiki

http://wiki.echelon2.org/wiki/Main_Page

Seriously dude, why not make this a part of Wikipedia? (I know Jimmy Wales got shifty eyes, but can you trust someone?)

-Analytics for Anonymous (sighs! I should have thought about this earlier)

http://anonanalytics.com/ (can be used to play and bill both sides of corporate espionage and be cyber private investigators)

What We Do

We provide the public with investigative reports exposing corrupt companies. Our team includes analysts, forensic accountants, statisticians, computer experts, and lawyers from various jurisdictions and backgrounds. All information presented in our reports is acquired through legal channels, fact-checked, and vetted thoroughly before release. This is both for the protection of our associates as well as groups/individuals who rely on our work.

-And lastly, creative content for Pinterest.com and public relations (what next? Tom Cruise to play Julian Assange in the new movie?)

http://www.par-anoia.net/

Potentially Alarming Research: Anonymous Intelligence Agency

Information is and will be free. Expect it. ~ Anonymous

Links of interest

Latest Scientology Mails (Austria)
Full FBI call transcript
Arrest Tracker
HBGary Email Viewer
The Pirate Bay Proxy
We Are Anonymous – Book
To be announced…

Rapid Miner User Conference 2012

One of those cool conferences that is on my bucket list, this time in Hungary (that’s a nice place).

But I am especially interested in seeing how far Radoop has come along!

Disclaimer- Rapid Miner has been a Decisionstats.com sponsor for many years. It is also a very cool software but I like the R Extension facility even more!

—————————————————————

And it is not very expensive compared to other user conferences in Europe!

http://rcomm2012.org/index.php/registration/prices

Information about Registration

Early Bird registration until July 20th, 2012.
Normal registration from July 21st, 2012 until August 13th, 2012.
Latest registration from August 14th, 2012 until August 24th, 2012.
Students have to provide a valid Student ID during registration.
The Dinner is included in the All Days and in the Conference packages.
All prices below are net prices. Value added tax (VAT) has to be added if applicable.

Prices for Regular Visitors

Days and Event	Early Bird Rate	Normal Rate	Latest Registration
Tuesday (Training / Development 1)	190 Euro	230 Euro	280 Euro
Wednesday + Thursday (Conference)	290 Euro	350 Euro	420 Euro
Friday (Training / Development 2 and Exam)	190 Euro	230 Euro	280 Euro
All Days *(Full Package)*	610 Euro	740 Euro	900 Euro

Prices for Authors and Students

In case of students, please note that you will have to provide a valid student ID during registration.

Days and Event	Early Bird Rate	Normal Rate	Latest Registration
Tuesday (Training / Development 1)	90 Euro	110 Euro	140 Euro
Wednesday + Thursday (Conference)	140 Euro	170 Euro	210 Euro
Friday (Training / Development 2 and Exam)	90 Euro	110 Euro	140 Euro
All Days *(Full Package)*	290 Euro	350 Euro	450 Euro

http://rcomm2012.org/index.php/program

Program

09:00 – 10:30
Wednesday (Conference 1)
- Introductory Speech. Ingo Mierswa (Rapid-I)
- Data Analysis: NeurophRM: Integration of the Neuroph framework into RapidMiner. Miloš Jovanović, Jelena Stojanović, Milan Vukićević, Vera Stojanović, Boris Delibašić (University of Belgrade)
Thursday (Conference 2)
- Invited Talk: to be announced
- Recommender Systems: Extending RapidMiner with Recommender Systems Algorithms. Matej Mihelčić, Nino Antulov-Fantulin, Matko Bošnjak, Tomislav Šmuc (Ruđer Bošković Institute)
- Recommender Systems: Implementation of User Based Collaborative Filtering in RapidMiner. Sérgio Morais, Carlos Soares (Universidade do Porto)
Friday (Training / Workshop 2)
- Parallel Training / Workshop Session: Advanced Data Mining and Data Transformations or Development Workshop Part 2

10:30 – 12:30
Wednesday (Conference 1): Data Analysis
- Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner. Mennatallah Amer, Markus Goldstein (DFKI)
- Customers’ LifeStyle Targeting on Big Data using Rapid Miner. Maksim Drobyshev (LifeStyle Marketing Ltd)
- Robust GPGPU Plugin Development for RapidMiner. Andor Kovács, Zoltán Prekopcsák (Budapest University of Technology and Economics)
Thursday (Conference 2): Extensions
- Image Mining Extension – Year After. Radim Burget, Václav Uher, Jan Mašek (Brno University of Technology)
- Incorporating R Plots into RapidMiner Reports. Peter Jeszenszky (University of Debrecen)
- An Octave Extension for RapidMiner. Sylvain Marié (Schneider Electric)

12:30 – 13:30
- Lunch (Wednesday, Thursday and Friday)

13:30 – 15:00
Tuesday (Training / Workshop 1)
- Parallel Training / Workshop Session: Basic Data Mining and Data Transformations or Development Workshop Part 1
Wednesday (Conference 1): Applications
- Application of RapidMiner in Steel Industry Research and Development. Bengt-Henning Maas, Hakan Koc, Martin Bretschneider (Salzgitter Mannesmann Forschung)
- A Comparison of Data-driven Models for Forecast River Flow. Milan Cisty, Juraj Bezak (Slovak University of Technology)
- Portfolio Optimization Using Local Linear Regression Ensembles in Rapid Miner. Gábor Nagy, Tamás Henk, Gergő Barta (Budapest University of Technology and Economics)
Thursday (Conference 2): Unstructured Data
- Processing Data Streams with the RapidMiner Streams-Plugin. Christian Bockermann, Hendrik Blom (TU Dortmund)
- Automated Creation of Corpuses for the Needs of Sentiment Analysis. Peter Koncz, Jan Paralic (Technical University of Kosice)
- Demonstration: News from the Rapid-I Labs. Simon Fischer (Rapid-I). This short session demonstrates the latest developments from the Rapid-I lab and will show you how you can build powerful analysis processes and routines by using those RapidMiner tools.
Friday (Training / Workshop 2)
- Certification Exam

15:00 – 17:00
Wednesday (Conference 1): Book Presentation and Game Show
- Data Mining for the Masses: A New Textbook on Data Mining for Everyone. Matthew North (Washington & Jefferson College). Matthew North presents his new book “Data Mining for the Masses” introducing data mining to a broader audience and making use of RapidMiner for practical data mining problems.
- Game Show: Did you miss last year’s game show “Who wants to be a data miner?”? Use RapidMiner for problems it was never created for and beat the time and other contestants!
Thursday (Conference 2): User Support
- Get some Coffee for free – Writing Operators with RapidMiner Beans. Christian Bockermann, Hendrik Blom (TU Dortmund)
- Meta-Modeling Execution Times of RapidMiner operators. Matija Piškorec, Matko Bošnjak, Tomislav Šmuc (Ruđer Bošković Institute)

19:00
- Wednesday: Social Event (Conference Dinner)
- Thursday: Social Event (Visit of Bar District)

Training: Basic Data Mining and Data Transformations

This is a short introductory training course for users who are not yet familiar with RapidMiner or only have a few experiences with RapidMiner so far. The topics of this training session include

Basic Usage
- User Interface
- Creating and handling RapidMiner repositories
- Starting a new RapidMiner project
- Operators and processes
- Loading data from flat files
- Storing data, processes, and results
Predictive Models
- Linear Regression
- Naïve Bayes
- Decision Trees
Basic Data Transformations
- Changing names and roles
- Handling missing values
- Changing value types by discretization and dichotomization
- Normalization and standardization
- Filtering examples and attributes
Scoring and Model Evaluation
- Applying models
- Splitting data
- Evaluation methods
- Performance criteria
- Visualizing Model Performance

Training: Advanced Data Mining and Data Transformations

This is a short introductory training course for users who already know some basic concepts of RapidMiner and data mining and have already used the software before, for example in the first training on Tuesday. The topics of this training session include

Advanced Data Handling
- Sampling
- Balancing data
- Joins and Aggregations
- Detection and removal of outliers
- Dimensionality reduction
Control process execution
- Remember process results
- Recall process results
- Loops
- Using branches and conditions
- Exception handling
- Definition of macros
- Usage of macros
- Definition of log values
- Clearing log tables
- Transforming log tables to data

Development Workshop Part 1 and Part 2

Want to exchange ideas with the developers of RapidMiner? Or learn more tricks for developing own operators and extensions? During our development workshops on Tuesday and Friday, we will build small groups of developers each working on a small development project around RapidMiner. Beginners will get a comprehensive overview of the architecture of RapidMiner before making the first steps and learn how to write own operators. Advanced developers will form groups with our experienced developers, identify shortcomings of RapidMiner and develop a new extension which might be presented during the conference already. Unfinished work can be continued in the second workshop on Friday before results might be published on the Marketplace or can be taken home as a starting point for new custom operators.

Talking on Big Data Analytics

I am ~~going~~ being sponsored to give a Government of India sponsored talk on Big Data Analytics in Bangalore on Friday the 13th of July. If you are in Bangalore, India, you may drop in for a dekko. Schedule and Abstracts (I am on page 7 out of 9).

Your taxpayer money is hard at work (just joking, and it is a joke only if you are a desi: "hasi toh phasi", if she laughs, she is hooked).

13 July 2012 (9.30 – 11.00 & 11.30 – 1.00)
Big Data Big Analytics
The talk will showcase using open source technologies in statistical computing for big data, namely the R programming language and its use cases in big data analysis. It will review case studies using the Amazon Cloud, custom packages in R for Big Data, tools like Revolution Analytics' RevoScaleR package, as well as the newly launched SAP Hana used with R. We will also review Oracle R Enterprise. In addition we will show some case studies using BigML.com (using Clojure), and approaches using PiCloud. It will also showcase some of Google's APIs for Big Data Analysis.

Lastly we will talk on social media analysis, national security use cases (i.e. cyber war) and the privacy hazards of big data analytics.

Schedule (embedded presentation from Ajay Ohri)

Abstracts (embedded document from Ajay Ohri)

Working with a large number of files for reading into R #rstats

The dir() and list.files() commands list all the files in a particular directory. The files can then be read into R one at a time by indexing into the character vector that these commands return. This is useful when you are working with a large number of files that get generated or re-generated after specific time periods (like web server log files).

> getwd()
[1] "C:/Users/KUs/Documents"
> path="C:/Users/KUs/Desktop/tester"
> dir(path)
[1] "tester.csv"  "tester2.csv" "tester3.csv" "tester4.csv"
> setwd(path)
> read.table(file=dir(path)[1],sep="\t",header=T)
  X1 X2 X3 X4
1 to be  2  B

> read.table(file=dir(path)[4],sep=",",header=T)
  zoo bee doo bee.1    daa
1  12  32  43    34 qwerty
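
Picking files by index works for a handful, but with dozens of log files you would rather read them all in one go. Here is a minimal sketch of that idea, assuming every file in the folder is a comma-separated file with a header row. The folder and the two demo files are made up so the example runs on its own; swap in your own path.

```r
# Hypothetical folder; point this at your own directory of CSV files.
path <- file.path(tempdir(), "tester")
dir.create(path, showWarnings = FALSE)

# Create two small demo files so the sketch is self-contained.
write.csv(data.frame(zoo = 1:2, bee = 3:4),
          file.path(path, "tester1.csv"), row.names = FALSE)
write.csv(data.frame(zoo = 5:6, bee = 7:8),
          file.path(path, "tester2.csv"), row.names = FALSE)

# full.names = TRUE returns complete paths, so no setwd() is needed.
files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames, named after the files.
tables <- lapply(files, read.table, sep = ",", header = TRUE)
names(tables) <- basename(files)

str(tables)
```

Each data frame is then available by name, for example tables[["tester2.csv"]], instead of by position in the dir() output.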

Happy Higgs Boson Day

Dark Matters from PHD Comics on Vimeo.

The Higgs Boson Explained from PHD Comics on Vimeo.

Saving Output in R for Presentations

While the SAS language has a beautifully designed ODS (Output Delivery System) for saving output from certain analyses in Excel files (and HTML and others), in R one can simply take the object, pass it to write.table and save it to a CSV file using the file parameter within write.table.

As a business analytics consultant, the output from a Proc Means, Proc Freq (SAS) or a summary/describe/table command (in R) is to be presented as a final report. Copying and pasting is not feasible especially for large amounts of text, or remote computers.

Using the following, we can simply save the output in R.

> getwd()
[1] "C:/Users/KUs/Desktop/Ajay"
> setwd("C:/Users/KUs/Desktop")

# We shifted the directory, so we can save output without typing the entire path again and again for each step.

# I have found the summary command most useful for initial analysis and final display (particularly during the data munging step).

> nams=summary(ajay)

# I assigned the result of the analysis step (summary) to a new object; it could also come from names, describe (Hmisc) or table (for frequency analysis).
> write.table(nams,sep=",",file="output.csv")

Note: This is for beginners in R who use it for business analytics and deal with a large number of variables.
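
If the data is all numeric, a slightly tidier file comes out of summarising column by column. The sketch below is only one way to do it: it uses the built-in mtcars dataset as a stand-in for the ajay object above, writes to a temporary folder instead of the Desktop, and assumes no column has missing values (otherwise the summaries have different lengths and will not form a neat table).

```r
# mtcars is a built-in dataset, used here only as a stand-in for `ajay`.
stats <- sapply(mtcars, summary)  # matrix: one column per variable

# Transpose so each variable is a row, which reads better in a spreadsheet.
out_file <- file.path(tempdir(), "output.csv")
write.csv(t(stats), file = out_file)

# Read the file back to confirm what was saved.
saved <- read.csv(out_file, row.names = 1, check.names = FALSE)
print(head(saved, 3))
```

The saved file has one row per variable and one column per statistic (Min., 1st Qu., Median, Mean, 3rd Qu., Max.), so it can be pasted straight into a report.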

pps: Note

If you have a large number of files in a local directory to be read into R, you can avoid typing the entire path again and again by changing the working directory to that folder and passing only the file name to the file parameter of read.table.

setwd("C:/Users/KUs/Desktop/")
ajayt1=read.table(file="test1.csv",sep=",",header=T)

ajayt2=read.table(file="test2.csv",sep=",",header=T)

and so on…

Maybe there is a better approach somewhere on Stack Overflow or R help, but this will work just as well.

You can then merge the objects created, ajayt1 and ajayt2… (to be continued)
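
Until the follow-up post, here is a rough sketch of the two usual ways to combine such objects in base R. The data frames and their column names (id, sales, region) are invented for illustration, since the contents of test1.csv and test2.csv are not shown here.

```r
# Small stand-ins for ajayt1 and ajayt2; the column names are hypothetical.
ajayt1 <- data.frame(id = 1:3, sales = c(10, 20, 30))
ajayt2 <- data.frame(id = 4:5, sales = c(40, 50))

# Same columns in every file: stack the rows on top of each other.
stacked <- do.call(rbind, list(ajayt1, ajayt2))

# Different columns that share a key: join on that key instead.
regions <- data.frame(id = c(1L, 2L, 4L),
                      region = c("north", "south", "east"))
joined <- merge(stacked, regions, by = "id", all.x = TRUE)

print(joined)
```

Setting all.x = TRUE keeps every row of the stacked data and fills region with NA where there is no match; drop it if you only want the matching rows.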
