Home » Posts tagged 'Languages' (Page 2)
Tag Archives: Languages
Interview Jason Kuo SAP Analytics #Rstats
Here is an interview with Jason Kuo who works with SAP Analytics as Group Solutions Marketing Manager. Jason answers questions on SAP Analytics and it’s increasing involvement with R statistical language.
Ajay- What made you choose R as the language to tie important parts of your technology platform like HANA and SAP Predictive Analysis. Did you consider other languages like Julia or Python.
Jason- It’s the most popular. Over 50% of the statisticians and data analysts use R. With 3,500+ algorithms its arguably the most comprehensive statistical analysis language. That said,we are not closing the door on others.
Ajay- When did you first start getting interested in R as an analytics platform?
Jason- SAP has been tracking R for 5+ years. With R’s explosive growth over the last year or two, it made sense for us to dramatically increase our investment in R.
Ajay- Can we expect SAP to give back to the R community like Google and Revolution Analytics does- by sponsoring Package development or sponsoring user meets and conferences?
Will we see SAP’s R HANA package in this year’s R conference User 2012 in Nashville
Jason- Yes. We plan to provide a specific driver for HANA tables for input of the data to native R. This planned for end of 2012. We’ll then review our event strategy. SAP has been a sponsor of Predictive Analytics World for several years and was indeed a founding sponsor. We may be attending the year’s R conference in Nashville.
Ajay- What has been some of the initial customer feedback to your analytics expansion and offerings.
Jason- We have completed two very successful Pilots of the R Integration for HANA with two of SAP’s largest customers.
About-
Jason has over 15 years of BI and Data Warehousing industry experience. Having worked at Oracle, Business Objects, and now SAP, Jason has been involved in numerous technical marketing roles involving performance management dashboards, information management, text analysis, predictive analytics, and now big data. He has a bachelor’s of science in operations research from the University of Michigan.
Interview BigML.com
Here is an interview with Charlie Parker, head of large scale online algorithms at http://bigml.com
Ajay- Describe your own personal background in scientific computing, and how you came to be involved with machine learning, cloud computing and BigML.com
Charlie- I am a machine learning Ph.D. from Oregon State University. Francisco Martin (our founder and CEO), Adam Ashenfelter (the lead developer on the tree algorithm), and myself were all studying machine learning at OSU around the same time. We all went our separate ways after that.
Francisco started Strands and turned it into a 100+ million dollar company building recommender systems. Adam worked for CleverSet, a probabilistic modeling company that was eventually sold to Cisco, I believe. I worked for several years in the research labs at Eastman Kodak on data mining, text analysis, and computer vision.
When Francisco left Strands to start BigML, he brought in Justin Donaldson who is a brilliant visualization guy from Indiana, and an ex-Googler named Jose Ortega who is responsible for most of our data infrastructure. They pulled in Adam and I a few months later. We also have Poul Petersen, a former Strands employee, who manages our herd of servers. He is a wizard and makes everyone else’s life much easier.
Ajay- You use clojure for the back end of BigML.com .Are there any other languages and packages you are considering? What makes clojure such a good fit for cloud computing ?
Charlie- Clojure is a great language because it offers you all of the benefits of Java (extensive libraries, cross-platform compatibility, easy integration with things like Hadoop, etc.) but has the syntactical elegance of a functional language. This makes our code base small and easy to read as well as powerful.
We’ve had occasional issues with speed, but that just means writing the occasional function or library in Java. As we build towards processing data at the Terabyte level, we’re hoping to create a framework that is language-agnostic to some extent. So if we have some great machine learning code in C, for example, we’ll use Clojure to tie everything together, but the code that does the heavy lifting will still be in C. For the API and Web layers, we use Python and Django, and Justin is a huge fan of HaXe for our visualizations.
Ajay- Current support is for Decision Trees. When can we see SVM, K Means Clustering and Logit Regression?
Charlie- Right now we’re focused on perfecting our infrastructure and giving you new ways to put data in the system, but expect to see more algorithms appearing in the next few months. We want to make sure they are as beautiful and easy to use as the trees are. Without giving too much away, the first new thing we will probably introduce is an ensemble method of some sort (such as Boosting or Bagging). Clustering is a little further away but we’ll get there soon!
Ajay- How can we use the BigML.com API using R and Python.
Charlie- We have a public github repo for the language bindings. https://github.com/bigmlcom/io Right now, there there are only bash scripts but that should change very soon. The python bindings should be there in a matter of days, and the R bindings in probably a week or two. Clojure and Java bindings should follow shortly after that. We’ll have a blog post about it each time we release a new language binding. http://blog.bigml.com/
Ajay- How can we predict large numbers of observations using a Model that has been built and pruned (model scoring)?
Charlie- We are in the process of refactoring our backend right now for better support for batch prediction and model evaluation. This is something that is probably only a few weeks away. Keep your eye on our blog for updates!
Ajay- How can we export models built in BigML.com for scoring data locally.
Charlie- This is as simple as a call to our API. https://bigml.com/developers/models The call gives you a JSON object representing the tree that is roughly equivalent to a PMML-style representation.
About-
You can read about Charlie Parker at http://www.linkedin.com/pub/charles-parker/11/85b/4b5 and the rest of the BigML team at
Play Color Cipher and Visual Cryptography
I was just reading up on my weekly to-read list and came across this interesting method. It is called Play Color Cipher-
Each Character ( Capital, Small letters, Numbers (0-9), Symbols on the keyboard ) in the plain text is substituted with a color block from the available 18 Decillions of colors in the world [11][12][13] and at the receiving end the cipher text block (in color) is decrypted in to plain text block. It overcomes the problems like “Meet in the middle attack, Birthday attack and Brute force attacks [1]”.
It also reduces the size of the plain text when it is encrypted in to cipher text by 4 times, with out any loss of content. Cipher text occupies very less buffer space; hence transmitting through channel is very fast. With this the transportation cost through channel comes down.
Reference-
http://www.ijcaonline.org/journal/number28/pxc387832.pdf
Visual Cryptography is indeed an interesting topic-
Visual cryptography, an emerging cryptography technology, uses the characteristics of human vision to decrypt encrypted
images. It needs neither cryptography knowledge nor complex computation. For security concerns, it also ensures that hackers
cannot perceive any clues about a secret image from individual cover images. Since Naor and Shamir proposed the basic
model of visual cryptography, researchers have published many related studies.
Visual cryptography (VC) schemes hide the secret image into two or more images which are called
shares. The secret image can be recovered simply by stacking the shares together without any complex
computation involved. The shares are very safe because separately they reveal nothing about the secret image.
Visual Cryptography provides one of the secure ways to transfer images on the Internet. The advantage
of visual cryptography is that it exploits human eyes to decrypt secret images .
References-
Color Visual Cryptography Scheme Using Meaningful Shares
http://csis.bits-pilani.ac.in/faculty/murali/netsec-10/seminar/refs/muralikrishna4.pdf
Visual cryptography for color images
http://csis.bits-pilani.ac.in/faculty/murali/netsec-10/seminar/refs/muralikrishna3.pdf
Other Resources
- http://users.telenet.be/d.rijmenants/en/visualcrypto.htm
- Visual Crypto – One-time Image Create two secure images from one by Robert Hansen
- Visual Crypto Java Applet at the University of Regensburg
- Visual Cryptography Kit Software to create image layers
- On-line Visual Crypto Applet by Leemon Baird
- Extended Visual Cryptography (pdf) by Mizuho Nakajima and Yasushi Yamaguchi
- Visual Cryptography Paper by Moni Noar and Adi Shamir
- Visual Crypto Talk (pdf) by Frederik Vercauteren ESAT Leuven
- http://cacr.uwaterloo.ca/~dstinson/visual.html
- t the University of Salerno web page on visual cryptogrpahy.
- Visual Crypto Page by Doug Stinson
Constructions and Bounds for Visual Cryptography
Lecture Notes in Computer Science 1099 (1996), 416-428 (23rd International Colloquium on Automata, Languages and Programming).- Visual Cryptography for General Access Structures
Information and Computation 129 (1996), 86-106 (this paper is an expanded and revised version of the conference paper). - On the Contrast in Visual Cryptography Schemes
Journal of Cryptology 12 (1999), 261-289. - Extended Schemes for Visual Cryptography
Theoretical Computer Science 250 (2001), 143-161. - Threshold Visual Cryptography Schemes With Specified Whiteness Levels of Reconstructed Pixels
Designs, Codes and Cryptography 25 (2002), 15-61. - Contrast Optimal Threshold Visual Cryptography Schemes
SIAM J. on Discrete Math. 16 (2003), 224-261. - “Visual Cryptography: Seeing is Believing” availablehere,
- example- face http://cacr.uwaterloo.ca/~dstinson/VCS-happyface.html
- flag http://cacr.uwaterloo.ca/~dstinson/VCS-flag.html
- pi http://cacr.uwaterloo.ca/~dstinson/VCS-pi.html
- Simple implementation of the visual cryptography scheme based on Moni Naor and Adi Shamir, Visual Cryptography, EUROCRYPT 1994, pp1–12. This technique allows visual information like pictures to be encrypted so that decryption can be done visually.The code outputs two files. Try printing them on two separate transparencies and putting them one on top of the other to see the hidden message. http://algorito.com/algorithm/visual-cryptography
Visual Cryptography
- Moni Naor and Adi Shamir, Visual Cryptography , Eurocrypt 94. Postscript , gzipped Postscript
- Moni Naor and Adi Shamir, Visual Cryptography II , Cambridge Workshop on Protocols, 1996. Postscript, gzipped Postscript
- Moni Naor and Benny Pinkas, Visual Authentication , Crypto 97. Postscript, gzipped Postscript
–
Ajay- I think a combination of sharing and color ciphers would prove more helpful to secure Internet Communication than existing algorithms. It also levels the playing field from computationally rich players to creative coders.
Interview Michal Kosinski , Concerto Web Based App using #Rstats
Here is an interview with Michal Kosinski , leader of the team that has created Concerto – a web based application using R. What is Concerto? As per http://www.psychometrics.cam.ac.uk/page/300/concerto-testing-platform.htm
Concerto is a web based, adaptive testing platform for creating and running rich, dynamic tests. It combines the flexibility of HTML presentation with the computing power of the R language, and the safety and performance of the MySQL database. It’s totally free for commercial and academic use, and it’s open source
Ajay- Describe your career in science from high school to this point. What are the various stats platforms you have trained on- and what do you think about their comparative advantages and disadvantages?
Michal- I started with maths, but quickly realized that I prefer social sciences – thus after one year, I switched to a psychology major and obtained my MSc in Social Psychology with a specialization in Consumer Behaviour. At that time I was mostly using SPSS – as it was the only statistical package that was taught to students in my department. Also, it was not too bad for small samples and the rather basic analyses I was performing at that time.
My more recent research performed during my Mphil course in Psychometrics at Cambridge University followed by my current PhD project in social networks and research work at Microsoft Research, requires significantly more powerful tools. Initially, I tried to squeeze as much as possible from SPSS/PASW by mastering the syntax language. SPSS was all I knew, though I reached its limits pretty quickly and was forced to switch to R. It was a pretty dreary experience at the start, switching from an unwieldy but familiar environment into an unwelcoming command line interface, but I’ve quickly realized how empowering and convenient this tool was.
I believe that a course in R should be obligatory for all students that are likely to come close to any data analysis in their careers. It is really empowering – once you got the basics you have the potential to use virtually any method there is, and automate most tasks related to analysing and processing data. It is also free and open-source – so you can use it wherever you work. Finally, it enables you to quickly and seamlessly migrate to other powerful environments such as Matlab, C, or Python.
Ajay- What was the motivation behind building Concerto?
Michal- We deal with a lot of online projects at the Psychometrics Centre – one of them attracted more than 7 million unique participants. We needed a powerful tool that would allow researchers and practitioners to conveniently build and deliver online tests.
Also, our relationships with the website designers and software engineers that worked on developing our tests were rather difficult. We had trouble successfully explaining our needs, each little change was implemented with a delay and at significant cost. Not to mention the difficulties with embedding some more advanced methods (such as adaptive testing) in our tests.
So we created a tool allowing us, psychometricians, to easily develop psychometric tests from scratch an publish them online. And all this without having to hire software developers.
Ajay -Why did you choose R as the background for Concerto? What other languages and platforms did you consider. Apart from Concerto, how else do you utilize R in your center, department and University?
Michal- R was a natural choice as it is open-source, free, and nicely integrates with a server environment. Also, we believe that it is becoming a universal statistical and data processing language in science. We put increasing emphasis on teaching R to our students and we hope that it will replace SPSS/PASW as a default statistical tool for social scientists.
Ajay -What all can Concerto do besides a computer adaptive test?
Michal- We did not plan it initially, but Concerto turned out to be extremely flexible. In a nutshell, it is a web interface to R engine with a built-in MySQL database and easy-to-use developer panel. It can be installed on both Windows and Unix systems and used over the network or locally.
Effectively, it can be used to build any kind of web application that requires a powerful and quickly deployable statistical engine. For instance, I envision an easy to use website (that could look a bit like SPSS) allowing students to analyse their data using a web browser alone (learning the underlying R code simultaneously). Also, the authors of R libraries (or anyone else) could use Concerto to build user-friendly web interfaces to their methods.
Finally, Concerto can be conveniently used to build simple non-adaptive tests and questionnaires. It might seem to be slightly less intuitive at first than popular questionnaire services (such us my favourite Survey Monkey), but has virtually unlimited flexibility when it comes to item format, test flow, feedback options, etc. Also, it’s free.
Ajay- How do you see the cloud computing paradigm growing? Do you think browser based computation is here to stay?
Michal - I believe that cloud infrastructure is the future. Dynamically sharing computational and network resources between online service providers has a great competitive advantage over traditional strategies to deal with network infrastructure. I am sure the security concerns will be resolved soon, finishing the transformation of the network infrastructure as we know it. On the other hand, however, I do not see a reason why client-side (or browser) processing of the information should cease to exist – I rather think that the border between the cloud and personal or local computer will continually dissolve.
About
Michal Kosinski is Director of Operations for The Psychometrics Centre and Leader of the e-Psychometrics Unit. He is also a research advisor to the Online Services and Advertising group at the Microsoft Research Cambridge, and a visiting lecturer at the Department of Mathematics in the University of Namur, Belgium. You can read more about him at http://www.michalkosinski.com/
You can read more about Concerto at http://code.google.com/p/concerto-platform/ and http://www.psychometrics.cam.ac.uk/page/300/concerto-testing-platform.htm
Oracle launches its version of R #rstats
From-
http://www.oracle.com/us/corporate/press/1515738
Integrates R Statistical Programming Language into Oracle Database 11g
News Facts
Comprehensive In-Database Platform for Advanced Analytics
Supporting Quotes
| Oracle Advanced Analytics — an option to Oracle Database 11g Enterprise Edition – extends the database into a comprehensive advanced analytics platform through two major components: Oracle R Enterprise and Oracle Data Mining. With Oracle Advanced Analytics, customers have a comprehensive platform for real-time analytic applications that deliver insight into key business subjects such as churn prediction, product recommendations, and fraud alerting.
Oracle R Enterprise tightly integrates the open source R programming language with the database to further extend the database with Rs library of statistical functionality, and pushes down computations to the database. Oracle R Enterprise dramatically advances the capability for R users, and allows them to use their existing R development skills and tools, and scripts can now also run transparently and scale against data stored in Oracle Database 11g. Oracle Data Mining provides powerful data mining algorithms that run as native SQL functions for in-database model building and model deployment. It can be accessed through the SQL Developer extension Oracle Data Miner to build, evaluate, share and deploy predictive analytics methodologies. At the same time the high-performance Oracle-specific data mining algorithms are accessible from R. |
||
BENEFITS |
||
|
| Oracle R Hadoop Connector | Gives R users high performance native access to Hadoop Distributed File System (HDFS) and MapReduce programming framework. |
|---|
Exciting Contest at CrowdANALYTIX
A new contest from a relatively new website. This one is fast and furious and has a decent chunk of money!
From
http://www.crowdanalytix.com/contests/airport-guest-sentiment-analysis-1544282253/view/
| Submission Deadline: Sun, 26 February 2012 05:00 AM UTC |
Results Announced by: Mon, 05 March 2012 05:00 AM UTC |
| Category: Text Analytics | Function: Aerospace & Aviation |
| Title |
| Analysis of sentiment and its intensity – feedback from airport guests |
| Description |
| ABC (name intentionally obfuscated) is one of the best managed and highly profitable airports in India. As with all well managed airports, ABC would like to understand what guests feel about their experience when traveling, using or transiting through their airport. ABC has a website in which guests can visit and leave behind a comment, agree or disagree with others’ comments, or respond to a comment confirming or negating the expressed opinion.
The goal of this contest is to create a summarization of the opinions, feelings and sentiments expressed in the comments left behind by guests on the website. This information is being provided as data for solvers. Some understanding of the intensity of the opinion, feeling or sentiment will also be useful. For example, if there is a consistent demand for more spas across guest conversations, it needs to be highlighted. Consistent positive or negative sentiments and opinions need to be discovered and highlighted.
Data
Guest comments have been crawled and provided to you. The data consists approximately 1000 comments from guests including the timestamp of those comments. Personal information (name, email etc) have been hidden. This data is publicly available
Solver Expectations:
Participants may submit entries before the deadline. If a participant submits multiple entries, the entry submitted last before the deadline will be considered as the participant’s submission.
The following deliverables are expected to be submitted:
Timeline and Prizes:
This contest begin on 16 Feb 2012 and will last for a duration of 9 days.
Prizes:
|
On Software
1) All software has bugs. Sometimes this is because people have been told to code in a hurry to meet shipping deadlines. Sometimes it is due to the way metal and other software interact with it. Mostly it is karma.
2) In the 21 st Century,It is okay to insult someone over his software , but not over most other things. Sometimes I think people are passionate not just for their own software but to just diss the other guys. It is a politically convenient release.
3) Bloggers writing about software are full of bull-by products. If they were any good in writing code, they would not have time to write a blog. Mostly bloggers on code are people whose coding enthusiasm is more than their coding competence.
4) Software is easier than it looks to people who know it. To those who dont know how to code, it will always be a bit of magic.
5) Despite immense progress, initiatives and encouragement- the number of females writing code is too low . Comparatively, figuratively and literally. If you are a male and want a social life- get into marketing while the hair is still black.
Man walks into Bar. Says to Women at Bar. ” Hey,What do you do, Me- I write code”
See!
6) People who write software end up making more money not just because they create useful stuff that helps get work done faster or helps reduce boredom for people. They make more money because they are mostly passionate, logical problem thinkers, focused, hard working and better read on a variety of subjects than others. That’s your cue to how to make money even if you cannot code.
7) I would rather write much more code rather than write poetry. But I sometimes think they are related. Just manipulating words in different languages to manipulate output in different machines or people.
8) Kids should be taught software at early age , as that is a skill that helps in their education and thinking. More education for the kids!
9) Laying off talented software people because you found a cheaper , younger alternative half across the globe is sometimes evil. It is also inevitable. Learn more software as you grow older.
10) The best software is the one in your head. It was written by a better programmer too.









