Interview: Rapid-I, Ingo Mierswa and Simon Fischer

Here is an interview with Dr Ingo Mierswa, CEO of Rapid-I, and Dr Simon Fischer, Head of R&D. Rapid-I makes the very popular software Rapid Miner – perhaps one of the earliest leading open source packages in business analytics and business intelligence. It is quite easy to use and deploy, and with its extensions and innovations (including compatibility with R) it has continued to grow tremendously through the years.

In an extensive interview Ingo and Simon talk about the algorithms marketplace, extensions, big data analytics, Hadoop, mobile computing and the use of the graphical user interface in analytics.

Special thanks to Nadja from the Rapid-I communication team for helping coordinate this interview. (Statutory blogging disclosure: Rapid-I is a marketing partner with Decisionstats as per the terms in https://decisionstats.com/privacy-3/)

Ajay- Describe your background in science. What are the key lessons that you have learnt as a scientific researcher, and what advice would you give to new students today?

Ingo: My time as researcher really was a great experience which has influenced me a lot. I have worked at the AI lab of Prof. Dr. Katharina Morik, one of the persons who brought machine learning and data mining to Europe. Katharina always believed in what we are doing, encouraged us and gave us the space for trying out new things. Funnily enough, I never managed to use my own scientific results in any real-life project so far but I consider this as a quite common gap between science and the “real world”. At Rapid-I, however, we are still heavily connected to the scientific world and try to combine the best of both worlds: solving existing problems with leading-edge technologies.

Simon: In fact, during my academic career I have not worked in the field of data mining at all. I worked on a field some of my colleagues would probably even consider boring, and that is theoretical computer science. To be precise, my research was in the intersection of game theory and network theory. During that time, I have learnt a lot of exciting things, none of which had any business use. Still, I consider that a very valuable experience. When we at Rapid-I hire people coming to us right after graduating, I don’t care whether they know the latest technology with a fancy three-letter acronym – that will be forgotten more quickly than it came. What matters is the way you approach new problems and challenges. And that is also my recommendation to new students: work on whatever you like, as long as you are passionate about it and it brings you forward.

Ajay- How is the Rapid Miner Extensions marketplace moving along? Do you think there is scope for people to, say, create algorithms in a platform like R and then offer that algorithm as an app for sale, just like iTunes or Android apps?

 Simon: Well, of course it is not going to be exactly like iTunes or Android apps are, because of the more business-orientated character. But in fact there is a scope for that, yes. We have talked to several developers, e.g., at our user conference RCOMM, and several people would be interested in such an opportunity. Companies using data mining software need supported software packages, not just something they downloaded from some anonymous server, and that is only possible through a platform like the new Marketplace. Besides that, the marketplace will not only host commercial extensions. It is also meant to be a platform for all the developers that want to publish their extensions to a broader community and make them accessible in a comfortable way. Of course they could just place them on their personal Web pages, but who would find them there? From the Marketplace, they are installable with a single click.

Ingo: What I like most about the new Rapid-I Marketplace is the fact that people can now get something back for their efforts. Developing a new algorithm is a lot of work, in some cases even more than developing a nice app for your mobile phone. It is completely accepted that people buy apps from a store for a couple of dollars, and I foresee the same for sharing and selling algorithms instead of apps. Right now, people can already share algorithms and extensions for free; one of the next versions will also support selling those contributions. Let’s see what’s happening next, maybe we will add the option to sell complete RapidMiner workflows or even some data pools…

Ajay- What are the recent features in Rapid Miner that support cloud computing, mobile computing and tablets? How do you think the landscape for Big Data (over 1 TB) is changing, and how is Rapid Miner adapting to it?

Simon: These are areas we are very active in. For instance, we have an In-Database-Mining Extension that allows the user to run their modelling algorithms directly inside the database, without ever loading the data into memory. Using analytic databases like Vectorwise or Infobright, this technology can really boost performance. Our data mining server, RapidAnalytics, already offers functionality to send analysis processes into the cloud. In addition to that, we are currently preparing a research project dealing with data mining in the cloud. A second project is targeted towards the other aspect you mention: the use of mobile devices. This is certainly a growing market, of course not for designing and running analyses, but for inspecting reports and results. But even that is tricky: When you have a large screen you can display fancy and comprehensive interactive dashboards with drill downs and the like. On a mobile device, that does not work, so you must bring your reports and visualizations very much to the point. And this is precisely what data mining can do – and what is hard to do for classical BI.

Ingo: Then there is Radoop, which you may have heard of. It uses the Apache Hadoop framework for large-scale distributed computing to execute RapidMiner processes in the cloud. Radoop has been presented at this year’s RCOMM and people are really excited about the combination of RapidMiner with Hadoop and the scalability this brings.

Ajay- Describe the Rapid Miner analytics certification program. What steps are you taking to partner with academic universities?

Ingo: The Rapid-I Certification Program was created to recognize professional users of RapidMiner or RapidAnalytics. The idea is that certified users have demonstrated a deep understanding of the data analysis software solutions provided by Rapid-I and how they are used in data analysis projects. Taking part in the Rapid-I Certification Program offers a lot of benefits for IT professionals as well as for employers: professionals can demonstrate their skills and employers can make sure that they hire qualified professionals. We started our certification program only about 6 months ago, and about 100 professionals have been certified so far.

Simon: During our annual user conference, the RCOMM, we have plenty of opportunities to talk to people from academia. We’re also present at other conferences, e.g. at ECML/PKDD, and we are sponsoring data mining challenges and grants. We maintain strong ties with several universities all over Europe and the world, which is something that I would not want to miss. We are also cooperating with institutes like the ITB in Dublin during their training programmes, e.g. by giving lectures, etc. Also, we are leading or participating in several national or EU-funded research projects, so we are still close to academia. And we offer an academic discount on all our products 🙂

Ajay- Describe the global efforts in making Rapid Miner a truly international software including spread of developers, clients and employees.

Simon: Our clients already are very international. We have a partner network in America, Asia, and Australia, and, while I am responding to these questions, we have a training course in the US. Developers working on the core of RapidMiner and RapidAnalytics, however, are likely to stay in Germany for the foreseeable future. We need specialists for that, and it would be pointless to spread the development team over the globe. That is also owed to the agile philosophy that we are following.

Ingo: Simon is right, Rapid-I already is acting on an international level. Rapid-I now has more than 300 customers from 39 countries in the world, which is a great result for a young company like ours. We are of course very strong in Germany and also the rest of Europe, but also concentrate on more countries by means of our very successful partner network. Rapid-I continues to build this partner network and will keep recruiting dynamic and knowledgeable partners in the future. Extending and acting globally is definitely part of our strategic roadmap.

Biography

Dr. Ingo Mierswa is working as Chief Executive Officer (CEO) of Rapid-I. He has several years of experience in project management, human resources management, consulting, and leadership including eight years of coordinating and leading the multi-national RapidMiner developer team with about 30 developers and contributors world-wide. He wrote his PhD titled “Non-Convex and Multi-Objective Optimization for Numerical Feature Engineering and Data Mining” at the University of Dortmund under the supervision of Prof. Morik.

Dr. Simon Fischer is heading the research & development at Rapid-I. His interests include game theory and networks, the theory of evolutionary algorithms (e.g. on the Ising model), and theoretical and practical aspects of data mining. He wrote his PhD in Aachen where he worked in the project “Design and Analysis of Self-Regulating Protocols for Spectrum Assignment” within the excellence cluster UMIC. Before, he was working on the vtraffic project within the DFG Programme 1126 “Algorithms for large and complex networks”.

http://rapid-i.com/content/view/181/190/ tells you more on the various types of Rapid Miner licensing for enterprise, individual and developer versions.

(Note from Ajay- to receive an early edition invite to Radoop, click here http://radoop.eu/z1sxe)

 

Indian Business Schools Alumni try to grow more equal

A message from one of the IIM (Indian Institute of Management) alumni networks, just an example of how any global organization should make extra efforts to make things more equal (and thus position its brand for a differentiated place in attracting talent).

http://en.wikipedia.org/wiki/Indian_Institutes_of_Management

The Indian Institutes of Management (IIMs) are graduate business schools in India that also conduct research and provide consultancy services in the field of management to various sectors of the Indian economy. They were created by the Indian Government with the aim of identifying the brightest intellectual talent available in the student community of India and training it in the best management techniques available in the world, to ultimately create a pool of elite managers to manage and lead the various sections of the Indian economy.

The IIMs are considered the top business schools in India. All the IIMs are completely autonomous institutes owned and financed by the Central Government of India. In order of establishment, the IIMs are located at Calcutta (Kolkata), Ahmedabad, Bangalore, Lucknow, Kozhikode (Calicut), Indore, Shillong, Ranchi, Rohtak, Raipur, Trichy, Kashipur and Udaipur. (My alma mater is Lucknow)

 

IIMs being role models have shared knowledge and skills with other institutions to improve their quality and standards in management education


————————————————————————————————————–
IIM A Alumni Association has been reaching out to the alumni associations of other IIMs to broad base the brotherhood (no offense to the fairer sex. Couldn’t think of a replacement word).

IIM Calcutta Alumni Association has been conducting a lecture series and has invited us for the next edition. The topic is “The Unlimited Person”.

India’s ambitions today – particularly reflected in the Corporate Sector – are Unlimited. What mind-set does it take to realise these ambitions? Minds that live in the past or in the future – as too many Indian minds do – limit themselves, their companies and their country.

This presentation gives several examples of our current average mind-set and talks about ways in which an unlimited mind-set can emerge, creating “The Unlimited Person”

The speaker will be IIM Calcutta alumnus Shashi Maudgal, Chief Marketing Officer of Hindalco Industries of the Aditya Birla Group. The date is Friday June 24th, at Gulmohar at the India Habitat Centre. Time: 7 pm.

We hope you will come for this lecture and benefit from Shashi’s experience and insights.

Jayaraman -PGP ’70 / Sunil Kala PGP ’73 / Salil Agrawal PGP ’83

T. Venkateswaran PGP ’85 / Rahul Aggarwal PGP ’94

Calling #Rstats lovers and bloggers – to work together on “The R Programming wikibook”

So you think you like R, huh? Well, it is time to pay it forward.

Message from a dear R blogger, Tal G from Tel Aviv (creator of R-bloggers.com and SAS-X.com)

———————————————————————————————————-
Calling R lovers and bloggers – to work together on “The R Programming wikibook”
Posted: 20 Jun 2011 07:05 AM PDT

This post is a call for both R community members and R-bloggers, to come and help make The R Programming wikibook be amazing:

Dear R community member – please consider giving a visit to The R Programming wikibook. If you wish to contribute your knowledge and editing skills to the project, then you could learn how to write in wiki-markup here, and how to edit a wikibook here (you can even use R syntax highlighting in the wikibook). You could take information into the site from the (soon to be) growing list of available R resources for harvesting.

Dear R blogger, you can help The R Programming wikibook by doing the following:

1) Write to your readers about the project and invite them to join.
2) Add your blog’s R content as an available resource for other editors to use for the wikibook. Here is how to do that:
   a) First, make a clear indication on your blog that your content is licensed under cc-by-sa copyrights (*see what it means at the end of the post). You can do this by adding it to the footer of your blog, or by writing a post that clearly states that this is the case (what a great opportunity to write to your readers about the project…).
   b) Next, go and add a link, to where all of your R content is located on your site, to the resource page (also with a link to the license post, if you wrote one). For example, since I write about other things besides R, I would give a link to my R category page, and will also give a link to this post. If you do not know how to add it to the wiki, just e-mail me about it (tal.galili@gmail.com).

If you are an R blogger, besides living up to the spirit of the R community, you will benefit from joining this project in that every time someone uses your content on the wikibook, they will add your post as a resource. In the long run, this is likely to help visitors of the site get to know about you and strengthen your site’s SEO ranking. Which reminds me, if you write about this, I always appreciate a link back to my blog.

* Having a cc-by-sa license means that you agree that anyone may copy, distribute, display, and make derivative works based on your content, but only if they give the author (you) credit in the manner specified by you, and only if they distribute derivative works under a license identical to the license that governs the original work.

———-

Three more points:

1) This post is a result of being contacted by Paul (a.k.a: PAC2), asking if I could help promote “The R Programming wikibook” among R-bloggers and their readers. Paul has made many contributions to the book so far. So thank you Paul for both reaching out and helping all of us with your work on this free open source project.

2) I should also mention that the R wiki exists and is open for contribution. And naturally, everything that will help the R wikibook will help the R wiki as well.

3) Copyright notice: I hereby release all of the writing material content that is categorised in the R category page, under the cc-by-sa copyrights (date: 20.06.2011). Now it’s your turn!

———-

List of R bloggers who have joined: (This list will get updated as this “group writing” project will progress)

R-statistics blog (that’s Tal…)
Decisionstats.com (That’s me)
……………………………………………………………………………….
Copyright notice: I hereby release all of the writing material content of this website, under the cc-by-sa copyrights (date: 21.06.2011). Now it’s your turn!

https://decisionstats.com/privacy-3/

Content Licensing-
This website has all content licensed under
http://creativecommons.org/licenses/by-sa/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work

Scholarships for students via #rstatsjobs and R-lings

Outstandingly attractive scholarships are available for students willing to travel to Yorkshire. That's where the Wars of the Roses were fought by rival branches of the English royal family.

see http://en.wikipedia.org/wiki/Wars_of_the_Roses

Emphasis and spaces in the email are made by me.

Message from Dr Top i   bell ow-


It is not New York but very old York, in the North of England.

The scholarships carry a tax-free stipend and financial assistance will be
given for travel expenses to and from York. Accommodation for successful
students is available on the University of York Campus.

For information about the tax-free stipend please write to
scholarships@yccsa.org.

Intel® Threading Challenge 2011 Software Contest

One more software contest for you, but in the sub-million-dollar prize range.

http://software.intel.com/en-us/contests/intel-threading-challenge-2011/contests.php

Intel® Threading Challenge 2011 – Win a Trip to Intel Developer Forum in San Francisco

Intel® Threading Challenge 2011 is going BIG this year! After three exciting threading competitions, our fourth Threading Challenge is stepping up the excitement with a BIG Grand Prize, a trip to the Intel Developer Forum (IDF) in San Francisco (September 13-15, 2011).

Since 2008, the Intel® Threading Challenge has attracted developers of varying experience from around the world. The active participation from the community has made the Threading Challenge not only a great programming competition, but a great way for community members to engage with each other, trade threading tips, and discover new parallel programming resources.

Last year’s format of two competition levels, Master and Apprentice, generated great excitement and opened the Threading Challenge to a new group of participants. So, we are going to continue the competition with a Master level and Apprentice level, each competing for the Grand Prize for their level, as well as individual problem awards. We know you love a great challenge and great prizes, so our Threading Challenge Team is putting together some exciting threading problems for you.

Monday, April 18, 2011 – Threading Challenge 2011 (Phase 1) Launches (both levels) at 12:00 PM (noon PDT)– The competition for 2011 is very similar to last year’s, but read on whether you’re a previous participant or new to the Threading Challenge, so you will be aware of all elements of the competition and how to compete. Then, you can start threading your way to prizes today!

Choose the right level for you!

 

Threading Challenge 2011:

• Two levels available for entry: Apprentice & Master
• Phase 1: 3 problems in each level
• Phase 2: Stay tuned for details, coming in Autumn 2011
• We will award 1st, 2nd & 3rd place prizes for each problem in each level
• No overlap of problems and each level’s problems will be offered consecutively
• Participants have the option to use the Intel® Manycore Testing Lab (MTL), consisting of 40 cores, 80 threads
• To enter the Threading Challenge 2011, please read the Official Rules and register for the competition with link in the “To Enter” Section.

The Threading Challenge will be implemented in two phases, with the 1st Phase consisting of 3 problems in each level. The details of the 2nd Phase will be announced in September 2011. For Phase 1, a new problem in each level will be launched on the days listed below at 12:00 noon (PDT) and will be open for entry for 22 days (inclusive of the problem starting day), until closing on the final problem day at 12:00 noon (PDT).

Problem Start and Closing Dates (both Master and Apprentice levels):

Problem 1:
Starts: Monday, April 18, 2011 at 12:00pm (PDT)
Ends: Monday, May 9, 2011 at 12:00pm (PDT)

Problem 2:
Starts: Monday, May 9, 2011 at 12:00pm (PDT)
Ends: Monday, May 30, 2011 at 12:00pm (PDT)

Problem 3: (Due to U.S. Memorial Day Holiday, Problem 3 will start on Tuesday, May 31, 2011)
Starts: Tuesday, May 31, 2011 at 12:00pm (PDT)
Ends: Tuesday, June 21, 2011 at 12:00pm (PDT)

*All problems start and end at 12:00 noon (Pacific Daylight Time)
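
The 22-day entry window (inclusive of the starting day) can be checked against the dates listed above with a little date arithmetic. This is only an illustrative sketch: the function name `closing_date` and the dictionary of start dates are made up for this example and are not part of any Intel tooling.

```python
from datetime import date, timedelta

ENTRY_WINDOW_DAYS = 22  # inclusive of the problem's starting day, per the rules quoted above


def closing_date(start: date, window_days: int = ENTRY_WINDOW_DAYS) -> date:
    """Return the last day of an entry window that counts the start day as day 1."""
    return start + timedelta(days=window_days - 1)


# Start dates as listed in the announcement (hypothetical container for illustration)
problem_starts = {
    "Problem 1": date(2011, 4, 18),
    "Problem 2": date(2011, 5, 9),
    "Problem 3": date(2011, 5, 31),
}

for name, start in problem_starts.items():
    print(name, start.isoformat(), "->", closing_date(start).isoformat())
```

Because the start day counts as day 1, the closing day falls 21 calendar days after the start, which is why each problem closes on the same weekday it opened.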

Contestants will have 22 days to complete their entry submission (solution only for Apprentice OR solution and write-up for Master) for each problem. You may enter ONLY 1 problem at a time and will need to choose which level (Apprentice or Master) you wish to participate in during each problem cycle. You will be awarded points based on your solution submitted. Be sure to take advantage of our threading resources and tools, and you may validate your solution (optional) using the Intel® Manycore Testing Lab to solve your problems and get involved in the dedicated forums to earn extra points.

Each problem's winners will be announced on the site after the problem is closed, and prizes will be awarded to those problem winners (see official rules for prize distribution information). The Grand Prize, a Trip to Intel® Developer Forum (IDF) in San Francisco, will be awarded for each level to the participant that has the highest total points earned for the three problems in each level (i.e., highest total points for Master level problems and Apprentice level problems).

The Intel® Threading Challenge attracts some of the most talented developers in the world to solve parallelism code challenges. Now is your chance to take multithreading to the next level and possibly win great prizes. Demonstrate your threading expertise today!

More Details:

Intel® Threading Challenge 2011 is organized so any level of developer can have the opportunity to participate. Two levels of participation are available. The Apprentice level gives those just getting started in multithreading development a chance to try out and improve their threading skills. The Master level will be executed similarly to previous threading challenges, providing those with more experience a chance to test their skills and compete against other experienced developers.

Intel® Manycore Testing Lab – Available as Option for Threading Challenge 2011 Participants

This year competitors will have the optional opportunity to develop and validate their code using the Intel® Manycore Testing Lab. This 40-core, 80-thread development environment has the latest hardware and software available and will be used by this year’s judges to test the winning entries in Threading Challenge 2011 Phase 1.

The Intel® Manycore Testing Lab (MTL) will be made available to Threading Challenge 2011 contestants. Use of the MTL will give participants the opportunity to write and test their code on systems exactly configured to what the judges will be using to score submitted entries. No more guessing about if your code will build or how it will run. (There is no requirement to use the MTL for any part of the contest. It is strictly an optional alternative being made available to those that wish to use it.)

STEM is cool

A good video created by my favorite social media people from a company in North Carolina.

STEM is cool (Science Technology Engineering Maths?)

No, Science is not kool aid- it is just COOL. And better paying than watching Justin Bieber or Lady Gaga videos. Get those lazy teenagers out of Glee clubs and back into Science clubs.

The video itself-

Disclaimer- I have no direct or indirect financial relationship with the creators of this video. I think it is cool when people express creativity in positive ways to help their favorite software, company, and even the world. Blah Blah Blah 🙂

Yeah, STEM is cool again.

 

 

Summer School on Uncertainty Quantification

SAMSI/Sandia Summer School on Uncertainty Quantification – June 20-24, 2011

http://www.samsi.info/workshop/samsisandia-summer-school-uncertainty-quantification

The utilization of computer models for complex real-world processes requires addressing Uncertainty Quantification (UQ). Corresponding issues range from inaccuracies in the models to uncertainty in the parameters or intrinsic stochastic features.

This Summer school will expose students in the mathematical and statistical sciences to common challenges in developing, evaluating and using complex computer models of processes. It is essential that the next generation of researchers be trained on these fundamental issues, which are too often absent from traditional curricula.

Participants will receive not only an overview of the fast developing field of UQ but also specific skills related to data assimilation, sensitivity analysis and the statistical analysis of rare events.

Theoretical concepts and methods will be illustrated on concrete examples and applications from both nuclear engineering and climate modeling.

The main lecturers are:
Dan Cacuci (N.C. State University): data assimilation and applications to nuclear engineering

Dan Cooley (Colorado State University): statistical analysis of rare events
This short course will introduce the current statistical practice for analyzing extreme events. Statistical practice relies on fitting distributions suggested by asymptotic theory to a subset of data considered to be extreme. Both block maximum and threshold exceedance approaches will be presented for both the univariate and multivariate cases.
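
To make the two approaches named above concrete, here is a minimal sketch of how the "extreme" subset of a series is selected in each case. It is not taken from the course material: the function names, the synthetic Gaussian data, the 365-observation block size and the 95th-percentile threshold are all assumptions chosen for illustration. In practice a generalized extreme value distribution is typically fitted to the block maxima and a generalized Pareto distribution to the threshold excesses; that fitting step is left out here.

```python
import random


def block_maxima(series, block_size):
    """Maximum of each complete, non-overlapping block of length block_size."""
    n_blocks = len(series) // block_size
    return [
        max(series[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]


def threshold_exceedances(series, threshold):
    """Excess amounts (value minus threshold) for observations above the threshold."""
    return [x - threshold for x in series if x > threshold]


random.seed(0)
# Synthetic stand-in for ten "years" of daily observations (assumed data)
daily = [random.gauss(0.0, 1.0) for _ in range(3650)]

# Block maximum approach: keep one value per 365-observation block
annual_max = block_maxima(daily, 365)

# Threshold exceedance approach: keep everything above the empirical 95th percentile
threshold = sorted(daily)[int(0.95 * len(daily))]
excesses = threshold_exceedances(daily, threshold)

print(len(annual_max), "block maxima;", len(excesses), "threshold excesses")
```

The block maximum approach discards all but one observation per block, while the threshold approach keeps every large observation, so the choice of block size or threshold controls how much data is treated as extreme.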

Doug Nychka (NCAR): data assimilation and applications in climate modeling
Climate prediction and modeling do not incorporate geophysical data in the same sequential manner as weather forecasting, and comparison to data is typically based on accumulated statistics, such as averages. This arises because a climate model matches the state of the Earth’s atmosphere and ocean “on the average” and so one would not expect the detailed weather fluctuations to be similar between a model and the real system. An emerging area for climate model validation and improvement is the use of data assimilation to scrutinize the physical processes in a model using observations on shorter time scales. The idea is to find a match between the state of the climate model and observed data that is particular to the observed weather. In this way one can check whether short time physical processes such as cloud formation or dynamics of the atmosphere are consistent with what is observed.

Dongbin Xiu (Purdue University): sensitivity analysis and polynomial chaos for differential equations
This lecture will focus on numerical algorithms for stochastic simulations, with an emphasis on the methods based on generalized polynomial chaos methodology. Both the mathematical framework and the technical details will be examined, along with performance comparisons and implementation issues for practical complex systems.

The main lectures will be supplemented by discussion sessions and by presentations from UQ practitioners from both the Sandia and Los Alamos National Laboratories.
