RapidMiner launches extensions marketplace

For some time now, I had been hoping for a place where new package or algorithm developers get at least a fraction of the money that iPad or iPhone application developers get. Rapid Miner has taken the lead in establishing a marketplace for extensions. Is there going to be paid extensions as well- I hope so!!

This probably makes it the first “app” marketplace in open source and the second app marketplace in analytics after salesforce.com

It is hard work to think of new algols, and some of them can really be usefull.

Can we hope for #rstats marketplace where people downloading say ggplot3.0 atleast get a prompt to donate 99 cents per download to Hadley Wickham’s Amazon wishlist. http://www.amazon.com/gp/registry/1Y65N3VFA613B

Do you think it is okay to pay 99 cents per iTunes song, but not pay a cent for open source software.

I dont know- but I am just a capitalist born in a country that was socialist for the first 13 years of my life. Congratulations once again to Rapid Miner for innovating and leading the way.

http://rapid-i.com/component/option,com_myblog/show,Rapid-I-Marketplace-Launched.html/Itemid,172

RapidMinerMarketplaceExtensions 30 May 2011
Rapid-I Marketplace Launched by Simon Fischer

Over the years, many of you have been developing new RapidMiner Extensions dedicated to a broad set of topics. Whereas these extensions are easy to install in RapidMiner – just download and place them in the plugins folder – the hard part is to find them in the vastness that is the Internet. Extensions made by ourselves at Rapid-I, on the other hand,  are distributed by the update server making them searchable and installable directly inside RapidMiner.

We thought that this was a bit unfair, so we decieded to open up the update server to the public, and not only this, we even gave it a new look and name. The Rapid-I Marketplace is available in beta mode at http://rapidupdate.de:8180/ . You can use the Web interface to browse, comment, and rate the extensions, and you can use the update functionality in RapidMiner by going to the preferences and entering http://rapidupdate.de:8180/UpdateServer/ as the update server URL. (Once the beta test is complete, we will change the port back to 80 so we won’t have any firewall problems.)

As an Extension developer, just register with the Marketplace and drop me an email (fischer at rapid-i dot com) so I can give you permissions to upload your own extension. Upload is simple provided you use the standard RapidMiner Extension build process and will boost visibility of your extension.

Looking forward to see many new extensions there soon!

Disclaimer- Decisionstats is a partner of Rapid Miner. I have been liking the software for a long long time, and recently agreed to partner with them just like I did with KXEN some years back, and with Predictive AnalyticsConference, and Aster Data until last year.

I still think Rapid Miner is a very very good software,and a globally created software after SAP.

Here is the actual marketplace

http://rapidupdate.de:8180/UpdateServer/faces/index.xhtml

Welcome to the Rapid-I Marketplace Public Beta Test

The Rapid-I Marketplace will soon replace the RapidMiner update server. Using this marketplace, you can share your RapidMiner extensions and make them available for download by the community of RapidMiner users. Currently, we are beta testing this server. If you want to use this server in RapidMiner, you must go to the preferences and enter http://rapidupdate.de:8180/UpdateServer for the update url. After the beta test, we will change the port back to 80, which is currently occupied by the old update server. You can test the marketplace as a user (downloading extensions) and as an Extension developer. If you want to publish your extension here, please let us know via the contact form.

Hot Downloads
«« « 1 2 3 » »»
[Icon]The Image Processing Extension provides operators for handling image data. You can extract attributes describing colour and texture in the image, you can make several transformation of a image data which allows you to perform segmentation and detection of suspicious areas in image data.The extension provides many of image transformation and extraction operators ranging from Wavelet Decomposition, Hough Circle to Block Difference of Inverse probabilities.

[Icon]RapidMiner is unquestionably the world-leading open-source system for data mining. It is available as a stand-alone application for data analysis and as a data mining engine for the integration into own products. Thousands of applications of RapidMiner in more than 40 countries give their users a competitive edge.

  • Data IntegrationAnalytical ETLData Analysis, and Reporting in one single suite
  • Powerful but intuitive graphical user interface for the design of analysis processes
  • Repositories for process, data and meta data handling
  • Only solution with meta data transformation: forget trial and error and inspect results already during design time
  • Only solution which supports on-the-fly error recognition and quick fixes
  • Complete and flexible: Hundreds of data loading, data transformation, data modeling, and data visualization methods
[Icon]All modeling methods and attribute evaluation methods from the Weka machine learning library are available within RapidMiner. After installing this extension you will get access to about 100 additional modelling schemes including additional decision trees, rule learners and regression estimators.This extension combines two of the most widely used open source data mining solutions. By installing it, you can extend RapidMiner to everything what is possible with Weka while keeping the full analysis, preprocessing, and visualization power of RapidMiner.

[Icon]Finally, the two most widely used data analysis solutions – RapidMiner and R – are connected. Arbitrary R models and scripts can now be directly integrated into the RapidMiner analysis processes. The new R perspective offers the known R console together with the great plotting facilities of R. All variables and R scripts can be organized in the RapidMiner Repository.A directly included online help and multi-line editing makes the creation of R scripts much more comfortable.

Interview- Top Data Mining Blogger on Earth , Sandro Saitta

Surajustement Modèle 2
Image via Wikipedia

If you do a Google search for Data Mining Blog- for the past several years one Blog will come on top. data mining blog – Google Search http://bit.ly/kEdPlE

To honor 5 years of Sandro Saitta’s blog (yes thats 5 years!) , we cover an exclusive interview with him where he reveals his unique sauce for cool techie blogging.

Ajay- Describe your journey as a scientist and data miner, from early experiences, to schooling to your work/research/blogging.

Sandro- My first experience with data mining was my master project. I used decision tree to predict pollen concentration for the following week using input data such as wind, temperature and rain. The fact that an algorithm can make a computer learn from experience was really amazing to me. I found it so interesting that I started a PhD in data mining. This time, the field of application was civil engineering. Civil engineers put a lot of sensors on their structure in order to understand how they behave. With all these sensors they generate a lot of data. To interpret these data, I used data mining techniques such as feature selection and clustering. I started my blog, Data Mining Research, during my PhD, to share with other researchers.

I then started applying data mining in the stock market as my first job in industry. I realized the difference between image recognition, where 99% correct classification rate is state of the art, and stock market, where you’re happy with 55%. However, the company ambiance was not as good as I thought, so I moved to consulting. There, I applied data mining in behavioral targeting to increase click-through rates. When you compare the number of customers who click with the ones who don’t, then you really understand what class imbalance mean. A few months ago, I accepted a very good opportunity at SICPA. I’m looking forward to resolving new challenges there.

Ajay- Your blog is the top ranked blog for “data mining blog”. Could you share some tips on better blogging for analytics and technical people

Sandro- It’s always difficult to start a blog, since at the beginning you have no reader. Writing for nobody may seem stupid, but it is not. By writing my first posts during my PhD I was reorganizing my ideas. I was expressing concepts which were not always clear to me. I thus learned a lot and also improved my English level. Of course, it’s still not perfect, but I hope most people can understand me.

Next come the readers. A few dozen each week first. To increase this number, I then started to learn SEO (Search Engine Optimization) by reading books and blogs. I tested many techniques that increased Data Mining Research visibility in the blogosphere. I think SEO is interesting when you already have some content published (which means not at the very beginning of your blog). After a while, once your blog is nicely ranked, the main task is to work on the content of the blog. To be of interest, your content must be particular: original, informative or provocative for example. I also had the chance to have a good visibility thanks to well-known people in the field like Kevin Hillstrom, Gregory Piatetsky-Shapiro, Will Dwinnell / Dean Abbott, Vincent Granville, Matthew Hurst and many others.

Ajay- Whats your favorite statistical software and what are the various softwares that you have worked with.
Could you compare and contrast these software as well.

Sandro- My favorite software at this point is SAS. I worked with it for two years. Once you know the language, you can perform ETL and data mining so easily. It’s also very fast compared to others. There are a lot of tools for data mining, but I cannot think of a tool that is as powerful as SAS and, in the same time, has a high-level programming language behind it.

I also worked with R and Matlab. R is very nice since you have all the up-to-date data mining algorithms implemented. However, working in the memory is not always a good choice, especially for ETL. Matlab is an excellent tool for prototyping. It’s not so fast and certainly not done for ETL, but the price is low regarding all the possibilities for data mining. According to me, SAS is the best choice for ETL and a good choice for data mining. Of course, there is the price.

Ajay- What are your favorite techniques and training resources for learning basics of data mining to say statisticians or business management graduates.

Sandro- I’m the kind of guy who likes to read books. I read data mining books one after the other. The fact that the same concepts are explained differently (and by different people) helps a lot in learning a topic like data mining. Of course, nothing replaces experience in the field. You can read hundreds of books, you will still not be a good practitioner until you really apply data mining in specific fields. My second choice after books is blogs. By reading data mining blogs, you will really see the issues and challenges in the field. It’s still not experience, but we are closer. Finally, web resources and networks such as KDnuggets of course, but also AnalyticBridge and LinkedIn.

Ajay- Describe your hobbies and how they help you ,if at all in your professional life.

Sandro- One of my hobbies is reading. I read a lot of books about data mining, SEO, Google as well as Sci-Fi and Fantasy. I’m a big fan of Asimov by the way. My other hobby is playing tennis. I think I simply use my hobbies as a way to find equilibrium in my life. I always try to find the best balance between work, family, friends and sport.

Ajay- What are your plans for your website for 2011-2012.

Sandro- I will continue to publish guest posts and interviews. I think it is important to let other people express themselves about data mining topics. I will not write about my current applications due to the policies of my current employer. But don’t worry, I still have a lot to write, whether it is technical or not. I will also emphasis more on my experience with data mining, advices for data miners, tips and tricks, and of course book reviews!

Standard Disclosure of Blogging- Sandro awarded me the Peoples Choice award for his blog for 2010 and carried out my interview. There is a lot of love between our respective wordpress blogs, but to reassure our puritan American readers- it is platonic and intellectual.

About Sandro S-



Sandro Saitta is a Data Mining Research Engineer at SICPA Security Solutions. He is also a blogger at Data Mining Research (www.dataminingblog.com). His interests include data mining, machine learning, search engine optimization and website marketing.

You can contact Mr Saitta at his Twitter address- 

https://twitter.com/#!/dataminingblog

Weather Modifying Weapons

OSTM/Jason-2's predecessor TOPEX/Poseidon caug...
Image via Wikipedia

This is part of a continuing series of theoretical weapons. The weapons are theoretical as the United Nations has already banned the weapons (but not banned the building of research of defense from these weapons).

Possible applications of weather modifying weapons.

1) Use surface modifiers on oceans including but not limited to submerged nuclear heaters, airborne solar powered  lasers, surface spreaders like oil slicks. This will help modify the temperature of the ocean in certain critical areas  at critical times, influencing weather esp winds that bring rains.

Example- Modifying or Enhancing El Nino to influence rain to specific countries.

2) Use of air borne or aircraft borne lasers to start forest fires

3) Use of lasers to enhance the rate of melting of strategic glaciers.

4) Modify and interfere with the timing of an active volcano to prevent big rupture, rather to go for controlled releases.

5) Use of harmonics to influence seismic wave activity in geological reasons.

What is a White Paper?

Christine and Jimmy Wales
Image via Wikipedia

As per Jimmy Wales and his merry band at Wiki (pedia not leaky-ah)- The emphasis is mine

What is the best white paper you have read in the past 15 years.

Categories are-

  • Business benefits: Makes a business case for a certain technology or methodology.
  • Technical: Describes how a certain technology works.
  • Hybrid: Combines business benefits with technical details in a single document.
  • Policy: Makes a case for a certain political solution to a societal or economic challenge.
——————————————————————————————————————————————————



white paper is an authoritative report or guide that helps solve a problem. White papers are used to educate readers and help people make decisions, and are often requested and used in politics, policy, business, and technical fields. In commercial use, the term has also come to refer to documents used by businesses as a marketing or sales tool. Policy makers frequently request white papers from universities or academic personnel to inform policy developments with expert opinions or relevant research.

Government white papers

In the Commonwealth of Nations, “white paper” is an informal name for a parliamentary paper enunciating government policy; in the United Kingdom these are mostly issued as “Command papers“. White papers are issued by the government and lay out policy, or proposed action, on a topic of current concern. Although a white paper may on occasion be a consultation as to the details of new legislation, it does signify a clear intention on the part of a government to pass new law. White Papers are a “…. tool of participatory democracy … not [an] unalterable policy commitment.[1] “White Papers have tried to perform the dual role of presenting firm government policies while at the same time inviting opinions upon them.” [2]

In Canada, a white paper “is considered to be a policy document, approved by Cabinet, tabled in the House of Commons and made available to the general public.”[3] A Canadian author notes that the “provision of policy information through the use of white and green papers can help to create an awareness of policy issues among parliamentarians and the public and to encourage an exchange of information and analysis. They can also serve as educational techniques”.[4]

“White Papers are used as a means of presenting government policy preferences prior to the introduction of legislation”; as such, the “publication of a White Paper serves to test the climate of public opinion regarding a controversial policy issue and enables the government to gauge its probable impact”.[5]

By contrast, green papers, which are issued much more frequently, are more open ended. These green papers, also known as consultation documents, may merely propose a strategy to be implemented in the details of other legislation or they may set out proposals on which the government wishes to obtain public views and opinion.

White papers published by the European Commission are documents containing proposals for European Union action in a specific area. They sometimes follow a green paper released to launch a public consultation process.

For examples see the following:

 Commercial white papers

Since the early 1990s, the term white paper has also come to refer to documents used by businesses and so-called think tanks as marketing or sales tools. White papers of this sort argue that the benefits of a particular technologyproduct or policy are superior for solving a specific problem.

These types of white papers are almost always marketing communications documents designed to promote a specific company’s or group’s solutions or products. As a marketing tool, these papers will highlight information favorable to the company authorizing or sponsoring the paper. Such white papers are often used to generate sales leads, establish thought leadership, make a business case, or to educate customers or voters.

There are four main types of commercial white papers:

  • Business benefits: Makes a business case for a certain technology or methodology.
  • Technical: Describes how a certain technology works.
  • Hybrid: Combines business benefits with technical details in a single document.
  • Policy: Makes a case for a certain political solution to a societal or economic challenge.

Resources

  • Stelzner, Michael (2007). Writing White Papers: How to capture readers and keep them engaged. Poway, California: WhitePaperSource Publishing. pp. 214. ISBN 9780977716937.
  • Bly, Robert W. (2006). The White Paper Marketing Handbook. Florence, Kentucky: South-Western Educational Publishing. pp. 256. ISBN 9780324300826.
  • Kantor, Jonathan (2009). Crafting White Paper 2.0: Designing Information for Today’s Time and Attention Challenged Business Reader. Denver,Colorado: Lulu Publishing. pp. 167.ISBN 9780557163243.

Protected: Happy Labour Day to American Stats-ical Association

This content is password-protected. To view it, please enter the password below.

Scholarships for students via #rstatsjobs and R-lings

A vector drawing of the University of York coa...
Image via Wikipedia

Outstandingly attractive scholarships are available for students willing to travel to Yorkshire. Thats where the Battle of Roses was fought by the British Royal Family.

see http://en.wikipedia.org/wiki/Wars_of_the_Roses

Emphasis  and spaces in email above are made by me.

Message from Dr Top i   bell ow-


It is not New York but very old York, in the North of England.

The scholarships carry a tax-free stipend and financial assistance will be
given for travel expenses to and from York. Accommodation for successful
students is available on the University of York Campus.

For information about the tax-free stipend please write to
scholarships@yccsa.org.

Continue reading “Scholarships for students via #rstatsjobs and R-lings”

Analyzing Conversations on Twitter

If you are a marketing , analyst relationship, public relationship or a product manager who uses or abuses social media, you sometimes need to track what influencers and analysts are saying. A tool called Bettween allows you to capture public conversations between two influential (or interesting) tweeps.

See conversations between Neil Raden http://www.beyeblogs.com/raden/ and Curt Monash http://www.dbms2.com/ two noted BI gurus

http://bettween.com/neilraden/curtmonash

  • @NEILRADEN66
  • @CURTMONASH61
  • TOTAL MESSAGES127
  • SHARE CONVERSATION


unless Google decides to license its Wave technology to Twitter for separate encrypted , or public tweets. 🙂 They do share some history and employees (cough cough) or Twitter waits to create or better its public /protected tweet mode to be more granular

http://bettween.com/neilraden/curtmonash#statistics

tools to analyze Twitter conversations in SAS