Here is a great video, with slides, on doing statistical network analysis using R, by Drew Conway of NYU.
Social Network Analysis in R from Drew Conway on Vimeo.
The Europe-based Rapid-I has come out with version 4.5 of its data mining tool RapidMiner (formerly known as YALE), featuring a promising new “Script” tool.
Also, RapidMiner came in first among open source data mining tools in a poll by the industry benchmark site www.kdnuggets.com.
They have a brilliant tour video for people who just want to have a first look at the new RapidMiner:
http://rapid-i.com/videos/rapidminer_tour_3_4_en.html
Citation-
http://rapid-i.com/content/view/147/1/
Here is an interview with Jim Davis, SAS Institute SVP and Chief Marketing Officer.
Jim Davis, SAS Institute.
Ajay – Please describe your career path from science to your present position. What advice would you give to young science graduates in this recession? What advice would you give to entrepreneurs in these challenging economic times?
Jim – After earning a degree in computer science from North Carolina State University, I embarked on a career path that ultimately brought me to SAS and my role as senior VP and CMO. Along the way I’ve worked in software development, newspaper and magazine publishing and IT operations. In 1994, I joined SAS, where I worked my way up the ranks from enterprise computing strategist focused on IT issues to program manager for data warehousing to director of product strategy, VP of marketing and now CMO. It’s been an interesting path.
My advice to new graduates embarking on a career is to leave no stone unturned in your search, particularly in this economy, but also to consider adding to your skill set. A local example here in the Research Triangle area is N.C. State University’s Institute for Advanced Analytics, which offers a master’s degree that combines business and analytical skills. These skills are very much in demand. SAS CEO Jim Goodnight helped establish this 10-month degree program, whose first 23 graduates all found solid jobs within four months at an average salary of $81,000. Many of this year’s class, facing the worst economy since the Great Depression, have already found jobs. For entrepreneurs today, my advice is simple: make absolutely sure you’re creating a product or service that people want. And especially given the challenging economic environment, resolve to improve your decision making. Regardless of industry or company size, business decisions need to be based on facts, on data, on science. Not on hunches and guesswork. Business analytics can help here.
Ajay – What are some of the biggest challenges that you have faced and tackled as a marketing person for software? What continues to be your biggest focus area for this year?
Jim – Among the biggest challenges that the SAS marketing team has worked to overcome is the perception that analytical software – advanced forecasting, optimization and data mining technologies – is way too complex, difficult to use, and only useful to a small band of highly trained statisticians and other quantitative experts, or “quants.” With lots of hard work, we’ve been able to show the marketplace that powerful tools are available in business solutions designed to solve industry issues.
The biggest marketing challenge now is showing the market how SAS offers unique value with its broad and integrated technologies. The industry terminology is confusing, with some companies selling Business Intelligence tools that, when you scratch the surface, are limited to reporting and query operations. Other SAS competitors only provide data integration software, and still others offer analytics. SAS is the only vendor offering an integrated portfolio of these three very important technologies, as well as cross-industry and industry-specific intelligent applications. This combination, which we and others are calling Business Analytics, is a very powerful set of capabilities. Our challenge is to demonstrate the real value of our comprehensive portfolio. We’ll get there but we have some work to do.
Ajay – It is rare to find a major software company that has zero involvement with the open source movement (or, as I call it, peer-reviewed code). Could you name some of SAS Institute’s contributions to open source? What are your further plans to enhance this position with the global community of scientists?
Jim – SAS does support open source and open standards too. Open standards typically guide open source implementations (e.g., the OASIS work is guiding some of the work in Eclipse Cosmos, some of the JCP standards guide the Tomcat implementation, etc.).
Some examples of SAS’s contributions to open source and open standards include:
Apache Software Foundation – a senior SAS developer has been a committer on multiple releases of the Apache Tomcat project, and has also acted as Release Coordinator.
Eclipse Foundation — SAS developers were among the early adopters of Eclipse. One senior SAS developer wrote a tutorial whitepaper on using Eclipse RCP, and was named “Top Ambassador” in the 2006 Eclipse Community Awards. Another is a committer on the Eclipse Web Tools project. A third proposed and led Eclipse’s Albireo project. SAS is a participant in the Eclipse Cosmos project, with three R&D employees as committers. Finally, SAS’ Rich Main served on the board of directors of the Eclipse Foundation from 2003 to 2006, helping write the Eclipse Bylaws, Development Process, Membership Agreement, Intellectual Property Policy and Public License.
Java Community Process — SAS has been a Java licensee partner since 1997 and has been active in the Java Community Process. SAS has participated in approximately 25 Java Specification Requests spanning both J2SE and J2EE technology. Rich Main of SAS also served on the JCP Executive Committee from 2005 through 2008.
OASIS — A senior SAS developer serves as secretary of the OASIS Solution Deployment Descriptor (SDD) Technical Committee. In total, six SAS employees serve on this committee.
XML for Analysis — SAS co-sponsored the XML for Analysis standard with Microsoft and Hyperion.
Others — A small SAS team developed Cobertura, an open source coverage analysis tool for Java. SAS (through our database access team) is one of the top corporate contributors to Firebird, an open source relational database. Another developer contributes to Slide WebDAV. We’ve had people work on HtmlUnit (a testing framework) and FreeBSD.
In addition, there are dozens if not hundreds of contributed bug reports, fixes/patches from SAS developers using open source software. SAS will continue to expand our work with and contribute to open-source tools and communities.
For example, we know a number of our customers use R as well as SAS. So we decided to make it easier for them to access R by making it available in the SAS environment. Our first interface to R, which enables users to integrate R functionality with IML or SAS programs, will be in an upcoming version of SAS/IML Studio later this summer. We’re also working on an R interface that can be surfaced in the SAS server or via other SAS clients.
Ajay – What are business intelligence and business analytics, as you see them? SAS is the first IT vendor that comes up in the non-sponsored links when I search for “business intelligence” in Google. How well do you think the SAS Business Intelligence Platform rates against platforms from SAP, Oracle, IBM and Microsoft?
Jim – Traditional business intelligence (BI) as we know it is outdated and insufficient.
The term BI has been stretched and widened to encapsulate a lot of different techniques, tools and technologies since it was first coined decades ago. Essentially, BI has always been about information delivery, be it in static rows and columns, graphical representations of information, or the modern and hyper-interactive dashboard with dials and widgets.
BI technologies have also evolved to include intuitive ad-hoc query and analysis with the ability to drill down into the details within context. All of these capabilities are great for reacting to business problems after they have occurred. But businesses face diverse and complex problems, global competition grows exponentially, and increasingly restrictive regulations are just around the corner. They need to anticipate and manage change, drive sustainable growth and track performance.
Now they also have to operate in the midst of a ruinous global credit and liquidity crisis. Reactionary decision making is just not working. Now more than ever, progressive organizations are looking to leverage the power of analytics, specifically business analytics. Why? Real business value comes from capitalizing on all available information assets and selecting the best outcome based on every possible scenario.
Proactive, evidence-based analysis – not just information delivery – should drive informed decisions. That is business analytics, and that is what SAS provides its customers.
Businesses require robust data integration, data quality, data and text mining, predictive modeling, forecasting and optimization technologies to anticipate what might happen, avoid undesired outcomes and course correct.
These capabilities need to be in sync and integrated from the ground up rather than cobbled together through acquisitions. More importantly, they cannot be part of a monolithic platform that requires 2-3 years before any real value is derived.
They must be part of an agile framework that enables an organization to address its most critical business issues now and then add new functionality over time. A business analytics framework — like the one SAS provides — enables strategic business decisions that optimize performance across an organization.
Ajay – For more than three decades SAS Institute has created, nurtured and sustained the SAS language, often paying out of its own pocket for conferences and papers. To this day, SAS language code on your website is free and accessible to all without registration, unlike at other software companies. What do you have to say about third-party SAS language compilers like “Carolina” and “WPS”?
Jim – There is no doubt that much of the power and flexibility behind our framework for business analytics is derived from our SAS language. At its core, the Base SAS language offers an easy-to-learn syntax and hundreds of language elements, pre-built SAS procedures and re-usable functions. Our focus on listening and adapting to customers’ changing needs has helped us, over the years, to sustain and continuously improve the SAS language and the SAS products that leverage it.
Competition comes in many forms and it pushes us to innovate and keep delivering value for our customers. Language compilers or code interpreters like Carolina and WPS are no exception.
One thing that sets SAS apart from other vendors is that we care so deeply about the quality of results. Our Technical Support, Education and consulting services organizations really do partner with customers to help them achieve the best results.
As Anne Milley, SAS’ director of technology product marketing, told DecisionStats this March, customers have varied and specific requirements for their analytics infrastructure. Desired attributes include speed, quality, support, backward and forward compatibility, and others. Certain customers only care about one or two of these attributes, other customers care about more. With our broad and deep analytics portfolio, SAS can uniquely provide the analytics infrastructure that meets a customer’s specific requirements, whether for one or many key attributes. Because of this, an overwhelming majority vote with their pocketbooks to select or retain SAS.
For example, as Anne noted, for some customers with tight batch-processing windows, speed trumps everything. In tests conducted by Merrill Consultants, an MXG program running on WPS runs significantly longer, consumes more CPU time and requires more memory than the same MXG program hosted on its native SAS platform.
At SAS, we provide a complete environment for analytics — from data collection, manipulation, exploration and analysis to the deployment of results. One example of our continuous innovation, and where we are devoting R&D and sales resources, is the SAS In-Database Processing Initiative. Through in-database analytics, customers can move computational tasks (e.g., SAS code, SQL) to execute inside a database. This streamlines the analytic data preparation, model development and scoring processes. Customers needing to leverage their investments in mixed workload relational database platforms will benefit from this SAS initiative. It will help them accelerate their business processes and drive decisions with greater confidence and efficiency.
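To illustrate the principle (a generic sketch, not SAS’s implementation): in R with the DBI package, the difference between pulling data out and pushing the computation in looks like this. The connection object and the transactions table are hypothetical.

    library(DBI)
    # `con` is assumed to be an open DBI connection to the warehouse.

    # Conventional approach: extract the raw rows and compute locally,
    # moving potentially millions of values across the network.
    amounts <- dbGetQuery(con, "SELECT amount FROM transactions")
    mean(amounts$amount)

    # In-database approach: push the computation to the data, so only
    # the single aggregated result travels back.
    dbGetQuery(con, "SELECT AVG(amount) AS avg_amount FROM transactions")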
Ajay – Are you moving closer to making an acquisition? Or to being acquired? Which among the existing BI vendors are you most comfortable with in terms of synergy of products and philosophy?
Jim – SAS is in an enviable position as the largest independent provider of business intelligence (BI) software, and the leader in the rapidly emerging field of business analytics, which combines BI with data integration and advanced analytics. We have no plans, nor have we had any talks, regarding SAS being acquired.
As for SAS acquiring another company, we continuously look for technologies complementary to our wide and deep lineup of business analytics solutions, many of which are targeted at the specific needs of industries ranging from banking, insurance and pharma to healthcare, telecom, manufacturing and government.
Last year, SAS made two acquisitions: IDeaS Revenue Optimization, the premier provider of advanced revenue-management and optimization software for the hospitality industry, and Teragram, a leader in natural language processing and advanced linguistic technology. IDeaS brings SAS and our hotel and hospitality customers software sold as a service that meets a critical need in this industry. Teragram’s exciting technology has enhanced SAS’ own robust text mining offerings.
Ajay – Jim Goodnight is a legend in philanthropy, invention, and business leadership (obviously he has a fine team supporting him). Who will be the next Jim Goodnight?
Jim – I think Jim Goodnight addressed the question of succession plans at SAS best a few years ago when he noted that the business world often places undue emphasis on the CEO and forgets about the CTO, CMO, CFO and other senior leaders who play a key role in any company’s success. SAS has a very strong executive management team that runs a two-billion-dollar software company very effectively. If a “next Jim Goodnight” is needed in the future, SAS will be ready and will continue to provide our customers with the business analytics software they need.
Biography-
Jim Davis, Senior Vice President and Chief Marketing Officer for SAS, is responsible for providing strategic direction for SAS products, solutions and services and presenting the SAS brand worldwide. He helped develop the Information Evolution Model and co-authored “Information Revolution: Using the Information Evolution Model to Grow Your Business.” By outlining how information is managed and used as a corporate asset, the model enables organizations to evaluate their management of information objectively, providing a framework for making the improvements necessary to compete in today’s global arena.
SAS (www.sas.com) is the leader in business analytics software and services, and the largest independent vendor in the business intelligence market. Through innovative solutions delivered within an integrated framework, SAS helps customers at more than 45,000 sites improve performance and deliver value by making better decisions faster. Since 1976 SAS has been giving customers around the world The Power to Know®.
Here is an interview with REvolution Computing’s Director of Community, David Smith.
Ajay- Tell us about your journey in science. In particular, tell us what attracted you to R and the open source movement.
David- I got my start in science in 1990 working with CSIRO (the government science organization in Australia) after I completed my degree in mathematics and computer science. Seeing the diversity of projects the statisticians there worked on really opened my eyes to statistics as the way of objectively answering questions about science.
That’s also when I was first introduced to the S language, the forerunner of R. I was hooked immediately; it was just so natural for doing the work I had to do. I also had the benefit of a wonderful mentor, Professor Bill Venables, who at the time was teaching S to CSIRO scientists at remote stations around Australia. He brought me along on his travels as an assistant. I learned a lot about the practice of statistical computing helping those scientists solve their problems (and got to visit some great parts of Australia, too).
Ajay- How do you think we should help bring more students to the fields of mathematics and science?
David- For me, statistics is the practical application of mathematics to the real world of messy data, complex problems and difficult conclusions. And in recent years, lots of statistical problems have broken out of geeky science applications to become truly mainstream, even sexy. In our new information society, graduating statisticians have a bright future ahead of them which I think will inevitably draw more students to the field.
Ajay- Your blog at REvolution Computing is one of the best technical corporate blogs; in particular, the monthly round-up of new packages, R events and product launches is written in a lucid style. Are there any plans for a REvolution Computing community or network as well, instead of just the blog?
David- Yes, definitely. We recently hired Danese Cooper as our Open Source Diva to help us in this area. Danese has a wealth of experience building open-source communities, such as for Java at Sun. We’ll be announcing some new community initiatives this summer. In the meantime, of course, we’ll continue with the Revolutions blog, which has proven to be a great vehicle for getting the word out about R to a community that hasn’t heard about it before. Thanks for the kind words about the blog, by the way — it’s been a lot of fun to write. It will be a continuing part of our community strategy, and I even plan to expand the roster of authors in the future, too. (If you’re an aspiring R blogger, please get in touch!)
Ajay- I kind of get confused about what exactly 32-bit and 64-bit computing mean in terms of hardware and software. What is the deal there? How do the Enterprise solutions from REvolution take care of 64-bit computing? How exactly do the parallel computing and optimized math libraries in REvolution R help, compared to other flavors of R?
David- Fundamentally, 64-bit systems allow you to process larger data sets with R, as long as you have a version of R compiled to take advantage of the increased memory available. (I wrote about some of the technical details behind this recently on the blog.) One of the really exciting trends I’ve noticed over the past six months is that R is being applied to larger and more complex problems in areas like predictive analytics and social networking data, so being able to process the largest data sets is key.
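A back-of-the-envelope calculation makes the memory limit concrete (the vector size here is just an example):

    # Each double-precision number takes 8 bytes, so a vector of half a
    # billion doubles needs about 3.7 GiB, more address space than a
    # 32-bit process can offer; a 64-bit build of R handles it easily.
    500e6 * 8 / 2^30
    # [1] 3.72529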
One common misperception is that 64-bit systems are inherently faster than their 32-bit equivalents, but this isn’t generally the case. To speed up large problems, the best approach is to break the problem down into smaller components and run them in parallel on multiple machines. We created the ParallelR suite of packages to make it easy to break down such problems in R and run them on a multiprocessor workstation, a local cluster or grid, or even cloud computing systems like Amazon’s EC2.
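To give a flavor of what that looks like in practice, here is a minimal sketch using the open-source foreach and doSNOW packages; the four-worker cluster and the toy bootstrap task are illustrative assumptions, not ParallelR’s exact API.

    library(foreach)
    library(doSNOW)

    # Start a local cluster of four worker processes (size is arbitrary).
    cl <- makeCluster(4, type = "SOCK")
    registerDoSNOW(cl)

    # Toy task: 1,000 bootstrap replicates of a sample mean, distributed
    # across the workers and combined back into a single vector.
    boot_means <- foreach(i = 1:1000, .combine = c) %dopar% {
      mean(sample(airquality$Wind, replace = TRUE))
    }

    stopCluster(cl)
    summary(boot_means)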
While the core R team produces versions of R for 64-bit Linux systems, they don’t make one for Windows. Our development team spent more than six months making R work on 64-bit Windows (and optimizing it for speed), which we released as REvolution R Enterprise bundled with ParallelR. We’re excited by the scale of the applications our subscribers are already tackling with a combination of 64-bit and parallel computing.
Ajay- The command line is oh so commanding. Please describe any plans to support or help an R GUI like Rattle or R Commander. Do you think REvolution R could get more users if it supported a GUI?
David- Right now we’re focusing on making R easier to use for programmers by creating a new GUI for programming and debugging R code. We heard feedback from some clients who were concerned about training their programmers in R without a modern development environment available. So we’re addressing that by improving R to make the “standard” features programmers expect (like step debugging and variable inspection) work in R and integrating it with the standard environment for programmers on Windows, Visual Studio.
In my opinion, R’s strength lies in its combination of high-quality statistical algorithms with a language ideal for applying them, so “hiding” the language behind a general-purpose GUI negates that strength a bit, I think. On the other hand, it would be nice to have an open-source, user-friendly tool for desktop statistical analysis, so I’m glad others are working to extend R in that area.
Ajay- Companies like SAS are investing in SaaS and cloud computing. Zementis offers scored models on the cloud through PMML. Any views on building the model or analytics on the cloud itself?
David- To me, cloud computing is a cost-effective way of dynamically scaling hardware to the problem at hand. Not everyone has access to a 20-machine cluster for high-performance computing; even those that do can’t instantly convert it to a cluster of 100 or 1000 machines to satisfy a sudden spike in demand. REvolution R Enterprise with ParallelR is unique in that it provides a platform for creating sophisticated data analysis applications distributed in the cloud, quickly and easily.
Using clouds for building models is a no-brainer for parallel-computing problems: I recently wrote about how parallel backtesting for financial trading can easily be deployed on Amazon EC2, for example. PMML is a great way of deploying static models, but one of the big advantages of cloud computing is that it makes it possible to update your model much more frequently, to keep your predictions in tune with the latest source data.
Ajay- What are the major alliances that REvolution has in the industry?
David- We have a number of industry partners. Microsoft and Intel, in particular, provide financial and technical support allowing us to really strengthen and optimize R on Windows, a platform that has been somewhat underserved by the open-source community. With Sybase, we’ve been working on combining REvolution R and Sybase RAP to produce some exciting advances in financial risk analytics. Similarly, we’ve been doing work with Vhayu’s Velocity database to provide high-performance data extraction. On the life sciences front, Pfizer is not only a valued client but in many ways a partner who has helped us “road-test” commercial-grade R deployment with great success.
Ajay- What are the major R packages that REvolution supports and optimizes and how exactly do they work/help?
David- REvolution R works with all the R packages: in fact, we provide a mirror of CRAN so our subscribers have access to the truly amazing breadth and depth of analytic and graphical methods available in third-party R packages. Those packages that perform intensive mathematical calculations automatically benefit from the optimized math libraries that we incorporate in REvolution R Enterprise. In the future, we plan to work with the authors of some key packages to provide further improvements; in particular, to make packages work with ParallelR to reduce computation times in multiprocessor or cloud computing environments.
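To see which operations those optimized libraries touch, consider this small illustrative timing exercise; the matrix size is arbitrary, and the speed you observe depends entirely on the BLAS/LAPACK your R build links against.

    # Dense linear algebra in R is delegated to the underlying
    # BLAS/LAPACK, so an optimized math library speeds these lines up
    # without any change to the R code itself.
    set.seed(1)
    n <- 2000
    A <- matrix(rnorm(n * n), nrow = n)
    B <- matrix(rnorm(n * n), nrow = n)
    system.time(A %*% B)              # matrix product
    system.time(chol(crossprod(A)))   # Cholesky factorization of A'A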
Ajay- Are you planning to lay off people during the recession? Does REvolution Computing offer internships to college graduates? What do people at REvolution Computing do to have fun?
David- On the contrary, we’ve been hiring recently. We don’t have an intern program in place just yet, though. For me, it’s been a really fun place to work. Working for an open-source company has a different vibe than the commercial software companies I’ve worked for before. The most fun for me has been meeting with R users around the country and sharing stories about how R is really making a difference in so many different venues — over a few beers of course!
David Smith
Director of Community
David has a long history with the statistical community. After graduating with a degree in Statistics from the University of Adelaide, South Australia, David spent four years researching statistical methodology at Lancaster University (United Kingdom), where he also developed a number of packages for the S-PLUS statistical modeling environment. David continued his association with S-PLUS at Insightful (now TIBCO Spotfire) where for more than eight years he oversaw the product management of S-PLUS and other statistical and data mining products. David is the co-author (with Bill Venables) of the tutorial manual, An Introduction to R, and one of the originating developers of ESS: Emacs Speaks Statistics. Prior to joining REvolution, David was Vice President, Product Management at Zynchros, Inc.
Ajay – To know more about David Smith and REvolution Computing do visit http://www.revolution-computing.com and
http://www.blog.revolution-computing.com
Also see an interview with Richard Schultz, CEO of REvolution Computing, here:
http://www.decisionstats.com/2009/01/31/interviewrichard-schultz-ceo-revolution-computing/
Let us assume the top 100 analysts in the world mostly use WordPress, Typepad or Blogger to publish their posts.
Managing them is quite a challenge.
What is the marketing ROI of analyst relationships for a business intelligence vendor? Curt Monash is the Aerosmith of business intelligence analysts, so he can tell it better.
How about a magical community where you just use their RSS feeds (mostly Feedburner or Feedblitz) to create a self-automated community?
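Mechanically, that is not much work. A rough sketch in R with the XML package (the feed URL is a placeholder):

    library(XML)

    # Fetch one analyst's RSS feed and pull out post titles and links.
    feed   <- xmlParse("http://feeds.feedburner.com/SomeAnalystBlog")
    titles <- sapply(getNodeSet(feed, "//item/title"), xmlValue)
    links  <- sapply(getNodeSet(feed, "//item/link"), xmlValue)

    # Loop this over the 100 analysts' feed URLs and merge the results,
    # and the "self-automated community" fills itself.
    head(data.frame(title = titles, link = links))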
Search Engine Optimization can be gamed by keeping that community website hidden from Google and other search engines at first (yes, it can be done, for instance with a robots.txt Disallow rule).
Use numerical rankings, as on LinkedIn, to spur rivalry: shift analysts’ page positions up and down, or click repeatedly on some posts to manipulate their blog-post view counts.
What would SAS pay to have all SAS analysts on one webpage? Or SPSS to have all SPSS analysts on one webpage?
Six months later, suddenly open the website to search engines; by then the RSS feeds have pulled in all the posts of the top analysts in the world. Google advertising won’t matter because, hey, we have a mega-vendor sponsor, while individual bloggers and analysts have no collective strength now that the community is so big.
So much blah blah-
What software would you use? You can choose between:
Ning.com (but it is mostly not based on blog feeds),
or WordFrame.com (whose interface and name sound suspiciously like the WordPress software),
or you can choose a customized WordPress solution called BuddyPress.
Here is the software-
BuddyPress will transform an installation of WordPress MU into a social network platform.
BuddyPress is a suite of WordPress plugins and themes, each adding a distinct new feature. BuddyPress contains all the features you’d expect from WordPress but aims to let members socially interact.
Note: this was just a generic case study making the case for open source community software. Resemblance to anything is a matter of coincidence, except for Curt Monash of course.
The cost of customized WordPress software for communities is a big zero: it is free and open source, and thousands of plugins can be installed and maintained for it.
See an existing installation here
www.decisionstats.com/community
or at www.buddypress.org
Search engines are a difficult subject to talk about: there are multiple experts, and there are multiple vendors, from Microsoft, Yahoo, Google and Cuil, as well as newer innovations like Kosmix’s blended search and wiki search (including the Digg bar, and similar features now introduced in Google). Content itself has exploded, from websites in 1999 to websites, blog posts, RSS feeds, tweets, Facebook profiles, online communities, voice, podcasts and video. Quantitative measures to measure, index and rank the new types of content require that the algorithms behind search be made open source, but under strict Creative Commons licensing, with third-party developers creating search-algorithm extensions.
This idea seems difficult to implement, but it has been done before. No one creates Palo Alto-style research labs anymore; all scientists and researchers have to first sign away copyrights before beginning their research.
The year 2009 is different from the year 1999, and PageRank is no longer a math-based algorithm; it is a marketing brand. Time for the Stanford dropouts to go back to school and put some more math, and some less marketing (and fewer pranks on Wolfram, please), into their search engine. And Paul Allen, who funded the building at Stanford where the Google algorithm was first conceived, needs to spend some Bills and venture-fund a new wave of innovation in search engines. Is this wishful thinking? Maybe. I just need a better search engine than Google right now. Perhaps Herr Schmidt could take some time off from viewing mountains in Mountain View and measure customer satisfaction, instead of just measuring market share in a market that is barely competitive and likely to face antitrust scrutiny in the US and Europe very soon. Better to open up some of the ranking algorithm’s features, so that all websites can implement the SEO tactics thus magically revealed and create a better World Wide Web, negating the information asymmetry of a closed-source search engine.
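For the record, the math underneath is simple enough to open up; a toy power-iteration version of PageRank on an invented four-page web, with the usual 0.85 damping factor, fits in a dozen lines of R:

    # links[i, j] = 1 if page j links to page i (structure invented).
    links <- matrix(c(0, 0, 1, 1,
                      1, 0, 0, 0,
                      1, 1, 0, 1,
                      0, 1, 0, 0), nrow = 4, byrow = TRUE)
    # Column-normalize so each page divides its vote among its out-links.
    M <- sweep(links, 2, colSums(links), "/")
    d <- 0.85                    # damping factor from the original paper
    n <- ncol(M)
    rank <- rep(1 / n, n)        # start from a uniform distribution
    for (i in 1:50) rank <- (1 - d) / n + d * (M %*% rank)
    round(rank / sum(rank), 3)   # steady-state importance of each page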
We have covered KNIME.com’s open source platform earlier. On the eve of its new product launch, KNIME.com co-founder Fabian Dill reveals his thoughts in an exclusive interview.
From the Knime.com website
The modular data exploration platform KNIME, originally developed solely at the University of Konstanz, Germany, enables the user to visually create data flows (or pipelines), execute selected analysis steps, and later investigate the results through interactive views on data and models. KNIME already has more than 2,000 active users in diverse application areas, ranging from early drug discovery and customer relationship analysis to financial information integration.
Ajay – What prompted you personally to be part of KNIME and not join a big technology company? What does the future hold for KNIME in 2009-10?
Fabian – I was excited when I first joined the KNIME team in 2005. Back then, we were working exclusively on the open source version, backed by some academic funding. Being part of the team that put together such a professional data mining environment from scratch was a great experience. Growing this into a commercial support and development arm has been a thrill as well. The team, and the diverse experience gained from helping get a new company off the ground and being involved in everything it takes to make it successful, made it unthinkable for me to work anywhere else.
We continue to develop the open source arm of KNIME and many new features lie ahead: text, image, and time series processing, as well as better support for variables. We are constantly working on adding new nodes. KNIME 2.1 is expected in the fall, and some of the ongoing development can already be found on the KNIME Labs page (http://labs.knime.org).
The commercial division is providing support and maintenance subscriptions for the freely available desktop version. At the same time we are developing products which will streamline the integration of KNIME into existing IT infrastructures:
the KNIME Grid Support lets you run your compute-intensive (sub-) workflows or nodes on a grid or cluster;
KNIME Reporting makes use of KNIME’s flexibility to gather the data for your report and provides simplified views (static, or interactive dashboards) of the resulting workflow and its results; and
the KNIME Enterprise Server facilitates company-wide installation of KNIME and supports collaboration between departments and sites by providing central workflow repositories, scheduled and remote execution, and user rights management.
Ajay – Software as a Service and cloud computing are the next big things in 2009. Are there any plans to put KNIME on a cloud computer and charge clients by the hour, so they can build models on huge data without buying any hardware, just renting the time?
Fabian – Cloud computing is an agile and client-centric approach and therefore fits nicely into the KNIME framework, especially considering that we are already working on support for distributed computing of KNIME workflows (see above). However, we have no immediate plans for KNIME workflow processing on a per-use charge or similar. That’s an interesting idea, though. The way KNIME nodes are nicely encapsulated (and often even distributable themselves) would make this quite natural.
Ajay – What differentiates KNIME from other products such as RPro and Rapid Miner, for example? What are the principal challenges you have faced in developing it? Why do customers like and dislike it?
Fabian – Every tool has its strengths and weaknesses depending on the task you actually want to accomplish. The focus of KNIME is to support users in their quest to understand large and heterogeneous data and to make sense of it. For this task you cannot rely only on classical data mining techniques wrapped in a command-line or otherwise configurable environment; simple, intuitive access to those tools is required, along with support for visual exploration using interactive linking and brushing techniques.
By design, KNIME is a modular integration platform, which makes it easy to write your own nodes (with the easy-to-use API) or to integrate existing libraries or tools.
We integrated Weka, for example, because of its vast library of state-of-the-art machine learning algorithms, the open source program R – in order to provide access to a rich library of statistical functions (and of course many more) – and parts of the Chemistry Development Kit (CDK). All these integrations follow the KNIME requirements for easy and intuitive usage so the user does not need to understand the details of each tool in great depth.
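As a toy illustration of what such an R integration buys, here is a standalone sketch of the kind of three-line analysis a workflow node can delegate to R; the knime.in/knime.out data-frame convention mirrors KNIME’s R snippet nodes, and the data and column names are invented.

    # Stand-in for the table a KNIME node would hand to R:
    knime.in <- data.frame(dose = 1:20,
                           response = 3 + 0.5 * (1:20) + rnorm(20))

    # The delegated work: fit a model, return the table plus predictions.
    fit <- lm(response ~ dose, data = knime.in)
    knime.out <- cbind(knime.in, predicted = fitted(fit))
    head(knime.out)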
A number of our commercial partners, such as Schroedinger, Infocom, Symyx and Tripos, among others, also follow this paradigm and similarly integrate their tools into KNIME. Academic collaborations, such as with ETH Zurich, Switzerland on the High Content Screening platform HC/DC, represent another positive outcome of this open architecture. We believe that this strictly result-oriented approach, based on a carefully designed and professionally coded framework, is a key factor in KNIME’s broad acceptance. I guess this is another big differentiator: right from the start, KNIME has been developed by a team of software developers with decades of industrial software engineering experience.
Ajay – Are there any Asian plans for KNIME? Any other open source partnerships in the pipeline?
Fabian – We have a Japan-based partner, Infocom, which operates in the field of life sciences. But we are always open to other partnerships, supporters, or collaborations.
In addition to the open source integrations mentioned above (Weka, R, CDK, HC/DC), there are many other different projects in the works and partnerships under negotiation. Keep an eye on our blog and on our Labs@KNIME page (labs.knime.org).
ABOUT

KNIME – development started in January 2004. Since then: 10 releases; approx. 350,000 lines of code; 25,000 downloads; an estimated 2000 active users. KNIME.com was founded in June 2008 in Zurich, Switzerland.
Fabian Dill – has been working for and with KNIME since 2005; co-founder of KNIME.com.