Tag Archives: beta
Predictive Models Ain’t Easy to Deploy
This is a guest blog post by Carole-Ann Matignon of Sparkling Logic. You can see more on Sparkling Logic at http://my.sparklinglogic.com/
Decision Management is about combining predictive models and business rules to automate decisions for your business. Insurance underwriting, loan origination or workout, and claims processing are all very good use cases for that discipline… But there is a hiccup… It ain’t as easy as you would expect…
What’s easy?
If you have a neat model, most model development tools will let you export it as PMML without much effort. PMML stands for Predictive Model Markup Language and is a standard XML representation for predictive model formulas. Many BRMS (Business Rules Management Systems) let you import it. Tada… The model is ready for deployment.
What’s hard?
The problem we keep seeing over and over in the industry concerns variables.
Those neat predictive models are formulas based on variables that may or may not exist as is in your object model. When the variable is itself a formula based on the object model, like the min, max or sum of Dollar amount spent in Groceries in the past 3 months, and the object model comes with transaction details, such that you can compute it by iterating through those transactions, then the problem is not “that” big. PMML 4 introduced some support for those variables.
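To make the "not that big" case concrete, here is a minimal sketch of computing such a derived variable by iterating through transaction details. The `Transaction` class, the `grocery_features` function, the 90-day window standing in for "3 months", and the sample data are all hypothetical; a real object model will look different.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Transaction:
    """Hypothetical transaction record from the operational object model."""
    posted: date
    category: str
    amount: float


def grocery_features(transactions, as_of, window_days=90):
    """Return min, max and sum of grocery spend in the window ending at as_of."""
    start = as_of - timedelta(days=window_days)
    amounts = [
        t.amount
        for t in transactions
        if t.category == "Groceries" and start < t.posted <= as_of
    ]
    if not amounts:
        # No qualifying transactions: leave min/max undefined so the caller
        # can decide how the model should treat the missing values.
        return {"min": None, "max": None, "sum": 0.0}
    return {"min": min(amounts), "max": max(amounts), "sum": sum(amounts)}


# Illustrative data only.
history = [
    Transaction(date(2012, 1, 5), "Groceries", 42.50),
    Transaction(date(2012, 2, 11), "Groceries", 17.25),
    Transaction(date(2012, 2, 20), "Fuel", 60.00),
    Transaction(date(2011, 9, 1), "Groceries", 99.00),  # outside the window
]
features = grocery_features(history, as_of=date(2012, 3, 1))
```

This only works because the transaction details are still available at decision time; once the data has been flattened and pre-aggregated, there is nothing left to iterate over.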
The issue that is not easy to fix, and yet quite frequent, is when the model development data model does not resemble the operational one. Your Data Warehouse very likely flattened the object model, and pre-computed some aggregations that make the mapping very hard to restore.
It is clearly not an impossible project, as many organizations do it today. It comes with significant overhead, though, because modelers must involve IT resources to extract the right data before the model can be operationalized. It is a heavy process that is well justified for heavy-duty models that were developed over a period of time, with a significant ROI.
This is a show-stopper though for other initiatives which do not have the same ROI, or would require too frequent model refresh to be viable. Here, I refer to “real” model refresh that involves a model reengineering, not just a re-weighting of the same variables.
For those initiatives where time is of the essence, the challenge will be to bring those two worlds, the modelers and the business rules experts, closer together in order to streamline the development AND deployment of analytics beyond the model formula. The great opportunity I see is the potential for better, coordinated tuning of the cut-off rules in the context of the model refinement. In other words: the opportunity to refine the strategy as a whole. Very ambitious? I don’t think so.
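As a rough illustration of tuning cut-off rules alongside the model, the sketch below wraps a model score in a simple decision strategy and compares approval rates at two cut-offs. The `decide` and `approval_rate` functions, the age rule, the scores and the cut-off values are all made up for the example, not taken from any particular product.

```python
def decide(applicant, score, cutoff):
    """Combine a model score with a business rule to produce a decision."""
    # Hypothetical policy rule: it overrides the model regardless of score.
    if applicant["age"] < 18:
        return "decline"
    # Cut-off rule: the threshold is a strategy parameter, not part of the model.
    return "approve" if score >= cutoff else "refer"


def approval_rate(cases, cutoff):
    """Share of cases approved under a given cut-off."""
    decisions = [decide(applicant, score, cutoff) for applicant, score in cases]
    return decisions.count("approve") / len(decisions)


# Illustrative (applicant, model score) pairs.
cases = [
    ({"age": 34}, 0.91),
    ({"age": 17}, 0.95),
    ({"age": 52}, 0.64),
    ({"age": 41}, 0.48),
]
rates = {cutoff: approval_rate(cases, cutoff) for cutoff in (0.5, 0.7)}
```

Re-running this kind of comparison whenever the model is refreshed is one way to treat the rules and the model as a single strategy.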
About Carole-Ann Matignon
http://my.sparklinglogic.com/index.php/company/management-team
Radoop 0.3 launched- Open Source Graphical Analytics meets Big Data
What is Radoop? Quite possibly an exciting mix of analytics and big data computing
What is Radoop?
Hadoop is an excellent tool for analyzing large data sets, but it lacks an easy-to-use graphical interface. RapidMiner is an excellent tool for data analytics, but its data size is limited by the memory available, and a single machine is often not enough to run the analyses on time. In this project, we combine the strengths of both projects and provide a RapidMiner extension for editing and running ETL, data analytics and machine learning processes over Hadoop.
We have closely integrated the highly optimized data analytics capabilities of Hive and Mahout, and the user-friendly interface of RapidMiner to form a powerful and easy-to-use data analytics solution for Hadoop.
and what’s new
Radoop 0.3 released – fully graphical big data analytics
Today, Radoop had a major step forward with its 0.3 release. The new version of the visual big data analytics package adds full support for all major Hadoop distributions used these days: Apache Hadoop 0.20.2, 0.20.203, 1.0 and Cloudera’s Distribution including Apache Hadoop 3 (CDH3). It also adds support for large clusters by allowing the namenode, the jobtracker and the Hive server to reside on different nodes.
As Radoop’s promise is to make big data analytics easier, the 0.3 release is also focused on improving the user interface. It has an enhanced breakpointing system that lets you investigate intermediate results, and it adds dozens of quick fixes, so common process design mistakes become much easier to resolve.
There are many further improvements and fixes, so please consult the release notes for more details. Radoop is in private beta mode, but heading towards a public release in Q2 2012. If you would like to get early access, then please apply at the signup page or describe your use case in email (beta at radoop.eu).
Radoop 0.3 (15 February 2012)
- Support for Apache Hadoop 0.20.2, 0.20.203, 1.0 and Cloudera’s Distribution Including Apache Hadoop 3 (CDH3) in a single release
- Support for clusters with separate master nodes (namenode, jobtracker, Hive server)
- Enhanced breakpointing to evaluate intermediate results
- Dozens of quick fixes for the most common process design errors
- Improved process design and error reporting
- New welcome perspective to help in the first steps
- Many bugfixes and performance improvements
Radoop 0.2.2 (6 December 2011)
- More Aggregate functions and distinct option
- Generate ID operator for convenience
- Numerous bug fixes and improvements
- Improved user interface
Radoop 0.2.1 (16 September 2011)
- Set Role and Data Multiplier operators
- Management panel for testing Hadoop connections
- Stability improvements for Hive access
- Further small bugfixes and improvements
Radoop 0.2 (26 July 2011)
- Three new algorithms: Fuzzy K-Means, Canopy, and Dirichlet clustering
- Three new data preprocessing operators: Normalize, Replace, and Replace Missing Values
- Significant speed improvements in data transmission and interactive analytics
- Increased stability and speedup for K-Means
- More flexible settings for Join operations
- More meaningful error messages
- Other small bugfixes and improvements
Radoop 0.1 (14 June 2011)
Initial release with 26 operators for data transmission, data preprocessing, and one clustering algorithm.
Note that RapidMiner also has a great R extension, so combining R, a graphical interface, and big data analytics is now easier and more powerful than ever.
Interview JJ Allaire Founder, RStudio
Here is an interview with JJ Allaire, founder of RStudio. RStudio is the IDE that has overtaken other IDEs within the R community in terms of ease of use. On the eve of their latest product launch, JJ talks to DecisionStats about RStudio and more.
Ajay- So what is new in the latest version of RStudio and how exactly is it useful for people?
JJ- The initial release of RStudio as well as the two follow-up releases we did last year were focused on the core elements of using R: editing and running code, getting help, and managing files, history, workspaces, plots, and packages. In the meantime users have also been asking for some bigger features that would improve the overall work-flow of doing analysis with R. In this release (v0.95) we focused on three of these features:
Projects. R developers tend to have several (and often dozens) of working contexts associated with different clients, analyses, data sets, etc. RStudio projects make it easy to keep these contexts well separated (with distinct R sessions, working directories, environments, command histories, and active source documents), switch quickly between project contexts, and even work with multiple projects at once (using multiple running versions of RStudio).
Version Control. The benefits of using version control for collaboration are well known, but we also believe that solo data analysis can achieve significant productivity gains by using version control (this discussion on Stack Overflow talks about why). In this release we introduced integrated support for the two most popular open-source version control systems: Git and Subversion. This includes changelist management, file diffing, and browsing of project history, all right from within RStudio.
Code Navigation. When you look at how programmers work a surprisingly large amount of time is spent simply navigating from one context to another. Modern programming environments for general purpose languages like C++ and Java solve this problem using various forms of code navigation, and in this release we’ve brought these capabilities to R. The two main features here are the ability to type the name of any file or function in your project and go immediately to it; and the ability to navigate to the definition of any function under your cursor (including the definition of functions within packages) using a keystroke (F2) or mouse gesture (Ctrl+Click).
Ajay- What’s the product road map for RStudio? When can we expect the IDE to turn into a full fledged GUI?
JJ- Linus Torvalds has said that “Linux is evolution, not intelligent design.” RStudio tries to operate on a similar principle—the world of statistical computing is too deep, diverse, and ever-changing for any one person or vendor to map out in advance what is most important. So, our internal process is to ship a new release every few months, listen to what people are doing with the product (and hope to do with it), and then start from scratch again making the improvements that are considered most important.
Right now some of the things which seem to be top of mind for users are improved support for authoring and reproducible research, various editor enhancements including code folding, and debugging tools.
What you’ll see us do in a given release is work on a combination of frequently requested features, smaller improvements to usability and work-flow, bug fixes, and finally architectural changes required to support current or future feature requirements.
While we do try to base what we work on as closely as possible on direct user-feedback, we also adhere to some core principles concerning the overall philosophy and direction of the product. So for example the answer to the question about the IDE turning into a full-fledged GUI is: never. We believe that textual representations of computations provide fundamental advantages in transparency, reproducibility, collaboration, and re-usability. We believe that writing code is simply the right way to do complex technical work, so we’ll always look for ways to make coding better, faster, and easier rather than try to eliminate coding altogether.
Ajay- Describe your journey in science from a high school student to your present work in R. I noticed you have been very successful in making software products that have been mostly proprietary products or sold to companies.
Why did you get into open source products with RStudio? What are your plans for monetizing RStudio further down the line?
JJ- In high school and college my principal areas of study were Political Science and Economics. I also had a very strong parallel interest in both computing and quantitative analysis. My first job out of college was as a financial analyst at a government agency. The tools I used in that job were SAS and Excel. I had a dim notion that there must be a better way to marry computation and data analysis than those tools, but of course no concept of what this would look like.
From there I went more in the direction of general purpose computing, starting a couple of companies where I worked principally on programming languages and authoring tools for the Web. These companies produced proprietary software, which at the time (between 1995 and 2005) was a workable model because it allowed us to build the revenue required to fund development and to promote and distribute the software to a wider audience.
By 2005, however, it was becoming clear that proprietary software would ultimately be overtaken by open source software in nearly all domains. The cost of development had shrunk dramatically thanks to both the availability of high-quality open source languages and tools as well as the scale of global collaboration possible on open source projects. The cost of promoting and distributing software had also collapsed thanks to the efficiency of both distribution and information diffusion on the Web.
When I heard about R and learned more about it, I became very excited and inspired by what the project had accomplished. A group of extremely talented and dedicated users had created the software they needed for their work and then shared the fruits of that work with everyone. R was a platform that everyone could rally around because it worked so well, was extensible in all the right ways, and most importantly was free (as in speech) so users could depend upon it as a long-term foundation for their work.
So I started RStudio with the aim of making useful contributions to the R community. We started with building an IDE because it seemed like a first-rate development environment for R that was both powerful and easy to use was an unmet need. Being aware that many other companies had built successful businesses around open-source software, we were also convinced that we could make RStudio available under a free and open-source license (the AGPLv3) while still creating a viable business. At this point RStudio is exclusively focused on creating the best IDE for R that we can. As the core product gets where it needs to be over the next couple of years we’ll then also begin to sell other products and services related to R and RStudio.
About-

JJ Allaire
JJ Allaire is a software engineer and entrepreneur who has created a wide variety of products including ColdFusion, Windows Live Writer, Lose It!, and RStudio.
From http://en.wikipedia.org/wiki/Joseph_J._Allaire
In 1995 Joseph J. (JJ) Allaire co-founded Allaire Corporation with his brother Jeremy Allaire, creating the web development tool ColdFusion.[1] In March 2001, Allaire was sold to Macromedia where ColdFusion was integrated into the Macromedia MX product line. Macromedia was subsequently acquired by Adobe Systems, which continues to develop and market ColdFusion.
After the sale of his company, Allaire became frustrated at the difficulty of keeping track of research he was doing using Google. To address this problem, he co-founded Onfolio in 2004 with Adam Berrey, former Allaire co-founder and VP of Marketing at Macromedia.
On March 8, 2006, Onfolio was acquired by Microsoft where many of the features of the original product are being incorporated into the Windows Live Toolbar. On August 13, 2006, Microsoft released the public beta of a new desktop blogging client called Windows Live Writer that was created by Allaire’s team at Microsoft.
Starting in 2009, Allaire has been developing a web-based interface to the widely used R technical computing environment. A beta version of RStudio was publicly released on February 28, 2011.
JJ Allaire received his B.A. from Macalester College (St. Paul, MN) in 1991.
RStudio-
RStudio is an integrated development environment (IDE) for R which works with the standard version of R available from CRAN. Like R, RStudio is available under a free software license. RStudio is designed to be as straightforward and intuitive as possible to provide a friendly environment for new and experienced R users alike. RStudio is also a company, and they plan to sell services (support, training, consulting, hosting) related to the open-source software they distribute.
Preview- Google Cloud SQL
From -http://code.google.com/apis/sql/
What is Google Cloud SQL?
Google Cloud SQL is a web service that allows you to create, configure, and use relational databases with your App Engine applications. It is a fully-managed service that maintains, manages, and administers your databases, allowing you to focus on your applications and services.
By offering the capabilities of a MySQL database, the service enables you to easily move your data, applications, and services into and out of the cloud. This allows for high data portability and helps in faster time-to-market because you can quickly leverage your existing database (using JDBC and/or DB-API) in your App Engine application.
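The portability claim rests on standard interfaces such as Python's DB-API 2.0: query code written against `cursor()`, `execute()` and `fetchall()` stays largely the same from one driver to another. The sketch below uses the standard library's `sqlite3` module purely as a stand-in, since the Cloud SQL connection call is not shown here; the `orders` table, its columns and the `top_customers` helper are hypothetical.

```python
import sqlite3


def top_customers(conn, min_total):
    """Return (name, total) rows at or above min_total, largest first."""
    cur = conn.cursor()
    # Note: the parameter placeholder style varies by driver. sqlite3 uses "?",
    # while MySQL drivers such as MySQLdb use "%s".
    cur.execute(
        "SELECT name, total FROM orders WHERE total >= ? ORDER BY total DESC",
        (min_total,),
    )
    return cur.fetchall()


# An in-memory SQLite database stands in for the hosted database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (name TEXT, total REAL)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 120.0), ("bob", 45.5), ("cy", 300.0)],
)
conn.commit()

rows = top_customers(conn, 100.0)
conn.close()
```

Moving to another DB-API driver would mainly mean changing the `connect` call and the placeholder style, assuming the SQL itself sticks to what both databases support.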
Here is where you can get an invite to the beta-only Google Cloud SQL:
Sign up for Limited Preview
Google Cloud SQL is available to a limited number of users. To sign up for the service:
- Visit the Google APIs Console. The console opens the All services pane.
- Find the SQL Service line in the Services table and click Request access…
- Fill out the enrollment form.
- Our team will review your enrollment information and respond by email to the address associated with your Google Account.
- Follow the link in the email to view the Terms of Service. Please read these carefully before accepting.
- Sign up for the google-cloud-sql-announce group to receive important announcements and product news. (NOTE- Members: 384)
Kindle as a Tablet for $109 and how to order it if you are in India
I was just blown away by the price and functionality of the Kindle, including the browser and built-in Wi-Fi (though the $40 leather bag was a bit sneaky as an accessory, I mean seriously, dude).
And unlike some media technology companies (like Hulu, Spotify, and even some YouTube channels) that offer products to Asia only after a lag, it is just as easy to order a Kindle from India.
Thank you Amazon!
and lastly some art to help prod those people who offer beta sites for limited countries even in this age.
Credit- Paul Mutant
The Amazing Microsoft Robotics
Amazing stuff from the makers of Kinect-
Operating systems for robots may be the future cash cow of Microsoft, while the pirates of Silicon Valley fight fascinating cloudy wars!
http://www.microsoft.com/robotics/#Product

Microsoft Robotics Developer Studio 4 beta (RDS4 beta) provides a wide range of support to help make it easy to develop robot applications. RDS4 beta includes a programming model that helps make it easy to develop asynchronous, state-driven applications. RDS4 beta provides a common programming framework that can be applied to support a wide variety of robots, enabling code and skill transfer.
RDS4 beta includes a lightweight asynchronous services-oriented runtime, a set of visual authoring and simulation tools, as well as templates, tutorials, and sample code to help you get started.
Microsoft Robotics Developer Studio 4 beta Datasheet – English (PDF Format)
View the product video on Channel 9!
This release has extensive support for the Kinect sensor hardware through the Kinect for Windows SDK, allowing developers to create Kinect-enabled robots in the Visual Simulation Environment and in real life. Along with this release comes a standardized reference spec for building a Kinect-based robot.
- Concurrency and Coordination Runtime (CCR) helps make it easier to handle asynchronous input and output by eliminating the conventional complexities of manual threading, locks, and semaphores. Lightweight state-oriented Decentralized Software Services (DSS) framework enables you to create program modules that can interoperate on a robot and connected PCs by using a relatively simple, open protocol.
- Visual Programming Language (VPL) provides a relatively simple drag-and-drop visual programming language tool that helps make it easy to create robotics applications. VPL also provides the ability to take a collection of connected blocks and reuse them as a single block elsewhere in your program. VPL is also capable of generating human-readable C#.
- DSS Manifest Editor (DSSME) provides a relatively simple creation of application configuration and distribution scenarios.
- The DSS Log Analyzer tool allows you to view message flows across multiple DSS services. DSS Log Analyzer also allows you to inspect message details.
- Visual Simulation Environment (VSE) provides the ability to simulate and test robotic applications using a 3D physics-based simulation tool. This allows developers to create robotics applications without the hardware. Sample simulation models and environments enable you to test your application in a variety of 3D virtual environments.
Carole-Ann Matignon – Co-Founder, President & Chief Executive Officer