LibreOffice 3.3.2

The latest, freest office productivity software in the world.

The Document Foundation maintains its release schedule thanks to a growing and vibrant community of developers.

The Internet, March 22, 2011 – The Document Foundation announces LibreOffice 3.3.2, the second micro release of the free office suite for personal productivity, which further improves the stability of the software and lays the groundwork for the next release, 3.4, due in mid-May. The community of developers has been able to maintain the tight schedule thanks to the increase in the number of contributors, and to the fact that those who started with easy hacks in September 2010 are now working on substantial features. In addition, they have almost completed the code-cleaning process, getting rid of German comments and obsolete functionality.

“I have started hacking LibreOffice code on September 28, 2010, just a few hours after the announcement of the project, and I found a very welcoming community, where senior developers went out of their way to help newbies like me to become productive. After a few hours I submitted a small patch removing 5 or 6 lines of dead code… enough to get my feet wet and learn the workflow”, says Norbert, a French developer living in the United States. “In a short time, I ended up removing the VOS library – deprecated for a decade – from LibreOffice, and finding and fixing various threading issues in the process”.

LibreOffice 3.3.2 is being released just one day after the closing of the first funding round launched by The Document Foundation to collect donations towards the 50,000-euro capital needed to establish a Stiftung in Germany. In five weeks, the community donated twice that amount: around 100,000 euros. All additional funds will be used for operating expenses such as infrastructure costs and registration of domain names and trademarks, as well as for community development expenses such as travel funding for TDF representatives speaking at conferences, booth fees for trade shows, and initial financing of merchandising items, DVDs and printed material.

Italo Vignoli, a founder and a steering committee member of The Document Foundation, will deliver a keynote at Flourish 2011 in Chicago on Sunday, April 3, at 10:30 AM, covering the project's independence from OpenOffice.org and Oracle, the founding of The Document Foundation, the raising of the capital and the first community budget, the organization of developers and other work, and a roadmap for future releases and features.

The Document Foundation is at http://documentfoundation.org, while LibreOffice is at http://www.libreoffice.org. LibreOffice 3.3.2 is immediately available from the download page.

*** About The Document Foundation

The Document Foundation has the mission of facilitating the evolution of the LibreOffice Community into a new, open, independent, and meritocratic organization within the next few months. An independent foundation is a better reflection of the values of our contributors, users and supporters, and will enable a more effective, efficient and transparent community. TDF will protect past investments by building on the achievements of the first decade, will encourage wide participation within the community, and will co-ordinate activity across the community.

*** Media contacts for TDF

Florian Effenberger (Germany)
Mobile: +49 151 14424108 – E-mail: floeff@documentfoundation.org
Olivier Hallot (Brazil)
Mobile: +55 21 88228812 – E-mail: olivier.hallot@documentfoundation.org
Charles H. Schulz (France)
Mobile: +33 6 98655424 – E-mail: charles.schulz@documentfoundation.org
Italo Vignoli (Italy)
Mobile: +39 348 5653829 – E-mail: italo.vignoli@documentfoundation.org


Italo Vignoli – The Document Foundation
email italo.vignoli@documentfoundation.org
phone +39.348.5653829 – VoIP +39.02.320621813
skype italovignoli – italo.vignoli@gmail.com

Google Cloud Print – print documents from the internet

Print jobs just got easier, especially if you prefer one printer, use Google Chrome, and can take two minutes to set up your printer to print from anywhere in the world through the internet.

It’s called Google Cloud Print, and it makes my life a lot easier when I travel and need to send documents to my printer at home rather than rely on external printers. See the screenshots below and check out http://www.google.com/cloudprint/ for more.

PMML Plugin for Greenplum now available

From a press release by Zementis:

Zementis has announced the Universal PMML Plug-in for in-database scoring. Available now for the EMC Greenplum Database, a high-performance massively parallel processing (MPP) database, the plug-in leverages the Predictive Model Markup Language (PMML) to execute predictive models directly within EMC Greenplum, for highly optimized in-database scoring.

Universal PMML Plug-in

Developed by the Data Mining Group (DMG), PMML is supported by all major data mining vendors, e.g., IBM SPSS, SAS, Teradata, FICO, STATISTICA, MicroStrategy, TIBCO and Revolution Analytics, as well as open source tools like R, KNIME and RapidMiner. With PMML, models built in any of these data mining tools can now be deployed instantly in the EMC Greenplum database. The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.
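To make the export side of that workflow concrete, here is a minimal sketch in R using the open source pmml package; the data frame, model, and file name are hypothetical stand-ins, not part of the Zementis announcement:

  # A minimal sketch, assuming the open source 'pmml' and 'XML' R packages
  # are installed; 'customers' and its columns are hypothetical.
  library(pmml)  # converts fitted R models to PMML documents
  library(XML)   # provides saveXML() for writing the document to disk

  customers <- data.frame(
    churn       = rbinom(200, 1, 0.2),  # toy outcome
    tenure      = runif(200, 0, 60),    # toy predictor: months as a customer
    monthly_fee = runif(200, 10, 100)   # toy predictor: monthly fee
  )

  # Fit a simple logistic regression model in R
  fit <- glm(churn ~ tenure + monthly_fee, data = customers, family = binomial)

  # Export it as PMML; a PMML-aware engine such as the plug-in described
  # above could then score it in-database without recoding the model
  saveXML(pmml(fit), file = "churn_model.pmml")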

“By partnering with Zementis, a true PMML innovator, we are able to offer a vendor-agnostic solution for moving enterprise-level predictive analytics into the database execution environment,” said Dr. Steven Hillion, Vice President of Analytics at EMC Greenplum. “With Zementis and PMML, the de-facto standard for representing data mining models, we are eliminating the need to recode predictive analytic models in order to deploy them within our database. In turn, this enables an analyst to reduce the time to insight required in most businesses today.”

Want to learn more?

To learn more about how the EMC Greenplum Database and the Universal PMML Plug-in work together, feel free to:

  1. Visit the PMML Plug-in product page
  2. Download the white paper

The Universal PMML Plug-in for the EMC Greenplum Database is available now. Contact us today for more information.

Michael Zeller, CEO, Zementis

KDNuggets Survey on R

From http://www.kdnuggets.com/2011/03/new-poll-r-in-analytics-data-mining-work.html?k11n07

A new poll/survey on actual usage of R in Data Mining

R has been steadily growing in popularity among data miners and analytic professionals.

In the KDnuggets 2010 Data Mining / Analytic Tools Poll, R was used by 30% of respondents.
In the 2010 Rexer Analytics Data Miner Survey, R was the most popular tool, used by 43% of the data miners.

Another aspect of tool usefulness is how much it helps with the entire data mining process: data preparation and cleaning, modeling, evaluation, visualization and presentation (excluding deployment).

A new KDnuggets poll asks:
What part of your analytics / data mining work in the past 12 months was done in R?

http://www.kdnuggets.com/2011/03/new-poll-r-in-analytics-data-mining-work.html?k11n07

Heritage Health Prize – a Data Mining Contest for 3 Million USD

If the Netflix Prize was about 1 million USD for better online video choices, here is a chance to earn serious money, write great code, and save lives!

From http://www.heritagehealthprize.com/

Heritage Health Prize
Launching April 4

More than 71 million individuals in the United States are admitted to hospitals each year, according to the latest survey from the American Hospital Association. Studies have concluded that in 2006 well over $30 billion was spent on unnecessary hospital admissions. Each of these unnecessary admissions took away one hospital bed from someone else who needed it more.

Prize Goal & Participation

The goal of the prize is to develop a predictive algorithm that can identify patients who will be admitted to the hospital within the next year, using historical claims data.

Official registration will open in 2011, after the launch of the prize. At that time, pre-registered teams will be notified to officially register for the competition. Teams must consent to be bound by final competition rules.

Registered teams will develop and test their algorithms. The winning algorithm will be able to predict patients at risk for an unplanned hospital admission with a high rate of accuracy. The first team to reach the accuracy threshold will have its algorithm confirmed by a judging panel; if confirmed, a winner will be declared.

The competition is expected to run for approximately two years. Registration will be open throughout the competition.

Data Sets

Registered teams will be granted access to two separate datasets of de-identified patient claims data for developing and testing algorithms: a training dataset and a quiz/test dataset. The datasets will include:

  • Outpatient encounter data
  • Hospitalization encounter data
  • Medication dispensing claims data, including medications
  • Outpatient laboratory data, including test outcome values

The data for each de-identified patient will be organized into two sections: “Historical Data” and “Admission Data.” Historical Data will represent three years of past claims data; this section of the dataset will be used to predict whether that patient is going to be admitted during the Admission Data period. Admission Data will contain a binary flag recording whether or not a hospital admission occurred for that patient.
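As a purely illustrative sketch of this kind of binary prediction task, here is a minimal R baseline; the data frame and feature names below are hypothetical stand-ins for features a team might derive from the Historical Data, not the actual competition data:

  # Minimal illustrative baseline in R; 'claims' and its columns are
  # hypothetical, not the actual competition data.
  set.seed(1)
  claims <- data.frame(
    admitted      = rbinom(1000, 1, 0.1),  # binary flag from the Admission Data period
    n_encounters  = rpois(1000, 3),        # e.g., outpatient encounters over three years
    n_medications = rpois(1000, 2)         # e.g., medication dispensing claims
  )

  # Logistic regression as the simplest possible predictive baseline
  fit <- glm(admitted ~ n_encounters + n_medications,
             data = claims, family = binomial)

  # Predicted probability of an unplanned admission for each patient
  p_admit <- predict(fit, type = "response")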

The training dataset includes several thousand anonymized patients and will be made available, securely and in full, to any registered team for the purpose of developing effective screening algorithms.

The quiz/test dataset is a smaller set of anonymized patients. Teams will only receive the Historical Data section of these datasets, and the two datasets will be mixed together so that teams will not know which de-identified patients are in which set. Teams will make predictions based on these datasets and submit their predictions to HPN (the Heritage Provider Network, the prize sponsor) through the official Heritage Health Prize web site. HPN will use the Quiz Dataset for the initial assessment of the teams’ algorithms, and will evaluate and report scores back to the teams through the prize website’s leader board.

Scores from the final Test Dataset will not be made available to teams until the accuracy thresholds are passed. The test dataset will be used in the final judging and results will be kept hidden. These scores are used to preserve the integrity of scoring and to help validate the predictive algorithms.

Teams can begin developing and testing their algorithms as soon as they are registered and ready. Teams will log onto the official Heritage Health Prize website and submit their predictions online. Comparisons will be run automatically and team accuracy scores will be posted on the leader board. This score will be based only on a portion of the predictions submitted (the Quiz Dataset); the remaining results (the Test Dataset) will be held back.
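The announcement does not specify the scoring metric, so the sketch below uses plain classification accuracy purely as an illustration of how a submission might be compared against held-back flags:

  # Hypothetical scoring sketch in R; the actual metric is defined by HPN.
  truth <- c(0, 1, 0, 0, 1)   # held-back admission flags (Quiz/Test Dataset)
  preds <- c(0, 1, 1, 0, 1)   # a team's submitted predictions
  mean(preds == truth)        # 0.8: fraction of patients predicted correctly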

Judging

Once a team successfully scores above the accuracy thresholds on the online testing (quiz dataset), final judging will occur. There will be three parts to this judging. First, the judges will confirm that the potential winning team’s algorithm accurately predicts patient admissions in the Test Dataset (again, above the thresholds for accuracy).

Next, the judging panel will confirm that the algorithm neither identifies individual patients nor uses external data sources to derive its predictions. Lastly, the panel will confirm that the team’s algorithm is authentic and derives its predictive power from the datasets, not from hand-coding results to improve scores. If the algorithm meets all three criteria, it will be declared the winner.

Failure to meet any one of these three parts will disqualify the team, and the contest will continue. The judges reserve the right to award second and third place prizes if deemed applicable.

Google Refine

An interesting data-cleaning tool from Google, at

https://code.google.com/p/google-refine/

From the page at

https://code.google.com/p/google-refine/wiki/UserGuide

The Basics

First, although Google Refine might start out looking like a spreadsheet program (Microsoft Excel, Google Spreadsheets, etc.), don’t expect it to work like a spreadsheet program. That’s almost like expecting a database to work like a text editor.

Google Refine is NOT for entering new data one cell at a time. It is NOT for doing accounting.

Google Refine is for applying transformations over many existing cells in bulk, for the purpose of cleaning up the data, extending it with more data from other sources, and getting it to some form that other tools can consume.

To use Google Refine, think in big patterns. For example, to spot errors, think

  • Show me every row where the string length of the customer’s name is longer than 50 characters (because I suspect that the customer’s address is mistakenly included in the name field)
  • Show me every row where the contract fee is less than 1 (because I suspect the fee was entered in units of thousands of dollars rather than dollars)
  • Show me every row where the description field (scraped from some web site) contains “&” (because I suspect it wasn’t decoded properly)

To edit data, think

  • For every row where the contract fee is less than 1, multiply the fee by 1000.
  • For every row where the customer name contains a comma (it has been entered as “last_name, first_name”), split the name by the comma, reverse the array, and join it back with a space (producing “first_name last_name”); see the sketch below
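Refine expresses such transforms in its own expression language, but as a purely illustrative analogy, the same name transformation can be written in R (the sample names below are made up):

  # Illustrative R analogy (not Refine's expression language):
  # turn "last_name, first_name" into "first_name last_name"
  names_in <- c("Doe, Jane", "Smith, John")   # made-up sample values
  parts <- strsplit(names_in, ",\\s*")        # split each name at the comma
  sapply(parts, function(p) paste(rev(p), collapse = " "))
  # [1] "Jane Doe"   "John Smith"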

To specify patterns, use filters and facets. Typically, you create a filter or facet on a particular column. For example, you can create a numeric facet on the “contract fee” column and adjust its range selector to select values less than 1. If the default facet doesn’t do what you want, you can configure it (by clicking “change” on the facet’s header). For example, you can create a text facet on the same “contract fee” column with this expression:

  value < 1

It will show 2 choices: true and false. Just select true. Then, invoke the Transform command on that same column and enter the expression

  value * 1000

That Transform command affects only rows where the “contract fee” cell contains a value less than 1.

You can use several filters and facets together. Only rows that are selected by all facets and filters will be shown in the data table. For example, say you have two text facets, one on the “contract fee” column with the expression

  value < 1

and another on the “state” column (with the default expression). If you select “true” in the first facet and “Nevada” in the second, then you will only see rows for contracts in Nevada with fees less than 1.

Analogies

Databases

If you have programmed databases before (performing SQL queries), then the way Google Refine works should be quite familiar to you. Creating filters and facets and selecting something in them is like performing this SELECT statement:

  SELECT *
  FROM whole_table
  WHERE ... constraints determined by selection in facets and filters ...

And invoking the Transform command on a column while having some filters and facets selected is like performing this UPDATE statement:

  UPDATE whole_table SET column_X = ... expression ...
  WHERE ... constraints determined by selection in facets and filters ...

The difference between Google Refine and databases is that the facets show you choices that you can select, whereas databases assume that you already know what’s in the data.

IBM and Revolution team to create new in-database R

From the Press Release at http://www.revolutionanalytics.com/news-events/news-room/2011/revolution-analytics-netezza-partnership.php

Under the terms of the agreement, the companies will work together to create a version of Revolution’s software that takes advantage of IBM Netezza’s i-class technology so that Revolution R Enterprise can run in-database in an optimal fashion.

About IBM

For information about IBM Netezza, please visit: http://www.netezza.com.
For Information on IBM Information Management, please visit: http://www.ibm.com/software/data/information-on-demand/
For information on IBM Business Analytics, please visit the online press kit: http://www.ibm.com/press/us/en/presskit/27163.wss
Follow IBM and Analytics on Twitter: http://twitter.com/ibmbizanalytics
Follow IBM analytics on Tumblr: http://smarterplanet.tumblr.com/tagged/new_intelligence
IBM YouTube Analytics Channel: http://www.youtube.com/user/ibmbusinessanalytics
For information on IBM Smarter Systems: http://www-03.ibm.com/systems/smarter/

About Revolution Analytics

Revolution Analytics is the leading commercial provider of software and services based on the open source R project for statistical computing. Led by predictive analytics pioneer Norman Nie, the company brings high performance, productivity and enterprise readiness to R, the most powerful statistics language in the world. The company’s flagship Revolution R product is designed to meet the production needs of large organizations in industries such as finance, life sciences, retail, manufacturing and media. Used by over 2 million analysts in academia and at cutting-edge companies such as Google, Bank of America and Acxiom, R has emerged as the standard of innovation in statistical analysis. Revolution Analytics is committed to fostering the continued growth of the R community through sponsorship of the Inside-R.org community site, funding worldwide R user groups, and offering free licenses of Revolution R Enterprise to everyone in academia.


About IBM Netezza

Netezza, an IBM Company, is the global leader in data warehouse, analytic and monitoring appliances that dramatically simplify high-performance analytics across an extended enterprise. IBM Netezza’s technology enables organizations to process enormous amounts of captured data at exceptional speed, providing a significant competitive and operational advantage in today’s data-intensive industries, including digital media, energy, financial services, government, health and life sciences, retail and telecommunications.

The IBM Netezza TwinFin® appliance is built specifically to analyze petabytes of detailed data significantly faster than existing data warehouse options, and at a much lower total cost of ownership. It stores, filters and processes terabytes of records within a single unit, analyzing only the relevant information for each query.

Using Revolution R Enterprise & Netezza Together

Revolution Analytics and IBM Netezza have announced a partnership to integrate Revolution R Enterprise and the IBM Netezza TwinFin Data Warehouse Appliance. For the first time, customers seeking to run high-performance, full-scale predictive analytics from within a data warehouse platform will be able to directly leverage the power of the open source R statistics language.

This partnership integrates Revolution R Enterprise with IBM Netezza’s high-performance data warehouse and advanced analytics platform to help organizations combat the challenges that arise as the complexity and scale of data grow. By moving the analytics processing next to the data, this integration will minimize data movement, a significant bottleneck, especially when dealing with “Big Data”. It will deliver high performance on large-scale data, while leveraging the latest innovations in analytics.
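As a generic illustration of why moving the analytics to the data matters (this is not Revolution's or Netezza's actual API, which the release does not detail), compare pulling every row into R with pushing the work to the database; the connection and table below are hypothetical, with RSQLite standing in for a warehouse:

  # Conceptual sketch using the standard DBI interface from R; RSQLite
  # stands in for a warehouse connection, and 'transactions' is made up.
  library(DBI)
  con <- dbConnect(RSQLite::SQLite(), ":memory:")
  dbWriteTable(con, "transactions",
               data.frame(customer_id = 1:5, spend = c(10, 20, 15, 30, 25)))

  # Moving the data to the analytics: every row crosses the wire
  rows <- dbGetQuery(con, "SELECT customer_id, spend FROM transactions")
  mean(rows$spend)

  # Moving the analytics to the data: only the aggregate crosses the wire
  dbGetQuery(con, "SELECT AVG(spend) AS avg_spend FROM transactions")

  dbDisconnect(con)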

With Revolution R Enterprise for IBM Netezza, advanced R computations are available for rapid analysis of hundreds of terabytes of data, and can deliver 10-100x performance improvements at a fraction of the cost compared to traditional analytics vendors.
