LibreOffice 3.3.2

The latest, freest office productivity software in the world.

The Document Foundation maintains its release schedule thanks to a growing and vibrant community of developers.

The Internet, March 22, 2011 – The Document Foundation announces LibreOffice 3.3.2, the second micro release of the free office suite for personal productivity, which further improves the stability of the software and lays the groundwork for the next release, 3.4, due in mid-May. The community of developers has been able to maintain the tight schedule thanks to the increase in the number of contributors, and to the fact that those who started with easy hacks in September 2010 are now working on substantial features. In addition, they have almost completed the code-cleaning process, getting rid of German comments and obsolete functionality.

“I have started hacking LibreOffice code on September 28, 2010, just a few hours after the announcement of the project, and I found a very welcoming community, where senior developers went out of their way to help newbies like me to become productive. After a few hours I submitted a small patch removing 5 or 6 lines of dead code… enough to get my feet wet and learn the workflow”, says Norbert, a French developer living in the United States. “In a short time, I ended up removing the VOS library – deprecated for a decade – from LibreOffice, and finding and fixing various threading issues in the process”.

LibreOffice 3.3.2 is being released just one day after the closing of the first funding round launched by The Document Foundation to collect donations towards the 50,000-euro capital needed to establish a Stiftung in Germany. In five weeks, the community donated twice that amount: around 100,000 euros. All additional funds will be used for operating expenses such as infrastructure costs and registration of domain names and trademarks, as well as for community development expenses such as travel funding for TDF representatives speaking at conferences, booth fees for trade shows, and initial financing of merchandising items, DVDs and printed material.

Italo Vignoli, a founder and a steering committee member of The Document Foundation, will deliver a keynote at Flourish 2011 in Chicago on Sunday, April 3, at 10:30 AM, covering the project's independence from OpenOffice.org and Oracle, the founding of The Document Foundation, the raising of the capital and the first community budget, the organization of developers and other work, and a roadmap for future releases and features.

The Document Foundation is at http://documentfoundation.org, while LibreOffice is at http://www.libreoffice.org. LibreOffice 3.3.2 is immediately available from the download page.

*** About The Document Foundation

The Document Foundation has the mission of facilitating the evolution of the LibreOffice Community into a new, open, independent, and meritocratic organization within the next few months. An independent foundation is a better reflection of the values of our contributors, users and supporters, and will enable a more effective, efficient and transparent community. TDF will protect past investments by building on the achievements of the first decade, will encourage wide participation within the community, and will co-ordinate activity across the community.

*** Media contacts for TDF

Florian Effenberger (Germany)
Mobile: +49 151 14424108 – E-mail: floeff@documentfoundation.org
Olivier Hallot (Brazil)
Mobile: +55 21 88228812 – E-mail: olivier.hallot@documentfoundation.org
Charles H. Schulz (France)
Mobile: +33 6 98655424 – E-mail: charles.schulz@documentfoundation.org
Italo Vignoli (Italy)
Mobile: +39 348 5653829 – E-mail: italo.vignoli@documentfoundation.org


Italo Vignoli – The Document Foundation
email italo.vignoli@documentfoundation.org
phone +39.348.5653829 – VoIP +39.02.320621813
skype italovignoli – italo.vignoli@gmail.com

Google Cloud Print – print documents from the internet

Print jobs just got easier, especially if you prefer one printer, use Google Chrome, and can take two minutes to set up your printer to print from anywhere in the world through the internet.

It’s called Google Cloud Print, and it makes my life a lot easier when I travel and need to send documents to my printer at home rather than rely on external printers. See the screenshots below and check out http://www.google.com/cloudprint/ for more.

PMML Plugin for Greenplum now available

From a press release by Zementis:

Zementis has announced the Universal PMML Plug-in for in-database scoring. Available now for the EMC Greenplum Database, a high-performance massively parallel processing (MPP) database, the plug-in leverages the Predictive Model Markup Language (PMML) to execute predictive models directly within EMC Greenplum, for highly optimized in-database scoring.

Universal PMML Plug-in

Developed by the Data Mining Group (DMG), PMML is supported by all major data mining vendors, e.g., IBM SPSS, SAS, Teradata, FICO, STATISTICA, MicroStrategy, TIBCO and Revolution Analytics, as well as open source tools like R, KNIME and RapidMiner. With PMML, models built in any of these data mining tools can now be deployed instantly in the EMC Greenplum database. The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.
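To make the export side of that workflow concrete, here is a minimal sketch in R using the open source pmml package; the data frame, model, and file name are hypothetical stand-ins, not part of the Zementis announcement:

  # A minimal sketch, assuming the open source 'pmml' and 'XML' R packages
  # are installed; 'customers' and its columns are hypothetical.
  library(pmml)  # converts fitted R models to PMML documents
  library(XML)   # provides saveXML() for writing the document to disk

  customers <- data.frame(
    churn       = rbinom(200, 1, 0.2),  # toy outcome
    tenure      = runif(200, 0, 60),    # toy predictor: months as a customer
    monthly_fee = runif(200, 10, 100)   # toy predictor: monthly fee
  )

  # Fit a simple logistic regression model in R
  fit <- glm(churn ~ tenure + monthly_fee, data = customers, family = binomial)

  # Export it as PMML; a PMML-aware engine such as the plug-in described
  # above could then score it in-database without recoding the model
  saveXML(pmml(fit), file = "churn_model.pmml")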

“By partnering with Zementis, a true PMML innovator, we are able to offer a vendor-agnostic solution for moving enterprise-level predictive analytics into the database execution environment,” said Dr. Steven Hillion, Vice President of Analytics at EMC Greenplum. “With Zementis and PMML, the de-facto standard for representing data mining models, we are eliminating the need to recode predictive analytic models in order to deploy them within our database. In turn, this enables an analyst to reduce the time to insight required in most businesses today.”

Want to learn more?

To learn more about how the EMC Greenplum Database and the Universal PMML Plug-in work together, feel free to:

  1. Visit the PMML Plug-in product page
  2. Download the white paper

The Universal PMML Plug-in for the EMC Greenplum Database is available now. Contact us today for more information.

Michael Zeller, CEO, Zementis

KDNuggets Survey on R

From http://www.kdnuggets.com/2011/03/new-poll-r-in-analytics-data-mining-work.html?k11n07

A new poll/survey on actual usage of R in Data Mining

R has been steadily growing in popularity among data miners and analytic professionals.

In the KDnuggets 2010 Data Mining / Analytic Tools Poll, R was used by 30% of respondents.
In the 2010 Rexer Analytics Data Miner Survey, R was the most popular tool, used by 43% of the data miners.

Another aspect of tool usefulness is how much it helps with the entire data mining process: data preparation and cleaning, modeling, evaluation, visualization and presentation (excluding deployment).

A new KDnuggets poll asks:
What part of your analytics / data mining work in the past 12 months was done in R?

http://www.kdnuggets.com/2011/03/new-poll-r-in-analytics-data-mining-work.html?k11n07

Heritage Health Prize – a Data Mining Contest for 3 Million USD

If the Netflix Prize was about 1 million USD for better online video choices, here is a chance to earn serious money, write great code, and save lives!

From http://www.heritagehealthprize.com/

Heritage Health Prize
Launching April 4

More than 71 million individuals in the United States are admitted to hospitals each year, according to the latest survey from the American Hospital Association. Studies have concluded that in 2006 well over $30 billion was spent on unnecessary hospital admissions. Each of these unnecessary admissions took away one hospital bed from someone else who needed it more.

Prize Goal & Participation

The goal of the prize is to develop a predictive algorithm that can identify patients who will be admitted to the hospital within the next year, using historical claims data.

Official registration will open in 2011, after the launch of the prize. At that time, pre-registered teams will be notified to officially register for the competition. Teams must consent to be bound by final competition rules.

Registered teams will develop and test their algorithms. The winning algorithm will be able to predict patients at risk for an unplanned hospital admission with a high rate of accuracy. The first team to reach the accuracy threshold will have its algorithm confirmed by a judging panel; if confirmed, a winner will be declared.

The competition is expected to run for approximately two years. Registration will be open throughout the competition.

Data Sets

Registered teams will be granted access to two separate datasets of de-identified patient claims data for developing and testing algorithms: a training dataset and a quiz/test dataset. The datasets will include:

  • Outpatient encounter data
  • Hospitalization encounter data
  • Medication dispensing claims data, including medications
  • Outpatient laboratory data, including test outcome values

The data for each de-identified patient will be organized into two sections: “Historical Data” and “Admission Data.” Historical Data will represent three years of past claims data; this section of the dataset will be used to predict whether that patient is going to be admitted during the Admission Data period. Admission Data will contain a binary flag recording whether or not a hospital admission occurred for that patient.
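As a purely illustrative sketch of this kind of binary prediction task, here is a minimal R baseline; the data frame and feature names below are hypothetical stand-ins for features a team might derive from the Historical Data, not the actual competition data:

  # Minimal illustrative baseline in R; 'claims' and its columns are
  # hypothetical, not the actual competition data.
  set.seed(1)
  claims <- data.frame(
    admitted      = rbinom(1000, 1, 0.1),  # binary flag from the Admission Data period
    n_encounters  = rpois(1000, 3),        # e.g., outpatient encounters over three years
    n_medications = rpois(1000, 2)         # e.g., medication dispensing claims
  )

  # Logistic regression as the simplest possible predictive baseline
  fit <- glm(admitted ~ n_encounters + n_medications,
             data = claims, family = binomial)

  # Predicted probability of an unplanned admission for each patient
  p_admit <- predict(fit, type = "response")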

The training dataset includes several thousand anonymized patients and will be made available, securely and in full, to any registered team for the purpose of developing effective screening algorithms.

The quiz/test dataset is a smaller set of anonymized patients. Teams will only receive the Historical Data section of these datasets, and the two datasets will be mixed together so that teams will not know which de-identified patients are in which set. Teams will make predictions based on these datasets and submit their predictions to HPN (the Heritage Provider Network, the prize sponsor) through the official Heritage Health Prize web site. HPN will use the Quiz Dataset for the initial assessment of the teams’ algorithms, and will evaluate and report scores back to the teams through the prize website’s leader board.

Scores from the final Test Dataset will not be made available to teams until the accuracy thresholds are passed. The test dataset will be used in the final judging and results will be kept hidden. These scores are used to preserve the integrity of scoring and to help validate the predictive algorithms.

Teams can begin developing and testing their algorithms as soon as they are registered and ready. Teams will log onto the official Heritage Health Prize website and submit their predictions online. Comparisons will be run automatically and team accuracy scores will be posted on the leader board. This score will be based only on a portion of the predictions submitted (the Quiz Dataset); the remaining results (the Test Dataset) will be held back.
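The announcement does not specify the scoring metric, so the sketch below uses plain classification accuracy purely as an illustration of how a submission might be compared against held-back flags:

  # Hypothetical scoring sketch in R; the actual metric is defined by HPN.
  truth <- c(0, 1, 0, 0, 1)   # held-back admission flags (Quiz/Test Dataset)
  preds <- c(0, 1, 1, 0, 1)   # a team's submitted predictions
  mean(preds == truth)        # 0.8: fraction of patients predicted correctly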

Judging

Once a team successfully scores above the accuracy thresholds on the online testing (quiz dataset), final judging will occur. There will be three parts to this judging. First, the judges will confirm that the potential winning team’s algorithm accurately predicts patient admissions in the Test Dataset (again, above the thresholds for accuracy).

Next, the judging panel will confirm that the algorithm neither identifies individual patients nor uses external data sources to derive its predictions. Lastly, the panel will confirm that the team’s algorithm is authentic and derives its predictive power from the datasets, not from hand-coding results to improve scores. If the algorithm meets all three criteria, it will be declared the winner.

Failure to meet any one of these three parts will disqualify the team, and the contest will continue. The judges reserve the right to award second and third place prizes if deemed applicable.

Google Refine

An interesting data-cleaning tool from Google, at

https://code.google.com/p/google-refine/

From the page at

https://code.google.com/p/google-refine/wiki/UserGuide

The Basics

First, although Google Refine might start out looking like a spreadsheet program (Microsoft Excel, Google Spreadsheets, etc.), don’t expect it to work like a spreadsheet program. That’s almost like expecting a database to work like a text editor.

Google Refine is NOT for entering new data one cell at a time. It is NOT for doing accounting.

Google Refine is for applying transformations over many existing cells in bulk, for the purpose of cleaning up the data, extending it with more data from other sources, and getting it to some form that other tools can consume.

To use Google Refine, think in big patterns. For example, to spot errors, think

  • Show me every row where the string length of the customer’s name is longer than 50 characters (because I suspect that the customer’s address is mistakenly included in the name field)
  • Show me every row where the contract fee is less than 1 (because I suspect the fee was entered in units of thousands of dollars rather than dollars)
  • Show me every row where the description field (scraped from some web site) contains “&” (because I suspect it wasn’t decoded properly)

To edit data, think

  • For every row where the contract fee is less than 1, multiply the fee by 1000.
  • For every row where the customer name contains a comma (it has been entered as “last_name, first_name”), split the name by the comma, reverse the array, and join it back with a space (producing “first_name last_name”); see the sketch below
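Refine expresses such transforms in its own expression language, but as a purely illustrative analogy, the same name transformation can be written in R (the sample names below are made up):

  # Illustrative R analogy (not Refine's expression language):
  # turn "last_name, first_name" into "first_name last_name"
  names_in <- c("Doe, Jane", "Smith, John")   # made-up sample values
  parts <- strsplit(names_in, ",\\s*")        # split each name at the comma
  sapply(parts, function(p) paste(rev(p), collapse = " "))
  # [1] "Jane Doe"   "John Smith"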

To specify patterns, use filters and facets. Typically, you create a filter or facet on a particular column. For example, you can create a numeric facet on the “contract fee” column and adjust its range selector to select values less than 1. If the default facet doesn’t do what you want, you can configure it (by clicking “change” on the facet’s header). For example, you can create a text facet on the same “contract fee” column with this expression:

  value < 1

It will show 2 choices: true and false. Just select true. Then, invoke the Transform command on that same column and enter the expression

  value * 1000

That Transform command affects only rows where the “contract fee” cell contains a value less than 1.

You can use several filters and facets together. Only rows that are selected by all facets and filters will be shown in the data table. For example, say you have two text facets, one on the “contract fee” column with the expression

  value < 1

and another on the “state” column (with the default expression). If you select “true” in the first facet and “Nevada” in the second, then you will only see rows for contracts in Nevada with fees less than 1.

Analogies

Databases

If you have programmed databases before (performing SQL queries), then the way Google Refine works should be quite familiar to you. Creating filters and facets and selecting something in them is like performing this SELECT statement:

  SELECT *
  FROM whole_table
  WHERE ... constraints determined by selection in facets and filters ...

And invoking the Transform command on a column while having some filters and facets selected is like performing this UPDATE statement:

  UPDATE whole_table SET column_X = ... expression ...
  WHERE ... constraints determined by selection in facets and filters ...

The difference between Google Refine and databases is that the facets show you choices that you can select, whereas databases assume that you already know what’s in the data.

IBM and Revolution team to create new in-database R

From the Press Release at http://www.revolutionanalytics.com/news-events/news-room/2011/revolution-analytics-netezza-partnership.php

Under the terms of the agreement, the companies will work together to create a version of Revolution’s software that takes advantage of IBM Netezza’s i-class technology so that Revolution R Enterprise can run in-database in an optimal fashion.

About IBM

For information about IBM Netezza, please visit: http://www.netezza.com.
For Information on IBM Information Management, please visit: http://www.ibm.com/software/data/information-on-demand/
For information on IBM Business Analytics, please visit the online press kit: http://www.ibm.com/press/us/en/presskit/27163.wss
Follow IBM and Analytics on Twitter: http://twitter.com/ibmbizanalytics
Follow IBM analytics on Tumblr: http://smarterplanet.tumblr.com/tagged/new_intelligence
IBM YouTube Analytics Channel: http://www.youtube.com/user/ibmbusinessanalytics
For information on IBM Smarter Systems: http://www-03.ibm.com/systems/smarter/

About Revolution Analytics

Revolution Analytics is the leading commercial provider of software and services based on the open source R project for statistical computing. Led by predictive analytics pioneer Norman Nie, the company brings high performance, productivity and enterprise readiness to R, the most powerful statistics language in the world. The company’s flagship Revolution R product is designed to meet the production needs of large organizations in industries such as finance, life sciences, retail, manufacturing and media. Used by over 2 million analysts in academia and at cutting-edge companies such as Google, Bank of America and Acxiom, R has emerged as the standard of innovation in statistical analysis. Revolution Analytics is committed to fostering the continued growth of the R community through sponsorship of the Inside-R.org community site, funding worldwide R user groups, and offering free licenses of Revolution R Enterprise to everyone in academia.


About IBM Netezza

Netezza, an IBM Company, is the global leader in data warehouse, analytic and monitoring appliances that dramatically simplify high-performance analytics across an extended enterprise. IBM Netezza’s technology enables organizations to process enormous amounts of captured data at exceptional speed, providing a significant competitive and operational advantage in today’s data-intensive industries, including digital media, energy, financial services, government, health and life sciences, retail and telecommunications.

The IBM Netezza TwinFin® appliance is built specifically to analyze petabytes of detailed data significantly faster than existing data warehouse options, and at a much lower total cost of ownership. It stores, filters and processes terabytes of records within a single unit, analyzing only the relevant information for each query.

Using Revolution R Enterprise & Netezza Together

Revolution Analytics and IBM Netezza have announced a partnership to integrate Revolution R Enterprise and the IBM Netezza TwinFin Data Warehouse Appliance. For the first time, customers seeking to run high-performance, full-scale predictive analytics from within a data warehouse platform will be able to directly leverage the power of the open source R statistics language.

This partnership integrates Revolution R Enterprise with IBM Netezza’s high-performance data warehouse and advanced analytics platform to help organizations combat the challenges that arise as the complexity and scale of data grow. By moving the analytics processing next to the data, this integration will minimize data movement, a significant bottleneck, especially when dealing with “Big Data”. It will deliver high performance on large-scale data, while leveraging the latest innovations in analytics.
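As a generic illustration of why moving the analytics to the data matters (this is not Revolution's or Netezza's actual API, which the release does not detail), compare pulling every row into R with pushing the work to the database; the connection and table below are hypothetical, with RSQLite standing in for a warehouse:

  # Conceptual sketch using the standard DBI interface from R; RSQLite
  # stands in for a warehouse connection, and 'transactions' is made up.
  library(DBI)
  con <- dbConnect(RSQLite::SQLite(), ":memory:")
  dbWriteTable(con, "transactions",
               data.frame(customer_id = 1:5, spend = c(10, 20, 15, 30, 25)))

  # Moving the data to the analytics: every row crosses the wire
  rows <- dbGetQuery(con, "SELECT customer_id, spend FROM transactions")
  mean(rows$spend)

  # Moving the analytics to the data: only the aggregate crosses the wire
  dbGetQuery(con, "SELECT AVG(spend) AS avg_spend FROM transactions")

  dbDisconnect(con)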

With Revolution R Enterprise for IBM Netezza, advanced R computations are available for rapid analysis of hundreds of terabytes of data, and can deliver 10-100x performance improvements at a fraction of the cost compared to traditional analytics vendors.
