open source – Page 9 – DECISION STATS

Google Snappy


Cool-sounding software, yet again from the guys in California: this one lets you compress and decompress Big Data much, much faster.

http://news.ycombinator.com/item?id=2356735

and

https://code.google.com/p/snappy/

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as “Zippy” in some presentations and the likes.)

For more information, please see the README. Benchmarks against a few other compression libraries (zlib, LZO, LZF, FastLZ, and QuickLZ) are included in the source code distribution.

Introduction
============

Snappy is a compression/decompression library. It does not aim for maximum
compression, or compatibility with any other compression library; instead,
it aims for very high speeds and reasonable compression. For instance,
compared to the fastest mode of zlib, Snappy is an order of magnitude faster
for most inputs, but the resulting compressed files are anywhere from 20% to
100% bigger. (For more information, see “Performance”, below.)

Snappy has the following properties:

* Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code.
  See “Performance” below.
* Stable: Over the last few years, Snappy has compressed and decompressed
  petabytes of data in Google’s production environment. The Snappy bitstream
  format is stable and will not change between versions.
* Robust: The Snappy decompressor is designed not to crash in the face of
  corrupted or malicious input.
* Free and open source software: Snappy is licensed under the Apache license,
  version 2.0. For more information, see the included COPYING file.

Snappy has previously been called “Zippy” in some Google presentations
and the like.

Performance
===========

Snappy is intended to be fast. On a single core of a Core i7 processor
in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at
about 500 MB/sec or more. (These numbers are for the slowest inputs in our
benchmark suite; others are much faster.) In our tests, Snappy usually
is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ,
etc.) while achieving comparable compression ratios.

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x
for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and
other already-compressed data. Similar numbers for zlib in its fastest mode
are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are
capable of achieving yet higher compression rates, although usually at the
expense of speed. Of course, compression ratio will vary significantly with
the input.
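To see how the compression ratios quoted above relate to the "20% to 100% bigger" figure, here is a small C++ sketch of the arithmetic. The function names are my own and purely illustrative; nothing here comes from the Snappy source.

```cpp
#include <cassert>

// Size of the compressed output as a fraction of the input, for a given
// compression ratio (a ratio of 2.0x means the output is half the input).
inline double CompressedFraction(double ratio) {
  assert(ratio > 0.0);
  return 1.0 / ratio;
}

// How much bigger, in percent, output A is than output B when both
// compress the same input with the given ratios.
inline double PercentBigger(double ratio_a, double ratio_b) {
  return (CompressedFraction(ratio_a) / CompressedFraction(ratio_b) - 1.0) * 100.0;
}
```

Plugging in the plain-text numbers, PercentBigger(1.7, 2.6) comes to roughly 53 and PercentBigger(1.5, 2.8) to roughly 87, so on text Snappy's output would be about 53% to 87% bigger than zlib's fastest mode, which sits inside the quoted 20% to 100% range. For already-compressed data (1.0x for both) the difference is zero.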

Although Snappy should be fairly portable, it is primarily optimized
for 64-bit x86-compatible processors, and may run slower in other environments.
In particular:

- Snappy uses 64-bit operations in several places to process more data at
  once than would otherwise be possible.
- Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.
  On some platforms, these must be emulated with single-byte loads
  and stores, which is much slower.
- Snappy assumes little-endian throughout, and needs to byte-swap data in
  several places if running on a big-endian platform.
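The three points above can be illustrated with a short C++ sketch. This is not Snappy's actual code: the helper names (HostIsLittleEndian, ByteSwap64, LoadLE64, MatchLength) are hypothetical, and the routine only shows the general idea of comparing two buffers 8 bytes at a time using unaligned little-endian loads, with a byte swap on big-endian hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true when the host stores the least significant byte first.
inline bool HostIsLittleEndian() {
  const std::uint16_t probe = 1;
  unsigned char first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Reverses the byte order of a 64-bit value (only needed on big-endian hosts).
inline std::uint64_t ByteSwap64(std::uint64_t v) {
  std::uint64_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = (r << 8) | (v & 0xFF);
    v >>= 8;
  }
  return r;
}

// Reads 8 bytes from a possibly unaligned address as a little-endian integer.
// memcpy avoids the undefined behaviour of casting p to uint64_t*; compilers
// typically turn it into a single load where unaligned access is cheap.
inline std::uint64_t LoadLE64(const unsigned char* p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return HostIsLittleEndian() ? v : ByteSwap64(v);
}

// Counts how many leading bytes of a and b are equal, up to n bytes,
// comparing 8 bytes per step instead of one.
inline std::size_t MatchLength(const unsigned char* a, const unsigned char* b,
                               std::size_t n) {
  assert(a != nullptr && b != nullptr);
  std::size_t matched = 0;
  while (n - matched >= 8) {
    std::uint64_t diff = LoadLE64(a + matched) ^ LoadLE64(b + matched);
    if (diff != 0) {
      // In little-endian order, the lowest non-zero byte of diff marks
      // the first position where the two inputs differ.
      while ((diff & 0xFF) == 0) {
        diff >>= 8;
        ++matched;
      }
      return matched;
    }
    matched += 8;
  }
  // Fewer than 8 bytes left: fall back to byte-by-byte comparison.
  while (matched < n && a[matched] == b[matched]) ++matched;
  return matched;
}
```

On a little-endian machine with cheap unaligned access the swap never runs and each load is a single instruction; on other platforms the same code still works but pays for the emulation and the byte swap, which is the slowdown the README warns about.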

Experience has shown that even heavily tuned code can be improved.
Performance optimizations, whether for 64-bit x86 or other platforms,
are of course most welcome; see “Contact”, below.

Usage
=====

Note that Snappy, both the implementation and the interface,
is written in C++.

To use Snappy from your own program, include the file “snappy.h” from
your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

  snappy::Compress(input.data(), input.size(), &output);

and similarly

  snappy::Uncompress(input.data(), input.size(), &output);

where “input” and “output” are both instances of std::string.


Top 10 Games on Linux -sudo update

Here are some cool games I like to play on my Ubuntu 10.10. I think they run on most other versions of Linux as well.

1) Open Arena – First-person shooter. This is like Quake Arena: very nice graphics, and good for playing for a couple of hours while taking a break. It is available at http://openarena.ws/smfnews.php. Ideally, if you have a bunch of gaming friends, playing on a local network or over the internet is mind-blowingly entertaining. And it’s free!

2) Armagetron – This is based on the TRON game of light cycles. It is available at http://www.armagetronad.net/, or you can use the Synaptic package manager for all the games mentioned here.

3) Sudoku – If violence or cars are not your thing and you like puzzles like Sudoku, just install the Sudoku application from http://gnome-sudoku.sourceforge.net/. It is also recommended for people of various ages, as it has multiple levels.

4) Pinball – If you ever liked pinball, play the open source version, available for download at http://pinball.sourceforge.net/. Alternatively, you can go to Ubuntu Software Centre > Games > Arcade > Emilia Pinball, and you can also build your own pinball table if you like the game well enough.

5) Pacman/Njam – Clone of the original classic game. Downloadable from http://www.linuxcompatible.org/news/story/pacman_for_linux.html

6) Gweled – This is a free clone of Bejeweled. It now has a new website at http://gweled.org/ http://linux.softpedia.com/progDownload/Gweled-Download-3449.html

Gweled is a GNOME version of a popular PalmOS/Windows/Java game called “Bejeweled” or “Diamond Mine”. The aim of the game is to align 3 or more gems, either vertically or horizontally, by swapping adjacent gems. The game ends when there are no possible moves left. Here are some key features of “Gweled”:
· exact same gameplay as the commercial versions
· SVG original graphics

7) Hearts – For this card game classic you can use the Ubuntu Software Centre to install the package, or go to http://linuxappfinder.com/package/gnome-hearts

8) Card Games – KPatience has almost 14 card games, including Solitaire and FreeCell.

9) Sauerbraten – First-person shooter with good network play and map-editing capabilities. You can read more here: http://sauerbraten.org/

10) Tetris – KBlocks. Tetris is the classic game. If you like classic slow games, Tetris is the best, and I like the toughest Tetris game, Bastet: http://fph.altervista.org/prog/bastet.html. There is even an xkcd toon for it.


Protected: Using SAS and C/C++ together

PMML Plugin for Greenplum now available

From a press release from Zementis.

, the Universal PMML Plug-in for in-database scoring. Available now for the EMC Greenplum Database, a high-performance massively parallel processing (MPP) database, the plug-in leverages the Predictive Model Markup Language (PMML) to execute predictive models directly within EMC Greenplum, for highly optimized in-database scoring.

Developed by the Data Mining Group (DMG), PMML is supported by all major data mining vendors, e.g., IBM SPSS, SAS, Teradata, FICO, STATISTICA, Microstrategy, TIBCO and Revolution Analytics as well as open source tools like R, KNIME and RapidMiner. With PMML, models built in any of these data mining tools can now instantly be deployed in the EMC Greenplum database. The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.

“By partnering with Zementis, a true PMML innovator, we are able to offer a vendor-agnostic solution for moving enterprise-level predictive analytics into the database execution environment,” said Dr. Steven Hillion, Vice President of Analytics at EMC Greenplum. “With Zementis and PMML, the de-facto standard for representing data mining models, we are eliminating the need to recode predictive analytic models in order to deploy them within our database. In turn, this enables an analyst to reduce the time to insight required in most businesses today.”

Want to learn more?

To learn more about how the EMC Greenplum Database and the Universal PMML Plug-in work together, feel free to:

Visit the PMML Plug-in product page
Download the white paper

The Universal PMML Plug-in for the EMC Greenplum Database is available now. Contact us today for more information.

Michael Zeller, CEO, Zementis


IBM and Revolution team to create new in-database R

From the Press Release at http://www.revolutionanalytics.com/news-events/news-room/2011/revolution-analytics-netezza-partnership.php

Under the terms of the agreement, the companies will work together to create a version of Revolution’s software that takes advantage of IBM Netezza’s i-class technology so that Revolution R Enterprise can run in-database in an optimal fashion.

About IBM

For information about IBM Netezza, please visit: http://www.netezza.com.
For information on IBM Information Management, please visit: http://www.ibm.com/software/data/information-on-demand/
For information on IBM Business Analytics, please visit the online press kit: http://www.ibm.com/press/us/en/presskit/27163.wss
Follow IBM and Analytics on Twitter: http://twitter.com/ibmbizanalytics
Follow IBM analytics on Tumblr: http://smarterplanet.tumblr.com/tagged/new_intelligence
IBM YouTube Analytics Channel: http://www.youtube.com/user/ibmbusinessanalytics
For information on IBM Smarter Systems: http://www-03.ibm.com/systems/smarter/

About Revolution Analytics

Revolution Analytics is the leading commercial provider of software and services based on the open source R project for statistical computing. Led by predictive analytics pioneer Norman Nie, the company brings high performance, productivity and enterprise readiness to R, the most powerful statistics language in the world. The company’s flagship Revolution R product is designed to meet the production needs of large organizations in industries such as finance, life sciences, retail, manufacturing and media. Used by over 2 million analysts in academia and at cutting-edge companies such as Google, Bank of America and Acxiom, R has emerged as the standard of innovation in statistical analysis. Revolution Analytics is committed to fostering the continued growth of the R community through sponsorship of the Inside-R.org community site, funding worldwide R user groups and offers free licenses of Revolution R Enterprise to everyone in academia.

Netezza, an IBM Company, is the global leader in data warehouse, analytic and monitoring appliances that dramatically simplify high-performance analytics across an extended enterprise. IBM Netezza’s technology enables organizations to process enormous amounts of captured data at exceptional speed, providing a significant competitive and operational advantage in today’s data-intensive industries, including digital media, energy, financial services, government, health and life sciences, retail and telecommunications.

The IBM Netezza TwinFin ® appliance is built specifically to analyze petabytes of detailed data significantly faster than existing data warehouse options, and at a much lower total cost of ownership. It stores, filters and processes terabytes of records within a single unit, analyzing only the relevant information for each query.

Using Revolution R Enterprise & Netezza Together

Revolution Analytics and IBM Netezza have announced a partnership to integrate Revolution R Enterprise and the IBM Netezza TwinFin Data Warehouse Appliance. For the first time, customers seeking to run high performance and full-scale predictive analytics from within a data warehouse platform will be able to directly leverage the power of the open source R statistics language. The companies are working together to create a version of Revolution’s software that takes advantage of IBM Netezza’s i-class technology so that Revolution R Enterprise can run in-database in an optimal fashion.

This partnership integrates Revolution R Enterprise with IBM Netezza’s high performance data warehouse and advanced analytics platform to help organizations combat the challenges that arise as complexity and the scale of data grow. By moving the analytics processing next to the data, this integration will minimize data movement – a significant bottleneck, especially when dealing with “Big Data”. It will deliver high performance on large scale data, while leveraging the latest innovations in analytics.

With Revolution R Enterprise for IBM Netezza, advanced R computations are available for rapid analysis of hundreds of terabyte-class data volumes — and can deliver 10-100x performance improvements at a fraction of the cost compared to traditional analytics vendors.

Additional Resources

Whitepapers:
- Why R is Hot
- Netezza Positioned as a Leader in the Gartner Magic Quadrant for Data Warehouse Database Management Systems
On-Demand Webinar: Revolution R Enterprise: 100% R and More
Free Downloads: Revolution R Community
Product Information:
- Revolution R Enterprise: R is Ready for Business
- IBM Netezza TwinFin® appliance


HIGHLIGHTS from REXER Survey: R gives best satisfaction

A Summary report from Rexer Analytics Annual Survey

HIGHLIGHTS from the 4th Annual Data Miner Survey (2010):

• FIELDS & GOALS: Data miners work in a diverse set of fields. CRM / Marketing has been the #1 field in each of the past four years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals are also the goals identified by the most data miners surveyed.

• ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. This year, for the first time, the survey asked about Ensemble Models, and 22% of data miners report using them.
A third of data miners currently use text mining and another third plan to in the future.

• MODELS: About one-third of data miners typically build final models with 10 or fewer variables, while about 28% generally construct models with more than 45 variables.

• TOOLS: After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other. STATISTICA, which has also been climbing in the rankings, is selected as the primary data mining tool by the most data miners (18%). Data miners report using an average of 4.6 software tools overall. STATISTICA, IBM SPSS Modeler, and R received the strongest satisfaction ratings in both 2010 and 2009.

• TECHNOLOGY: Data Mining most often occurs on a desktop or laptop computer, and frequently the data is stored locally. Model scoring typically happens using the same software used to develop models. STATISTICA users are more likely than other tool users to deploy models using PMML.

• CHALLENGES: As in previous years, dirty data, explaining data mining to others, and difficult access to data are the top challenges data miners face. This year data miners also shared best practices for overcoming these challenges. The best practices are available online.

• FUTURE: Data miners are optimistic about continued growth in the number of projects they will be conducting, and growth in data mining adoption is the number one “future trend” identified. There is room to improve: only 13% of data miners rate their company’s analytic capabilities as “excellent” and only 8% rate their data quality as “very strong”.

Please contact us if you have any questions about the attached report or this annual research program. The 5th Annual Data Miner Survey will be launching next month. We will email you an invitation to participate.

Information about Rexer Analytics is available at www.RexerAnalytics.com. Rexer Analytics continues its impressive journey; see http://www.rexeranalytics.com/Clients.html

My only thought: since most data miners are using multiple tools, including free tools as well as paid software, perhaps a pie chart of market share by revenue and volume would be handy.

Also some ideas on comparing diverse data mining projects by data size or complexity.


Zementis partners with R Analytics Vendor- Revo


Just got a PR email from Michael Zeller, CEO, Zementis, announcing that Zementis (ADAPA) and Revolution Analytics have just partnered up.

Is this something substantial, or just time-sharing (http://bi.cbronline.com/news/sas-ceo-says-cep-open-source-and-cloud-bi-have-limited-appeal) or a Barney Partnership (http://www.dbms2.com/2008/05/08/database-blades-are-not-what-they-used-to-be/)?

Summary: that’s cloud computing scoring of models on EC2 (Zementis) partnering with the actual modeling software in R (Revolution Analytics RevoDeployR).

See previous interviews with both: Dr Zeller at https://decisionstats.com/2009/02/03/interview-michael-zeller-ceozementis/, https://decisionstats.com/2009/05/07/interview-ron-ramos-zementis/ and https://decisionstats.com/2009/10/05/interview-michael-zellerceo-zementis-on-pmml/

and the Revolution guys at https://decisionstats.com/2010/08/03/q-a-with-david-smith-revolution-analytics/ and https://decisionstats.com/2009/05/29/interview-david-smith-revolution-computing/

–

strategic partnership with Revolution Analytics, the leading commercial provider of software and support for the popular open source R statistics language. With this partnership, predictive models developed on Revolution R Enterprise are now accessible for real-time scoring through the ADAPA Decisioning Engine by Zementis.

ADAPA is an extremely fast and scalable predictive platform. Models deployed in ADAPA are automatically available for execution in real-time and batch-mode as Web Services. ADAPA allows Revolution R Enterprise to leverage the Predictive Model Markup Language (PMML) for better decision management. With PMML, models built in R can be used in a wide variety of real-world scenarios without requiring laborious or expensive proprietary processes to convert them into applications capable of running on an execution system.

“By partnering with Zementis, Revolution Analytics is building an end-to-end solution for moving enterprise-level predictive R models into the execution environment,” said Jeff Erhardt, Revolution Analytics Chief Operation Officer. “With Zementis, we are eliminating the need to take R applications apart and recode, retest and redeploy them in order to obtain desirable results.”

Got demo?

Yes, we do! Revolution Analytics and Zementis have put together a demo which combines the building of models in R with automatic deployment and execution in ADAPA. It uses Revolution Analytics’ RevoDeployR, a new Web Services framework that allows for data analysts working in R to publish R scripts to a server-based installation of Revolution R Enterprise.

Action Items:

Try our INTERACTIVE DEMO

DOWNLOAD the white paper

Try the ADAPA FREE TRIAL

RevoDeployR & ADAPA allow for real-time analysis and predictions from R to be effectively used by existing Excel spreadsheets, BI dashboards and Web-based applications, all in real-time.

Predictive analytics with RevoDeployR from Revolution Analytics and ADAPA from Zementis put model building and real-time scoring into a league of their own. Seriously!


That’s all for the holiday season, folks. The top 10 list is based on almost 3 decades of gaming experience, but beauty is in the eye of the beholder, so happy gaming for free.
