Analytics – Page 97 – DECISION STATS

The Mommy Track

A new paper quantitatively analyzes the impact of childbearing on women’s earnings. In summary:

Women [who score in the upper third on a standardized test] have a net 8 percent reduction in pay during the first five years after giving birth.

From http://papers.nber.org/papers/w16582

Having a child lowers a woman’s lifetime earnings, but how much depends upon her skill level. In The Mommy Track Divides: The Impact of Childbearing on Wages of Women of Differing Skill Levels (NBER Working Paper No. 16582), co-authors Elizabeth Ty Wilde, Lily Batchelder, and David Ellwood estimate that having a child costs the average high skilled woman $230,000 in lost lifetime wages relative to similar women who never gave birth. By comparison, low skilled women experience a lifetime wage loss of only $49,000.

Using the 1979 National Longitudinal Survey of Youth (NLSY), Wilde et al. divided women into high, medium, and low skill categories based on their Armed Forces Qualification Test (AFQT) scores. The authors use these skill categories, combined with earnings, labor force participation, and family formation data, to chart the labor market progress of women before and after childbirth, from ages 14-to-21 in 1979 through 41-to-49 in 2006, this study’s final sample year.

High scoring and low scoring women differed in a number of ways. While 70-75 percent of higher scoring women work full-time all year prior to their first birth, only 55-60 percent of low scoring women do. As they age, the high scoring women enjoy steeper wage growth than low scoring women; low scoring women’s wages do not change much if they reenter the labor market after they have their first child. Five years after the first birth, about 35 percent of each group is working full-time. However, the high scoring women who are not working full-time are more likely to be working part-time than the low scoring women, who are more likely to leave the workforce entirely.

and

Men’s earning profiles are relatively unaffected by having children although men who never have children earn less on average than those who do. High scoring women who have children late also tend to earn more than high scoring childless women. Their earnings advantage occurs before they have children and narrows substantially after they become mothers.


What to do if you see a possible GPL violation

I have played with software, mostly but not exclusively analytical, and I admire the zeal and energy of both open source and closed source practitioners. Both camps are made up of relatively decent people, either executing the strategies their investors or owners set (closed source) or motivated by their own sense of cool, change-the-world openness (open source).

What I don’t get is people stealing open source code: repackaging it without adding major contributions, claiming patent-pending status, and basically making money by creating CLOSED source from open source software (as open source is yet to break the enterprise glass ceiling).

You are either open source or you aren’t.

Bi-sexuality is okay. Bi-codability is not.

Next time you see someone stealing some community’s open source code, refer to this excellent link.


http://www.gnu.org/licenses/gpl-violation.html

Violations of the GNU Licenses

If you think you see a violation of the GNU GPL, LGPL, AGPL, or FDL, the first thing you should do is double-check the facts:

Does the distribution contain a copy of the License?
Does it clearly state which software is covered by the License? Does it say anything misleading, perhaps giving the impression that something is covered by the License when in fact it is not?
Is source code included in the distribution?
Is a written offer for source code included with a distribution of just binaries?
Is the available source code complete, or is it designed for linking in other non-free modules?

If there seems to be a real violation, the next thing you need to do is record the details carefully:

the precise name of the product
the name of the person or organization distributing it
email addresses, postal addresses and phone numbers for how to contact the distributor(s)
the exact name of the package whose license is violated
how the license was violated:
- Is the copyright notice of the copyright holder included?
- Is the source code completely missing?
- Is there a written offer for source that’s incomplete in some way? This could happen if it provides a contact address or network URL that’s somehow incorrect.
- Is there a copy of the license included in the distribution?
- Is some of the source available, but not all? If so, what parts are missing?

The more of these details that you have, the easier it is for the copyright holder to pursue the matter.

Once you have collected the details, you should send a precise report to the copyright holder of the packages that are being misused. The copyright holder is the one who is legally authorized to take action to enforce the license.

If the copyright holder is the Free Software Foundation, please send the report to <license-violation@gnu.org>. It’s important that we be able to write back to you to get more information about the violation or product. So, if you use an anonymous remailer, please provide a return path of some sort. If you’d like to encrypt your correspondence, just send a brief mail saying so, and we’ll make appropriate arrangements.

Note that the GPL, and other copyleft licenses, are copyright licenses. This means that only the copyright holders are empowered to act against violations. The FSF acts on all GPL violations reported on FSF copyrighted code, and we offer assistance to any other copyright holder who wishes to do the same.

But, we cannot act on our own if we do not hold copyright. Thus, be sure to find out who the copyright holders of the software are before reporting a violation.
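The details the FSF page asks reporters to record can be kept in a small structure, so that nothing is forgotten before the report goes to the copyright holder. The following is only a sketch: the `ViolationReport` type, its field names, and the `MissingFields` helper are hypothetical and are not part of any FSF tool.

```cpp
#include <string>
#include <vector>

// Hypothetical record of the details listed in the checklist above.
struct ViolationReport {
  std::string product_name;         // precise name of the product
  std::string distributor;          // person or organization distributing it
  std::string distributor_contact;  // email, postal address, or phone number
  std::string package_name;         // package whose license is violated
  std::string how_violated;         // e.g. missing source, no license copy
};

// Returns the names of the fields that are still empty, so the report can
// be completed before it is sent.
inline std::vector<std::string> MissingFields(const ViolationReport& r) {
  std::vector<std::string> missing;
  if (r.product_name.empty()) missing.push_back("product name");
  if (r.distributor.empty()) missing.push_back("distributor");
  if (r.distributor_contact.empty()) missing.push_back("distributor contact");
  if (r.package_name.empty()) missing.push_back("package name");
  if (r.how_violated.empty()) missing.push_back("how the license was violated");
  return missing;
}
```

An empty result from `MissingFields` only means every field has some text in it; whether the facts are right still has to be double-checked by hand, as the page says.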


Google Snappy


A cool-sounding piece of software, yet again from the guys in California: this one lets you compress and decompress Big Data much, much faster.

http://news.ycombinator.com/item?id=2356735

and

https://code.google.com/p/snappy/

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as “Zippy” in some presentations and the likes.)

For more information, please see the README. Benchmarks against a few other compression libraries (zlib, LZO, LZF, FastLZ, and QuickLZ) are included in the source code distribution.

Introduction
============

Snappy is a compression/decompression library. It does not aim for maximum
compression, or compatibility with any other compression library; instead,
it aims for very high speeds and reasonable compression. For instance,
compared to the fastest mode of zlib, Snappy is an order of magnitude faster
for most inputs, but the resulting compressed files are anywhere from 20% to
100% bigger. (For more information, see “Performance”, below.)

Snappy has the following properties:

* Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code.
  See “Performance” below.
* Stable: Over the last few years, Snappy has compressed and decompressed
  petabytes of data in Google’s production environment. The Snappy bitstream
  format is stable and will not change between versions.
* Robust: The Snappy decompressor is designed not to crash in the face of
  corrupted or malicious input.
* Free and open source software: Snappy is licensed under the Apache license,
  version 2.0. For more information, see the included COPYING file.

Snappy has previously been called “Zippy” in some Google presentations
and the like.

Performance
===========

Snappy is intended to be fast. On a single core of a Core i7 processor
in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at
about 500 MB/sec or more. (These numbers are for the slowest inputs in our
benchmark suite; others are much faster.) In our tests, Snappy usually
is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ,
etc.) while achieving comparable compression ratios.

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x
for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and
other already-compressed data. Similar numbers for zlib in its fastest mode
are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are
capable of achieving yet higher compression rates, although usually at the
expense of speed. Of course, compression ratio will vary significantly with
the input.
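As a rough illustration (not part of the README), the quoted ratios and speeds can be turned into back-of-the-envelope estimates. The sketch below assumes that a compression ratio means original size divided by compressed size and that 1 MB is 10^6 bytes; the function names `CompressedBytes` and `SecondsAt` are made up for this example.

```cpp
#include <cassert>

// Estimated size in bytes after compression, given a compression ratio
// (original size / compressed size). A ratio of 1.0 means no savings.
inline double CompressedBytes(double original_bytes, double ratio) {
  assert(ratio > 0.0);
  return original_bytes / ratio;
}

// Seconds needed to push `bytes` through a codec that sustains
// `mb_per_sec`, taking 1 MB as 1e6 bytes (an assumption of this sketch).
inline double SecondsAt(double bytes, double mb_per_sec) {
  assert(mb_per_sec > 0.0);
  return bytes / (mb_per_sec * 1e6);
}
```

For 1 GB of plain text, `CompressedBytes(1e9, 1.5)` gives roughly 667 MB at the low end of Snappy's quoted range, against roughly 385 MB for `CompressedBytes(1e9, 2.6)` at the low end of zlib's. `SecondsAt(1e9, 250.0)` gives 4 seconds of compression time at the 250 MB/sec figure. Real inputs will differ, as the README itself warns.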

Although Snappy should be fairly portable, it is primarily optimized
for 64-bit x86-compatible processors, and may run slower in other environments.
In particular:

- Snappy uses 64-bit operations in several places to process more data at
  once than would otherwise be possible.
- Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.
  On some platforms, these must be emulated with single-byte loads
  and stores, which is much slower.
- Snappy assumes little-endian throughout, and needs to byte-swap data in
  several places if running on a big-endian platform.

Experience has shown that even heavily tuned code can be improved.
Performance optimizations, whether for 64-bit x86 or other platforms,
are of course most welcome; see “Contact”, below.
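To make the portability notes above concrete, here is a minimal sketch of what an unaligned load and a byte swap look like in portable C++. It is an illustration only, not Snappy's actual code: the helper names are hypothetical, and `memcpy` is used because it is the standard-conforming way to read from an address that may not be aligned.

```cpp
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from a possibly unaligned address. On x86-64,
// compilers typically turn this memcpy into a single load instruction.
inline std::uint32_t UnalignedLoad32(const void* p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Reverses the byte order of a 32-bit value.
inline std::uint32_t ByteSwap32(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Returns true when the host stores the least significant byte first.
inline bool HostIsLittleEndian() {
  const std::uint32_t one = 1;
  unsigned char first;
  std::memcpy(&first, &one, 1);
  return first == 1;
}

// Loads a little-endian 32-bit value regardless of host byte order:
// a plain load on little-endian hosts, a load plus swap on big-endian ones.
inline std::uint32_t LoadLittleEndian32(const void* p) {
  const std::uint32_t v = UnalignedLoad32(p);
  return HostIsLittleEndian() ? v : ByteSwap32(v);
}
```

On a little-endian machine `LoadLittleEndian32` costs no more than the load itself, while a big-endian machine pays for the extra swap, which is the kind of overhead the README is describing.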

Usage
=====

Note that Snappy, both the implementation and the interface,
is written in C++.

To use Snappy from your own program, include the file “snappy.h” from
your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

  snappy::Compress(input.data(), input.size(), &output);

and similarly

  snappy::Uncompress(input.data(), input.size(), &output);

where “input” and “output” are both instances of std::string.


PMML Plugin for Greenplum now available

From a Zementis press release announcing the Universal PMML Plug-in for in-database scoring:

Available now for the EMC Greenplum Database, a high-performance massively parallel processing (MPP) database, the plug-in leverages the Predictive Model Markup Language (PMML) to execute predictive models directly within EMC Greenplum, for highly optimized in-database scoring.

Developed by the Data Mining Group (DMG), PMML is supported by all major data mining vendors, e.g., IBM SPSS, SAS, Teradata, FICO, STATISTICA, MicroStrategy, TIBCO and Revolution Analytics as well as open source tools like R, KNIME and RapidMiner. With PMML, models built in any of these data mining tools can now instantly be deployed in the EMC Greenplum database. The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.

“By partnering with Zementis, a true PMML innovator, we are able to offer a vendor-agnostic solution for moving enterprise-level predictive analytics into the database execution environment,” said Dr. Steven Hillion, Vice President of Analytics at EMC Greenplum. “With Zementis and PMML, the de-facto standard for representing data mining models, we are eliminating the need to recode predictive analytic models in order to deploy them within our database. In turn, this enables an analyst to reduce the time to insight required in most businesses today.”

Want to learn more?

To learn more about how the EMC Greenplum Database and the Universal PMML Plug-in work together, feel free to:

Visit the PMML Plug-in product page
Download the white paper

The Universal PMML Plug-in for the EMC Greenplum Database is available now. Contact us today for more information.

Michael Zeller, CEO, Zementis


KDNuggets Survey on R

From http://www.kdnuggets.com/2011/03/new-poll-r-in-analytics-data-mining-work.html?k11n07

A new poll/survey on actual usage of R in Data Mining

R has been steadily growing in popularity among data miners and analytic professionals.

In the KDnuggets 2010 Data Mining / Analytic Tools Poll, R was used by 30% of respondents.
In the 2010 Rexer Analytics Data Miner Survey, R was the most popular tool, used by 43% of the data miners.

Another aspect of tool usefulness is how much it helps with the entire data mining process: data preparation and cleaning, modeling, evaluation, visualization and presentation (excluding deployment).

The new KDnuggets poll asks:
What part of your analytics / data mining work in the past 12 months was done in R?

http://www.kdnuggets.com/2011/03/new-poll-r-in-analytics-data-mining-work.html?k11n07
