The Mommy Track


A new paper quantitatively analyzes the impact of childbearing on women's wages. Summary:

Women [who score in the upper third on a standardized test] have a net 8 percent reduction in pay during the first five years after giving birth

From http://papers.nber.org/papers/w16582

Having a child lowers a woman’s lifetime earnings, but how much depends upon her skill level. In The Mommy Track Divides: The Impact of Childbearing on Wages of Women of Differing Skill Levels (NBER Working Paper No. 16582), co-authors Elizabeth Ty Wilde, Lily Batchelder, and David Ellwood estimate that having a child costs the average high skilled woman $230,000 in lost lifetime wages relative to similar women who never gave birth. By comparison, low skilled women experience a lifetime wage loss of only $49,000.

Using the 1979 National Longitudinal Survey of Youth (NLSY), Wilde et al. divided women into high, medium, and low skill categories based on their Armed Forces Qualification Test (AFQT) scores. The authors use these skill categories, combined with earnings, labor force participation, and family formation data, to chart the labor market progress of women before and after childbirth, from ages 14-to-21 in 1979 through 41-to-49 in 2006, this study's final sample year.

High scoring and low scoring women differed in a number of ways. While 70-75 percent of higher scoring women work full-time all year prior to their first birth, only 55-60 percent of low scoring women do. As they age, the high scoring women enjoy steeper wage growth than low scoring women; low scoring women’s wages do not change much if they reenter the labor market after they have their first child. Five years after the first birth, about 35 percent of each group is working full-time. However, the high scoring women who are not working full-time are more likely to be working part-time than the low scoring women, who are more likely to leave the workforce entirely.

and

Men’s earning profiles are relatively unaffected by having children although men who never have children earn less on average than those who do. High scoring women who have children late also tend to earn more than high scoring childless women. Their earnings advantage occurs before they have children and narrows substantially after they become mothers.

R Commander Plugins - 20 and growing!

R Commander Extensions: Enhancing a Statistical Graphical User Interface by extending menus to statistical packages

R Commander (see the paper by Prof. J. Fox at http://www.jstatsoft.org/v14/i09/paper ) is a well-known and established graphical user interface to the R analytical environment.
While the original GUI was created for a basic statistics course, support for extensions (or plug-ins, see http://www.r-project.org/doc/Rnews/Rnews_2007-3.pdf ) has greatly enhanced the possible use and scope of this software. Here we give a list of all known R Commander plug-ins and their uses, along with brief comments.

  1. DoE – http://cran.r-project.org/web/packages/RcmdrPlugin.DoE/RcmdrPlugin.DoE.pdf
  2. doex
  3. EHESsampling
  4. epack- http://cran.r-project.org/web/packages/RcmdrPlugin.epack/RcmdrPlugin.epack.pdf
  5. Export- http://cran.r-project.org/web/packages/RcmdrPlugin.Export/RcmdrPlugin.Export.pdf
  6. FactoMineR
  7. HH
  8. IPSUR
  9. MAc- http://cran.r-project.org/web/packages/RcmdrPlugin.MAc/RcmdrPlugin.MAc.pdf
  10. MAd
  11. orloca
  12. PT
  13. qcc- http://cran.r-project.org/web/packages/RcmdrPlugin.qcc/RcmdrPlugin.qcc.pdf and http://cran.r-project.org/web/packages/qcc/qcc.pdf
  14. qual
  15. SensoMineR
  16. SLC
  17. sos
  18. survival-http://cran.r-project.org/web/packages/RcmdrPlugin.survival/RcmdrPlugin.survival.pdf
  19. SurvivalT
  20. TeachingDemos

Note the naming convention for the above plug-ins: each package name always carries the prefix "RcmdrPlugin." followed by the name listed above.
Also, a plug-in must already be installed locally before it becomes visible in R Commander's plug-in loading menu, and R Commander loads the plug-in only after restarting itself. Hence it is advisable to load all the R Commander plug-ins you need at the beginning of an analysis session.
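For example, here is a minimal sketch of installing and loading one of these plug-ins from the R console, using RcmdrPlugin.DoE purely as an illustration (the exact restart behaviour may differ slightly across Rcmdr versions):

    # one-time step: install R Commander and the plug-in from CRAN
    install.packages(c("Rcmdr", "RcmdrPlugin.DoE"))

    # loading the plug-in package also loads Rcmdr; the Commander GUI then
    # starts (or offers to restart) so that the plug-in's menus become visible
    library(RcmdrPlugin.DoE)

    # alternatively, start R Commander first with library(Rcmdr) and load the
    # plug-in from the Commander's own plug-in loading menu, which triggers
    # the same restart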

However, the notable plug-ins are:
1) DoE for Design of Experiments - Full factorial designs, orthogonal main effects designs, regular and non-regular 2-level fractional factorial designs, central composite and Box-Behnken designs, Latin hypercube samples, and simple D-optimal designs can currently be generated from the GUI. Extensions to cover further Latin hypercube designs as well as more advanced D-optimal designs (with blocking) are planned for the future. (A short console sketch of this kind of design appears after this list.)
2) Survival- This package provides an R Commander plug-in for the survival package, with dialogs for Cox models, parametric survival regression models, estimation of survival curves, and testing for differences in survival curves, along with data-management facilities and a variety of tests, diagnostics and graphs.
3) qcc - GUI for Shewhart quality control charts for continuous, attribute and count data, cusum and EWMA charts, operating characteristic curves, process capability analysis, Pareto chart and cause-and-effect chart, and multivariate control charts.
4) epack - an Rcmdr plug-in based on time series functions. It also depends on packages such as tseries, abind, MASS, xts and forecast. It covers log-exceptions GARCH and the following models: ARIMA, GARCH and HoltWinters.
5) Export - The package helps users to graphically export Rcmdr output to LaTeX or HTML code, via xtable() or Hmisc::latex(). The plug-in was originally intended to facilitate exporting Rcmdr output to formats other than ASCII text and to provide R novices with an easy-to-use, easy-to-access reference on exporting R objects to formats suited for printed output. The package documentation contains several pointers on creating reports, either by using conventional word processors or LaTeX/LyX.
6) MAc - This is an R Commander plug-in for the MAc package (Meta-Analysis with Correlations). This package enables the user to conduct a meta-analysis in a menu-driven, graphical user interface environment (e.g., SPSS), while having the full statistical capabilities of R and the MAc package. The MAc package itself contains a variety of useful functions for conducting a research synthesis with correlational data. One of the unique features of the MAc package is its integration of user-friendly functions to complete the majority of statistical steps involved in a meta-analysis with correlations. It uses recommended procedures as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).
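
As mentioned under the DoE entry above, here is a short console sketch of the kind of design that plug-in's menus generate, calling its underlying FrF2 package directly; the arguments shown are an assumed, typical usage rather than code taken from the plug-in itself:

    # FrF2 is one of the packages the DoE plug-in wraps (along with DoE.base and DoE.wrapper)
    library(FrF2)

    # a regular 2-level fractional factorial design: 8 runs for 4 factors, i.e. a 2^(4-1) half fraction
    design <- FrF2(nruns = 8, nfactors = 4)
    summary(design)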

A help query using ??Rcmdrplugins reveals the following information, which can be quite overwhelming given that almost 20 plug-ins are now available:

RcmdrPlugin.DoE::DoEGlossary - Glossary for DoE terminology as used in RcmdrPlugin.DoE
RcmdrPlugin.DoE::Menu.linearModelDesign - RcmdrPlugin.DoE Linear Model Dialog for experimental data
RcmdrPlugin.DoE::Menu.rsm - RcmdrPlugin.DoE response surface model Dialog for experimental data
RcmdrPlugin.DoE::RcmdrPlugin.DoE-package - R-Commander plugin package that implements design of experiments facilities from packages DoE.base, FrF2 and DoE.wrapper into the R-Commander
RcmdrPlugin.DoE::RcmdrPlugin.DoEUndocumentedFunctions - Functions used in menus
RcmdrPlugin.doex::ranblockAnova - Internal RcmdrPlugin.doex objects
RcmdrPlugin.doex::RcmdrPlugin.doex-package - Install the DOEX Rcmdr Plug-In
RcmdrPlugin.EHESsampling::OpenSampling1 - Internal functions for menu system of RcmdrPlugin.EHESsampling
RcmdrPlugin.EHESsampling::RcmdrPlugin.EHESsampling-package - Help with EHES sampling
RcmdrPlugin.Export::RcmdrPlugin.Export-package - Graphically export objects to LaTeX or HTML
RcmdrPlugin.FactoMineR::defmacro - Internal RcmdrPlugin.FactoMineR objects
RcmdrPlugin.FactoMineR::RcmdrPlugin.FactoMineR - Graphical User Interface for FactoMineR
RcmdrPlugin.IPSUR::IPSUR-package - An IPSUR Plugin for the R Commander
RcmdrPlugin.MAc::RcmdrPlugin.MAc-package - Meta-Analysis with Correlations (MAc) Rcmdr Plug-in
RcmdrPlugin.MAd::RcmdrPlugin.MAd-package - Meta-Analysis with Mean Differences (MAd) Rcmdr Plug-in
RcmdrPlugin.orloca::activeDataSetLocaP - RcmdrPlugin.orloca: A GUI for orloca-package (internal functions)
RcmdrPlugin.orloca::RcmdrPlugin.orloca-package - RcmdrPlugin.orloca: A GUI for orloca-package
RcmdrPlugin.orloca::RcmdrPlugin.orloca.es - RcmdrPlugin.orloca.es: A graphical interface for the orloca package (in Spanish)
RcmdrPlugin.qcc::RcmdrPlugin.qcc-package - Install the Demos Rcmdr Plug-In
RcmdrPlugin.qual::xbara - Internal RcmdrPlugin.qual objects
RcmdrPlugin.qual::RcmdrPlugin.qual-package - Install the quality Rcmdr Plug-In
RcmdrPlugin.SensoMineR::defmacro - Internal RcmdrPlugin.SensoMineR objects
RcmdrPlugin.SensoMineR::RcmdrPlugin.SensoMineR - Graphical User Interface for SensoMineR
RcmdrPlugin.SLC::Rcmdr.help.RcmdrPlugin.SLC - RcmdrPlugin.SLC: A GUI for slc-package (internal functions)
RcmdrPlugin.SLC::RcmdrPlugin.SLC-package - RcmdrPlugin.SLC: A GUI for SLC R package
RcmdrPlugin.sos::RcmdrPlugin.sos-package - Efficiently search R Help pages
RcmdrPlugin.steepness::Rcmdr.help.RcmdrPlugin.steepness - RcmdrPlugin.steepness: A GUI for steepness-package (internal functions)
RcmdrPlugin.steepness::RcmdrPlugin.steepness - RcmdrPlugin.steepness: A GUI for steepness R package
RcmdrPlugin.survival::allVarsClusters - Internal RcmdrPlugin.survival Objects
RcmdrPlugin.survival::RcmdrPlugin.survival-package - Rcmdr Plug-In Package for the survival Package
RcmdrPlugin.TeachingDemos::RcmdrPlugin.TeachingDemos-package - Install the Demos Rcmdr Plug-In

 

Using R from within Python


I came across this excellent JSS paper at www.jstatsoft.org/v35/c02/paper on a Python package called PypeR, which allows you to use R from within Python using pipe functionality.

It is an interesting package and, given Python's increasing buzz, one worth checking out by people who are using or considering Python in their projects.

Citation:
	@article{Xia:McClelland:Wang:2010:JSSOBK:v35c02,
	  author =	"Xiao-Qin Xia and Michael McClelland and Yipeng Wang",
	  title =	"PypeR, A Python Package for Using R in Python",
	  journal =	"Journal of Statistical Software, Code Snippets",
	  volume =	"35",
	  number =	"2",
	  pages =	"1--8",
	  day =  	"30",
	  month =	"7",
	  year = 	"2010",
	  CODEN =	"JSSOBK",
	  ISSN = 	"1548-7660",
	  bibdate =	"2010-03-23",
	  URL =  	"http://www.jstatsoft.org/v35/c02",
	  accepted =	"2010-03-23",
	  acknowledgement = "",
	  keywords =	"",
	  submitted =	"2009-10-23",
	}

 

Checks in the mail more effective than checks in your pay


NBER (whose excellent monthly newsletter I subscribe to, among others) http://www.nber.org/ claims in a recent paper that one-time cheques in the mail are better spent than monthly pay increases.

I wonder how this conclusion could be used in designing annual bonuses versus higher pay in private-sector compensation, but people do seem happier receiving one bigger one-time boost than 12 small mini boosts.

 

http://papers.nber.org/papers/w16246

Check in the Mail or More in the Paycheck: Does the Effectiveness of Fiscal Stimulus Depend on How It Is Delivered?


Claudia R. Sahm, Matthew D. Shapiro, Joel Slemrod

NBER Working Paper No. 16246
Issued in July 2010
NBER Program(s):   EFG ME PE

An NBER digest for this paper is available.

Recent fiscal policies have aimed to stimulate household spending. In 2008, most households received one-time economic stimulus payments. In 2009, most working households received the Making Work Pay tax credit in the form of reduced withholding; other households, mainly retirees, received one-time payments. This paper quantifies the spending response to these different policies and examines whether the spending response differed according to whether the stimulus was delivered as a one-time payment or as a flow of payments in the form of reduced withholding. Based on responses from a representative sample of households in the Thomson Reuters/University of Michigan Surveys of Consumers, the paper finds that the reduction in withholding led to a substantially lower rate of spending than the one-time payments. Specifically, 25 percent of households reported that the one-time economic stimulus payment in 2008 led them to mostly increase their spending while only 13 percent reported that the extra pay from the lower withholding in 2009 led them to mostly increase their spending. The paper uses several approaches to isolate the effect of the delivery mechanism from the changing aggregate and individual conditions. Responses to a hypothetical stimulus in 2009, examination of “free responses” concerning differing responses to the policies, and regression analysis controlling for individual economic conditions and demographics all support the primary importance of the income delivery mechanism in determining the spending response to the policies.


How to balance your online advertising and your offline conscience


I recently found an interesting example of a website that makes a lot of money and yet is much more efficient than any free or non-profit effort. It is called Ecosia.

If you want to see a website that covers its administrative costs and has a transparent way of making the world better, this is a great example.

  • http://ecosia.org/how.php
  • HOW IT WORKS
    You search with Ecosia.
  • Perhaps you click on an interesting sponsored link.
  • The sponsoring company pays Bing or Yahoo for the click.
  • Bing or Yahoo gives the bigger chunk of that money to Ecosia.
  • Ecosia donates at least 80% of this income to support WWF’s work in the Amazon.
  • If you like what we’re doing, help us spread the word!
  • Key facts about the park:

    • World’s largest tropical forest reserve (38,867 square kilometers, or about the size of Switzerland)
    • Home to about 14% of all amphibian species and roughly 54% of all bird species in the Amazon – not to mention large populations of at least eight threatened species, including the jaguar
    • Includes part of the Guiana Shield containing 25% of world’s remaining tropical rainforests – 80 to 90% of which are still pristine
    • Holds the last major unpolluted water reserves in the Neotropics, containing approximately 20% of all of the Earth’s water
    • One of the last tropical regions on Earth vastly unaltered by humans
    • Significant contributor to climatic regulation via heat absorption and carbon storage

     

    http://ecosia.org/statistics.php

    They claim to have donated 141,529.42 EUR !!!

    http://static.ecosia.org/files/donations.pdf
    Well, suppose you are the web admin of a very popular website like Wikipedia.

    One way to meet server costs is to say openly: hey, I need to balance my costs, so I need some money.

    The other way is to use online advertising.

    I started mine with Google AdSense.

    Cost per mille (CPM, payment per thousand impressions) gives you a very low return compared to contacting an ad sponsor directly.

    But it is a great data experiment:

    you can monitor which companies are likely to be advertised on your site (assume Google knows more about its algorithms than you will),

    which formats (banner, text or Flash) have what kind of conversion rates,

    and what the expected payoff rates are from various keywords or companies (business intelligence software, predictive analytics software and statistical computing software are similar keywords but have different expected returns, if you remember your economics class).

     

    NOW, based on the above data, you know your minimum baseline to expect from a private advertiser versus a public, crowd-sourced search engine network (like Google or Bing).

    Let's say you have 100,000 page views monthly, and assume one out of 1,000 page views leads to a click. Say the advertiser pays you $1 for every click (that is, per 1,000 impressions).

    Then your expected revenue is $100. But if your clicks are priced at $2.50 per click, and your click-through rate rises to 3 out of 1,000 impressions (both very moderate improvements achievable by basic optimization of ad placement, type, graphics etc.), your new revenue is $750.
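
    A quick back-of-the-envelope version of that arithmetic in R, using the illustrative numbers above rather than any real traffic data:

        # baseline: 100,000 monthly page views, 1 click per 1,000 views, $1 per click
        page_views <- 100000
        baseline_revenue  <- page_views * (1 / 1000) * 1.00   # $100

        # after modest optimization: 3 clicks per 1,000 views, $2.50 per click
        optimized_revenue <- page_views * (3 / 1000) * 2.50   # $750

        c(baseline = baseline_revenue, optimized = optimized_revenue)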

    Be a good Samaritan: you decide to share some of this with your audience, like 4 Amazon books per month (or 1 free Amazon book per week). That gives you a cost of $200 and leaves you with some $550.

    Wait, it doesn't end there; Adam Smith's invisible hand moves on.

    You say: hmm, let me put $100 towards an annual paper-writing contest of $1,000, donate $200 to One Laptop per Child (or to the Amazon rainforest, or to Haiti, etc.), pay $100 for your upgraded server hosting, and put $350 into online advertising, say $200 for search engines and $150 for Facebook.

    Woah!

    Month 1 should see more people visiting you for the first time. If you have a good return rate (returning visitors as a percentage) and a low bounce rate (visits of less than 5 seconds), your traffic should see at least a 20% jump in new arrivals and 5-10% in long-term arrivals. Ignoring bounces, within three months you will have one of the following:

    1) an interesting case study on the statistics of online and social media advertising, tangible motivations for increasing community response, and some good data for study

    2) hopefully, better cost management of your server expenses

    3) very hopefully, a positive cash flow

     

    You could even set a percentage and share the monthly (or better, annual) figures with your readers and advertisers.

    Go ahead, change the world!

    The key paradigms here are: sharing your traffic and revenue figures openly with everyone,

    donating to a suitable cause

    helping increase awareness of that cause,

    and committing fixed percentages rather than absolute amounts, to ensure your site and the cause are sustained for years.

    An Introduction to Data Mining-online book

    I was reading David Smith’s blog http://blog.revolutionanalytics.com/

    where he mentioned this interview of Norman Nie, at TDWI

    http://tdwi.org/Articles/2010/11/17/R-101.aspx?Page=2

    where I saw this link (it's great if you want to study data mining, by the way)

    http://www.kdnuggets.com/education/usa-canada.html

    and I clicked the U Toronto link

    http://chem-eng.utoronto.ca/~datamining/

    Best of all, I really liked this online book created by Professor S. Sayad.

    It is succinct and beautiful, and describes all of the data mining you want to read about in one map (actually 4 images painstakingly assembled with perfection).

    The best thing is that, in the original map, even the sub-items are clickable for specifics; for instance, Pie Chart and Stacked Column Chart are not grouped under one simple drop-down like Charts, but rather organized by the kind of variables that lead to these charts. To see that, you need to go to the site itself; compare http://chem-eng.utoronto.ca/~datamining/dmc/categorical_variables.htm

    vs

    http://chem-eng.utoronto.ca/~datamining/dmc/categorical_numerical.htm

    Again, there is no mention of the data visualization software used to create the images, but I think I can take a hint from the Software page, which lists the software used.

    Software

    See it on your own: online book (c) Professor S. Sayad

    Really good DIY tutorial

    http://chem-eng.utoronto.ca/~datamining/dmc/data_mining_map.htm

    Complex Event Processing- SASE Language


    Complex Event Processing (CEP, not to be confused with circular error probable) is defined as processing the many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time.

    Software supporting CEP includes:

    Oracle http://www.oracle.com/us/technologies/soa/service-oriented-architecture-066455.html

    Oracle CEP is a Java application server for the development and deployment of high-performance event driven applications. It can detect patterns in the flow of events and message payloads, often based on filtering, correlation, and aggregation across event sources, and includes industry leading temporal and ordering capabilities. It supports ultra-high throughput (1 million/sec++) and microsecond latency.

    TIBCO is also trying to get into this market (it claims to have a 40% market share in the public CEP market 😉), though probably they have not measured the DoE and DoD as worthy of market share yet.

    See the webcast by TIBCO's head here: http://www.tibco.com/products/business-optimization/complex-event-processing/default.jsp

    and product info here: http://www.tibco.com/products/business-optimization/complex-event-processing/businessevents/default.jsp

    TIBCO is the undisputed leader in complex event processing (CEP) software with over 40 percent market share, according to a recent IDC Study.

    A good explanation of how social media itself can be used as an analogy for CEP is given in this SAS Global Forum paper:

    http://support.sas.com/resources/papers/proceedings10/040-2010.pdf

    You can also see a report on predictive analytics and data mining in Q1 2010 from SAS's website at http://www.sas.com/news/analysts/forresterwave-predictive-analytics-dm-104388-0210.pdf

    A very good explanation of the architecture involved is given by SAS CTO Keith Collins on SAS's Knowledge Exchange site:

    http://www.sas.com/knowledge-exchange/risk/four-ways-divide-conquer.html

    What it is: Methods 1 through 3 look at historical data and traditional architectures with information stored in the warehouse. In this environment, it often takes months of data cleansing and preparation to get the data ready to analyze. Now, what if you want to make a decision or determine the effect of an action in real time, as a sale is made, for instance, or at a specific step in the manufacturing process. With streaming data architectures, you can look at data in the present and make immediate decisions. The larger flood of data coming from smart phones, online transactions and smart-grid houses will continue to increase the amount of data that you might want to analyze but not keep. Real-time streaming, complex event processing (CEP) and analytics will all come together here to let you decide on the fly which data is worth keeping and which data to analyze in real time and then discard.

    When you use it: Radio-frequency identification (RFID) offers a good user case for this type of architecture. RFID tags provide a lot of information, but unless the state of the item changes, you don’t need to keep warehousing the data about that object every day. You only keep data when it moves through the door and out of the warehouse.

    The same concept applies to a customer who does the same thing over and over. You don’t need to keep storing data for analysis on a regular pattern, but if they change that pattern, you might want to start paying attention.

    Figure 4: Traditional architecture vs. streaming architecture

     

    In academia, there is something called the SASE language, which offers:

    • A rich declarative event language
    • Formal semantics of the event language
    • Theoretical underpinnings of CEP
    • An efficient automata-based implementation

    http://sase.cs.umass.edu/

    and

    http://avid.cs.umass.edu/sase/index.php?page=navleft_1col

    Financial Services

    The query below retrieves the total trading volume of Google stock in the 4-hour period after some bad news occurred.

    PATTERN SEQ(News a, Stock+ b[ ])
    WHERE   [symbol]
        AND a.type = 'bad'
        AND b[i].symbol = 'GOOG'
    WITHIN  4 hours
    HAVING  b[b.LEN].volume < 80% * b[1].volume
    RETURN  sum(b[ ].volume)

    The next query reports a one-hour period in which the price of a stock increased from 10 to 20 and its trading volume stayed relatively stable.

    PATTERN SEQ(Stock+ a[])
    WHERE   [symbol]
        AND a[1].price = 10
        AND a[i].price > a[i-1].price
        AND a[a.LEN].price = 20
    WITHIN  1 hour
    HAVING  avg(a[].volume) ≥ a[1].volume
    RETURN  a[1].symbol, a[].price

    The third query detects a more complex trend: in an hour, the volume of a stock started high, but after a period of price increasing or staying relatively stable, the volume plummeted.

    PATTERN SEQ(Stock+ a[], Stock b)
    WHERE   [symbol]
        AND a[1].volume > 1000
        AND a[i].price > avg(a[…i-1].price)
        AND b.volume < 80% * a[a.LEN].volume
    WITHIN  1 hour
    RETURN  a[1].symbol, a[].(price, volume), b.(price, volume)

    (Note from Ajay: I was not really happy with the depth of resources on CEP available online; there seem to be missing bits and pieces in open-source, academic and corporate information alike. One reason for this is the obvious military dual use of this technology, such as feeds from satellites, audio scans, etc.)