Chart – Page 3 – DECISION STATS

The Mommy Track

A new paper quantitatively analyzes the impact of childbearing on women's wages. Summary:

Women [who score in the upper third on a standardized test] have a net 8 percent reduction in pay during the first five years after giving birth.

From http://papers.nber.org/papers/w16582

Having a child lowers a woman’s lifetime earnings, but how much depends upon her skill level. In The Mommy Track Divides: The Impact of Childbearing on Wages of Women of Differing Skill Levels (NBER Working Paper No. 16582), co-authors Elizabeth Ty Wilde, Lily Batchelder, and David Ellwood estimate that having a child costs the average high skilled woman $230,000 in lost lifetime wages relative to similar women who never gave birth. By comparison, low skilled women experience a lifetime wage loss of only $49,000.

Using the 1979 National Longitudinal Survey of Youth (NLSY), Wilde et al. divided women into high, medium, and low skill categories based on their Armed Forces Qualification Test (AFQT) scores. The authors use these skill categories, combined with earnings, labor force participation, and family formation data, to chart the labor market progress of women before and after childbirth, from ages 14-to-21 in 1979 through 41-to-49 in 2006, this study’s final sample year.

High scoring and low scoring women differed in a number of ways. While 70-75 percent of higher scoring women work full-time all year prior to their first birth, only 55-60 percent of low scoring women do. As they age, the high scoring women enjoy steeper wage growth than low scoring women; low scoring women’s wages do not change much if they reenter the labor market after they have their first child. Five years after the first birth, about 35 percent of each group is working full-time. However, the high scoring women who are not working full-time are more likely to be working part-time than the low scoring women, who are more likely to leave the workforce entirely.

and

Men’s earning profiles are relatively unaffected by having children although men who never have children earn less on average than those who do. High scoring women who have children late also tend to earn more than high scoring childless women. Their earnings advantage occurs before they have children and narrows substantially after they become mothers.


HIGHLIGHTS from REXER Survey: R gives best satisfaction

A Summary report from Rexer Analytics Annual Survey

HIGHLIGHTS from the 4th Annual Data Miner Survey (2010):

• FIELDS & GOALS: Data miners work in a diverse set of fields. CRM / Marketing has been the #1 field in each of the past four years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals are also the goals identified by the most data miners surveyed.

• ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. This year, for the first time, the survey asked about Ensemble Models, and 22% of data miners report using them.
A third of data miners currently use text mining and another third plan to in the future.

• MODELS: About one-third of data miners typically build final models with 10 or fewer variables, while about 28% generally construct models with more than 45 variables.

• TOOLS: After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other. STATISTICA, which has also been climbing in the rankings, is selected as the primary data mining tool by the most data miners (18%). Data miners report using an average of 4.6 software tools overall. STATISTICA, IBM SPSS Modeler, and R received the strongest satisfaction ratings in both 2010 and 2009.

• TECHNOLOGY: Data Mining most often occurs on a desktop or laptop computer, and frequently the data is stored locally. Model scoring typically happens using the same software used to develop models. STATISTICA users are more likely than other tool users to deploy models using PMML.

• CHALLENGES: As in previous years, dirty data, explaining data mining to others, and difficult access to data are the top challenges data miners face. This year data miners also shared best practices for overcoming these challenges. The best practices are available online.

• FUTURE: Data miners are optimistic about continued growth in the number of projects they will be conducting, and growth in data mining adoption is the number one “future trend” identified. There is room to improve: only 13% of data miners rate their company’s analytic capabilities as “excellent” and only 8% rate their data quality as “very strong”.

Please contact us if you have any questions about the attached report or this annual research program. The 5th Annual Data Miner Survey will be launching next month. We will email you an invitation to participate.

Information about Rexer Analytics is available at www.RexerAnalytics.com. Rexer Analytics continues its impressive journey; see http://www.rexeranalytics.com/Clients.html

My only thought: since most data miners are using multiple tools, including free tools as well as paid software, perhaps a pie chart of market share by revenue and by volume would be handy.

Some ideas on comparing diverse data mining projects by data size or complexity would also be useful.


OK Cupid Data Visualization- Flow Chart to your Heart

Quite appropriate for Valentine's Day: OkCupid remains quite innovative in how it uses data (in this case, questionnaire data).


Protovis: a graphical toolkit for visualization

I just found out about a new data visualization tool called Protovis http://vis.stanford.edu/protovis/ex/

Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritance, scales and layouts to simplify construction.

Protovis is free and open source, and is a Stanford project. It has been used in the web interface R Node (which I will talk about later).

http://squirelove.net/r-node/doku.php

Conventional

While Protovis is designed for custom visualization, it is still easy to create many standard chart types. These simpler examples serve as an introduction to the language, demonstrating key abstractions such as quantitative and ordinal scales, while hinting at more advanced features, including stack layout.

Custom

Many charting libraries provide stock chart designs, but offer only limited customization; Protovis excels at custom visualization design through a concise representation and precise control over graphical marks. These examples, including a few recreations of unusual historical designs, demonstrate the language’s expressiveness.

Try Protovis today 🙂 http://vis.stanford.edu/protovis/

It uses JavaScript and SVG for web-native visualizations; no plugin required (though you will need a modern web browser)! Although programming experience is helpful, Protovis is mostly declarative and designed to be learned by example.


R Commander Plugins-20 and growing!

R Commander Extensions: Enhancing a Statistical Graphical User Interface by extending menus to statistical packages

R Commander (see the paper by Prof. J. Fox at http://www.jstatsoft.org/v14/i09/paper) is a well-known and established graphical user interface to the R analytical environment.
While the original GUI was created for a basic statistics course, the enabling of extensions (or plug-ins, see http://www.r-project.org/doc/Rnews/Rnews_2007-3.pdf) has greatly enhanced the possible use and scope of this software. Here we give a list of all known R Commander plugins and their uses, along with brief comments.

DoE – http://cran.r-project.org/web/packages/RcmdrPlugin.DoE/RcmdrPlugin.DoE.pdf
doex
EHESsampling
epack – http://cran.r-project.org/web/packages/RcmdrPlugin.epack/RcmdrPlugin.epack.pdf
Export – http://cran.r-project.org/web/packages/RcmdrPlugin.Export/RcmdrPlugin.Export.pdf
FactoMineR
HH
IPSUR
MAc – http://cran.r-project.org/web/packages/RcmdrPlugin.MAc/RcmdrPlugin.MAc.pdf
MAd
orloca
PT
qcc – http://cran.r-project.org/web/packages/RcmdrPlugin.qcc/RcmdrPlugin.qcc.pdf and http://cran.r-project.org/web/packages/qcc/qcc.pdf
qual
SensoMineR
SLC
sos
survival – http://cran.r-project.org/web/packages/RcmdrPlugin.survival/RcmdrPlugin.survival.pdf
SurvivalT
TeachingDemos

Note that the naming convention for the above plugins is always the prefix “RcmdrPlugin.” followed by one of the names above.
Also, a plugin must already be installed locally to be visible in R Commander’s load-plugin list, and R Commander loads the plugin only after restarting. Hence it is advisable to load all R Commander plugins at the beginning of the analysis session.
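As a rough sketch of the naming convention, the full package names can be built from the short names with paste0() and checked against the packages already installed locally. The vector short_names holds only a handful of the names listed above, and the install.packages() call is left commented out because it needs a CRAN mirror; both are illustrative choices, not part of R Commander itself.

```r
# A few short plugin names from the list above (illustrative subset)
short_names <- c("DoE", "survival", "qcc", "epack", "Export", "MAc")

# Naming convention: the prefix "RcmdrPlugin." followed by the short name
plugin_pkgs <- paste0("RcmdrPlugin.", short_names)
print(plugin_pkgs)

# Which of these are already installed on the local hard disk?
installed <- plugin_pkgs %in% rownames(installed.packages())
missing_pkgs <- plugin_pkgs[!installed]
print(missing_pkgs)

# Install whatever is missing (a one-time step; needs a CRAN mirror)
# install.packages(missing_pkgs)

# Then start R Commander and load the plugins from its Tools menu
# library(Rcmdr)
```

Installing everything up front keeps the restart that R Commander needs after loading a plugin at the start of the session rather than in the middle of an analysis.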

The notable plugins are:
1) DoE for Design of Experiments- Full factorial designs, orthogonal main effects designs, regular and non-regular 2-level fractional factorial designs, central composite and Box-Behnken designs, latin hypercube samples, and simple D-optimal designs can currently be generated from the GUI. Extensions to cover further latin hypercube designs as well as more advanced D-optimal designs (with blocking) are planned for the future.
2) Survival- This package provides an R Commander plug-in for the survival package, with dialogs for Cox models, parametric survival regression models, estimation of survival curves, and testing for differences in survival curves, along with data-management facilities and a variety of tests, diagnostics and graphs.
3) qcc- A GUI for Shewhart quality control charts for continuous, attribute and count data; Cusum and EWMA charts; operating characteristic curves; process capability analysis; Pareto chart and cause-and-effect chart; multivariate control charts.
4) epack- An Rcmdr “plug-in” based on the time series functions. It also depends on packages such as tseries, abind, MASS, xts, and forecast. It covers Log-Exceptions garch and the following models: Arima, garch, HoltWinters.
5) Export- The package helps users to graphically export Rcmdr output to LaTeX or HTML code, via xtable() or Hmisc::latex(). The plug-in was originally intended to facilitate exporting Rcmdr output to formats other than ASCII text and to provide R novices with an easy-to-use, easy-to-access reference on exporting R objects to formats suited for printed output. The package documentation contains several pointers on creating reports, either by using conventional word processors or LaTeX/LyX.
6) MAc- This is an R-Commander plug-in for the MAc package (Meta-Analysis with Correlations). This package enables the user to conduct a meta-analysis in a menu-driven, graphical user interface environment (e.g., SPSS), while having the full statistical capabilities of R and the MAc package. The MAc package itself contains a variety of useful functions for conducting a research synthesis with correlational data. One of the unique features of the MAc package is its integration of user-friendly functions to complete the majority of statistical steps involved in a meta-analysis with correlations. It uses recommended procedures as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).

A help query for ??Rcmdrplugins reveals the following information, which can be quite overwhelming given that almost 20 plugins are now available:

RcmdrPlugin.DoE::DoEGlossary
Glossary for DoE terminology as used in
RcmdrPlugin.DoE
RcmdrPlugin.DoE::Menu.linearModelDesign
RcmdrPlugin.DoE Linear Model Dialog for
experimental data
RcmdrPlugin.DoE::Menu.rsm
RcmdrPlugin.DoE response surface model Dialog
for experimental data
RcmdrPlugin.DoE::RcmdrPlugin.DoE-package
R-Commander plugin package that implements
design of experiments facilities from packages
DoE.base, FrF2 and DoE.wrapper into the
R-Commander
RcmdrPlugin.DoE::RcmdrPlugin.DoEUndocumentedFunctions
Functions used in menus
RcmdrPlugin.doex::ranblockAnova
Internal RcmdrPlugin.doex objects
RcmdrPlugin.doex::RcmdrPlugin.doex-package
Install the DOEX Rcmdr Plug-In
RcmdrPlugin.EHESsampling::OpenSampling1
Internal functions for menu system of
RcmdrPlugin.EHESsampling
RcmdrPlugin.EHESsampling::RcmdrPlugin.EHESsampling-package
Help with EHES sampling
RcmdrPlugin.Export::RcmdrPlugin.Export-package
Graphically export objects to LaTeX or HTML
RcmdrPlugin.FactoMineR::defmacro
Internal RcmdrPlugin.FactoMineR objects
RcmdrPlugin.FactoMineR::RcmdrPlugin.FactoMineR
Graphical User Interface for FactoMineR
RcmdrPlugin.IPSUR::IPSUR-package
An IPSUR Plugin for the R Commander
RcmdrPlugin.MAc::RcmdrPlugin.MAc-package
Meta-Analysis with Correlations (MAc) Rcmdr
Plug-in
RcmdrPlugin.MAd::RcmdrPlugin.MAd-package
Meta-Analysis with Mean Differences (MAd) Rcmdr
Plug-in
RcmdrPlugin.orloca::activeDataSetLocaP
RcmdrPlugin.orloca: A GUI for orloca-package
(internal functions)
RcmdrPlugin.orloca::RcmdrPlugin.orloca-package
RcmdrPlugin.orloca: A GUI for orloca-package
RcmdrPlugin.orloca::RcmdrPlugin.orloca.es
RcmdrPlugin.orloca.es: Una interfaz grafica
para el paquete orloca
RcmdrPlugin.qcc::RcmdrPlugin.qcc-package
Install the Demos Rcmdr Plug-In
RcmdrPlugin.qual::xbara
Internal RcmdrPlugin.qual objects
RcmdrPlugin.qual::RcmdrPlugin.qual-package
Install the quality Rcmdr Plug-In
RcmdrPlugin.SensoMineR::defmacro
Internal RcmdrPlugin.SensoMineR objects
RcmdrPlugin.SensoMineR::RcmdrPlugin.SensoMineR
Graphical User Interface for SensoMineR
RcmdrPlugin.SLC::Rcmdr.help.RcmdrPlugin.SLC
RcmdrPlugin.SLC: A GUI for slc-package
(internal functions)
RcmdrPlugin.SLC::RcmdrPlugin.SLC-package
RcmdrPlugin.SLC: A GUI for SLC R package
RcmdrPlugin.sos::RcmdrPlugin.sos-package
Efficiently search R Help pages
RcmdrPlugin.steepness::Rcmdr.help.RcmdrPlugin.steepness
RcmdrPlugin.steepness: A GUI for
steepness-package (internal functions)
RcmdrPlugin.steepness::RcmdrPlugin.steepness
RcmdrPlugin.steepness: A GUI for steepness R
package
RcmdrPlugin.survival::allVarsClusters
Internal RcmdrPlugin.survival Objects
RcmdrPlugin.survival::RcmdrPlugin.survival-package
Rcmdr Plug-In Package for the survival Package
RcmdrPlugin.TeachingDemos::RcmdrPlugin.TeachingDemos-package
Install the Demos Rcmdr Plug-In


LibreOffice 3.3 released

What does LibreOffice give you?

http://www.libreoffice.org/features/

WRITER is the word processor inside LibreOffice. Use it for everything, from dashing off a quick letter to producing an entire book with tables of contents, embedded illustrations, bibliographies and diagrams. The while-you-type auto-completion, auto-formatting and automatic spelling checking make difficult tasks easy (but are easy to disable if you prefer). Writer is powerful enough to tackle desktop publishing tasks such as creating multi-column newsletters and brochures. The only limit is your imagination.

CALC tames your numbers and helps with difficult decisions when you’re weighing the alternatives. Analyze your data with Calc and then use it to present your final output. Charts and analysis tools help bring transparency to your conclusions. A fully-integrated help system makes easier work of entering complex formulas. Add data from external databases such as SQL or Oracle, then sort and filter them to produce statistical analyses. Use the graphing functions to display large number of 2D and 3D graphics from 13 categories, including line, area, bar, pie, X-Y, and net – with the dozens of variations available, you’re sure to find one that suits your project.

IMPRESS is the fastest and easiest way to create effective multimedia presentations. Stunning animation and sensational special effects help you convince your audience. Create presentations that look even more professional than the standard presentations you commonly see at work. Get your colleagues’ and bosses’ attention by creating something a little bit different.

DRAW lets you build diagrams and sketches from scratch. A picture is worth a thousand words, so why not try something simple with box and line diagrams? Or else go further and easily build dynamic 3D illustrations and special effects. It’s as simple or as powerful as you want it to be.

BASE is the database front-end of the LibreOffice suite. With Base, you can seamlessly integrate into your existing database structures. Based on imported and linked tables and queries from MySQL, PostgreSQL or Microsoft Access and many other data sources, you can build powerful databases containing forms, reports, views and queries. Full integration is possible with the in-built HSQL database.

MATH is a simple equation editor that lets you lay-out and display your mathematical, chemical, electrical or scientific equations quickly in standard written notation. Even the most-complex calculations can be understandable when displayed correctly. E=mc²

The Document Foundation just announced release candidate 3 of LibreOffice.

New Features-

http://www.libreoffice.org/download/new-features/

General

Added the LibreColors to the palette;
Added Quickstarter for Unix builds;
Introduced Linux “Libertine G” and Linux “Biolinum G” fonts;
Implement import of alpha channel for RGBA .tiffs [http://bugs.freedesktop.org/show_bug.cgi?id=30472];
Show all appropriate formats by default on “Save As” [http://qa.openoffice.org/issues/show_bug.cgi?id=113141];
Use radio buttons for mutually exclusive menu options;
Replace the “Help Support” menu item by the “License Information” one;
Load and save documents in flat XML;
Made Help system available via the WikiHelp;
Option to enable saving of documents at all times (see Tools -> Options -> LibreOffice -> General -> “Allow to save document…”).

Calc

[http://bugs.freedesktop.org/show_bug.cgi?id=30559]: Added new tab page ‘Compatibility’ in the Options dialog;
Better default key bindings;
Use Ctrl-Shift-D to launch selection list in LibreOffice;
Added new image file used in the “insert new sheet” button. This image is not visible in read-only mode;
Fix fake small caps resizing factor [http://qa.openoffice.org/issues/show_bug.cgi?id=1526];
Added dotted/dashed borders in Calc;
Added icons for toggling sheet grids in Calc;
Better performance and interoperability on Excel doc import;
Better performance on DBF import;
Slightly better performance on ODS import;
Possibility to use English formula names;
Distributed alignment – allows one to specify ‘distributed’ horizontal alignment and ‘justified’ and ‘distributed’ vertical alignments within cells. This is notably useful for CJK locales;
Support for 3 different formula syntaxes: Calc A1, Excel A1 and Excel R1C1;
Configurable argument and array separators in formula expressions;
External reference works within OFFSET function;
Hitting TAB during auto-complete commits current selection and moves to the next cell;
Shift-TAB cycles through auto-complete selections;
Find and replace skips those cells that are filtered out (thus hidden);
Protecting sheet provides two additional sheet protection options, to optionally limit cursor placement in protected and unprotected areas;
Copying a range highlights the range being copied. It also allows you to paste it by hitting ENTER key. Hitting ESC removes the range highlight;
Jumping to and from references in formula cells via “Ctrl-[” and “Ctrl-]”;
Cell cursor stays at the original cell during range selection.

Writer

AutoCorrections match case of the words that AutoCorrect replaces. (Issuezilla 2838);
Ability to turn off number recognition in Writer;
RTF export (from GSoc);
Port of Lotus Word Pro filter;
New dialog box for title page.

Impress/Draw

PPTX chart import feature;
[http://qa.openoffice.org/issues/show_bug.cgi?id=112421] make “Presenter Screen” default to laptop, not projector;
Improve randomization in “Dissolve” transition.

Math

Default to just printing the formula itself in Math;
[http://qa.openoffice.org/issues/show_bug.cgi?id=113400] Maths brackets malformed in presentation mode.

Base

[http://qa.openoffice.org/issues/show_bug.cgi?id=112597] Added display properties to control shapes.

Development

UNO APIs for size and moveProtect of notes;
Via Issuezilla bug #i80184: allow addition of drawing documents to gallery via API.

Productivity Enhancements

New custom properties handling;
Embedding of standard PDF fonts;
New “Narrow” font family;
Increased document protection in Writer and Calc;
Automatic decimals digits for “General” format in Calc;
1 million rows in a spreadsheet;
New options for CSV (Comma-Separated Value) importation in Calc;
Insert drawing objects in charts;
Hierarchical axis labels for charts;
Improved slide layout handling in Impress;
Manual setting for primary key support for databases;
Support of Read-Only database registration;
New Math command: ‘nospace’.

Internationalization

Additional locale data.

Usability and Interface

Common search toolbar;
New easier-to-use print interface;
More options for changing case;
Redesign of thesaurus;
Resetting of text to the default language in Writer;
Text rendering of form controls in Writer;
Changed defaults for charts;
Colored sheet tabs in Calc;
Adaptation to marked selection for filter area in Calc;
Sort dialog box for DataPilot in Calc;
Display custom names for DataPilot fields, items and totals in Calc.

Developer Features and Extensibility

Grid control enhancements;
New MetaData node for database;
Extending database drivers using extensions.


Challenges of Analyzing a dataset (with R)


Analyzing data can have many challenges associated with it. In the case of business analytics data, these challenges or constraints can have a marked effect on the quality and timeliness of the analysis as well as the expected versus actual payoff from the analytical results.

Challenges of Analytical Data Processing-

1) Data Formats- Reading in complete data, without losing any part (or metadata), or adding in superfluous details (that increase the scope). Technical constraints of data formats are relatively easy to navigate thanks to ODBC and well-documented, easily searchable syntax and language.

Other costs are harder to contain: the cost of additional data augmentation (should we pay for additional credit bureau data to be appended?) and the time needed to store and process the data (every extra column adds as many values as the dataset has rows, which becomes time-consuming if you are considering an extra 100 variables across a few million rows). Above all, business relevance and quality guidelines ensure that basic data input and massaging take up a considerable part of the whole analytical project timeline.

2) Data Quality- Perfect data exists in a perfect world. The price of perfect information is one that a business will mostly never budget or wait for. Delivering inferences and results based on summaries of data that has missing, invalid, and outlier values embedded within it makes the role of the analyst just as important as whichever tool is chosen to remove outliers, replace missing values, or treat invalid data.
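To make the missing-value and outlier treatment concrete, here is a minimal sketch in base R. The data frame sales and its values are made up for illustration, and median imputation and the 1.5 * IQR rule are just one common convention each; the right treatment depends on the business context.

```r
# Hypothetical example data: one missing value and one obvious outlier
sales <- data.frame(
  customer = 1:8,
  amount   = c(120, 135, NA, 128, 131, 5000, 126, 133)
)

# Count missing values per column
print(colSums(is.na(sales)))

# Replace missing amounts with the median of the observed values
med <- median(sales$amount, na.rm = TRUE)
sales$amount[is.na(sales$amount)] <- med

# Flag outliers with the 1.5 * IQR rule (one common convention)
q   <- unname(quantile(sales$amount, c(0.25, 0.75)))
iqr <- q[2] - q[1]
is_outlier <- sales$amount < q[1] - 1.5 * iqr |
              sales$amount > q[2] + 1.5 * iqr

# Inspect the flagged rows before deciding whether to drop or cap them
print(sales[is_outlier, ])
```

Flagging rather than deleting keeps the decision with the analyst, which is the point made above: the tool finds candidates, but judging whether a value is an error or a genuine large customer is a business call.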

3) Project Scope-

How much data? How much Analytical detail versus High Level Summary? Timelines for delivery as well as refresh of data analysis? Checks (statistical as well as business)?

How easy is it to load and implement the new analysis in the existing Information Technology infrastructure? These are some of the outer parameters that can limit your analytical project scope, your choice of analytical tool, and your processing methodology.

4) Output Results vis-a-vis stakeholder expectation management-

Stakeholders like to see results, not constraints, hypotheses, assumptions, p-values, or chi-square values. Output results need to be streamlined into a decision management process to justify the investment of human time and effort in an analytical project; choosing, training on, and navigating the complexities and constraints of analytical tools are a subset of that investment. Optimum use of graphical display is part of aligning results to a form more palatable to stakeholders, provided the graphics are done well.

E.g., Marketing wants more sales, so it needs a clear campaign that targets certain customers via specific channels with specified collateral. To support that business judgement, business analytics needs to validate, cross-validate, and sometimes invalidate the decision with clear, transparent methods and processes.

Given a dataset, the basic analytical steps that an analyst will take with R are as follows. This is meant as a note for analysts at a beginner level with R.

Package-specific syntax

update.packages() #This updates all installed packages
install.packages("package1") #This installs a package locally, a one-time event; note that the package name is quoted
library(package1) #This loads a specified package in the current R session, which needs to be done every R session

CRAN → local hard disk → R session is the top-to-bottom hierarchy of package storage and invocation.
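The three stages can be wrapped in a small helper, sketched below. The function name load_pkg is hypothetical, and the tools package is used in the call only because it ships with R, so the example runs without contacting CRAN.

```r
# Hypothetical helper: make sure a package is on disk, then attach it
load_pkg <- function(pkg) {
  # Stage 1 -> 2: fetch from CRAN only if not yet on the local hard disk
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs a CRAN mirror and a network connection
  }
  # Stage 2 -> 3: attach the installed package to the current R session
  library(pkg, character.only = TRUE)
}

load_pkg("tools")   # ships with R, so no download is triggered
print(search())     # "package:tools" now appears on the search path
```

The character.only = TRUE argument is needed because the package name is passed as a string held in a variable rather than typed literally.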

ls() #This lists all objects or datasets currently active in the R session

> names(assetsCorr) #This gives the names of variables within a dataframe
[1] "AssetClass"            "LargeStocksUS"         "SmallStocksUS"
[4] "CorporateBondsUS"      "TreasuryBondsUS"       "RealEstateUS"
[7] "StocksCanada"          "StocksUK"              "StocksGermany"
[10] "StocksSwitzerland"     "StocksEmergingMarkets"

> str(assetsCorr) #gives complete structure of dataset
'data.frame':    12 obs. of 11 variables:
$ AssetClass           : Factor w/ 12 levels "CorporateBondsUS",..: 4 5 2 6 1 12 3 7 11 9 …
$ LargeStocksUS        : num 15.3 16.4 1 0 0 …
$ SmallStocksUS        : num 13.49 16.64 0.66 1 0 …
$ CorporateBondsUS     : num 9.26 6.74 0.38 0.46 1 0 0 0 0 0 …
$ TreasuryBondsUS      : num 8.44 6.26 0.33 0.27 0.95 1 0 0 0 0 …
$ RealEstateUS         : num 10.6 17.32 0.08 0.59 0.35 …
$ StocksCanada         : num 10.25 19.78 0.56 0.53 -0.12 …
$ StocksUK             : num 10.66 13.63 0.81 0.41 0.24 …
$ StocksGermany        : num 12.1 20.32 0.76 0.39 0.15 …
$ StocksSwitzerland    : num 15.01 20.8 0.64 0.43 0.55 …
$ StocksEmergingMarkets: num 16.5 36.92 0.3 0.6 0.12 …

> dim(assetsCorr) #gives dimensions observations and variable number
[1] 12 11

str(dataset) gives the structure of the dataset (note that the structure gives both the names of the variables within the dataset and the dimensions of the dataset).

head(dataset, n1) gives the first n1 rows of the dataset, while
tail(dataset, n2) gives the last n2 rows, where n1 and n2 are numbers and dataset is the name of the object (here, the data frame being considered).

summary(dataset) gives you a brief summary of all variables, while

library(Hmisc)
describe(dataset) gives a detailed description of the variables.

Simple graphics can be produced by

hist(dataset$variable1) #hist() needs a numeric vector, so pass one column rather than the whole data frame
and
plot(dataset)
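The commands above can be tried end to end on a dataset that ships with R. The sketch below uses mtcars as a stand-in for dataset; the choice of columns for the plots is arbitrary and only for illustration.

```r
# mtcars ships with R (datasets package); it stands in for "dataset"
dataset <- mtcars

str(dataset)              # structure: variable names, types, dimensions
print(dim(dataset))       # number of observations and variables
print(head(dataset, 3))   # first 3 rows
print(tail(dataset, 2))   # last 2 rows
print(summary(dataset))   # brief summary of every variable

# hist() needs a numeric vector, so pick a single column
hist(dataset$mpg, main = "Miles per gallon", xlab = "mpg")

# plot() on an all-numeric data frame draws a scatterplot matrix
plot(dataset[, c("mpg", "hp", "wt")])
```

When run non-interactively (for example with Rscript), the two plots are written to a PDF file instead of opening a graphics window.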

As you can see in the above cases, there are multiple ways to get even basic analysis of data in R. However, most of the syntax commands are intuitively understood (like hist for histogram, t.test for t test, plot for plot).

For detailed analysis throughout the scope of a project, a business analytics user is advised to use multiple GUIs and multiple packages. Even for highly specific and specialized analytical tasks, it is worth checking for a GUI that incorporates the required package.

