data analysis – DECISION STATS

Polyglots for Data Science #python #sas #r #stats #spss #matlab #julia #octave

In the future I think analysts need to be polyglots- you will need to know more than one language for crunching data.

SAS, Python, R, Julia,SPSS,Matlab- Pick Any Two 😉 or Any Three.

No, you can’t count C or Java as a statistical language 🙂 🙂

Efforts to promote Polyglots in Statistical Software are-

1) R for SAS and SPSS Users (free or book)

SPSS and R reference
SAS/IML and R reference

JMP and R reference http://www.jmp.com/support/help/Working_with_R.shtml

2) R for Stata Users (book)

3) SAS and R (blog and book)

4) Using Python and R together

Accessing R from Python (Rpy2) http://www.bytemining.com/wp-content/uploads/2010/10/rpy2.pdf
Big Data with R and Python (though these have been made separately)

Python for Data Analysis is a book . Python for Data Analysis by Wes McKinney

Probably we need a Python and R for Data Analysis book- just like we have for SAS and R books.

The RPy2 documentation is handy http://rpy.sourceforge.net/rpy2/doc-2.1/html/introduction.html
A nice tutorial is also here – also the inspiration to writing this post http://files.meetup.com/1225993/Laurent%20Gautier_R_toPython_bridge_to_R.pdf#!

5) Matlab and R

Reference (http://mathesaurus.sourceforge.net/matlab-python-xref.pdf ) includes Python

5) Octave and R

package http://cran.r-project.org/web/packages/RcppOctave/vignettes/RcppOctave.pdf includes Matlab

reference http://cran.r-project.org/doc/contrib/R-and-octave.txt

6) Julia and python

Julia and IPython https://github.com/JuliaLang/IJulia.jl

PyPlot uses the Julia PyCall package to call Python’s matplotlib directly from Julia

7) SPSS and Python is here

8) SPSS and R is as below

The Essentials for R for Statistics versions 22, 21, 20, and 19 are available here.
This link will take you to the SourceForge site where the Version 18 Essentials and Plugins are hosted.
- Plugins for Version 18 for R

9) Using R from Clojure – Incanter

Use embedded R from Clojure and Incanter http://github.com/jolby/rincanter

Dashboard Design: Google Activity

I quite like Google’s monthly email on account activity. It is the Google way to offer free services, as well as treat users as special, that continues to command loyalty despite occasional exasperation with corporate thingies.

See this dashboard-

Notice the use of Bigger Font for overall number of emails as well as smaller bar plots- I would say they are almost spark lines or spark bar plots if you excuse my Tufte.

The medium range font shows persons sent/from statistics, and the color shades are done to empahsize or de-emphasize the metric

Colors used are black/grey, green and blue coincident with the Corporate Logo.

However some of the JS for visualizations need to be tweaked. Clearly the hover script ( an integral part of Dashboard design ) needs better elucidiation or formatting)

I would also venture my neck and suggest that rather than just monthly snapshots, atleast some way of comparing snapshots across periods or even the total time period be enabled- rather than be in seperate views. This may give the user a bit more analytical value.

Overall, a nice and simple dashboard which may be of some use to the business user who makes or views a lot of reports on online properties. Minimal and effective- and in keeping with Open Data- Data Liberation Principles. I guess Google is secure in the knowledge that users do not view time spent on Google services as a total waste , unlike some of the other more social 😉 websites they spend time on.

Interview Pranay Agrawal Co-Founder Fractal Analytics

Here is an interview with Pranay Agrawal, Executive Vice President- Global Client Development, Fractal Analytics – one of India’s leading analytics services providers and one of the pioneers in analytics services delivery.

Ajay- Describe Fractal Analytics’ journey as a startup to a pioneer in the Predictive Analytics Services industry. What were some of the key turning points in the field of analytics that you have noticed during these times?

Pranay- In 2000, Fractal Analytics started as a pure-play analytics services company in India with a focus on financial services. Five years later, we spread our operation to the United States and opened new verticals. Today, we have the widest global footprint among analytics providers and have experience handling data and deep understanding of consumer behavior in over 150 counties. We have matured from an analytics service organization to a productized analytics services firm, specializing in consumer goods, retail, financial services, insurance and technology verticals.
We are on the fore-front of a massive inflection point with Big Data Analytics at the center. We have witnessed the transformation of analytics within our clients from a cost center to the most critical division that drives competitive advantage. Advances are quickly converging in computer science, artificial intelligence, machine learning and game theory, changing the way how analytics is consumed by B2B and B2C companies. Companies that use analytics well are poised to excel in innovation, customer engagement and business performance.

Ajay- What are analytical tools that you use at Fractal Analytics? Are there any trends in analytical software usage that you have observed?

Pranay- We are tools agnostic to serve our clients using whatever platforms they need to ensure they can quickly and effectively operationalize the results we deliver. We use R, SAS, SPSS, SpotFire, Tableau, Xcelsius, Webfocus, Microstrategy and Qlikview. We are seeing an increase in adoption of open source platform such as R, and specialize tools for dashboard like Tableau/Qlikview, plus an entire spectrum of emerging tools to process manage and extract information from Big Data that support Hadoop and NoSQL data structures

Ajay- What are Fractal Analytics plans for Big Data Analytics?

Pranay- We see our clients being overwhelmed by the increasing complexity of the data. While they are all excited by the possibilities of Big Data, on-the-ground struggle continues to realize its full potential. The analytics paradigm is changing in the context of Big Data. Our solutions focus on how to make it super-simple for our clients combined with analytics sophistication possible with Big Data.
Let’s take our Customer Genomics solution for retailers as an example. Retailers are collecting information about Shopper behaviors through every transaction. Retailers want to transform their business to make it more customer-centric but do not know how to go about it. Our Customer Genomics solution uses advanced machine learning algorithm to label every shopper across more than 80 different dimensions. Retailers use these to identify which products it should deep-discount depending on what price-sensitive shoppers buy. They are transforming the way they plan their assortment, planogram and targeted promotions armed with this intelligence.

We are also building harmonization engines using Concordia to enable real-time update of Customer Genomics based on every direct, social, or shopping transaction. This will further bridge the gap between marketing actions and consumer behavior to drive loyalty, market share and profitability.

Ajay- What are some of the key things that differentiate Fractal Analytics from the rest of the industry? How are you different?

Pranay- We are one of the pioneer pure-play analytics firm with over a decade of experience consulting with Fortune 500 companies. What clients most appreciate about working with us includes:

Experience managing structured and unstructured Big Data (volume, variety) with a deep understanding of consumer behavior in more than 150 counties
Advanced analytics leveraging supervised machine-learning platforms
Proprietary products for example: Concordia for data harmonization, Customer Genomics for consumer insights and personalized marketing, Pincer for pricing optimization, Eavesdrop for social media listening, Medley for assortment optimization in retail industry and Known Value Item for retail stores
Deep industry expertise enables us to leverage cross-industry knowledge to solve a wide range of marketing problems
Lowest attrition rates in the industry and very selective hiring process makes us a great place to work

Ajay- What are some of the initiatives that you have taken to ensure employee satisfaction and happiness?

Pranay- We believe happy employees create happy customers. We are building a great place to work by taking a personal interest in grooming people. Our people are highly engaged as evidenced by 33% new hire referrals and the highest Glassdoor ratings in our industry.
We recognize the accomplishments and contributions made through many programs such as:

FractElite – where peers nominate and defend the best of us
Recognition board – where anyone can write a visible thank you
Value cards – where anyone can acknowledge great role model behavior in one or more values
Townhall – a quarterly all hands where we announce anniversaries and FractElite awards, with an open forum to ask questions
Employee engagement surveys – to measure and report out on satisfaction programs
Open access to managers and leadership team – to ensure we understand and appreciate each person’s unique goals and ambitions, coach for high performance, and laud their success

Ajay- How happy are Fractal Analytics customers quantitatively? What is your retention rate- and what plans do you have for 2013?

Pranay- As consultants, delivering value with great service is critical to our growth, which has nearly doubled in the last year. Most of our clients have been with us for over five years and we are typically considered a strategic partner.
We conduct client satisfaction surveys during and after each project to measure our performance and identify opportunities to serve our clients better. In 2013, we will continue partnering with our clients to define additional process improvements from applying best practice in engagement management to building more advanced analytics and automated services to put high-impact decisions into our clients’ hands faster.

About–

Pranay Agrawal -Pranay co-founded Fractal Analytics in 2000 and heads client engagement worldwide. He has a MBA from India Institute of Management (IIM) Ahmedabad, Bachelors in Accounting from Bangalore University, and Certified Financial Risk Manager from GARP. He is is also available online on http://www.linkedin.com/in/pranayfractal

Fractal Analytics is a provider of predictive analytics and decision sciences to financial services, insurance, consumer goods, retail, technology, pharma and telecommunication industries. Fractal Analytics helps companies compete on analytics and in understanding, predicting and influencing consumer behavior. Over 20 fortune 500 financial services, consumer packaged goods, retail and insurance companies partner with Fractal to make better data driven decisions and institutionalize analytics inside their organizations.

Fractal sets up analytical centers of excellence for its clients to tackle tough big data challenges, improve decision management, help understand, predict & influence consumer behavior, increase marketing effectiveness, reduce risk and optimize business results.

Data Frame in Python

Exploring some Python Packages and R packages to move /work with both Python and R without melting your brain or exceeding your project deadline

—————————————

If you liked the data.frame structure in R, you have some way to work with them at a faster processing speed in Python.

Here are three packages that enable you to do so-

(1) pydataframe http://code.google.com/p/pydataframe/

An implemention of an almost R like DataFrame object. (install via Pypi/Pip: “pip install pydataframe”)

Usage:

        u = DataFrame( { "Field1": [1, 2, 3],
                        "Field2": ['abc', 'def', 'hgi']},
                        optional:
                         ['Field1', 'Field2']
                         ["rowOne", "rowTwo", "thirdRow"])

A DataFrame is basically a table with rows and columns.

Columns are named, rows are numbered (but can be named) and can be easily selected and calculated upon. Internally, columns are stored as 1d numpy arrays. If you set row names, they’re converted into a dictionary for fast access. There is a rich subselection/slicing API, see help(DataFrame.get_item) (it also works for setting values). Please note that any slice get’s you another DataFrame, to access individual entries use get_row(), get_column(), get_value().

DataFrames also understand basic arithmetic and you can either add (multiply,…) a constant value, or another DataFrame of the same size / with the same column names, like this:

#multiply every value in ColumnA that is smaller than 5 by 6.
my_df[my_df[:,'ColumnA'] < 5, 'ColumnA'] *= 6

#you always need to specify both row and column selectors, use : to mean everything
my_df[:, 'ColumnB'] = my_df[:,'ColumnA'] + my_df[:, 'ColumnC']

#let's take every row that starts with Shu in ColumnA and replace it with a new list (comprehension)
select = my_df.where(lambda row: row['ColumnA'].startswith('Shu'))
my_df[select, 'ColumnA'] = [row['ColumnA'].replace('Shu', 'Sha') for row in my_df[select,:].iter_rows()]

Dataframes talk directly to R via rpy2 (rpy2 is not a prerequiste for the library!)

(2) pandas http://pandas.pydata.org/

Library Highlights

A fast and efficient DataFrame object for data manipulation with integrated indexing;
Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
Flexible reshaping and pivoting of data sets;
Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
Columns can be inserted and deleted from data structures for size mutability;
Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets;
High performance merging and joining of data sets;
Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure;
Time series-functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging. Even create domain-specific time offsets and join time series without losing data;
The library has been ruthlessly optimized for performance, with critical code paths compiled to C;
Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.

Why not R?

First of all, we love open source R! It is the most widely-used open source environment for statistical modeling and graphics, and it provided some early inspiration for pandas features. R users will be pleased to find this library adopts some of the best concepts of R, like the foundational DataFrame (one user familiar with R has described pandas as “R data.frame on steroids”). But pandas also seeks to solve some frustrations common to R users:

R has barebones data alignment and indexing functionality, leaving much work to the user. pandas makes it easy and intuitive to work with messy, irregularly indexed data, like time series data. pandas also provides rich tools, like hierarchical indexing, not found in R;
R is not well-suited to general purpose programming and system development. pandas enables you to do large-scale data processing seamlessly when developing your production applications;
Hybrid systems connecting R to a low-productivity systems language like Java, C++, or C# suffer from significantly reduced agility and maintainability, and you’re still stuck developing the system components in a low-productivity language;
The “copyleft” GPL license of R can create concerns for commercial software vendors who want to distribute R with their software under another license. Python and pandas use more permissive licenses.

(3) datamatrix http://pypi.python.org/pypi/datamatrix/0.8

datamatrix 0.8

A Pythonic implementation of R’s data.frame structure.

Latest Version: 0.9

This module allows access to comma- or other delimiter separated files as if they were tables, using a dictionary-like syntax. DataMatrix objects can be manipulated, rows and columns added and removed, or even transposed

—————————————————————–

Modeling in Python

Continue reading “Data Frame in Python”

Interview Alain Chesnais Chief Scientist Trendspottr.com

Here is a brief interview with Alain Chesnais ,Chief Scientist Trendspottr.com. It is a big honor to interview such a legend in computer science, and I am grateful to both him and Mark Zohar for taking time to write these down.

Ajay- Describe your career from your student days to being the President of ACM (Association of Computing Machinery http://www.acm.org/ ). How can we increase the interest of students in STEM education, particularly in view of the shortage of data scientists.

Alain- I’m trying to sum up a career of over 35 years. This may be a bit long winded…

I started my career in CS when I was in high school in the early 70’s. I was accepted in the National Science Foundation’s Science Honors Program in 9th grade and the first course I took was a Fortran programming course at Columbia University. This was on an IBM 360 using punch cards.

The next year my high school got a donation from DEC of a PDP-8E mini computer. I ended up spending a lot of time in the machine room all through high school at a time when access to computers wasn’t all that common. I went to college in Paris and ended up at l’Ecole Normale Supérieure de Cachan in the newly created Computer Science department.

My first job after finishing my graduate studies was as a research assistant at the Centre National de la Recherche Scientifique where I focused my efforts on modelling the behaviour of distributed database systems in the presence of locking. When François Mitterand was elected president of France in 1981, he invited Nicholas Negroponte and Seymour Papert to come to France to set up the Centre Mondial Informatique. I was hired as a researcher there and continued on to become director of software development until it was closed down in 1986. I then started up my own company focusing on distributed computer graphics. We sold the company to Abvent in the early 90’s.

After that, I was hired by Thomson Digital Image to lead their rendering team. We were acquired by Wavefront Technologies in 1993 then by SGI in 1995 and merged with Alias Research. In the merged company: Alias|wavefront, I was director of engineering on the Maya project. Our team received an Oscar in 2003 for the creation of the Maya software system.

Since then I’ve worked at various companies, most recently focusing on social media and Big Data issues associated with it. Mark Zohar and I worked together at SceneCaster in 2007 where we developed a Facebook app that allowed users to create their own 3D scenes and share them with friends via Facebook without requiring a proprietary plugin. In December 2007 it was the most popular app in its category on Facebook.

Recently Mark approached me with a concept related to mining the content of public tweets to determine what was trending in real time. Using math similar to what I had developed during my graduate studies to model the performance of distributed databases in the presence of locking, we built up a real time analytics engine that ranks the content of tweets as they stream in. The math is designed to scale linearly in complexity with the volume of data that we analyze. That is the basis for what we have created for TrendSpottr.

In parallel to my professional career, I have been a very active volunteer at ACM. I started out as a member of the Paris ACM SIGGRAPH chapter in 1985 and volunteered to help do our mailings (snail mail at the time). After taking on more responsibilities with the chapter, I was elected chair of the chapter in 1991. I was first appointed to the SIGGRAPH Local Groups Steering Committee, then became ACM Director for Chapters. Later I was successively elected SIGGRAPH Vice Chair, ACM SIG Governing Board (SGB) Vice Chair for Operations, SGB Chair, ACM SIGGRAPH President, ACM Secretary/Treasurer, ACM Vice President, and finally, in 2010, I was elected ACM President. My term as ACM President has just ended on July 1st. Vint Cerf is our new President. I continue to serve on the ACM Executive Committee in my role as immediate Past President.

(Note- About ACM
ACM, the Association for Computing Machinery www.acm.org, is the world’s largest educational and scientific computing society, uniting computing educators, researchers and professionals to inspire dialogue, share resources and address the field’s challenges. )

Ajay- What sets Trendspotter apart from other startups out there in terms of vision in trying to achieve a more coherent experience on the web.

Alain- The Basic difference with other approaches that we are aware of is that we have developed an incremental solution that calculates the results on the fly as the data streams in. Our evaluators are based on solid mathematical foundations that have proven their usefulness over time. One way to describe what we do is to think of it as signal processing where the tweets are the signal and our evaluators are like triggers that tell you what elements of the signal have the characteristics that we are filtering for (velocity and acceleration). One key result of using this approach is that our unit cost per tweet analyzed does not go up with increased volume. Using more traditional data analysis approaches involving an implicit sort would imply a complexity of N*log(N), where N is the volume of tweets being analyzed. That would imply that the cost per tweet analyzed would go up with the volume of tweets. Our approach was designed to avoid that, so that we can maintain a cap on our unit costs of analysis, no matter what volume of data we analyze.

Ajay- What do you think is the future of big data visualization going to look like? What are some of the technologies that you are currently bullish on?

Alain- I see several trends that would have deep impact on Big Data visualization. I firmly believe that with large amounts of data, visualization is key tool for understanding both the structure and the relationships that exist between data elements. Let’s focus on some of the key things that are pushing in this direction:

the volume of data that is available is growing at a rate we have never seen before. Cisco has measured an 8 fold increase in the volume of IP traffic over the last 5 years and predicts that we will reach the zettabyte of data over IP in 2016
more of the data is becoming publicly available. This isn’t only on social networks such as Facebook and twitter, but joins a more general trend involving open research initiatives and open government programs
the desired time to get meaningful results is going down dramatically. In the past 5 years we have seen the half life of data on Facebook, defined as the amount of time that half of the public reactions to any given post (likes, shares., comments) take place, go from about 12 hours to under 3 hours currently
our access to the net is always on via mobile device. You are always connected.
the CPU and GPU capabilities of mobile devices is huge (an iPhone has 10 times the compute power of a Cray-1 and more graphics capabilities than early SGI workstations)

Put all of these observations together and you quickly come up with a massive opportunity to analyze data visually on the go as it happens no matter where you are. We can’t afford to have to wait for results. When something of interest occurs we need to be aware of it immediately.

Ajay- What are some of the applications we could use Trendspottr. Could we predict events like Arab Spring, or even the next viral thing.

Alain- TrendSpottr won’t predict what will happen next. What it *will* do is alert you immediately when it happens. You can think of it like a smoke detector. It doesn’t tell that a fire will take place, but it will save your life when a fire does break out.

Typical uses for TrendSpottr are

thought leadership by tracking content that your readership is interested in via TrendSpottr you can be seen as a thought leader on the subject by being one of the first to share trending content on a given subject. I personally do this on my Facebook page (http://www.facebook.com/alain.chesnais) and have seen my klout score go up dramatically as a result
brand marketing to be able to know when something is trending about your brand and take advantage of it as it happens.
competitive analysis to see what is being said about two competing elements. For instance, searching TrendSpottr for “Obama OR Romney” gives you a very good understanding about how social networks are reacting to each politician. You can also do searches like “$aapl OR $msft OR $goog” to get a sense of what is the current buzz for certain hi tech stocks.
understanding your impact in real time to be able to see which of the content that you are posting is trending the most on social media so that you can highlight it on your main page. So if all of your content is hosted on common domain name (ourbrand.com), searching for ourbrand.com will show you the most active of your site’s content. That can easily be set up by putting a TrendSpottr widget on your front page

Ajay- What are some of the privacy guidelines that you keep in mind- given the fact that you collect individual information but also have government agencies as potential users.

Alain- We take privacy very seriously and anonymize all of the data that we collect. We don’t keep explicit records of the data we collected through the various incoming streams and only store the aggregate results of our analysis.

About–

Alain Chesnais is immediate Past President of ACM, elected for the two-year term beginning July 1, 2010.Chesnais studied at l’Ecole Normale Supérieure de l’Enseignement Technique and l’Université de Paris where he earned a Maîtrise de Mathematiques, a Maitrise de Structure Mathématique de l’Informatique, and a Diplôme d’Etudes Approfondies in Compuer Science. He was a high school student at the United Nations International School in New York, where, along with preparing his International Baccalaureate with a focus on Math, Physics and Chemistry, he also studied Mandarin Chinese.Chesnais recently founded Visual Transitions, which specializes in helping companies move to HTML 5, the newest standard for structuring and presenting content on the World Wide Web. He was the CTO of SceneCaster.com from June 2007 until April 2010, and was Vice President of Product Development at Tucows Inc. from July 2005 – May 2007. He also served as director of engineering at Alias|Wavefront on the team that received an Oscar from the Academy of Motion Picture Arts and Sciences for developing the Maya 3D software package.

Prior to his election as ACM president, Chesnais was vice president from July 2008 – June 2010 as well as secretary/treasurer from July 2006 – June 2008. He also served as president of ACM SIGGRAPH from July 2002 – June 2005 and as SIG Governing Board Chair from July 2000 – June 2002.

As a French citizen now residing in Canada, he has more than 20 years of management experience in the software industry. He joined the local SIGGRAPH Chapter in Paris some 20 years ago as a volunteer and has continued his involvement with ACM in a variety of leadership capacities since then.

About Trendspottr.com

http://trendspottr.com/aboutus.php

TrendSpottr is a real-time viral search and predictive analytics service that identifies the most timely and trending information for any topic or keyword. Our core technology analyzes real-time data streams and spots emerging trends at their earliest acceleration point — hours or days before they have become “popular” and reached mainstream awareness.

TrendSpottr serves as a predictive early warning system for news and media organizations, brands, government agencies and Fortune 500 companies and helps them to identify emerging news, events and issues that have high viral potential and market impact. TrendSpottr has partnered with HootSuite, DataSift and other leading social and big data companies.

JSS launches special edition for GUI for #Rstats

I love GUIs (graphical user interfaces)- they might be TCL/TK based or GTK based or even QT based. As a researcher they help me with faster coding, as a consultant they help with faster transition of projects from startup to handover stage and as an R instructor helps me get people to learn R faster.

I wish Python had some GUIs though 😉

from the open access journal of statistical software-

JSS Special Volume 49: Graphical User Interfaces for R

Graphical User Interfaces for R
Pedro M. Valero-Mora, Ruben Ledesma
Vol. 49, Issue 1, Jun 2012
Submitted 2012-06-03, Accepted 2012-06-03

Url: http://www.jstatsoft.org/v49/i01

Paper: http://www.jstatsoft.org/v49/i01/paper

Integrated Degradation Models in R Using iDEMO
Ya-Shan Cheng, Chien-Yu Peng
Vol. 49, Issue 2, Jun 2012
Submitted 2010-12-31, Accepted 2011-06-29

Url: http://www.jstatsoft.org/v49/i02

Paper: http://www.jstatsoft.org/v49/i02/paper

Glotaran: A Java-Based Graphical User Interface for the R Package TIMP
Joris J. Snellenburg, Sergey Laptenok, Ralf Seger, Katharine M. Mullen, Ivo H. M. van Stokkum
Vol. 49, Issue 3, Jun 2012
Submitted 2011-01-20, Accepted 2011-09-16

Url: http://www.jstatsoft.org/v49/i03

Paper: http://www.jstatsoft.org/v49/i03/paper

A Graphical User Interface for R in a Rich Client Platform for Ecological Modeling
Marcel Austenfeld, Wolfram Beyschlag
Vol. 49, Issue 4, Jun 2012
Submitted 2011-01-05, Accepted 2012-02-20

Url: http://www.jstatsoft.org/v49/i04

Paper: http://www.jstatsoft.org/v49/i04/paper

Closing the Gap between Methodologists and End-Users: R as a Computational Back-End
Byron C. Wallace, Issa J. Dahabreh, Thomas A. Trikalinos, Joseph Lau, Paul Trow, Christopher H. Schmid
Vol. 49, Issue 5, Jun 2012
Submitted 2010-11-01, Accepted 2012-12-20

Url: http://www.jstatsoft.org/v49/i05

Paper: http://www.jstatsoft.org/v49/i05/paper

tourrGui: A gWidgets GUI for the Tour to Explore High-Dimensional Data Using Low-Dimensional Projections
Bei Huang, Dianne Cook, Hadley Wickham
Vol. 49, Issue 6, Jun 2012
Submitted 2011-01-20, Accepted 2012-04-16

Url: http://www.jstatsoft.org/v49/i06

Paper: http://www.jstatsoft.org/v49/i06/paper

The RcmdrPlugin.survival Package: Extending the R Commander Interface to Survival Analysis
John Fox, Marilia S. Carvalho
Vol. 49, Issue 7, Jun 2012
Submitted 2010-12-26, Accepted 2011-12-28

Url: http://www.jstatsoft.org/v49/i07

Paper: http://www.jstatsoft.org/v49/i07/paper

Deducer: A Data Analysis GUI for R
Ian Fellows
Vol. 49, Issue 8, Jun 2012
Submitted 2011-02-28, Accepted 2011-09-08

Url: http://www.jstatsoft.org/v49/i08

Paper: http://www.jstatsoft.org/v49/i08/paper

RKWard: A Comprehensive Graphical User Interface and Integrated Development Environment for Statistical Analysis with R
Stefan Rödiger, Thomas Friedrichsmeier, Prasenjit Kapat, Meik Michalke
Vol. 49, Issue 9, Jun 2012
Submitted 2010-12-28, Accepted 2011-05-06

Url: http://www.jstatsoft.org/v49/i09

Paper: http://www.jstatsoft.org/v49/i09/paper

gWidgetsWWW: Creating Interactive Web Pages within R
John Verzani
Vol. 49, Issue 10, Jun 2012
Submitted 2010-12-17, Accepted 2011-05-11

Url: http://www.jstatsoft.org/v49/i10

Paper: http://www.jstatsoft.org/v49/i10/paper

Oscars and Interfaces
Antony Unwin
Vol. 49, Issue 11, Jun 2012
Submitted 2010-12-08, Accepted 2011-07-15

Url: http://www.jstatsoft.org/v49/i11

Paper: http://www.jstatsoft.org/v49/i11/paper

Google Cloud is finally here

Amazon gets some competition, and customers should see some relief, unless Google withdraws commitment on these products after a few years of trying (like it often does now!)

http://cloud.google.com/products/index.html

Machine Type Pricing
Configuration	Virtual Cores	Memory	GCEU *	Local disk	Price/Hour	$/GCEU/hour
n1-standard-1-d	1	3.75GB ***	2.75	420GB ***	$0.145	0.053
n1-standard-2-d	2	7.5GB	5.5	870GB	$0.29	0.053
n1-standard-4-d	4	15GB	11	1770GB	$0.58	0.053
n1-standard-8-d	8	30GB	22	2 x 1770GB	$1.16	0.053

Network Pricing
Ingress	Free
Egress to the same Zone.	Free
Egress to a different Cloud service within the same Region.	Free
Egress to a different Zone in the same Region (per GB)	$0.01
Egress to a different Region within the US	$0.01 ****
Inter-continental Egress	At Internet Egress Rate

Internet Egress (Americas/EMEA destination) per GB
0-1 TB in a month	$0.12
1-10 TB	$0.11
10+ TB	$0.08

Internet Egress (APAC destination) per GB
0-1 TB in a month	$0.21
1-10 TB	$0.18
10+ TB	$0.15

Persistent Disk Pricing
Provisioned space	$0.10 GB/month
Snapshot storage**	$0.125 GB/month
IO Operations	$0.10 per million

IP Address Pricing
Static IP address (assigned but unused)	$0.01 per hour
Ephemeral IP address (attached to instance)	Free

* GCEU is Google Compute Engine Unit — a measure of computational power of our instances based on industry benchmarks; review the GCEU definition for more information
** coming soon
*** 1GB is defined as 2^30 bytes
**** promotional pricing; eventually will be charged at internet download rates

Google Prediction API

Tap into Google’s machine learning algorithms to analyze data and predict future outcomes.

Leverage machine learning without the complexity
Use the familiar RESTful interface
Enter input in any format – numeric or text

Build smart apps

Learn how you can use Prediction API to build customer sentiment analysis, spam detection or document and email classification.

Google Translation API

Use Google Translate API to build multilingual apps and programmatically translate text in your webpage or application.

Translate text into other languages programmatically
Use the familiar RESTful interface
Take advantage of Google’s powerful translation algorithms

Build multilingual apps

Learn how you can use Translate API to build apps that can programmatically translate text in your applications or websites.

Google BigQuery

Analyze Big Data in the cloud using SQL and get real-time business insights in seconds using Google BigQuery. Use a fully-managed data analysis service with no servers to install or maintain.
Figure

Reliable & Secure

Complete peace of mind as your data is automatically replicated across multiple sites and secured using access control lists.
Scale infinitely

You can store up to hundreds of terabytes, paying only for what you use.
Blazing fast

Run ad hoc SQL queries on
multi-terabyte datasets in seconds.

Google App Engine

Create apps on Google’s platform that are easy to manage and scale. Benefit from the same systems and infrastructure that power Google’s applications.

Focus on your apps

Let us worry about the underlying infrastructure and systems.
Scale infinitely

See your applications scale seamlessly from hundreds to millions of users.
Business ready

Premium paid support and 99.95% SLA for business users

Please share:

Please share:

Please share:

Library Highlights

Why not R?

Please share:

Please share:

Please share:

Please share: