Interview: Pranay Agrawal, Co-Founder, Fractal Analytics
Here is an interview with Pranay Agrawal, Executive Vice President, Global Client Development, at Fractal Analytics, one of India’s leading analytics services providers and one of the pioneers in analytics services delivery.
Ajay- Describe Fractal Analytics’ journey from a startup to a pioneer in the predictive analytics services industry. What key turning points in the field of analytics have you noticed during this time?
Pranay- In 2000, Fractal Analytics started as a pure-play analytics services company in India with a focus on financial services. Five years later, we expanded our operations to the United States and opened new verticals. Today, we have the widest global footprint among analytics providers, with experience handling data and a deep understanding of consumer behavior in over 150 countries. We have matured from an analytics services organization into a productized analytics services firm, specializing in the consumer goods, retail, financial services, insurance and technology verticals.
We are at the forefront of a massive inflection point, with Big Data Analytics at its center. We have witnessed analytics within our clients transform from a cost center into the most critical division driving competitive advantage. Advances in computer science, artificial intelligence, machine learning and game theory are converging quickly, changing how analytics is consumed by B2B and B2C companies. Companies that use analytics well are poised to excel in innovation, customer engagement and business performance.
Ajay- What analytical tools do you use at Fractal Analytics? Have you observed any trends in analytical software usage?
Pranay- We are tool-agnostic: we serve our clients using whatever platforms they need, so they can quickly and effectively operationalize the results we deliver. We use R, SAS, SPSS, Spotfire, Tableau, Xcelsius, WebFOCUS, MicroStrategy and QlikView. We are seeing increased adoption of open source platforms such as R and of specialized dashboard tools like Tableau and QlikView, plus an entire spectrum of emerging tools that process, manage and extract information from Big Data and support Hadoop and NoSQL data structures.
Ajay- What are Fractal Analytics’ plans for Big Data Analytics?
Pranay- We see our clients being overwhelmed by the increasing complexity of their data. While they are all excited by the possibilities of Big Data, they continue to struggle on the ground to realize its full potential. The analytics paradigm is changing in the context of Big Data. Our solutions focus on making it super-simple for our clients while delivering the analytical sophistication that Big Data makes possible.
Let’s take our Customer Genomics solution for retailers as an example. Retailers collect information about shopper behavior through every transaction. They want to transform their business to make it more customer-centric but do not know how to go about it. Our Customer Genomics solution uses advanced machine learning algorithms to label every shopper across more than 80 different dimensions. Retailers use these labels to identify which products they should deep-discount, depending on what price-sensitive shoppers buy. Armed with this intelligence, they are transforming the way they plan their assortments, planograms and targeted promotions.
We are also building harmonization engines using Concordia to enable real-time updates of Customer Genomics based on every direct, social or shopping transaction. This will further bridge the gap between marketing actions and consumer behavior to drive loyalty, market share and profitability.
Ajay- What are some of the key things that differentiate Fractal Analytics from the rest of the industry? How are you different?
Pranay- We are one of the pioneering pure-play analytics firms, with over a decade of experience consulting with Fortune 500 companies. What clients most appreciate about working with us includes:
- Experience managing structured and unstructured Big Data (volume, variety), with a deep understanding of consumer behavior in more than 150 countries
- Advanced analytics leveraging supervised machine-learning platforms
- Proprietary products, for example: Concordia for data harmonization, Customer Genomics for consumer insights and personalized marketing, Pincer for pricing optimization, Eavesdrop for social media listening, Medley for assortment optimization in the retail industry, and Known Value Item for retail stores
- Deep industry expertise enables us to leverage cross-industry knowledge to solve a wide range of marketing problems
- Lowest attrition rates in the industry and very selective hiring process makes us a great place to work
Ajay- What are some of the initiatives that you have taken to ensure employee satisfaction and happiness?
Pranay- We believe happy employees create happy customers. We are building a great place to work by taking a personal interest in grooming people. Our people are highly engaged, as evidenced by 33% of new hires coming through referrals and the highest Glassdoor ratings in our industry.
We recognize the accomplishments and contributions made through many programs such as:
- FractElite – where peers nominate and defend the best of us
- Recognition board – where anyone can write a visible thank you
- Value cards – where anyone can acknowledge great role model behavior in one or more values
- Townhall – a quarterly all hands where we announce anniversaries and FractElite awards, with an open forum to ask questions
- Employee engagement surveys – to measure and report out on satisfaction programs
- Open access to managers and leadership team – to ensure we understand and appreciate each person’s unique goals and ambitions, coach for high performance, and laud their success
Ajay- How happy are Fractal Analytics’ customers, quantitatively? What is your retention rate, and what plans do you have for 2013?
Pranay- As consultants, delivering value with great service is critical to our growth, which has nearly doubled in the last year. Most of our clients have been with us for over five years and we are typically considered a strategic partner.
We conduct client satisfaction surveys during and after each project to measure our performance and identify opportunities to serve our clients better. In 2013, we will continue partnering with our clients to define additional process improvements from applying best practice in engagement management to building more advanced analytics and automated services to put high-impact decisions into our clients’ hands faster.
About-
Pranay Agrawal: Pranay co-founded Fractal Analytics in 2000 and heads client engagement worldwide. He has an MBA from the Indian Institute of Management (IIM) Ahmedabad and a Bachelor’s in Accounting from Bangalore University, and is a certified Financial Risk Manager (FRM) from GARP. He is also available online at http://www.linkedin.com/in/pranayfractal
Fractal Analytics is a provider of predictive analytics and decision sciences to the financial services, insurance, consumer goods, retail, technology, pharma and telecommunications industries. Fractal Analytics helps companies compete on analytics and understand, predict and influence consumer behavior. Over 20 Fortune 500 financial services, consumer packaged goods, retail and insurance companies partner with Fractal to make better data-driven decisions and institutionalize analytics inside their organizations.
Fractal sets up analytical centers of excellence for its clients to tackle tough big data challenges, improve decision management, help understand, predict & influence consumer behavior, increase marketing effectiveness, reduce risk and optimize business results.
Data Frame in Python
Exploring some Python and R packages that let you move between, and work with, both Python and R without melting your brain or exceeding your project deadline
—————————————
If you like R's data.frame structure, there are ways to work with the same kind of structure in Python at a faster processing speed.
Here are three packages that let you do so:
(1) pydataframe http://code.google.com/p/pydataframe/
An implementation of an almost R-like DataFrame object (install via PyPI/pip: “pip install pydataframe”).
Usage:
```python
# The second (column order) and third (row names) arguments are optional.
u = DataFrame({"Field1": [1, 2, 3], "Field2": ['abc', 'def', 'hgi']}, ['Field1', 'Field2'], ["rowOne", "rowTwo", "thirdRow"])
```
A DataFrame is basically a table with rows and columns.
Columns are named; rows are numbered (but can be named) and can be easily selected and calculated upon. Internally, columns are stored as 1-D numpy arrays. If you set row names, they are converted into a dictionary for fast access. There is a rich subselection/slicing API; see help(DataFrame.get_item) (it also works for setting values). Please note that any slice gets you another DataFrame; to access individual entries, use get_row(), get_column() or get_value().
DataFrames also understand basic arithmetic: you can add (multiply, …) either a constant value or another DataFrame of the same size or with the same column names, like this:
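To make the storage layout described above concrete, here is a minimal sketch of the same idea in plain NumPy: named columns held as 1-D arrays, plus an optional row-name dictionary for fast lookup. The class `MiniFrame` is hypothetical and is not pydataframe's code; its `get_column()`/`get_value()` methods only borrow the accessor names mentioned above and do not reproduce the library's signatures.

```python
import numpy as np


class MiniFrame:
    """Hypothetical, simplified column store (NOT pydataframe's implementation)."""

    def __init__(self, columns, row_names=None):
        # Each named column is stored as a 1-D NumPy array
        self.columns = {name: np.asarray(values) for name, values in columns.items()}
        if len({len(col) for col in self.columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        # Optional row names become a dict (name -> position) for O(1) lookup
        self.row_index = None
        if row_names is not None:
            self.row_index = {name: i for i, name in enumerate(row_names)}

    def get_column(self, name):
        return self.columns[name]

    def get_value(self, row, column):
        # Accept a row position, or a row name if row names were set
        if isinstance(row, str):
            row = self.row_index[row]
        return self.columns[column][row]


df = MiniFrame({"Field1": [1, 2, 3], "Field2": ["abc", "def", "hgi"]},
               row_names=["rowOne", "rowTwo", "thirdRow"])
print(df.get_value("rowTwo", "Field2"))  # def
print(df.get_column("Field1").sum())     # 6
```

Keeping each column as a single typed array is what makes whole-column arithmetic fast; the dictionary only matters when you look rows up by name.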
```python
# Multiply every value in ColumnA that is smaller than 5 by 6.
my_df[my_df[:,'ColumnA'] < 5, 'ColumnA'] *= 6
# You always need to specify both a row and a column selector; use : to mean everything.
my_df[:, 'ColumnB'] = my_df[:,'ColumnA'] + my_df[:, 'ColumnC']
# For every row whose ColumnA starts with 'Shu', replace 'Shu' with 'Sha' (via a list comprehension).
select = my_df.where(lambda row: row['ColumnA'].startswith('Shu'))
my_df[select, 'ColumnA'] = [row['ColumnA'].replace('Shu', 'Sha') for row in my_df[select,:].iter_rows()]
```
DataFrames talk directly to R via rpy2 (though rpy2 is not a prerequisite for the library).
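For comparison, here is a sketch of the same three operations in pandas (package 2 below), using boolean masks with `.loc`. The data are hypothetical, and because the snippet above uses ColumnA both as a number and as a string, this version moves the string example to a hypothetical text column, ColumnD.

```python
import pandas as pd

# Hypothetical data; ColumnD stands in for the text column in the string example
my_df = pd.DataFrame({
    "ColumnA": [2, 7, 4],
    "ColumnC": [10, 20, 30],
    "ColumnD": ["Shutter", "Apple", "Shuffle"],
})

# Multiply every value in ColumnA that is smaller than 5 by 6
mask = my_df["ColumnA"] < 5
my_df.loc[mask, "ColumnA"] *= 6

# Assigning to a new column name creates the column
my_df["ColumnB"] = my_df["ColumnA"] + my_df["ColumnC"]

# In rows whose ColumnD starts with 'Shu', replace 'Shu' with 'Sha'
starts = my_df["ColumnD"].str.startswith("Shu")
my_df.loc[starts, "ColumnD"] = my_df.loc[starts, "ColumnD"].str.replace("Shu", "Sha", regex=False)

print(my_df)
```

Unlike the snippet above, pandas does not require a row selector for whole-column access; `.loc[rows, column]` is only needed when you filter rows.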
(2) pandas http://pandas.pydata.org/
Library Highlights
- A fast and efficient DataFrame object for data manipulation with integrated indexing;
- Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
- Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
- Flexible reshaping and pivoting of data sets;
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
- Columns can be inserted and deleted from data structures for size mutability;
- Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets;
- High performance merging and joining of data sets;
- Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure;
- Time series functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging. You can even create domain-specific time offsets and join time series without losing data;
- The library has been ruthlessly optimized for performance, with critical code paths compiled to C;
- Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.
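A few of these highlights can be seen in a short sketch: label-based alignment that introduces NaN for missing labels, a split-apply-combine `groupby`, and a left `merge`. The region names and numbers are made up for illustration.

```python
import pandas as pd

# Arithmetic aligns on labels; a label present on only one side becomes NaN
sales_q1 = pd.Series({"north": 100, "south": 80, "east": 50})
sales_q2 = pd.Series({"north": 120, "south": 90, "west": 40})
total = sales_q1 + sales_q2
filled = sales_q1.add(sales_q2, fill_value=0)  # treat a missing label as 0 instead
print(total)

# Split-apply-combine: group rows by region, then aggregate each group
orders = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south"],
    "amount": [10.0, 20.0, 30.0, 5.0, None],  # one missing value
})
summary = orders.groupby("region")["amount"].agg(["sum", "mean", "count"])

# Left join the per-region summary onto a table of targets
targets = pd.DataFrame({"region": ["north", "south"], "target": [50.0, 25.0]})
report = summary.reset_index().merge(targets, on="region", how="left")
print(report)
```

Note that `count` and `mean` skip the missing amount, so the south region reports a single observation, and east gets NaN for its target because it has no matching row.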
Why not R?
First of all, we love open source R! It is the most widely-used open source environment for statistical modeling and graphics, and it provided some early inspiration for pandas features. R users will be pleased to find this library adopts some of the best concepts of R, like the foundational DataFrame (one user familiar with R has described pandas as “R data.frame on steroids”). But pandas also seeks to solve some frustrations common to R users:
- R has barebones data alignment and indexing functionality, leaving much work to the user. pandas makes it easy and intuitive to work with messy, irregularly indexed data, like time series data. pandas also provides rich tools, like hierarchical indexing, not found in R;
- R is not well-suited to general purpose programming and system development. pandas enables you to do large-scale data processing seamlessly when developing your production applications;
- Hybrid systems connecting R to a low-productivity systems language like Java, C++, or C# suffer from significantly reduced agility and maintainability, and you’re still stuck developing the system components in a low-productivity language;
- The “copyleft” GPL license of R can create concerns for commercial software vendors who want to distribute R with their software under another license. Python and pandas use more permissive licenses.
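To illustrate two of the points above, hierarchical indexing and irregularly indexed time series, here is a small pandas sketch with made-up store sales.

```python
import pandas as pd

# Hierarchical (MultiIndex) rows: (store, date)
idx = pd.MultiIndex.from_tuples(
    [("A", "2012-01-02"), ("A", "2012-01-03"), ("B", "2012-01-02")],
    names=["store", "date"],
)
sales = pd.Series([5, 7, 3], index=idx)
store_a = sales.loc["A"]          # all rows for store A, indexed by date
wide = sales.unstack("store")     # stores become columns; missing cells are NaN
print(wide)

# An irregularly spaced time series placed onto a regular daily calendar
ts = pd.Series([1.0, 2.0, 4.0],
               index=pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-05"]))
daily = ts.asfreq("D")            # inserts the missing days as NaN
filled = daily.ffill()            # forward-fill the gaps
print(filled)
```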
(3) datamatrix http://pypi.python.org/pypi/datamatrix/0.8
datamatrix is a Pythonic implementation of R’s data.frame structure (the link above points to version 0.8; PyPI listed 0.9 as the latest version).
This module allows access to comma- or other delimiter-separated files as if they were tables, using a dictionary-like syntax. DataMatrix objects can be manipulated, rows and columns added and removed, or even transposed.
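I have not reproduced datamatrix's own API here. Instead, this is a standard-library sketch (using `csv.DictReader`) of the idea described above: reading a delimiter-separated file into a dictionary-like table, adding and removing columns, and transposing it. The tab-delimited sample data are hypothetical.

```python
import csv
import io

# Hypothetical tab-delimited data; in practice this could come from open("file.tsv")
raw = "name\tage\tcity\nAsha\t31\tPune\nBen\t27\tOslo\n"

# Read rows as dicts keyed by the header line
rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))

# Column-wise, dictionary-like view: column name -> list of values
table = {col: [row[col] for row in rows] for col in rows[0]}
print(table["city"])

# Add a column, then remove one (values stay strings, as read from the file)
table["age_next_year"] = [str(int(a) + 1) for a in table["age"]]
del table["city"]

# Transpose: each original row becomes an entry keyed by its 'name' value
transposed = {
    name: {col: values[i] for col, values in table.items() if col != "name"}
    for i, name in enumerate(table["name"])
}
print(transposed["Ben"])
```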
—————————————————————–
Modeling in Python
JSS launches special volume on GUIs for #Rstats
I love GUIs (graphical user interfaces), whether they are Tcl/Tk-based, GTK-based or even Qt-based. As a researcher, they help me code faster; as a consultant, they help me move projects faster from startup to the handover stage; and as an R instructor, they help me get people to learn R faster.
I wish Python had some GUIs like these, though.
From the open-access Journal of Statistical Software:
JSS Special Volume 49: Graphical User Interfaces for R
| Issue (Vol. 49, Jun 2012) | Authors | Submitted | Accepted |
|---|---|---|---|
| 1 | Pedro M. Valero-Mora, Ruben Ledesma | 2012-06-03 | 2012-06-03 |
| 2 | Ya-Shan Cheng, Chien-Yu Peng | 2010-12-31 | 2011-06-29 |
| 3 | Joris J. Snellenburg, Sergey Laptenok, Ralf Seger, Katharine M. Mullen, Ivo H. M. van Stokkum | 2011-01-20 | 2011-09-16 |
| 4 | Marcel Austenfeld, Wolfram Beyschlag | 2011-01-05 | 2012-02-20 |
| 5 | Byron C. Wallace, Issa J. Dahabreh, Thomas A. Trikalinos, Joseph Lau, Paul Trow, Christopher H. Schmid | 2010-11-01 | 2012-12-20 |
| 6 | Bei Huang, Dianne Cook, Hadley Wickham | 2011-01-20 | 2012-04-16 |
| 7 | John Fox, Marilia S. Carvalho | 2010-12-26 | 2011-12-28 |
| 8 | Ian Fellows | 2011-02-28 | 2011-09-08 |
| 9 | Stefan Rödiger, Thomas Friedrichsmeier, Prasenjit Kapat, Meik Michalke | 2010-12-28 | 2011-05-06 |
| 10 | John Verzani | 2010-12-17 | 2011-05-11 |
| 11 | Antony Unwin | 2010-12-08 | 2011-07-15 |
Google Cloud is finally here
Amazon gets some competition, and customers should see some relief, unless Google withdraws its commitment to these products after a few years of trying (as it often does now!).
http://cloud.google.com/products/index.html
Machine Type Pricing

| Configuration | Virtual Cores | Memory | GCEU * | Local disk | Price/Hour | $/GCEU/hour |
|---|---|---|---|---|---|---|
| n1-standard-1-d | 1 | 3.75GB *** | 2.75 | 420GB *** | $0.145 | 0.053 |
| n1-standard-2-d | 2 | 7.5GB | 5.5 | 870GB | $0.29 | 0.053 |
| n1-standard-4-d | 4 | 15GB | 11 | 1770GB | $0.58 | 0.053 |
| n1-standard-8-d | 8 | 30GB | 22 | 2 x 1770GB | $1.16 | 0.053 |
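The last column is simply the hourly price divided by the GCEU rating. The sketch below recomputes it from the table and, as an assumption not stated on the pricing page, also estimates the cost of running each instance continuously for a 30-day (720-hour) month.

```python
# Launch list prices from the table above: name -> (GCEU, USD per hour)
machine_types = {
    "n1-standard-1-d": (2.75, 0.145),
    "n1-standard-2-d": (5.5, 0.29),
    "n1-standard-4-d": (11, 0.58),
    "n1-standard-8-d": (22, 1.16),
}
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, instance always on

per_gceu = {}
monthly = {}
for name, (gceu, price_per_hour) in machine_types.items():
    per_gceu[name] = price_per_hour / gceu
    monthly[name] = price_per_hour * HOURS_PER_MONTH
    print(f"{name}: ${per_gceu[name]:.3f}/GCEU/hour, ~${monthly[name]:.2f}/month")
```

Every configuration works out to the same $0.053 per GCEU-hour, so the machine types differ in size, not in unit price.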
Network Pricing

| Traffic | Price |
|---|---|
| Ingress | Free |
| Egress to the same Zone | Free |
| Egress to a different Cloud service within the same Region | Free |
| Egress to a different Zone in the same Region (per GB) | $0.01 |
| Egress to a different Region within the US | $0.01 **** |
| Inter-continental Egress | At Internet Egress Rate |
| Internet Egress, Americas/EMEA destination, 0-1 TB in a month (per GB) | $0.12 |
| Internet Egress, Americas/EMEA destination, 1-10 TB (per GB) | $0.11 |
| Internet Egress, Americas/EMEA destination, 10+ TB (per GB) | $0.08 |
| Internet Egress, APAC destination, 0-1 TB in a month (per GB) | $0.21 |
| Internet Egress, APAC destination, 1-10 TB (per GB) | $0.18 |
| Internet Egress, APAC destination, 10+ TB (per GB) | $0.15 |
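Tiered egress pricing is easy to misread, so here is a sketch of a monthly internet-egress estimate. Two assumptions are mine, not Google's: tiers are graduated (each rate applies only to the traffic within its band), and 1 TB is taken as 1024 GB, matching the table's 1 GB = 2^30 bytes.

```python
# Internet egress tiers from the table above: (tier upper bound in TB, USD per GB)
AMERICAS_EMEA = [(1, 0.12), (10, 0.11), (float("inf"), 0.08)]
APAC = [(1, 0.21), (10, 0.18), (float("inf"), 0.15)]

GB_PER_TB = 1024  # assumption: binary TB, consistent with 1 GB = 2^30 bytes


def monthly_egress_cost(tb, tiers):
    """Estimated cost of `tb` terabytes of internet egress in one month.

    Assumes graduated tiers: each rate applies only to the traffic
    that falls inside its band.
    """
    cost, lower = 0.0, 0.0
    for upper, rate in tiers:
        band_tb = min(tb, upper) - lower
        if band_tb <= 0:
            break
        cost += band_tb * GB_PER_TB * rate
        lower = upper
    return cost


print(round(monthly_egress_cost(5, AMERICAS_EMEA), 2))
```

Under those assumptions, 5 TB to Americas/EMEA destinations comes to 1024 × $0.12 + 4096 × $0.11 = $573.44.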
Persistent Disk Pricing

| Item | Price |
|---|---|
| Provisioned space | $0.10 per GB/month |
| Snapshot storage ** | $0.125 per GB/month |
| IO Operations | $0.10 per million |
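Similarly, a monthly persistent-disk estimate is a sum of three linear terms. The function name and sample figures below are hypothetical; the rates are the list prices above (snapshot storage was listed as coming soon).

```python
# Persistent-disk list prices from the table above (USD)
PROVISIONED_PER_GB_MONTH = 0.10
SNAPSHOT_PER_GB_MONTH = 0.125   # listed as "coming soon"
IO_PER_MILLION_OPS = 0.10


def monthly_disk_cost(provisioned_gb, snapshot_gb=0, io_ops=0):
    """Hypothetical helper: estimated persistent-disk bill for one month."""
    return (provisioned_gb * PROVISIONED_PER_GB_MONTH
            + snapshot_gb * SNAPSHOT_PER_GB_MONTH
            + io_ops / 1_000_000 * IO_PER_MILLION_OPS)


# 200 GB provisioned, 50 GB of snapshots, 30 million I/O operations
print(round(monthly_disk_cost(200, 50, 30_000_000), 2))
```

With those inputs the estimate is $20.00 + $6.25 + $3.00 = $29.25.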
IP Address Pricing

| Item | Price |
|---|---|
| Static IP address (assigned but unused) | $0.01 per hour |
| Ephemeral IP address (attached to instance) | Free |
* GCEU = Google Compute Engine Unit, a unit of CPU capacity
** coming soon
*** 1 GB is defined as 2^30 bytes
**** promotional pricing; eventually will be charged at internet download rates
Google Prediction API
Tap into Google’s machine learning algorithms to analyze data and predict future outcomes.
- Leverage machine learning without the complexity
- Use the familiar RESTful interface
- Enter input in any format: numeric or text
Build smart apps
Learn how you can use Prediction API to build customer sentiment analysis, spam detection or document and email classification.
Google Translation API
Use Google Translate API to build multilingual apps and programmatically translate text in your webpage or application.
- Translate text into other languages programmatically
- Use the familiar RESTful interface
- Take advantage of Google’s powerful translation algorithms
Build multilingual apps
Learn how you can use Translate API to build apps that can programmatically translate text in your applications or websites.
Google BigQuery
Analyze Big Data in the cloud using SQL and get real-time business insights in seconds using Google BigQuery. Use a fully-managed data analysis service with no servers to install or maintain.
- Reliable & Secure: Complete peace of mind, as your data is automatically replicated across multiple sites and secured using access control lists.
- Scale infinitely: You can store up to hundreds of terabytes, paying only for what you use.
- Blazing fast: Run ad hoc SQL queries on multi-terabyte datasets in seconds.
Google App Engine
Create apps on Google’s platform that are easy to manage and scale. Benefit from the same systems and infrastructure that power Google’s applications.
- Focus on your apps: Let us worry about the underlying infrastructure and systems.
- Scale infinitely: See your applications scale seamlessly from hundreds to millions of users.
- Business ready: Premium paid support and a 99.95% SLA for business users.
R for Predictive Modeling: PAW Toronto
A nice workshop on using R for predictive modeling, taught by Max Kuhn (Director, Nonclinical Statistics, Pfizer), is on at PAW Toronto.
Workshop
Monday, April 23, 2012 in Toronto
Full-day: 9:00am – 4:30pm
R for Predictive Modeling:
A Hands-On Introduction
Intended Audience: Practitioners who wish to learn how to execute on predictive analytics by way of the R language; anyone who wants “to turn ideas into software, quickly and faithfully.”
Knowledge Level: Either hands-on experience with predictive modeling (without R) or hands-on familiarity with any programming language (other than R) is sufficient background and preparation to participate in this workshop.

Workshop Description
This one-day session provides a hands-on introduction to R, the well-known open-source platform for data analysis. Real examples are employed in order to methodically expose attendees to best practices driving R and its rich set of predictive modeling packages, providing hands-on experience and know-how. R is compared to other data analysis platforms, and common pitfalls in using R are addressed.
The instructor, a leading R developer and the creator of CARET, a core R package that streamlines the process for creating predictive models, will guide attendees on hands-on execution with R, covering:
- A working knowledge of the R system
- The strengths and limitations of the R language
- Preparing data with R, including splitting, resampling and variable creation
- Developing predictive models with R, including decision trees, support vector machines and ensemble methods
- Visualization: Exploratory Data Analysis (EDA), and tools that persuade
- Evaluating predictive models, including viewing lift curves, variable importance and avoiding overfitting
Hardware: Bring Your Own Laptop
Each workshop participant is required to bring their own laptop running Windows or OS X. The software used during this training program, R, is free and readily available for download.
Attendees receive an electronic copy of the course materials and related R code at the conclusion of the workshop.
Schedule
- Workshop starts at 9:00am
- Morning Coffee Break at 10:30am – 11:00am
- Lunch provided at 12:30 – 1:15pm
- Afternoon Coffee Break at 2:30pm – 3:00pm
- End of the Workshop: 4:30pm
Instructor
Max Kuhn, Director, Nonclinical Statistics, Pfizer
Max Kuhn is a Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut. He has been applying models in the pharmaceutical industry for over 15 years.
He is a leading R developer and the author of several R packages including the CARET package that provides a simple and consistent interface to over 100 predictive models available in R.
Mr. Kuhn has taught courses on modeling within Pfizer and externally, including a class for the India Ministry of Information Technology.
Source-
http://www.predictiveanalyticsworld.com/toronto/2012/r_for_predictive_modeling.php





