Home » Posts tagged 'personal'

Tag Archives: personal

Google Visualization Tools Can Help You Build a Personal Dashboard

The Google Visualization API is a great way for people to make dashboards with slick graphics based  on data without getting into the fine print of the scripting language  itself.  It utilizes the same tools as Google itself does, and makes visualizing data using API calls to the Visualization API. Thus a real-time customizable dashboard that is publishable to the internet can be created within minutes, and more importantly insights can be much more easily drawn from graphs than from looking at rows of tables and numbers.

  1. There are 41 gadgets (including made by both Google and third-party developers ) available in the Gadget  Gallery ( https://developers.google.com/chart/interactive/docs/gadgetgallery)
  2. There are 12 kinds of charts available in the Chart Gallery (https://developers.google.com/chart/interactive/docs/gallery) .
  3. However there 26 additional charts in the charts page at https://developers.google.com/chart/interactive/docs/more_charts )

Building and embedding charts is simplified to a few steps

  • Load the AJAX API
  • Load the Visualization API and the appropriate package (like piechart or barchart from the kinds of chart)
  • Set a callback to run when the Google Visualization API is loaded
    • Within the Callback – It creates and populates a data table, instantiates the particular chart type chosen, passes in the data and draws it.
    • Create the data table with appropriately named columns and data rows.
    • Set chart options with Title, Width and Height
  • Instantiate and draw the chart, passing in some options including the name and id
  • Finally write the HTML/ Div that will hold the chart

You can simply copy and paste the code directly from https://developers.google.com/chart/interactive/docs/quick_start without getting into any details, and tweak them according to your data, chart preference and voila your web dashboard is ready!
That is the beauty of working with API- you can create and display genius ideas without messing with the scripting languages and code (too much). If you like to dive deeper into the API, you can look at the various objects at https://developers.google.com/chart/interactive/docs/reference

First launched in Mar 2008, Google Visualization API has indeed come a long way in making dashboards easier to build for people wanting to utilize advanced data visualization . It came about directly as a result of Google’s 2007 acquisition of GapMinder (of Hans Rosling fame).
As invariably and inevitably computing shifts to the cloud, visualization APIs will be very useful. Tableau Software has been a pioneer in selling data visualizing to the lucrative business intelligence and business dashboards community (you can see the Tableau Software API at http://onlinehelp.tableausoftware.com/v7.0/server/en-us/embed_api.htm ), and Google Visualization can do the same and capture business dashboard and visualization market , if there is more focus on integrating it from Google in it’s multiple and often confusing API offerings.
However as of now, this is quite simply the easiest way to create a web dashboard for your personal needs. Google guarantees 3 years of backward compatibility with this API and it is completely free.

Interview BigML.com

Here is an interview with Charlie Parker, head of large scale online algorithms at http://bigml.com

Ajay-  Describe your own personal background in scientific computing, and how you came to be involved with machine learning, cloud computing and BigML.com

Charlie- I am a machine learning Ph.D. from Oregon State University. Francisco Martin (our founder and CEO), Adam Ashenfelter (the lead developer on the tree algorithm), and myself were all studying machine learning at OSU around the same time. We all went our separate ways after that.

Francisco started Strands and turned it into a 100+ million dollar company building recommender systems. Adam worked for CleverSet, a probabilistic modeling company that was eventually sold to Cisco, I believe. I worked for several years in the research labs at Eastman Kodak on data mining, text analysis, and computer vision.

When Francisco left Strands to start BigML, he brought in Justin Donaldson who is a brilliant visualization guy from Indiana, and an ex-Googler named Jose Ortega who is responsible for most of our data infrastructure. They pulled in Adam and I a few months later. We also have Poul Petersen, a former Strands employee, who manages our herd of servers. He is a wizard and makes everyone else’s life much easier.

Ajay- You use clojure for the back end of BigML.com .Are there any other languages and packages you are considering? What makes clojure such a good fit for cloud computing ?

Charlie- Clojure is a great language because it offers you all of the benefits of Java (extensive libraries, cross-platform compatibility, easy integration with things like Hadoop, etc.) but has the syntactical elegance of a functional language. This makes our code base small and easy to read as well as powerful.

We’ve had occasional issues with speed, but that just means writing the occasional function or library in Java. As we build towards processing data at the Terabyte level, we’re hoping to create a framework that is language-agnostic to some extent. So if we have some great machine learning code in C, for example, we’ll use Clojure to tie everything together, but the code that does the heavy lifting will still be in C. For the API and Web layers, we use Python and Django, and Justin is a huge fan of HaXe for our visualizations.

 Ajay- Current support is for Decision Trees. When can we see SVM, K Means Clustering and Logit Regression?

Charlie- Right now we’re focused on perfecting our infrastructure and giving you new ways to put data in the system, but expect to see more algorithms appearing in the next few months. We want to make sure they are as beautiful and easy to use as the trees are. Without giving too much away, the first new thing we will probably introduce is an ensemble method of some sort (such as Boosting or Bagging). Clustering is a little further away but we’ll get there soon!

Ajay- How can we use the BigML.com API using R and Python.

Charlie- We have a public github repo for the language bindings. https://github.com/bigmlcom/io Right now, there there are only bash scripts but that should change very soon. The python bindings should be there in a matter of days, and the R bindings in probably a week or two. Clojure and Java bindings should follow shortly after that. We’ll have a blog post about it each time we release a new language binding. http://blog.bigml.com/

Ajay-  How can we predict large numbers of observations using a Model  that has been built and pruned (model scoring)?

Charlie- We are in the process of refactoring our backend right now for better support for batch prediction and model evaluation. This is something that is probably only a few weeks away. Keep your eye on our blog for updates!

Ajay-  How can we export models built in BigML.com for scoring data locally.

Charlie- This is as simple as a call to our API. https://bigml.com/developers/models The call gives you a JSON object representing the tree that is roughly equivalent to a PMML-style representation.

About-

You can read about Charlie Parker at http://www.linkedin.com/pub/charles-parker/11/85b/4b5 and the rest of the BigML team at

https://bigml.com/team

 

Interview Michal Kosinski , Concerto Web Based App using #Rstats

Here is an interview with Michal Kosinski , leader of the team that has created Concerto – a web based application using R. What is Concerto? As per http://www.psychometrics.cam.ac.uk/page/300/concerto-testing-platform.htm

Concerto is a web based, adaptive testing platform for creating and running rich, dynamic tests. It combines the flexibility of HTML presentation with the computing power of the R language, and the safety and performance of the MySQL database. It’s totally free for commercial and academic use, and it’s open source

Ajay-  Describe your career in science from high school to this point. What are the various stats platforms you have trained on- and what do you think about their comparative advantages and disadvantages?  

Michal- I started with maths, but quickly realized that I prefer social sciences – thus after one year, I switched to a psychology major and obtained my MSc in Social Psychology with a specialization in Consumer Behaviour. At that time I was mostly using SPSS – as it was the only statistical package that was taught to students in my department. Also, it was not too bad for small samples and the rather basic analyses I was performing at that time.

 

My more recent research performed during my Mphil course in Psychometrics at Cambridge University followed by my current PhD project in social networks and research work at Microsoft Research, requires significantly more powerful tools. Initially, I tried to squeeze as much as possible from SPSS/PASW by mastering the syntax language. SPSS was all I knew, though I reached its limits pretty quickly and was forced to switch to R. It was a pretty dreary experience at the start, switching from an unwieldy but familiar environment into an unwelcoming command line interface, but I’ve quickly realized how empowering and convenient this tool was.

 

I believe that a course in R should be obligatory for all students that are likely to come close to any data analysis in their careers. It is really empowering – once you got the basics you have the potential to use virtually any method there is, and automate most tasks related to analysing and processing data. It is also free and open-source – so you can use it wherever you work. Finally, it enables you to quickly and seamlessly migrate to other powerful environments such as Matlab, C, or Python.

Ajay- What was the motivation behind building Concerto?

Michal- We deal with a lot of online projects at the Psychometrics Centre – one of them attracted more than 7 million unique participants. We needed a powerful tool that would allow researchers and practitioners to conveniently build and deliver online tests.

Also, our relationships with the website designers and software engineers that worked on developing our tests were rather difficult. We had trouble successfully explaining our needs, each little change was implemented with a delay and at significant cost. Not to mention the difficulties with embedding some more advanced methods (such as adaptive testing) in our tests.

So we created a tool allowing us, psychometricians, to easily develop psychometric tests from scratch an publish them online. And all this without having to hire software developers.

Ajay -Why did you choose R as the background for Concerto? What other languages and platforms did you consider. Apart from Concerto, how else do you utilize R in your center, department and University?

Michal- R was a natural choice as it is open-source, free, and nicely integrates with a server environment. Also, we believe that it is becoming a universal statistical and data processing language in science. We put increasing emphasis on teaching R to our students and we hope that it will replace SPSS/PASW as a default statistical tool for social scientists.

Ajay -What all can Concerto do besides a computer adaptive test?

Michal- We did not plan it initially, but Concerto turned out to be extremely flexible. In a nutshell, it is a web interface to R engine with a built-in MySQL database and easy-to-use developer panel. It can be installed on both Windows and Unix systems and used over the network or locally.

Effectively, it can be used to build any kind of web application that requires a powerful and quickly deployable statistical engine. For instance, I envision an easy to use website (that could look a bit like SPSS) allowing students to analyse their data using a web browser alone (learning the underlying R code simultaneously). Also, the authors of R libraries (or anyone else) could use Concerto to build user-friendly web interfaces to their methods.

Finally, Concerto can be conveniently used to build simple non-adaptive tests and questionnaires. It might seem to be slightly less intuitive at first than popular questionnaire services (such us my favourite Survey Monkey), but has virtually unlimited flexibility when it comes to item format, test flow, feedback options, etc. Also, it’s free.

Ajay- How do you see the cloud computing paradigm growing? Do you think browser based computation is here to stay?

Michal - I believe that cloud infrastructure is the future. Dynamically sharing computational and network resources between online service providers has a great competitive advantage over traditional strategies to deal with network infrastructure. I am sure the security concerns will be resolved soon, finishing the transformation of the network infrastructure as we know it. On the other hand, however, I do not see a reason why client-side (or browser) processing of the information should cease to exist – I rather think that the border between the cloud and personal or local computer will continually dissolve.

About

Michal Kosinski is Director of Operations for The Psychometrics Centre and Leader of the e-Psychometrics Unit. He is also a research advisor to the Online Services and Advertising group at the Microsoft Research Cambridge, and a visiting lecturer at the Department of Mathematics in the University of Namur, Belgium. You can read more about him at http://www.michalkosinski.com/

You can read more about Concerto at http://code.google.com/p/concerto-platform/ and http://www.psychometrics.cam.ac.uk/page/300/concerto-testing-platform.htm

Analytics for Cyber Conflict

 

The emerging use of Analytics and Knowledge Discovery in Databases for Cyber Conflict and Trade Negotiations

 

The blog post is the first in series or articles on cyber conflict and the use of analytics for targeting in both offense and defense in conflict situations.

 

It covers knowledge discovery in four kinds of databases (so chosen because of perceived importance , sensitivity, criticality and functioning of the geopolitical economic system)-

  1. Databases on Unique Identity Identifiers- including next generation biometric databases connected to Government Initiatives and Banking, and current generation databases of identifiers like government issued documents made online
  2. Databases on financial details -This includes not only traditional financial service providers but also online databases with payment details collected by retail product selling corporates like Sony’s Playstation Network, Microsoft ‘s XBox and
  3. Databases on contact details – including those by offline businesses collecting marketing databases and contact details
  4. Databases on social behavior- primarily collected by online businesses like Facebook , and other social media platforms.

It examines the role of

  1. voluntary privacy safeguards and government regulations ,

  2. weak cryptographic security of databases,

  3. weakness in balancing marketing ( maximized data ) with privacy (minimized data)

  4. and lastly the role of ownership patterns in database owning corporates

A small distinction between cyber crime and cyber conflict is that while cyber crime focusses on stealing data, intellectual property and information  to primarily maximize economic gains

cyber conflict focuses on stealing information and also disrupt effective working of database backed systems in order to gain notional competitive advantages in economics as well as geo-politics. Cyber terrorism is basically cyber conflict by non-state agents or by designated terrorist states as defined by the regulations of the “target” entity. A cyber attack is an offensive action related to cyber-infrastructure (like the Stuxnet worm that disabled uranium enrichment centrifuges of Iran). Cyber attacks and cyber terrorism are out of scope of this paper, we will concentrate on cyber conflicts involving databases.

Some examples are given here-

Types of Knowledge Discovery in -

1) Databases on Unique Identifiers- including biometric databases.

Unique Identifiers or primary keys for identifying people are critical for any intensive knowledge discovery program. The unique identifier generated must be extremely secure , and not liable to reverse engineering of the cryptographic hash function.

For biometric databases, an interesting possibility could be determining the ethnic identity from biometric information, and also mapping relatives. Current biometric information that is collected is- fingerprint data, eyes iris data, facial data. A further feature could be adding in voice data as a part of biometric databases.

This is subject to obvious privacy safeguards.

For example, Google recently unveiled facial recognition to unlock Android 4.0 mobiles, only to find out that the security feature could easily be bypassed by using a photo of the owner.

 

 

Example of Biometric Databases

In Afghanistan more than 2 million Afghans have contributed iris, fingerprint, facial data to a biometric database. In India, 121 million people have already been enrolled in the largest biometric database in the world. More than half a million customers of the Tokyo Mitsubishi Bank are are already using biometric verification at ATMs.

Examples of Breached Online Databases

In 2011, Playstation Network by Sony (PSN) lost data of 77 million customers including personal information and credit card information. Additionally data of 24 million customers were lost by Sony’s Sony Online Entertainment. The websites of open source platforms like SourceForge, WineHQ and Kernel.org were also broken into 2011. Even retailers like McDonald and Walgreen reported database breaches.

 

The role of cyber conflict arises in the following cases-

  1. Databases are online for accessing and authentication by proper users. Databases can be breached remotely by non-owners ( or “perpetrators”) non with much lesser chance of intruder identification, detection and penalization by regulators, or law enforcers (or “protectors”) than offline modes of intellectual property theft.

  2. Databases are valuable to external agents (or “sponsors”) subsidizing ( with finance, technology, information, motivation) the perpetrators for intellectual property theft. Databases contain information that can be used to disrupt the functioning of a particular economy, corporation (or “ primary targets”) or for further chain or domino effects in accessing other data (or “secondary targets”)

  3. Loss of data is more expensive than enhanced cost of security to database owners

  4. Loss of data is more disruptive to people whose data is contained within the database (or “customers”)

So the role play for different people for these kind of databases consists of-

1) Customers- who are in the database

2) Owners -who own the database. They together form the primary and secondary targets.

3) Protectors- who help customers and owners secure the databases.

and

1) Sponsors- who benefit from the theft or disruption of the database

2) Perpetrators- who execute the actual theft and disruption in the database

The use of topic models and LDA is known for making data reduction on text, and the use of data visualization including tied to GPS based location data is well known for investigative purposes, but the increasing complexity of both data generation and the sophistication of machine learning driven data processing makes this an interesting area to watch.

 

 

The next article in this series will cover-

the kind of algorithms that are currently or being proposed for cyber conflict, the role of non state agents , and what precautions can knowledge discovery in databases practitioners employ to avoid breaches of security, ethics, and regulation.

Citations-

  1. Michael A. Vatis , CYBER ATTACKS DURING THE WAR ON TERRORISM: A PREDICTIVE ANALYSIS Dartmouth College (Institute for Security Technology Studies).
  2. From Data Mining to Knowledge Discovery in Databases Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyt

Timo Elliott on 2012

Continuing the DecisionStats series on  trends for 2012, Timo Elliott , Technology Evangelist  at SAP Business Objects, looks at the predictions he made in the beginning of  2011 and follows up with the things that surprised him in 2011, and what he foresees in 2012.

You can read last year’s predictions by Mr Elliott at http://www.decisionstats.com/brief-interview-timo-elliott/

Timo- Here are my comments on the “top three analytics trends” predictions I made last year:

(1) Analytics, reinvented. New DW techniques make it possible to do sub-second, interactive analytics directly against row-level operational data. Now BI processes and interfaces need to be rethought and redesigned to make best use of this — notably by blurring the distinctions between the “design” and “consumption” phases of BI.

I spent most of 2011 talking about this theme at various conferences: how existing BI technology israpidly becoming obsolete and how the changes are akin to the move from film to digital photography. Technology that has been around for many years (in-memory, column stores, datawarehouse appliances, etc.) came together to create exciting new opportunities and even generally-skeptical industry analysts put out press releases such as “Gartner Says Data Warehousing Reaching Its Most Significant Inflection Point Since Its Inception.” Some of the smaller BI vendors had been pushing in-memory analytics for years, but the general market started paying more attention when megavendors like SAP started painting a long-term vision of in-memory becoming a core platform for applications, not just analytics. Database leader Oracle was forced to upgrade their in-memory messaging from “It’s a complete fantasy” to “we have that too”.

(2) Corporate and personal BI come together. The ability to mix corporate and personal data for quick, pragmatic analysis is a common business need. The typical solution to the problem — extracting and combining the data into a local data store (either Excel or a departmental data mart) — pleases users, but introduces duplication and extra costs and makes a mockery of information governance. 2011 will see the rise of systems that let individuals and departments load their data into personal spaces in the corporate environment, allowing pragmatic analytic flexibility without compromising security and governance.

The number of departmental “data discovery” initiatives continued to rise through 2011, but new tools do make it easier for business people to upload and manipulate their own information while using the corporate standards. 2012 will see more development of “enterprise data discovery” interfaces for casual users.

(3) The next generation of business applications. Where are the business applications designed to support what people really do all day, such as implementing this year’s strategy, launching new products, or acquiring another company? 2011 will see the first prototypes of people-focused, flexible, information-centric, and collaborative applications, bringing together the best of business intelligence, “enterprise 2.0”, and existing operational applications.

2011 saw the rise of sophisticated, user-centric mobile applications that combine data from corporate systems with GPS mapping and the ability to “take action”, such as mobile medical analytics for doctors or mobile beauty advisor applications, and collaborative BI started becoming a standard part of enterprise platforms.

And one that should happen, but probably won’t: (4) Intelligence = Information + PEOPLE. Successful analytics isn’t about technology — it’s about people, process, and culture. The biggest trend in 2011 should be organizations spending the majority of their efforts on user adoption rather than technical implementation.

Unsurprisingly, there was still high demand for presentations on why BI projects fail and how to implement BI competency centers.  The new architectures probably resulted in even more emphasis on technology than ever, while business peoples’ expectations skyrocketed, fueled by advances in the consumer world. The result was probably even more dissatisfaction in the past, but the benefits of the new architectures should start becoming clearer during 2012.

What surprised me the most:

The rapid rise of Hadoop / NoSQL. The potentials of the technology have always been impressive, but I was surprised just how quickly these technology has been used to address real-life business problems (beyond the “big web” vendors where it originated), and how quickly it is becoming part of mainstream enterprise analytic architectures (e.g. Sybase IQ 15.4 includes native MapReduce APIs, Hadoop integration and federation, etc.)

Prediction for 2012:

As I sat down to gather my thoughts about BI in 2012, I quickly came up with the same long laundry list of BI topics as everybody else: in-memory, mobile, predictive, social, collaborative decision-making, data discovery, real-time, etc. etc.  All of these things are clearly important, and where going to continue to see great improvements this year. But I think that the real “next big thing” in BI is what I’m seeing when I talk to customers: they’re using these new opportunities not only to “improve analytics” but also fundamentally rethink some of their key business processes.

Instead of analytics being something that is used to monitor and eventually improve a business process, analytics is becoming a more fundamental part of the business process itself. One example is a large telco company that has transformed the way they attract customers. Instead of laboriously creating a range of rate plans, promoting them, and analyzing the results, they now use analytics to automatically create hundreds of more complex, personalized rate plans. They then throw them out into the market, monitor in real time, and quickly cull any that aren’t successful. It’s a way of doing business that would have been inconceivable in the past, and a lot more common in the future.

 

About

 

Timo Elliott

Timo Elliott is a 20-year veteran of SAP BusinessObjects, and has spent the last quarter-century working with customers around the world on information strategy.

He works closely with SAP research and innovation centers around the world to evangelize new technology prototypes.

His popular Business Analytics blog tracks innovation in analytics and social media, including topics such as augmented corporate reality, collaborative decision-making, and social network analysis.

His PowerPoint Twitter Tools lets presenters see and react to tweets in real time, embedded directly within their slides.

A popular and engaging speaker, Elliott presents regularly to IT and business audiences at international conferences, on subjects such as why BI projects fail and what to do about it, and the intersection of BI and enterprise 2.0.

Prior to Business Objects, Elliott was a computer consultant in Hong Kong and led analytics projects for Shell in New Zealand. He holds a first-class honors degree in Economics with Statistics from Bristol University, England

Timo can be contacted via Twitter at https://twitter.com/timoelliott

 Part 1 of this series was from James Kobielus, Forrestor at http://www.decisionstats.com/jim-kobielus-on-2012/

Should you buy Zynga or Wait for the FB IPO

I am going to make a case for whether to buy or not buy  Zynga, and waiting to buy Facebook instead. Of course if Mark Pincus offers you a deep discount, and Mark Zuckenberg totally goes over the top with his P/E multiple, all bets would be re-valuated.

In the interest of your time, and my personal happiness, I am going to use a fairly standard way to measure attractiveness of both these companies- notably the Porter’s Five Forces Model. I will also review the recent experiences of Groupon and LinkedIn valuation to underscore what subtle differences in culture, and reputation of founders can affect the eventual value creation or destruction in an IPO.

(to be continued)

Interview Zach Goldberg, Google Prediction API

Here is an interview with Zach Goldberg, who is the product manager of Google Prediction API, the next generation machine learning analytics-as-an-api service state of the art cloud computing model building browser app.
Ajay- Describe your journey in science and technology from high school to your current job at Google.

Zach- First, thanks so much for the opportunity to do this interview Ajay!  My personal journey started in college where I worked at a startup named Invite Media.   From there I transferred to the Associate Product Manager (APM) program at Google.  The APM program is a two year rotational program.  I did my first year working in display advertising.  After that I rotated to work on the Prediction API.

Ajay- How does the Google Prediction API help an average business analytics customer who is already using enterprise software , servers to generate his business forecasts. How does Google Prediction API fit in or complement other APIs in the Google API suite.

Zach- The Google Prediction API is a cloud based machine learning API.  We offer the ability for anybody to sign up and within a few minutes have their data uploaded to the cloud, a model built and an API to make predictions from anywhere. Traditionally the task of implementing predictive analytics inside an application required a fair amount of domain knowledge; you had to know a fair bit about machine learning to make it work.  With the Google Prediction API you only need to know how to use an online REST API to get started.

You can learn more about how we help businesses by watching our video and going to our project website.

Ajay-  What are the additional use cases of Google Prediction API that you think traditional enterprise software in business analytics ignore, or are not so strong on.  What use cases would you suggest NOT using Google Prediction API for an enterprise.

Zach- We are living in a world that is changing rapidly thanks to technology.  Storing, accessing, and managing information is much easier and more affordable than it was even a few years ago.  That creates exciting opportunities for companies, and we hope the Prediction API will help them derive value from their data.

The Prediction API focuses on providing predictive solutions to two types of problems: regression and classification. Businesses facing problems where there is sufficient data to describe an underlying pattern in either of these two areas can expect to derive value from using the Prediction API.

Ajay- What are your separate incentives to teach about Google APIs  to academic or researchers in universities globally.

Zach- I’d refer you to our university relations page-

Google thrives on academic curiosity. While we do significant in-house research and engineering, we also maintain strong relations with leading academic institutions world-wide pursuing research in areas of common interest. As part of our mission to build the most advanced and usable methods for information access, we support university research, technological innovation and the teaching and learning experience through a variety of programs.

Ajay- What is the biggest challenge you face while communicating about Google Prediction API to traditional users of enterprise software.

Zach- Businesses often expect that implementing predictive analytics is going to be very expensive and require a lot of resources.  Many have already begun investing heavily in this area.  Quite often we’re faced with surprise, and even skepticism, when they see the simplicity of the Google Prediction API.  We work really hard to provide a very powerful solution and take care of the complexity of building high quality models behind the scenes so businesses can focus more on building their business and less on machine learning.

 

 

Follow

Get every new post delivered to your Inbox.

Join 856 other followers