Top Funny Charts

I have recently become a Quora addict, and you can see why it is such a great site. If possible say hello to me there at

http://www.quora.com/Ajay-Ohri

My latest favorite question-

What are the most hilarious pie charts?

https://www.quora.com/Pie-Charts/What-are-the-most-hilarious-pie-charts

I am only showing you some of the answers, you can see the rest yourself.

 

 

Data Frame in Python

Exploring some Python Packages and R packages to move /work with both Python and R without melting your brain or exceeding your project deadline

—————————————

If you liked the data.frame structure in R, you have some way to work with them at a faster processing speed in Python.

Here are three packages that enable you to do so-

(1) pydataframe http://code.google.com/p/pydataframe/

An implemention of an almost R like DataFrame object. (install via Pypi/Pip: “pip install pydataframe”)

Usage:

        u = DataFrame( { "Field1": [1, 2, 3],
                        "Field2": ['abc', 'def', 'hgi']},
                        optional:
                         ['Field1', 'Field2']
                         ["rowOne", "rowTwo", "thirdRow"])

A DataFrame is basically a table with rows and columns.

Columns are named, rows are numbered (but can be named) and can be easily selected and calculated upon. Internally, columns are stored as 1d numpy arrays. If you set row names, they’re converted into a dictionary for fast access. There is a rich subselection/slicing API, see help(DataFrame.get_item) (it also works for setting values). Please note that any slice get’s you another DataFrame, to access individual entries use get_row(), get_column(), get_value().

DataFrames also understand basic arithmetic and you can either add (multiply,…) a constant value, or another DataFrame of the same size / with the same column names, like this:

#multiply every value in ColumnA that is smaller than 5 by 6.
my_df[my_df[:,'ColumnA'] < 5, 'ColumnA'] *= 6

#you always need to specify both row and column selectors, use : to mean everything
my_df[:, 'ColumnB'] = my_df[:,'ColumnA'] + my_df[:, 'ColumnC']

#let's take every row that starts with Shu in ColumnA and replace it with a new list (comprehension)
select = my_df.where(lambda row: row['ColumnA'].startswith('Shu'))
my_df[select, 'ColumnA'] = [row['ColumnA'].replace('Shu', 'Sha') for row in my_df[select,:].iter_rows()]

Dataframes talk directly to R via rpy2 (rpy2 is not a prerequiste for the library!)

 

(2) pandas http://pandas.pydata.org/

Library Highlights

  • A fast and efficient DataFrame object for data manipulation with integrated indexing;
  • Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
  • Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
  • Flexible reshaping and pivoting of data sets;
  • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
  • Columns can be inserted and deleted from data structures for size mutability;
  • Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets;
  • High performance merging and joining of data sets;
  • Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure;
  • Time series-functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging. Even create domain-specific time offsets and join time series without losing data;
  • The library has been ruthlessly optimized for performance, with critical code paths compiled to C;
  • Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.

Why not R?

First of all, we love open source R! It is the most widely-used open source environment for statistical modeling and graphics, and it provided some early inspiration for pandas features. R users will be pleased to find this library adopts some of the best concepts of R, like the foundational DataFrame (one user familiar with R has described pandas as “R data.frame on steroids”). But pandas also seeks to solve some frustrations common to R users:

  • R has barebones data alignment and indexing functionality, leaving much work to the user. pandas makes it easy and intuitive to work with messy, irregularly indexed data, like time series data. pandas also provides rich tools, like hierarchical indexing, not found in R;
  • R is not well-suited to general purpose programming and system development. pandas enables you to do large-scale data processing seamlessly when developing your production applications;
  • Hybrid systems connecting R to a low-productivity systems language like Java, C++, or C# suffer from significantly reduced agility and maintainability, and you’re still stuck developing the system components in a low-productivity language;
  • The “copyleft” GPL license of R can create concerns for commercial software vendors who want to distribute R with their software under another license. Python and pandas use more permissive licenses.

(3) datamatrix http://pypi.python.org/pypi/datamatrix/0.8

datamatrix 0.8

A Pythonic implementation of R’s data.frame structure.

Latest Version: 0.9

This module allows access to comma- or other delimiter separated files as if they were tables, using a dictionary-like syntax. DataMatrix objects can be manipulated, rows and columns added and removed, or even transposed

—————————————————————–

Modeling in Python

Continue reading “Data Frame in Python”

Interview Rob J Hyndman Forecasting Expert #rstats

Here is an interview with Prof Rob J Hyndman who has created many time series forecasting methods and authored books as well as R packages on the same.

Ajay -Describe your journey from being a student of science to a Professor. What were some key turning points along that journey?
 
Rob- I started a science honours degree at the University of Melbourne in 1985. By the end of 1985 I found myself simultaneously working as a statistical consultant (having completed all of one year of statistics courses!). For the next three years I studied mathematics, statistics and computer science at university, and tried to learn whatever I needed to in order to help my growing group of clients. Often we would cover things in classes that I’d already taught myself through my consulting work. That really set the trend for the rest of my career. I’ve always been an academic on the one hand, and a statistical consultant on the other. The consulting work has led me to learn a lot of things that I would not otherwise have come across, and has also encouraged me to focus on research problems that are of direct relevance to the clients I work with.
I never set out to be an academic. In fact, I thought that I would get a job in the business world as soon as I finished my degree. But once I completed the degree, I was offered a position as a statistical consultant within the University of Melbourne, helping researchers in various disciplines and doing some commercial work. After a year, I was getting bored doing only consulting, and I thought it would be interesting to do a PhD. I was lucky enough to be offered a generous scholarship which meant I was paid more to study than to continue working.
Again, I thought that I would probably go and get a job in the business world after I finished my PhD. But I finished it early and my scholarship was going to be cut off once I submitted my thesis. So instead, I offered to teach classes for free at the university and delayed submitting my thesis until the scholarship period ran out. That turned out to be a smart move because the university saw that I was a good teacher, and offered me a lecturing position starting immediately I submitted my thesis. So I sort of fell into an academic career.
I’ve kept up the consulting work part-time because it is interesting, and it gives me a little extra money. But I’ve also stayed an academic because I love the freedom to be able to work on anything that takes my fancy.
Ajay- Describe your upcoming book on Forecasting.
 
Rob- My first textbook on forecasting (with Makridakis and Wheelwright) was written a few years after I finished my PhD. It has been very popular, but it costs a lot of money (about $140 on Amazon). I estimate that I get about $1 for every book sold. The rest goes to the publisher (Wiley) and all they do is print, market and distribute it. I even typeset the whole thing myself and they print directly from the files I provided. It is now about 15 years since the book was written and it badly needs updating. I had a choice of writing a new edition with Wiley or doing something completely new. I decided to do a new one, largely because I didn’t want a publisher to make a lot of money out of students using my hard work.
It seems to me that students try to avoid buying textbooks and will search around looking for suitable online material instead. Often the online material is of very low quality and contains many errors.
As I wasn’t making much money on my textbook, and the facilities now exist to make online publishing very easy, I decided to try a publishing experiment. So my new textbook will be online and completely free. So far it is about 2/3 completed and is available at http://otexts.com/fpp/. I am hoping that my co-author (George Athanasopoulos) and I will finish it off before the end of 2012.
The book is intended to provide a comprehensive introduction to forecasting methods. We don’t attempt to discuss the theory much, but provide enough information for people to use the methods in practice. It is tied to the forecast package in R, and we provide code to show how to use the various forecasting methods.
The idea of online textbooks makes a lot of sense. They are continuously updated so if we find a mistake we fix it immediately. Also, we can add new sections, or update parts of the book, as required rather than waiting for a new edition to come out. We can also add richer content including video, dynamic graphics, etc.
For readers that want a print edition, we will be aiming to produce a print version of the book every year (available via Amazon).
I like the idea so much I’m trying to set up a new publishing platform (otexts.com) to enable other authors to do the same sort of thing. It is taking longer than I would like to make that happen, but probably next year we should have something ready for other authors to use.
Ajay- How can we make textbooks cheaper for students as well as compensate authors fairly
 
Rob- Well free is definitely cheaper, and there are a few businesses trying to make free online textbooks a reality. Apart from my own efforts, http://www.flatworldknowledge.com/ is producing a lot of free textbooks. And textbookrevolution.org is another great resource.
With otexts.com, we will compensate authors in two ways. First, the print versions of a book will be sold (although at a vastly cheaper rate than other commercial publishers). The royalties on print sales will be split 50/50 with the authors. Second, we plan to have some features of each book available for subscription only (e.g., solutions to exercises, some multimedia content, etc.). Again, the subscription fees will be split 50/50 with the authors.
Ajay- Suppose a person who used to use forecasting software from another company decides to switch to R. How easy and lucid do you think the current documentation on R website for business analytics practitioners such as these – in the corporate world.
 
Rob- The documentation on the R website is not very good for newcomers, but there are a lot of other R resources now available. One of the best introductions is Matloff’s “The Art of R Programming”. Provided someone has done some programming before (e.g., VBA, python or java), learning R is a breeze. The people who have trouble are those who have only ever used menu interfaces such as Excel. Then they are not only learning R, but learning to think about computing in a different way from what they are used to, and that can be tricky. However, it is well worth it. Once you know how to code, you can do so much more.  I wish some basic programming was part of every business and statistics degree.
If you are working in a particular area, then it is often best to find a book that uses R in that discipline. For example, if you want to do forecasting, you can use my book (otexts.com/fpp/). Or if you are using R for data visualization, get hold of Hadley Wickham’s ggplot2 book.
Ajay- In a long and storied career- What is the best forecast you ever made ? and the worst?
 
 Rob- Actually, my best work is not so much in making forecasts as in developing new forecasting methodology. I’m very proud of my forecasting models for electricity demand which are now used for all long-term planning of electricity capacity in Australia (see  http://robjhyndman.com/papers/peak-electricity-demand/  for the details). Also, my methods for population forecasting (http://robjhyndman.com/papers/stochastic-population-forecasts/ ) are pretty good (in my opinion!). These methods are now used by some national governments (but not Australia!) for their official population forecasts.
Of course, I’ve made some bad forecasts, but usually when I’ve tried to do more than is reasonable given the available data. One of my earliest consulting jobs involved forecasting the sales for a large car manufacturer. They wanted forecasts for the next fifteen years using less than ten years of historical data. I should have refused as it is unreasonable to forecast that far ahead using so little data. But I was young and naive and wanted the work. So I did the forecasts, and they were clearly outside the company’s (reasonable) expectations, and they then refused to pay me. Lesson learned. It’s better to refuse work than do it poorly.

Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net)

. And now anyone can use the method with the ets() function in the forecast package for R.
About-
Rob J Hyndman is Pro­fessor of Stat­ist­ics in the Depart­ment of Eco­no­met­rics and Busi­ness Stat­ist­ics at Mon­ash Uni­ver­sity and Dir­ector of the Mon­ash Uni­ver­sity Busi­ness & Eco­nomic Fore­cast­ing Unit. He is also Editor-in-Chief of the Inter­na­tional Journal of Fore­cast­ing and a Dir­ector of the Inter­na­tional Insti­tute of Fore­casters. Rob is the author of over 100 research papers in stat­ist­ical sci­ence. In 2007, he received the Moran medal from the Aus­tralian Academy of Sci­ence for his con­tri­bu­tions to stat­ist­ical research, espe­cially in the area of stat­ist­ical fore­cast­ing. For 25 years, Rob has main­tained an act­ive con­sult­ing prac­tice, assist­ing hun­dreds of com­pan­ies and organ­iz­a­tions. His recent con­sult­ing work has involved fore­cast­ing elec­tri­city demand, tour­ism demand, the Aus­tralian gov­ern­ment health budget and case volume at a US call centre.

New Free Online Book by Rob Hyndman on Forecasting using #Rstats

From the creator of some of the most widely used packages for time series in the R programming language comes a brand new book, and its online!

This time the book is free, will be updated and 7 chapters are ready (to read!)

. If you do forecasting professionally, now is the time to suggest your own use cases to be featured as the book gets ready by end- 2012. The book is intended as a replace­ment for Makri­dakis, Wheel­wright and Hyn­d­man (Wiley 1998).

http://otexts.com/fpp/

The book is writ­ten for three audi­ences:

(1) people find­ing them­selves doing fore­cast­ing in busi­ness when they may not have had any for­mal train­ing in the area;

(2) undergraduate stu­dents study­ing busi­ness;

(3) MBA stu­dents doing a fore­cast­ing elec­tive.

The book is dif­fer­ent from other fore­cast­ing text­books in sev­eral ways.

  • It is free and online, mak­ing it acces­si­ble to a wide audience.
  • It is con­tin­u­ously updated. You don’t have to wait until the next edi­tion for errors to be removed or new meth­ods to be dis­cussed. We will update the book frequently.
  • There are dozens of real data exam­ples taken from our own con­sult­ing prac­tice. We have worked with hun­dreds of busi­nesses and orga­ni­za­tions help­ing them with fore­cast­ing issues, and this expe­ri­ence has con­tributed directly to many of the exam­ples given here, as well as guid­ing our gen­eral phi­los­o­phy of forecasting.
  • We empha­sise graph­i­cal meth­ods more than most fore­cast­ers. We use graphs to explore the data, analyse the valid­ity of the mod­els fit­ted and present the fore­cast­ing results.

A print ver­sion and a down­load­able e-version of the book will be avail­able to pur­chase on Ama­zon, but not until a few more chap­ters are written.

Contents

(Ajay-Support the open textbook movement!)

If you’ve found this book helpful, please consider helping to fund free, open and online textbooks. (Donations via PayPal.)

Look for yourself at http://otexts.com/fpp/

 

Interview John Myles White , Machine Learning for Hackers

Here is an interview with one of the younger researchers  and rock stars of the R Project, John Myles White,  co-author of Machine Learning for Hackers.

Ajay- What inspired you guys to write Machine Learning for Hackers. What has been the public response to the book. Are you planning to write a second edition or a next book?

John-We decided to write Machine Learning for Hackers because there were so many people interested in learning more about Machine Learning who found the standard textbooks a little difficult to understand, either because they lacked the mathematical background expected of readers or because it wasn’t clear how to translate the mathematical definitions in those books into usable programs. Most Machine Learning books are written for audiences who will not only be using Machine Learning techniques in their applied work, but also actively inventing new Machine Learning algorithms. The amount of information needed to do both can be daunting, because, as one friend pointed out, it’s similar to insisting that everyone learn how to build a compiler before they can start to program. For most people, it’s better to let them try out programming and get a taste for it before you teach them about the nuts and bolts of compiler design. If they like programming, they can delve into the details later.

We once said that Machine Learning for Hackers  is supposed to be a chemistry set for Machine Learning and I still think that’s the right description: it’s meant to get readers excited about Machine Learning and hopefully expose them to enough ideas and tools that they can start to explore on their own more effectively. It’s like a warmup for standard academic books like Bishop’s.
The public response to the book has been phenomenal. It’s been amazing to see how many people have bought the book and how many people have told us they found it helpful. Even friends with substantial expertise in statistics have said they’ve found a few nuggets of new information in the book, especially regarding text analysis and social network analysis — topics that Drew and I spend a lot of time thinking about, but are not thoroughly covered in standard statistics and Machine Learning  undergraduate curricula.
I hope we write a second edition. It was our first book and we learned a ton about how to write at length from the experience. I’m about to announce later this week that I’m writing a second book, which will be a very short eBook for O’Reilly. Stay tuned for details.

Ajay-  What are the key things that a potential reader can learn from this book?

John- We cover most of the nuts and bolts of introductory statistics in our book: summary statistics, regression and classification using linear and logistic regression, PCA and k-Nearest Neighbors. We also cover topics that are less well known, but are as important: density plots vs. histograms, regularization, cross-validation, MDS, social network analysis and SVM’s. I hope a reader walks away from the book having a feel for what different basic algorithms do and why they work for some problems and not others. I also hope we do just a little to shift a future generation of modeling culture towards regularization and cross-validation.

Ajay- Describe your journey as a science student up till your Phd. What are you current research interests and what initiatives have you done with them?

John-As an undergraduate I studied math and neuroscience. I then took some time off and came back to do a Ph.D. in psychology, focusing on mathematical modeling of both the brain and behavior. There’s a rich tradition of machine learning and statistics in psychology, so I got increasingly interested in ML methods during my years as a grad student. I’m about to finish my Ph.D. this year. My research interests all fall under one heading: decision theory. I want to understand both how people make decisions (which is what psychology teaches us) and how they should make decisions (which is what statistics and ML teach us). My thesis is focused on how people make decisions when there are both short-term and long-term consequences to be considered. For non-psychologists, the classic example is probably the explore-exploit dilemma. I’ve been working to import more of the main ideas from stats and ML into psychology for modeling how real people handle that trade-off. For psychologists, the classic example is the Marshmallow experiment. Most of my research work has focused on the latter: what makes us patient and how can we measure patience?

Ajay- How can academia and private sector solve the shortage of trained data scientists (assuming there is one)?

John- There’s definitely a shortage of trained data scientists: most companies are finding it difficult to hire someone with the real chops needed to do useful work with Big Data. The skill set required to be useful at a company like Facebook or Twitter is much more advanced than many people realize, so I think it will be some time until there are undergraduates coming out with the right stuff. But there’s huge demand, so I’m sure the market will clear sooner or later.

The changes that are required in academia to prepare students for this kind of work are pretty numerous, but the most obvious required change is that quantitative people need to be learning how to program properly, which is rare in academia, even in many CS departments. Writing one-off programs that no one will ever have to reuse and that only work on toy data sets doesn’t prepare you for working with huge amounts of messy data that exhibit shifting patterns. If you need to learn how to program seriously before you can do useful work, you’re not very valuable to companies who need employees that can hit the ground running. The companies that have done best in building up data teams, like LinkedIn, have learned to train people as they come in since the proper training isn’t typically available outside those companies.
Of course, on the flipside, the people who do know how to program well need to start learning more about theory and need to start to have a better grasp of basic mathematical models like linear and logistic regressions. Lots of CS students seem not to enjoy their theory classes, but theory really does prepare you for thinking about what you can learn from data. You may not use automata theory if you work at Foursquare, but you will need to be able to reason carefully and analytically. Doing math is just like lifting weights: if you’re not good at it right now, you just need to dig in and get yourself in shape.
About-
John Myles White is a Phd Student in  Ph.D. student in the Princeton Psychology Department, where he studies human decision-making both theoretically and experimentally. Along with the political scientist Drew Conway, he is  the author of a book published by O’Reilly Media entitled “Machine Learning for Hackers”, which is meant to introduce experienced programmers to the machine learning toolkit. He is also working with Mark Hansenon a book for laypeople about exploratory data analysis.John is the lead maintainer for several R packages, including ProjectTemplate and log4r.

(TIL he has played in several rock bands!)

—–
You can read more in his own words at his blog at http://www.johnmyleswhite.com/about/
He can be contacted via social media at Google Plus at https://plus.google.com/109658960610931658914 or twitter at twitter.com/johnmyleswhite/

Interview Alain Chesnais Chief Scientist Trendspottr.com

Here is a brief interview with Alain Chesnais ,Chief Scientist  Trendspottr.com. It is a big honor to interview such a legend in computer science, and I am grateful to both him and Mark Zohar for taking time to write these down.
alain_chesnais2.jpg

Ajay-  Describe your career from your student days to being the President of ACM (Association of Computing Machinery http://www.acm.org/ ). How can we increase  the interest of students in STEM education, particularly in view of the shortage of data scientists.
 
Alain- I’m trying to sum up a career of over 35 years. This may be a bit long winded…
I started my career in CS when I was in high school in the early 70’s. I was accepted in the National Science Foundation’s Science Honors Program in 9th grade and the first course I took was a Fortran programming course at Columbia University. This was on an IBM 360 using punch cards.
The next year my high school got a donation from DEC of a PDP-8E mini computer. I ended up spending a lot of time in the machine room all through high school at a time when access to computers wasn’t all that common. I went to college in Paris and ended up at l’Ecole Normale Supérieure de Cachan in the newly created Computer Science department.
My first job after finishing my graduate studies was as a research assistant at the Centre National de la Recherche Scientifique where I focused my efforts on modelling the behaviour of distributed database systems in the presence of locking. When François Mitterand was elected president of France in 1981, he invited Nicholas Negroponte and Seymour Papert to come to France to set up the Centre Mondial Informatique. I was hired as a researcher there and continued on to become director of software development until it was closed down in 1986. I then started up my own company focusing on distributed computer graphics. We sold the company to Abvent in the early 90’s.
After that, I was hired by Thomson Digital Image to lead their rendering team. We were acquired by Wavefront Technologies in 1993 then by SGI in 1995 and merged with Alias Research. In the merged company: Alias|wavefront, I was director of engineering on the Maya project. Our team received an Oscar in 2003 for the creation of the Maya software system.
Since then I’ve worked at various companies, most recently focusing on social media and Big Data issues associated with it. Mark Zohar and I worked together at SceneCaster in 2007 where we developed a Facebook app that allowed users to create their own 3D scenes and share them with friends via Facebook without requiring a proprietary plugin. In December 2007 it was the most popular app in its category on Facebook.
Recently Mark approached me with a concept related to mining the content of public tweets to determine what was trending in real time. Using math similar to what I had developed during my graduate studies to model the performance of distributed databases in the presence of locking, we built up a real time analytics engine that ranks the content of tweets as they stream in. The math is designed to scale linearly in complexity with the volume of data that we analyze. That is the basis for what we have created for TrendSpottr.
In parallel to my professional career, I have been a very active volunteer at ACM. I started out as a member of the Paris ACM SIGGRAPH chapter in 1985 and volunteered to help do our mailings (snail mail at the time). After taking on more responsibilities with the chapter, I was elected chair of the chapter in 1991. I was first appointed to the SIGGRAPH Local Groups Steering Committee, then became ACM Director for Chapters. Later I was successively elected SIGGRAPH Vice Chair, ACM SIG Governing Board (SGB) Vice Chair for Operations, SGB Chair, ACM SIGGRAPH President, ACM Secretary/Treasurer, ACM Vice President, and finally, in 2010, I was elected ACM President. My term as ACM President has just ended on July 1st. Vint Cerf is our new President. I continue to serve on the ACM Executive Committee in my role as immediate Past President.
(Note- About ACM
ACM, the Association for Computing Machinery www.acm.org, is the world’s largest educational and scientific computing society, uniting computing educators, researchers and professionals to inspire dialogue, share resources and address the field’s challenges. )
Ajay- What sets Trendspotter apart from other startups out there in terms of vision in trying to achieve a more coherent experience on the web.
 
Alain- The Basic difference with other approaches that we are aware of is that we have developed an incremental solution that calculates the results on the fly as the data streams in. Our evaluators are based on solid mathematical foundations that have proven their usefulness over time. One way to describe what we do is to think of it as signal processing where the tweets are the signal and our evaluators are like triggers that tell you what elements of the signal have the characteristics that we are filtering for (velocity and acceleration). One key result of using this approach is that our unit cost per tweet analyzed does not go up with increased volume. Using more traditional data analysis approaches involving an implicit sort would imply a complexity of N*log(N), where N is the volume of tweets being analyzed. That would imply that the cost per tweet analyzed would go up with the volume of tweets. Our approach was designed to avoid that, so that we can maintain a cap on our unit costs of analysis, no matter what volume of data we analyze.
Ajay- What do you think is the future of big data visualization going to look like? What are some of the technologies that you are currently bullish on?
Alain- I see several trends that would have deep impact on Big Data visualization. I firmly believe that with large amounts of data, visualization is key tool for understanding both the structure and the relationships that exist between data elements. Let’s focus on some of the key things that are pushing in this direction:
  • the volume of data that is available is growing at a rate we have never seen before. Cisco has measured an 8 fold increase in the volume of IP traffic over the last 5 years and predicts that we will reach the zettabyte of data over IP in 2016
  • more of the data is becoming publicly available. This isn’t only on social networks such as Facebook and twitter, but joins a more general trend involving open research initiatives and open government programs
  • the desired time to get meaningful results is going down dramatically. In the past 5 years we have seen the half life of data on Facebook, defined as the amount of time that half of the public reactions to any given post (likes, shares., comments) take place, go from about 12 hours to under 3 hours currently
  • our access to the net is always on via mobile device. You are always connected.
  • the CPU and GPU capabilities of mobile devices is huge (an iPhone has 10 times the compute power of a Cray-1 and more graphics capabilities than early SGI workstations)
Put all of these observations together and you quickly come up with a massive opportunity to analyze data visually on the go as it happens no matter where you are. We can’t afford to have to wait for results. When something of interest occurs we need to be aware of it immediately.
Ajay- What are some of the applications we could use Trendspottr. Could we predict events like Arab Spring, or even the next viral thing.
 
Alain- TrendSpottr won’t predict what will happen next. What it *will* do is alert you immediately when it happens. You can think of it like a smoke detector. It doesn’t tell that a fire will take place, but it will save your life when a fire does break out.
Typical uses for TrendSpottr are
  • thought leadership by tracking content that your readership is interested in via TrendSpottr you can be seen as a thought leader on the subject by being one of the first to share trending content on a given subject. I personally do this on my Facebook page (http://www.facebook.com/alain.chesnais) and have seen my klout score go up dramatically as a result
  • brand marketing to be able to know when something is trending about your brand and take advantage of it as it happens.
  • competitive analysis to see what is being said about two competing elements. For instance, searching TrendSpottr for “Obama OR Romney” gives you a very good understanding about how social networks are reacting to each politician. You can also do searches like “$aapl OR $msft OR $goog” to get a sense of what is the current buzz for certain hi tech stocks.
  • understanding your impact in real time to be able to see which of the content that you are posting is trending the most on social media so that you can highlight it on your main page. So if all of your content is hosted on common domain name (ourbrand.com), searching for ourbrand.com will show you the most active of your site’s content. That can easily be set up by putting a TrendSpottr widget on your front page

Ajay- What are some of the privacy guidelines that you keep in  mind- given the fact that you collect individual information but also have government agencies as potential users.

 
Alain- We take privacy very seriously and anonymize all of the data that we collect. We don’t keep explicit records of the data we collected through the various incoming streams and only store the aggregate results of our analysis.
About
Alain Chesnais is immediate Past President of ACM, elected for the two-year term beginning July 1, 2010.Chesnais studied at l’Ecole Normale Supérieure de l’Enseignement Technique and l’Université de Paris where he earned a Maîtrise de Mathematiques, a Maitrise de Structure Mathématique de l’Informatique, and a Diplôme d’Etudes Approfondies in Compuer Science. He was a high school student at the United Nations International School in New York, where, along with preparing his International Baccalaureate with a focus on Math, Physics and Chemistry, he also studied Mandarin Chinese.Chesnais recently founded Visual Transitions, which specializes in helping companies move to HTML 5, the newest standard for structuring and presenting content on the World Wide Web. He was the CTO of SceneCaster.com from June 2007 until April 2010, and was Vice President of Product Development at Tucows Inc. from July 2005 – May 2007. He also served as director of engineering at Alias|Wavefront on the team that received an Oscar from the Academy of Motion Picture Arts and Sciences for developing the Maya 3D software package.

Prior to his election as ACM president, Chesnais was vice president from July 2008 – June 2010 as well as secretary/treasurer from July 2006 – June 2008. He also served as president of ACM SIGGRAPH from July 2002 – June 2005 and as SIG Governing Board Chair from July 2000 – June 2002.

As a French citizen now residing in Canada, he has more than 20 years of management experience in the software industry. He joined the local SIGGRAPH Chapter in Paris some 20 years ago as a volunteer and has continued his involvement with ACM in a variety of leadership capacities since then.

About Trendspottr.com

TrendSpottr is a real-time viral search and predictive analytics service that identifies the most timely and trending information for any topic or keyword. Our core technology analyzes real-time data streams and spots emerging trends at their earliest acceleration point — hours or days before they have become “popular” and reached mainstream awareness.

TrendSpottr serves as a predictive early warning system for news and media organizations, brands, government agencies and Fortune 500 companies and helps them to identify emerging news, events and issues that have high viral potential and market impact. TrendSpottr has partnered with HootSuite, DataSift and other leading social and big data companies.

Machine Learning to Translate Code from different programming languages

Google Translate has been a pioneer in using machine learning for translating various languages (and so is the awesome Google Transliterate)

I wonder if they can expand it to programming languages and not just human languages.

 

Issues in converting  translating programming language code

1) Paths referred for stored objects

2) Object Names should remain the same and not translated

3) Multiple Functions have multiple uses , sometimes function translate is not straightforward

I think all these issues are doable, solveable and more importantly profitable.

 

I look forward to the day a iOS developer can convert his code to Android app code by simple upload and download.

%d bloggers like this: