Tag Archives: learning
Interview Rob J Hyndman Forecasting Expert #rstats
Here is an interview with Prof. Rob J Hyndman, who has created many time series forecasting methods and authored books as well as R packages on them.
Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net)
Python with Friends
Want to learn Python? Stuck at a desk with no escape? You have two good options. One is to use Google: not the search engine, but their class on learning Python.
The videos are available on YouTube at http://www.youtube.com/user/GoogleDevelopers (starting at http://www.youtube.com/watch?v=tKTZoB2Vjuk&feature=plcp)
http://code.google.com/edu/languages/google-python-class/
The other is the new Python module at Codecademy. It is truly awesome even if you don't know any programming!
So learn some awesome Python today and be an excellent hacker tomorrow!
Interview John Myles White, Machine Learning for Hackers
Here is an interview with one of the younger researchers and rock stars of the R Project, John Myles White, co-author of Machine Learning for Hackers.
Ajay- What inspired you guys to write Machine Learning for Hackers? What has been the public response to the book? Are you planning to write a second edition or another book?
John-We decided to write Machine Learning for Hackers because there were so many people interested in learning more about Machine Learning who found the standard textbooks a little difficult to understand, either because they lacked the mathematical background expected of readers or because it wasn’t clear how to translate the mathematical definitions in those books into usable programs. Most Machine Learning books are written for audiences who will not only be using Machine Learning techniques in their applied work, but also actively inventing new Machine Learning algorithms. The amount of information needed to do both can be daunting, because, as one friend pointed out, it’s similar to insisting that everyone learn how to build a compiler before they can start to program. For most people, it’s better to let them try out programming and get a taste for it before you teach them about the nuts and bolts of compiler design. If they like programming, they can delve into the details later.
Ajay- What are the key things that a potential reader can learn from this book?
John- We cover most of the nuts and bolts of introductory statistics in our book: summary statistics, regression and classification using linear and logistic regression, PCA and k-Nearest Neighbors. We also cover topics that are less well known, but are as important: density plots vs. histograms, regularization, cross-validation, MDS, social network analysis and SVM’s. I hope a reader walks away from the book having a feel for what different basic algorithms do and why they work for some problems and not others. I also hope we do just a little to shift a future generation of modeling culture towards regularization and cross-validation.
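To make the regularization and cross-validation ideas in this answer a little more concrete, here is a minimal sketch in Python (the book itself uses R). It fits a one-variable ridge regression without an intercept and picks the penalty by k-fold cross-validation. The simulated data, the function names and the candidate penalties are all illustrative assumptions, not taken from the book.

```python
import random

def ridge_slope(xs, ys, penalty):
    """Closed-form slope for y ~ b*x with an L2 penalty on b (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

def cv_error(xs, ys, penalty, k=5):
    """Mean squared error over points that were held out when fitting."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        b = ridge_slope(train_x, train_y, penalty)
        total += sum((ys[i] - b * xs[i]) ** 2 for i in test_idx)
    return total / n  # each point is held out exactly once

def choose_penalty(xs, ys, candidates, k=5):
    """Return the candidate penalty with the lowest cross-validated error."""
    return min(candidates, key=lambda p: cv_error(xs, ys, p, k))

# Simulated data (an assumption for illustration): true slope 2 plus noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]
best = choose_penalty(xs, ys, [0.0, 0.1, 1.0, 10.0])
print("chosen penalty:", best, "slope:", ridge_slope(xs, ys, best))
```

The point of the sketch is the workflow: the penalty is judged only on data the model did not see while fitting.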
Ajay- Describe your journey as a science student up to your Ph.D. What are your current research interests and what initiatives have you undertaken with them?
John-As an undergraduate I studied math and neuroscience. I then took some time off and came back to do a Ph.D. in psychology, focusing on mathematical modeling of both the brain and behavior. There’s a rich tradition of machine learning and statistics in psychology, so I got increasingly interested in ML methods during my years as a grad student. I’m about to finish my Ph.D. this year. My research interests all fall under one heading: decision theory. I want to understand both how people make decisions (which is what psychology teaches us) and how they should make decisions (which is what statistics and ML teach us). My thesis is focused on how people make decisions when there are both short-term and long-term consequences to be considered. For non-psychologists, the classic example is probably the explore-exploit dilemma. I’ve been working to import more of the main ideas from stats and ML into psychology for modeling how real people handle that trade-off. For psychologists, the classic example is the Marshmallow experiment. Most of my research work has focused on the latter: what makes us patient and how can we measure patience?
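For readers who have not met the explore-exploit dilemma mentioned above, here is a minimal sketch of one textbook strategy, epsilon-greedy, on a simulated slot-machine ("bandit") problem. This is a generic illustration, not the model used in John's research; the payout probabilities, function name and parameters are assumptions.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, steps=2000, seed=0):
    """Play a Bernoulli bandit; return pull counts and estimated means per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm
        else:
            # exploit: pick the arm that looks best so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running average reward for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

counts, estimates = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(e, 2) for e in estimates])
```

A larger `epsilon` spends more pulls learning about every arm; a smaller one cashes in sooner on whatever currently looks best.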
Ajay- How can academia and private sector solve the shortage of trained data scientists (assuming there is one)?
John- There’s definitely a shortage of trained data scientists: most companies are finding it difficult to hire someone with the real chops needed to do useful work with Big Data. The skill set required to be useful at a company like Facebook or Twitter is much more advanced than many people realize, so I think it will be some time until there are undergraduates coming out with the right stuff. But there’s huge demand, so I’m sure the market will clear sooner or later.
(TIL he has played in several rock bands!)
Online Education takes off
Udacity is a smaller player but welcome competition to Coursera. I think companies that have on-demand learning programs should consider donating a course to these online education players (like SAS Institute for SAS, Revolution Analytics for R, SAP and Oracle for in-memory analytics, etc.)
Any takers?
Coursera is doing a superb job with a huge number of free courses from notable professors. 111 courses!
I am of course partial to the 7 courses that are related to my field-
Hacker Alert- DARPA project, $10K for summer
If you bleed red, white and blue and know some geo-spatial analysis, social network analysis and some supervised and unsupervised learning (and unlearning), here is a chance to put your skills to use on an awesome project.
From Wired-
http://www.wired.com/dangerroom/2012/07/hackathon-guinea-pig/
For this challenge, Darpa will lodge a selected six to eight teams at George Mason University and provide them with an initial $10,000 for equipment and access to unclassified data sets including “ground-level video of human activity in both urban and rural environments; high-resolution wide-area LiDAR of urban and mountainous terrain, wide-area airborne full motion video; and unstructured amateur photos and videos, such as would be taken from an adversary’s cell phone.” However, participants are encouraged to use any open sourced, legal data sets they want. (In the hackathon spirit, we would encourage the consumption of massive quantities of pizza and Red Bull, too.)
DARPA Innovation House Project
PROPOSAL SUBMISSION
Proposals must be one to three pages. Team resumes of any length must be attached and do not count against the page limit. Proposals must have 1-inch margins, use a font size of at least 11, and be delivered in Microsoft Word or Adobe PDF format.
Proposals must be emailed to InnovationHouse@c4i.gmu.edu by 4:00PM ET on Tuesday, July 31, 2012.
Proposals must have a Title and contain at least the following sections with the following contents.
- Team Members
Each team member must be listed with name, email and phone.
The Lead Developer should be indicated.
The statement “All team members are proposed as Key Personnel.” must be included.
- Capability Description
The description should clearly explain what capability the software is designed to provide the user, how it is proposed to work, and what data it will process.
In addition, a clear argument should be made as to why it is a novel approach that is not incremental to existing methods in the field.
- Proposed Phase 1 Demonstration
This section should clearly explain what will be demonstrated at the end of Session I. The description should be expressive, and as concrete as possible about the nature of the designs and software the team intends to produce in Session I.
- Proposed Phase 2 Demonstration
This section should clearly explain how the final software capability will be demonstrated as quantitatively as possible (for example, positing the amount of data that will be processed during the demonstration), how much time that will take, and the nature of the results the processing aims to achieve.
In addition, the following sections are optional.
- Technical Approach
The technical approach section amplifies the Capability Description, explaining proposed algorithms, coding practices, architectural designs and/or other technical details.
- Team Qualifications
Team qualifications should be included if the team's experience base does not make it obvious that it has the potential to do this level of software development. In that case, this section should make a credible argument as to why the team should be considered to have a reasonable chance of completing its goals, especially under the tight timelines described.
Other sections may be included at the proposers' discretion, provided the proposal does not exceed three pages.
http://www.darpa.mil/NewsEvents/Releases/2012/07/10.aspx
Machine Learning to Translate Code between Programming Languages
Google Translate has been a pioneer in using machine learning for translating various languages (and so is the awesome Google Transliterate).
I wonder if they can expand it to programming languages and not just human languages.
Issues in translating programming language code
1) Paths that refer to stored objects must be preserved
2) Object names should remain the same and not be translated
3) Many functions have multiple uses, so translating a function is not always straightforward
I think all these issues are doable, solvable and, more importantly, profitable.
I look forward to the day an iOS developer can convert his code to Android app code by a simple upload and download.
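To illustrate issues 1 and 2, here is a deliberately tiny, rule-based sketch (not machine learning, and nowhere near a real translator). It swaps a few R function names for Python ones, but only where a name is used as a function call, and it leaves string literals such as file paths and user-chosen object names alone. The mapping table and function name are hypothetical, and the output is not complete Python; only the call names are rewritten.

```python
import re

# Hypothetical, hand-written mapping from R function names to Python ones.
FUNCTION_MAP = {"length": "len", "nchar": "len", "toupper": "str.upper"}

# Group 1: a quoted string literal. Group 2: an identifier followed by "(".
TOKEN = re.compile(
    r"""("[^"]*"|'[^']*')|([A-Za-z_][A-Za-z0-9_.]*)(?=\s*\()"""
)

def translate_calls(source, function_map=FUNCTION_MAP):
    """Rename known function calls; keep strings and other names untouched."""
    def swap(match):
        if match.group(1) is not None:
            return match.group(1)  # string literal, e.g. a stored path
        name = match.group(2)
        return function_map.get(name, name)  # unknown calls are left as-is
    return TOKEN.sub(swap, source)

print(translate_calls('n <- length(read.csv("C:/data/length.csv"))'))
print(translate_calls('length <- nchar(x)'))
```

Note that both `length` and `nchar` map to `len` here, which hints at issue 3: the mapping between functions is many-to-many, and a lookup table quickly stops being enough.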
Google Cloud is finally here
Amazon gets some competition, and customers should see some relief, unless Google withdraws its commitment to these products after a few years of trying (like it often does now!)
http://cloud.google.com/products/index.html
Machine Type Pricing
| Configuration | Virtual Cores | Memory | GCEU * | Local disk | Price/Hour | $/GCEU/hour |
|---|---|---|---|---|---|---|
| n1-standard-1-d | 1 | 3.75GB *** | 2.75 | 420GB *** | $0.145 | 0.053 |
| n1-standard-2-d | 2 | 7.5GB | 5.5 | 870GB | $0.29 | 0.053 |
| n1-standard-4-d | 4 | 15GB | 11 | 1770GB | $0.58 | 0.053 |
| n1-standard-8-d | 8 | 30GB | 22 | 2 x 1770GB | $1.16 | 0.053 |
| Network Pricing | |
|---|---|
| Ingress | Free |
| Egress to the same Zone. | Free |
| Egress to a different Cloud service within the same Region. | Free |
| Egress to a different Zone in the same Region (per GB) | $0.01 |
| Egress to a different Region within the US | $0.01 **** |
| Inter-continental Egress | At Internet Egress Rate |
| Internet Egress (Americas/EMEA destination) per GB | |
| 0-1 TB in a month | $0.12 |
| 1-10 TB | $0.11 |
| 10+ TB | $0.08 |
| Internet Egress (APAC destination) per GB | |
| 0-1 TB in a month | $0.21 |
| 1-10 TB | $0.18 |
| 10+ TB | $0.15 |
| Persistent Disk Pricing | |
|---|---|
| Provisioned space | $0.10 GB/month |
| Snapshot storage** | $0.125 GB/month |
| IO Operations | $0.10 per million |
| IP Address Pricing | |
|---|---|
| Static IP address (assigned but unused) | $0.01 per hour |
| Ephemeral IP address (attached to instance) | Free |
** coming soon
*** 1GB is defined as 2^30 bytes
**** promotional pricing; eventually will be charged at internet download rates
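The arithmetic behind these tables can be sketched in a few lines of Python. The prices are copied from the tables above; everything else is an assumption made for illustration: that 1 TB means 1024 GB, that egress tiers are marginal (each GB is billed at the rate of the tier it falls in), that a month has 730 hours, and the function names themselves. Check Google's own pricing page before relying on any of it.

```python
# Prices in USD, copied from the tables above.
MACHINES = {
    "n1-standard-1-d": {"gceu": 2.75, "per_hour": 0.145},
    "n1-standard-2-d": {"gceu": 5.5, "per_hour": 0.29},
    "n1-standard-4-d": {"gceu": 11.0, "per_hour": 0.58},
    "n1-standard-8-d": {"gceu": 22.0, "per_hour": 1.16},
}
# (upper bound of the tier in GB, price per GB) for Americas/EMEA egress.
# Assumption: 1 TB = 1024 GB; None marks the open-ended top tier.
EGRESS_TIERS = [(1024, 0.12), (10240, 0.11), (None, 0.08)]
DISK_PER_GB_MONTH = 0.10  # provisioned persistent disk

def price_per_gceu(name):
    """Hourly price divided by compute units (the last table column)."""
    machine = MACHINES[name]
    return machine["per_hour"] / machine["gceu"]

def egress_cost(gb):
    """Assumes marginal tiers: each GB is billed at its own tier's rate."""
    cost, lower = 0.0, 0
    for upper, rate in EGRESS_TIERS:
        if upper is None or gb <= upper:
            return cost + (gb - lower) * rate
        cost += (upper - lower) * rate
        lower = upper

def monthly_estimate(name, hours=730, egress_gb=0, disk_gb=0):
    """Rough monthly bill; 730 hours per month is an assumed average."""
    return (MACHINES[name]["per_hour"] * hours
            + egress_cost(egress_gb)
            + disk_gb * DISK_PER_GB_MONTH)

for name in MACHINES:
    print(name, round(price_per_gceu(name), 3))
print(round(monthly_estimate("n1-standard-2-d", egress_gb=100, disk_gb=50), 2))
```

All four machine types work out to the same price per GCEU, so the larger instances carry no premium or discount per unit of compute.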
Google Prediction API
Tap into Google’s machine learning algorithms to analyze data and predict future outcomes.
Leverage machine learning without the complexity
Use the familiar RESTful interface
Enter input in any format – numeric or text
Build smart apps
Learn how you can use Prediction API to build customer sentiment analysis, spam detection or document and email classification.
Google Translation API
Use Google Translate API to build multilingual apps and programmatically translate text in your webpage or application.
Translate text into other languages programmatically
Use the familiar RESTful interface
Take advantage of Google’s powerful translation algorithms
Build multilingual apps
Learn how you can use Translate API to build apps that can programmatically translate text in your applications or websites.
Google BigQuery
Analyze Big Data in the cloud using SQL and get real-time business insights in seconds using Google BigQuery. Use a fully-managed data analysis service with no servers to install or maintain.
Reliable & Secure
Complete peace of mind as your data is automatically replicated across multiple sites and secured using access control lists.
Scale infinitely
You can store up to hundreds of terabytes, paying only for what you use.
Blazing fast
Run ad hoc SQL queries on multi-terabyte datasets in seconds.
Google App Engine
Create apps on Google’s platform that are easy to manage and scale. Benefit from the same systems and infrastructure that power Google’s applications.
Focus on your apps
Let us worry about the underlying infrastructure and systems.
Scale infinitely
See your applications scale seamlessly from hundreds to millions of users.
Business ready
Premium paid support and 99.95% SLA for business users





