Book Promotion- Click, Buy, Lie , Die

To build awareness of Eric Siegel’s new, acclaimed book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (published by Wiley Feb. 19),  an offer ya can’t refuse.
Order the book on April 3 via Amazon ($15) for:

1. Free access to the first of 4 modules of the author’s online training program, Predictive Analytics Applied

2. A 35% discount off the full training ($495), or its in-person version, Predictive Analytics for Business, Marketing & Web ($1,495 – Apr 25-26 in NYC)

3. Automatic entrance into a drawing to receive a pass for any Predictive Analytics World this year (San Francisco, Chicago, DC, Boston, London, or Berlin).

Ajay- at $15 a pop, and quite a nice book, it’s a steal! See book review here–

https://decisionstats.com/2013/02/25/book-review-predictive-analytics-the-power-to-predict-who-will-click-buy-lie-or-die/

 

Interview John Myles White , Machine Learning for Hackers

Here is an interview with one of the younger researchers  and rock stars of the R Project, John Myles White,  co-author of Machine Learning for Hackers.

Ajay- What inspired you guys to write Machine Learning for Hackers. What has been the public response to the book. Are you planning to write a second edition or a next book?

John-We decided to write Machine Learning for Hackers because there were so many people interested in learning more about Machine Learning who found the standard textbooks a little difficult to understand, either because they lacked the mathematical background expected of readers or because it wasn’t clear how to translate the mathematical definitions in those books into usable programs. Most Machine Learning books are written for audiences who will not only be using Machine Learning techniques in their applied work, but also actively inventing new Machine Learning algorithms. The amount of information needed to do both can be daunting, because, as one friend pointed out, it’s similar to insisting that everyone learn how to build a compiler before they can start to program. For most people, it’s better to let them try out programming and get a taste for it before you teach them about the nuts and bolts of compiler design. If they like programming, they can delve into the details later.

We once said that Machine Learning for Hackers  is supposed to be a chemistry set for Machine Learning and I still think that’s the right description: it’s meant to get readers excited about Machine Learning and hopefully expose them to enough ideas and tools that they can start to explore on their own more effectively. It’s like a warmup for standard academic books like Bishop’s.
The public response to the book has been phenomenal. It’s been amazing to see how many people have bought the book and how many people have told us they found it helpful. Even friends with substantial expertise in statistics have said they’ve found a few nuggets of new information in the book, especially regarding text analysis and social network analysis — topics that Drew and I spend a lot of time thinking about, but are not thoroughly covered in standard statistics and Machine Learning  undergraduate curricula.
I hope we write a second edition. It was our first book and we learned a ton about how to write at length from the experience. I’m about to announce later this week that I’m writing a second book, which will be a very short eBook for O’Reilly. Stay tuned for details.

Ajay-  What are the key things that a potential reader can learn from this book?

John- We cover most of the nuts and bolts of introductory statistics in our book: summary statistics, regression and classification using linear and logistic regression, PCA and k-Nearest Neighbors. We also cover topics that are less well known, but are as important: density plots vs. histograms, regularization, cross-validation, MDS, social network analysis and SVM’s. I hope a reader walks away from the book having a feel for what different basic algorithms do and why they work for some problems and not others. I also hope we do just a little to shift a future generation of modeling culture towards regularization and cross-validation.

Ajay- Describe your journey as a science student up till your Phd. What are you current research interests and what initiatives have you done with them?

John-As an undergraduate I studied math and neuroscience. I then took some time off and came back to do a Ph.D. in psychology, focusing on mathematical modeling of both the brain and behavior. There’s a rich tradition of machine learning and statistics in psychology, so I got increasingly interested in ML methods during my years as a grad student. I’m about to finish my Ph.D. this year. My research interests all fall under one heading: decision theory. I want to understand both how people make decisions (which is what psychology teaches us) and how they should make decisions (which is what statistics and ML teach us). My thesis is focused on how people make decisions when there are both short-term and long-term consequences to be considered. For non-psychologists, the classic example is probably the explore-exploit dilemma. I’ve been working to import more of the main ideas from stats and ML into psychology for modeling how real people handle that trade-off. For psychologists, the classic example is the Marshmallow experiment. Most of my research work has focused on the latter: what makes us patient and how can we measure patience?

Ajay- How can academia and private sector solve the shortage of trained data scientists (assuming there is one)?

John- There’s definitely a shortage of trained data scientists: most companies are finding it difficult to hire someone with the real chops needed to do useful work with Big Data. The skill set required to be useful at a company like Facebook or Twitter is much more advanced than many people realize, so I think it will be some time until there are undergraduates coming out with the right stuff. But there’s huge demand, so I’m sure the market will clear sooner or later.

The changes that are required in academia to prepare students for this kind of work are pretty numerous, but the most obvious required change is that quantitative people need to be learning how to program properly, which is rare in academia, even in many CS departments. Writing one-off programs that no one will ever have to reuse and that only work on toy data sets doesn’t prepare you for working with huge amounts of messy data that exhibit shifting patterns. If you need to learn how to program seriously before you can do useful work, you’re not very valuable to companies who need employees that can hit the ground running. The companies that have done best in building up data teams, like LinkedIn, have learned to train people as they come in since the proper training isn’t typically available outside those companies.
Of course, on the flipside, the people who do know how to program well need to start learning more about theory and need to start to have a better grasp of basic mathematical models like linear and logistic regressions. Lots of CS students seem not to enjoy their theory classes, but theory really does prepare you for thinking about what you can learn from data. You may not use automata theory if you work at Foursquare, but you will need to be able to reason carefully and analytically. Doing math is just like lifting weights: if you’re not good at it right now, you just need to dig in and get yourself in shape.
About-
John Myles White is a Phd Student in  Ph.D. student in the Princeton Psychology Department, where he studies human decision-making both theoretically and experimentally. Along with the political scientist Drew Conway, he is  the author of a book published by O’Reilly Media entitled “Machine Learning for Hackers”, which is meant to introduce experienced programmers to the machine learning toolkit. He is also working with Mark Hansenon a book for laypeople about exploratory data analysis.John is the lead maintainer for several R packages, including ProjectTemplate and log4r.

(TIL he has played in several rock bands!)

—–
You can read more in his own words at his blog at http://www.johnmyleswhite.com/about/
He can be contacted via social media at Google Plus at https://plus.google.com/109658960610931658914 or twitter at twitter.com/johnmyleswhite/

Predictive Analytics Conferences-Four PAWs Coming – PAW Boston Super Early Ends Friday

  • Message from PAW Conferences

Friday, July 13th is your final opportunity to take advantage of the super early bird pricing for Predictive Analytics World Boston, Sept 30 – Oct 4.
INFO: www.pawcon.com/boston

AGENDA AT A GLANCE: www.pawcon.com/boston/2012/agenda_overview.php

Register now and realize savings of up to $600 over onsite registration:
www.pawcon.com/boston/register.php

– – – – – – – – – – – – – – –

All ANALYTICS EVENTS:

PAW Government: Sept 17-18, 2012 – www.pawgov.com
PAW Boston: Sept 30-Oct 4, 2012 – http://www.pawcon.com/boston
Text Analytics World Boston: Oct 3-4, 2012 – www.tawcon.com/boston
PAW Düsseldorf: Nov 6-7, 2012 – predictiveanalyticsworld.de
PAW London: Nov 27-28, 2012 – www.pawcon.com/london
PAW Videos: Available on-demand – www.pawcon.com/video

Interview: Hjálmar Gíslason, CEO of DataMarket.com

Here is an interview with Hjálmar Gíslason, CEO of Datamarket.com  . DataMarket is an active marketplace for structured data and statistics. Through powerful search and visual data exploration, DataMarket connects data seekers with data providers.

 

Ajay-  Describe your journey as an entrepreneur and techie in Iceland. What are the 10 things that surprised you most as a tech entrepreneur.

HG- DataMarket is my fourth tech start-up since at age 20 in 1996. The previous ones have been in gaming, mobile and web search. I come from a technical background but have been moving more and more to the business side over the years. I can still prototype, but I hope there isn’t a single line of my code in production!

Funny you should ask about the 10 things that have surprised me the most on this journey, as I gave a presentation – literally yesterday – titled: “9 things nobody told me about the start-up business”

They are:
* Do NOT generalize – especially not to begin with
* Prioritize – and find a work-flow that works for you
* Meet people – face to face
* You are a sales person – whether you like it or not
* Technology is not a product – it’s the entire experience
* Sell the current version – no matter how amazing the next one is
* Learn from mistakes – preferably others’
* Pick the right people – good people is not enough
* Tell a good story – but don’t make them up

I obviously elaborate on each of these points in the talk, but the points illustrate roughly some of the things I believe I’ve learned … so far 😉

9 things nobody told me about the start-up business

Ajay-

Both Amazon  and Google  have entered the public datasets space. Infochimps  has 14,000+ public datasets. The US has http://www.data.gov/

So clearly the space is both competitive and yet the demand for public data repositories is clearly under served still. 

How does DataMarket intend to address this market in a unique way to differentiate itself from others.

HG- DataMarket is about delivering business data to decision makers. We help data seekers find the data they need for planning and informed decision making, and data publishers reaching this audience. DataMarket.com is the meeting point, where data seekers can come to find the best available data, and data publishers can make their data available whether for free or for a fee. We’ve populated the site with a wealth of data from public sources such as the UN, Eurostat, World Bank, IMF and others, but there is also premium data that is only available to those that subscribe to and pay for the access. For example we resell the entire data offering from the EIU (Economist Intelligence Unit) (link: http://datamarket.com/data/list/?q=provider:eiu)

DataMarket.com allows all this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner.

We see many of these efforts not as competition, but as valuable potential sources of data for our offering, while others may be competing with parts of our proposition, such as easy access to the public data sets.

 

Ajay- What are your views on data confidentiality and access to data owned by Governments funded by tax payer money.

HG- My views are very simple: Any data that is gathered or created for taxpayers’ money should be open and free of charge unless higher priorities such as privacy or national security indicate otherwise.

Reflecting that, any data that is originally open and free of charge is still open and free of charge on DataMarket.com, just easier to find and work with.

Ajay-  How is the technology entrepreneurship and venture capital scene in Iceland. What things work and what things can be improved?

HG- The scene is quite vibrant, given the small community. Good teams with promising concepts have been able to get the funding they need to get started and test their footing internationally. When the rapid growth phase is reached outside funding may still be needed.

There are positive and negative things about any location. Among the good things about Iceland from the stand point of a technology start-up are highly skilled tech people and a relatively simple corporate environment. Among the bad things are a tiny local market, lack of skills in international sales and marketing and capital controls that were put in place after the crash of the Icelandic economy in 2008.

I’ve jokingly said that if a company is hot in the eyes of VCs it would get funding even if it was located in the jungles of Congo, while if they’re only lukewarm towards you, they will be looking for any excuse not to invest. Location can certainly be one of them, and in that case being close to the investor communities – physically – can be very important.

We’re opening up our sales and marketing offices in Boston as we speak. Not to be close to investors though, but to be close to our market and current customers.

Ajay- Describe your hobbies when you are not founding amazing tech startups.

HG- Most of my time is spent working – which happens to by my number one hobby.

It is still important to step away from it all every now and then to see things in perspective and come back with a clear mind.

I *love* traveling to exotic places. Me and my wife have done quite a lot of traveling in Africa and S-America: safari, scuba diving, skiing, enjoying nature. When at home I try to do some sports activities 3-4 times a week at least, and – recently – play with my now 8 month old son as much as I can.

About-

http://datamarket.com/p/about/team/

Management

Hjalmar GislasonHjálmar Gíslason, Founder and CEO: Hjalmar is a successful entrepreneur, founder of three startups in the gaming, mobile and web sectors since 1996. Prior to launching DataMarket, Hjalmar worked on new media and business development for companies in the Skipti Group (owners of Iceland Telecom) after their acquisition of his search startup – Spurl. Hjalmar offers a mix of business, strategy and technical expertise. DataMarket is based largely on his vision of the need for a global exchange for structured data.

hjalmar.gislason@datamarket.com

To know more, have a quick  look at  http://datamarket.com/

Preview- Google Cloud SQL

From –http://code.google.com/apis/sql/

What is Google Cloud SQL?

Google Cloud SQL is web service that allows you to create, configure, and use relational databases with your App Engine applications. It is a fully-managed service that maintains, manages, and administers your databases, allowing you to focus on your applications and services.

By offering the capabilities of a MySQL database, the service enables you to easily move your data, applications, and services into and out of the cloud. This allows for high data portability and helps in faster time-to-market because you can quickly leverage your existing database (using JDBC and/or DB-API) in your App Engine application.

Here is where you can get an invite to the beta only Google Cloud SQL

Sign up for Limited Preview

Google Cloud SQL is available to a limited number of users. To sign up for the service:

  1. Visit the Google APIs Console. The console opens the All services pane.
  2. Find the SQL Service line in the Services table and click Request access…
  3. Fill out the enrollment form.
  4. Our team will review your enrollment information and respond by email to the address associated with your Google Account.
  5. Follow the link in the email to view the Terms of Service. Please read these carefully before accepting.
  6. Sign up for the google-cloud-sql-announce group to receive important announcements and product news. (NOTE- Members: 384)
and after all that violence and double talk, a walk in the clouds with SQL.
1. There are three kinds of instances in the beta view
2. Wait for the Instance to be created note- the Design of the Interface uptil now is much better than Amazon’s.  
Note you need to have an appspot application from Google Apps and can choose between the Python and Java versions. Quite clearly there is a play for other languages too. I think GO is also supported.
3. You can import your data from your Google Storage bucket
4. I am not that hot at coding or maybe the interface was too pretty. Anyways- the log tells me that import of the text file has failed from Google Storage to Google Cloud SQL 
5. Incidentally the Google Cloud Storage interface is also much better than the Amazon GUI for transferring data- Note I was using the classical statistical dataset Boston Housing Data as the test case. 
6. The SQL prompt is the weakest part of the design process of the Interphase. There is no Query builder and the SELECT FROM WHERE prompt is slightly amusing/ insulting . I mean guys either throw in a fully fledged GUI for query builder similar to the MYSQL Workbench , than create a pretty white command prompt.
7. You can also export your data back to your Google Storage bucket 
These are early days, and I am trying to see if there is a play for some cloud kind of ODBC action between R, Prediction API , and the cloud SQL… so try it out yourself at http://code.google.com/apis/sql/ and see if there is any juice you can build  here.

Funny Economics

Some wry observations from me  on the world on economics-

1) 150 years after humiliating their country in the Opium Wars, Chinese mandarins have somehow convinced their leaders and military to park 2 trillion assets in Anglo Saxon debt. If Greece geting a 50% discount on its loan is the new precedent, when will the USA force its lendors to the negotiation table.

2) Income inequality and protests are something the Arabs and Israelis have in common. Besides being the sons of Abraham of course. Note the Persians are not considered the same as Arabs.

3) Advance knowledge of geo-political events can and ensures Western financial dealers have an edge on the sovereign funds in the other hemisphere.  What used to be the playgrounds of Eton has now shifted to the pubs of Boston and So Cal.

4) After spending 1 trillion USD on arms in the past one decade (funded by guys in item 1), the United States military forces is in a much better more advanced position to wage simultaneous war.

5) Can a war in Korean peninsula affect war in the Persian sphere of influence. Just follow the money , baby.

6) Saudi Wahabis continue to fund terror despite losing a lot of money in the economic meltdown in past few years. For every 1 $ increase in Saudi oil revenue, western oil companies ,traders, financiers make more, much more.

7) Demographics is an important key to economics. An aging Japan, and stagnant West is one cause to shift from manpower intensive warfare to cyber warfare. Plus Cyber warfare is good business . Underpopulated Russia and Arabs continue to lack true economic potential.

8) There are new economic incentives to develop tools to disseminate as well as distort information flow in real time in a hyper connected digital world.

 

Denial of Service Attacks against Hospitals and Emergency Rooms

One of the most frightening possibilities of cyber warfare is to use remotely deployed , or timed intrusion malware to disturb, distort, deny health care services.

Computer Virus Shuts Down Georgia Hospital

A doctor in an Emergency Room depends on critical information that may save lives if it is electronic and comes on time. However this electronic information can be distorted (which is more severe than deleting it)

The electronic system of a Hospital can also be overwhelmed. If there can be built Stuxnet worms on   nuclear centrifuge systems (like those by Siemens), then the widespread availability of health care systems means these can be reverse engineered for particularly vicious cyber worms.

An example of prime area for targeting is Veterans Administration for veterans of armed forces, but also cyber attacks against electronic health records.

Consider the following data points-

http://threatpost.com/en_us/blogs/dhs-warns-about-threat-mobile-devices-healthcare-051612

May 16, 2012, 9:03AM

DHS’s National Cybersecurity and Communications Integration Center (NCCIC) issued the unclassfied bulletin, “Attack Surface: Healthcare and Public Health Sector” on May 4. In it, DHS warns of a wide range of security risks, including that could expose patient data to malicious attackers, or make hospital networks and first responders subject to disruptive cyber attack

http://publicintelligence.net/nccic-medical-device-cyberattacks/

National Cybersecurity and Communications Integration Center Bulletin

The Healthcare and Public Health (HPH) sector is a multi-trillion dollar industry employing over 13 million personnel, including approximately five million first-responders with at least some emergency medical training, three million registered nurses, and more than 800,000 physicians.

(U) A significant portion of products used in patient care and management including diagnosis and treatment are Medical Devices (MD). These MDs are designed to monitor changes to a patient’s health and may be implanted or external. The Food and Drug Administration (FDA) regulates devices from design to sale and some aspects of the relationship between manufacturers and the MDs after sale. However, the FDA cannot regulate MD use or users, which includes how they are linked to or configured within networks. Typically, modern MDs are not designed to be accessed remotely; instead they are intended to be networked at their point of use. However, the flexibility and scalability of wireless networking makes wireless access a convenient option for organizations deploying MDs within their facilities. This robust sector has led the way with medical based technology options for both patient care and data handling.

(U) The expanded use of wireless technology on the enterprise network of medical facilities and the wireless utilization of MDs opens up both new opportunities and new vulnerabilities to patients and medical facilities. Since wireless MDs are now connected to Medical information technology (IT) networks, IT networks are now remotely accessible through the MD. This may be a desirable development, but the communications security of MDs to protect against theft of medical information and malicious intrusion is now becoming a major concern. In addition, many HPH organizations are leveraging mobile technologies to enhance operations. The storage capacity, fast computing speeds, ease of use, and portability render mobile devices an optimal solution.

(U) This Bulletin highlights how the portability and remote connectivity of MDs introduce additional risk into Medical IT networks and failure to implement a robust security program will impact the organization’s ability to protect patients and their medical information from intentional and unintentional loss or damage.

(U) According to Health and Human Services (HHS), a major concern to the Healthcare and Public Health (HPH) Sector is exploitation of potential vulnerabilities of medical devices on Medical IT networks (public, private and domestic). These vulnerabilities may result in possible risks to patient safety and theft or loss of medical information due to the inadequate incorporation of IT products, patient management products and medical devices onto Medical IT Networks. Misconfigured networks or poor security practices may increase the risk of compromised medical devices. HHS states there are four factors which further complicate security resilience within a medical organization.

1. (U) There are legacy medical devices deployed prior to enactment of the Medical Device Law in 1976, that are still in use today.

2. (U) Many newer devices have undergone rigorous FDA testing procedures and come equipped with design features which facilitate their safe incorporation onto Medical IT networks. However, these secure design features may not be implemented during the deployment phase due to complexity of the technology or the lack of knowledge about the capabilities. Because the technology is so new, there may not be an authoritative understanding of how to properly secure it, leaving open the possibilities for exploitation through zero-day vulnerabilities or insecure deployment configurations. In addition, new or robust features, such as custom applications, may also mean an increased amount of third party code development which may create vulnerabilities, if not evaluated properly. Prior to enactment of the law, the FDA required minimal testing before placing on the market. It is challenging to localize and mitigate threats within this group of legacy equipment.

3. (U) In an era of budgetary restraints, healthcare facilities frequently prioritize more traditional programs and operational considerations over network security.

4. (U) Because these medical devices may contain sensitive or privacy information, system owners may be reluctant to allow manufactures access for upgrades or updates. Failure to install updates lays a foundation for increasingly ineffective threat mitigation as time passes.

(U) Implantable Medical Devices (IMD): Some medical computing devices are designed to be implanted within the body to collect, store, analyze and then act on large amounts of information. These IMDs have incorporated network communications capabilities to increase their usefulness. Legacy implanted medical devices still in use today were manufactured when security was not yet a priority. Some of these devices have older proprietary operating systems that are not vulnerable to common malware and so are not supported by newer antivirus software. However, many are vulnerable to cyber attacks by a malicious actor who can take advantage of routine software update capabilities to gain access and, thereafter, manipulate the implant.

(U) During an August 2011 Black Hat conference, a security researcher demonstrated how an outside actor can shut off or alter the settings of an insulin pump without the user’s knowledge. The demonstration was given to show the audience that the pump’s cyber vulnerabilities could lead to severe consequences. The researcher that provided the demonstration is a diabetic and personally aware of the implications of this activity. The researcher also found that a malicious actor can eavesdrop on a continuous glucose monitor’s (CGM) transmission by using an oscilloscope, but device settings could not be reprogrammed. The researcher acknowledged that he was not able to completely assume remote control or modify the programming of the CGM, but he was able to disrupt and jam the device.

http://www.healthreformwatch.com/category/electronic-medical-records/

February 7, 2012

Since the data breach notification regulations by HHS went into effect in September 2009, 385 incidents affecting 500 or more individuals have been reported to HHS, according to its website.

http://www.darkdaily.com/cyber-attacks-against-internet-enabled-medical-devices-are-new-threat-to-clinical-pathology-laboratories-215#axzz1yPzItOFc

February 16 2011

One high-profile healthcare system that regularly experiences such attacks is the Veterans Administration (VA). For two years, the VA has been fighting a cyber battle against illegal and unwanted intrusions into their medical devices

 

http://www.mobiledia.com/news/120863.html

 DEC 16, 2011
Malware in a Georgia hospital’s computer system forced it to turn away patients, highlighting the problems and vulnerabilities of computerized systems.

The computer infection started to cause problems at the Gwinnett Medical Center last Wednesday and continued to spread, until the hospital was forced to send all non-emergency admissions to other hospitals.

More doctors and nurses than ever are using mobile devices in healthcare, and hospitals are making patient records computerized for easier, convenient access over piles of paperwork.

http://www.doctorsofusc.com/uscdocs/locations/lac-usc-medical-center

As one of the busiest public hospitals in the western United States, LAC+USC Medical Center records nearly 39,000 inpatient discharges, 150,000 emergency department visits, and 1 million ambulatory care visits each year.

http://www.healthreformwatch.com/category/electronic-medical-records/

If one jumbo jet crashed in the US each day for a week, we’d expect the FAA to shut down the industry until the problem was figured out. But in our health care system, roughly 250 people die each day due to preventable error

http://www.pcworld.com/article/142926/are_healthcare_organizations_under_cyberattack.html

Feb 28, 2008

“There is definitely an uptick in attacks,” says Dr. John Halamka, CIO at both Beth Israel Deaconess Medical Center and Harvard Medical School in the Boston area. “Privacy is the foundation of everything we do. We don’t want to be the TJX of healthcare.” TJX is the Framingham, Mass-based retailer which last year disclosed a massive data breach involving customer records.

Dr. Halamka, who this week announced a project in electronic health records as an online service to the 300 doctors in the Beth Israel Deaconess Physicians Organization,