Interview John Myles White, Machine Learning for Hackers

Here is an interview with one of the younger researchers and rock stars of the R Project, John Myles White, co-author of Machine Learning for Hackers.

Ajay- What inspired you guys to write Machine Learning for Hackers? What has been the public response to the book? Are you planning to write a second edition or a next book?

John- We decided to write Machine Learning for Hackers because there were so many people interested in learning more about Machine Learning who found the standard textbooks a little difficult to understand, either because they lacked the mathematical background expected of readers or because it wasn’t clear how to translate the mathematical definitions in those books into usable programs. Most Machine Learning books are written for audiences who will not only be using Machine Learning techniques in their applied work, but also actively inventing new Machine Learning algorithms. The amount of information needed to do both can be daunting, because, as one friend pointed out, it’s similar to insisting that everyone learn how to build a compiler before they can start to program. For most people, it’s better to let them try out programming and get a taste for it before you teach them about the nuts and bolts of compiler design. If they like programming, they can delve into the details later.

We once said that Machine Learning for Hackers is supposed to be a chemistry set for Machine Learning and I still think that’s the right description: it’s meant to get readers excited about Machine Learning and hopefully expose them to enough ideas and tools that they can start to explore on their own more effectively. It’s like a warmup for standard academic books like Bishop’s.
The public response to the book has been phenomenal. It’s been amazing to see how many people have bought the book and how many people have told us they found it helpful. Even friends with substantial expertise in statistics have said they’ve found a few nuggets of new information in the book, especially regarding text analysis and social network analysis, topics that Drew and I spend a lot of time thinking about, but that are not thoroughly covered in standard statistics and Machine Learning undergraduate curricula.
I hope we write a second edition. It was our first book and we learned a ton about how to write at length from the experience. I’m about to announce later this week that I’m writing a second book, which will be a very short eBook for O’Reilly. Stay tuned for details.

Ajay- What are the key things that a potential reader can learn from this book?

John- We cover most of the nuts and bolts of introductory statistics in our book: summary statistics, regression and classification using linear and logistic regression, PCA and k-Nearest Neighbors. We also cover topics that are less well known, but are just as important: density plots vs. histograms, regularization, cross-validation, MDS, social network analysis and SVMs. I hope a reader walks away from the book having a feel for what different basic algorithms do and why they work for some problems and not others. I also hope we do just a little to shift a future generation of modeling culture towards regularization and cross-validation.
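
For readers who have not met the two ideas John closes on, here is a minimal sketch of how regularization and cross-validation fit together. It is not taken from the book (which uses R); the data are simulated, and the function names ridge_slope and k_fold_cv_error and the candidate penalty values are made up for illustration. A one-parameter ridge regression is fit with several penalty strengths, and k-fold cross-validation picks the penalty with the lowest held-out error.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for the model y ~ slope * x (no intercept).

    lam is the regularization strength; lam = 0 gives ordinary least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def k_fold_cv_error(xs, ys, lam, k=5):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        # Every k-th point, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        slope = ridge_slope(train_x, train_y, lam)
        fold_err = sum((ys[i] - slope * xs[i]) ** 2 for i in test_idx)
        total += fold_err / len(test_idx)
    return total / k

# Simulated data (an assumption for this sketch): true slope 2, noisy responses.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(30)]
ys = [2.0 * x + rng.gauss(0, 1.0) for x in xs]

# Hypothetical grid of penalties; cross-validation chooses among them.
lambdas = [0.0, 0.1, 1.0, 10.0]
scores = {lam: k_fold_cv_error(xs, ys, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
print("CV error by penalty:", scores)
print("Selected penalty:", best_lam)
```

The point of the design is that the penalty is never judged on the data used to fit the slope, which is the habit the answer above is arguing for.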

Ajay- Describe your journey as a science student up to your Ph.D. What are your current research interests and what initiatives have you undertaken with them?

John- As an undergraduate I studied math and neuroscience. I then took some time off and came back to do a Ph.D. in psychology, focusing on mathematical modeling of both the brain and behavior. There’s a rich tradition of machine learning and statistics in psychology, so I got increasingly interested in ML methods during my years as a grad student. I’m about to finish my Ph.D. this year. My research interests all fall under one heading: decision theory. I want to understand both how people make decisions (which is what psychology teaches us) and how they should make decisions (which is what statistics and ML teach us). My thesis is focused on how people make decisions when there are both short-term and long-term consequences to be considered. For non-psychologists, the classic example is probably the explore-exploit dilemma. I’ve been working to import more of the main ideas from stats and ML into psychology for modeling how real people handle that trade-off. For psychologists, the classic example is the Marshmallow experiment. Most of my research work has focused on the latter: what makes us patient and how can we measure patience?
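
As a rough illustration of the explore-exploit dilemma John mentions, here is a small sketch of an epsilon-greedy strategy on a three-armed bandit. This is a textbook ML baseline, not a model from John's research; the reward probabilities, the function name epsilon_greedy, and its parameters are assumptions chosen for the example.

```python
import random

def epsilon_greedy(reward_probs, steps=10000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule.

    reward_probs[a] is the (hidden) chance that arm a pays out 1.
    Returns how often each arm was pulled and the estimated value of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm, even one that currently looks bad.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average reward for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

# Hypothetical payout rates; the player does not get to see these.
counts, values = epsilon_greedy([0.2, 0.5, 0.8])
print("Pulls per arm:", counts)
print("Estimated values:", values)
```

Setting epsilon to 0 makes the player purely exploitative, and it can then get stuck on whichever arm happened to pay out first; a small amount of exploration is the price of finding the better option.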

Ajay- How can academia and the private sector solve the shortage of trained data scientists (assuming there is one)?

John- There’s definitely a shortage of trained data scientists: most companies are finding it difficult to hire someone with the real chops needed to do useful work with Big Data. The skill set required to be useful at a company like Facebook or Twitter is much more advanced than many people realize, so I think it will be some time until there are undergraduates coming out with the right stuff. But there’s huge demand, so I’m sure the market will clear sooner or later.

The changes that are required in academia to prepare students for this kind of work are pretty numerous, but the most obvious required change is that quantitative people need to be learning how to program properly, which is rare in academia, even in many CS departments. Writing one-off programs that no one will ever have to reuse and that only work on toy data sets doesn’t prepare you for working with huge amounts of messy data that exhibit shifting patterns. If you need to learn how to program seriously before you can do useful work, you’re not very valuable to companies who need employees that can hit the ground running. The companies that have done best in building up data teams, like LinkedIn, have learned to train people as they come in since the proper training isn’t typically available outside those companies.
Of course, on the flipside, the people who do know how to program well need to start learning more about theory and need to start to have a better grasp of basic mathematical models like linear and logistic regressions. Lots of CS students seem not to enjoy their theory classes, but theory really does prepare you for thinking about what you can learn from data. You may not use automata theory if you work at Foursquare, but you will need to be able to reason carefully and analytically. Doing math is just like lifting weights: if you’re not good at it right now, you just need to dig in and get yourself in shape.
John Myles White is a Ph.D. student in the Princeton Psychology Department, where he studies human decision-making both theoretically and experimentally. Along with the political scientist Drew Conway, he is the author of a book published by O’Reilly Media entitled “Machine Learning for Hackers”, which is meant to introduce experienced programmers to the machine learning toolkit. He is also working with Mark Hansen on a book for laypeople about exploratory data analysis. John is the lead maintainer for several R packages, including ProjectTemplate and log4r.

(TIL he has played in several rock bands!)

You can read more in his own words at his blog.
He can be contacted via social media on Google Plus or Twitter.

How to learn to be a hacker easily

1) Are you sure? It is tough to be a hacker. And football players get all the attention.

2) Really? Read on.

3) Read the Hacker’s Code

The Hacker’s Code

“A hacker of the Old Code.”

  • Hackers come and go, but a great hack is forever.
  • Public goods belong to the public.*
  • Software hoarding is evil.
    Software does the greatest good given to the greatest number.
  • Don’t be evil.
  • Sourceless software sucks.
  • People have rights.
    Organizations live on sufferance.
  • Governments are organizations.
  • If it is wrong when citizens do it,
    it is wrong when governments do it.
  • Information wants to be free.
    Information deserves to be free.
  • Being legal doesn’t make it right.
  • Being illegal doesn’t make it wrong.
  • Subverting tyranny is the highest duty.
  • Trust your technolust!

4) Read “How To Become A Hacker” by Eric Steven Raymond

or just get the Hacker Attitude

The Hacker Attitude

1. The world is full of fascinating problems waiting to be solved.
2. No problem should ever have to be solved twice.
3. Boredom and drudgery are evil.
4. Freedom is good.
5. Attitude is no substitute for competence.
5) If you are tired of reading English, maybe I should move on to the technical stuff.
6) Create your hacking space, a virtual disk on your machine.
You will need to learn a bit of Linux. If you are a Windows user, I recommend creating a VMware partition with Ubuntu.
If you like Mac, I recommend the more aesthetic Linux Mint.
How to create your virtual disk-
read here-
Download VM Player here
Download the ISO image of the operating system here
Downloading is the longest thing in this exercise.
Now just do what is written here
or if you want to try and experiment with other ways to use Windows and Linux just read this
Moving data back and forth between your new virtual disk and your old real disk
7) Get Tor to hide your IP address when on the internet.
8a) Block ads using the Ad-block plugin when surfing the internet (like 14.95 million other users)
8b) and use Mafiafire to get elusive websites.
9) Get a BitTorrent client.
This will help you download stuff.
10) Hacker Culture Alert-
This instruction is purely for sharing the culture but not the techie work of being a hacker.
The website The Pirate Bay acts like a search engine for BitTorrent files.
Visiting it is considered bad since you can get lots of music, videos, movies, etc. for free, without paying copyright fees.
The website 4chan is considered a meeting place to meet other hackers. The site can be visually shocking.
You need to at least set up these systems, read the websites, and come back in N months’ time for the second part in this series on how to learn to be a hacker. That will be the coding part.
Update: sorry, the next part has been a bit delayed. Will post soon.

C4ISTAR for Hacking and Cyber Conflict

As per

C2I stands for command, control, and intelligence.

C3I stands for command, control, communications, and intelligence.

C4I stands for command, control, communications, computers, and (military) intelligence.

C4ISTAR is the British acronym used to represent the group of the military functions designated by C4 (command, control, communications, computers), I (military intelligence), and STAR (surveillance, target acquisition, and reconnaissance) in order to enable the coordination of operations.

I increasingly believe that cyber conflict will develop its own terminology, theory, and paradigms in due time. In the meantime, it will adopt paradigms from existing military literature and adapt them to the unique subculture of cyber conflict for offensive, defensive, and pre-emptive actions. Here I am theorizing for a case of targeted hacking attacks rather than massive attacks that bring down a website for a few hours and achieve nothing but a few press headlines. I would also theorize on countering such attacks.

So what would be the C4ISTAR for –

1) Media company supporting SOPA/PIPA/Take down Mega Upload-

Command and Control refers to the ability of commanders to direct forces-

This will be the senior executives including the members of board, legal officers, and public relationship/marketing people. Their name is available from corporate websites, and social media scraping can ensure both a list of contact addresses (online) as well as biases for phishing /malware attacks. This could also include phone (flooding or voicemail hacking ) attacks , and attacks against the email server of the company rather than the corporate website.

Communications– This will include all online and social media channels including websites of the media company , but also  those of the press relations firms handling communications , phones,websites- anything which the target is likely to communicate externally (and if possible internal communication)

Timing is everything- coordinating attacks immediately is juvenile, but it might be more mature to attack on vulnerable days like product launches or just before a board of directors meeting.


Most corporates have an in-house research team, they can be easily targeted using social media channels, but also offline research and digging deep. Targeting intelligence corps of the target corporate is likely to produce a much better disruption. Eventually they can be persuaded to stop working for that corporate.

Computers– Anything that runs on electricity and can be disabled – should be disabled. This might require much more creativity than just flooding.

 surveillance-  This can be both online as well as offline, and would be of electronic assets, likely responses for the attack, and the key people who are to be disrupted.

target acquisition-  at least ten people within each corporate can and should be ideally disrupted, rather than just the website. this would call for social media scraping, and prior planning. even email in-boxes can be disrupted (if all else fails)

and reconnaissance-

study your target companies, target employees, and their strategies.

Then segment and prioritize in a list of  matrix of 10  to 10, who is more vulnerable and who is more valuable to attack.

The C4ISTAR for a hacker activist organization is much more complicated, but forensics reveal that most hackers tend to leave a signature style (in terms of computers, operating systems, machine IDs, communication, tools, or even port numbers used).

The best defense for a media-rich company to prevent hacking attacks is to first identify its own C4ISTAR structure for its digital content strategy and then fortify as well as scrub vulnerabilities (including from online information regarding its own employees).

(to be continued)


Interview Beth Schultz, Editor in Chief, All Analytics

Here is an interview with Beth Schultz, Editor in Chief of All Analytics. All Analytics is the new online community on predictive analytics, and it’s a bit different in emphasizing quality more than just quantity. Beth is a veteran in tech journalism and communities.

Ajay- Describe your journey in technology journalism and communication. What are the other online communities that you have been involved with?

Beth- I’m a longtime IT journalist, having begun my career covering the telecommunications industry at the brink of AT&T’s divestiture — many eons ago. Over the years, I’ve covered the rise of internal corporate networking; the advent of the Internet and creation of the Web for business purposes; the evolution of Web technology for use in building intranets, extranets, and e-commerce sites; the move toward a highly dynamic next-generation IT infrastructure that we now call cloud computing; and development of myriad enterprise applications, including business intelligence and the analytics surrounding them. I have been involved in developing online B2B communities primarily around next-generation enterprise IT infrastructure and applications. In addition, Shawn Hessinger, our community editor, has been involved in myriad Web sites aimed at creating community for small business owners.

Ajay- Technology geeks get all the money while journalists get a story. Comments, please.

Beth- Great technology geeks — those being the ones with technology smarts as well as business savvy — do stand to make a lot of money. And some pursue that to all ends (with many entrepreneurs gunning for the acquisition) while others more or less fall into it. Few journalists, at least few tech journalists, have big dollars in mind. The gratification for journalists comes in being able to meet these folks, hear and deliver their stories — as appropriate — and help explain what makes this particular technology geek developing this certain type of product or service worth paying attention to.

Ajay- Describe what you are trying to achieve with the All Analytics community and how it seeks to differentiate itself from other players in this space.

Beth- With All Analytics, we’re concentrating on creating the go-to site for CXOs, IT professionals, line-of-business managers, and other professionals to share best practices, concrete experiences, and research about data analytics, business intelligence, information optimization, and risk management, among many other topics. We differentiate ourselves by featuring excellent editorial content from a top-notch group of bloggers, access to industry experts through weekly chats, ongoing lively and engaging message board discussions, and biweekly debates.

We’re a new property, and clearly in rapid building mode. However, we’ve already secured some of the industry’s most respected BI/analytics experts to participate as bloggers. For example, a small sampling of our current lineup includes the always-intriguing John Barnes, a science fiction novelist and statistics guru; Sandra Gittlen, a longtime IT journalist with an affinity for BI coverage; Olivia Parr-Rud, an internationally recognized expert in BI and organizational alignment; Tom Redman, a well-known data-quality expert; and Steve Williams, a leading BI strategy consultant. I blog daily as well, and in particular love to share firsthand experiences of how organizations are benefiting from the use of BI, analytics, data warehousing, etc. We’ve featured inside looks at analytics initiatives at companies such as Oberweis Dairy, the Cincinnati Zoo & Botanical Garden, and Thomson Reuters, for example.

In addition, we’ve hosted instant e-chats with Web and social media experts Joe Stanganelli and Pierre DeBois, and this Friday, Aug. 26, at 3 p.m. ET we’ll be hosting an e-chat with Marshall Sponder, Web metrics guru and author of the newly published book, Social Media Analytics: Effective Tools for Building, Interpreting, and Using Metrics. (Readers interested in participating in the chat do need to fill out a quick registration form, available here. The chat is available here.)

Experts participating in our biweekly debate series, called Point/Counterpoint, have broached topics such as BI in the cloud, mobile BI and whether an analytics culture is truly possible to build.

Ajay- What are some tips you would like to share about writing tech stories with aspiring bloggers?

Beth- I suppose my best advice is this: Don’t write about technology for technology’s sake. Always strive to tell the audience why they should care about a particular technology, product, or service. How might a reader use it to his or her company’s advantage, and what are the potential benefits? Improved productivity, increased revenue, better customer service? Providing anecdotal evidence goes a long way toward delivering that message, as well.

Ajay- What are the other IT world websites that have made a mark on the internet?

Beth- I’d be remiss if I didn’t give a shout out to UBM TechWeb sites, including InformationWeek, which has long charted the use of IT within the enterprise; Dark Reading, a great source for folks interested in securing an enterprise’s information assets; and Light Reading, which takes the pulse of the telecom industry.


Beth Schultz has more than two decades of experience as an IT writer and editor. Most recently, she brought her expertise to bear writing thought-provoking editorial and marketing materials on a variety of technology topics for leading IT publications and industry players. Previously, she oversaw multimedia content development, writing and editing for special feature packages at Network World. Beth has a keen ability to identify business and technology trends, developing expertise through in-depth analysis and early-adopter case studies. Over the years, she has earned more than a dozen national and regional editorial excellence awards for special issues from American Business Media, American Society of Business Press Editors, and others.


A Poem for all those restless Arabian Knights


I met a traveller from an antique land
Who said: Two vast and trunkless legs of stone
Stand in the desert. Near them, on the sand,
Half sunk, a shattered visage lies, whose frown
And wrinkled lip, and sneer of cold command
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them and the heart that fed.

And on the pedestal these words appear:
“My name is Ozymandias, king of kings:
Look on my works, ye Mighty, and despair!”
Nothing beside remains. Round the decay
Of that colossal wreck, boundless and bare
The lone and level sands stretch far away.

OZYMANDIAS by Horace Smith

In Egypt’s sandy silence, all alone,
Stands a gigantic Leg, which far off throws
The only shadow that the Desert knows:
“I am great OZYMANDIAS,” saith the stone,
“The King of Kings; this mighty City shows
“The wonders of my hand.” The City’s gone,
Nought but the Leg remaining to disclose
The site of this forgotten Babylon.
We wonder, and some Hunter may express
Wonder like ours, when thro’ the wilderness
Where London stood, holding the Wolf in chace,
He meets some fragments huge, and stops to guess
What powerful but unrecorded race
Once dwelt in that annihilated place.