Poets and Hackers

My latest book, a collaboration with many fine artists, is now up. It's called Poets and Hackers.

Enjoy!

Poets & Hackers v5: http://www.scribd.com/embeds/66419481/content?start_page=1&view_mode=list&access_key=key-23x8ifmmz5noevn8m4vn//

Featured: PAW & TAW NYC Hotel Reservations Due This Week

Message from PAWCON-

Space is filling up fast at the Hilton New York, host hotel for Predictive Analytics World and Text Analytics World next month in New York City. Book before Friday, September 23rd to take advantage of the special room rate negotiated for attendees.

Space is limited so be sure to book your room before it’s too late.

You can reserve your room today by calling 212-586-7000 and referencing Data Driven Business Week, or online at:
http://www.hilton.com/en/hi/groups/personalized/N/NYCNHHH-RMSP-20111015/index.jhtml?WT.mc_id=POG#reservation

MORE INFORMATION:

PAW: http://www.pawcon.com/nyc
PAW REGISTRATION: http://www.pawcon.com/newyork/register.php

TAW: http://www.tawgo.com/nyc
TAW REGISTRATION: http://www.tawgo.com/newyork/2011/registration

View the PAW overview video: www.pawcon.com/newyork/2011/video_about_predictive_analytics_world.php 

Interview: Dan Steinberg, Founder, Salford Systems

Here is an interview with Dan Steinberg, Founder and President of Salford Systems (http://www.salford-systems.com/).

Ajay- Describe your journey from academia to technology entrepreneurship. What are the key milestones or turning points that you remember?

Dan- When I was in graduate school studying econometrics at Harvard, a number of distinguished professors at Harvard (and MIT) were actively involved in substantial real-world activities. Professors that I interacted with, studied with, or whose software I used became involved in the creation of such companies as Sun Microsystems and Data Resources, Inc., or were heavily involved in business consulting through their own companies or other influential consultants. Some who were not involved in private-sector consulting took on substantial roles in government, such as membership on the President’s Council of Economic Advisors. The atmosphere encouraged free movement between academia and the private sector, so the idea of forming a consulting and software company was quite natural and did not seem in any way inconsistent with being devoted to the advancement of science.

Ajay- What are the latest products by Salford Systems? Any future product plans or modifications for Big Data analytics, mobile computing and cloud computing?

Dan- Our central data mining technologies are CART, MARS, TreeNet, RandomForests, and PRIM, and we have always maintained feature-rich logistic regression and linear regression modules. Our latest release, scheduled for January 2012, will include a new data mining approach to linear and logistic regression that allows rapid processing of massive numbers of predictors (e.g., one million columns), with powerful predictor selection and coefficient shrinkage. The new methods allow not only classic techniques such as ridge and lasso regression, but also sub-lasso model sizes. Clear tradeoff diagrams between model complexity (number of predictors) and predictive accuracy allow modelers to select the balance best suited to their requirements.
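To make "predictor selection and coefficient shrinkage" concrete, here is a minimal pure-Python sketch of the lasso fitted by cyclic coordinate descent with soft-thresholding. It is not Salford's implementation. It assumes centered predictors and no intercept, minimizes (1/(2n))*||y - Xb||^2 + lambda*||b||_1, and counts how many predictors survive as the penalty lambda grows, which is the complexity-versus-penalty tradeoff described above. The function names `soft_threshold` and `lasso_cd` and the toy data are hypothetical.

```python
def soft_threshold(a, lam):
    """Shrink a toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - Xb||^2 + lam*||b||_1.

    X is a list of rows, y a list of targets. Assumes centered columns and
    no intercept (an illustrative simplification).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # Per-column mean squared value: the curvature of each 1-D subproblem.
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    residual = list(y)  # y - Xb, with b starting at zero
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # an all-zero column carries no signal
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Toy data: two centered, orthogonal predictors; y = 2*x1 + 1*x2.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-3, -3, 0, 1, 5]

for lam in (0.0, 0.5, 1.0, 5.0):
    b = lasso_cd(X, y, lam)
    n_selected = sum(1 for coef in b if coef != 0.0)
    print(f"lambda={lam}: coefficients={b}, predictors kept={n_selected}")
```

With no penalty the fit recovers both coefficients; as lambda rises the coefficients shrink and the weaker predictor drops out first, until none remain. Ridge regression uses a squared (L2) penalty instead and shrinks coefficients without setting any of them exactly to zero.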

The new version of our data mining suite, Salford Predictive Modeler (SPM), also includes two important extensions to the boosted tree technology at the heart of TreeNet. The first, Importance Sampled learning Ensembles (ISLE), is used to compress TreeNet tree ensembles. Starting with, say, a 1,000-tree ensemble, ISLE compression might well reduce it to 200 reweighted trees. Such compression will be valuable when models need to be executed in real time. The compression rate is always under the modeler’s control: if a deployed model may contain only, say, 30 trees, the compression will deliver an optimal 30-tree weighted ensemble. Needless to say, compression of tree ensembles should be expected to be lossy, and how much accuracy is lost under extreme compression will vary from case to case. Before ISLE, practitioners simply truncated the ensemble to the maximum allowable size. The new methodology will substantially outperform truncation.
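Compression versus truncation can be illustrated with a small sketch in the spirit of ISLE post-processing as published by Friedman and Popescu: treat each tree's predictions as one column, fit a lasso to the full ensemble's output, and raise the L1 penalty by bisection until at most `max_trees` weights are nonzero. This is not Salford's implementation. It assumes centered tree outputs, no intercept, and toy "trees" given directly as prediction vectors; all names and data are hypothetical.

```python
def soft_threshold(a, lam):
    """Shrink a toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_cd(cols, target, lam, n_iter=100):
    """Coordinate-descent lasso where each column is one tree's prediction vector."""
    n = len(target)
    w = [0.0] * len(cols)
    z = [sum(v * v for v in c) / n for c in cols]
    residual = list(target)
    for _ in range(n_iter):
        for j, c in enumerate(cols):
            if z[j] == 0.0:
                continue
            rho = sum(c[i] * (residual[i] + c[i] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / z[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= c[i] * delta
                w[j] = new_wj
    return w


def compress(trees, max_trees, steps=60):
    """Bisect on the L1 penalty until at most max_trees reweighted trees survive."""
    n = len(trees[0])
    ensemble = [sum(t[i] for t in trees) for i in range(n)]  # full-ensemble output
    # At this penalty every weight is zero, so it is always feasible.
    hi = max(abs(sum(t[i] * ensemble[i] for i in range(n))) / n for t in trees)
    lo, best = 0.0, [0.0] * len(trees)
    for _ in range(steps):
        mid = (lo + hi) / 2
        w = lasso_cd(trees, ensemble, mid)
        if sum(1 for v in w if v != 0.0) <= max_trees:
            best, hi = w, mid  # feasible: try a smaller penalty
        else:
            lo = mid
    return best


def mse_vs_ensemble(trees, weights):
    """Mean squared gap between a weighted subset of trees and the full ensemble."""
    n = len(trees[0])
    gaps = [sum(t[i] for t in trees) - sum(w * t[i] for w, t in zip(weights, trees))
            for i in range(n)]
    return sum(g * g for g in gaps) / n


# Toy ensemble of three "trees", each given by its predictions on four rows.
trees = [[0.5, -0.5, 0.5, -0.5], [2, 2, -2, -2], [1, -1, -1, 1]]
truncated = [1.0, 0.0, 0.0]  # keep only the first tree, unweighted
compressed = compress(trees, max_trees=1)
print("truncation MSE:", mse_vs_ensemble(trees, truncated))
print("compression MSE:", mse_vs_ensemble(trees, compressed), "weights:", compressed)
```

Here truncation keeps the first tree regardless of its contribution, while the penalized fit picks the most useful tree and reweights it. Lasso weights are shrunk toward zero, so refitting the surviving weights without the penalty is a common refinement this sketch leaves out.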

The second major advance is RULEFIT, a rule extraction engine that starts with a TreeNet model and decomposes it into the most interesting and predictive rules. RULEFIT is also a tree ensemble post-processor and offers the possibility of improving on the original TreeNet predictive performance. One can think of the rule extraction as an alternative way to explain and interpret an otherwise complex multi-tree model. The rules extracted are similar conceptually to the terminal nodes of a CART tree but the various rules will not refer to mutually exclusive regions of the data.
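The decomposition into rules can be sketched as follows. In the RuleFit approach published by Friedman and Popescu, every node of a tree except the root yields a rule: the conjunction of split conditions on the path from the root to that node. The toy tree format, feature names and helper names below are hypothetical, and a full RuleFit would go on to fit a sparse (lasso) linear model on the resulting 0/1 rule indicators, which this sketch omits. Note that a single record satisfies several rules, because rules from a parent node and its children overlap.

```python
import operator

# Split operators supported by this toy tree format (an assumption of the sketch).
OPS = {"<=": operator.le, ">": operator.gt}


def extract_rules(node, path=()):
    """Return one rule per non-root node: the conjunction of conditions leading to it."""
    if "feature" not in node:  # leaf: no further splits below
        return []
    feature, threshold = node["feature"], node["threshold"]
    rules = []
    for child, op in ((node["left"], "<="), (node["right"], ">")):
        rule = path + ((feature, op, threshold),)
        rules.append(rule)                        # the child node itself is a rule
        rules.extend(extract_rules(child, rule))  # plus every rule deeper in that branch
    return rules


def rule_applies(rule, record):
    """True if the record (a dict of feature values) satisfies every condition."""
    return all(OPS[op](record[f], t) for f, op, t in rule)


tree = {
    "feature": "age", "threshold": 30,
    "left": {"value": 0.1},
    "right": {
        "feature": "income", "threshold": 50,
        "left": {"value": 0.4},
        "right": {"value": 0.9},
    },
}

rules = extract_rules(tree)
for rule in rules:
    print(" AND ".join(f"{f} {op} {t}" for f, op, t in rule))

record = {"age": 40, "income": 60}
print("rules satisfied:", [i for i, r in enumerate(rules) if rule_applies(r, record)])
```

The record above satisfies both "age > 30" and "age > 30 AND income > 50", whereas it would fall into exactly one terminal node of the original tree.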

Ajay- You have led teams that have won multiple data mining competitions. What are some of your favorite techniques or approaches to a data mining problem?

Dan- We only enter competitions involving problems for which our technology is suitable, generally classification and regression. In these areas we are partial to TreeNet because it is such a capable and robust learning machine. However, we always find great value in analyzing many aspects of a data set with CART, especially when we require a compact, easy-to-understand story about the data. CART is exceptionally well suited to discovering errors in data, often revealing errors created by the competition organizers themselves. More than once, our reports of data problems have led the competition organizers to issue a corrected version of the data, and we have been the only group to discover the problem.

In general, tackling a data mining competition is no different from tackling any other analytical challenge. You must start with a solid conceptual grasp of the problem, its actual objectives, and the nature and limitations of the data. Then come feature extraction, the selection of a modeling strategy (or strategies), and extensive experimentation to learn what works best.
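The "extensive experimentation" step is usually organized around held-out evaluation. Below is a minimal k-fold cross-validation sketch in pure Python that compares two hypothetical strategies, a mean-only baseline and a one-predictor least-squares line, on toy data. The names are illustrative, and the folds are contiguous here for simplicity; real projects would normally shuffle or stratify the rows first.

```python
def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k contiguous folds of nearly equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_mse(fit, xs, ys, k=5):
    """Average held-out squared error; fit(train_xs, train_ys) must return a predictor."""
    total, count = 0.0, 0
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)
        for i in test_idx:
            total += (ys[i] - predict(xs[i])) ** 2
            count += 1
    return total / count


def fit_mean(xs, ys):
    """Strategy A: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Strategy B: ordinary least-squares line through one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
for name, strategy in (("mean", fit_mean), ("line", fit_line)):
    print(name, cross_val_mse(strategy, xs, ys, k=5))
```

Because each strategy is scored only on rows it never saw during fitting, the comparison reflects predictive accuracy rather than how closely a model can memorize its training data.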

Ajay- I know you have created your own software. But is there other software that you use or like to use?

Dan- For analytics we frequently test open source software to make sure that our tools do in fact deliver the superior performance we advertise. In general, if a problem clearly requires technology other than that offered by Salford, we advise clients to seek consultants who are expert in that technology.

Ajay- Your software is installed at 3,500 sites, including 400 universities (per http://www.salford-systems.com/company/aboutus/index.html). What is the key to managing and keeping so many customers happy?

Dan- First, we have taken great pains to make our software reliable, and we make every effort to avoid problems related to bugs. Our testing procedures are extensive, and we have experts dedicated to stress-testing the software. Second, our interface is designed to be natural, intuitive, and easy to use, so the challenges to new users are minimized. Clear documentation, help files, and training videos round out the resources that let users look after themselves. Should a client need to contact us, we try to achieve 24-hour turnaround on tech support issues, and we monitor all tech support activity to ensure the timeliness, accuracy, and helpfulness of our responses. WebEx/GoToMeeting and other internet-based contact permit real-time interaction.

Ajay- What do you do to relax and unwind?

Dan- I am in the gym almost every day, combining weight and cardio training. No matter how tired I am before a workout, I always come out energized, so locating a good gym during my extensive travels is a must. I am also actively learning Portuguese, so I try to watch a Brazilian TV show or a Portuguese-dubbed movie when I have time; I almost never watch any form of video unless it is available in Portuguese.

Biography-

http://www.salford-systems.com/blog/dan-steinberg.html

Dan Steinberg, President and Founder of Salford Systems, is a well-respected member of the statistics and econometrics communities. In 1992, he developed the first PC-based implementation of the original CART procedure, working in concert with Leo Breiman, Richard Olshen, Charles Stone and Jerome Friedman. In addition, he has provided consulting services on a number of biomedical and market research projects, which have sparked further innovations in the CART program and methodology.

Dr. Steinberg received his Ph.D. in Economics from Harvard University and has given full-day presentations on data mining for the American Marketing Association, the Direct Marketing Association and the American Statistical Association. After completing his doctorate at Harvard, Steinberg began his professional career as a Member of the Technical Staff at Bell Labs, Murray Hill, and then became Assistant Professor of Economics at the University of California, San Diego. A book he co-authored on Classification and Regression Trees was awarded the 1999 Nikkei Quality Control Literature Prize in Japan for excellence in statistical literature promoting the improvement of industrial quality control and management.

His consulting experience at Salford Systems has included complex modeling projects for major banks worldwide, including Citibank, Chase, American Express and Credit Suisse, with projects in Europe, Australia, New Zealand, Malaysia, Korea, Japan and Brazil. Steinberg led the teams that won first-place awards in KDD Cup 2000 and the 2002 Duke/TeraData churn modeling competition, and the teams that won awards in the PAKDD competitions of 2006 and 2007. He has published papers in economics, econometrics and computer science journals, and contributes actively to ongoing research and development at Salford.

Contest: 2 free passes to Predictive Analytics World

I got some good news from the fine people at Predictive Analytics World.

You qualify for 2 free passes to the PAW NYC event, October 16-20, 2011. I will send you a registration code for these passes within the next couple of days.

If you cannot attend our PAW NYC event, please feel free to use these two free passes as a promotional tool within your blog.

Now, I have been partnering with PAW for a long time, so it is nice to have free passes, and I am grateful for their support of this blog. Therein lies my dilemma. I am in India, and a return ticket between NYC and India costs $1,100. Unless something drastic happens, I don't see myself with that kind of travel money.

Ergo.

I am offering two free passes to Predictive Analytics World: http://predictiveanalyticsworld.com/

All you need to do is – ahem- cough-

  1. Like the Facebook page of Decisionstats: https://www.facebook.com/pages/Decisionstats/217450141605435 OR
  2. Add me to a Google+ circle: https://plus.google.com/116302364907696741272/posts OR
  3. Follow me on Twitter: https://twitter.com/#!/0_h_r_1

AND


  1. Read one of my poems at my poetry blog (http://poemsforkush.wordpress.com/) and please leave a comment with your email ID. It’s a promotion for my next book, “Poets and Hackers”, due for release in 2 weeks.

The 2 free passes are for any 2 days of the PAW NYC event. These passes may not be used for the Text Analytics World conference being held the same week. Please have your contest winners use the free code: XXXXXXXX. This code will be good for two uses in registering.
That’s it. Two free passes, so go for it if you are around NYC in October. NY is a lovely place, and I am wearing my red FDNY T-shirt as I type this.

What do you get?

A pass to Predictive Analytics World New York 2011, http://www.predictiveanalyticsworld.com/newyork/2011/ (registration: http://www.predictiveanalyticsworld.com/newyork/register.php; details awaited!)

Cloud Computing with #Rstats and CloudNumbers.com

Some of you know that I am due to finish “R for Business Analytics” for Springer by Dec 2011 and “R for Cloud Computing” by Dec 2012. So while I am busy crunching out “R for Business Analytics”, which is a corporate business analyst’s view of using #Rstats, I am also gathering material for the cloud computing book.

I have been waiting for someone like CloudNumbers.com for some time now, and I like their initial pricing structure. As scale picks up, this should only get better. As a business intelligence analyst, I wonder whether they can also help set up a dedicated or private cloud for someone who wants a data mart solution. The thing I like best: they have a referral scheme, so if someone you know wants to test it out, it gives you some freebies too, in the form of an invitation code.

  • I read the instructions.
  • I reviewed the pricing plan and clicked back to the dashboard.
  • I clicked on start new session.
  • I clicked next.
  • I chose R from a very convenient interface.
  • I chose all the applications I might need.
  • Being able to choose packages for R is a really nice feature.
  • Finally, in the free version I can choose ONLY 7.5 GB of RAM.

I named the session, in case I want to start multiple sessions.

After waiting 15 minutes, my instance was up, and I typed R to start an R session.

Note that I can also see the desktop, which is a great improvement over the EC2 interface for R cloud computing on Linux. The session also shuts down on its own if I leave it running (as of now, after 180 minutes). When I was done, I clicked shut down session.

You can click this link to try out your own cloud in the sky for free; 10 hours are free for you:

https://my.cloudnumbers.com/register/65E97A

Google Product Launches

So dear G launched a whole new set of products. Some thoughts:

1) Join up the social invite list here; it is called Google Plus. We hope it doesn't end up like Buzz http://www.google.com/buzz or Orkut https://groups.google.com/group/opensocial-api/?pli=1 or Plus One http://www.google.com/webmasters/+1/button/ or Wave (email killer) http://googlewave.blogspot.com/

When the biggest cloud computing company in the world announces a phased rollout of a product, we wonder whether they are really sure about launching it or were just in a hurry again.

Machine learning won't work with social, chaps. Well, not everything in social. And the Google Social Blog forgot to write about it: http://googlesocialweb.blogspot.com/

Anyway, even Google Finance’s automated announcements feed failed to pick up many of Google's own product launches (or it does, in an automated manner, depending on which time period you choose; and yes, still no social buttons): http://www.google.com/finance?q=google

BACK TO GOOGLE PLUS

https://services.google.com/fb/forms/googleplus/

Google+

Thanks for stopping by. We’re still ironing out a few kinks in Google+, so it’s not quite ready for everyone to climb aboard. But, if you want, we’ll let you know the minute the doors are open for real. Cool? Cool.

2) Google Web Fonts: great product (http://googlewebfonts.blogspot.com/), and hey, when do you plan to monetize, uhm, web fonts? Now that would be awesome. Not even a single ad on those pages, not even for philanthropy, or poor poets, or even Google Books authors who self-publish. Sound of silence…

http://www.google.com/webfonts/v2

3) Google Analytics gets some groove back. I really want to see much better integration of Google Apps, Google Analytics and Google Desktop Search. Ditto for the interface: enterprise software uses different fonts than retail software, dude. More fries? http://analytics.blogspot.com/

Feature 1: Custom reports for metrics I can slice and dice on my own.

Feature 2: Awesome In-Page Analytics (a beta feature). Beta is boring if overused. Try Theta, maybe?

Feature 3: Daily automated alerts for unusual server/traffic activity.

Feature 4: Event tracking is cool, especially for understanding social media impact.

It is still too early for mobile (in terms of traffic) as well as tablet analytics (?)

4) Angry Birds is still the best feature in Chrome (but there are lots of others at http://chrome.blogspot.com/), especially http://googlecode.blogspot.com/2011/06/working-with-chromes-file-browser.html

Try http://chrome.angrybirds.com/

There are ways to make software that are not evil. I am very, very disappointed at the total lack of monetization of this Chrome app. Not even an ad for a T-shirt I could buy. Sighs.

Funny thing: the product manager forgot to take off the Facebook Like button, or to add the +1 button or even the Tweet This button.

Quo vadis?

5) What do you love?

http://www.wdyl.com/#

Calling #Rstats lovers and bloggers – to work together on “The R Programming wikibook”

So you think you like R, huh? Well, it is time to pay it forward.

Message from a dear R blogger, Tal G from Tel Aviv (creator of R-bloggers.com and SAS-X.com)

———————————————————————————————————-
Calling R lovers and bloggers – to work together on “The R Programming wikibook”
Posted: 20 Jun 2011 07:05 AM PDT

This post is a call for both R community members and R-bloggers, to come and help make The R Programming wikibook be amazing:

Dear R community member – please consider giving a visit to The R Programming wikibook. If you wish to contribute your knowledge and editing skills to the project, then you could learn how to write in wiki-markup here, and how to edit a wikibook here (you can even use R syntax highlighting in the wikibook). You could take information into the site from the (soon to be) growing list of available R resources for harvesting.

Dear R blogger, you can help The R Programming wikibook by doing the following:

  1. Write to your readers about the project and invite them to join.
  2. Add your blog’s R content as an available resource for other editors to use for the wikibook. Here is how to do that:
     First, make a clear indication on your blog that your content is licensed under cc-by-sa (*see what this means at the end of the post). You can do this by adding it to the footer of your blog, or by writing a post that clearly states this (what a great opportunity to write to your readers about the project…).
     Next, add a link to where all of your R content is located on your site to the resource page (also with a link to the license post, if you wrote one). For example, since I write about other things besides R, I would give a link to my R category page, and also a link to this post. If you do not know how to add it to the wiki, just e-mail me about it (tal.galili@gmail.com).
If you are an R blogger, besides living up to the spirit of the R community, you will benefit from joining this project: every time someone uses your content on the wikibook, they will add your post as a resource. In the long run, this is likely to help visitors of the site get to know you and to strengthen your site’s SEO ranking. Which reminds me: if you write about this, I always appreciate a link back to my blog.

* Having a cc-by-sa license means that you agree that anyone may copy, distribute, display, and make derivative works based on your content, as long as they credit the author (you) in the manner you specify. It also means that derivative works may be distributed only under a license identical to the license that governs the original work.

———-

Three more points:

1) This post is a result of being contacted by Paul (a.k.a: PAC2), asking if I could help promote “The R Programming wikibook” among R-bloggers and their readers. Paul has made many contributions to the book so far. So thank you Paul for both reaching out and helping all of us with your work on this free open source project.

2) I should also mention that the R wiki exists and is open for contributions. And naturally, everything that helps the R wikibook will help the R wiki as well.

3) Copyright notice: I hereby release all of the written content in the R category page under the cc-by-sa license (date: 20.06.2011). Now it’s your turn!

———-

List of R bloggers who have joined (this list will be updated as this “group writing” project progresses):

R-statistics blog (that’s Tal…)
Decisionstats.com (That’s me)
……………………………………………………………………………….
3) Copyright notice: I hereby release all of the written content of this website under the cc-by-sa license (date: 21.06.2011). Now it’s your turn!

https://decisionstats.com/privacy-3/

Content Licensing-
This website has all content licensed under
http://creativecommons.org/licenses/by-sa/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work