Customizing your R software startup

Customizing your R software startup helps you do the following.
Thus it helps you to boot up R faster.
It automatically loads packages that you use regularly (like a R GUI -Deducer, Rattle or R Commander), set a CRAN mirror that you mostly use or is nearest for downloading new packages, and set some optional parameters.

Everytime you start R Instead of doing this , loading same R packages, setting a CRAN mirror,setting some new functions- the user needs to do this just once by customizing the R Profile SITE file.

This is done by editing the $R_HOME/etc/Renviron file for globally setting a default or the .Renviron file that is created in your home directory for a shared system.

There are two special functions you can customize in these files.
.First( ) will be run at the start of the R session and
.Last( ) will be run when the R session is shutting down.

When R starts up, it loads the .Rprofile file in your home directory and executes the .First() function.

Where is the R Profile file?
It is located in the \etc folder of your R folder- folder you installed R in.
In Windows the folder will be of the format -”C:\Program Files\R\R-x.ab.c\etc”
where x.ab.c will be the R version number (like 2.11.1)
Example
.First <- function(){
library(rattle)
rattle()
cat(“\nHello World”, date(), “\n”)
}

will automatically start the Rattle GUI for data mining and print Hello World with the date in your session.

You can also modify the Rcmd_environ file in the same \etc folder if you are particular on your settings

## Default browser
R_BROWSER=${R_BROWSER-‘C:\Documents and Settings\abc\Local Settings\Application Data\Google\Chrome\Application\chrome.exe} ## Default editor EDITOR=${EDITOR-${notepad++}}

will change the default Web browser to Chrome and the default editor to Notepad++ which is an enhanced Code Editor.

Interview Michael J. A. Berry Data Miners, Inc

Here is an interview with noted Data Mining practitioner Michael Berry, author of seminal books in data mining, noted trainer and consultantmjab picture

Ajay- Your famous book “Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management” came out in 2004, and an update is being planned for 2011. What are the various new data mining techniques and their application that you intend to talk about in that book.

Michael- Each time we do a revision, it feels like writing a whole new book. The first edition came out in 1997 and it is hard to believe how much the world has changed since then. I’m currently spending most of my time in the on-line retailing world. The things I worry about today–improving recommendations for cross-sell and up-sell,and search engine optimization–wouldn’t have even made sense to me back then. And the data sizes that are routine today were beyond the capacity of the most powerful super computers of the nineties. But, if possible, Gordon and I have changed even more than the data mining landscape. What has changed us is experience. We learned an awful lot between the first and second editions, and I think we’ve learned even more between the second and third.

One consequence is that we now have to discipline ourselves to avoid making the book too heavy to lift. For the first edition, we could write everything we knew (and arguably, a bit more!); now we have to remind ourselves that our intended audience is still the same–intelligent laymen with a practical interest in getting more information out of data. Not statisticians. Not computer scientists. Not academic researchers. Although we welcome all readers, we are primarily writing for someone who works in a marketing department and has a title with the word “analyst” or “analytics” in it. We have relaxed our “no equations” rule slightly for cases when the equations really do make things easier to explain, but the core explanations are still in words and pictures.

The third edition completes a transition that was already happening in the second edition. We have fully embraced standard statistical modeling techniques as full-fledged components of the data miner’s toolkit. In the first edition, it seemed important to make a distinction between old, dull, statistics, and new, cool, data mining. By the second edition, we realized that didn’t really make sense, but remnants of that attitude persisted. The third edition rectifies this. There is a chapter on statistical modeling techniques that explains linear and logistic regression, naive Bayes models, and more. There is also a brand new chapter on text mining, a curious omission from previous editions.

There is also a lot more material on data preparation. Three whole chapters are devoted to various aspects of data preparation. The first focuses on creating customer signatures. The second is focused on using derived variables to bring information to the surface, and the third deals with data reduction techniques such as principal components. Since this is where we spend the greatest part of our time in our work, it seemed important to spend more time on these subjects in the book as well.

Some of the chapters have been beefed up a bit. The neural network chapter now includes radial basis functions in addition to multi-layer perceptrons. The clustering chapter has been split into two chapters to accommodate new material on soft clustering, self-organizing maps, and more. The survival analysis chapter is much improved and includes material on some of our recent application of survival analysis methods to forecasting. The genetic algorithms chapter now includes a discussion of swarm intelligence.

Ajay- Describe your early career and how you came into Data Mining as a profession. What do you think of various universities now offering MS in Analytics. How do you balance your own teaching experience with your consulting projects at The Data Miners.

Michael- I fell into data mining quite by accident. I guess I always had a latent interest in the topic. As a high school and college student, I was a fan of Martin Gardner‘s mathematical games in in Scientific American. One of my favorite things he wrote about was a game called New Eleusis in which one players, God, makes up a rule to govern how cards can be played (“an even card must be followed by a red card”, say) and the other players have to figure out the rule by watching what plays are allowed by God and which ones are rejected. Just for my own amusement, I wrote a computer program to play the game and presented it at the IJCAI conference in, I think, 1981.

That paper became a chapter in a book on computer game playing–so my first book was about finding patterns in data. Aside from that, my interest in finding patterns in data lay dormant for years. At Thinking Machines, I was in the compiler group. In particular, I was responsible for the run-time system of the first Fortran Compiler for the CM-2 and I represented Thinking Machines at the Fortran 8X (later Fortran-90) standards meetings.

What changed my direction was that Thinking Machines got an export license to sell our first machine overseas. The machine went to a research lab just outside of Paris. The connection machine was so hard to program, that if you bought one, you got an applications engineer to go along with it. None of the applications engineers wanted to go live in Paris for a few months, but I did.

Paris was a lot of fun, and so, I discovered, was actually working on applications. When I came back to the states, I stuck with that applied focus and my next assignment was to spend a couple of years at Epsilon, (then a subsidiary of American Express) working on a database marketing system that stored all the “records of charge” for American Express card members. The purpose of the system was to pick ads to go in the billing envelope. I also worked on some more general purpose data mining software for the CM-5.

When Thinking Machines folded, I had the opportunity to open a Cambridge office for a Virginia-based consulting company called MRJ that had been a major channel for placing Connection Machines in various government agencies. The new group at MRJ was focused on data mining applications in the commercial market. At least, that was the idea. It turned out that they were more interested in data warehousing projects, so after a while we parted company.

That led to the formation of Data Miners. My two partners in Data Miners, Gordon Linoff and Brij Masand, share the Thinking Machines background.

To tell the truth, I really don’t know much about the university programs in data mining that have started to crop up. I’ve visited the one at NC State, but not any of the others.

I myself teach a class in “Marketing Analytics” at the Carroll School of Management at Boston College. It is an elective part of the MBA program there. I also teach short classes for corporations on their sites and at various conferences.

Ajay- At the previous Predictive Analytics World, you took a session on Forecasting and Predicting Subsciber levels (http://www.predictiveanalyticsworld.com/dc/2009/agenda.php#day2-6) .

It seems inability to forecast is a problem many many companies face today. What do you think are the top 5 principles of business forecasting which companies need to follow.

Michael- I don’t think I can come up with five. Our approach to forecasting is essentially simulation. We try to model the underlying processes and then turn the crank to see what happens. If there is a principal behind that, I guess it is to approach a forecast from the bottom up rather than treating aggregate numbers as a time series.

Ajay- You often partner your talks with SAS Institute, and your blog at http://blog.data-miners.com/ sometimes contain SAS code as well. What particular features of the SAS software do you like. Do you use just the Enterprise Miner or other modules as well for Survival Analysis or Forecasting.

Michael- Our first data mining class used SGI’s Mineset for the hands-on examples. Later we developed versions using Clementine, Quadstone, and SAS Enterprise Miner. Then, market forces took hold. We don’t market our classes ourselves, we depend on others to market them and then share in the revenue.

SAS turned out to be much better at marketing our classes than the other companies, so over time we stopped updating the other versions. An odd thing about our relationship with SAS is that it is only with the education group. They let us use Enterprise Miner to develop course materials, but we are explicitly forbidden to use it in our consulting work. As a consequence, we don’t use it much outside of the classroom.

Ajay- Also any other software you use (apart from SQL and J)

Michael- We try to fit in with whatever environment our client has set up. That almost always is SQL-based (Teradata, Oracle, SQL Server, . . .). Often SAS Stat is also available and sometimes Enterprise Miner.

We run into SPSS, Statistica, Angoss, and other tools as well. We tend to work in big data environments so we’ve also had occasion to use Ab Initio and, more recently, Hadoop. I expect to be seeing more of that.

Biography-

Together with his colleague, Gordon Linoff, Michael Berry is author of some of the most widely read and respected books on data mining. These best sellers in the field have been translated into many languages. Michael is an active practitioner of data mining. His books reflect many years of practical, hands-on experience down in the data mines.

Data Mining Techniques cover

Data Mining Techniques for Marketing, Sales and Customer Relationship Management

by Michael J. A. Berry and Gordon S. Linoff
copyright 2004 by John Wiley & Sons
ISB

Mining the Web cover

Mining the Web

by Michael J.A. Berry and Gordon S. Linoff
copyright 2002 by John Wiley & Sons
ISBN 0-471-41609-6

Non-English editions available in Traditional Chinese and Simplified Chinese

This book looks at the new opportunities and challenges for data mining that have been created by the web. The book demonstrates how to apply data mining to specific types of online businesses, such as auction sites, B2B trading exchanges, click-and-mortar retailers, subscription sites, and online retailers of digital content.

Mastering Data Mining

by Michael J.A. Berry and Gordon S. Linoff
copyright 2000 by John Wiley & Sons
ISBN 0-471-33123-6

Non-English editions available in JapaneseItalianTraditional Chinese , and Simplified Chinese

A case study-based guide to applying data mining techniques for solving practical business problems. These “warts and all” case studies are drawn directly from consulting engagements performed by the authors.

A data mining educator as well as a consultant, Michael is in demand as a keynote speaker and seminar leader in the area of data mining generally and the application of data mining to customer relationship management in particular.

Prior to founding Data Miners in December, 1997, Michael spent 8 years at Thinking Machines Corporation. There he specialized in the application of massively parallel supercomputing techniques to business and marketing applications, including one of the largest database marketing systems of the time.

Why Cloud?

Here are some reasons why cloud computing is very helpful to small business owners like me- and can be very helpful to even bigger people.

1) Infrastructure Overhead becomes zero

– I need NOT invest in secure powerbackups (like a big battery for electricity power-outs-true in India), data disaster management (read raid), software licensing compliance.

All this is done for me by infrastructure providers like Google and Amazon.

For simple office productivity, I type on Google Docs that auto-saves my data,writing on cloud. I need not backup- Google does it for me.  Ditto for presentations and spreadsheets. Amazon gets me the latest Window software installed whenever I logon- I need not be  bothered by software contracts (read bug fixes and patches) any more.

2) Renting Hardware by the hour- A small business owner cannot invest too much in computing hardware (or software). The pay as you use makes sense for them. I could never afford a 8 cores desktop with 25 gb RAM- but I sure can rent and use it to bid for heavier data projects that I would have had to let go in the past.

3) Renting software by the hour- You may have bought your last PC for all time

An example- A windows micro instance costs you 3 cents per hour on Amazon. If you take a mathematical look at upgrading your PC to latest Windows, buying more and more upgraded desktops just to keep up, those costs would exceed 3 cents per hour. For Unix, it is 2 cents per hour, and those softwares (like Red Hat Linux and Ubuntu have increasingly been design friendly even for non techie users)

Some other software companies especially in enterprise software plan to and already offer paid machine images that basically adds their software layer on top of the OS and you can rent software for the hour.

It does not make sense for customers to effectively subsidize golf tournaments, rock concerts, conference networks by their own money- as they can rent software by the hour and switch to pay per use.

People especially SME consultants, academics and students and cost conscious customers – in Analytics would love to see a world where they could say run SAS Enterprise Miner for 10 dollars a hour for two hours to build a data mining model on 25 gb RAM, rather than hurt their pockets and profitability in Annual license models. Ditto for SPSS, JMP, KXEN, Revolution R, Oracle Data Mining (already available on Amazon) , SAP (??), WPS ( on cloud ???? ) . It’s the economy, stupid.

Corporates have realized that cutting down on Hardware and software expenses is more preferable to cutting down people. Would you rather fire people in your own team to buy that big HP or Dell or IBM Server (effectively subsidizing jobs in those companies). IF you had to choose between an annual license renewal for your analytics software TO renting software by the hour and using those savings for better benefits for your employees, what makes business sense for you to invest in.

Goodbye annual license fees.  Welcome brave new world.

Public Opinion Quarterly

If you are interested in

SURVEY METHODOLOGY FOR PUBLIC HEALTH RESEARCHERS

There is a free virtual issue, Survey Methodology for Public Health Researchers: Selected Readings from 20 years of PublicOpinion Quarterly. The virtual issue’s 18 articles illustrate the range of survey methods material that can be found in POQ and include conclusions that are still valid today. Specially chosen by guest editor Floyd J. Fowler, the articles will be of interest to those who work and research in public health and health services more broadly

Google AppInventor in Action

A GUI based SDK for making Apps for Mobiles (Android)- that you can then put in the Android Marketplace.

Watch a 60 sec video on that!

Blog Update

Some changes at Decisionstats-

1) We are back at Decisionstats.com and Decisionstats.wordpress.com will point to that as well. The SEO effects would be interesting and so would be the Instant Pagerank or LinkRank or whatever Coffee/Percolator they use in Cali to index the site.

2) AsterData is no longer a sponsor- but Predictive Analytics Conference is. Welcome PAWS! I have been a blog partner to PAWS ever since it began- and it’s a great marketing fit. Expect to see a lot of exclusive content and interviews from great speakers at PAWS.

3) The Feedblitz newsletter (now at 404 subscribers) is now a weekly subscription to send one big big email rather than lots of email through the week- this is because my blogging frequency is moving up as I collect material for a new book on business analytics that I would probably release in 2011 (if all goes well, touchwood). Linkedin group would be getting a weekly update announcement. If you are connected to Decisionstats on Analyticbridge _ I would soon try to find a way to update the whole post automatically using RSS and Ning.com . or not. Depends.

4) R continues to be a bigger focus. So will SPSS and maybe JMP. Newer softwares or older softwares that change more rapidly would get more coverage. Generally a particular software is covered if it has newer features, or an interesting techie conference, or it gets sued.

5) I will occasionally write a poem or post a video once a week randomly to prove geeks and nerds and analysts can have fun (much more fun actually dont we)

Thanks for reading this. Sept 2010 was the best ever for Decisionstats.com – we crossed 15,000 + visitors and thanks for that again! I promise to bore you less and less as we grow old together on the blog 😉

The White Man's Burden-Poem

Rudyard Kipling, The White Man’s Burden

Take up the White Man’s burden–Send forth the best ye breed–

Go bind your sons to exile To serve your captives’ need;

To wait in heavy harness, On fluttered folk and wild–

Your new-caught, sullen peoples, Half-devil and half-child.

Take up the White Man’s burden–In patience to abide,

To veil the threat of terror And check the show of pride;

By open speech and simple, An hundred times made plain

To seek another’s profit, And work another’s gain.

Take up the White Man’s burden– The savage wars of peace–

Fill full the mouth of Famine And bid the sickness cease;

And when your goal is nearest The end for others sought,

Watch sloth and heathen Folly Bring all your hopes to nought.

Take up the White Man’s burden–No tawdry rule of kings,

But toil of serf and sweeper–The tale of common things.

The ports ye shall not enter,The roads ye shall not tread,

Go mark them with your living,And mark them with your dead.

Take up the White Man’s burden–And reap his old reward:

The blame of those ye better,The hate of those ye guard–

The cry of hosts ye humour (Ah, slowly!) toward the light:–

“Why brought he us from bondage, Our loved Egyptian night?”

Take up the White Man’s burden–Ye dare not stoop to less–

Nor call too loud on Freedom To cloke your weariness;

By all ye cry or whisper, By all ye leave or do,

The silent, sullen peoples Shall weigh your gods and you.

Take up the White Man’s burden– Have done with childish days–

The lightly proferred laurel, The easy, ungrudged praise.

Comes now, to search your manhood Through all the thankless years

Cold, edged with dear-bought wisdom, The judgment of your peers!

This famous poem, written by Britain‘s imperial poet, was a response to the American take over of the Phillipines after the Spanish-American War.(published in 1899)

source

http://www.fordham.edu/halsall/mod/kipling.html