Open Source Compiler for the SAS Language: GNU Dap


I am still testing this out.

But if you know a bit more about make and compiling in Ubuntu, check out

http://www.gnu.org/software/dap/

I loved the humorous introduction:

Dap is a small statistics and graphics package based on C. Version 3.0 and later of Dap can read SBS programs (based on the utterly famous, industry standard statistics system with similar initials – you know the one I mean)! The user wishing to perform basic statistical analyses is now freed from learning and using C syntax for straightforward tasks, while retaining access to the C-style graphics and statistics features provided by the original implementation. Dap provides core methods of data management, analysis, and graphics that are commonly used in statistical consulting practice (univariate statistics, correlations and regression, ANOVA, categorical data analysis, logistic regression, and nonparametric analyses).

Anyone familiar with the basic syntax of C programs can learn to use the C-style features of Dap quickly and easily from the manual and the examples contained in it; advanced features of C are not necessary, although they are available. (The manual contains a brief introduction to the C syntax needed for Dap.) Because Dap processes files one line at a time, rather than reading entire files into memory, it can be, and has been, used on data sets that have very many lines and/or very many variables.

I wrote Dap to use in my statistical consulting practice because the aforementioned utterly famous, industry standard statistics system is (or at least was) not available on GNU/Linux and costs a bundle every year under a lease arrangement. And now you can run programs written for that system directly on Dap! I was generally happy with that system, except for the graphics, which are all but impossible to use,  but there were a number of clumsy constructs left over from its ancient origins.

http://www.gnu.org/software/dap/#Sample output

  • Unbalanced ANOVA
  • Crossed, nested ANOVA
  • Random model, unbalanced
  • Mixed model, balanced
  • Mixed model, unbalanced
  • Split plot
  • Latin square
  • Missing treatment combinations
  • Linear regression
  • Linear regression, model building
  • Ordinal cross-classification
  • Stratified 2×2 tables
  • Loglinear models
  • Logit model for linear-by-linear association
  • Logistic regression
    Copyright © 2001, 2002, 2003, 2004 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA

    Sounds too good to be true! GNU Dap joins WPS Workbench and Dulles Open’s Carolina as the third SAS language compiler (besides the now-defunct BASS software); see http://en.wikipedia.org/wiki/SAS_language#Controversy

     

    Also see http://en.wikipedia.org/wiki/DAP_(software)

    Dap was written to be a free replacement for SAS, but users are assumed to have a basic familiarity with the C programming language in order to permit greater flexibility. Unlike R, it has been designed to be used on large data sets.

    It has been designed to cope with very large data sets, even when the size of the data exceeds the size of the computer’s memory.
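    Dap's ability to handle data larger than memory rests on reading input one line at a time rather than loading the whole file. As a rough illustration of that general idea (not Dap's own code, which is written in C), the minimal Python sketch below computes the count, mean and sample variance of a numeric column in a single pass using Welford's online algorithm. The input format (one number per line) and the function name `streaming_mean_var` are assumptions made for this example.

```python
import io
from typing import Iterable, Tuple


def streaming_mean_var(lines: Iterable[str]) -> Tuple[int, float, float]:
    """One-pass count, mean and sample variance (Welford's algorithm).

    Only three running values are kept in memory, so the input can be
    far larger than RAM as long as it is iterated lazily.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x = float(line)
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


if __name__ == "__main__":
    # io.StringIO stands in for open("data.txt"); both yield one line at a time.
    sample = io.StringIO("2\n4\n4\n4\n5\n5\n7\n9\n")
    n, mean, var = streaming_mean_var(sample)
    print(n, round(mean, 4), round(var, 4))
```

    Memory use stays constant however many lines the file has; passing a real file object such as `open("data.txt")` streams it line by line in the same way.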

    QGIS and R


    QGIS is Quantum GIS: http://www.qgis.org/

    Quantum GIS (QGIS) is a user-friendly Open Source Geographic Information System (GIS) licensed under the GNU General Public License. QGIS is an official project of the Open Source Geospatial Foundation (OSGeo). It runs on Linux, Unix, MacOSX, and Windows and supports numerous vector, raster, and database formats and functionalities.

    Learn more about QGIS

    Quantum GIS provides a continuously growing number of capabilities provided by core functions and plugins. You can visualize, manage, edit, and analyse data, and compose printable maps.

    You can also use both QGIS and R through Python (!!!)

    http://www.qgis.org/wiki/HomeRange_plugin#Home-range_analyses_in_QGIS_using_R_through_Python

    An interesting app for the web (sometimes better suited than some R map packages):

    https://plugins.qgis.org/plugins/HomeRange_plugin/

    Based on a Google Summer of Code project.

     Also

    https://sites.google.com/site/eospansite/introqgis_r

    and

    HomeRange_plugin

    http://hub.qgis.org/projects/quantum-gis/wiki/HomeRange_plugin

     

    Also read-

    http://blog.qgis.org/node/51

    Related Articles-

    R Graphs Resources

    https://rforanalytics.wordpress.com/r-graphs-resources/

    Using R from other Software

    https://rforanalytics.wordpress.com/using-r-from-other-software/

    and

    Visualize NHL Play-by-Play using Tableau Public and R

    http://brocktibert.wordpress.com/2011/02/13/visualize-nhl-play-by-play-using-tableau-public-and-r/

    LibreOffice Stable Release launched

    Non-Oracle OpenOffice completes an important milestone. From the press release:

    The Document Foundation launches LibreOffice 3.3

    The first stable release of the free office suite is available for download

    The Internet, January 25, 2011 – The Document Foundation launches LibreOffice 3.3, the first stable release of the free office suite developed by the community. In less than four months, the number of developers hacking LibreOffice has grown from less than twenty in late September 2010, to well over one hundred today. This has allowed us to release ahead of the aggressive schedule set by the project.

    Not only does it ship a number of new and original features, LibreOffice 3.3 is also a significant achievement for a number of reasons:

    – the developer community has been able to build their own and independent process, and get up and running in a very short time (with respect to the size of the code base and the project’s strong ambitions);

    – thanks to the high number of new contributors having been attracted into the project, the source code is quickly undergoing a major clean-up to provide a better foundation for future development of LibreOffice;

    – the Windows installer, which is going to impact the largest and most diverse user base, has been integrated into a single build containing all language versions, thus reducing the size for download sites from 75 to 11GB, making it easier for us to deploy new versions more rapidly and lowering the carbon footprint of the entire infrastructure.

    Caolán McNamara from RedHat, one of the developer community leaders, comments, “We are excited: this is our very first stable release, and therefore we are eager to get user feedback, which will be integrated as soon as possible into the code, with the first enhancements being released in February. Starting from March, we will be moving to a real time-based, predictable, transparent and public release schedule, in accordance with Engineering Steering Committee’s goals and users’ requests”. The LibreOffice development roadmap is available at http://wiki.documentfoundation.org/ReleasePlan

    LibreOffice 3.3 brings several unique new features. The 10 most-popular among community members are, in no particular order:

    1. the ability to import and work with SVG files;
    2. an easy way to format title pages and their numbering in Writer;
    3. a more-helpful Navigator Tool for Writer;
    4. improved ergonomics in Calc for sheet and cell management; and
    5. Microsoft Works and Lotus Word Pro document import filters.

    In addition, many great extensions are now bundled, providing PDF import, a slide-show presenter console, a much improved report builder, and more besides.

    A more-complete and detailed list of all the new features offered by LibreOffice 3.3 is viewable on the following web page: http://www.libreoffice.org/download/new-features-and-fixes/

    LibreOffice 3.3 also provides all the new features of OpenOffice.org 3.3, such as new custom properties handling; embedding of standard PDF fonts in PDF documents; new Liberation Narrow font; increased document protection in Writer and Calc; auto decimal digits for “General” format in Calc; 1 million rows in a spreadsheet; new options for CSV import in Calc; insert drawing objects in Charts; hierarchical axis labels for Charts; improved slide layout handling in Impress; a new easier-to-use print interface; more options for changing case; and colored sheet tabs in Calc. Several of these new features were contributed by members of the LibreOffice team prior to the formation of The Document Foundation.

    LibreOffice hackers will be meeting at FOSDEM in Brussels on February 5 and 6, and will be presenting their work during a one-day workshop on February 6, with speeches and hacking sessions coordinated by several members of the project.

    The home of The Document Foundation is at http://www.documentfoundation.org

    The home of LibreOffice is at http://www.libreoffice.org where the download page has been redesigned by the community to be more user-friendly.

    *** About The Document Foundation

    The Document Foundation has the mission of facilitating the evolution of the OOo Community into a new, open, independent, and meritocratic organization within the next few months. An independent Foundation is a better reflection of the values of our contributors, users and supporters, and will enable a more effective, efficient and transparent community. TDF will protect past investments by building on the achievements of the first decade, will encourage wide participation within the community, and will co-ordinate activity across the community.

    *** Media Contacts for TDF

    Florian Effenberger (Germany)

    Mobile: +49 151 14424108 – E-mail: floeff@documentfoundation.org

    Olivier Hallot (Brazil)

    Mobile: +55 21 88228812 – E-mail: olivier.hallot@documentfoundation.org

    Charles H. Schulz (France)

    Mobile: +33 6 98655424 – E-mail: charles.schulz@documentfoundation.org

    Italo Vignoli (Italy)

    Mobile: +39 348 5653829 – E-mail: italo.vignoli@documentfoundation.org

    Interview: Luis Torgo, Author of Data Mining with R


    Here is an interview with Prof. Luis Torgo, author of the recent best seller “Data Mining with R: Learning with Case Studies”.

    Ajay- Describe your career in science. How do you think more young people can be made interested in science?

    Luis- My interest in science only started after I finished my degree. I entered a research lab at the University of Porto and started working on Machine Learning around 1990. Since then I’ve been involved in data analysis topics, both from a research perspective and from a more applied point of view, through interactions with industry partners on several projects. I spent most of my career at the Faculty of Economics of the University of Porto, but since 2008 I have been at the Department of Computer Science of the Faculty of Sciences of the same university. At the same time I’ve been a researcher at LIAAD / Inesc Porto LA (www.liaad.up.pt).

    I like what I do a lot, and I like science and the “scientific way of thinking”, but I cannot say that I’ve always thought of this area as my “place”. Most of all I like solving challenging problems through data analysis. If that translates into some scientific outcome then I’m more satisfied, but that is not my main goal, though I’m kind of “forced” to think about it because of the constraints of an academic career.

    That does not mean I’m not passionate about science, I just think there are many more ways of “doing science” than what is reflected in the usual “scientific indicators” that most institutions seem to be more and more obsessed about.

    As for getting young people interested in science, that is a hard question that I’m not sure I’m qualified to answer. I do tend to think that young people are more responsive to concrete examples of problems they find interesting and that science helps to solve, as a way of finding the motivation to face the hard work they will encounter in a scientific career. I do believe in case studies as a nice way to learn and motivate, and thus my book 😉

    Ajay- Describe your new book “Data Mining with R: Learning with Case Studies”. Why did you choose a case-study-based approach? Who is the target audience? What is your favorite case study from the book?

    Luis- This book is about learning how to use R for data mining. The book follows a “learn by doing it” approach to data mining instead of the more common theoretical description of the available techniques in this discipline. This is accomplished by presenting a series of illustrative case studies for which all necessary steps, code and data are provided to the reader. Moreover, the book has an associated web page (www.liaad.up.pt/~ltorgo/DataMiningWithR) where all the code in the book is given, so that easy copy-paste is possible for lazier readers.

    The language used in the book is very informal, without many theoretical details on the data mining techniques used. For these theoretical insights there are already many good data mining books, some of which are referred to in the “further readings” sections given throughout the book. The decision to follow this writing style had to do with the intended target audience of the book.

    In effect, the objective was to write a monograph that could be used as a supplemental book for practical classes on data mining that exist in several courses, but at the same time that could be attractive to professionals working on data mining in non-academic environments, and thus the choice of this more practically oriented approach.

    As for my favorite case study, that is a hard question for an author… still, I would probably choose the “Predicting Stock Market Returns” case study (Chapter 3). Not only because I like this challenging problem, but mainly because the case study addresses all aspects of knowledge discovery in a real-world scenario, not only the construction of predictive models. It tackles data collection, data pre-processing, model construction, transforming predictions into actions using different trading policies, using business-related performance metrics, implementing a trading simulator for “real-world” evaluation, and laying the groundwork for constructing an online trading system.

    Obviously, for all these steps there are far too many options to describe/evaluate all of them in one chapter; still, I do believe it is important for the reader to see the overall picture, and to read about the relevant questions on this problem and some possible paths that can be followed at these different steps.

    In other words: do not expect to become rich with the solution I describe in the chapter!

    Ajay- Apart from R, what other data mining software do you use or have you used in the past? How would you compare their advantages and disadvantages with R?

    Luis- I’ve played around with Clementine, Weka, RapidMiner and Knime, but really only for teaching purposes, with no serious use/evaluation in the context of data mining projects. For the latter I mainly use R or software developed by myself (either in R or other languages). In this context, I do not think it is fair to compare R with these or other tools, as I lack serious experience with them. I can, however, tell you what I see as the main pros and cons of R. The main reason for using R is not only the power of the tool, which never stops surprising me in terms of what already exists and keeps appearing as contributions from an ever-growing community, but mainly the ability to rapidly transform ideas into prototypes. As for its drawbacks, I would probably mention the lack of efficiency compared to other alternatives and the problem of data set sizes being limited by main memory.

    I know that there are several efforts around for solving this latter issue, not only from the community (e.g. http://cran.at.r-project.org/web/views/HighPerformanceComputing.html) but also from industry (e.g. Revolution Analytics), but I would prefer that at this stage this were a standard feature of the language, so that the “normal” user need not worry about it. But then this is a community effort, and if I’m not happy with the current status, instead of complaining I should do something about it!

    Ajay- Describe your writing habits. How did you set about writing the book? Did you write a fixed amount daily, or do you write in bursts?

    Luis- Unfortunately, I write in bursts whenever I find some time for it. This is much more tiring and time consuming as I need to read back material far too often, but I cannot afford dedicating too much consecutive time to a single task. Actually, I frequently tease my PhD students when they “complain” about the lack of time for doing what they have to, that they should learn to appreciate the luxury of having a single task to complete because it will probably be the last time in their professional life!

    Ajay- What do you do to relax or unwind when not working?

    Luis- For me, the best way to relax from work is by playing sports. When I’m involved in a game I reset my mind and forget about everything else, and this is very relaxing for me. Apart from sports, I enjoy spending a lot of time with my family and friends. A good, long dinner with friends over a good bottle of wine can do miracles when I’m too stressed with work! Finally, I do love traveling around with my family.

    Luis Torgo

    Short Bio: Luis Torgo has a degree in Systems and Informatics Engineering and a PhD in Computer Science. He is an Associate Professor of the Department of Computer Science of the Faculty of Sciences of the University of Porto. He is also a researcher of the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) belonging to INESC Porto LA. Luis Torgo has been an active researcher in Machine Learning and Data Mining for more than 20 years. He has led several academic and industrial Data Mining research projects. Luis Torgo has followed the R project almost since its beginning, using it in his research activities. He teaches R at different levels and has given several courses in different countries.

    To read “Data Mining with R”, you can visit the publisher’s site; the publishers have also generously offered a 20% discount (message below):

    For more information and to place an order, visit us at http://www.crcpress.com.  Order online and apply 20% Off discount code 907HM at checkout.  CRC is pleased to offer free standard shipping on all online orders!

    Link to the book page: http://www.crcpress.com/product/isbn/9781439810187

    Price: $79.95
    Cat. #: K10510
    ISBN: 9781439810187
    ISBN 10: 1439810184
    Publication Date: November 09, 2010
    Number of Pages: 305
    Availability: In Stock
    Binding(s): Hardback 

    Assumptions on Guns


    While sitting in Delhi, India, I sometimes notice that there is one big newsworthy gun-related incident in the United States every six months (the latest being the Gabrielle Giffords shooting), and I hear a lot about the mythical NRA (which seems just as powerful as the equally mythical Jewish American or Cuban American lobbies). I once trained to fire guns (.22 and SLR rifles, actually), I come from a gun-friendly culture (namely Punjabi, North Indian), and my dad sometimes carried a gun as a police officer during his 30-plus years of service, yet I don’t really like guns (except when they are in a movie). My 3-year-old son likes guns a lot (for some peculiar genetic reason, even though we are careful not to show him any violent TV or movies at all).

    So to settle the whole “guns are good, guns are bad” debate, I turned to the one resource: the Internet.

    Here are some findings-

    1) A lot of hard statistical data on guns is biased by the perspective of the writer. It reminds me of the old saying: lies, damned lies, and statistics.

    2) There is not a lot of hard data in terms of universally accepted research that can be quoted. Unlike, say, the link between cigarettes and lung cancer, there is no broad research that is definitive in this regard.

    3) American, European and Asian attitudes to guns actually seem to be a function of historical availability, historic crime rates and cultural propensity for guns.

    Switzerland and the United States are two extreme outliers in statistics on whether guns cause violence.

    4) A lot of old and outdated data is quoted selectively.

    It seems you can fudge data about guns in the following ways-

    1) Use relative per capita numbers vis-à-vis aggregate numbers

    2) Compare and contrast gun numbers with crime numbers selectively

    3) Remove the drill-down by type of firearm, like handguns, rifles, automatic, semi-automatic
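    Point 1 is easy to demonstrate. In the minimal Python sketch below, the same counts rank two countries in opposite orders depending on whether you sort by totals or by rate per 100,000 people. The country names and all figures are hypothetical, invented purely for illustration; they are not real data.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    gun_homicides: int  # aggregate count for one year
    population: int


def rate_per_100k(c: Country) -> float:
    # Multiply before dividing so round inputs give exact results.
    return c.gun_homicides * 100_000 / c.population


# Hypothetical figures, invented for illustration only.
countries = [
    Country("Bigland", gun_homicides=5_000, population=200_000_000),
    Country("Smallia", gun_homicides=600, population=4_000_000),
]

by_total = sorted(countries, key=lambda c: c.gun_homicides, reverse=True)
by_rate = sorted(countries, key=rate_per_100k, reverse=True)

print("Ranked by total:", [c.name for c in by_total])
print("Ranked by rate: ", [(c.name, rate_per_100k(c)) for c in by_rate])
```

    Bigland “leads” on totals, while Smallia’s rate (15 per 100,000) is six times Bigland’s (2.5 per 100,000). Both statements are true; which one gets quoted is where the fudging happens.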

    Maybe I am being simplistic, but I found it easier to list credible data sources on guns than to summarize all the assumptions about them. Are guns good or bad? I don’t know; it depends. Any research you can quote is welcome.

    Data Sources on Guns and Firearms and Crime-

    1) http://www.justfacts.com/guncontrol.asp

    Ownership

    * As of 2009, the United States has a population of 307 million people.[5]

    * Based on production data from firearm manufacturers,[6] there are roughly 300 million firearms owned by civilians in the United States as of 2010. Of these, about 100 million are handguns.[7]

    * Based upon surveys, the following are estimates of private firearm ownership in the U.S. as of 2010:

                  Households With a Gun   Adults Owning a Gun   Adults Owning a Handgun
    Percentage    40-45%                  30-34%                17-19%
    Number        47-53 million           70-80 million         40-45 million

    [8]

    * A 2005 nationwide Gallup poll of 1,012 adults found the following levels of firearm ownership:

    Category      Percentage Owning a Firearm
    Households    42%
    Individuals   30%
    Male          47%
    Female        13%
    White         33%
    Nonwhite      18%
    Republican    41%
    Independent   27%
    Democrat      23%

    [9]

    * In the same poll, gun owners stated they own firearms for the following reasons:

    Protection Against Crime 67%
    Target Shooting 66%
    Hunting 41%

    2) NationMaster.com

    http://www.nationmaster.com/graph/cri_mur_wit_fir-crime-murders-with-firearms


    Showing latest available data.

    Rank Countries Amount
    # 1 South Africa: 31,918
    # 2 Colombia: 21,898
    # 3 Thailand: 20,032
    # 4 United States: 9,369
    # 5 Philippines: 7,708
    # 6 Mexico: 2,606
    # 7 Slovakia: 2,356
    # 8 El Salvador: 1,441
    # 9 Zimbabwe: 598
    # 10 Peru: 442
    # 11 Germany: 269
    # 12 Czech Republic: 181
    # 13 Ukraine: 173
    # 14 Canada: 144
    # 15 Albania: 135
    # 16 Costa Rica: 131
    # 17 Azerbaijan: 120
    # 18 Poland: 111
    # 19 Uruguay: 109
    # 20 Spain: 97
    # 21 Portugal: 90
    # 22 Croatia: 76
    # 23 Switzerland: 68
    # 24 Bulgaria: 63
    # 25 Australia: 59
    # 26 Sweden: 58
    # 27 Bolivia: 52
    # 28 Japan: 47
    # 29 Slovenia: 39
    = 30 Hungary: 38
    = 30 Belarus: 38
    # 32 Latvia: 28
    # 33 Burma: 27
    # 34 Macedonia, The Former Yugoslav Republic of: 26
    # 35 Austria: 25
    # 36 Estonia: 21
    # 37 Moldova: 20
    # 38 Lithuania: 16
    = 39 United Kingdom: 14
    = 39 Denmark: 14
    # 41 Ireland: 12
    # 42 New Zealand: 10
    # 43 Chile: 9
    # 44 Cyprus: 4
    # 45 Morocco: 1
    = 46 Iceland: 0
    = 46 Luxembourg: 0
    = 46 Oman: 0
    Total: 100,693
    Weighted average: 2,097.8

    DEFINITION: Total recorded intentional homicides committed with a firearm. Crime statistics are often better indicators of prevalence of law enforcement and willingness to report crime, than actual prevalence.

    SOURCE: The Eighth United Nations Survey on Crime Trends and the Operations of Criminal Justice Systems (2002) (United Nations Office on Drugs and Crime, Centre for International Crime Prevention)
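    Summary statistics can mislead too. The figure NationMaster labels “Weighted average” (2,097.8) appears to be simply the total divided by the 48 listed countries (100,693 / 48 ≈ 2,097.8), and a few very large values dominate it. The short Python sketch below, using the amounts copied from the table above, compares that mean with the median. It illustrates skew and is not a claim about how NationMaster computes its figures.

```python
import statistics

# Firearm homicide totals copied from the NationMaster table above
# (UN survey, 2002), in the table's rank order: 48 countries.
amounts = [
    31918, 21898, 20032, 9369, 7708, 2606, 2356, 1441, 598, 442,
    269, 181, 173, 144, 135, 131, 120, 111, 109, 97,
    90, 76, 68, 63, 59, 58, 52, 47, 39, 38,
    38, 28, 27, 26, 25, 21, 20, 16, 14, 14,
    12, 10, 9, 4, 1, 0, 0, 0,
]

total = sum(amounts)
mean = statistics.mean(amounts)      # sensitive to the few huge values
median = statistics.median(amounts)  # middle of the 48 sorted values

print(f"countries: {len(amounts)}, total: {total:,}")
print(f"mean: {mean:,.1f}  median: {median:,.1f}")
```

    The median is 61, while the mean is above 2,000, because South Africa, Colombia and Thailand alone account for about 73% of the total.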

    3) Bureau of Justice Statistics: see http://bjs.ojp.usdoj.gov/dataonline/Search/Homicide/State/RunHomTrendsInOneVar.cfm

    or the brand-new website (data through 2009), on which I CANNOT get gun crime figures but can get the total murder rate:

    http://www.ucrdatatool.gov/

    Estimated murder rate *
    Year United States-Total

    1960 5.1
    1961 4.8
    1962 4.6
    1963 4.6
    1964 4.9
    1965 5.1
    1966 5.6
    1967 6.2
    1968 6.9
    1969 7.3
    1970 7.9
    1971 8.6
    1972 9.0
    1973 9.4
    1974 9.8
    1975 9.6
    1976 8.7
    1977 8.8
    1978 9.0
    1979 9.8
    1980 10.2
    1981 9.8
    1982 9.1
    1983 8.3
    1984 7.9
    1985 8.0
    1986 8.6
    1987 8.3
    1988 8.5
    1989 8.7
    1990 9.4
    1991 9.8
    1992 9.3
    1993 9.5
    1994 9.0
    1995 8.2
    1996 7.4
    1997 6.8
    1998 6.3
    1999 5.7
    2000 5.5
    2001 5.6
    2002 5.6
    2003 5.7
    2004 5.5
    2005 5.6
    2006 5.7
    2007 5.6
    2008 5.4
    2009 5.0
    Notes: National or state offense totals are based on data from all reporting agencies and estimates for unreported areas.
    * Rates are the number of reported offenses per 100,000 population
    United States-Total:
    • The 168 murder and nonnegligent homicides that occurred as a result of the bombing of the Alfred P. Murrah Federal Building in Oklahoma City in 1995 are included in the national estimate.
    • The 2,823 murder and nonnegligent homicides that occurred as a result of the events of September 11, 2001, are not included in the national estimates.

    Sources: FBI, Uniform Crime Reports as prepared by the National Archive of Criminal Justice Data
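    The UCR table is expressed as a rate, the number of reported offenses per 100,000 population, which is what makes years with different populations comparable. The minimal Python sketch below shows that calculation and then, using the rates copied from the table above, finds the peak year and the percentage change from that peak to 2009. The function names are hypothetical helpers for this example, and the offense and population figures passed to `rate_per_100k` are made-up round numbers.

```python
from typing import Dict


def rate_per_100k(offenses: int, population: int) -> float:
    """Reported offenses per 100,000 population, as in the UCR notes."""
    # Multiply before dividing so round inputs give exact results.
    return offenses * 100_000 / population


def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a decline)."""
    return (new - old) / old * 100


# Estimated U.S. murder rates per 100,000, 1960-2009, copied from the table above.
rates = [
    5.1, 4.8, 4.6, 4.6, 4.9, 5.1, 5.6, 6.2, 6.9, 7.3,   # 1960s
    7.9, 8.6, 9.0, 9.4, 9.8, 9.6, 8.7, 8.8, 9.0, 9.8,   # 1970s
    10.2, 9.8, 9.1, 8.3, 7.9, 8.0, 8.6, 8.3, 8.5, 8.7,  # 1980s
    9.4, 9.8, 9.3, 9.5, 9.0, 8.2, 7.4, 6.8, 6.3, 5.7,   # 1990s
    5.5, 5.6, 5.6, 5.7, 5.5, 5.6, 5.7, 5.6, 5.4, 5.0,   # 2000s
]
murder_rate: Dict[int, float] = dict(zip(range(1960, 2010), rates))

peak_year = max(murder_rate, key=murder_rate.get)
change = percent_change(murder_rate[peak_year], murder_rate[2009])

# Hypothetical round numbers: 15,000 offenses in a population of 300 million.
print("rate per 100,000:", rate_per_100k(15_000, 300_000_000))
print(f"peak year: {peak_year} ({murder_rate[peak_year]})")
print(f"change from peak to 2009: {change:.1f}%")
```

    On these figures the peak is 1980 (10.2), and the 2009 rate of 5.0 is about 51% lower. Note that this covers all murders, not gun murders specifically, which this dataset does not break out.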


    4) The United Nations statistics of 2002 were too old, in my opinion.
    Wikipedia seems too broad-based to qualify as a research article, but it is easily accessible: http://en.wikipedia.org/wiki/Gun_violence_in_the_United_States
    To actually buy a gun or see guns available for purchase in the United States, see
    http://www.usautoweapons.com/

    Interview: Ajay Ohri (Decisionstats.com) with DMR

    From-

    http://www.dataminingblog.com/data-mining-research-interview-ajay-ohri/

    Here is the winner of the Data Mining Research People Award 2010: Ajay Ohri! Thanks to Ajay for giving some time to answer Data Mining Research questions. And all the best to his blog, Decision Stats!

    Data Mining Research (DMR): Could you please introduce yourself to the readers of Data Mining Research?

    Ajay Ohri (AO): I am a business consultant and writer based in Delhi, India. I have been working in and around the field of business analytics since 2004, and have worked with some very good and big companies, primarily in financial analytics and outsourced analytics. Since 2007, I have been writing my blog at http://decisionstats.com, which now has almost 10,000 views monthly.

    All in all, I write about data, and my hobby is also writing (poetry). Both my hobby and my profession stem from my education (a master’s in business and a bachelor’s in mechanical engineering).

    My research interests in data mining are interfaces (simpler interfaces to enable better data mining), education (making data mining less complex and accessible to more people and students), and time series and regression (specifically ARIMAX).
    In business, my research interests are software marketing strategies (open source, software as a service, advertising-supported versus traditional licensing) and the creation of technology and entrepreneurial hubs (like Palo Alto and Research Triangle, or Bangalore, India).

    DMR: I know you have worked with both SAS and R. Could you give your opinion about these two data mining tools?

    AO: As per my understanding, SAS stands for SAS language, SAS Institute and SAS software platform. The terms are interchangeably used by people in industry and academia- but there have been some branding issues on this.
    I have not worked much with SAS Enterprise Miner, probably because I could not afford it as a business consultant, and the organizations I worked with did not have a budget for Enterprise Miner.
    I have worked alone and in teams with Base SAS, SAS Stat, SAS Access, and SAS ETS, and with JMP. I also worked with SAS BI, but as a user extracting information.
    You could say my use of the SAS platform was mostly in predictive analytics and reporting, but I have a couple of projects under my belt in knowledge discovery, data mining, and pattern analysis. Again, some of my SAS experience is a bit dated, from almost a year ago.

    I really like specific parts of the SAS platform, such as the interface design of JMP (which is better than Enterprise Guide or Base SAS) and Proc Sort in Base SAS. I guess sequential processing of data makes SAS way faster, though with computing evolving from desktops/servers to even cheaper time-shared cloud computers, I am not sure how long Base SAS and SAS Stat can hold on to this unique selling proposition.

    I dislike the clutter in SAS Stat output, which confuses me with too much information, and I dislike the shoddy graphics produced by the SAS graphical engine. It’s shoddy coding work in SAS/Graph, and if JMP can give better graphics, why is legacy source code preventing the SAS platform from doing a better job of it?

    I sometimes think the best part of SAS is actually the code written by Goodnight and Sall in the 1970s; the latest procs don’t impress me much.

    SAS as a company is something I admire, especially for the way it treats employees globally, but it is strange to see the rest of the tech industry not following its example. Also, I don’t like the over-aggression and the “SAS versus the Rest of the Analytics/Data Mining World” mentality that I sometimes pick up when I deal with industry thought leaders.

    Having SAS Enterprise Miner, JMP, and Base SAS in a completely new web interface priced at hourly rates is on my wishlist, but I guess I am a bit sentimental here: most data miners I know from the early 2000s did start with SAS as their first bread-earning software. I also think SAS needs to be better priced in Business Intelligence; it seems quite cheap in BI compared to Cognos/IBM but expensive in analytical licensing.

    If you are a new stats or business student, chances are you know much more R than SAS today. The shift in education at least has been very rapid, and I guess R is also more of a platform than an analytics or data mining software.

    I like a lot of things in R, from graphics to better data mining packages and the modular design of the software, but above all I like the can-do, kick-ass spirit of the R community. Lots of young people collaborate with lots of young-to-old professors, and the energy is infectious. Everybody is a CEO in R’s world. The latest data mining algorithms will probably start in R and be published in journals.

    Which is better for data mining SAS or R? It depends on your data and your deadline. The golden rule of management and business is -it depends.

    I have also worked a lot with KXEN, SQL, and SPSS.

    DMR: Can you tell us more about Decision Stats? You had traffic of 120,000 for 2010. How did you reach such a success?

    AO: I don’t think 120,000 is a success. It’s not a failure. It just happened: the more I wrote, the more people read. In 2007-2008 I used to obsess over traffic. I tried SEO, comments, backlinking, and I did some black-hat experimental stuff. Some of it worked, some didn’t.

    In the end, I started asking questions and interviewing people. To my surprise, senior management is almost always more candid, frank, and honest about their views, while middle managers and public relations and marketing folks can be defensive.

    Social media helped a bit. Twitter, LinkedIn, and Facebook really helped my network of friends, who I suppose acted as informal ambassadors to spread the word.
    Again, I was constrained more by necessity than by choice: by my middle-class finances (I also had a baby son in 2007; my current laptop still has some broken keys :) ), by my inability to afford traveling to conferences, and by my location, as Delhi isn’t really a tech hub.

    The more questions I asked around the internet, the more people responded, and I wrote it all down.

    I guess I was just lucky to meet a lot of nice people on the internet who took the time to mentor and educate me.

    I tried building other websites but didn’t succeed, so I guess I really don’t know. I am not a smart coder and not very clever at writing, but I do try to be honest.

    Basic economics says price rises with demand and falls with supply. Honest and candid opinions have infinite demand and an uncertain supply.

    DMR: There is a rumor about an R book you plan to publish in 2011 :-) Can you confirm the rumor and tell us more?

    AO: I just signed a contract with Springer for “R for Business Analytics”. R is great software, and there are lots of books for statistically trained people, but I felt like writing a book for MBAs and existing analytics users on how to easily transition to R for analytics.

    Like any language, R has its tricks and tweaks, and with a focus on code editors, IDEs, GUIs, and web interfaces, R’s famous learning curve can be bent a bit.

    Making analytics beautiful and simpler to use has always been a passion for me. With 3,000 packages, R can be used for a lot more things, and a lot more simply, than is commonly understood.

    The target audience, however, is business analysts, or people working in corporate environments.

    Brief Bio-
    Ajay Ohri has been working in the field of analytics since 2004, when it was still a nascent, emerging industry in India. He has worked with the top two Indian outsourcers listed on the NYSE, and with Citigroup on cross-sell analytics, where he helped sell an extra 50,000 credit cards. He was one of the very first independent data mining consultants in India, working on analytics products and domestic Indian market analytics. He regularly writes on analytics topics on his website www.decisionstats.com and is currently working on open source analytical tools like R, besides analytical software like SPSS and SAS.

    Book Reviews- Hindu Myths- Mere Christianity

    A statue of Hindu deity Shiva in a temple in B...
    Image via Wikipedia

    Over the month-long break I took to firm up my ideas for R for Analytics, I also read some books. Here are brief reviews of two or three of them:

    1) Hindu Myths

    This is a classic book, translated from the original Sanskrit by Professor Wendy O’Flaherty of the University of Chicago. I found some of the older myths very interesting for their contradictions, for the way another classic retells the same story in a modified form, and for the beautiful, poetic, and fantastic imagery Hindu myths evoke. Some stories are as relevant in prayers, fasts, and religious ceremonies as they were around 11,000 years ago, while most have morphed, been edited, or even been distorted.

    It should help the non-Indian reader understand why hundreds of millions of conservative Indians worship the Shiv Ling (literally an idol of the phallus of Shiva), the Hindu view of the creation of the universe, and the somewhat fantastic stories of superheroes/gods in the ancient world.

    In my opinion, the book suffers from a few drawbacks:

    1) Sanskrit is a bit like Latin: in translation you can lose not just the flavor but the original meaning of words and their situational context. Some of the stories made better sense when I read a more recent Hindi translation.

    2) An excessive emphasis on sexual imagery rather than emotional imagery. The author seems wonderstruck, while reading and translating, that ancient Indians were so matter-of-fact about physical relationships. However, the words were always written as discreet poetry rather than crass soft pornography.

    3) Almost no drawings or figures. This makes the book a bit dense to read at 300 pages.

    I liked another book on Hindu myths (Myth = Mithya, which I read in 2009); you can try reading it if you find the topic interesting.

    A Handbook of Hindu Mythology

    Hindus have one God.
    They also have 330 million gods: male gods, female gods, personal gods, family gods, household gods, village gods, gods of space and time, gods for specific castes and particular professions, gods who reside in trees, in animals, in minerals, in geometrical patterns and in man-made objects.
    Then there are a whole host of demons.
    But no Devil.


    2) Mere Christianity

    Mere Christianity by C. S. Lewis is a classic book reinterpreting Christianity for modern times. However, the author wrote it while World War II was on, and it reads more like a British or Anglo-Saxon interpretation of the beliefs of Christ Jesus, who was actually a Jewish teacher born in the Middle East.

    While the language makes it much easier to read, I would recommend it more to Western audiences than Eastern ones, as some of the parables seem to be a more palatable reinterpretation of the New Testament. The Bible is a deceptively easy book to read: the language is short and beautiful, and the original parables in the Gospels remain powerful and easy to understand.

    C. S. Lewis tends to emphasize morality more than religiosity or faith, and there is not much comparison with any other faith or alternative morality. Dumbing down the Bible to market it better to reluctant consumers seems to be Mr Lewis’s intention, and it is less a scholarly work than an exercise in pure prose.

    However, it is quite good as a self-improvement book, and much better than the “You Can Win” kind of books or even business concept books.

    Note: I find reading books on religion a good exercise in reading the fountainhead of philosophies. As a polytheist, I tend to read about more than one faith.