On 7 January 2009 the New York Times carried a front page technology
article on R in which a SAS representative is quoted:
"I think it addresses a niche market for high-end data analysts that
want free, readily available code," said Anne H. Milley, director of
technology product marketing at SAS. She adds, "We have customers
who build engines for aircraft. I am happy they are not using
Freeware when I get on a jet."
This is a common misunderstanding promoted by vendors. R is a
peer reviewed software product that many of the world's top
statisticians have reviewed, and over the years, any issues will have
been identified and rectified. On the other hand, SAS is a
non-peer-reviewed software product with closed source (i.e., hidden)
implementations of analytic methods that cannot be reproduced by
others. Which would you trust when building aircraft engines?
A common reason for changing to R is the bureaucratic process that
organisations impose upon users wanting software. Often, hurdles
have to be cleared in order to purchase software, including due
diligence in comparing what is available. With free open source
software, we just get the software we want to use and if it doesn't
serve the purpose, we move on. For commercial purchases, if the
software is found not to serve the purpose, we are stuck with it and
have to make do. The decision making also shrinks from months to
seconds.
Let's start with some of the advantages of using R:
- R is the most comprehensive statistical analysis
package available. It incorporates all of the standard
statistical tests, models, analyses, as well as providing a
comprehensive language for managing and manipulating data.
- R is a programming language and environment developed for
statistical analysis by practising statisticians and researchers.
- R is developed by a core team of about 20 developers,
including some of the world's leading statisticians.
- The validity of the R software is ensured through openly
validated and comprehensive governance as documented for the
US Food and Drug Administration in XXXX. Because R is open
source, unlike commercial software, it has been reviewed by many
internationally renowned statisticians and computational
scientists.
- R has over 1400 packages available, specialising in topics ranging
from Econometrics and Data Mining to Spatial Analysis and Bio-Informatics.
- R is free and open source software allowing anyone to
use and, importantly, to modify it. R is licensed under the
GNU General Public License, with Copyright held by The R
Foundation for Statistical Computing.
- Anyone can freely download and install the R software and
even freely modify the software, or look at the code behind the
software to learn how things are done.
- Anyone is welcome to provide bug fixes, code enhancements, and
new packages, and the wealth of quality packages available for R
is a testament to this approach to software development and sharing.
- R integrates well with packages written in other languages,
including Java (hence the RWeka package), Fortran (hence
randomForest), C (hence arules), C++, and Python.
- The R command line is much more powerful than a graphical user
interface.
- R is cross platform. R runs on many operating
systems and different hardware. It is popularly used on GNU/Linux,
Macintosh, and MS/Windows, running on both 32-bit and 64-bit
processors.
- R has active user groups where questions can be asked and are
often answered quickly, frequently by the very people who have
developed the environment--this support is second to none. Have
you ever tried getting support from people who really know SAS or
are core developers of SAS?
- New books for R (the Springer Use R! series) are emerging and
there will soon be a very good library of books for using R.
- No license restrictions (other than ensuring our freedom to use
it at our own discretion) and so you can run R anywhere and at any
time.
- R probably has the most complete collection of statistical
functions of any statistical or data mining package. New technology
and ideas often appear first in R.
- The graphic capabilities of R are outstanding, providing a
fully programmable graphics language which surpasses most other
statistical and graphical packages.
- A very active email list, with some of the world's leading
statisticians actively responding, is available for anyone to join.
Questions are quickly answered and the archive provides a wealth of
user solutions and examples. Be sure to read the Posting Guide
first.
- Being open source, the R source code is peer reviewed, and
anyone is welcome to review it and suggest improvements. Bugs are
fixed very quickly. Consequently, R is a rock solid product. New
packages provided with R do go through a life cycle, often
beginning as somewhat lower quality tools, but usually quickly
evolving into top quality products.
- R plays well with many other tools, importing data, for
example, from CSV files, SAS, and SPSS, or directly from MS/Excel,
MS/Access, Oracle, MySQL, and SQLite. It can also produce graphics
output in PDF, JPG, PNG, and SVG formats, and table output for
LaTeX and HTML.
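As a small illustration of the last point, the following is a minimal sketch using only base R functions. The file names, column names, and data values are made up for the example; a temporary CSV file is written first so that the sketch is self-contained.

```r
# Write a small CSV file to a temporary location so the example is
# self-contained (the columns x and y are hypothetical).
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1)),
          csv_file, row.names = FALSE)

# Import the CSV file into a data frame.
ds <- read.csv(csv_file)

# Produce graphics output in PDF format: open the PDF device,
# draw the plot, then close the device to write the file.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot(ds$x, ds$y, main = "y against x")
invisible(dev.off())
```

Swapping pdf() for png() or svg() should give the other output formats mentioned above, although which devices are available depends on how R was built.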
Whilst the advantages might flow from the pen with a great deal of
enthusiasm, it is useful to note some of the disadvantages or
weaknesses of R, even if they are perhaps transitory!
- You may need to wear a fireproof jacket if you are looking for
help on some of the R mailing lists (but don't let that put you off
trying). If you get a nasty response from a particular character on
the mailing list, don't lose heart -- he's actually made a lot of
significant contributions to R but is well known for lacking in
sensitivity. Do go gentle with him -- he does not seem to
realise what harm he does and reacts badly to criticism -- not
worth spending time worrying about as he'll probably never change.
- R is not so easy to use for the novice. There are several
simple to use graphical user interfaces (GUIs) for R that
encompass point and click interactions, but they generally do not
have the polish of the commercial offerings of Clementine
(See Chapter 54) and SAS/Enterprise Miner
(See Chapter 59).
- Documentation is sometimes patchy. Whilst there are extensive
documents online, in books, and throughout the Internet, the
documentation can sometimes be terse and even impenetrable to the
non-statistician. On the other hand, for example, SAS has extensive,
self-contained, and often well explained documentation, readily
available to the user. Nonetheless, users do comment that the R
documentation is to the point and easy to consult.
- The quality of some packages is less than perfect, although if a
package is useful to many people, it will quickly evolve into a very
robust product through collaborative efforts.
- There is no one to complain to if something doesn't work - at
least no one who has a financial interest in keeping you, the user,
as a satisfied customer. Organisations are quite happy to pay major
premiums for that apparent peace of mind! Nonetheless, problems are
usually dealt with quickly on the mailing list, and bugs disappear
with lightning speed.
- R has a steep learning curve--it does take a while to get used
to the power of R--but no steeper than, for example, SAS.
- There is no graphical user interface that compares with the
SAS/Enterprise Guide or SAS/JMP interfaces, which are more
comfortable for new and infrequent users.
- Many R commands give little thought to memory management
and so R can very quickly consume all available memory. This can be
a restriction when doing data mining. There are various solutions,
including using 64-bit operating systems that can access much more
memory than 32-bit operating systems.
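On the memory point, the following is a rough sketch, using only base R, of how one might check whether R is running as a 32-bit or 64-bit process and how much memory a single object occupies. The vector x and its size are made up for the illustration.

```r
# Pointer size in bytes times 8 gives the address width in bits:
# 64 on a 64-bit build of R, 32 on a 32-bit build.
address_bits <- .Machine$sizeof.pointer * 8
print(address_bits)

# A hypothetical numeric vector of one million values.
x <- rnorm(1e6)

# Record and report how much memory the object occupies.
size_bytes <- as.numeric(object.size(x))
print(object.size(x), units = "Mb")

# Remove the object and ask R to run garbage collection so the
# memory can be reused.
rm(x)
invisible(gc())
```

Checking object sizes like this before copying or transforming a large dataset can help anticipate when R is likely to run out of memory.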
Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010