September Roundup by Revolution

From the monthly newsletter, which I find quite useful for keeping up to date on applications of R:

——————————————————————————————————————————————————————————————————–

Revolution News
Every month, we’ll bring you the latest news about Revolution’s products and events in this section.
Follow us on Twitter at @RevolutionR for up-to-the-minute news and updates from Revolution Analytics!

Revolution R Enterprise 4.0 for Windows now available. Based on the latest R 2.11.1 and including the RevoScaleR package for big-data analysis in R, Revolution R Enterprise is now available for download for Windows 32-bit and 64-bit systems. Click here to subscribe; it is also available free to academia.

New! Integrate R with web applications, BI dashboards and more with web services. RevoDeployR is a new Web Services framework that integrates dynamic R-based computations into applications for business users. It will be available September 30 with Revolution R Enterprise Server on RHEL 5. Click here to learn more.

Free Webinar, September 22: In a joint webinar from Revolution Analytics and Jaspersoft, learn how to use RevoDeployR to integrate advanced analytics on-demand in applications, BI dashboards, and on the web. Register here.

Revolution in the News:
SearchBusinessAnalytics.com previews the forthcoming Revolution R GUI; Channel Register introduces RevoDeployR, while IT Business Edge shows off the Web Services architecture; and ReadWriteWeb.com looks at how RevoScaleR tackles the Big Data explosion.

Inside-R: A new site for the R Community. At www.inside-R.org you’ll find the latest information about R from around the Web, searchable R documentation and packages, hints and tips about R, and more. You can even add a “Download R” badge to your own web-page to help spread the word about R.

R News, Tips and Tricks from the Revolutions blog
The Revolutions blog brings you daily news and tips about R, statistics and open source. Here are some highlights from Revolutions from the past month.

R’s key role in the oil spill response: Read how NIST’s Division Chief of Statistical Engineering used R to provide critical analysis in real time to the Secretaries of Energy and the Interior, and helped coordinate the government’s response.

Animating data with R and Google Earth: Learn how to use R to create animated visualizations of geographical data with Google Earth, such as this video showing how tuna migrations intersect with the location of the Gulf oil spill.

Are baseball games getting longer? Or is it just Red Sox games? Ryan Elmore uses nonparametric regression in R to find out.

Keynote presentations from useR! 2010: the worldwide R user’s conference was a great success, and there’s a wealth of useful tips and information in the presentations. Videos of the keynote presentations are available too: check out in particular Frank Harrell’s talk Information Allergy, and Friedrich Leisch’s talk on reproducible statistical research.

Looking for more R tips and tricks? Check out the monthly round-ups at the Revolutions blog.

Upcoming Events
Every month, we’ll highlight some upcoming events from the R Community Calendar.

September 23: The San Diego R User Group has a meetup on BioConductor and microarray data analysis.

September 28: The Sydney Users of R Forum has a meetup on building world-class predictive models in R (with dinner to follow).

September 28: The Los Angeles R User Group presents an introduction to statistical finance with R.

September 28: The Seattle R User Group meets to discuss, “What are you doing with R?”

September 29: The Raleigh-Durham-Chapel Hill R Users Group has its first meeting.

October 7: The NYC R User Group features a presentation by Prof. Andrew Gelman.

There are also new R user groups in Singapore, Seoul, Denver, Brisbane, and New Jersey. Please let us know if we’re missing your R user group, or if you want to get a new one started.

———————————————————————————————-Editor

David Smith, VP Marketing
david@revolutionanalytics.com
Twitter: @revodavid

Subscribe here for Revo’s monthly newsletter.

Sharing WordPress.com Blog Articles

Suppose you want to customize your blog's share buttons to add one more service (apart from Facebook, Twitter, etc.).

Here is an example of creating a new share service: a blog share button for Hacker News at http://news.ycombinator.com/

Log in to your wordpress.com account and navigate to Settings > Sharing, at the bottom of the left margin.

Now, under Add Service, enter the Service Name as

Hacker News (or you can put it as Y Combinator)

For the URL, copy and paste exactly:

http://news.ycombinator.com/submitlink=&t=%post_title%+&u=%post_url%

For the Icon URL:

http://ycombinator.com/images/yc500.gif

Note that there is no need for an excerpt when you are adding a URL to Hacker News, so we can leave it blank.
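To see what a URL template like this does when a reader clicks the share button, here is a minimal Python sketch of the placeholder substitution. It is an illustration only, not WordPress.com's actual code: the function name `expand_share_template` is made up for this example, and the Hacker News template in the usage lines assumes the `submitlink?u=...&t=...` form of its submit page.

```python
from urllib.parse import quote_plus


def expand_share_template(template: str, post_title: str, post_url: str) -> str:
    """Fill %post_title% and %post_url% placeholders with URL-encoded values.

    Hypothetical helper for illustration; WordPress.com performs its own
    substitution when it renders the share button.
    """
    replacements = {
        "%post_title%": quote_plus(post_title),
        "%post_url%": quote_plus(post_url),
    }
    result = template
    for placeholder, value in replacements.items():
        result = result.replace(placeholder, value)
    return result


if __name__ == "__main__":
    # Assumed Hacker News submit template; adjust to whatever your service expects.
    hn_template = "http://news.ycombinator.com/submitlink?u=%post_url%&t=%post_title%"
    print(expand_share_template(hn_template, "Kill R? Wait a sec", "http://example.com/kill-r"))
```

Encoding the title and URL matters because both can contain characters such as `?`, `&` or spaces that would otherwise break the query string.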

And now share all you want, wordpress.com hackers 😉

Kill R? Wait a sec

1) Is R efficient? (scripting-wise and performance-wise) It depends on how you code it. Some packages like foreach can help, but basic efficiency comes from the programmer. The XDF format from RevoScaleR, the non-open R package, further improves programming efficiency.

2) Should R be written from scratch?

You have got to be kidding. It depends on how you define scratch after 2 million users.

This has been done with S, then S Plus and now R.

3) What should be the license of R (if it were made anew)?

The GPL license is fine. You need to do a better job of enforcing the license. Currently, interfaces to R exist from SPSS, SAS, KXEN, and other companies as well. To my knowledge, neither royalty payments nor formal code sharing takes place.

R core needs to do a better job of protecting the work of 2500 package creators rather than settling for a few snacks at events, sponsorships, corporate board membership for Prof Gentleman, and 4-5 packages donated to it. The only way R developers can currently support their research is to write a book (by Springer, mostly).

E.g., ggplot and Hmisc are likely to be used more by the average corporate user. Do their creators deserve royalties if the creators of RevoScaleR are getting them?

If some of the 2 million users gave $1 each to R core (compared to the $9 million in the last round of funding for Revolution Analytics), you would have enough money to create a 64-bit optimized R for Linux (missing in Enterprise R), Amazon R APIs (like Karim Chine’s efforts), R GUIs (like Rattle’s commercial version), etc.

The developments are not surprising given that Microsoft and Intel are funding Revolution Analytics http://www.dudeofdata.com/?p=1967

R controversies come and go (this has happened before, including the NYT article and the shakeup at Revo).

There is an interesting debate on whether R should be killed in order to upgrade to a more efficient language.

From Tal (creator of R-bloggers), on the R-help list:

There is currently a (very !) lively discussions happening around the web, surrounding the following topics:
1) Is R efficient? (scripting wise, and performance wise)
2) Should R be written from scratch?
3) What should be the license of R (if it was made a new)?

Very serious people have taken part in the debates so far.  I hope to let you know of the places I came by, so you might be able to follow/participate
in these (IMHO) important discussions.

The discussions started in the response for the following blog post on
Xi’An’s blog:
http://xianblog.wordpress.com/2010/09/06/insane/


Followed by the (short) response post by Ross Ihaka:
http://xianblog.wordpress.com/2010/09/13/simply-start-over-and-build-something-better/


Other discussions started to appear on Andrew Gelman’s blog:
http://www.stat.columbia.edu/~cook/movabletype/archives/2010/09/ross_ihaka_to_r.html

And (many) more responses started to appear in the hackers news website:
http://news.ycombinator.com/item?id=1687054

I hope these discussions will have fruitful results for our community,
Tal

Contact Details:
Contact me: Tal.Galili@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)

My 0 cents (see, it would be 2 cents, but it's free)

SAS Sentiment Analysis wins Award

From Business Wire: the new Sentiment Analysis product by SAS Institute (created by its acquisition, Teragram) has won an award. As per Wikipedia:

http://en.wikipedia.org/wiki/Sentiment_analysis

Sentiment analysis or opinion mining refers to a broad (definitionally challenged) area of natural language processing, computational linguistics and text mining. Generally speaking, it aims to determine the attitude of a speaker or a writer with respect to some topic. The attitude may be their judgment or evaluation (see appraisal theory), their affective state (that is to say, the emotional state of the author when writing) or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).
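To make the definition concrete, here is a toy Python sketch of the simplest form of sentiment analysis: counting words from small positive and negative word lists. This is not how the SAS product or any tool mentioned in this post works; the word lists and the function name `polarity` are invented purely for illustration.

```python
import re

# Tiny illustrative lexicons (assumptions for this sketch, not a real resource).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "angry"}


def polarity(text: str) -> str:
    """Classify text as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    # Each positive word adds 1 to the score, each negative word subtracts 1.
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sample in ("I love this great product", "Terrible support, bad docs"):
        print(sample, "->", polarity(sample))
```

A word-counting approach like this misses negation, sarcasm and context, which is part of why the area is called "definitionally challenged" above.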

The SAS product was developed by Teragram. Here is another sentiment analysis tool, from Stanford grad school, at http://twittersentiment.appspot.com/search?query=sas

Image: Sentiment analysis for SAS (image citation: http://threeminds.organic.com/2009/09/five_reasons_sentiment_analysi.html)

Read an article on sentiment analysis here at http://www.nytimes.com/2009/08/24/technology/internet/24emotion.html

And the complete press release at http://goo.gl/iVzf

SAS Sentiment Analysis delivers insights on customer, competitor and organizational opinions to a degree never before possible via manual review of electronic text. As a result, SAS, the leader in business analytics software and services, has earned the prestigious Communications Solutions Product of the Year Award from Technology Marketing Corporation (TMC).

“SAS Sentiment Analysis has shown benefits for its customers and it provides ROI for the companies that use it,” said Rich Tehrani, CEO, TMC. “Congratulations to the entire team at SAS, a company distinguished by its dedication to software quality and superiority to address marketplace needs.”

Derive positive and negative opinions, evaluations and emotions

SAS Sentiment Analysis’ high-performance crawler locates and extracts sentiment from digital content sources, including mainstream websites, social media outlets, internal servers and incoming news feeds. SAS’ unique hybrid approach combines powerful statistical techniques with linguistics rules to improve accuracy to the detailed feature level. It summarizes the sentiment expressed in all available text collections – identifying trends and creating graphical reports that describe the expressed feelings of consumers, partners, employees and competitors in real time. Output from SAS Sentiment Analysis can be stored in document repositories, surfaced in corporate portals and used as input to additional SAS Text Analytics software or search engines to help decision makers evaluate trends, predict future outcomes, minimize risks and capitalize on opportunities.

“SAS has automated the time-consuming process of reading individual documents and manually extracting relevant information,” said Fiona McNeill, Global Analytics Product Marketing Manager at SAS. “Our integrated analytics framework helps organizations maximize the value of information to improve their effectiveness.”

SAS Sentiment Analysis is included in the SAS Text Analytics suite, which helps organizations discover insights from electronic text materials, associate them for delivery to the right person or place, and provide intelligence to select the best course of action. Whether answering complex search-and-retrieval questions, ensuring appropriate content is presented to internal or external constituencies, or predicting which activity or channel will produce the best effect on existing sentiments, SAS Text Analytics provides exceptional real-time processing speeds for large volumes of text.

SAS Text Analytics solutions are part of the SAS Business Analytics Framework, backed by the industry’s most comprehensive range of consulting, training and support services, ensuring customers maximum return from their IT investments.

Recognizing vision

The Communications Solutions Product of the Year Award recognizes vision, leadership and thoroughness. The most innovative products and services brought to the market from March 2008 through March 2009 were chosen as winners of this Product of the Year Award and are published on the INTERNET TELEPHONY and Customer Interaction Solutions websites.

Oracle for possible takeover of REvolution Computing

Updated – Mr Smith gave an update in the comments section confirming the post.

From the press release –

Palo Alto, California – April 1, 2010 – REvolution Computing, the leading commercial provider of software and support for the open source “R” statistical computing language, announced that its CEO, Norman Nie, and Vice President of Community and Product Marketing, David Smith, will join Larry Ellison and other senior executives of Oracle at the 2010 Oracle Business Conference at the Palace Hotel in San Francisco on April 17-18.

This meeting is to discuss exciting embedded analytical opportunities and will closely relate to an exciting announcement of recent breakthroughs by their product teams on in-database analytics.

Nie, Smith and Ellison will be available to meet with analysts, reporters and prospective business partners and clients interested in learning more about REvolution’s enterprise software and solutions for predictive analytics based on open source “R,” including new developments in REvolution’s products and recent deployments at leading pharmaceutical and financial services companies.

REvolution Computing is a featured portfolio company of North Bridge Venture Partners, a leading investor in open source companies.

Dear Google

Google.com has added its privacy policy to the main page to conform with California law. Here is a question to the masters of the algorithm that I sent to their query system:

“Dear Google,

I understand that IP addresses are stored routinely by you, and that these IP addresses can be used as unique keys for analytical purposes, but can also be used to identify and locate people, compromising their privacy (as in China), with disproportionate technical effort. Why don’t you run a randomizing algorithm that masks the IP addresses but keeps the uniqueness factor alive, and delete the original IP addresses, thus sparing yourself any privacy concerns? The algorithm should be made in such a manner that a masked IP number cannot be unmasked, and identical IP addresses always get the same masked value. You retain analytical value, consumers retain privacy, and we settle this debate once and for all.”
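One way to read the scheme requested in the letter is as keyed hashing of each IP address. The Python sketch below is a minimal illustration under stated assumptions, not a description of anything Google does: the function name `make_ip_masker` is hypothetical, the 16-character truncation is an arbitrary choice, and the "cannot be unmasked" property holds only as long as the secret key is kept away from whoever sees the masked values (or is destroyed).

```python
import hashlib
import hmac
import ipaddress
import secrets


def make_ip_masker(secret_key: bytes):
    """Return a function that maps an IP address to a stable pseudonym.

    Hypothetical helper: the same address always yields the same token
    under one key, so it can still serve as a unique key for analytics.
    """
    def mask(ip: str) -> str:
        # Normalize first so equivalent spellings of an address match.
        packed = ipaddress.ip_address(ip).packed
        digest = hmac.new(secret_key, packed, hashlib.sha256).hexdigest()
        return digest[:16]  # shortened token; length is an arbitrary choice
    return mask


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # random key, stored separately from the logs
    mask = make_ip_masker(key)
    for address in ("192.168.0.1", "192.168.0.1", "192.168.0.2"):
        print(address, "->", mask(address))
```

A secret key is used rather than a plain hash because there are only about 4.3 billion IPv4 addresses, so an unkeyed hash could be reversed simply by hashing every possible address and comparing.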
This is in response to its slightly biased privacy policy, whose fine print is here: “

http://www.google.co.in/intl/en/privacypolicy.html

Data integrity

Google processes personal information only for the purposes for which it was collected and in accordance with this Policy or any applicable service-specific privacy notice. We review our data collection, storage and processing practices to ensure that we only collect, store and process the personal information needed to provide or improve our services. We take reasonable steps to ensure that the personal information we process is accurate, complete, and current, but we depend on our users to update or correct their personal information whenever necessary.

Accessing and updating personal information

When you use Google services, we make good faith efforts to provide you with access to your personal information and either to correct this data if it is inaccurate or to delete such data at your request if it is not otherwise required to be retained by law or for legitimate business purposes. We ask individual users to identify themselves and the information requested to be accessed, corrected or removed before processing such requests, and we may decline to process requests that are unreasonably repetitive or systematic, require disproportionate technical effort, jeopardize the privacy of others, or would be extremely impractical (for instance, requests concerning information residing on backup tapes), or for which access is not otherwise required. In any case where we provide information access and correction, we perform this service free of charge, except if doing so would require a disproportionate effort. Some of our services have different procedures to access, correct or delete users’ personal information. We provide the details for these procedures in the specific privacy notices or FAQs for these services.”

This leaves enough loopholes for Google to pick and choose its privacy policy AND its response. Nice spin, but people who understand law, public relations, databases AND algorithms do exist in the non-Google world. The New York Times blog “Bits” is at the forefront, and it's a very good blog for all tech news, besides the renowned Mashable (www.mashable.com) and Silicon Valley Insider (www.alleyinsider.com).

Watch this space.

FaceBook loses out to Google Driven Initiative…?

These are early days, but almost all the top Facebook application games like Triumph, Dope Wars Online, etc. are starting to get versions on MySpace and hi5. This basically means that the Facebook era of exclusive applications may face a threat from the open source applications of Google-driven APIs that enable developers to make games that transcend all social networks.

This space is heating up, and the latecomers to the social networking party might just see Facebook become the harbinger of a Web 2.0 crash, just as Netscape was of the Web 1.0 crash.

Speaking of which, the Mozilla Firefox browser is in beta for Version 3. Yes, it is a cool one, and yes, the Force is strong within Google despite the share price :)
