Tag Archives: Languages

Top 10 Regrets on Learning the SAS Language

  1. I didn’t learn the SAS Macro Language well enough. SAS Macros are cool, and fast. Ditto for arrays and ODS.
  2. Not keeping up with the changes in Version 9+, especially the hash method. (Why name a technique after a recreational drug? Most unfair.)
  3. Not studying more statistics theory.
  4. Flunking SAS Certification Twice.
  5. Not making enough money because customers need a solution not a p value.
  6. There is no Proc common sense. There is no Proc Clean the Data.
  7. No Macros to automate the model. Here is dirty data. There is clean model.  Wait till version 16.
  8. Not getting selected by SAS R & D. Not applying to SAS R & D.
  9. Google has better voice recognition for typing notes. No voice recognition in the SAS language to type syntax.
  10. Enhanced Editor and EG are both idiotic junk pushed by Marketing!

Inspired by true events at

http://www.sascommunity.org/wiki/Category:Bricolage

R 3.0 launched #rstats

The 3.0 Era for R starts today! Changes include better Big Data support.

Read the NEWS here

  • install.packages() has a new argument quiet to reduce the amount of output shown.
  • New functions cite() and citeNatbib() have been added, to allow generation of in-text citations from "bibentry" objects. A cite() function may be added to bibstyle() environments.
  • merge() works in more cases where the data frames include matrices. (Wish of PR#14974.)
  • sample.int() has some support for n >= 2^31: see its help for the limitations. A different algorithm is used for (n, size, replace = FALSE, prob = NULL) for n > 1e7 and size <= n/2. This is much faster and uses less memory, but does give different results.
  • list.files() (aka dir()) gains a new optional argument no.. which allows to exclude "." and ".." from listings.
  • Profiling via Rprof() now optionally records information at the statement level, not just the function level.
  • available.packages() gains a "license/restricts_use" filter which retains only packages for which installation can proceed solely based on packages which are guaranteed not to restrict use.
  • File ‘share/licenses/licenses.db’ has some clarifications, especially as to which variants of ‘BSD’ and ‘MIT’ is intended and how to apply them to packages. The problematic licence ‘Artistic-1.0’ has been removed.
  • The breaks argument in hist.default() can now be a function that returns the breakpoints to be used (previously it could only return the suggested number of breakpoints).

LONG VECTORS

This section applies only to 64-bit platforms.

  • There is support for vectors longer than 2^31 – 1 elements. This applies to raw, logical, integer, double, complex and character vectors, as well as lists. (Elements of character vectors remain limited to 2^31 – 1 bytes.)
  • Most operations which can sensibly be done with long vectors work: others may return the error ‘long vectors not supported yet’. Most of these are because they explicitly work with integer indices (e.g. anyDuplicated() and match()) or because other limits (e.g. of character strings or matrix dimensions) would be exceeded or the operations would be extremely slow.
  • length() returns a double for long vectors, and lengths can be set to 2^31 or more by the replacement function with a double value.
  • Most aspects of indexing are available. Generally double-valued indices can be used to access elements beyond 2^31 – 1.
  • There is some support for matrices and arrays with each dimension less than 2^31 but total number of elements more than that. Only some aspects of matrix algebra work for such matrices, often taking a very long time. In other cases the underlying Fortran code has an unstated restriction (as was found for complex svd()).
  • dist() can produce dissimilarity objects for more than 65536 rows (but for example hclust() cannot process such objects).
  • serialize() to a raw vector is unlimited in size (except by resources).
  • The C-level function R_alloc can now allocate 2^35 or more bytes.
  • agrep() and grep() will return double vectors of indices for long vector inputs.
  • Many calls to .C() have been replaced by .Call() to allow long vectors to be supported (now or in the future). Regrettably several packages had copied the non-API .C() calls and so failed.
  • .C() and .Fortran() do not accept long vector inputs. This is a precaution as it is very unlikely that existing code will have been written to handle long vectors (and the R wrappers often assume that length(x) is an integer).
  • Most of the methods for sort() work for long vectors.
  • rank(), sort.list() and order() support long vectors (slowly except for radix sorting).
  • sample() can do uniform sampling from a long vector.
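To make the 2^31 – 1 boundary above a little more concrete, here is a rough sketch of the arithmetic. It is written in Python, not R, and the helper names are made up for illustration. It only relies on two general facts: a 32-bit signed integer tops out at 2^31 – 1, and a double can hold every whole number up to 2^53 exactly, which is why double-valued lengths and indices can reach past the old limit.

```python
# Illustrative arithmetic only: this is Python, not R, and the helper names are hypothetical.

INT32_MAX = 2**31 - 1      # largest value a 32-bit signed integer can hold
DOUBLE_EXACT_MAX = 2**53   # doubles represent every whole number up to this exactly


def fits_in_int32(n):
    """True if a non-negative length or index n fits in a 32-bit signed integer."""
    return 0 <= n <= INT32_MAX


def survives_as_double(n):
    """True if n round-trips through a double without losing precision."""
    return int(float(n)) == n


def vector_bytes(n_elements, bytes_per_element):
    """Raw payload size of a vector, ignoring any header overhead."""
    return n_elements * bytes_per_element


if __name__ == "__main__":
    n = 2**31  # one element past the old limit
    print(fits_in_int32(n))        # False
    print(survives_as_double(n))   # True
    print(vector_bytes(n, 8) / 2**30, "GiB for a double vector of that length")
```

The last line is a reminder that a long vector of doubles starts at 16 GiB of data, so the 64-bit-only restriction is not much of a surprise.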

PERFORMANCE IMPROVEMENTS

  • More use has been made of R objects representing registered entry points, which is more efficient as the address is provided by the loader once only when the package is loaded.

    This has been done for packages base, methods, splines and tcltk: it was already in place for the other standard packages.

    Since these entry points are always accessed by the R entry points they do not need to be in the load table which can be substantially smaller and hence searched faster. This does mean that .C / .Fortran / .Call calls copied from earlier versions of R may no longer work – but they were never part of the API.

  • Many .Call() calls in package base have been migrated to .Internal() calls.
  • solve() makes fewer copies, especially when b is a vector rather than a matrix.
  • eigen() makes fewer copies if the input has dimnames.
  • Most of the linear algebra functions make fewer copies when the input(s) are not double (e.g. integer or logical).
  • A foreign function call (.C() etc) in a package without a PACKAGE argument will only look in the first DLL specified in the ‘NAMESPACE’ file of the package rather than searching all loaded DLLs. A few packages needed PACKAGE arguments added.
  • The @<- operator is now implemented as a primitive, which should reduce some copying of objects when used. Note that the operator object must now be in package base: do not try to import it explicitly from package methods.

SIGNIFICANT USER-VISIBLE CHANGES

  • Packages need to be (re-)installed under this version (3.0.0) of R.
  • There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.
  • It is now possible for 64-bit builds to allocate amounts of memory limited only by the OS. It may be wise to use OS facilities (e.g. ulimit in a bash shell, limit in csh), to set limits on overall memory consumption of an R process, particularly in a multi-user environment. A number of packages need a limit of at least 4GB of virtual memory to load.

    64-bit Windows builds of R are by default limited in memory usage to the amount of RAM installed: this limit can be changed by command-line option --max-mem-size or setting environment variable R_MAX_MEM_SIZE.

 

R in Oracle Java Cloud and Existing R – Java Integration #rstats

So I finally got my test plan accepted for a 1-month trial of the Oracle Public Cloud at https://cloud.oracle.com/ .

I am testing this for my next book, R for Cloud Computing (I have already covered Windows Azure and Amazon AWS, and am in the middle of testing Google Compute).

Some initial thoughts: this Java cloud seemed more suitable for web apps than for data science (but I have to spend much more time on this).

I really liked the help and documentation and tutorials, Oracle has invested a lot in it to make it friendly to enterprise users.

Hopefully the Oracle R Enterprise (ORE) guys can talk to the Oracle Cloud department and get some common use case projects going.

In the meantime, I did a roundup of all R-Java projects.

They include- (more…)

Top Funny Charts

I have recently become a Quora addict, and you can see why it is such a great site. If possible say hello to me there at

http://www.quora.com/Ajay-Ohri

My latest favorite question-

What are the most hilarious pie charts?

https://www.quora.com/Pie-Charts/What-are-the-most-hilarious-pie-charts

I am only showing you some of the answers; you can see the rest yourself.

 

 

Google Visualization Tools Can Help You Build a Personal Dashboard

The Google Visualization API is a great way for people to make dashboards with slick graphics based on data without getting into the fine print of the scripting language itself. It uses the same tools that Google itself does, and visualizing data comes down to making calls to the Visualization API. Thus a real-time, customizable dashboard that is publishable to the internet can be created within minutes, and, more importantly, insights can be drawn much more easily from graphs than from rows of tables and numbers.

  1. There are 41 gadgets (made by both Google and third-party developers) available in the Gadget Gallery (https://developers.google.com/chart/interactive/docs/gadgetgallery).
  2. There are 12 kinds of charts available in the Chart Gallery (https://developers.google.com/chart/interactive/docs/gallery).
  3. However, there are 26 additional charts on the additional charts page (https://developers.google.com/chart/interactive/docs/more_charts).

Building and embedding charts is simplified to a few steps

  • Load the AJAX API
  • Load the Visualization API and the appropriate package (like piechart or barchart from the kinds of chart)
  • Set a callback to run when the Google Visualization API is loaded
    • Within the callback, create and populate a data table, instantiate the chosen chart type, pass in the data and draw it.
    • Create the data table with appropriately named columns and data rows.
    • Set chart options with Title, Width and Height
  • Instantiate and draw the chart, passing in some options including the name and id
  • Finally, write the HTML div that will hold the chart
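The "create the data table" and "set chart options" steps can also be prepared outside the browser. Below is a minimal Python sketch that builds the JSON literal form of a data table (a "cols" list plus a "rows" list of cells) along with an options dictionary. The function name, the column ids and the sample data are all made up for illustration, and the page itself would still need the JavaScript loading and drawing steps listed above.

```python
import json


def build_data_table(columns, rows):
    """Build a dict in the DataTable JSON literal shape.

    columns: list of (label, type) pairs, e.g. ("Slices", "number")
    rows:    list of tuples, one value per column
    The "col0", "col1", ... ids are an arbitrary choice for this sketch.
    """
    return {
        "cols": [
            {"id": f"col{i}", "label": label, "type": col_type}
            for i, (label, col_type) in enumerate(columns)
        ],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }


# Hypothetical sample data, not taken from any real dashboard.
table = build_data_table(
    columns=[("Language", "string"), ("Posts", "number")],
    rows=[("R", 12), ("SAS", 7), ("Java", 3)],
)

# Chart options as described in the steps: title, width and height.
options = {"title": "Posts per language", "width": 400, "height": 300}

if __name__ == "__main__":
    print(json.dumps({"data": table, "options": options}, indent=2))
```

The printed JSON could then be embedded in the page or served from a URL, with the "data" part handed to the data table constructor inside the callback.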

You can simply copy and paste the code directly from https://developers.google.com/chart/interactive/docs/quick_start without getting into any details, tweak it according to your data and chart preference, and voila, your web dashboard is ready!
That is the beauty of working with an API: you can create and display genius ideas without messing with the scripting languages and code (too much). If you would like to dive deeper into the API, you can look at the various objects at https://developers.google.com/chart/interactive/docs/reference

First launched in Mar 2008, the Google Visualization API has indeed come a long way in making dashboards easier to build for people wanting to use advanced data visualization. It came about directly as a result of Google’s 2007 acquisition of Gapminder’s Trendalyzer software (of Hans Rosling fame).
As computing invariably and inevitably shifts to the cloud, visualization APIs will be very useful. Tableau Software has been a pioneer in selling data visualization to the lucrative business intelligence and business dashboards community (you can see the Tableau Software API at http://onlinehelp.tableausoftware.com/v7.0/server/en-us/embed_api.htm ), and Google Visualization can do the same and capture the business dashboard and visualization market, if Google puts more focus on integrating it into its multiple and often confusing API offerings.
However as of now, this is quite simply the easiest way to create a web dashboard for your personal needs. Google guarantees 3 years of backward compatibility with this API and it is completely free.

Machine Learning to Translate Code from different programming languages

Google Translate has been a pioneer in using machine learning for translating various languages (and so has the awesome Google Transliterate).

I wonder if they can expand it to programming languages and not just human languages.

 

Issues in converting or translating programming language code

1) Paths that refer to stored objects

2) Object names should remain the same and not be translated

3) Many functions have multiple uses, so translating a function is sometimes not straightforward
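As a toy illustration of the first two issues, here is a short Python sketch. It rewrites function names from a lookup table, but leaves string literals (such as file paths) and every other object name alone. The mapping is an invented example, not a real language-to-language dictionary, and the third issue (one function with several uses) is exactly what a flat table like this cannot handle.

```python
import re

# Hypothetical lookup table: source function name -> target function name.
FUNCTION_MAP = {"print": "console.log", "load": "readFile"}

# Tokens, tried in order: double-quoted string, single-quoted string,
# identifier, or any other single character (including newlines).
TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|[A-Za-z_]\w*|.', re.S)


def translate(source):
    """Swap mapped function names; keep paths and other names untouched."""
    out = []
    for token in TOKEN.findall(source):
        # String literals and unknown identifiers fall through unchanged,
        # because only exact whole-token matches are looked up.
        out.append(FUNCTION_MAP.get(token, token))
    return "".join(out)


if __name__ == "__main__":
    print(translate('print(load("/data/print/sales.csv"), print_total)'))
```

Running it turns the two function names into their mapped versions while the path inside the quotes and the object name print_total stay exactly as written.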

I think all these issues are doable, solvable and, more importantly, profitable.

 

I look forward to the day an iOS developer can convert his code to Android app code by a simple upload and download.

Google Cloud is finally here

Amazon gets some competition, and customers should see some relief, unless Google withdraws commitment on these products after a few years of trying (like it often does now!)

 

http://cloud.google.com/products/index.html

Machine Type Pricing

Configuration   | Virtual Cores | Memory     | GCEU * | Local disk | Price/Hour | $/GCEU/hour
n1-standard-1-d | 1             | 3.75GB *** | 2.75   | 420GB ***  | $0.145     | 0.053
n1-standard-2-d | 2             | 7.5GB      | 5.5    | 870GB      | $0.29      | 0.053
n1-standard-4-d | 4             | 15GB       | 11     | 1770GB     | $0.58      | 0.053
n1-standard-8-d | 8             | 30GB       | 22     | 2 x 1770GB | $1.16      | 0.053

Network Pricing

Ingress                                                     | Free
Egress to the same Zone                                     | Free
Egress to a different Cloud service within the same Region  | Free
Egress to a different Zone in the same Region (per GB)      | $0.01
Egress to a different Region within the US                  | $0.01 ****
Inter-continental Egress                                    | At Internet Egress Rate

Internet Egress (Americas/EMEA destination), per GB
0-1 TB in a month | $0.12
1-10 TB           | $0.11
10+ TB            | $0.08

Internet Egress (APAC destination), per GB
0-1 TB in a month | $0.21
1-10 TB           | $0.18
10+ TB            | $0.15

Persistent Disk Pricing

Provisioned space  | $0.10 per GB/month
Snapshot storage** | $0.125 per GB/month
IO Operations      | $0.10 per million

IP Address Pricing

Static IP address (assigned but unused)     | $0.01 per hour
Ephemeral IP address (attached to instance) | Free
* GCEU is Google Compute Engine Unit — a measure of computational power of our instances based on industry benchmarks; review the GCEU definition for more information
** coming soon
*** 1GB is defined as 2^30 bytes
**** promotional pricing; eventually will be charged at internet download rates
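For a quick sanity check on these numbers, here is a small Python sketch. The first part recomputes the $/GCEU/hour column from the price and GCEU figures in the table. The second part estimates a monthly Internet egress bill, and it rests on two assumptions of mine that the price list does not spell out: that the tiers are graduated (each rate applies only to the traffic inside its band) and that 1 TB is counted as 1024 GB. All function and variable names are invented for illustration.

```python
# Figures copied from the price list above: name -> (GCEU, price per hour in USD).
MACHINE_TYPES = {
    "n1-standard-1-d": (2.75, 0.145),
    "n1-standard-2-d": (5.5, 0.29),
    "n1-standard-4-d": (11, 0.58),
    "n1-standard-8-d": (22, 1.16),
}

# Americas/EMEA Internet egress: (upper bound of tier in GB, USD per GB).
# ASSUMPTION: 1 TB = 1024 GB; None marks the open-ended top tier.
AMERICAS_EMEA_TIERS = [(1024, 0.12), (10 * 1024, 0.11), (None, 0.08)]


def price_per_gceu(gceu, price_per_hour):
    """Hourly price divided by compute units, as in the last table column."""
    return price_per_hour / gceu


def egress_cost(gb, tiers):
    """Monthly egress cost, ASSUMING graduated (marginal) tier pricing."""
    cost = 0.0
    lower = 0
    for upper, rate in tiers:
        if upper is None or gb <= upper:
            cost += (gb - lower) * rate  # remaining traffic falls in this tier
            break
        cost += (upper - lower) * rate   # this tier is filled completely
        lower = upper
    return cost


if __name__ == "__main__":
    for name, (gceu, price) in MACHINE_TYPES.items():
        print(name, round(price_per_gceu(gceu, price), 3))
    print(round(egress_cost(2048, AMERICAS_EMEA_TIERS), 2))
```

Every machine type works out to roughly 0.053 dollars per GCEU per hour, so the larger instances are priced strictly in proportion to their size. If the tiers turn out not to be graduated, only egress_cost would need to change.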

Google Prediction API

Tap into Google’s machine learning algorithms to analyze data and predict future outcomes.

  • Leverage machine learning without the complexity
  • Use the familiar RESTful interface
  • Enter input in any format – numeric or text

Build smart apps

Learn how you can use Prediction API to build customer sentiment analysis, spam detection or document and email classification.

Google Translation API

Use Google Translate API to build multilingual apps and programmatically translate text in your webpage or application.

  • Translate text into other languages programmatically
  • Use the familiar RESTful interface
  • Take advantage of Google’s powerful translation algorithms

Build multilingual apps

Learn how you can use Translate API to build apps that can programmatically translate text in your applications or websites.

Google BigQuery

Analyze Big Data in the cloud using SQL and get real-time business insights in seconds using Google BigQuery. Use a fully-managed data analysis service with no servers to install or maintain.

Reliable & Secure

Complete peace of mind as your data is automatically replicated across multiple sites and secured using access control lists.

Scale infinitely

You can store up to hundreds of terabytes, paying only for what you use.

Blazing fast

Run ad hoc SQL queries on multi-terabyte datasets in seconds.

Google App Engine

Create apps on Google’s platform that are easy to manage and scale. Benefit from the same systems and infrastructure that power Google’s applications.

Focus on your apps

Let us worry about the underlying infrastructure and systems.

Scale infinitely

See your applications scale seamlessly from hundreds to millions of users.

Business ready

Premium paid support and 99.95% SLA for business users
