Home » Posts tagged 'distinct'

Tag Archives: distinct

A 3D Printed World

From http://en.wikipedia.org/wiki/3D_printing

Additive manufacturing or 3D printing[1] is a process of making three dimensional solid objects from a digital model. 3D printing is achieved using additive processes, where an object is created by laying down successive layers of material.[2] 3D printing is considered distinct from traditional machining techniques (subtractive processes) which mostly rely on the removal of material by drilling, cutting etc.

A world without factories , or atleast not as many. Where the only thing to be bought is design and raw material . Direct from the creators to the consumers.

Imagine 2025 – with the latest generation of 3 D printers. You browse though online catalogs, select designs  for furniture, accessories, clothes. Click buy and then print.

No more inventory planning ( except for the raw material wood,synthetic,cloth, plastic or better still an intermediate that can be done in all of these). Everything is bio-degradable in this new world of 3D printers.

That future is closer than you think! No more Made in China vs Made in USA

Everything will be made at home! designed by artists! delivered by Internet.

This is probably how they will shift manufacturing back to the rest of the planet to the First World, as both China and India are lagging behind in understanding the ramifications of mass produced 3D printers. 3D printers could do to factories what automatic washing machines did to laundry.

  1. http://mashable.com/2012/07/17/3d-printing-continuum-fashion/
  2. http://singularityhub.com/2012/04/23/3d-printing-robot-produces-chairs-and-tables-from-recycled-waste/
  3. http://www.wired.com/dangerroom/2012/08/3d-weapons/

Radoop 0.3 launched- Open Source Graphical Analytics meets Big Data

What is Radoop? Quite possibly an exciting mix of analytics and big data computing

 

http://blog.radoop.eu/?p=12

What is Radoop?

Hadoop is an excellent tool for analyzing large data sets, but it lacks an easy-to-use graphical interface. RapidMiner is an excellent tool for data analytics, but its data size is limited by the memory available, and a single machine is often not enough to run the analyses on time. In this project, we combine the strengths of both projects and provide a RapidMiner extension for editing and running ETL, data analytics and machine learning processes over Hadoop.

We have closely integrated the highly optimized data analytics capabilities of Hive and Mahout, and the user-friendly interface of RapidMiner to form a powerful and easy-to-use data analytics solution for Hadoop.

 

and what’s new

http://blog.radoop.eu/?p=198

Radoop 0.3 released – fully graphical big data analytics

Today, Radoop had a major step forward with its 0.3 release. The new version of the visual big data analytics package adds full support for all major Hadoop distributions used these days: Apache Hadoop 0.20.2, 0.20.203, 1.0 and Cloudera’s Distribution including Apache Hadoop 3 (CDH3). It also adds support for large clusters by allowing the namenode, the jobtracker and the Hive server to reside on different nodes.

As Radoop’s promise is to make big data analytics easier, the 0.3 release is also focused on improving the user interface. It has an enhanced breakpointing system which allows to investigate intermediate results, and it adds dozens of quick fixes, so common process design mistakes get much easier to solve.

There are many further improvements and fixes, so please consult the release notes for more details. Radoop is in private beta mode, but heading towards a public release in Q2 2012. If you would like to get early access, then please apply at the signup page or describe your use case in email (beta at radoop.eu).

Radoop 0.3 (15 February 2012)

  • Support for Apache Hadoop 0.20.2, 0.20.203, 1.0 and Cloudera’s Distribution Including Apache Hadoop 3 (CDH3) in a single release
  • Support for clusters with separate master nodes (namenode, jobtracker, Hive server)
  • Enhanced breakpointing to evaluate intermediate results
  • Dozens of quick fixes for the most common process design errors
  • Improved process design and error reporting
  • New welcome perspective to help in the first steps
  • Many bugfixes and performance improvements

Radoop 0.2.2 (6 December 2011)

  • More Aggregate functions and distinct option
  • Generate ID operator for convenience
  • Numerous bug fixes and improvements
  • Improved user interface

Radoop 0.2.1 (16 September 2011)

  • Set Role and Data Multiplier operators
  • Management panel for testing Hadoop connections
  • Stability improvements for Hive access
  • Further small bugfixes and improvements

Radoop 0.2 (26 July 2011)

  • Three new algoritms: Fuzzy K-Means, Canopy, and Dirichlet clustering
  • Three new data preprocessing operators: Normalize, Replace, and Replace Missing Values
  • Significant speed improvements in data transmission and interactive analytics
  • Increased stability and speedup for K-Means
  • More flexible settings for Join operations
  • More meaningful error messages
  • Other small bugfixes and improvements

Radoop 0.1 (14 June 2011)

Initial release with 26 operators for data transmission, data preprocessing, and one clustering algorithm.

Note that Rapid Miner also has a great R extension so you can use R, a graphical interface and big data analytics is now easier and more powerful than ever.


Interview JJ Allaire Founder, RStudio

Here is an interview with JJ Allaire, founder of RStudio. RStudio is the IDE that has overtaken other IDE within the R Community in terms of ease of usage. On the eve of their latest product launch, JJ talks to DecisionStats on RStudio and more.

Ajay-  So what is new in the latest version of RStudio and how exactly is it useful for people?

JJ- The initial release of RStudio as well as the two follow-up releases we did last year were focused on the core elements of using R: editing and running code, getting help, and managing files, history, workspaces, plots, and packages. In the meantime users have also been asking for some bigger features that would improve the overall work-flow of doing analysis with R. In this release (v0.95) we focused on three of these features:

Projects. R developers tend to have several (and often dozens) of working contexts associated with different clients, analyses, data sets, etc. RStudio projects make it easy to keep these contexts well separated (with distinct R sessions, working directories, environments, command histories, and active source documents), switch quickly between project contexts, and even work with multiple projects at once (using multiple running versions of RStudio).

Version Control. The benefits of using version control for collaboration are well known, but we also believe that solo data analysis can achieve significant productivity gains by using version control (this discussion on Stack Overflow talks about why). In this release we introduced integrated support for the two most popular open-source version control systems: Git and Subversion. This includes changelist management, file diffing, and browsing of project history, all right from within RStudio.

Code Navigation. When you look at how programmers work a surprisingly large amount of time is spent simply navigating from one context to another. Modern programming environments for general purpose languages like C++ and Java solve this problem using various forms of code navigation, and in this release we’ve brought these capabilities to R. The two main features here are the ability to type the name of any file or function in your project and go immediately to it; and the ability to navigate to the definition of any function under your cursor (including the definition of functions within packages) using a keystroke (F2) or mouse gesture (Ctrl+Click).

Ajay- What’s the product road map for RStudio? When can we expect the IDE to turn into a full fledged GUI?

JJ- Linus Torvalds has said that “Linux is evolution, not intelligent design.” RStudio tries to operate on a similar principle—the world of statistical computing is too deep, diverse, and ever-changing for any one person or vendor to map out in advance what is most important. So, our internal process is to ship a new release every few months, listen to what people are doing with the product (and hope to do with it), and then start from scratch again making the improvements that are considered most important.

Right now some of the things which seem to be top of mind for users are improved support for authoring and reproducible research, various editor enhancements including code folding, and debugging tools.

What you’ll see is us do in a given release is to work on a combination of frequently requested features, smaller improvements to usability and work-flow, bug fixes, and finally architectural changes required to support current or future feature requirements.

While we do try to base what we work on as closely as possible on direct user-feedback, we also adhere to some core principles concerning the overall philosophy and direction of the product. So for example the answer to the question about the IDE turning into a full-fledged GUI is: never. We believe that textual representations of computations provide fundamental advantages in transparency, reproducibility, collaboration, and re-usability. We believe that writing code is simply the right way to do complex technical work, so we’ll always look for ways to make coding better, faster, and easier rather than try to eliminate coding altogether.

Ajay -Describe your journey in science from a high school student to your present work in R. I noticed you have been very successful in making software products that have been mostly proprietary products or sold to companies.

Why did you get into open source products with RStudio? What are your plans for monetizing RStudio further down the line?

JJ- In high school and college my principal areas of study were Political Science and Economics. I also had a very strong parallel interest in both computing and quantitative analysis. My first job out of college was as a financial analyst at a government agency. The tools I used in that job were SAS and Excel. I had a dim notion that there must be a better way to marry computation and data analysis than those tools, but of course no concept of what this would look like.

From there I went more in the direction of general purpose computing, starting a couple of companies where I worked principally on programming languages and authoring tools for the Web. These companies produced proprietary software, which at the time (between 1995 and 2005) was a workable model because it allowed us to build the revenue required to fund development and to promote and distribute the software to a wider audience.

By 2005 it was however becoming clear that proprietary software would ultimately be overtaken by open source software in nearly all domains. The cost of development had shrunken dramatically thanks to both the availability of high-quality open source languages and tools as well as the scale of global collaboration possible on open source projects. The cost of promoting and distributing software had also collapsed thanks to efficiency of both distribution and information diffusion on the Web.

When I heard about R and learned more about it, I become very excited and inspired by what the project had accomplished. A group of extremely talented and dedicated users had created the software they needed for their work and then shared the fruits of that work with everyone. R was a platform that everyone could rally around because it worked so well, was extensible in all the right ways, and most importantly was free (as in speech) so users could depend upon it as a long-term foundation for their work.

So I started RStudio with the aim of making useful contributions to the R community. We started with building an IDE because it seemed like a first-rate development environment for R that was both powerful and easy to use was an unmet need. Being aware that many other companies had built successful businesses around open-source software, we were also convinced that we could make RStudio available under a free and open-source license (the AGPLv3) while still creating a viable business. At this point RStudio is exclusively focused on creating the best IDE for R that we can. As the core product gets where it needs to be over the next couple of years we’ll then also begin to sell other products and services related to R and RStudio.

About-

http://rstudio.org/docs/about

Jjallaire

JJ Allaire

JJ Allaire is a software engineer and entrepreneur who has created a wide variety of products including ColdFusion,Windows Live WriterLose It!, and RStudio.

From http://en.wikipedia.org/wiki/Joseph_J._Allaire
In 1995 Joseph J. (JJ) Allaire co-founded Allaire Corporation with his brother Jeremy Allaire, creating the web development tool ColdFusion.[1] In March 2001, Allaire was sold to Macromedia where ColdFusion was integrated into the Macromedia MX product line. Macromedia was subsequently acquired by Adobe Systems, which continues to develop and market ColdFusion.
After the sale of his company, Allaire became frustrated at the difficulty of keeping track of research he was doing using Google. To address this problem, he co-founded Onfolio in 2004 with Adam Berrey, former Allaire co-founder and VP of Marketing at Macromedia.
On March 8, 2006, Onfolio was acquired by Microsoft where many of the features of the original product are being incorporated into the Windows Live Toolbar. On August 13, 2006, Microsoft released the public beta of a new desktop blogging client called Windows Live Writer that was created by Allaire’s team at Microsoft.
Starting in 2009, Allaire has been developing a web-based interface to the widely used R technical computing environment. A beta version of RStudio was publicly released on February 28, 2011.
JJ Allaire received his B.A. from Macalester College (St. Paul, MN) in 1991.
RStudio-

RStudio is an integrated development environment (IDE) for R which works with the standard version of R available from CRAN. Like R, RStudio is available under a free software license. RStudio is designed to be as straightforward and intuitive as possible to provide a friendly environment for new and experienced R users alike. RStudio is also a company, and they plan to sell services (support, training, consulting, hosting) related to the open-source software they distribute.

Faster Distinct Values using Proc Freq in SAS

I recently stumbled upon the nlevels function in SAS. It is awesome in terms of processing speed, given that the alternative is PROC SQL, COUNT(DISTINCT) etc etc

Truly the fastest way to find uniqueness in vars is use the nlevels in PROC  FREQ – and why do we need to find levels in character variables- well to check for binary variables (2 values), constants (just 1 level), and simple data analysis stuff.

See this extract from-

ods output nlevels=levels;
proc freq data=good.sas nlevels;
tables _char_ /noprint;
quit;

Games on Google Plus get- Faster, Higher, Stronger

I am spending some time and some money on two games on Google Plus. One is Crime City at https://plus.google.com/games/865772480172 which I talk about in this post

http://www.decisionstats.com/google-plus-games-crime-city-or-fun-with-funzio-on-g/

and Global Warfare  https://plus.google.com/games/216622099218 (which is similar to Evony of the bad ads fame, and I will write on that in another post)

But the total number of games at Google Plus is increasingly and quietly getting better. It seems there is a distinct preference for existing blockbuster games , from both Zynga and non Zynga sources Even though Google is an investor in  Zynga, it clearly wants Google plus to avoid being so dependent on Zynga as Facebook clearly is. (more…)

What is a White Paper?

Christine and Jimmy Wales

Image via Wikipedia

As per Jimmy Wales and his merry band at Wiki (pedia not leaky-ah)- The emphasis is mine

What is the best white paper you have read in the past 15 years.

Categories are-

  • Business benefits: Makes a business case for a certain technology or methodology.
  • Technical: Describes how a certain technology works.
  • Hybrid: Combines business benefits with technical details in a single document.
  • Policy: Makes a case for a certain political solution to a societal or economic challenge.
——————————————————————————————————————————————————



white paper is an authoritative report or guide that helps solve a problem. White papers are used to educate readers and help people make decisions, and are often requested and used in politics, policy, business, and technical fields. In commercial use, the term has also come to refer to documents used by businesses as a marketing or sales tool. Policy makers frequently request white papers from universities or academic personnel to inform policy developments with expert opinions or relevant research.

Government white papers

In the Commonwealth of Nations, “white paper” is an informal name for a parliamentary paper enunciating government policy; in the United Kingdom these are mostly issued as “Command papers“. White papers are issued by the government and lay out policy, or proposed action, on a topic of current concern. Although a white paper may on occasion be a consultation as to the details of new legislation, it does signify a clear intention on the part of a government to pass new law. White Papers are a “…. tool of participatory democracy … not [an] unalterable policy commitment.[1] “White Papers have tried to perform the dual role of presenting firm government policies while at the same time inviting opinions upon them.” [2]

In Canada, a white paper “is considered to be a policy document, approved by Cabinet, tabled in the House of Commons and made available to the general public.”[3] A Canadian author notes that the “provision of policy information through the use of white and green papers can help to create an awareness of policy issues among parliamentarians and the public and to encourage an exchange of information and analysis. They can also serve as educational techniques”.[4]

“White Papers are used as a means of presenting government policy preferences prior to the introduction of legislation”; as such, the “publication of a White Paper serves to test the climate of public opinion regarding a controversial policy issue and enables the government to gauge its probable impact”.[5]

By contrast, green papers, which are issued much more frequently, are more open ended. These green papers, also known as consultation documents, may merely propose a strategy to be implemented in the details of other legislation or they may set out proposals on which the government wishes to obtain public views and opinion.

White papers published by the European Commission are documents containing proposals for European Union action in a specific area. They sometimes follow a green paper released to launch a public consultation process.

For examples see the following:

 Commercial white papers

Since the early 1990s, the term white paper has also come to refer to documents used by businesses and so-called think tanks as marketing or sales tools. White papers of this sort argue that the benefits of a particular technologyproduct or policy are superior for solving a specific problem.

These types of white papers are almost always marketing communications documents designed to promote a specific company’s or group’s solutions or products. As a marketing tool, these papers will highlight information favorable to the company authorizing or sponsoring the paper. Such white papers are often used to generate sales leads, establish thought leadership, make a business case, or to educate customers or voters.

There are four main types of commercial white papers:

  • Business benefits: Makes a business case for a certain technology or methodology.
  • Technical: Describes how a certain technology works.
  • Hybrid: Combines business benefits with technical details in a single document.
  • Policy: Makes a case for a certain political solution to a societal or economic challenge.

Resources

  • Stelzner, Michael (2007). Writing White Papers: How to capture readers and keep them engaged. Poway, California: WhitePaperSource Publishing. pp. 214. ISBN 9780977716937.
  • Bly, Robert W. (2006). The White Paper Marketing Handbook. Florence, Kentucky: South-Western Educational Publishing. pp. 256. ISBN 9780324300826.
  • Kantor, Jonathan (2009). Crafting White Paper 2.0: Designing Information for Today’s Time and Attention Challenged Business Reader. Denver,Colorado: Lulu Publishing. pp. 167.ISBN 9780557163243.

Google Books Ngram Viewer

Here is a terrific data visualization from Google based on their digitized books collection. How does it work, basically you can test the frequency of various words across time periods from 1700s to 2010.

Like the frequency and intensity of kung fu vs yoga, or pizza versus hot dog. The basic datasets scans millions /billions of words.

Here is my yoga vs kung fu vs judo graph.

http://ngrams.googlelabs.com/info

What’s all this do?

When you enter phrases into the Google Books Ngram Viewer, it displays a graph showing how those phrases have occurred in a corpus of books (e.g., “British English”, “English Fiction”, “French”) over the selected years. Let’s look at a sample graph:

This shows trends in three ngrams from 1950 to 2000: “nursery school” (a 2-gram or bigram), “kindergarten” (a 1-gram or unigram), and “child care” (another bigram). What the y-axis shows is this: of all the bigrams contained in our sample of books written in English and published in the United States, what percentage of them are “nursery school” or “child care”? Of all the unigrams, what percentage of them are “kindergarten”? Here, you can see that use of the phrase “child care” started to rise in the late 1960s, overtaking “nursery school” around 1970 and then “kindergarten” around 1973. It peaked shortly after 1990 and has been falling steadily since.

(Interestingly, the results are noticeably different when the corpus is switched to British English.)

Corpora

Below are descriptions of the corpora that can be searched with the Google Books Ngram Viewer. All of these corpora were generated in July 2009; we will update these corpora as our book scanning continues, and the updated versions will have distinct persistent identifiers.

Informal corpus name Persistent identifier Description
American English googlebooks-eng-us-all-20090715 Same filtering as the English corpus but further restricted to books published in the United States.
British English googlebooks-eng-gb-all-20090715 Same filtering as the English corpus but further restricted to books published in Great Britain.
Follow

Get every new post delivered to your Inbox.

Join 735 other followers