Big Data for Big Brother. Now playing. At a computer near you. How to help water the tree of liberty using statistics?
or use SAS software
SAS/CIA from the last paragraph of
1) It takes 5 seconds to switch your Search Engine Provider.
2) It consistently does not lock you with added services of its own (like Youtube runs on Iphones and Facebook and on Internet Explorer and so does the search engine and so does Google Maps).
3) It has leveraged the maximum from goodwill from open source developers without investing too much money back into them Goodwill is precious and once you lose dev cred you lose it for some time (as some companies found out in the last decade).
4) It has very well funded rivals from MS, Oracle,and Apple and Facebook. That alone will guarantee competition more than any lawyer.
Some Google Strengths-
5) Its portfolio remains under monetized – that gives it plenty of flexibility. The following products are the best in class, and yet are mostly free for retail customers.
6) Its main product is free to use. You cant beat free. You can try. but cant.
7) It remains the prime source of CS/ Math/Stat related talent on this planet (remember Map Reduce paper). Only the NSA has more bad ass geeks.
8) It tries not to be evil. Mostly it is not evil. Sometimes the ads get irritating. But never evil.
I came across this lovely analytics company. Think Big Analytics. and I really liked their lovely explanation of the whole she-bang big data etc stuff. Because Hadoop isnt rocket science and can be made simpler to explain and deploy.
Check them out yourself at http://www.thinkbiganalytics.com/resources_reference
Also they have an awesome series of lectures coming up-
Over three days, explore the Big Data tools, technologies and techniques which allow organisations to gain insight and drive new business opportunities by finding signal in their data. Using Amazon Web Services, you’ll learn how to use the flexible map/reduce programming model to scale your analytics, use Hadoop with Elastic MapReduce, write queries with Hive, develop real world data flows with Pig and understand the operational needs of a production data platform
|Oracle Advanced Analytics — an option to Oracle Database 11g Enterprise Edition – extends the database into a comprehensive advanced analytics platform through two major components: Oracle R Enterprise and Oracle Data Mining. With Oracle Advanced Analytics, customers have a comprehensive platform for real-time analytic applications that deliver insight into key business subjects such as churn prediction, product recommendations, and fraud alerting.
Oracle R Enterprise tightly integrates the open source R programming language with the database to further extend the database with Rs library of statistical functionality, and pushes down computations to the database. Oracle R Enterprise dramatically advances the capability for R users, and allows them to use their existing R development skills and tools, and scripts can now also run transparently and scale against data stored in Oracle Database 11g.
Oracle Data Mining provides powerful data mining algorithms that run as native SQL functions for in-database model building and model deployment. It can be accessed through the SQL Developer extension Oracle Data Miner to build, evaluate, share and deploy predictive analytics methodologies. At the same time the high-performance Oracle-specific data mining algorithms are accessible from R.
|Oracle R Hadoop Connector||Gives R users high performance native access to Hadoop Distributed File System (HDFS) and MapReduce programming framework.|
Continuing the DecisionStats series on trends for 2012, Timo Elliott , Technology Evangelist at SAP Business Objects, looks at the predictions he made in the beginning of 2011 and follows up with the things that surprised him in 2011, and what he foresees in 2012.
You can read last year’s predictions by Mr Elliott at http://www.decisionstats.com/brief-interview-timo-elliott/
Timo- Here are my comments on the “top three analytics trends” predictions I made last year:
(1) Analytics, reinvented. New DW techniques make it possible to do sub-second, interactive analytics directly against row-level operational data. Now BI processes and interfaces need to be rethought and redesigned to make best use of this — notably by blurring the distinctions between the “design” and “consumption” phases of BI.
I spent most of 2011 talking about this theme at various conferences: how existing BI technology israpidly becoming obsolete and how the changes are akin to the move from film to digital photography. Technology that has been around for many years (in-memory, column stores, datawarehouse appliances, etc.) came together to create exciting new opportunities and even generally-skeptical industry analysts put out press releases such as “Gartner Says Data Warehousing Reaching Its Most Significant Inflection Point Since Its Inception.” Some of the smaller BI vendors had been pushing in-memory analytics for years, but the general market started paying more attention when megavendors like SAP started painting a long-term vision of in-memory becoming a core platform for applications, not just analytics. Database leader Oracle was forced to upgrade their in-memory messaging from “It’s a complete fantasy” to “we have that too”.
(2) Corporate and personal BI come together. The ability to mix corporate and personal data for quick, pragmatic analysis is a common business need. The typical solution to the problem — extracting and combining the data into a local data store (either Excel or a departmental data mart) — pleases users, but introduces duplication and extra costs and makes a mockery of information governance. 2011 will see the rise of systems that let individuals and departments load their data into personal spaces in the corporate environment, allowing pragmatic analytic flexibility without compromising security and governance.
The number of departmental “data discovery” initiatives continued to rise through 2011, but new tools do make it easier for business people to upload and manipulate their own information while using the corporate standards. 2012 will see more development of “enterprise data discovery” interfaces for casual users.
(3) The next generation of business applications. Where are the business applications designed to support what people really do all day, such as implementing this year’s strategy, launching new products, or acquiring another company? 2011 will see the first prototypes of people-focused, flexible, information-centric, and collaborative applications, bringing together the best of business intelligence, “enterprise 2.0”, and existing operational applications.
2011 saw the rise of sophisticated, user-centric mobile applications that combine data from corporate systems with GPS mapping and the ability to “take action”, such as mobile medical analytics for doctors or mobile beauty advisor applications, and collaborative BI started becoming a standard part of enterprise platforms.
And one that should happen, but probably won’t: (4) Intelligence = Information + PEOPLE. Successful analytics isn’t about technology — it’s about people, process, and culture. The biggest trend in 2011 should be organizations spending the majority of their efforts on user adoption rather than technical implementation.
Unsurprisingly, there was still high demand for presentations on why BI projects fail and how to implement BI competency centers. The new architectures probably resulted in even more emphasis on technology than ever, while business peoples’ expectations skyrocketed, fueled by advances in the consumer world. The result was probably even more dissatisfaction in the past, but the benefits of the new architectures should start becoming clearer during 2012.
What surprised me the most:
The rapid rise of Hadoop / NoSQL. The potentials of the technology have always been impressive, but I was surprised just how quickly these technology has been used to address real-life business problems (beyond the “big web” vendors where it originated), and how quickly it is becoming part of mainstream enterprise analytic architectures (e.g. Sybase IQ 15.4 includes native MapReduce APIs, Hadoop integration and federation, etc.)
Prediction for 2012:
As I sat down to gather my thoughts about BI in 2012, I quickly came up with the same long laundry list of BI topics as everybody else: in-memory, mobile, predictive, social, collaborative decision-making, data discovery, real-time, etc. etc. All of these things are clearly important, and where going to continue to see great improvements this year. But I think that the real “next big thing” in BI is what I’m seeing when I talk to customers: they’re using these new opportunities not only to “improve analytics” but also fundamentally rethink some of their key business processes.
Instead of analytics being something that is used to monitor and eventually improve a business process, analytics is becoming a more fundamental part of the business process itself. One example is a large telco company that has transformed the way they attract customers. Instead of laboriously creating a range of rate plans, promoting them, and analyzing the results, they now use analytics to automatically create hundreds of more complex, personalized rate plans. They then throw them out into the market, monitor in real time, and quickly cull any that aren’t successful. It’s a way of doing business that would have been inconceivable in the past, and a lot more common in the future.
Timo Elliott is a 20-year veteran of SAP BusinessObjects, and has spent the last quarter-century working with customers around the world on information strategy.
He works closely with SAP research and innovation centers around the world to evangelize new technology prototypes.
His popular Business Analytics blog tracks innovation in analytics and social media, including topics such as augmented corporate reality, collaborative decision-making, and social network analysis.
His PowerPoint Twitter Tools lets presenters see and react to tweets in real time, embedded directly within their slides.
A popular and engaging speaker, Elliott presents regularly to IT and business audiences at international conferences, on subjects such as why BI projects fail and what to do about it, and the intersection of BI and enterprise 2.0.
Prior to Business Objects, Elliott was a computer consultant in Hong Kong and led analytics projects for Shell in New Zealand. He holds a first-class honors degree in Economics with Statistics from Bristol University, England
Timo can be contacted via Twitter at https://twitter.com/timoelliott
Part 1 of this series was from James Kobielus, Forrestor at http://www.decisionstats.com/jim-kobielus-on-2012/
From the press release, Oracle gets on R and me too- NoSQL
The Oracle Big Data Appliance is a new engineered system that includes an open source distribution of Apache™ Hadoop™, Oracle NoSQL Database, Oracle Data Integrator Application Adapter for Hadoop, Oracle Loader for Hadoop, and an open source distribution of R.
the Big Data Appliance also includes the R programming language, a popular open source statistical-analysis tool. This R engine will integrate with 11g R2, so presumably if you want to do statistical analysis on unstructured data stored in and chewed by Hadoop, you will have to move it to Oracle after the chewing has subsided.
This approach to R-Hadoop integration is different from that announced last week between Revolution Analytics, the so-called Red Hat for stats that is extending and commercializing the R language and its engine, and Cloudera, which sells a commercial Hadoop setup called CDH3 and which was one of the early companies to offer support for Hadoop. Both Revolution Analytics and Cloudera now have Oracle as their competitor, which was no doubt no surprise to either.
In any event, the way they do it, the R engine is put on each node in the Hadoop cluster, and those R engines just see the Hadoop data as a native format that they can do analysis on individually. As statisticians do analyses on data sets, the summary data from all the nodes in the Hadoop cluster is sent back to their R workstations; they have no idea that they are using MapReduce on unstructured data.
Oracle did not supply configuration and pricing information for the Big Data Appliance, and also did not say when it would be for sale or shipping to customers
A Horizontally Scaled, Key-Value Database for the Enterprise
Oracle NoSQL Database is a commercial grade, general-purpose NoSQL database using a key/value paradigm. It allows you to manage massive quantities of data, cope with changing data formats, and submit simple queries. Complex queries are supported using Hadoop or Oracle Database operating upon Oracle NoSQL Database data.
Oracle NoSQL Database delivers scalable throughput with bounded latency, easy administration, and a simple programming model. It scales horizontally to hundreds of nodes with high availability and transparent load balancing. Customers might choose Oracle NoSQL Database to support Web applications, acquire sensor data, scale authentication services, or support online serves and social media.
Oracle says it will integrate R with its Oracle Database. Other signs from Oracle show the deeper interest in using the statistical framework for integration with Hadoop to potentially speed statistical analysis. This has particular value with analyzing vast amounts of unstructured data, which has overwhelmed organizations, especially over the past year.
Oracle R Enterprise
so apparantly ole client AsterData continues to thrive under gentle touch of Terrific Data
Aster Data today launched the SQL-MapReduce Developer Portal, a new online community for data scientists and analytic developers. For your convenience, I copied the release below and it can also be found here. Please let me know if you have any questions or if there is anything else I can help you with.
Point Communications Group for Aster Data
Teradata Accelerates Big Data Analytics with First Collaborative Community for SQL-MapReduce®
New online community for data scientists and analytic developers enables development and sharing of powerful MapReduce analytics
San Carlos, California – Teradata Corporation (NYSE:TDC) today announced the launch of the Aster Data SQL-MapReduce® Developer Portal. This portal is the first collaborative online developer community for SQL-MapReduce analytics, an emerging framework for processing non-relational data and ultra-fast analytics.
“Aster Data continues to deliver on its unique vision for powerful analytics with a rich set of tools to make development of those analytics quick and easy,” said Tasso Argyros, vice president of Aster Data Marketing and Product Management, Teradata Corporation. “This new developer portal builds on Aster Data’s continuing SQL-MapReduce innovation, leveraging the flexibility and power of SQL-MapReduce for analytics that were previously impossible or impractical.”
The developer portal showcases the power and flexibility of Aster Data’s SQL-MapReduce – which uniquely combines standard SQL with the popular MapReduce distributed computing technology for processing big data – by providing a collaborative community for sharing SQL-MapReduce expert insights in addition to sharing SQL-MapReduce analytic functions and sample code. Data scientists, quantitative analysts, and developers can now leverage the experience, knowledge, and best practices of a community of experts to easily harness the power of SQL-MapReduce for big data analytics.
A recent report from IDC Research, “Taking Care of Your Quants: Focusing Data Warehousing Resources on Quantitative Analysts Matters,” has shown that by enabling data scientists with the tools to harness emerging types and sources of data, companies create significant competitive advantage and become leaders in their respective industry.
“The biggest positive differences among leaders and the rest come from the introduction of new types of data,” says Dan Vesset, program vice president, Business Analytics Solutions, IDC Research. “This may include either new transactional data sources or new external data feeds of transactional or multi-structured interactional data — the latter may include click stream or other data that is a by-product of social networking.”
Vesset goes on to say, “Aster Data provides a comprehensive platform for analytics and their SQL-MapReduce Developer Portal provides a community for sharing best practices and functions which can have an even greater impact to an organization’s business.”
With this announcement Aster Data extends its industry leadership in delivering the most comprehensive analytic platform for big data analytics — not only capable of processing massive volumes of multi-structured data, but also providing an extensive set of tools and capabilities that make it simple to leverage the power of MapReduce analytics. The Aster Data
SQL-MapReduce Developer Portal brings the power of SQL-MapReduce accessible to data scientists, quantitative analysis, and analytic developers by making it easy to share and collaborate with experts in developing SQL-MapReduce analytics. This portal builds on Aster Data’s history of SQL-MapReduce innovations, including:
Aster Data’s patent-pending SQL-MapReduce enables analytic applications and functions that can deliver faster, deeper insights on terabytes to petabytes of data. These applications are implemented using MapReduce but delivered through standard SQL and business intelligence (BI) tools.
SQL-MapReduce makes it possible for data scientists and developers to empower business analysts with the ability to make informed decisions, incorporating vast amounts of data, regardless of query complexity or data type. Aster Data customers are using SQL-MapReduce for rich analytics including analytic applications for social network analysis, digital marketing optimization, and on-the-fly fraud detection and prevention.
“Collaboration is at the core of our success as one of the leading providers, and pioneers of social software,” said Navdeep Alam, director of Data Architecture at Mzinga. “We are pleased to be one of the early members of The Aster Data SQL-MapReduce Developer Portal, which will allow us the ability to share and leverage insights with others in using big data analytics to attain a deeper understanding of customers’ behavior and create competitive advantage for our business.”
SQL-MapReduce is one of the core capabilities within Aster Data’s flagship product. Aster DatanCluster™ 4.6, the industry’s first massively parallel processing (MPP) analytic platform has an integrated analytics engine that stores and processes both relational and non-relational data at scale. With Aster Data’s unique analytics framework that supports both SQL and
SQL-MapReduce™, customers benefit from rich, new analytics on large data volumes with complex data types. Aster Data analytic functions are embedded within the analytic platform and processed locally with data, which allows for faster data exploration. The SQL-MapReduce framework provides scalable fault-tolerance for new analytics, providing users with superior reliability, regardless of number of users, query size, or data types.
About Aster Data
Aster Data is a market leader in big data analytics, enabling the powerful combination of cost-effective storage and ultra-fast analysis of new sources and types of data. The Aster Data nCluster analytic platform is a massively parallel software solution that embeds MapReduce analytic processing with data stores for deeper insights on new data sources and types to deliver new analytic capabilities with breakthrough performance and scalability. Aster Data’s solution utilizes Aster Data’s patent-pending SQL-MapReduce to parallelize processing of data and applications and deliver rich analytic insights at scale. Companies including Barnes & Noble, Intuit, LinkedIn, Akamai, and MySpace use Aster Data to deliver applications such as digital marketing optimization, social network and relationship analysis, and fraud detection and prevention.
Teradata is the world’s leader in data warehousing and integrated marketing management through itsdatabase software, data warehouse appliances, and enterprise analytics. For more information, visitteradata.com.
# # #
Teradata is a trademark or registered trademark of Teradata Corporation in the United States and other countries.