Radoop 0.3 launched- Open Source Graphical Analytics meets Big Data

What is Radoop? Quite possibly an exciting mix of analytics and big data computing

 

http://blog.radoop.eu/?p=12

What is Radoop?

Hadoop is an excellent tool for analyzing large data sets, but it lacks an easy-to-use graphical interface. RapidMiner is an excellent tool for data analytics, but its data size is limited by the memory available, and a single machine is often not enough to run the analyses on time. In this project, we combine the strengths of both projects and provide a RapidMiner extension for editing and running ETL, data analytics and machine learning processes over Hadoop.

We have closely integrated the highly optimized data analytics capabilities of Hive and Mahout, and the user-friendly interface of RapidMiner to form a powerful and easy-to-use data analytics solution for Hadoop.
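
To make the division of labour concrete, here is a minimal sketch (not Radoop's own API; it uses the third-party PyHive client, and the host, table, and column names are hypothetical) of the kind of aggregation such a process pushes down to Hive so that only a small result set ever reaches the client:

```python
# A sketch, not Radoop's API: run an aggregation on the Hive server so the heavy
# lifting happens on the Hadoop cluster. Host, table, and columns are made up.
from pyhive import hive

conn = hive.Connection(host="hive-server.example.com", port=10000)
cursor = conn.cursor()
cursor.execute(
    "SELECT customer_segment, AVG(order_value) AS avg_order "
    "FROM orders GROUP BY customer_segment"
)
for segment, avg_order in cursor.fetchall():  # only the aggregated rows come back
    print(segment, avg_order)
conn.close()
```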

 

and what’s new

http://blog.radoop.eu/?p=198

Radoop 0.3 released – fully graphical big data analytics

Today Radoop took a major step forward with its 0.3 release. The new version of the visual big data analytics package adds full support for all the major Hadoop distributions in use today: Apache Hadoop 0.20.2, 0.20.203, 1.0 and Cloudera’s Distribution including Apache Hadoop 3 (CDH3). It also adds support for large clusters by allowing the namenode, the jobtracker and the Hive server to reside on different nodes.

As Radoop’s promise is to make big data analytics easier, the 0.3 release also focuses on improving the user interface. It has an enhanced breakpointing system that lets you inspect intermediate results, and it adds dozens of quick fixes, so common process design mistakes become much easier to resolve.

There are many further improvements and fixes, so please consult the release notes for more details. Radoop is in private beta, but heading towards a public release in Q2 2012. If you would like early access, please apply at the signup page or describe your use case in an email (beta at radoop.eu).

Radoop 0.3 (15 February 2012)

  • Support for Apache Hadoop 0.20.2, 0.20.203, 1.0 and Cloudera’s Distribution Including Apache Hadoop 3 (CDH3) in a single release
  • Support for clusters with separate master nodes (namenode, jobtracker, Hive server)
  • Enhanced breakpointing to evaluate intermediate results
  • Dozens of quick fixes for the most common process design errors
  • Improved process design and error reporting
  • New welcome perspective to help in the first steps
  • Many bugfixes and performance improvements

Radoop 0.2.2 (6 December 2011)

  • More Aggregate functions and distinct option
  • Generate ID operator for convenience
  • Numerous bug fixes and improvements
  • Improved user interface

Radoop 0.2.1 (16 September 2011)

  • Set Role and Data Multiplier operators
  • Management panel for testing Hadoop connections
  • Stability improvements for Hive access
  • Further small bugfixes and improvements

Radoop 0.2 (26 July 2011)

  • Three new algorithms: Fuzzy K-Means, Canopy, and Dirichlet clustering
  • Three new data preprocessing operators: Normalize, Replace, and Replace Missing Values
  • Significant speed improvements in data transmission and interactive analytics
  • Increased stability and speedup for K-Means
  • More flexible settings for Join operations
  • More meaningful error messages
  • Other small bugfixes and improvements

Radoop 0.1 (14 June 2011)

Initial release with 26 operators for data transmission, data preprocessing, and one clustering algorithm.

Note that RapidMiner also has a great R extension, so combining R, a graphical interface, and big data analytics is now easier and more powerful than ever.


Moving from OpenDNS to Google DNS

It is best to use a third-party DNS resolution service to avoid targeted attacks on your machine, especially if you use the browser a lot. It is quite fast, and it takes two minutes to set up even for non-geeks.

I was getting slower browsing speeds on OpenDNS http://www.opendns.com/

so I switched to Google DNS (though I am not sure how people in Iran and China – who have a much greater need for trustworthy DNS resolution – will get secure DNS service)

http://code.google.com/speed/public-dns/

What is Google Public DNS?

Google Public DNS is a free, global Domain Name System (DNS) resolution service that you can use as an alternative to your current DNS provider.

To try it out:

  • Configure your network settings to use the IP addresses 8.8.8.8 and 8.8.4.4 as your DNS servers or
  • Read our configuration instructions.

New! For IPv6 addresses, see our configuration instructions.

If you decide to try Google Public DNS, your client programs will perform all DNS lookups using Google Public DNS.
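
If you want to check what Google Public DNS returns without touching your system settings first, here is a small sketch using the third-party dnspython package (my choice of tool, not part of Google's instructions) that sends a query straight to 8.8.8.8:

```python
# A sketch: resolve a name directly against Google Public DNS instead of the
# system resolver, using dnspython (pip install dnspython).
import dns.resolver

resolver = dns.resolver.Resolver(configure=False)  # ignore the OS resolver settings
resolver.nameservers = ["8.8.8.8", "8.8.4.4"]

answer = resolver.resolve("example.com", "A")  # dnspython >= 2.0; older versions use .query()
for record in answer:
    print(record.address)
```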

Why does DNS matter?

The DNS protocol is an important part of the web’s infrastructure, serving as the Internet’s phone book: every time you visit a website, your computer performs a DNS lookup. Complex pages often require multiple DNS lookups before they start loading, so your computer may be performing hundreds of lookups a day.

Why should you try Google Public DNS?

By using Google Public DNS you can speed up your browsing experience and improve your security.

Machine Learning for Hackers – #rstats

I got the incredible and intriguing Machine Learning for Hackers for just $15.99 as an electronic copy from O’Reilly Media (Deal of the Day!).

It has just been launched this month!!

It is an incredible book, and I really like the way O’Reilly has made it so easy to download e-books.

I am trying to read it while writing a whole lot of other stuff, and it seems easy to read and understand even for non-hackers like me. Especially with Stanford delaying its online machine learning course, this is one handy e-book to get you started in ML and data science!


http://shop.oreilly.com/product/0636920018483.do

 

How to find out people who are spamming you

Step 1-

We assume you have Gmail. If you don’t have Gmail, you deserve the spam.

You click “Show original” in the drop-down menu of the spammy message.

 

you see a lot of mumbo jumbo

(or you just pick the IP addresses from comment spam)

Step 2-

You pick the IP addresses out of the mumbo jumbo above (called headers); a small script for this follows the definition below.

http://en.wikipedia.org/wiki/IP_address

An Internet Protocol address (IP address) is a numerical label assigned to each device (e.g., computer, printer) participating in a computer network that uses the Internet Protocol for communication.[1] An IP address serves two principal functions: host or network interface identification and location addressing.
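
As a rough illustration, a few lines of Python with the standard library's email parser pull the candidate IP addresses out of a message saved from "Show original" (the filename is a placeholder):

```python
# A sketch: extract IPv4 addresses from the Received: headers of a raw message
# saved from Gmail's "Show original" view. The filename is a placeholder.
import re
from email.parser import Parser

with open("spam_message.eml", encoding="utf-8", errors="replace") as f:
    msg = Parser().parse(f)

ipv4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
for received in msg.get_all("Received", []):  # newest hop first; the last is nearest the sender
    print(ipv4.findall(received))
```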

Step 3-

You find out who has that IP address using ARIN

https://www.arin.net/
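
ARIN also exposes this as a machine-readable lookup. A sketch against its public RDAP endpoint (the fields printed are the ones I understand the RDAP response to carry, and the IP below is a documentation placeholder):

```python
# A sketch: look up the network registration behind an IP address via ARIN's
# public RDAP service. The example IP is a placeholder from the TEST-NET range.
import json
from urllib.request import urlopen

ip = "192.0.2.1"  # placeholder
with urlopen(f"https://rdap.arin.net/registry/ip/{ip}") as resp:
    info = json.load(resp)

print(info.get("name"))                                   # network name
print(info.get("startAddress"), "-", info.get("endAddress"))  # registered range
```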

 

Step 4-

You add those IP addresses to your computer’s firewall block rules

http://technet.microsoft.com/en-us/library/cc733090(v=ws.10).aspx

(or, if you have a self-hosted blog, use your web host’s cPanel IP Deny Manager)

http://www.siteground.com/tutorials/cpanel/ip_deny_manager.htm
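
On Windows the block rule itself can be scripted. A sketch that shells out to netsh advfirewall (run it from an elevated prompt; the rule name and IP address are placeholders):

```python
# A sketch (Windows only, elevated prompt required): add an inbound block rule
# for one IP address via netsh advfirewall. Rule name and IP are placeholders.
import subprocess

bad_ip = "203.0.113.7"  # placeholder
subprocess.run(
    [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name=Block spammer {bad_ip}",
        "dir=in", "action=block",
        f"remoteip={bad_ip}",
    ],
    check=True,  # raise if netsh reports an error
)
```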

Step 5-

 

Communicate with that IP address using IRC

http://en.wikipedia.org/wiki/Internet_Relay_Chat

Internet Relay Chat (IRC) is a protocol for real-time Internet text messaging (chat) or synchronous conferencing.[1] It is mainly designed for group communication in discussion forums, called channels,[2] but also allows one-to-one communication via private message[3] as well as chat and data transfer,[4] including file sharing.[5]
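
Under the hood IRC is just lines of text over a TCP socket. A bare-bones sketch (server, channel, and nick are placeholders; a real client must also answer the server's PING messages and wait for the 001 welcome before joining):

```python
# A bare-bones IRC sketch over a raw socket. All names are placeholders; a real
# client must reply to PINGs and wait for the 001 welcome before joining.
import socket

server, port, nick = "irc.example.net", 6667, "demo_nick"  # placeholders

with socket.create_connection((server, port)) as sock:
    sock.sendall(f"NICK {nick}\r\n".encode())
    sock.sendall(f"USER {nick} 0 * :demo user\r\n".encode())
    sock.sendall(b"JOIN #test\r\n")
    sock.sendall(b"PRIVMSG #test :hello\r\n")
    print(sock.recv(4096).decode(errors="replace"))  # whatever the server sends back
    sock.sendall(b"QUIT :bye\r\n")
```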

or use HOIC to stress-test your own firewall before people spam you

http://gizmodo.com/5883146/what-is-hoic or

http://www.decisionstats.com/occupy-the-internet/

 

Predictive Analytics World Events in 2012

A new lineup of Predictive Analytics World and Text Analytics World conferences and workshops is coming March through July; also see the save-the-dates and calls for speakers for events in September, October, and November.

CONFERENCE: Predictive Analytics World – San Francisco

March 4-10, 2012 in San Francisco, CA
http://predictiveanalyticsworld.com/sanfrancisco/2012
Discount Code for $150 off: AJAYBP12

CONFERENCE: Text Analytics World – San Francisco
March 6-7, 2012 in San Francisco, CA
http://textanalyticsworld.com/sanfrancisco/2012
Discount Code for $150 off: AJAYBP12

VARIOUS ANALYTICS WORKSHOPS:
A plethora of 1-day workshops are held alongside PAW and TAW
For details see: http://pawcon.com/sanfrancisco/2012/analytics_workshops.php

SEMINAR: Predictive Analytics for Business, Marketing & Web
March 22-23, 2012 in New York City, NY
July 26-27, 2012 in São Paulo, Brazil
Oct 11-12, 2012 in San Francisco
A concentrated training program led by PAW’s chair, Eric Siegel
http://businessprediction.com

CONFERENCE: Predictive Analytics World – Toronto
April 25-26, 2012 in Toronto, Ontario
http://predictiveanalyticsworld.com/toronto/2012
Discount Code for $150 off: AJAYBP12

CONFERENCE: Predictive Analytics World – Chicago
June 25-26, 2012 in Chicago, IL
http://www.predictiveanalyticsworld.com/chicago/2012/
Discount Code for $150 off: AJAYBP12

 

From Ajay-

CONTEST – If you use the discount code AJAYBP12, you will not only get the $150 off, but you will also be entered in a contest for 2 complimentary passes, like the one I ran last year. Matt Stromberg won that one.

http://www.decisionstats.com/contest-2-free-passes-to-predictive-analytics-world/

 

See last year’s results:

http://www.decisionstats.com/congrats-to-matt-stromberg-winner-2-free-passes-to-paw-new-york/

Internet encryption algorithms are flawed – too little, too late!

Some news from a paper I am reading – not surprised that RSA has a problem.

http://eprint.iacr.org/2012/064.pdf

Abstract. We performed a sanity check of public keys collected on the web. Our main goal was to test the validity of the assumption that different random choices are made each time keys are generated. We found that the vast majority of public keys work as intended. A more disconcerting finding is that two out of every one thousand RSA moduli that we collected offer no security.

 

Our conclusion is that the validity of the assumption is questionable and that generating keys in the real world for “multiple-secrets” cryptosystems such as RSA is significantly riskier than for “single-secret” ones such as ElGamal or (EC)DSA which are based on Diffie-Hellman.

Keywords: Sanity check, RSA, 99.8% security, ElGamal, DSA, ECDSA, (batch) factoring, discrete logarithm, Euclidean algorithm, seeding random number generators, K9.

and

 

99.8% Security. More seriously, we stumbled upon 12720 different 1024-bit RSA moduli that offer no security. Their secret keys are accessible to anyone who takes the trouble to redo our work. Assuming access to the public key collection, this is straightforward compared to more traditional ways to retrieve RSA secret keys (cf. [5,15]). Information on the affected X.509 certificates and PGP keys is given in the full version of this paper, cf. below. Overall, over the data we collected 1024-bit RSA provides 99.8% security at best (but see Appendix A).
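
The core observation is simple enough to reproduce with the Euclidean algorithm the keyword list mentions: if two RSA moduli were generated with bad randomness and share a prime factor, one gcd computation exposes both private keys. A toy sketch with made-up tiny primes (real moduli are 1024-bit, but the idea is identical):

```python
# Toy illustration of the paper's finding: two RSA moduli sharing a prime
# factor are both broken by a single gcd. The primes here are tiny and made up.
from math import gcd

p_shared, q1, q2 = 101, 103, 107          # tiny primes, illustration only
n1, n2 = p_shared * q1, p_shared * q2     # two moduli "generated" with bad randomness

g = gcd(n1, n2)                           # Euclidean algorithm; cheap even for 1024-bit numbers
print(g)                                  # 101 -> the shared secret prime falls out
print(n1 // g, n2 // g)                   # 103 107 -> so both moduli are fully factored
```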

 

However, no algorithm is perfect, and even elliptic-curve cryptography (see http://en.wikipedia.org/wiki/Elliptic_curve_cryptography#Fast_reduction_.28NIST_curves.29) can be broken by Shor’s algorithm on a sufficiently large quantum computer: http://en.wikipedia.org/wiki/Shor%27s_algorithm

Funny thing is, ECC is now used to secure DNS via DNSCurve (deployed by OpenDNS):


http://dnscurve.org/crypto.html

The DNSCurve project adds link-level public-key protection to DNS packets. This page discusses the cryptographic tools used in DNSCurve.

ELLIPTIC-CURVE CRYPTOGRAPHY

DNSCurve uses elliptic-curve cryptography, not RSA.

RSA is somewhat older than elliptic-curve cryptography: RSA was introduced in 1977, while elliptic-curve cryptography was introduced in 1985. However, RSA has shown many more weaknesses than elliptic-curve cryptography. RSA’s effective security level was dramatically reduced by the linear sieve in the late 1970s, by the quadratic sieve and ECM in the 1980s, and by the number-field sieve in the 1990s. For comparison, a few attacks have been developed against some rare elliptic curves having special algebraic structures, and the amount of computer power available to attackers has predictably increased, but typical elliptic curves require just as much computer power to break today as they required twenty years ago.

IEEE P1363 standardized elliptic-curve cryptography in the late 1990s, including a stringent list of security criteria for elliptic curves. NIST used the IEEE P1363 criteria to select fifteen specific elliptic curves at five different security levels. In 2005, NSA issued a new “Suite B” standard, recommending the NIST elliptic curves (at two specific security levels) for all public-key cryptography and withdrawing previous recommendations of RSA.

Some specific types of elliptic-curve cryptography are patented, but DNSCurve does not use any of those types of elliptic-curve cryptography.
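
The curve DNSCurve builds on is Curve25519. As a rough illustration of the underlying primitive (this is not DNSCurve's packet format, and the third-party Python "cryptography" package is my choice of tool, not something the DNSCurve page prescribes), the core operation is an elliptic-curve Diffie-Hellman key exchange:

```python
# A sketch of the primitive DNSCurve is built on: an X25519 (Curve25519)
# key exchange, using the third-party "cryptography" package.
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

server_priv = X25519PrivateKey.generate()
client_priv = X25519PrivateKey.generate()

# Each side combines its own private key with the other's public key...
server_shared = server_priv.exchange(client_priv.public_key())
client_shared = client_priv.exchange(server_priv.public_key())

# ...and both arrive at the same 32-byte secret used to protect the link.
assert server_shared == client_shared
print(len(server_shared))  # 32
```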

No wonder college kids are hacking defense databases easily nowadays!!

Predictive analytics in the cloud : Angoss

I interviewed Angoss in depth here at http://www.decisionstats.com/interview-eberhard-miethke-and-dr-mamdouh-refaat-angoss-software/

Well, they have just announced predictive analytics in the cloud.

 

http://www.angoss.com/predictive-analytics-solutions/cloud-solutions/

Solutions

Overview

KnowledgeCLOUD™ solutions deliver predictive analytics in the Cloud to help businesses gain competitive advantage in the areas of sales, marketing and risk management by unlocking the predictive power of their customer data.

KnowledgeCLOUD clients experience rapid time to value and reduced IT investment, and enjoy the benefits of Angoss’ industry leading predictive analytics – without the need for highly specialized human capital and technology.

KnowledgeCLOUD solutions serve clients in the asset management, insurance, banking, high tech, healthcare and retail industries. Industry solutions consist of a choice of analytical modules:

KnowledgeCLOUD for Sales/Marketing

KnowledgeCLOUD solutions are delivered via KnowledgeHUB™, a secure, scalable cloud-based analytical platform together with supporting deployment processes and professional services that deliver predictive analytics to clients in a hosted environment. Angoss industry leading predictive analytics technology is employed for the development of models and deployment of solutions.

Angoss’ deep analytics and domain expertise guarantees effectiveness – all solutions are back-tested for accuracy against historical data prior to deployment. Best practices are shared throughout the service to optimize your processes and success. Finely tuned client engagement and professional services ensure effective change management and program adoption throughout your organization.

For businesses looking to gain a competitive edge and put their data to work, Angoss is the ideal partner.

—-

Hmm. Analytics in the cloud. Reduce hardware costs. Reduce software costs. Increase profit margins.

Hmmmmm

My favorite professor in North Carolina calls the cloud just time-sharing. Are you listening, Professor?