Ajay Ohri

PMML Augustus

Here is a new-old open source system for building and scoring statistical models, designed to work with data sets that are too large to fit into memory.

http://code.google.com/p/augustus/

Augustus is an open source software toolkit for building and scoring statistical models. It is written in Python and its
most distinctive features are:
• Ability to be used on sets of big data; these are data sets that exceed either memory capacity or disk capacity, so
that existing solutions like R or SAS cannot be used. Augustus is also perfectly capable of handling problems
that can fit on one computer.
• PMML compliance and the ability to both:
– produce models with PMML-compliant formats (saved with extension .pmml).
– consume models from files with the PMML format.
Augustus has been tested and deployed on several operating systems. It is intended for developers who work in the
financial or insurance industry, information technology, or in the science and research communities.
Usage
Augustus produces and consumes Baseline, Cluster, Tree, and Ruleset models. Currently, it uses an event-based
approach to building Tree, Cluster and Ruleset models that is non-standard.

New to PMML?

Read on http://code.google.com/p/augustus/wiki/PMML

The Predictive Model Markup Language, or PMML, is a vendor-driven XML markup language for specifying statistical and data mining models. In other words, it is an XML language so that …
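Since PMML is plain XML, any XML parser can read it. Here is a minimal sketch using only the Python standard library. The PMML fragment is hand-written for illustration (it is not Augustus output and omits the model element a complete file would have), and the helper name `list_data_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment for illustration only.
# A real .pmml file would also contain a model element.
PMML_TEXT = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="toy example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""

# PMML elements live in a versioned namespace, so lookups need a prefix map.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def list_data_fields(pmml_text):
    """Return (name, dataType) pairs from the DataDictionary of a PMML string."""
    root = ET.fromstring(pmml_text)
    fields = root.findall("pmml:DataDictionary/pmml:DataField", NS)
    return [(f.get("name"), f.get("dataType")) for f in fields]


print(list_data_fields(PMML_TEXT))
```

The namespace URI changes with the PMML version, so a file written against a different version would need a different `NS` entry.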

Using Opera Unite to defeat SOPA?

Let's assume that the big bad world of American electoral politics forces some kind of modified SOPA to be passed, and the big American companies have to abide by that law (just as they share data for national security under the Patriot Act, but quietly).

I believe Opera Unite is the way forward to sharing content on the Internet.

From-

http://dev.opera.com/articles/view/opera-unite-developer-primer-revisited/

Opera Unite features a Web server running inside the Opera browser, which allows you to do some amazing things. At the touch of a button, you can share images, documents, video, music, games, collaborative applications and all manner of other things with your friends and colleagues

I can share music and files, and the web server is actually my own laptop. Try beating 2 billion new web servers that sprout! File system sharing is totally secure: you can create private, public, or password-protected files. There is a messaging system for drop messages (called Fridge), a secure messaging system, and your own web server is ready to start at a click. The open web may just use Opera instead of Chromium, and US regulation would be solely to blame. Even URL blocking is of limited appeal thanks to software like MafiaWire Extension.

Throw in Ad block, embedded BitTorrent sharing and some more Tor-level encryption within the browser and, sorry Senator, but the internet belongs to the planet, not to your lobbyist.

See http://dev.opera.com/web

Going off Search Radar for 2012 Q1

I just used the really handy tools at

https://www.google.com/webmasters/tools/crawl-access

clicked Remove URL

https://www.google.com/webmasters/tools/crawl-access?hl=en&siteUrl=https://decisionstats.com/&tid=removal-list

and submitted http://www.decisionstats.com

and I also modified my robots.txt file to

User-agent: *
Disallow: /
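A quick way to confirm what those two lines do is to feed them to a robots.txt parser. This is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_crawl_allowed` and the crawler names tried here are only illustrative, and well-behaved crawlers are assumed (nothing forces a bot to obey robots.txt).

```python
from urllib.robotparser import RobotFileParser

# The two rules from the robots.txt above.
ROBOTS_TXT = """User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative user-agent names; any name falls under the "*" rule.
for agent in ("Googlebot", "bingbot"):
    allowed = is_crawl_allowed(ROBOTS_TXT, agent, "http://www.decisionstats.com/")
    print(agent, allowed)
```

Both checks come back `False`, since `Disallow: /` under `User-agent: *` covers every path for every crawler that honors the file.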

Just to make sure, I added this meta tag to the right margin of each page of my blog:

<meta name="robots" content="noindex">

Now, for the last six months of 2011, as per Analytics, search engines were really generous to me, giving almost 170 K page views:

    Source          Visits    Pages/Visit
1.  google          58,788    2.14
2.  (direct)        10,832    2.24
3.  linkedin.com     2,038    2.50
4.  google.com       1,823    2.15
5.  bing             1,007    2.04
6.  reddit.com         749    1.93
7.  yahoo              740    2.25
8.  google.co.in       576    2.13
9.  search             572    2.07
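The page-view figure can be sanity-checked from the table: visits multiplied by pages per visit, summed over the nine sources listed. This is only a rough sketch, since pages/visit is rounded to two decimals and smaller sources are not shown; the function name `estimated_page_views` is made up for this example.

```python
# (source, visits, pages per visit), copied from the table above.
TRAFFIC = [
    ("google", 58788, 2.14),
    ("(direct)", 10832, 2.24),
    ("linkedin.com", 2038, 2.50),
    ("google.com", 1823, 2.15),
    ("bing", 1007, 2.04),
    ("reddit.com", 749, 1.93),
    ("yahoo", 740, 2.25),
    ("google.co.in", 576, 2.13),
    ("search", 572, 2.07),
]


def estimated_page_views(rows):
    """Approximate page views as the sum of visits * pages-per-visit."""
    return sum(visits * pages for _, visits, pages in rows)


total_visits = sum(visits for _, visits, _ in TRAFFIC)
total_views = estimated_page_views(TRAFFIC)

print(f"visits: {total_visits:,}")
print(f"estimated page views: {total_views:,.0f}")
```

The nine sources add up to 77,125 visits and roughly 167 thousand estimated page views, which is in line with the "almost 170 K" figure.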

I do like to experiment though, and I wonder if search engines just –

1) Make people too lazy to bookmark or type the whole website name in Chrome/Opera toolbars

2) Help disguise sources of traffic by encrypted search terms

3) Help disguise corporate traffic watchers and aggregators

So I am giving all spiders a leave for Q1 2012. I am interested in seeing the impact of this on my traffic, and I suspect that the curves would not be as linear as I think.

Is search engine optimization overrated? Let the data decide… 🙂

I am also interested in seeing how social sharing can impact traffic in the absence of search engine interaction effects, and whether it is possible to retain a bigger chunk of traffic by reducing SEO efforts and increasing social efforts!

Indian Govt tries to censor Internet

Stupidity is contagious and stupid politicians are legion.

From-

http://online.wsj.com/article/SB10001424052970204542404577158342623999990.html?mod=WSJINDIA_hpp_LEFTTopStories

Google Inc. and Facebook Inc. are fighting back against increasing censorship demands from the Indian government and courts, arguing that they aren’t legally responsible for monitoring their websites and proactively removing user content that regulators deem objectionable.

The big threat for the companies at the moment is a lawsuit in a New Delhi trial court, which seeks to hold them and several other websites criminally liable for not censoring online content, including material that mocks or criticizes religious and political figures.

————————————————————————————————————————————–

One not-so-apparent reason for the Indian Govt to censor the Internet is that the internet and social media were used for massive anti-Govt and anti-corruption protests in 2011. The Govt found itself on the back foot; newspapers and television in India are generally considered pliable and manipulable by the Govt of India (thanks to ad spends). The judiciary in India is also not known to be 100% honest or resistant to political pressures.

The incumbent Congress govt needs more legal weapons in its arsenal given that elections are approaching this year in many states, and the need for more arrows in legal quivers in India against the Internet is an inevitable and unfortunate next step. Since this is a global phenomenon (read: the SOPA debate in the US), and given the huge internet population in India, this is one interesting battle to watch.

—————————————————————————————————————————————

SOPA RIP

From http://www.whitehouse.gov/blog/2012/01/14/obama-administration-responds-we-people-petitions-sopa-and-online-piracy

Any effort to combat online piracy must guard against the risk of online censorship of lawful activity and must not inhibit innovation by our dynamic businesses large and small (AJ-yup)
We must avoid creating new cybersecurity risks or disrupting the underlying architecture of the Internet. (AJ-note this may include peer-to-peer browsers, browser extensions for re-routing and newer forms of encryption, or even relocation of internet routers in newer geographies )

We must avoid legislation that drives users to dangerous, unreliable DNS servers and puts next-generation security policies, such as the deployment of DNSSEC, at risk.

While we are strongly committed to the vigorous enforcement of intellectual property rights, existing tools are not strong enough to root out the worst online pirates beyond our borders.

We should never let criminals hide behind a hollow embrace of legitimate American values

and

We should all be committed to working with all interested constituencies to develop new legal tools to protect global intellectual property rights without jeopardizing the openness of the Internet. Our hope is that you will bring enthusiasm and know-how to this important challenge

Authored by

Victoria Espinel is Intellectual Property Enforcement Coordinator at Office of Management and Budget

Aneesh Chopra is the U.S. Chief Technology Officer and Assistant to the President and Associate Director for Technology at the Office of Science and Technology Policy

Howard Schmidt is Special Assistant to the President and Cybersecurity Coordinator for National Security Staff

————————————————————————–

AJ-Why not sponsor a hackathon, White House, and create a monetary incentive for hackers to suggest secure ways? At least a secure dialogue between policy makers and policy breakers could be a way forward.

SOPA in its current form is dead. We live to fight another day.

—————————————————————————–

Quote-

Let us never negotiate out of fear. But let us never fear to negotiate. John F. Kennedy

Opera Unite- the future of cloud computing browsers

The boys (and ladies) at Opera have been busy writing code, while the rest of the coders on the cloud were issuing press releases, attending meetings or just sky diving from the cloud. Judging by the language of apps and extensions, it seems that the engineers de Vikings et Slavs were busy coding while the Anglo Saxons were busy preparing for IPOs.

I really like the complete anonymity offered by Opera and especially Opera Unite

1) The Adblock option blocks all ads (same as other extensions)

2) The lovely Opera Unite has incredible apps for peer to peer sharing. You can create your own spotify, host your own chat application, transfer files, remote manage your computer. C’est magnifique!

Some really awesome apps on Opera Unite

All these apps can make your own desktop into a remotely managed website, so SOPA is irrelevant even if it is passed without any protest, or despite non-violent protests.

(SOPA: an acronym for STOP OBAMA or STOP A (?), since Obama is the one the internet really supports, and he is dependent on that goodwill for fundraising; or A is the acronym of a legendary media myth of an imaginary web-based organization (imaginary as in iota).)

QUOTE–

I think it would be a good idea.

Mahatma Gandhi, when asked what he thought of Western civilization

Some Ways Anonymous Could Disrupt the Internet if SOPA is passed

This is a piece of science fiction. I wrote it while reading Isaac Asimov’s advice to writers in GOLD, while on a beach in Anjuna.

1) Identify senators, lobbyists, senior executives of companies advocating for SOPA. Go for selective targeting of these people rather than massive Denial of Service Attacks.

This could also include election fund raising websites in the United States.

2) Create hacking tools with simple interfaces to probe commonly known software errors, to enable a wider audience, including the Occupy Movement students, to participate in hacking, thus making hacking more democratic. What are the top 25 errors as per http://cwe.mitre.org/cwss/

–http://www.decisionstats.com/top-25-most-dangerous-software-errors/ ?

Easy interface tools to check vulnerabilities would be the next generation to flooding tools like HOIC, LOIC. Massive DDOS attacks make good press coverage but not so good technically.

3) Disrupt digital payment mechanisms for selected targets (in step1) using tools developed in Step 2, and introduce random noise errors in payment transfers.

4) Help create a better secure internet by embedding Tor within Chromium with all tools for anonymity embedded for easy usage – a more secure peer to peer browser (like a mashup of Opera , tor and chromium).

or maybe embed bit torrents within a browser.

5) Disrupt media companies and cloud computing based companies like iTunes, Spotify or Google Music, just like viruses and antivirus software disrupted the desktop model of computing. After that, offer solutions to the problems, like the antivirus software companies did for decades.

6) Hacking websites is fine fun, but hacking internet databases and massively parallel data scrapers can help disrupt some of the status quo.

This applies to databases that offer data for sale, like credit bureaus etc. Making this kind of data public will eliminate data middlemen.

7) Use cross border, cross country regulatory arbitrage for better risk control of hacker attacks.

8) Recruiting among universities using easy to use hacking tools to expand the pool of dedicated hacker armies.

9) Using operations like those targeting child pornography to increase political acceptability of the hacker sub culture. Refrain from overtly negative and unimaginative bad Press Relations.

10) If you can't convince them to pass SOPA, confuse them 😉 Use bots for random clicks on ads to confuse internet commerce.
