Is Random Poetry Click Fraud

Is poetry when randomized

Tweaked, meta tagged, search engine optimized

Violative of unseen terms and conditional clauses

Is random poetry or aggregated prose farmed for click fraud uses

I don't know, you tell me, says the blog boy,

Tapping away at the keyboard like a shiny new toy,

Geeks unfortunately too often are men too many,

Forgive the generalization, but the tech world is yet to be equalized.

 

If a New York Hot Dog is a slice of heaven at four bucks a piece

Then why is prose and poetry at five bucks an hour considered waste

Ah I see, you have grown old and cynical,

Of the numerous stupid internet capers and cyber ways

 

The clicking finger clicks on

swiftly but mostly delightfully virally moves on

While people collect its trails and

ponder its aggregated merry ways

 

All people are equal but all links are not,

Thus overturning two centuries of psychology had you been better taught,

But you chose to drop out of school, and create that search engine so big

It is now a fraud catcher's headache that millions try to search engine optimize and rig

 

Once again, people are different, in so many ways so prettier

Links are the same hyper linked code number five or earlier

People think like artificial artificial (thus natural) neural nets

Biochemically enhanced Harmonically possessed.

 

rather than  analyze forensically and quite creepily

where people have been

Genetic Algorithms need some chaos

To see what till now hasn't been seen.

 

Again this was a random poem,

inspired by a random link that someone clicked

To get here, on a carbon burning cyber machine,

Having digested poem, moves on, unheard, unseen.

(Inspired by the Hyper Link at http://goo.gl/a8ijW )

Also-

Common Analytical Tasks

Some common analytical tasks from the diary of the glamorous life of a business analyst-

1) removing duplicates from a dataset based on certain key values/variables
2) merging two datasets based on a common key/variable/s
3) creating a subset based on a conditional value of a variable
4) creating a subset based on a conditional value of a time-date variable
5) changing format from one date time variable to another
6) computing means grouped or classified at a level of aggregation
7) creating a new variable based on if then condition
8) creating a macro to run same program with different parameters
9) creating a logistic regression model and scoring a dataset
10) transforming variables
11) checking roc curves of model
12) splitting a dataset for a random sample (repeatable with random seed)
13) creating a cross tab of all variables in a dataset with one response variable
14) creating bins or ranks from a certain variable value
15) graphically examining cross tabs
16) creating histograms
17) creating density plots, e.g. plot(density(x)) in R
18) creating a pie chart
19) creating a line graph, creating a bar graph
20) creating a bubble chart
21) running a goal seek kind of simulation/optimization
22) creating a tabular report for multiple metrics grouped for one time/variable
23) creating a basic time series forecast
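
A few of the tasks above (1, 2, 3, 6 and 12) can be sketched in a handful of lines. The following is a minimal illustration in plain Python, not a recommendation of any particular tool; the toy records, the column names (id, region, spend, segment) and the helper function names are all made up for this example.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: column names and values are invented for illustration.
customers = [
    {"id": 1, "region": "north", "spend": 120.0},
    {"id": 2, "region": "south", "spend": 80.0},
    {"id": 2, "region": "south", "spend": 80.0},  # duplicate key value
    {"id": 3, "region": "north", "spend": 200.0},
    {"id": 4, "region": "south", "spend": 40.0},
]
segments = [
    {"id": 1, "segment": "gold"},
    {"id": 3, "segment": "gold"},
    {"id": 4, "segment": "basic"},
]


def dedupe(rows, key):
    """Task 1: keep the first row seen for each value of `key`."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out


def inner_merge(left, right, key):
    """Task 2: inner join of two lists of rows on a common key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def grouped_mean(rows, group, value):
    """Task 6: mean of `value` within each level of `group`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group]].append(row[value])
    return {level: mean(values) for level, values in buckets.items()}


def random_split(rows, fraction, seed):
    """Task 12: random split that is repeatable for a given seed."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


unique = dedupe(customers, "id")
merged = inner_merge(unique, segments, "id")
big_spenders = [row for row in unique if row["spend"] > 100]  # task 3
means = grouped_mean(unique, "region", "spend")
sample, rest = random_split(unique, 0.5, seed=42)
```

Running the split twice with the same seed returns the same sample, which is the "repeatable with random seed" part of task 12.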

and some case studies I could think of-

 

As the Director, Analytics you have to examine current marketing efficiency as well as help optimize sales force efficiency across various channels. You also have to examine multiple sales channels, including inbound telephone, outgoing direct mail, and internet email campaigns. The data warehouse is an RDBMS, but it has multiple data quality issues to be checked for. In addition, you need to submit your budget estimates for next year’s annual marketing budget to maximize sales return on investment.

As the Director, Risk you have to examine the overdue mortgages book that your predecessor left you. You need to optimize collections and minimize fraud and write-offs, and your efforts would be measured in maximizing profits from your department.

As a social media consultant you have been asked to maximize social media analytics and social media exposure for your client. You need to create a mechanism to report on particular brand keywords, automated triggers for unusual web activity, and statistical analysis of the website analytics metrics. Above all, it needs to be set up in an automated reporting dashboard.

As a consultant to a telecommunication company you are asked to monitor churn and review the existing churn models. You also need to maximize the return on advertising spend across various channels. The problem is that a large number of promotions are always going on, some of the data is incorrectly coded, and there are interaction effects between the various promotions.

As a modeller you need to do the following-
1) Check ROC curves and Hosmer-Lemeshow (H-L) statistics for the existing model
2) Divide the dataset into random splits of 40:60
3) Create multiple aggregated variables from the basic variables
4) Run the regression again and again
5) Evaluate the statistical robustness and fit of the model
6) Display results graphically
All these steps can be broken down into little pieces of code, which I am putting together a list of.
Are there any common data analysis tasks or case studies that you think I am missing? Let me know.
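
As an example of such a little piece, here is a rough sketch of two of the modeller steps: the 40:60 random split and a check of the area under the ROC curve. It is plain Python under stated assumptions; the function names, the default seed and the four toy scores are invented for illustration, and the pairwise AUC calculation is just one simple way to compute it, not the method of any particular package.

```python
import random


def split_40_60(rows, seed=2010):
    """Shuffle a copy with a fixed seed, then cut at 40% of the rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.4)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    This equals the share of (positive, negative) pairs in which the
    positive case has the higher score; tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observed outcomes and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of the 4 pairs are ranked correctly

small, large = split_40_60(range(10))
```

The pairwise version is quadratic in the number of rows, so it is only meant for small checks; for real scoring datasets a rank-based calculation or a library routine would be the usual choice.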

 

 

 

Interview Ajay Ohri Decisionstats.com with DMR

From-

http://www.dataminingblog.com/data-mining-research-interview-ajay-ohri/

Here is the winner of the Data Mining Research People Award 2010: Ajay Ohri! Thanks to Ajay for giving some time to answer Data Mining Research questions. And all the best to his blog, Decision Stat!

Data Mining Research (DMR): Could you please introduce yourself to the readers of Data Mining Research?

Ajay Ohri (AO): I am a business consultant and writer based out of Delhi, India. I have been working in and around the field of business analytics since 2004, and have worked with some very good and big companies, primarily in financial analytics and outsourced analytics. Since 2007, I have been writing my blog at http://decisionstats.com which now has almost 10,000 views monthly.

All in all, I write about data, and my hobby is also writing (poetry). Both my hobby and my profession stem from my education (a master’s in business and a bachelor’s in mechanical engineering).

My research interests in data mining are interfaces (simpler interfaces to enable better data mining), education (making data mining less complex and accessible to more people and students), and time series and regression (specifically ARIMAX).
In business, my research interests are software marketing strategies (open source, Software as a Service, and advertising-supported versus traditional licensing) and the creation of technology and entrepreneurial hubs (like Palo Alto and Research Triangle, or Bangalore, India).

DMR: I know you have worked with both SAS and R. Could you give your opinion about these two data mining tools?

AO: As per my understanding, SAS stands for the SAS language, SAS Institute, and the SAS software platform. The terms are used interchangeably by people in industry and academia, but there have been some branding issues with this.
I have not worked much with SAS Enterprise Miner, probably because I could not afford it as a business consultant, and the organizations I worked with did not have a budget for Enterprise Miner.
I have worked alone and in teams with Base SAS, SAS Stat, SAS Access, SAS ETS, and JMP. I also worked with SAS BI, but as a user extracting information.
You could say my use of the SAS platform was mostly in predictive analytics and reporting, but I have a couple of projects under my belt for knowledge discovery and data mining, and pattern analysis. Again, some of my SAS experience is a bit dated, from almost a year ago.

I really like specific parts of the SAS platform, such as the interface design of JMP (which is better than Enterprise Guide or Base SAS) and Proc Sort in Base SAS. I guess sequential processing of data makes SAS way faster, though with computing evolving from desktops and servers to even cheaper time-shared cloud computers, I am not sure how long Base SAS and SAS Stat can hold on to this unique selling proposition.

I dislike the clutter in SAS Stat output; it confuses me with too much information. I also dislike the shoddy graphics in the rendering output of the SAS graphical engine. It’s shoddy coding work in SAS/Graph, and if JMP can give better graphics, why is legacy source code preventing the SAS platform from doing a better job of it?

I sometimes think the best part of SAS is actually the code written by Goodnight and Sall in the 1970s; the latest procs don’t impress me much.

SAS as a company is something I admire, especially for its way of treating employees globally, but it is strange to see the rest of the tech industry not following it. Also, I don’t like the over-aggression and the SAS versus Rest of the Analytics/Data Mining World mentality that I sometimes pick up when I deal with industry thought leaders.

My wishlist is SAS Enterprise Miner, JMP, and Base SAS in a completely new web interface priced at per-hour rates, but I guess I am a bit sentimental here: most data miners I know from the early 2000s did start with SAS as their first bread-earning software. Also, I think SAS needs to be better priced in Business Intelligence; it seems quite cheap in BI compared to Cognos/IBM but expensive in analytical licensing.

If you are a new stats or business student, chances are you know much more R than SAS today. The shift in education at least has been very rapid, and I guess R is also more of a platform than an analytics or data mining software.

I like a lot of things in R, from graphics to better data mining packages to the modular design of the software, but above all I like the can-do, kick-ass spirit of the R community. Lots of young people are collaborating with lots of young to old professors, and the energy is infectious. Everybody is a CEO in R’s world. The latest data mining algorithms will probably start in R and be published in journals.

Which is better for data mining, SAS or R? It depends on your data and your deadline. The golden rule of management and business is: it depends.

Also I have worked with a lot of KXEN, SQL, SPSS.

DMR: Can you tell us more about Decision Stats? You have a traffic of 120′000 for 2010. How did you reach such a success?

AO: I don’t think 120,000 is a success. It’s not a failure. It just happened: the more I wrote, the more people read. In 2007-2008 I used to obsess over traffic. I tried SEO, comments, back linking, and I did some experimental black hat stuff. Some of it worked, some didn’t.

In the end, I started asking questions and interviewing people. To my surprise, senior management is almost always more candid, frank and honest about their views, while middle managers, public relations, and marketing folks can be defensive.

Social media helped a bit. Twitter, LinkedIn, and Facebook really helped my network of friends, who I suppose acted as informal ambassadors to spread the word.
Again, I was constrained more by necessity than by choice: by my middle class finances (I also had a baby son in 2007, and my current laptop still has some broken keys :) ), by my inability to afford traveling to conferences, and by my location, since Delhi isn’t really a tech hub.

The more questions I asked around the internet, the more people responded, and I wrote it all down.

I guess I just was lucky to meet a lot of nice people on the internet who took time to mentor and educate me.

I tried building other websites but didn’t succeed, so I guess I really don’t know. I am not a smart coder, nor very clever at writing, but I do try to be honest.

Basic economics says pricing is proportional to demand and inversely proportional to supply. Honest and candid opinions have infinite demand and an uncertain supply.

DMR: There is a rumor about a R book you plan to publish in 2011 :-) Can you confirm the rumor and tell us more?

AO: I just signed a contract with Springer for “R for Business Analytics”. R is a great software, and there are lots of books for statistically trained people, but I felt like writing a book for MBAs and existing analytics users on how to easily transition to R for analytics.

Like any language there are tricks and tweaks in R, and with a focus on code editors, IDE, GUI, web interfaces, R’s famous learning curve can be bent a bit.

Making analytics beautiful, and simpler to use is always a passion for me. With 3000 packages, R can be used for a lot more things and a lot more simply than is commonly understood.
The target audience however is business analysts- or people working in corporate environments.

Brief Bio-
Ajay Ohri has been working in the field of analytics since 2004, when it was still a nascent, emerging industry in India. He has worked with the top two Indian outsourcers listed on NYSE, and with Citigroup on cross sell analytics, where he helped sell an extra 50,000 credit cards. He was one of the very first independent data mining consultants in India working on analytics products and domestic Indian market analytics. He regularly writes on analytics topics on his web site www.decisionstats.com and is currently working on open source analytical tools like R besides analytical software like SPSS and SAS.

How to balance your online advertising and your offline conscience

I recently found an interesting example of a website that both makes a lot of money and yet is much more efficient than any free or non-profit site. It is called Ecosia.

If you are looking for a website that balances its administrative costs and has a transparent way to make the world better, this is a great example.

  http://ecosia.org/how.php

  HOW IT WORKS
  • You search with Ecosia.
  • Perhaps you click on an interesting sponsored link.
  • The sponsoring company pays Bing or Yahoo for the click.
  • Bing or Yahoo gives the bigger chunk of that money to Ecosia.
  • Ecosia donates at least 80% of this income to support WWF’s work in the Amazon.
  • If you like what we’re doing, help us spread the word!

  Key facts about the park:

    • World’s largest tropical forest reserve (38,867 square kilometers, or about the size of Switzerland)
    • Home to about 14% of all amphibian species and roughly 54% of all bird species in the Amazon – not to mention large populations of at least eight threatened species, including the jaguar
    • Includes part of the Guiana Shield containing 25% of world’s remaining tropical rainforests – 80 to 90% of which are still pristine
    • Holds the last major unpolluted water reserves in the Neotropics, containing approximately 20% of all of the Earth’s water
    • One of the last tropical regions on Earth vastly unaltered by humans
    • Significant contributor to climatic regulation via heat absorption and carbon storage

     

    http://ecosia.org/statistics.php

    They claim to have donated 141,529.42 EUR !!!

    http://static.ecosia.org/files/donations.pdf

    Well, suppose you are the web admin of a very popular website like Wikipedia.

    One way to meet server costs is to say openly: hey, I need to balance my costs, so I need some money.

    The other way is to use online advertising.

    I started mine with Google AdSense.

    Cost per mille (CPM, the price per thousand impressions) gives you a much lower return than contacting an ad sponsor directly.

    But it's a great data experiment,

    as you can monitor which companies are likely to be advertised on your site (assume Google knows more about their algorithms than you will),

    which formats (banner, text, or Flash) have what kind of conversion rates,

    and what the expected payoff rates are from various keywords or companies (business intelligence software, predictive analytics software, and statistical computing software are similar but have different expected returns, if you remember your eco class).

     

    NOW- Based on the above data, you know the minimum baseline to expect from a private advertiser rather than from a public, crowd-sourced search engine one (like Google or Bing).

    Let's say you have 100,000 page views monthly, and assume one out of 1,000 page views will lead to a click. Say the advertiser will pay you $1 for every click (that is, per 1,000 impressions).

    Then your expected revenue is $100. But if your clicks are priced at $2.50 each and your click-through rate is now 3 out of 1,000 impressions (both very moderate increases that can be achieved by basic placement optimization of ad type, graphics, etc.), your new revenue is $750.

    Be a good Samaritan: you decide to share some of this with your audience, say 4 Amazon books per month (or 1 free Amazon book per week). That gives you a cost of $200 and leaves you with some $550.

    Wait! It doesn't end there. Adam Smith's invisible hand moves on.

    You say, hmm, let me put $100 towards an annual paper writing contest of $1000, donate $200 to One Laptop per Child (or to Amazon rain forests, or to Haiti, etc.), pay $100 for your upgraded server hosting, and put $350 into online advertising, say $200 for search engines and $150 for Facebook.
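
The arithmetic above is easy to replay and tweak. Here is a small sketch in Python using only the figures from this post; the function and variable names are made up for illustration.

```python
def monthly_ad_revenue(page_views, clicks_per_1000_views, price_per_click):
    """Expected monthly revenue from clicks on ads."""
    clicks = page_views * clicks_per_1000_views / 1000
    return clicks * price_per_click


baseline = monthly_ad_revenue(100_000, 1, 1.00)   # 100 clicks at $1.00
optimized = monthly_ad_revenue(100_000, 3, 2.50)  # 300 clicks at $2.50

books = 200                      # 4 Amazon books per month
after_books = optimized - books  # what is left to spend

# The spending plan listed in the post, in dollars per month.
allocation = {
    "paper writing contest": 100,
    "donation": 200,
    "server hosting": 100,
    "search engine ads": 200,
    "Facebook ads": 150,
}
allocated = sum(allocation.values())
gap = allocated - after_books    # positive means the plan overspends

print(f"baseline ${baseline:.0f}, optimized ${optimized:.0f}")
print(f"left after books ${after_books:.0f}, plan totals ${allocated}")
```

Worth noting when you run it: the listed spending adds up to $750, which matches the full optimized revenue rather than the $550 left after the books, so something in the plan would need trimming by $200 (or the books would have to come out of a different pocket).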

    Woah!

    Month 1 should see more people visiting you for the first time. If you have a good return rate (returning visitors as a %) and a low bounce rate (visits of less than 5 secs), your traffic should see at least a 20% jump in new arrivals and 5-10% in long term arrivals. Ignoring bounces, within three months you will have one of the following:

    1) An interesting case study on statistics on online and social media advertising, tangible motivations for increasing community response, and some good data for study

    2) hopefully better cost management of your server expenses

    3) very hopefully a positive cash flow

    You could even set a percentage and share the monthly (or, better, annual) figures with your readers and advertisers.

    Go ahead- change the world!

    The key paradigms here are:

    sharing your traffic and revenue openly with everyone

    donating to a suitable cause

    helping increase awareness of the suitable cause

    basing donations on fixed percentages rather than absolute numbers to ensure your site and cause are sustained for years.

    The Year 2010

    My annual traffic to this blog was almost 99,000. Add in additional views on networking sites plus the 400-plus RSS readers, and I can say traffic was 120,000 for 2010. Nice. Thanks for reading, and I hope it was worth your time. (This is a long post and will take almost 440 secs to read, but the summary has just been given.)

    My intent is either to inform you, give you something useful, or at least something interesting.

    see below-

    Monthly views, 2010
    Jan 6,311
    Feb 4,701
    Mar 4,922
    Apr 5,463
    May 6,493
    Jun 4,271
    Jul 5,041
    Aug 5,403
    Sep 17,913
    Oct 16,430
    Nov 11,723
    Dec 10,096
    Total 98,767
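
For anyone who wants to play with these numbers, here is a tiny Python sketch that re-adds the monthly views and looks at how much of the year came from the last four months. The variable names are made up; the figures are the ones in the table.

```python
# Monthly views for 2010, copied from the table above.
monthly_views = {
    "Jan": 6311, "Feb": 4701, "Mar": 4922, "Apr": 5463,
    "May": 6493, "Jun": 4271, "Jul": 5041, "Aug": 5403,
    "Sep": 17913, "Oct": 16430, "Nov": 11723, "Dec": 10096,
}

total = sum(monthly_views.values())
last_four = sum(monthly_views[m] for m in ("Sep", "Oct", "Nov", "Dec"))
share_last_four = last_four / total

print(f"total views: {total:,}")
print(f"Sep-Dec views: {last_four:,} ({share_last_four:.0%} of the year)")
```

The total matches the 98,767 in the table, and more than half of the year's views arrived between September and December.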

     

     

    Sandro Saitta from http://www.dataminingblog.com/ just named me for an award on his blog (but my surname is ohRi, and Sandro left me without an R. What would I be without R :)) ).

    Aw! I am touched. Google for “Data Mining Blog”, and Sandro’s is the best there is in data mining writing.

    DMR People Award 2010
    “There are a lot of active people in the field of data mining. You can discuss with them on forums. You can read their blogs. You can also meet them in events such as PAW or KDD. Among the people I follow on a regular basis, I have elected:

    Ajay Ori

    He has been very active in 2010, especially on his blog. Good work Ajay and continue sharing your experience with us!”

    What did I write in 2010? Stuff.

    What did you read on this blog? Well, that's the top posts list.

    2009-12-31 to Today

    Title Views
    Home page 21,150
    Top 10 Graphical User Interfaces in Statistical Software 6,237
    Wealth = function (numeracy, memory recall) 2,014
    Matlab-Mathematica-R and GPU Computing 1,946
    The Top Statistical Softwares (GUI) 1,405
    About DecisionStats 1,352
    Using Facebook Analytics (Updated) 1,313
    Test drive a Chrome notebook. 1,170
    Top ten RRReasons R is bad for you ? 1,157
    Libre Office 1,151
    Interview Hadley Wickham R Project Data Visualization Guru 1,007
    Using Red R- R with a Visual Interface 854
    SAS Institute files first lawsuit against WPS- Episode 1 790
    Interview Professor John Fox Creator R Commander 764
    R Package Creating 754
    Windows Azure vs Amazon EC2 (and Google Storage) 726
    Norman Nie: R GUI and More 716
    Startups for Geeks 682
    Google Maps – Jet Ski across Pacific Ocean 670
    Not so AWkward after all: R GUI RKWard 579
    Red R 1.8- Pretty GUI 570
    Parallel Programming using R in Windows 569
    R is an epic fail or is it just overhyped 559
    Enterprise Linux rises rapidly:New Report 537
    Rapid Miner- R Extension 518
    Creating a Blog Aggregator for free 504
    So which software is the best analytical software? Sigh- It depends 473
    Revolution R for Linux 465
    John Sall sets JMP 9 free to tango with R 460

    So how do people come here?

    Well, I guess I owe Tal G for almost 9,000 views (incidentally, I stopped posting my blog to R-Bloggers and AnalyticBridge due to SEO keyword reasons and some spam I was getting; see below).

    http://r-bloggers.com is still the cat’s whiskers and I read it a lot.

    I still don’t know who linked my blog to a free sex movie site with 400 views, but I have a few suspects.

    2009-12-31 to Today

    Referrer Views
    r-bloggers.com 9,131
    Reddit 3,829
    rattle.togaware.com 1,500
    Twitter 1,254
    Google Reader 1,215
    linkedin.com 717
    freesexmovie.irwanaf.com 422
    analyticbridge.com 341
    Google 327
    coolavenues.com 322
    Facebook 317
    kdnuggets.com 298
    dataminingblog.com 278
    en.wordpress.com 185
    google.co.in 151
    xianblog.wordpress.com 130
    inside-r.org 124
    decisionstats.com 119
    ifreestores.com 117
    bits.blogs.nytimes.com 108

    Still reading this post? Gosh, let me sell you some advertising. It is only $100 a month (yes, it’s a recession).

    Advertisers are treated on a First In, Last Out (FILO) basis.

    I have been told I am obsessed with SEO, but I don’t care much for search engines apart from Google, and yes, SEO is an interesting science (they should really rename it GEO, or Google Engine Optimization).

    Apparently Hadley Wickham and Donald Farmer are big keywords for me, so I should be more respectful, I guess.

    Search Terms for 365 days ending 2010-12-31 (Summarized)

    2009-12-31 to Today

    Search Views
    libre office 925
    facebook analytics 798
    test drive a chrome notebook 467
    test drive a chrome notebook. 215
    r gui 203
    data mining 163
    wps sas lawsuit 158
    wordle.net 133
    wps sas 123
    google maps jet ski 123
    test drive chrome notebook 96
    sas wps 89
    sas wps lawsuit 85
    chrome notebook test drive 83
    decision stats 83
    best statistics software 74
    hadley wickham 72
    google maps jetski 72
    libreoffice 70
    doug savage 65
    hive tutorial 58
    funny india 56
    spss certification 52
    donald farmer microsoft 51
    best statistical software 49

    What about outgoing links? Apparently I need to find a way to ask Google to pay me for the free advertising I gave their Chrome notebook launch. But since their search engine and browser are free to me, I guess we are even steven.

    Clicks for 365 days ending 2010-12-31 (Summarized)

    2009-12-31 to Today

    URL Clicks
    rattle.togaware.com 378
    facebook.com/Decisionstats 355
    rapid-i.com/content/view/182/196 319
    services.google.com/fb/forms/cr48basic 313
    red-r.org 228
    decisionstats.wordpress.com/2010/05/07/the-top-statistical-softwares-gui 199
    teamwpc.co.uk/products/wps 162
    r4stats.com/popularity 148
    r-statistics.com/2010/04/r-and-the-google-summer-of-code-2010-accepted-students-and-projects 138
    socserv.mcmaster.ca/jfox/Misc/Rcmdr 138
    spss.com/certification 116
    learnr.wordpress.com 114
    dudeofdata.com/decisionstats 108
    r-project.org 107
    documentfoundation.org/faq 104
    goo.gl/maps/UISY 100
    inside-r.org/download 96
    en.wikibooks.org/wiki/R_Programming 92
    nytimes.com/external/readwriteweb/2010/12/07/07readwriteweb-report-google-offering-chrome-notebook-test-11919.html 92
    sourceforge.net/apps/mediawiki/rkward/index.php?title=Main_Page 92
    analyticdroid.togaware.com 88
    yeroon.net/ggplot2 87

    So in 2010,

    SAS remained top daddy in business analytics,

    R made revolutionary strides in terms of new packages,

    JMP launched a new version,

    SPSS got integrated with Cognos,

    Oracle sued Google and did build a great Data Mining GUI,

    Libre Office gave you a non-Oracle Open Office (or an even more open office).

    2011 looks like a fun year. Party safely.

    New Google Ad Planner

    The new Google Ad Planner is really nice. It seems better than the old AdWords interface, though it needs a UI redesign before it can compete with the clean-cut slice and dice of Facebook Ad Planner.

    It’s the interface, stupid, that makes an iPhone sell more than the Symbian even with 90% functionality. Same reason why Google Storage is okay but Google Prediction API gets a slower liftoff than Amazon Console (now with FREE instances), though the R interface to Prediction API sure helps.

    Prediction API is a terrific tool dying for oxygen out there (and will end up like Wave; I hope not).

    Sometimes you need artists as well as engineers to design query tools, G Men. And I guess the DoubleClick antitrust rumours have quietened down enough, because why the heck did DoubleClick interface integration take so loooong?

    (And btw, why can’t Google just get into the multi-billion dashboard business if they can manage ALL the data IN THE INTERNET? They sure can do it for specific companies. But wait,

    they are probably waiting for AsterData to stop sucking thumbs and chanting MapReduce SQL, MapReduce SQL nursery rhymes, and start inventing NEW STUFF again (or at least creating two product brands from nCluster (when you and I were in school together, giggle)).)

    By the time Google makes up its mind to enter BI or waits for Aster to finish, IBM would have gulped and burped all there is, and that’s the way that market rolls.

    Back to Ads and Mad Men.

    Here are some screenshots of the new Google Ad Planner.

    I found it useful to review traffic for third party websites (even better than Google Trends), and that’s a definite plus over Facebook’s closed dormitory world of ads.

    Click on them for some more views or go straight to http://google.com/adplanner and Enjoy Baby!

    Which websites attract your target customers?

    View a site listing: 

    Ad Planner top 1,000 sites

    Refine your online advertising with DoubleClick Ad Planner, a free media planning tool that can help you:

    Identify websites your target customers are likely to visit

    • Define audiences by demographics and interests.
    • Search for websites relevant to your target audience.
    • Access unique users, page views, and other data for millions of websites from over 40 countries.

    Easily build media plans for yourself or your clients

    • Create lists of websites where you’d like to advertise.
    • Generate aggregated website statistics for your media plan.

    and

    Take charge of your DoubleClick Ad Planner site listing

    DoubleClick Ad Planner is a media planning tool where advertisers find sites for their media buys. As a site owner, you can access the DoubleClick Ad Planner Publisher Center and
    • Market your site: Write a site description to present your audience and unique value to advertisers.
    • Help advertisers search for you: Choose categories for your site and ad formats you support.
    • Improve the data that advertisers see: Share your Google Analytics data to reflect the most accurate traffic numbers for your site.

     

    Stuff I like to Read to Kush: Kush's Blog

    I am putting together a list of top 500 Blogs on –

     

    Some additional points-

    • I like YCombinator‘s Hacker News– so the auto parsed links are like that on main page. They lead to original websites.
    • Comments are disabled, feed is jumbled, only 40 word excerpts are shown.
    • Intent is also to show open source blogs and enterprise blogs at same time (regardless of advertising by vendors 😉 )
    • If your blog feed is there, I will keep it there. Either don’t write or don’t use RSS if you don’t want to share.
    • If your blog feed is not there, it is probably not there for a reason.
    • No ads will be shown NOW or FOREVER on that site.

    And after all that noise, you can see Kush’s Blog: http://www.kushohri.com/