Going off Search Radar for 2012 Q1

I just used the really handy tools at

https://www.google.com/webmasters/tools/crawl-access

, clicked Remove URL

https://www.google.com/webmasters/tools/crawl-access?hl=en&siteUrl=https://decisionstats.com/&tid=removal-list

and submitted http://www.decisionstats.com

and I also modified my robots.txt file to

User-agent: *
Disallow: /

Just to make sure- I added the meta tag to each right margin of my blog

“<meta name=”robots” content=”noindex”>”

Now for last six months of 2011 as per Analytics, search engines were really generous to me- Giving almost 170 K page views,

Source                            Visits          Pages/Visit
1. google                       58,788                       2.14
2. (direct)                     10,832                       2.24
3. linkedin.com            2,038                       2.50
4. google.com                1,823                       2.15
5. bing                              1,007                      2.04
6. reddit.com                    749                       1.93
7. yahoo                              740                      2.25
8. google.co.in                  576                       2.13
9. search                             572                       2.07

 

I do like to experiment though, and I wonder if search engines just –

1) Make people lazy to bookmark or type the whole website name in Chrome/Opera  toolbars

2) Help disguise sources of traffic by encrypted search terms

3) Help disguise corporate traffic watchers and aggregators

So I am giving all spiders a leave for Q1 2012. I am interested in seeing impact of this on my traffic , and I suspect that the curves would not be as linear as I think.

Is search engine optimization over rated? Let the data decide…. 🙂

I am also interested in seeing how social sharing can impact traffic in the absence of search engine interaction effects- and whether it is possible to retain a bigger chunk of traffic by reducing SEO efforts and increasing social efforts!

 

Automatically creating tags for big blogs with WordPress

I use the simple-tags plugin in WordPress for automatically creating and posting tags. I am hoping this makes the site better to navigate. Given the fact that I had not been a very efficient tagger before, this plugin can really be useful for someone in creating tags for more than 100 (or 1000 posts) especially WordPress based blog aggregators.

 

 

The plugin is available here –

Simple Tags is the successor of Simple Tagging Plugin This is THE perfect tool to manage perfectly your WP terms for any taxonomy

It was written with this philosophy : best performances, more secured and brings a lot of new functions

This plugin is developped on WordPress 3.3, with the constant WP_DEBUG to TRUE.

  • Administration
  • Tags suggestion from Yahoo! Term Extraction API, OpenCalais, Alchemy, Zemanta, Tag The Net, Local DB with AJAX request
    • Compatible with TinyMCE, FCKeditor, WYMeditor and QuickTags
  • tags management (rename, delete, merge, search and add tags, edit tags ID)
  • Edit mass tags (more than 50 posts once)
  • Auto link tags in post content
  • Auto tags !
  • Type-ahead input tags / Autocompletion Ajax
  • Click tags
  • Possibility to tag pages (not only posts) and include them inside the tags results
  • Easy configuration ! (in WP admin)

The above plugin can be combined with the RSS Aggregator plugin for Search Engine Optimization purposes

Ajay-You can also combine this plugin with RSS auto post blog aggregator (read instructions here) and create SEO optimized Blog Aggregation or Curation

Related –http://www.decisionstats.com/creating-a-blog-aggregator-for-free/

Funny Stuff on Google Plus

Here is more funny stuff on my Google Plus stream- you can now add me to a circle using the icon in the right margin.

0) Funny Cats are here to stay

1) So what?

Continue reading “Funny Stuff on Google Plus”

Making your website cool

Some notes and thoughts on Websites ( which may be back in fashion once the social media bubble bubble  burps, I mean bursts)

0) Write Great Content. Do not write in haste. Do not revise in haste. Publish and share url only at a time when you think it will lead to views.

1) Design-Benchmarking Beauty

Bad Artists borrow, Great Artists Steal- Continue reading “Making your website cool”

Review of Google Plus

After resisting for two weeks I have decided to write a Google Plus review. This includes both the changed designed parameters, the invite growth features and all of the main sub-items and activities you can do in the G+  Stream, Share, Hang Out, Pictures, Circles.

Since I have 2500 people in my circles and I am in 91 circles

To keep it simple – I have noted the following 6 main sub-points.

1) Content Dissemination-

 

  • Sharing Blog Articles
  • Micro-Blogging
  • Sharing Pictures

2) Online Professional Networking  and 3) Online Personal Socializing

4) Spam Control / Malware /Phishing/Porn Protection

5) Time Cost versus Networking Benefit

————————————————————————————————————————————————————–

1) Content Dissemination-

  • Sharing Blog Articles

 

Sharing is as simple as Facebook but the design makes it simpler.

Note G+ uses lower number of colors, bigger fonts, slightly bigger icons to reduce the appearance of clutter.

Contrast this

with this-

 

Interesting to see that G+ has four types of media to share- besides writing the status/micro-blog (unfettered by 140 characters). Note these show icons only with hover text to tell you what the icon stands for.

Photo,Video,URL,Location (which seems to be Twitter like in every share)

Facebook has 5 types of Sharing and note the slightly different order as well the fact that both icon and text make it slightly more cluttered- Status (which is redundant clearly ),Photo,Link,Video,Question

G+ thus lacks polls /questions features. It is much easier to share content on Facebook automatically as of now- but for G+ you need to share the URL privately though. There exist G+ meme-s already thanks to re-sharing in G+ plus which seems to be inspired by Tumblr (?).

In addition Google has made your Google Profile the number one SERP for searching your name, so there seem clear tied in benefits of SEO with content disseminated here.

G+ has sharing in circles whereas Facebook has only Everyone, Friends, Friends of Friends ,Customize.  This makes G+ interface slightly better in tweaking the spread of content to targeted audience esp by Bloggers.

  • For sharing Photos– G+ goes in for a whole new separate tab (one out of four) whereas Facebook treats photo sharing less prominently.
  • Google has lesser white space between photos, (The Facebook way used to be just snap photo by iPhone and send by email to auto-post), and the privacy in sharing photos is much better in G+ as the dropdowns in Facebook are not as granular and neither as nifty in icon design.
  •  
  • Also I like the hover and photo grows bigger feature and the auto import from Picassa (but I would like to auto-import into G+ from Flickr just as I can do in Facebook)
  • Google Plus also has a much more detailed version for sharing videos than photos as compared to Facebook  -upload Photo options  versus
  • G+ has much more focus on auto-sharing from mobiles

 

 

 

2) Online Professional Networking  and 3) Online Personal Socializing Organizing Contacts in Google Plus and seperate privacy controls make it easier to customize sharing without getting too complex. You can make as many circles and drag and drop very easily instead of manually clicking a dropdown box. Effectively speaking Facebook has just 4 kinds of circles and it does not distinguish between various types of friends which is great from philosophical point of view but not so goodn enforcing separateness between professional and personal networks. Note Facebook privacy settings are overwhelming despite the groovy data viz

4) Spam Control / Malware /Phishing/Porn Protection 

Spam Control in Facebook versus in Google Plus- note the different options in Google Plus (including the ability to NOT reshare). I am not aware of more enhanced protection than the ones available for Gmail already. Spam is what really killed off a lot many social networks and the ability to control or reduce spam will be a critical design choice

5) Time Cost versus Networking Benefit

Linkedin has the lowest cost in time spent and networking done. If G+ adds a resume section for jobs, recruiters, and adds in Zynga games, the benefit from G+ will expand. As of now G+ is a minimal social network with minimalism as design ethos.

(Zynga would do well to partner with G+)

 

Calling #Rstats lovers and bloggers – to work together on “The R Programming wikibook”

so you think u like R, huh. Well it is time to pay it forward.

Message from a dear R blogger, Tal G from Tel Aviv (creator of R-bloggers.com and SAS-X.com)

———————————————————————————————————-
Calling R lovers and bloggers – to work together on “The R Programming wikibook”
Posted: 20 Jun 2011 07:05 AM PDT

This post is a call for both R community members and R-bloggers, to come and help make The R Programming wikibook be amazing:

Dear R community member – please consider giving a visit to The R Programming wikibook. If you wish to contribute your knowledge and editing skills to the project, then you could learn how to write in wiki-markup here, and how to edit a wikibook here (you can even use R syntax highlighting in the wikibook). You could take information into the site from the (soon to be) growing list of available R resources for harvesting.

Dear R blogger, you can help The R Programming wikibook by doing the following:

Write to your readers about the project and invite them to join.
Add your blog’s R content as an available resource for other editors to use for the wikibook. Here is how to do that:
First, make a clear indication on your blog that your content is licensed under cc-by-sa copyrights (*see what it means at the end of the post). You can do this by adding it to the footer of your blog, or by writing a post that clearly states that this is the case (what a great opportunity to write to your readers about the project…).
Next, go and add a link, to where all of your R content is located on your site, to the resource page (also with a link to the license post, if you wrote one). For example, since I write about other things besides R, I would give a link to my R category page, and will also give a link to this post. If you do not know how to add it to the wiki, just e-mail me about it (tal.galili@gmail.com).
If you are an R blogger, besides living up to the spirit of the R community, you will benefit from joining this project in that every time someone will use your content on the wikibook, they will add your post as a resource. In the long run, this is likely to help visitors of the site get to know about you and strengthen your site’s SEO ranking. Which reminds me, if you write about this, I always appreciate a link back to my blog

* Having a cc-by-sa copyrights means that you will agree that anyone may copy, distribute, display, and make derivative works based on your content, only if they give the author (you) the credits in the manner specified by you. And also that the user may distribute derivative works only under a license identical to the license that governs the original work.

———-

Three more points:

1) This post is a result of being contacted by Paul (a.k.a: PAC2), asking if I could help promote “The R Programming wikibook” among R-bloggers and their readers. Paul has made many contributions to the book so far. So thank you Paul for both reaching out and helping all of us with your work on this free open source project.

2) I should also mention that the R wiki exists and is open for contribution. And naturally, every thing that will help the R wikibook will help the R wiki as well.

3) Copyright notice: I hereby release all of the writing material content that is categoriesed in the R category page, under the cc-by-sa copyrights (date: 20.06.2011). Now it’s your turn!

———-

List of R bloggers who have joined: (This list will get updated as this “group writing” project will progress)

R-statistics blog (that’s Tal…)
Decisionstats.com (That’s me)
……………………………………………………………………………….
3) Copyright notice: I hereby release all of the writing material content of this website, under the cc-by-sa copyrights (date: 21.06.2011). Now it’s your turn!

https://decisionstats.com/privacy-3/

Content Licensing-
This website has all content licensed under
http://creativecommons.org/licenses/by-sa/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work

Interview- Top Data Mining Blogger on Earth , Sandro Saitta

Surajustement Modèle 2
Image via Wikipedia

If you do a Google search for Data Mining Blog- for the past several years one Blog will come on top. data mining blog – Google Search http://bit.ly/kEdPlE

To honor 5 years of Sandro Saitta’s blog (yes thats 5 years!) , we cover an exclusive interview with him where he reveals his unique sauce for cool techie blogging.

Ajay- Describe your journey as a scientist and data miner, from early experiences, to schooling to your work/research/blogging.

Sandro- My first experience with data mining was my master project. I used decision tree to predict pollen concentration for the following week using input data such as wind, temperature and rain. The fact that an algorithm can make a computer learn from experience was really amazing to me. I found it so interesting that I started a PhD in data mining. This time, the field of application was civil engineering. Civil engineers put a lot of sensors on their structure in order to understand how they behave. With all these sensors they generate a lot of data. To interpret these data, I used data mining techniques such as feature selection and clustering. I started my blog, Data Mining Research, during my PhD, to share with other researchers.

I then started applying data mining in the stock market as my first job in industry. I realized the difference between image recognition, where 99% correct classification rate is state of the art, and stock market, where you’re happy with 55%. However, the company ambiance was not as good as I thought, so I moved to consulting. There, I applied data mining in behavioral targeting to increase click-through rates. When you compare the number of customers who click with the ones who don’t, then you really understand what class imbalance mean. A few months ago, I accepted a very good opportunity at SICPA. I’m looking forward to resolving new challenges there.

Ajay- Your blog is the top ranked blog for “data mining blog”. Could you share some tips on better blogging for analytics and technical people

Sandro- It’s always difficult to start a blog, since at the beginning you have no reader. Writing for nobody may seem stupid, but it is not. By writing my first posts during my PhD I was reorganizing my ideas. I was expressing concepts which were not always clear to me. I thus learned a lot and also improved my English level. Of course, it’s still not perfect, but I hope most people can understand me.

Next come the readers. A few dozen each week first. To increase this number, I then started to learn SEO (Search Engine Optimization) by reading books and blogs. I tested many techniques that increased Data Mining Research visibility in the blogosphere. I think SEO is interesting when you already have some content published (which means not at the very beginning of your blog). After a while, once your blog is nicely ranked, the main task is to work on the content of the blog. To be of interest, your content must be particular: original, informative or provocative for example. I also had the chance to have a good visibility thanks to well-known people in the field like Kevin Hillstrom, Gregory Piatetsky-Shapiro, Will Dwinnell / Dean Abbott, Vincent Granville, Matthew Hurst and many others.

Ajay- Whats your favorite statistical software and what are the various softwares that you have worked with.
Could you compare and contrast these software as well.

Sandro- My favorite software at this point is SAS. I worked with it for two years. Once you know the language, you can perform ETL and data mining so easily. It’s also very fast compared to others. There are a lot of tools for data mining, but I cannot think of a tool that is as powerful as SAS and, in the same time, has a high-level programming language behind it.

I also worked with R and Matlab. R is very nice since you have all the up-to-date data mining algorithms implemented. However, working in the memory is not always a good choice, especially for ETL. Matlab is an excellent tool for prototyping. It’s not so fast and certainly not done for ETL, but the price is low regarding all the possibilities for data mining. According to me, SAS is the best choice for ETL and a good choice for data mining. Of course, there is the price.

Ajay- What are your favorite techniques and training resources for learning basics of data mining to say statisticians or business management graduates.

Sandro- I’m the kind of guy who likes to read books. I read data mining books one after the other. The fact that the same concepts are explained differently (and by different people) helps a lot in learning a topic like data mining. Of course, nothing replaces experience in the field. You can read hundreds of books, you will still not be a good practitioner until you really apply data mining in specific fields. My second choice after books is blogs. By reading data mining blogs, you will really see the issues and challenges in the field. It’s still not experience, but we are closer. Finally, web resources and networks such as KDnuggets of course, but also AnalyticBridge and LinkedIn.

Ajay- Describe your hobbies and how they help you ,if at all in your professional life.

Sandro- One of my hobbies is reading. I read a lot of books about data mining, SEO, Google as well as Sci-Fi and Fantasy. I’m a big fan of Asimov by the way. My other hobby is playing tennis. I think I simply use my hobbies as a way to find equilibrium in my life. I always try to find the best balance between work, family, friends and sport.

Ajay- What are your plans for your website for 2011-2012.

Sandro- I will continue to publish guest posts and interviews. I think it is important to let other people express themselves about data mining topics. I will not write about my current applications due to the policies of my current employer. But don’t worry, I still have a lot to write, whether it is technical or not. I will also emphasis more on my experience with data mining, advices for data miners, tips and tricks, and of course book reviews!

Standard Disclosure of Blogging- Sandro awarded me the Peoples Choice award for his blog for 2010 and carried out my interview. There is a lot of love between our respective wordpress blogs, but to reassure our puritan American readers- it is platonic and intellectual.

About Sandro S-



Sandro Saitta is a Data Mining Research Engineer at SICPA Security Solutions. He is also a blogger at Data Mining Research (www.dataminingblog.com). His interests include data mining, machine learning, search engine optimization and website marketing.

You can contact Mr Saitta at his Twitter address- 

https://twitter.com/#!/dataminingblog