The best of Google Plus this week

It’s been slightly over a month, and I have noticed that my Google Plus stream is starting to look like my Facebook stream as more of my friends join up. However, there is still no “share this on Google Plus” button!

Top memes this week on Google Plus

1) Points of View


Best of Google Plus-Week 2-Top 1/0

Stuff I like from week 2 of Google Plus memes: animated GIFs, jokes, and nice photos are just some of them.

Here is week 1 in case you missed it:

https://decisionstats.com/best-of-google-plus-week-1-top10/

 


Review of Google Plus

After resisting for two weeks, I have decided to write a Google Plus review. It covers the changed design parameters, the invite-driven growth features, and the main activities you can do in G+: Stream, Share, Hang Out, Pictures, and Circles.

I have 2500 people in my circles and I am in 91 circles, so to keep it simple I have noted the following five main points.

1) Content Dissemination-

 

  • Sharing Blog Articles
  • Micro-Blogging
  • Sharing Pictures

2) Online Professional Networking  and 3) Online Personal Socializing

4) Spam Control / Malware /Phishing/Porn Protection

5) Time Cost versus Networking Benefit

————————————————————————————————————————————————————–

1) Content Dissemination-

  • Sharing Blog Articles

 

Sharing is as simple as on Facebook, and the cleaner design makes it feel simpler still.

Note that G+ uses fewer colors, bigger fonts, and slightly bigger icons to reduce the appearance of clutter.

[Screenshots: the G+ share box contrasted with the Facebook share box]

 

It is interesting to see that G+ has four types of media to share, besides writing the status/micro-blog (unfettered by 140 characters). Note that these are shown as icons only, with hover text to tell you what each icon stands for.

Photo, Video, URL, and Location (which, Twitter-like, seems to be available in every share)

Facebook has five types of sharing. Note the slightly different order, as well as the fact that showing both icon and text makes it slightly more cluttered: Status (which is clearly redundant), Photo, Link, Video, Question

G+ thus lacks a polls/questions feature. As of now it is much easier to share content automatically on Facebook, whereas on G+ you still need to share the URL privately. G+ memes already exist thanks to re-sharing in G+, which seems to be inspired by Tumblr (?).

In addition, Google has made your Google Profile the number one search result for your name, so there seem to be clear SEO benefits tied to content disseminated here.

G+ has sharing in circles, whereas Facebook has only Everyone, Friends, Friends of Friends, and Customize. This makes the G+ interface slightly better for targeting the spread of content to a specific audience, especially for bloggers.

  • For sharing photos, G+ goes in for a whole new separate tab (one out of four), whereas Facebook treats photo sharing less prominently.
  • Google has less white space between photos (the Facebook way used to be to just snap a photo on the iPhone and send it by email to auto-post), and the privacy in sharing photos is much better in G+, as the dropdowns in Facebook are neither as granular nor as nifty in icon design.
  • Also, I like the hover-and-photo-grows-bigger feature and the auto-import from Picasa (but I would like to auto-import into G+ from Flickr, just as I can in Facebook).
  • Google Plus also has a much more detailed version for sharing videos than for photos, as compared to Facebook’s upload photo options.
  • G+ has much more focus on auto-sharing from mobiles.

 

 

 

2) Online Professional Networking and 3) Online Personal Socializing

Organizing contacts in Google Plus and separate privacy controls make it easier to customize sharing without getting too complex. You can make as many circles as you like and drag and drop contacts into them very easily, instead of manually clicking a dropdown box. Effectively, Facebook has just four kinds of circles and does not distinguish between various types of friends, which is great from a philosophical point of view but not so good at enforcing separateness between professional and personal networks. Note that Facebook’s privacy settings are overwhelming despite the groovy data viz.

4) Spam Control / Malware /Phishing/Porn Protection 

Spam control in Facebook versus in Google Plus: note the different options in Google Plus (including the ability to NOT reshare). I am not aware of any protection more enhanced than that already available for Gmail. Spam is what really killed off a lot of social networks, and the ability to control or reduce spam will be a critical design choice.

5) Time Cost versus Networking Benefit

LinkedIn has the lowest cost in time spent for the networking done. If G+ adds a resume section for jobs and recruiters, and adds in Zynga games, the benefit from G+ will expand. As of now G+ is a minimal social network with minimalism as its design ethos.

(Zynga would do well to partner with G+)

 

Contribution to #Rstats by Revolution

I have been watching Revolution Analytics’ products almost since the inception of the company. It has managed to sail through storms, naysayers, and critics with a simple and effective strategy of launching good software, making good partnerships, and keeping up media visibility with white papers, joint webinars, blogs, conferences, and events.

However, this post is a listing of all the technical contributions made by Revolution Analytics products to the #rstats project.

1) Useful Packages mostly in parallel processing or more efficient computing like

 

2) RevoScaleR package to beat R’s memory problem (this is probably the best in my opinion, as it is yet to be replicated by the open source version and is a clear-cut reason for going in for the paid version)

http://www.revolutionanalytics.com/products/enterprise-big-data.php

  • Efficient XDF File Format designed to efficiently handle huge data sets.
  • Data Step Functionality to quickly clean, transform, explore, and visualize huge data sets.
  • Data selection functionality to store huge data sets out of memory, and select subsets of rows and columns for in-memory operation with all R functions.
  • Visualize Large Data sets with line plots and histograms.
  • Built-in Statistical Algorithms for direct analysis of huge data sets:
    • Summary Statistics
    • Linear Regression
    • Logistic Regression
    • Crosstabulation
  • On-the-fly data transformations to include derived variables in models without writing new data files.
  • Extend Existing Analyses by writing user-defined R functions to “chunk” through huge data sets.
  • Direct import of fixed-format text data files and SAS data sets into .xdf format
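
To make the “chunk” idea above concrete, here is a minimal sketch in Python (not R, and not RevoScaleR’s actual implementation). It assumes the data arrives as a stream of blocks and computes summary statistics by merging per-chunk results, so the whole data set never has to sit in memory. The helper names `read_in_chunks` and `chunked_mean_variance` are hypothetical and exist only for this illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield lists of at most chunk_size values (a stand-in for reading blocks from disk)."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def chunked_mean_variance(chunks: Iterable[List[float]]) -> Tuple[int, float, float]:
    """Combine per-chunk statistics into the overall count, mean and sample variance."""
    n, mean, m2 = 0, 0.0, 0.0  # m2 is the running sum of squared deviations from the mean
    for chunk in chunks:
        k = len(chunk)
        if k == 0:
            continue
        chunk_mean = sum(chunk) / k
        chunk_m2 = sum((x - chunk_mean) ** 2 for x in chunk)
        delta = chunk_mean - mean
        total = n + k
        # Pairwise merge of the chunk's statistics into the running totals.
        m2 += chunk_m2 + delta * delta * n * k / total
        mean += delta * k / total
        n = total
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


# Toy data; a file reader or database cursor could be streamed the same way.
data = range(1, 11)
n, mean, variance = chunked_mean_variance(read_in_chunks(data, chunk_size=3))
print(n, mean, variance)
```

Only one chunk is held in memory at a time, which is the property that matters for data sets larger than RAM. Regression and cross-tabulation can be chunked in a similar spirit by accumulating sufficient statistics per block.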

 

3) RevoDeployR for an API-based R solution. I somehow think this feature will get more important as time goes on, but it seems a lower-visibility offering right now.

http://www.revolutionanalytics.com/products/enterprise-deployment.php

  • Collection of Web services implemented as a RESTful API.
  • JavaScript and Java client libraries, allowing users to easily build custom Web applications on top of R.
  • .NET Client library — includes a COM interoperability to call R from VBA
  • Management Console for securely administrating servers, scripts and users through HTTP and HTTPS.
  • XML and JSON format for data exchange.
  • Built-in security model for authenticated or anonymous invocation of R Scripts.
  • Repository for storing R objects and R Script execution artifacts.
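
As a rough illustration of what calling an R script through a RESTful API with JSON might look like from a client, here is a minimal Python sketch. Everything specific in it is an assumption made for the example: the base URL, the `/scripts/execute` path, the payload field names, and the bearer-token header are all hypothetical and are not taken from the RevoDeployR documentation. The request is only built, never sent.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request

# Hypothetical server address, used only for this illustration.
BASE_URL = "https://r-server.example.com/api"


def build_script_request(script_name: str,
                         inputs: Dict[str, Any],
                         token: Optional[str] = None) -> Request:
    """Build (but do not send) a JSON POST asking the server to run an R script."""
    payload = json.dumps({"script": script_name, "inputs": inputs}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token is not None:
        # Authenticated invocation; leaving the token out models an anonymous call.
        headers["Authorization"] = f"Bearer {token}"
    return Request(f"{BASE_URL}/scripts/execute", data=payload,
                   headers=headers, method="POST")


request = build_script_request("score_model.R", {"age": 42, "income": 55000.0})
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Passing the request to `urllib.request.urlopen` would send it, and the JSON in the response body could then be decoded with `json.loads`.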

 

4) Revolution’s IDE (or Productivity Environment) for a faster coding environment than the command line. The GUI by Revolution Analytics is in the works. Having used this, I find only the Code Snippets function to be a clear differentiator from newer IDEs and GUIs. The code snippets are awesome though, and even someone who doesn’t know much R can get an analysis set up quite fast and accurately.

http://www.revolutionanalytics.com/products/enterprise-productivity.php

  • Full-featured Visual Debugger for debugging R scripts, with call stack window and step-in, step-over, and step-out capability.
  • Enhanced Script Editor with hover-over help, word completion, find-across-files capability, automatic syntax checking, bookmarks, and navigation buttons.
  • Run Selection, Run to Line and Run to Cursor evaluation
  • R Code Snippets to automatically generate fill-in-the-blank sections of R code with tooltip help.
  • Object Browser showing available data and function objects (including those in packages), with context menus for plotting and editing data.
  • Solution Explorer for organizing, viewing, adding, removing, rearranging, and sourcing R scripts.
  • Customizable Workspace with dockable, floating, and tabbed tool windows.
  • Version Control Plug-in available for the open source Subversion version control software.

 

Marketing contributions from Revolution Analytics:

1) Sponsoring R sessions and user meets

2) Evangelizing R at conferences and partnering with corporate partners including JasperSoft, Microsoft, IBM and others at http://www.revolutionanalytics.com/partners/

3) Helping with online initiatives like http://www.inside-r.org/ (which is curiously dormant and now largely superseded by R-Bloggers.com) and the syntax highlighting tool at http://www.inside-r.org/pretty-r. In addition, Revolution has been proactive in reaching out to the community.

4) Helping pioneer blogging about R and Twitter hashtag discussions, and contributing to Stack Overflow discussions. Within a short while, the #rstats online community has overtaken many more established names, partly due to the decentralized nature of its working.

 

Did I miss something out? Yes, they share their code under the GPL.

 

Let me know by feedback

Interview- Top Data Mining Blogger on Earth, Sandro Saitta


If you do a Google search for data mining blog, one blog has come out on top for the past several years (data mining blog, Google Search: http://bit.ly/kEdPlE).

To honor 5 years of Sandro Saitta’s blog (yes, that’s 5 years!), we present an exclusive interview with him in which he reveals his unique sauce for cool techie blogging.

Ajay- Describe your journey as a scientist and data miner, from early experiences, to schooling to your work/research/blogging.

Sandro- My first experience with data mining was my master project. I used a decision tree to predict pollen concentration for the following week using input data such as wind, temperature and rain. The fact that an algorithm can make a computer learn from experience was really amazing to me. I found it so interesting that I started a PhD in data mining. This time, the field of application was civil engineering. Civil engineers put a lot of sensors on their structures in order to understand how they behave. With all these sensors they generate a lot of data. To interpret these data, I used data mining techniques such as feature selection and clustering. I started my blog, Data Mining Research, during my PhD, to share with other researchers.

I then started applying data mining in the stock market as my first job in industry. I realized the difference between image recognition, where a 99% correct classification rate is state of the art, and the stock market, where you’re happy with 55%. However, the company ambiance was not as good as I thought, so I moved to consulting. There, I applied data mining in behavioral targeting to increase click-through rates. When you compare the number of customers who click with the ones who don’t, then you really understand what class imbalance means. A few months ago, I accepted a very good opportunity at SICPA. I’m looking forward to resolving new challenges there.

Ajay- Your blog is the top ranked blog for “data mining blog”. Could you share some tips on better blogging for analytics and technical people?
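
(An editorial aside on the class imbalance mentioned above: here is a minimal Python sketch with made-up numbers. The 1% click rate and the “always predict no click” model are assumptions for illustration, not figures from the interview. It shows why raw accuracy says little when clicks are rare.)

```python
from typing import List, Tuple


def accuracy_and_recall(actual: List[int], predicted: List[int]) -> Tuple[float, float]:
    """Return (accuracy, recall on the positive class) for 0/1 labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    positives = sum(actual)
    accuracy = correct / len(actual)
    recall = true_positives / positives if positives else float("nan")
    return accuracy, recall


# Made-up data: 1,000 ad impressions, of which only 10 were clicked.
actual = [1] * 10 + [0] * 990
# A useless "model" that predicts "no click" for everyone.
always_no_click = [0] * 1000

accuracy, recall = accuracy_and_recall(actual, always_no_click)
print(accuracy, recall)
```

The do-nothing model scores 99% accuracy while finding none of the clickers, which is why measures such as recall, precision, or lift are usually more informative on imbalanced data.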

Sandro- It’s always difficult to start a blog, since at the beginning you have no reader. Writing for nobody may seem stupid, but it is not. By writing my first posts during my PhD I was reorganizing my ideas. I was expressing concepts which were not always clear to me. I thus learned a lot and also improved my English level. Of course, it’s still not perfect, but I hope most people can understand me.

Next come the readers. A few dozen each week first. To increase this number, I then started to learn SEO (Search Engine Optimization) by reading books and blogs. I tested many techniques that increased Data Mining Research visibility in the blogosphere. I think SEO is interesting when you already have some content published (which means not at the very beginning of your blog). After a while, once your blog is nicely ranked, the main task is to work on the content of the blog. To be of interest, your content must be particular: original, informative or provocative for example. I also had the chance to have a good visibility thanks to well-known people in the field like Kevin Hillstrom, Gregory Piatetsky-Shapiro, Will Dwinnell / Dean Abbott, Vincent Granville, Matthew Hurst and many others.

Ajay- What’s your favorite statistical software, and what are the various software packages that you have worked with?
Could you compare and contrast these packages as well?

Sandro- My favorite software at this point is SAS. I worked with it for two years. Once you know the language, you can perform ETL and data mining so easily. It’s also very fast compared to others. There are a lot of tools for data mining, but I cannot think of a tool that is as powerful as SAS and, at the same time, has a high-level programming language behind it.

I also worked with R and Matlab. R is very nice since you have all the up-to-date data mining algorithms implemented. However, working in memory is not always a good choice, especially for ETL. Matlab is an excellent tool for prototyping. It’s not so fast and certainly not made for ETL, but the price is low given all the possibilities for data mining. In my opinion, SAS is the best choice for ETL and a good choice for data mining. Of course, there is the price.

Ajay- What are your favorite techniques and training resources for teaching the basics of data mining to, say, statisticians or business management graduates?

Sandro- I’m the kind of guy who likes to read books. I read data mining books one after the other. The fact that the same concepts are explained differently (and by different people) helps a lot in learning a topic like data mining. Of course, nothing replaces experience in the field. You can read hundreds of books, you will still not be a good practitioner until you really apply data mining in specific fields. My second choice after books is blogs. By reading data mining blogs, you will really see the issues and challenges in the field. It’s still not experience, but we are closer. Finally, web resources and networks such as KDnuggets of course, but also AnalyticBridge and LinkedIn.

Ajay- Describe your hobbies and how they help you, if at all, in your professional life.

Sandro- One of my hobbies is reading. I read a lot of books about data mining, SEO, Google as well as Sci-Fi and Fantasy. I’m a big fan of Asimov by the way. My other hobby is playing tennis. I think I simply use my hobbies as a way to find equilibrium in my life. I always try to find the best balance between work, family, friends and sport.

Ajay- What are your plans for your website for 2011-2012?

Sandro- I will continue to publish guest posts and interviews. I think it is important to let other people express themselves about data mining topics. I will not write about my current applications due to the policies of my current employer. But don’t worry, I still have a lot to write, whether it is technical or not. I will also put more emphasis on my experience with data mining, advice for data miners, tips and tricks, and of course book reviews!

Standard Disclosure of Blogging- Sandro awarded me the People’s Choice award for his blog for 2010 and carried out my interview. There is a lot of love between our respective WordPress blogs, but to reassure our puritan American readers, it is platonic and intellectual.

About Sandro S-



Sandro Saitta is a Data Mining Research Engineer at SICPA Security Solutions. He is also a blogger at Data Mining Research (www.dataminingblog.com). His interests include data mining, machine learning, search engine optimization and website marketing.

You can contact Mr Saitta at his Twitter address- 

https://twitter.com/#!/dataminingblog

#Rstats gets into Enterprise Cloud Software


Here is an excellent example of how websites should help, rather than hinder, new customers to take a demo of the software without being overwhelmed by sweet-talking marketing guys who don’t know the difference between heteroskedasticity, probability, odds and likelihood.

It is made by Zementis (Dr Michael Zeller has been a frequent guest here), and Revolution Analytics is still the best shot in enterprise software for #Rstats.

Now if only Revo could get into the lucrative Department of Energy or Department of Defense business- they could change the world AND earn some more revenue than they have been doing. But seriously.

Check out http://deployr.revolutionanalytics.com/zementis/ and play with it. Or better still, mash it with some data viz and ROC curves, or extend it with some APIs 😉

Cognitive Biases exploited by Spammers and Phishers

"Keep Walking"

Since the day you arrive on this planet, you are programmed into accepting reality as good and bad.

Beautiful people good. Ugly people not good.

Fellow countrymen good. Fellow earthling not so good.

Same religion is good. Different religion is awkward.

These cognitive biases are exploited in social media in the following manner-

1) Same Name Bias- You like people with the same name as you, or people who remind you of your brother’s name, or your uncle’s name.

All that information is already known. Especially true on LinkedIn.

2) Same Orientation Bias- People tend to react better to photos considered attractive of opposite sex / opposite preference. Especially true on Twitter and Facebook.

3) Nationality Bias- Israeli Americans tend to respond better to Jewish-looking phishers who claim to be from Israel but are not. Ditto for Indians, Arabs, etc. Especially true on LinkedIn and Facebook.

You are positively biased towards people of the same country or of friendly nation states, and will likely accept their invites, friend requests, or pokes.

4) Same organization/alumni bias- People at the receiving end of a phishing attack will have a higher response rate if the proxy identity claims familiarity with organizations or schools they attended. Especially true on Facebook and LinkedIn.

5) Same interests/movies/books bias- Your likely response rate is higher to someone who has seen your profile page on Facebook for interests, and checked the RSS stream of your tweets for stuff you like.

Bias is just maths. Period.