Fixing Search for Jobs and Resumes

When we search the web on Google or Bing, we get reasonably relevant results from keywords alone. For candidates and companies alike, however, searching across jobs and resumes is harder, because most job portals lack the resources to invest in algorithmic search over unstructured text. Instead, an entire industry of recruitment agencies and consultants exists so that manual intervention can reduce the inefficiency of this particular kind of search. Even recruitment agencies work from a checklist of questions and store the answers in CRM software.

Why is this so? Economics is the study of incentives, and recruitment consultants make up a big chunk of the paying customers of job portals. Making job and resume search much more efficient would let both candidates and companies bypass the traditional model of going through agencies and consultants.

Perhaps the only company with a strong enough database is LinkedIn, with Google and Facebook close behind. This is a billion-dollar industry ripe for disruption, and the pieces needed to fix this basic mathematical search problem already exist. What may be needed is a database with enough data, with activity captured through sufficiently large agencies or HCRM software, integrated with algorithms.

The basic problem is spammy or outdated results in both resume search and job search. Spam should be fixed with cutting-edge methods, don't you think?
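Keyword relevance and the "outdated results" problem can be illustrated with a small ranking function. This is a minimal sketch in Python, not any job portal's actual algorithm: it assumes TF-IDF weighting with cosine similarity for text relevance, multiplied by an exponential recency decay with a hypothetical 30-day half-life so that stale postings sink. The function names and sample postings are hypothetical, and spam detection is not addressed.

```python
import math
import re
from collections import Counter
from datetime import date


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (smoothed TF-IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_postings(query, postings, today, half_life_days=30):
    """Rank (text, posted_on) pairs by text relevance times a recency decay.

    half_life_days is a hypothetical tuning knob: a posting that is
    half_life_days old keeps half of its relevance score.
    """
    vectors = tfidf_vectors([query] + [text for text, _ in postings])
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = []
    for (text, posted_on), vec in zip(postings, doc_vecs):
        decay = 0.5 ** ((today - posted_on).days / half_life_days)
        scored.append((cosine(query_vec, vec) * decay, text))
    return sorted(scored, reverse=True)


# Hypothetical postings: the same job posted twice, one year apart.
postings = [
    ("Data scientist with R and SQL, Delhi", date(2014, 6, 15)),
    ("Data scientist with R and SQL, Delhi", date(2013, 6, 15)),
    ("Graphic designer for logos and posters", date(2014, 6, 18)),
]
for score, text in rank_postings("R data scientist", postings, today=date(2014, 6, 19)):
    print(round(score, 4), text)
```

The fresh posting outranks its year-old duplicate even though the text is identical, and the unrelated design posting scores zero. Multiplying by a decay is one simple design choice; a real system might instead filter by an expiry date or learn the weighting from click data.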


Big Data Analytics using Google BigQuery and R #rstats

A revised presentation I created on kick-starting your Big Data analytics stack in less than 15 minutes, using Google BigQuery and R.

Citation-

https://github.com/hadley/bigrquery

 

Interns for Decisionstats.com

Do you know a bright young person whom you think should have a crack at an analytics career?

I am looking for on-site or remote interns to help me manage Decisionstats.com’s growth. Remote candidates would be expected to be available for a Skype video call of no more than 30 minutes daily, and to adhere to committed quality standards and timelines.

Please spread the word if you would like to help. Candidates can apply here:

http://internshala.com/internship/detail/multiple-profiles-management-graphic-design-internship-in-delhi-ncr-at-decisionstats1402654998

 

INTERNSHIP DETAILS

About Decisionstats (http://decisionstats.com): A data science and analytics website that deals in cutting-edge research, consulting, writing and speaking assignments.

About the Internship: The communication intern will proofread, edit and write content, including blog posts and social media updates. The intern will receive on-the-job training in social media, web analytics and search engine optimization, as well as an understanding of digital business. The only requirements are learnability, truthfulness and a good command of English.

The graphic design intern will create and edit graphics, including icons, logos, posters and infographics. The intern will receive on-the-job training in designing in a real-time environment, web analytics and search engine optimization, as well as an understanding of digital business. The only requirements are learnability, truthfulness and a good command of design.

The management intern will create and edit schedules and assist with coordination. The intern will receive on-the-job training in managing in a startup environment, web analytics and search engine marketing, as well as an understanding of digital business. The only requirements are learnability, truthfulness, passion and good management skills.

The data science intern will carry out and edit data science research and assist with writing. The intern will receive on-the-job training in data science and analytics. The only requirements are learnability, truthfulness, and a passion for writing code and hacking problems on the fly.

Number of internships available: 4
Who can apply: The internships require people who are serious about their careers, can devote the agreed-upon hours per week and can meet deadlines. Preference will be given to candidates from established institutes and with a good prior academic record.

Streams: Analytics, Design, Engineering Management, English, Humanities, Management, Engineering

Cloud versions of LaTeX

I work with LyX (http://www.lyx.org/), a GUI for LaTeX (http://en.wikipedia.org/wiki/LaTeX), for writing my books. After 18 years of writing in MS Word, I have, yes, been rightly criticized for my bad formatting. I hope to do a better job with R for Cloud Computing. Someday I will learn LaTeX and Sweave (http://www.stat.uni-muenchen.de/~leisch/Sweave/) as well (sighs).

Sweave is a tool that allows you to embed the R code for complete data analyses in LaTeX documents. The purpose is to create dynamic reports that can be updated automatically if the data or analysis change. Instead of inserting a prefabricated graph or table into the report, the master document contains the R code necessary to obtain it. When the document is run through R, all data analysis output (tables, graphs, etc.) is created on the fly and inserted into a final LaTeX document. Because the report is regenerated from its code, it allows for truly reproducible research.
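To make the "code in the document, output on the fly" idea concrete, here is a toy weaver in Python. It is not Sweave and does not run R: it only borrows Sweave's chunk delimiters (a line like `<<>>=` opens a chunk, a line containing `@` closes it), executes each chunk, and splices whatever the chunk prints into the document inside a `verbatim` environment. The function name `weave` and the verbatim wrapping are assumptions for illustration; real Sweave supports chunk options, figures and much more.

```python
import contextlib
import io
import re

# A chunk opens with a line like "<<>>=" or "<<label>>=" and closes with a line "@".
CHUNK = re.compile(r"^<<[^>]*>>=\n(.*?)^@$", re.MULTILINE | re.DOTALL)


def weave(source):
    """Replace each code chunk with the output it prints when executed.

    Note: exec() runs arbitrary code, so only weave documents you trust.
    """
    namespace = {}  # shared across chunks, so later chunks can reuse earlier results

    def run_chunk(match):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(match.group(1), namespace)
        return "\\begin{verbatim}\n" + buffer.getvalue() + "\\end{verbatim}"

    return CHUNK.sub(run_chunk, source)


report = r"""\section{Results}
<<>>=
data = [3, 5, 10]
print("mean:", sum(data) / len(data))
@
The numbers above are regenerated every time the report is woven.
"""

print(weave(report))
```

Changing `data` in the master document and weaving again updates the reported mean without any manual copy-and-paste, which is the reproducibility property described above.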

Where can I get it?

The Sweave software itself is part of every R installation.

But there are alternatives to LyX that offer LaTeX in the browser alone.

There are three of them right now:

1) https://www.sharelatex.com/ ShareLaTeX is now open source! ShareLaTeX is an online, real-time collaborative LaTeX editor, and you can now run your own local version in which you host, edit, collaborate on in real time, and compile your LaTeX documents. You can also use the hosted version at http://www.sharelatex.com.


2) http://fiduswriter.org/ Fidus Writer is an online collaborative editor made especially for academics who need to use citations and/or formulas. The editor focuses on the content rather than the layout, so the same text can later be published in multiple ways: on a website, as a printed book, or as an ebook. In each case, you can choose from a number of layouts suited to the medium of choice.

3) https://www.writelatex.com/


All three are equally good and equally nascent. I like that WriteLaTeX has an API.

 

I like the Fidus Writer interface more, but ShareLaTeX has a bigger set of templates. I think WriteLaTeX is more evolved than Fidus Writer but still needs to catch up with ShareLaTeX.

Fidus Writer and ShareLaTeX are both available on GitHub for tinkering, and there is a WriteLaTeX-related repository as well:

https://github.com/fiduswriter/fiduswriter and

https://github.com/sharelatex/sharelatex and

https://github.com/sweenzor/writelatex-compile

Maybe I will have to wait for Google Docs to offer an application for LaTeX typesetting. In the meantime, I shall stick with LyX.

(Hat tip to S Boucher for pointing me to WriteLaTeX.)

Price of Analytics Education from Indian Service Providers

This is an unedited list of analytics education providers in India, covering both classroom and online training. Exchange rate used: $1 = Rs 55. The list will be updated as and when changes occur or readers make suggestions. I will only list prices that can be referenced via a URL. Later I will also try to create an index to track prices. Because I have had relationships with a lot of people in Indian analytics, I will try to put this in a Google Docs spreadsheet.

The basic template will be:

  • Service Provider-
  • Location-
  • Type-Online/Classroom
  • URL (reference)-
  • Dated-
  • Screenshot-

Example

  • Service Provider-Venturesity
  • Location-Bangalore
  • Type-Classroom
  • URL (reference)-http://www.venturesity.com/course/big-data-analytics-bootcamp/
  • Dated-11 June 2014
  • Screenshot-Screenshot 2014-06-11 12.23.59
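If the list does move into a Google Docs spreadsheet, each entry could be kept as a structured record. The Python sketch below assumes one row per provider using the template's fields, adds a hypothetical `price_inr` column (the template itself has no price field, and no price is filled in here), converts rupees to dollars at the post's $1 = Rs 55 rate, and writes CSV that a spreadsheet can import. Names such as `ProviderEntry`, `inr_to_usd` and `to_csv` are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional

INR_PER_USD = 55.0  # exchange rate stated in the post: $1 = Rs 55


@dataclass
class ProviderEntry:
    """One row of the price list, following the post's template fields."""
    service_provider: str
    location: str
    course_type: str  # "Online" or "Classroom"
    url: str          # reference URL for the quoted price
    dated: str        # date the entry was checked
    screenshot: str   # screenshot file name
    price_inr: Optional[float] = None  # hypothetical field: listed price in rupees


def inr_to_usd(amount_inr, inr_per_usd=INR_PER_USD):
    """Convert a rupee amount to US dollars, rounded to cents."""
    return round(amount_inr / inr_per_usd, 2)


def to_csv(entries):
    """Serialize entries as CSV text (None becomes an empty cell)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(ProviderEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return out.getvalue()


# The example entry from the post; price_inr is left empty.
venturesity = ProviderEntry(
    service_provider="Venturesity",
    location="Bangalore",
    course_type="Classroom",
    url="http://www.venturesity.com/course/big-data-analytics-bootcamp/",
    dated="11 June 2014",
    screenshot="Screenshot 2014-06-11 12.23.59",
)
print(to_csv([venturesity]))
```

Keeping the date and reference URL on every row is what would make a later price index possible: the same provider can appear several times with different `dated` values.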

 

Brewer’s CAP Theorem

The CAP theorem concerns three basic requirements that trade off against one another when designing applications for a distributed architecture.

Consistency – The data in the database remains consistent after an operation executes. For example, after an update operation, all clients see the same data.

Availability – The system is always on, with no downtime: every request received by a non-failing node gets a response.

Partition Tolerance – The system continues to function even when communication among the servers is unreliable, i.e. the servers may be split into multiple groups that cannot communicate with one another.

In theory, it is impossible to fulfill all three requirements at once: a distributed system can guarantee at most two of the three. Current NoSQL databases therefore follow different combinations of C, A and P. Here is a brief description of the three combinations CA, CP and AP:

CA – Single-site cluster, so all nodes are always in contact. When a partition occurs, the system blocks.
CP – Some data may not be accessible, but the rest is still consistent/accurate.
AP – System is still available under partitioning, but some of the data returned may be inaccurate.
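The CP and AP behaviours can be seen in a toy simulation. The Python sketch below assumes a hypothetical two-node key-value store (`TwoReplicaStore` is invented for illustration, not a real database API) with a flag that simulates a network partition. In CP mode it refuses requests it cannot replicate or confirm; in AP mode it keeps answering from the local replica and may return stale data. CA is not modelled separately, since it assumes partitions do not happen, and with the flag off both modes behave identically. Real systems use quorums, leases and repair mechanisms that this toy leaves out.

```python
class Unavailable(Exception):
    """Raised by the CP store when it refuses a request during a partition."""


class TwoReplicaStore:
    """Toy key-value store replicated on two nodes, "a" and "b".

    mode="CP": during a partition, refuse requests rather than risk divergence.
    mode="AP": during a partition, keep answering from the local replica,
    even if its data is stale.
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"a": {}, "b": {}}
        self.partitioned = False  # True simulates a broken link between a and b

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.replicas.values():  # normal case: replicate to both
                replica[key] = value
        elif self.mode == "CP":
            raise Unavailable(f"node {node} cannot reach its peer")
        else:
            self.replicas[node][key] = value  # AP: accept locally, replicas diverge

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable(f"node {node} cannot confirm the latest value")
        return self.replicas[node].get(key)


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write("a", "balance", 100)  # replicated to both nodes
    store.partitioned = True          # the network link between a and b fails
    try:
        store.write("a", "balance", 50)
        print(mode, "read from b:", store.read("b", "balance"))
    except Unavailable as err:
        print(mode, "refused:", err)
```

The CP store gives up availability (the write is refused), while the AP store stays available but node b returns the stale balance of 100, matching the CP and AP descriptions above.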

From:

http://mydatewithanalytics.wordpress.com/2014/06/01/brewers-cap-theorm/
