A better Iris Dataset for the current era

The Iris dataset is the curse of teaching data science: it makes applying algorithms far too simple. What we need is a bigger dataset, with missing values and many variables (features), that teaches the whole data science cycle. That means not just plain machine learning but also data pre-processing, dimensionality reduction, and standardization. Ideally the dataset should be bigger than memory (RAM), so that it teaches efficiency as well. #datascience #machinelearning #algorithms
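To make the "bigger than memory" point concrete, here is a minimal Python sketch of one way to handle it. It assumes a hypothetical CSV file with numeric columns, where a blank cell means a missing value. The function streams the file one row at a time, so memory use stays flat however large the file grows. It counts missing values per column and computes the mean and standard deviation needed for standardization, using Welford's online algorithm. The column names are illustrative only.

```python
import csv
import io
import math
from typing import Dict, TextIO, Tuple


def streaming_column_stats(f: TextIO) -> Dict[str, Tuple[int, int, float, float]]:
    """Return {column: (non_missing, missing, mean, sample_std)} in one pass.

    Rows are read one at a time, so memory does not grow with file size.
    Blank cells are treated as missing values (an assumption of this sketch).
    Mean and variance are updated incrementally with Welford's algorithm.
    """
    reader = csv.DictReader(f)
    count: Dict[str, int] = {}
    missing: Dict[str, int] = {}
    mean: Dict[str, float] = {}
    m2: Dict[str, float] = {}  # running sum of squared deviations
    for name in reader.fieldnames or []:
        count[name], missing[name], mean[name], m2[name] = 0, 0, 0.0, 0.0

    for row in reader:
        for name in count:
            raw = row.get(name)  # short rows are filled with None by DictReader
            if raw is None or raw.strip() == "":
                missing[name] += 1
                continue
            x = float(raw)
            count[name] += 1
            delta = x - mean[name]
            mean[name] += delta / count[name]
            m2[name] += delta * (x - mean[name])

    stats = {}
    for name, n in count.items():
        std = math.sqrt(m2[name] / (n - 1)) if n > 1 else float("nan")
        stats[name] = (n, missing[name], mean[name], std)
    return stats


if __name__ == "__main__":
    # Tiny in-memory sample; for a real bigger-than-RAM file you would pass
    # open("your_big_file.csv", newline="") instead (hypothetical filename).
    sample = io.StringIO("sepal_len,petal_len\n5.1,1.4\n,1.3\n6.3,\n5.8,5.1\n")
    for column, (n, n_missing, mu, sd) in streaming_column_stats(sample).items():
        print(f"{column}: n={n} missing={n_missing} mean={mu:.3f} std={sd:.3f}")
```

Standardizing would then take a second streaming pass, replacing each non-missing value x with (x - mean) / std. That two-pass, constant-memory pattern is exactly the kind of efficiency lesson a small in-memory dataset like Iris never forces.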



What do young and budding data scientists want?

- To be mentored by experienced data scientists on which skills to acquire, what to study, and where to study it. There is a huge number of choices in data science education (videos, MOOCs, classroom and online education) at price points from $0 to $10,000. A surprising number of would-be data scientists in India don't know the basics of git or Linux, even though these are taught free at places like Codecademy. A surprising number of experienced data scientists in India do not give back to the community through meetups or webcasts, even though doing so would build their personal brands.

- Many institutes offer a huge number of courses, with programs ranging from $400 to $10,000. Budding data scientists need some protection from the confusion created by unethical marketing promises.

- Students need capstone projects, guidance on how to approach competitions like Kaggle and hackathons, and open datasets to practice the skills above. Competitive coding, data structures, and algorithms are the next round of knowledge to acquire in order to get interviews.

- The opportunity to showcase their skills and output in internships. A single data science post on LetsIntern gets ~300 applicants!

Let's all try to share knowledge if we are experienced, and let's all humbly work hard to acquire knowledge if we are new. This is the next revolution in information technology, driven by cloud, big data, open source, machine and deep learning, and AI.

Let's make knowledge equitable. That's the only way it grows.

Finally, a credible data scientist certification

Anaconda has announced its exhaustive and superb data science certification. While big data and other fields have had certifications, data science has had only expensive tutorials and training, with no certification. This will be a game changer in the data science training industry.



Safeguarding your data in the era of electoral hackers

  1. Delete apps connected to LinkedIn at https://www.linkedin.com/psettings/permitted-services
  2. Delete apps connected to Facebook at https://www.facebook.com/settings?tab=applications
  3. Delete apps connected to your Google Account at https://myaccount.google.com/security?&pli=1#connectedapps

That should reduce your digital threat footprint.

Next up should be an anti-phishing bot.


#fixfacebook rather than #deletefacebook

While I have written articles on mining Facebook data using R (here: https://decisionstats.com/2014/05/10/analyzing-facebook-networks-using-rstats/ ), I view Facebook as a place to share photos (from Instagram and WhatsApp) and keep in touch. Deleting Facebook would mean deleting most of my adult memories, which is throwing the baby out with the bathwater. FB has smart people. Surely they can find the energy and focus to fix this before it gets worse.

I love my company but I hate my boss



(a parable)

The principal–agent problem, in political science and economics, (also known as agency dilemma or the agency problem) occurs when one person or entity (the “agent”) is able to make decisions on behalf of, or that impact, another person or entity: the “principal”.[1] This dilemma exists in circumstances where agents are motivated to act in their own best interests, which are contrary to those of their principals, and is an example of moral hazard. (from https://en.wikipedia.org/wiki/Principal%E2%80%93agent_problem)

Why do people join companies and startups but leave bosses? Because a boss does not have the same sense of ownership or the same value system as the company. A project manager is focused on expanding his project, not on how profitable it is or on its long-term effects.

A boss is more concerned with his bonus, and will let good people go to meet his cost/billing ratio targets. By doing so he can claim he improved profit by replacing high-quality, high-cost resources with cheaper ones. But in truth he may have damaged the company's image as a potential employer.

The real challenge, then, is for company founders and owners to create metrics that avoid agency conflict, or that align individual metrics with global company metrics. This matters because many teams need to work together even though they have different goals.

For example, the product team wants to push the latest release of the product to get feedback on bugs, but the client services team wants to resist it and stick to stable builds.

However, mathematically speaking, the sum of local optima is rarely the global optimum. A local optimum is whatever is best for the performance of an individual part, whereas the global optimum is what is best for the performance of the system as a whole. Hat tip to http://bit.ly/2Iyp3pc for amazing insights and images.
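A toy example may make the local-versus-global point concrete. The three metrics below are entirely hypothetical and only loosely modelled on the product team versus client services example. The product team's score rises with the number of releases per quarter, with diminishing returns. Client services loses a fixed amount of stability with every release. The company's score is simply the sum of the two. A brute-force search then finds the best choice under each metric.

```python
from typing import Callable, Iterable


def product_score(releases: int) -> int:
    # Hypothetical local metric for the product team: more releases bring
    # more bug feedback, with diminishing returns. Peaks at 5 releases.
    return 10 * releases - releases * releases


def services_score(releases: int) -> int:
    # Hypothetical local metric for client services: every release costs
    # stability and support effort, so fewer releases is always better.
    return 20 - 4 * releases


def company_score(releases: int) -> int:
    # The system-level metric: the company feels both effects at once.
    return product_score(releases) + services_score(releases)


def best(score: Callable[[int], int], choices: Iterable[int]) -> int:
    # Brute-force search for the choice that maximizes the given metric.
    return max(choices, key=score)


if __name__ == "__main__":
    choices = range(0, 11)  # 0..10 releases per quarter
    for name, metric in [("product", product_score),
                         ("services", services_score),
                         ("company", company_score)]:
        x = best(metric, choices)
        print(f"{name:9} optimum: {x} releases -> company score {company_score(x)}")
```

With these made-up numbers, the product team's local optimum is 5 releases (company score 25), client services' local optimum is 0 releases (company score 20), and the company's global optimum is 3 releases (company score 29). Neither team, optimizing its own metric, lands on what is best for the company.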

Many people unknowingly subscribe to a defunct management philosophy: that you can improve the performance of a company as a whole by individually improving the performance of its parts.

From the boss's perspective, the local optimum is probably their global optimum, because it keeps them safe from being overtaken by people better than them.

It is the job of founders and senior management to neutralize this very human insecurity.

That is why culture eats strategy for lunch. But how do you quantify culture? (to be continued)