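1) To get a random sample of your RDD (named data), say one with 100000 rows from which you want about 20% of the values, use sample:
Signature: data.sample(withReplacement, fraction, seed=None)
For example, data.sample(False, 0.2).collect(). Here fraction is the expected share of rows to keep, so the result will be close to 20000 rows but is not guaranteed to be exactly that many.
sample returns a new RDD, and .collect() brings its rows back to the driver.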
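2) Use takeSample when you want to specify the size of the sample (say 100) instead of a fraction:
Signature: data.takeSample(withReplacement, num, seed=None)
Docstring: Return a fixed-size sampled subset of this RDD.
For example, data.takeSample(False, 100) returns a list of 100 rows directly, so no .collect() is needed.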
Simple, isn't it?