Random Sample of RDD in Spark

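1) To get a random sample of your RDD (named data), say one with 100000 rows from which you want about 20% of the values, call data.sample with a fraction of 0.2 and then collect the result.

data.sample takes the following parameters:

Signature: data.sample(withReplacement, fraction, seed=None)

Here withReplacement says whether an element can be picked more than once, fraction is the expected size of the sample as a fraction of the RDD (0.2 for 20%), and seed fixes the random number generator. The sample is not guaranteed to contain exactly that fraction of the rows. sample returns a new RDD, and .collect brings its elements back to the driver as a list.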

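2) Use takeSample when you want to specify the sample by its size (say 100) rather than by a fraction.

Signature: data.takeSample(withReplacement, num, seed=None)

Return a fixed-size sampled subset of this RDD. Unlike sample, takeSample returns the elements directly as a list, so no .collect is needed.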



Simple, isn't it?

Author: Ajay Ohri


