Random Sample with Hive and Download Results

Random Selection

-- Keep roughly 1% of the rows, shuffle them, and cap the sample at 100,000 rows
select * from data_base.table_name
where rand() <= 0.01
distribute by rand()
sort by rand()
limit 100000;
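
As a rough sanity check on the sample size: rand() returns a value uniformly distributed between 0 and 1, so the where clause keeps each row with a probability of about 0.01, and the limit then caps the output at 100000 rows. The sketch below estimates the expected row count for a table of a given size. The function name expected_sample_rows and the example table sizes are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: estimate how many rows the sampling query returns.
# Usage: expected_sample_rows TOTAL_ROWS FRACTION LIMIT
expected_sample_rows() {
  awk -v n="$1" -v p="$2" -v cap="$3" 'BEGIN {
    expected = n * p                      # rows expected to pass rand() <= p
    if (expected > cap) expected = cap    # the limit clause caps the result
    printf "%d\n", expected + 0.5         # round to the nearest whole row
  }'
}

# Example calls (table sizes are made up):
# expected_sample_rows 5000000 0.01 100000    # prints 50000
# expected_sample_rows 20000000 0.01 100000   # prints 100000
```

With these settings, a table of fewer than about 10 million rows is expected to return fewer rows than the limit.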

Download Manually

Run the Hive query.

When it has finished, scroll down to the results and click the download icon (fourth from the top).

Download from Hive Programmatically

Use MobaXterm to connect to the server.

Use vi or Vim to put the query in a .hql file. Press i to enter insert mode; when you are done, press Esc and type :wq to save and exit.

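Use nohup to run the .hql file in the background and redirect its output to a log file. The log name uses a shell variable called date, so set it before launching the job; if it is unset, the name collapses to log_cust..log.

[ajayuser@server ~]$ mkdir ajay
[ajayuser@server ~]$ cd ajay
[ajayuser@server ajay]$ ls
[ajayuser@server ajay]$ vi agesex.hql
[ajayuser@server ajay]$ mv agesex.hql customer_demo.hql
[ajayuser@server ajay]$ ls
customer_demo.hql
[ajayuser@server ajay]$ date=$(date +%Y%m%d)
[ajayuser@server ajay]$ nohup hive -f customer_demo.hql >> log_cust.${date}.log &
[ajayuser@server ajay]$ nohup: ignoring input and redirecting stderr to stdout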

To check progress

[ajayuser@server ajay]$ tail -f log_cust.${date}.log
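
The same steps can also be scripted rather than typed interactively. The following is a minimal sketch, not a definitive implementation: the function names, the HIVE_BIN variable (which defaults to hive) and the explicit 2>&1 redirect are my own assumptions, while the file names follow the session above.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the HIVE_BIN variable are illustrative
# assumptions, not part of the original walkthrough.

# Write the sampling query to a .hql file without opening an editor.
write_sample_query() {
  local hql_file="$1"
  cat > "$hql_file" <<'EOF'
select * from data_base.table_name
where rand() <= 0.01
distribute by rand()
sort by rand()
limit 100000;
EOF
}

# Build a dated log file name from a date stamp, e.g. log_cust.20240131.log.
log_name_for() {
  local stamp="$1"
  printf 'log_cust.%s.log\n' "$stamp"
}

# Launch the query in the background so it keeps running after logout.
# Both stdout and stderr are appended to the log file.
run_hql_in_background() {
  local hql_file="$1"
  local log_file="$2"
  nohup "${HIVE_BIN:-hive}" -f "$hql_file" >> "$log_file" 2>&1 &
}

# Typical use, run from the working directory on the server:
#   write_sample_query customer_demo.hql
#   log_file=$(log_name_for "$(date +%Y%m%d)")
#   run_hql_in_background customer_demo.hql "$log_file"
#   tail -f "$log_file"
```

Computing the log name once and reusing it for tail avoids the empty-variable problem that an unset date variable would cause.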

Author: Ajay Ohri

http://about.me/ajayohri
