
Hive Global file Sort



Hi dear experts!

 

I have a challenge: I have an unsorted set of CSV files, and I want to sort the output and distribute the value ranges across multiple files.

Example input file:

1
2
7
3
2
4
5
8
6

As output I would like to have a few files, for example:

File 1:

1
2
2
3

File 2:

4
5

File 3:

6
7
8

Could someone recommend a Hive function that can do this?

 

Thank you!

 


Re: Hive Global file Sort


Hi,

 

You can use the SORT BY clause in Hive to produce multiple sorted output files.

 

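SORT BY: Hive runs multiple reducers and writes one sorted file per reducer, but the output as a whole is not globally sorted, so the value ranges in different files can overlap. To get non-overlapping ranges like the ones in your example, combine SORT BY with DISTRIBUTE BY on an expression that maps each value to its range.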
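As a rough sketch of how non-overlapping ranges per file might be produced, the query below routes each value to a reducer by range and then sorts within each reducer. The table name `numbers`, the column `val`, and the output path are hypothetical, and the range boundaries are taken from the example in the question. It assumes the reducer count is pinned to 3 and that each integer bucket ID lands on its own reducer, which may depend on your Hive version and execution engine.

```sql
-- Hypothetical table: numbers(val INT), loaded from the unsorted CSV files.

-- Pin the reducer count so that each range bucket gets its own output file.
SET mapreduce.job.reduces=3;

-- The output directory below is a placeholder; adjust it for your cluster.
INSERT OVERWRITE DIRECTORY '/tmp/sorted_ranges'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT val
FROM numbers
-- Map each value to a range bucket (boundaries from the question's example).
DISTRIBUTE BY CASE
                WHEN val <= 3 THEN 0
                WHEN val <= 5 THEN 1
                ELSE 2
              END
-- Sort rows inside each reducer, that is, inside each output file.
SORT BY val;
```

With hard-coded boundaries the file sizes depend on how the data is spread, so for real data you would probably want to pick the boundaries from a sample of the values. A plain ORDER BY val gives a total order too, but it typically funnels everything through a single reducer and a single file.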

 

Hope this helps.

 

For a comparison of SORT BY, ORDER BY, DISTRIBUTE BY, and CLUSTER BY, see:

http://stackoverflow.com/questions/13715044/hive-cluster-by-vs-order-by-vs-sort-by

 

Nitish
