
top function in pig/hive

In a dataset of approximately 2 lakh (200,000) records, there is a column named tags. It holds a comma-separated list of the tags associated with each question; examples of tags are "html", "error", and so on. Sample values:

php,error,gd,image-processing

php,error,gd,image-processing

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

lisp,scheme,subjective,clojure

cocoa-touch,objective-c,design-patterns

cocoa-touch,objective-c,design-patterns

cocoa-touch,objective-c,design-patterns

core-animation

django,django-models

django,django-models

asp.net

scala,pattern-matching,oop,object-oriented-design,design-principles

scala,pattern-matching,oop,object-oriented-design,design-principles

scala,pattern-matching,oop,object-oriented-design,design-principles

. . . . .

How can I find the 10 most commonly used tags in the dataset, using Pig or Hive?

Accepted Solution

Re: top function in pig/hive

Here is a Pig word count with comments. Pass the delimiter to TOKENIZE; in your case that is a comma: TOKENIZE(line, ','). You might have to choose a different filter depending on your input, so you can start by commenting the filter out and adding it back later if needed. Finally, to keep only the top 10 entries, apply LIMIT to the relation that is already ordered by count in descending order: top10 = LIMIT ordered_word_count 10; Be sure to inspect the stored file and check that the words (tags) have been tokenized properly. If they have not, add the filter mentioned above.
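The word count pipeline described above (tokenize each row on commas, group identical tags, count them, order by count descending, limit to 10) can be sketched in plain Python to check the logic on a small sample before running it on the full dataset. This is not Pig or Hive code; it only mirrors the steps. The function name top_tags, the whitespace trimming, and the alphabetical tie-break are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_tags(rows: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent tags across comma-separated tag rows.

    Each step loosely corresponds to a stage of the Pig word count:
      row.split(",")        ~ FLATTEN(TOKENIZE(line, ','))
      strip() / skip ""     ~ the optional FILTER on tokens (assumed rules)
      Counter               ~ GROUP by tag, then COUNT
      sorted(...)[:n]       ~ ORDER BY count DESC, then LIMIT n
    """
    counts = Counter()
    for row in rows:
        for token in row.split(","):
            tag = token.strip()  # tolerate "php, error" style spacing
            if tag:  # drop empty tokens produced by ",," or a trailing comma
                counts[tag] += 1
    # Highest count first; ties broken alphabetically so the output is
    # deterministic (an assumption: Pig's ORDER does not promise a tie-break).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:n]


# Sample rows based on the question above (repeated rows as they appear there).
SAMPLE_ROWS = (
    ["php,error,gd,image-processing"] * 2
    + ["lisp,scheme,subjective,clojure"] * 16
    + ["cocoa-touch,objective-c,design-patterns"] * 3
    + ["core-animation"]
    + ["django,django-models"] * 2
    + ["asp.net"]
    + ["scala,pattern-matching,oop,object-oriented-design,design-principles"] * 3
)

if __name__ == "__main__":
    # Print the top 10 as tab-separated "tag<TAB>count" lines.
    for tag, count in top_tags(SAMPLE_ROWS):
        print(f"{tag}\t{count}")
    print(top_tags(SAMPLE_ROWS, 5))
    # → [('clojure', 16), ('lisp', 16), ('scheme', 16), ('subjective', 16), ('cocoa-touch', 3)]
```

On this sample, the four lisp-related tags (16 occurrences each) come first, followed by tags that appear 3 times. Comparing a small result like this against the file Pig stores is one way to confirm that the tags were tokenized as expected.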

Re: top function in pig/hive

@priyanka vijayakumar, here is a good word count tutorial. It uses Pig, HCatalog, and Hive together, and you will be better off with that combination.

Re: top function in pig/hive

Thanks a lot.