
Using Impala/Hive query output in MapReduce Job



Hi, 

I need some help implementing a use case where the output of an Impala/Hive query is used as input to the Map class of a MapReduce job. The Impala/Hive query is supplied by the user at runtime (i.e., it is not a hard-coded, parameterized query). A simplified workflow:

 

1) The user queries an Impala table and gets output containing a filename and a line number. For example:

file1 line_number2
file1 line_number3
file1 line_number4
file2 line_number1
file3 line_number2
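A minimal sketch of reading that output on the Java side, assuming it is exported as plain text with one row per line and that the line-number column holds a bare integer (e.g. `file1	2` rather than the placeholder `line_number2` shown above). The class `QueryOutputParser` and method `groupByFile` are hypothetical names; grouping rows by file means each file only needs to be opened once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups query output rows of the form
// "<filename><whitespace><lineNumber>" by filename.
// Assumes the line-number column is a plain integer (e.g. "file1\t2").
final class QueryOutputParser {
    static Map<String, List<Integer>> groupByFile(List<String> rows) {
        Map<String, List<Integer>> byFile = new TreeMap<>(); // sorted by filename
        for (String row : rows) {
            String trimmed = row.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines in the exported output
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length != 2) {
                throw new IllegalArgumentException("Expected 2 columns: " + row);
            }
            byFile.computeIfAbsent(cols[0], k -> new ArrayList<>())
                  .add(Integer.parseInt(cols[1]));
        }
        return byFile;
    }
}
```

For the sample rows above with numeric line numbers (`file1 2`, `file1 3`, `file1 4`, `file2 1`, `file3 2`), this yields `{file1=[2, 3, 4], file2=[1], file3=[2]}`.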

 

2) I have a MapReduce job that takes a list of (filename, line number) pairs and extracts the specified lines from the files, which are stored in HDFS.
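The per-file extraction in step 2 can be sketched as follows, assuming 1-based line numbers. `LineExtractor.extractLines` is a hypothetical helper, not part of Hadoop; in a real mapper the `Reader` would wrap the stream returned by Hadoop's `FileSystem.open(Path)`. It counts lines itself because Hadoop's `TextInputFormat` passes the mapper byte offsets, not line numbers, as keys.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: returns the requested lines of one file, in file order.
// Line numbers are assumed to be 1-based. Any java.io.Reader works here; on a
// cluster it would wrap an HDFS input stream.
final class LineExtractor {
    static List<String> extractLines(Reader source, Set<Integer> wanted) throws IOException {
        List<String> out = new ArrayList<>();
        if (wanted.isEmpty()) {
            return out;
        }
        // Highest requested line number, so we can stop reading early.
        int maxWanted = wanted.stream().mapToInt(Integer::intValue).max().getAsInt();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (wanted.contains(lineNo)) {
                    out.add(line);
                }
                if (lineNo >= maxWanted) {
                    break; // every requested line has been seen
                }
            }
        }
        return out;
    }
}
```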

 

I have implemented both steps above but don't know how to link them. Any advice or help is greatly appreciated.
