Created 10-22-2014 04:33 AM
Hi,
I need some help implementing a use case where the output of an Impala/Hive query is used as input to a Map class in a MapReduce job. The Impala/Hive query is supplied by the user at runtime (basically, it is not a hard-coded, parameterized query). A simplified workflow:
1) The user queries an Impala table and gets output containing a filename and a line number on each row. An example:
file1 line_number2
file1 line_number3
file1 line_number4
file2 line_number1
file3 line_number2
2) I have a MapReduce job which processes a list of (filename, line number) pairs and extracts the specified lines from the files, which are stored in HDFS.
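To make the record format concrete, here is a rough sketch of how the filename/line-number pairs could be parsed and grouped per file. It is only an illustration, not my actual job code: the class name LineRequests and the method parse are made up for this post, and it assumes one record per line, whitespace-separated fields, and a numeric line number.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; class and method names are illustrative only.
public class LineRequests {

    // Groups "filename line_number" records by filename, keeping input order.
    // Assumes one record per line, fields separated by whitespace, and a
    // numeric line number (the real query output may be formatted differently).
    public static Map<String, List<Long>> parse(String queryOutput) {
        Map<String, List<Long>> byFile = new LinkedHashMap<>();
        for (String record : queryOutput.split("\\R")) {
            String trimmed = record.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Expected 2 fields: " + record);
            }
            byFile.computeIfAbsent(fields[0], k -> new ArrayList<>())
                  .add(Long.parseLong(fields[1]));
        }
        return byFile;
    }

    public static void main(String[] args) {
        // Sample rows mirroring the example above, with numeric line numbers.
        String sample = "file1 2\nfile1 3\nfile1 4\nfile2 1\nfile3 2\n";
        System.out.println(parse(sample)); // {file1=[2, 3, 4], file2=[1], file3=[2]}
    }
}
```

Grouping by filename is just one possible choice; it would let each file be opened once while all of its requested lines are pulled out.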
I have implemented both of the steps above, but I don't know how to link them. Any advice or help would be greatly appreciated.