I am trying to create a Hadoop MapReduce job in which the map function creates key-value pairs of all matching files together with the query to be executed, and the reducer function applies the SQL query to those input text files.
The Map() function should search for files matching the given keyword and send those files to the reducer as the key.
The Reducer() function should execute the query against the files (key: files, value: query).
Map() - Input Key-Value: Keyword-Query
-- How do I search for files in a specific directory?
Reducer() - Input Key-Value: Files-Query
-- How do I execute the SQL query against the files?
Please provide a sample dataset
As @Emil mentioned, depending on your data you could also create an external Hive table. For example:
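-- Column types and the LOCATION path are placeholders; adjust to your data.
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> (<field_1> STRING, <field_2> STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '<hdfs_directory>';
SELECT * FROM <table_name> WHERE <field_1>='<term_1>' AND <field_2>='<term_2>';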
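If you want to sanity-check the filter logic locally before setting anything up in Hive, here is a rough Python sketch of what a query like that does to a comma-delimited file. It is neither Hive nor MapReduce, just plain Python standing in for the SELECT. The function name `select_rows`, the column names, and the sample rows are all made up for illustration.

```python
import csv
import io


def select_rows(lines, field_names, conditions):
    """Return rows of comma-delimited text whose fields equal all given terms.

    Roughly mirrors: SELECT * FROM t WHERE f1 = 'x' AND f2 = 'y'
    """
    reader = csv.DictReader(lines, fieldnames=field_names, delimiter=",")
    return [
        row
        for row in reader
        if all(row.get(field) == term for field, term in conditions.items())
    ]


# Hypothetical sample data standing in for one input text file.
sample = io.StringIO("alice,london,30\nbob,paris,25\nalice,paris,41\n")

matches = select_rows(
    sample,
    field_names=["name", "city", "age"],      # assumed column layout
    conditions={"name": "alice", "city": "paris"},
)
print(matches)
```

All values come back as strings here, much like declaring every Hive column as STRING, so any numeric comparison would need an explicit conversion.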
Hope this helps.