Member since: 05-29-2014
Posts: 4
Kudos Received: 0
Solutions: 0
10-27-2014
09:23 AM
No, currently the user needs to pick either the Hive or Impala app. I would recommend asking the Impala team to see how/when your UDF could be made compatible. Romain
10-22-2014
04:33 AM
Hi, I need some help implementing a use case where the output of an Impala/Hive query is used as the input to the Map class of a MapReduce job. The Impala/Hive query is supplied by the user at runtime (i.e., it is not a hard-coded, parameterized query). A simplified workflow:

1) The user queries an Impala table and gets an output containing filename and line number pairs. An example:

file1 line_number2
file1 line_number3
file1 line_number4
file2 line_number1
file3 line_number2

2) I have a MapReduce job which processes a list of (filename, line number) values and extracts the specified lines from the files, which are stored in HDFS.

I have implemented both of the steps above, but don't know how to link them. Any advice/help greatly appreciated.
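One common way to link the two steps is to have the query write its result set to an HDFS directory (e.g., Hive's `INSERT OVERWRITE DIRECTORY` or `impala-shell -q ... -o` followed by an upload), then point the MapReduce job's input path at that directory. As a minimal sketch of the mapper-side logic, here is a pure-Python illustration; the function names are hypothetical, and it assumes records of the form `filename line_number` with a numeric line number:

```python
def parse_records(lines):
    """Parse 'filename line_number' records from query output."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed records
        pairs.append((parts[0], int(parts[1])))
    return pairs


def extract_lines(pairs, read_file):
    """Given (filename, line_number) pairs and a reader that returns a
    file's lines, emit the requested lines (1-based numbering)."""
    out = []
    for fname, lineno in pairs:
        file_lines = read_file(fname)
        if 1 <= lineno <= len(file_lines):
            out.append((fname, lineno, file_lines[lineno - 1]))
    return out
```

In the real job this logic would live in the Mapper's `map()` method, with `read_file` replaced by reads through the HDFS `FileSystem` API rather than local I/O.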
08-18-2014
10:01 PM
Hi, I am using multiple Flume agents, each with an HDFS sink, to push data to HDFS, but I am unsure where to run these agents. 1) Running them on the DataNodes deprives the TaskTrackers there of CPU (https://wiki.apache.org/hadoop/DataNode). 2) Running them on a separate machine moves them away from HDFS. Can someone share proper enterprise practices for deploying the Flume tier closest to HDFS?
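A common pattern is a tiered topology: lightweight agents near the data sources forward events over Avro to a small collector tier on edge/gateway nodes (not DataNodes), and only the collectors hold HDFS sinks. A sketch of a collector-tier agent configuration, with all agent, host, path, and port names being illustrative assumptions:

```properties
# Hypothetical collector-tier agent; receives Avro from upstream agents
# and writes to HDFS. Names and paths are examples only.
collector.sources = avroIn
collector.channels = fileCh
collector.sinks = hdfsOut

collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4545
collector.sources.avroIn.channels = fileCh

# Durable file channel so events survive a collector restart
collector.channels.fileCh.type = file
collector.channels.fileCh.checkpointDir = /var/flume/checkpoint
collector.channels.fileCh.dataDirs = /var/flume/data

collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.channel = fileCh
collector.sinks.hdfsOut.hdfs.path = /flume/events/%Y-%m-%d
collector.sinks.hdfsOut.hdfs.fileType = DataStream
collector.sinks.hdfsOut.hdfs.rollInterval = 300
```

This keeps the CPU-heavy sink work off the DataNodes while the collectors, sitting on the cluster's network, still get good write throughput to HDFS.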
05-29-2014
02:58 AM
Impala can do the conversion via SQL statements. I'd recommend asking the Impala team for advice there, as my information is a bit dated on this front now that views and improved metastore features have been added. Mike