
Merge multiple JSON flow files into one, then send to PutHDFS and PutHQL

I am new to the Hadoop environment and would appreciate some expert advice. I have a requirement to load data from a Kafka topic into a Hive table.

Here is how I designed the flow:

ConsumeKafka => EvaluateJsonPath (to retrieve the required attributes) => AttributesToJSON (flatten the JSON) => MergeContent (merge ~1000 JSON flow files into one file) => PutHDFS (a Hive table sits on top of this location) => ReplaceText (build the INSERT statement for the final table) => PutHQL

I am using MergeContent because I don't want every record from the Kafka topic to end up in its own file in HDFS.

The MergeContent processor does concatenate the JSON flow files into one large file, and PutHDFS writes that file to HDFS, but when I query the Hive table I get only the data from the first flow file. What am I doing wrong here?

Here is my Hive table definition:

ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION XXXXXXXX
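
One possible explanation, offered as an assumption to verify rather than a confirmed diagnosis: TextInputFormat hands the SerDe one line at a time, so if the merged file contains the JSON objects back to back with no newline between them, the whole file is a single line and only the first object on it is parsed. The Python sketch below only simulates that line-oriented behavior; it is not Hive or NiFi code, and `read_rows` and the sample records are hypothetical names made up for illustration.

```python
import json


def read_rows(file_text):
    """Simulate a line-oriented reader: one record per line,
    and only the first JSON object on each line is parsed."""
    decoder = json.JSONDecoder()
    rows = []
    for line in file_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # raw_decode parses the first JSON value and ignores what follows it
        obj, _end = decoder.raw_decode(line)
        rows.append(obj)
    return rows


# Hypothetical sample records standing in for the flattened Kafka messages
records = [{"id": 1}, {"id": 2}, {"id": 3}]

# Merged with no separator: the whole file is one line
concatenated = "".join(json.dumps(r) for r in records)

# Merged with a newline between records: one JSON object per line
newline_delimited = "\n".join(json.dumps(r) for r in records)

print(len(read_rows(concatenated)))
print(len(read_rows(newline_delimited)))
```

If this is indeed the cause, the thing to check would be the Demarcator property of MergeContent, so that each flow file's JSON ends up on its own line in the merged file.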


Thanks for your time.