I am new to the Hadoop environment and need some expert suggestions. I have a requirement to move data from a Kafka topic into a Hive table. Here is how I designed the NiFi flow:

ConsumeKafka => EvaluateJsonPath (retrieve the required attributes) => AttributesToJSON (flatten the JSON) => MergeContent (merge ~1000 JSON flow files into one file) => PutHDFS (with a Hive table on top of that location) => ReplaceText (build the INSERT statement into the final table)

I am using MergeContent because I don't want every single record from the Kafka topic to end up as its own file in HDFS. The MergeContent processor does concatenate the JSON flow files into one large file and put it to HDFS, but when I query the Hive table I get only the data from the first flow file. What am I doing wrong here?

Here is my Hive table definition:

ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION XXXXXXXX

Thanks for your time.
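To illustrate what I suspect is happening, here is a small Python sketch (the two records are hypothetical, and it assumes the table is read line by line, the way TextInputFormat splits input on newlines): when records are concatenated with no demarcator, a line-oriented JSON reader can only recover the first object on the line, which matches the behavior I am seeing.

```python
import json

# Hypothetical two-record sample, concatenated with no demarcator,
# the way MergeContent produces output by default.
merged_no_newline = '{"id": 1, "name": "a"}{"id": 2, "name": "b"}'

# A line-oriented reader sees this as a single line and parses only
# the first JSON object on it; the rest of the line is left over.
decoder = json.JSONDecoder()
record, end = decoder.raw_decode(merged_no_newline)
print(record)                        # only the first record is recovered
print(end < len(merged_no_newline))  # leftover bytes after the first object

# With a newline between records, each line is one complete JSON document.
merged_with_newline = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}'
records = [json.loads(line) for line in merged_with_newline.splitlines()]
print(len(records))                  # both records are recovered
```

If that is the cause, it would suggest the merged file needs a newline between records (e.g. a newline demarcator in MergeContent), but I would appreciate confirmation from someone who knows the JsonSerDe behavior.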