Created 11-19-2015 08:19 PM
I am using https://github.com/rcongiu/Hive-JSON-Serde to query JSON data via Hive.
As part of testing, I am using an external table to query the JSON plain text file in HDFS.
I am able to query the data from Hive using select. However, when I run select * from JSON_EXTERNAL_TABLE limit 1, the output is invalid JSON, even though the message in HDFS is valid JSON. Is this expected?
Created 11-19-2015 08:20 PM
The JSON SerDe works on one line at a time, with each line being parsed independently. Is your JSON encoded so that each record in your stream fits on a single line?
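To illustrate the one-record-per-line point, here is a minimal Python sketch (not part of the SerDe itself) that checks whether each line of a file parses as standalone JSON, and shows how a pretty-printed record can be collapsed onto one line. The helper names `find_bad_lines` and `to_single_line` and the sample record are made up for illustration.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, error) pairs for lines that are not standalone JSON."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            bad.append((number, str(exc)))
    return bad


def to_single_line(record_text):
    """Re-encode one JSON document so that it occupies exactly one line."""
    return json.dumps(json.loads(record_text), separators=(",", ":"))


# Hypothetical pretty-printed record: valid JSON as a whole,
# but no single line of it is valid JSON on its own.
pretty = '{\n  "id": 1,\n  "text": "hello"\n}'

print(find_bad_lines(pretty.splitlines()))  # every line is reported as bad
compact = to_single_line(pretty)
print(compact)
print(find_bad_lines([compact]))  # the compact form passes the check
```

If a check like this reports errors on a file that is valid JSON as a whole, the records probably span multiple lines and would need to be rewritten one per line before a line-oriented reader can handle them.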
Created 10-24-2016 05:09 PM
What is your sample data?
Created 10-24-2016 05:16 PM
I am attaching the Twitter file that was created using Flume. Can you please check whether it has a valid structure? I am unable to read or view this file.
Created 10-24-2016 05:19 PM
And this is how I generate these Twitter files (based on internet demos):

flume-ng agent --conf-file twitter-to-hdfs.properties --name agent1 -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=dotatofwproxy.tolls.dot.state.fl.us -Dtwitter4j.http.proxyPort=8080

[root@hadoop1 ~]# more twitter-to-hdfs.properties
agent1.sources =source1
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.source1.consumerKey = xxxxxxxxxxxxxxxxxxxxxxxxxTaz
agent1.sources.source1.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCI9
agent1.sources.source1.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxwov
agent1.sources.source1.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxY5H3
agent1.sources.source1.keywords = Clinton Trump
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/flume/tweets
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel1.type = file
Created 10-25-2016 01:45 PM
Artem, any advice? I am anxiously waiting for feedback from you or anyone else.
Created 10-25-2016 04:01 PM
@Sami Ahmad if you can't view or read the file, it is not a valid text file, hence the problems you're facing. From your output, it looks like the file contains binary structures, which is why you're having difficulty applying a Hive schema on top of it. Please review your HDFS sink properties. I also highly recommend looking into Apache NiFi, which would make these problems go away quickly. Since this is a closed thread, please open a new question instead of posting here; continuing here makes the context of this thread difficult to follow. Again, once you can view your resulting Twitter output file, you should be able to apply a Hive schema to it.
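One way to check whether an output file is plain text or binary is to inspect its first bytes. The Python sketch below is only a rough heuristic. The function name `describe_sample` and the category labels are invented for illustration. The Avro check is based on an assumption: the Flume documentation describes TwitterSource as emitting Avro, so it is worth testing for the Avro container magic bytes `Obj\x01`.

```python
import codecs
import json

# First four bytes of an Avro object container file.
AVRO_MAGIC = b"Obj\x01"


def describe_sample(sample: bytes) -> str:
    """Roughly classify the leading bytes of a file.

    Returns one of "avro", "binary", "json-lines" or "text".
    """
    if sample.startswith(AVRO_MAGIC):
        return "avro"
    try:
        # The incremental decoder tolerates a multi-byte character
        # that was cut off at the end of the sample.
        text = codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        return "binary"
    if "\x00" in text:
        return "binary"  # NUL bytes are a strong hint of binary data
    # Pick the first non-empty line and see whether it is standalone JSON.
    first_line = next((ln for ln in text.splitlines() if ln.strip()), "")
    try:
        json.loads(first_line)
    except json.JSONDecodeError:
        return "text"
    return "json-lines"


print(describe_sample(b'{"id": 1}\n{"id": 2}\n'))
print(describe_sample(b"Obj\x01\x04\x14avro.codec"))
```

The sample should be large enough to contain at least one complete record; a first line truncated by the sample boundary would be classified as plain "text" rather than "json-lines".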
Created 10-25-2016 05:49 PM
I already have another thread open but am not getting many responses there. Can you please follow up on that thread?
I also attached my output file and how I generate it, both in this thread and in the other one. Can you help me identify why the file is bad?
https://community.hortonworks.com/questions/61181/reading-json-files.html#comment-63383
Created 10-25-2016 06:19 PM
I also created a new one just for finding out how to create the Twitter JSON file from Flume:
https://community.hortonworks.com/questions/63419/bad-hdfs-sink-property.html