Created on 05-04-2016 01:46 PM - edited 08-19-2019 02:26 AM
I'm trying to get some Twitter JSON to show up in Hive and I'm not having any luck. All I get back are NULLs, with no errors. I've tried the native JSON SerDe as well as the openx SerDe, but I get the same results.
LOAD DATA INPATH '/tmp/tweets_staging/' OVERWRITE INTO TABLE tweets;
ADD JAR /hadoop/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
SELECT * FROM tweets LIMIT 100;
Created 05-04-2016 02:22 PM
You might want to check that the JSON is valid. You can do this with Spark SQL (spark-sql). Here's what I'd do in spark-sql.
To create the temporary table in Spark:
CREATE TEMPORARY TABLE tweets_temp USING org.apache.spark.sql.json OPTIONS (path '[the path to the JSON dataset]');
Read more about this feature in the Spark SQL documentation on JSON datasets.
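Once the temporary table exists, a quick way to see whether Spark could actually parse the files is to look at the schema it inferred. This is a sketch, assuming the files hold standard tweet objects (with keys such as `id` and `text`). By default Spark's JSON source expects one complete JSON object per line, and lines it cannot parse are placed in a `_corrupt_record` column instead of the real fields; exact output may vary by Spark version.

```sql
-- Show the schema Spark inferred from the JSON files.
-- Parseable tweets should produce columns such as id and text;
-- a column named _corrupt_record means some lines could not be parsed.
DESCRIBE tweets_temp;

-- Run this only if _corrupt_record appeared above (otherwise the column does not exist).
-- Non-NULL values show the raw lines Spark failed to parse.
SELECT _corrupt_record FROM tweets_temp WHERE _corrupt_record IS NOT NULL LIMIT 10;
```

If the schema contains nothing but `_corrupt_record`, the file is most likely pretty-printed (one object spread over many lines) or otherwise malformed, which would also explain Hive returning NULLs.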
Created 05-04-2016 06:46 PM
I suspect the issue is with loading the data.
Try creating an external table instead:
CREATE EXTERNAL TABLE tweets .... ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION '/tmp/tweets_staging/';
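A fuller version of that statement might look like the sketch below. The table name `tweets_ext` and the three columns are hypothetical, chosen only for illustration; the jar path is the one from the question. The assumption is that each column name matches a top-level key in the tweet JSON and that every tweet sits on its own line, since keys with no matching column, or records spread over several lines, typically come back as NULL.

```sql
-- Register the openx JSON SerDe jar for this session (path from the question).
ADD JAR /hadoop/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;

-- Hypothetical table name and columns: each column name should match a
-- top-level key in the tweet JSON (tweet ids are 64-bit, hence BIGINT).
CREATE EXTERNAL TABLE tweets_ext (
  id BIGINT,
  created_at STRING,
  text STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/tmp/tweets_staging/';

-- Spot-check that the columns now come back non-NULL.
SELECT id, created_at, text FROM tweets_ext LIMIT 10;
```

One thing to watch: `LOAD DATA INPATH` moves (rather than copies) the files into the target table's directory, so after the earlier load `/tmp/tweets_staging/` may be empty and the files may need to be put back there first. An external table also leaves the files in place if it is dropped.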