Support Questions

SQLShaw · 05-04-2016

I'm trying to get some Twitter JSON to show up in Hive and I'm not having any luck. All I get back are NULLs, with no errors. I've tried the native JSON SerDe as well as the OpenX SerDe but get the same results.

LOAD DATA INPATH '/tmp/tweets_staging/' OVERWRITE INTO TABLE tweets;

ADD JAR /hadoop/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;

SELECT * FROM tweets LIMIT 100;

tweet-table.txt

clukasik · 05-04-2016

You might want to check that the JSON is legitimate. You can do this with Spark (spark-sql). Here's what I'd do in spark-sql:

1. Create a temp table over the JSON files
2. See if a "select *" statement returns the desired results
3. Describe the temp table and sanity-check the schema against what you were using in Hive

To create the temp table in Spark:

CREATE TEMPORARY TABLE tweets_temp
USING org.apache.spark.sql.json
OPTIONS (path '[the path to the JSON dataset]')

Read more about this feature here.
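If Spark isn't handy, a quick local check can also catch malformed input. The sketch below is only an illustration, not part of the original answer: it assumes the staging files hold one JSON object per line (the layout Hive JSON SerDes generally expect), and the function name find_bad_json_lines is hypothetical.

```python
import json


def find_bad_json_lines(lines):
    """Return (line_number, reason) for every line that is not a JSON object.

    Assumption: each record sits on its own line. Blank lines are skipped.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are ignored in this sketch
        try:
            record = json.loads(text)
        except json.JSONDecodeError as err:
            problems.append((number, "invalid JSON: " + err.msg))
            continue
        if not isinstance(record, dict):
            # Valid JSON, but not an object that can map onto table columns
            problems.append((number, "not a JSON object"))
    return problems


if __name__ == "__main__":
    # Made-up sample records, for illustration only
    sample = [
        '{"id": 1, "text": "hello"}',
        '{"id": 2, "text": ',  # truncated record
        '[1, 2, 3]',           # valid JSON, but an array rather than an object
    ]
    for number, reason in find_bad_json_lines(sample):
        print(number, reason)
```

To run it against real data, copy a staging file out of HDFS first and pass the open file object to find_bad_json_lines. If every line passes, the next thing to compare is the column names in the Hive table definition against the keys in the JSON.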

Cloudera Community

Getting all NULLS when selecting from a Hive JSON table