Support Questions

Find answers, ask questions, and share your expertise

Getting all NULLS when selecting from a Hive JSON table

avatar

I’m trying to get some twitter JSON to show up in Hive and I’m not having any luck. All I get are NULLS returned and no errors. I've tried the native JSON serde as well as the openx serde but get the same results.

LOAD DATA INPATH '/tmp/tweets_staging/‘ OVERWRITE INTO tweets;

ADD JAR /hadoop/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;

SELECT * FROM tweets LIMIT 100;

4001-9688f87e-f96b-4330-833c-c2e59ab2e9c9.png

tweet-table.txt

1 ACCEPTED SOLUTION

avatar
Super Collaborator

You might want to check that the JSON is legitimate. You can do this with Spark (spark-sql). Here's what I'd do in spark-sql:

  1. Create temp table over the JSON files
  2. See if "select " statement returns desired results
  3. Describe the temp table and sanity check the schema against what you were using in Hive

To create the temp table in Spark:

CREATE TEMPORARY TABLE tweets_temp
USING org.apache.spark.sql.json
OPTIONS (path '[the path to the JSON dataset]')

Read more about this feature here.

View solution in original post

2 REPLIES 2

avatar
Super Collaborator

You might want to check that the JSON is legitimate. You can do this with Spark (spark-sql). Here's what I'd do in spark-sql:

  1. Create temp table over the JSON files
  2. See if "select " statement returns desired results
  3. Describe the temp table and sanity check the schema against what you were using in Hive

To create the temp table in Spark:

CREATE TEMPORARY TABLE tweets_temp
USING org.apache.spark.sql.json
OPTIONS (path '[the path to the JSON dataset]')

Read more about this feature here.

avatar
Super Collaborator

I suppose is the issue with loading data.

Try to create external table instead..

create EXTERNAL table tweets 
....

row format serde 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/tmp/tweets_staging/';