Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Getting all NULLS when selecting from a Hive JSON table

avatar

I’m trying to get some twitter JSON to show up in Hive and I’m not having any luck. All I get are NULLS returned and no errors. I've tried the native JSON serde as well as the openx serde but get the same results.

LOAD DATA INPATH '/tmp/tweets_staging/‘ OVERWRITE INTO tweets;

ADD JAR /hadoop/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;

SELECT * FROM tweets LIMIT 100;

4001-9688f87e-f96b-4330-833c-c2e59ab2e9c9.png

tweet-table.txt

1 ACCEPTED SOLUTION

avatar
Super Collaborator

You might want to check that the JSON is legitimate. You can do this with Spark (spark-sql). Here's what I'd do in spark-sql:

  1. Create temp table over the JSON files
  2. See if "select " statement returns desired results
  3. Describe the temp table and sanity check the schema against what you were using in Hive

To create the temp table in Spark:

CREATE TEMPORARY TABLE tweets_temp
USING org.apache.spark.sql.json
OPTIONS (path '[the path to the JSON dataset]')

Read more about this feature here.

View solution in original post

2 REPLIES 2

avatar
Super Collaborator

You might want to check that the JSON is legitimate. You can do this with Spark (spark-sql). Here's what I'd do in spark-sql:

  1. Create temp table over the JSON files
  2. See if "select " statement returns desired results
  3. Describe the temp table and sanity check the schema against what you were using in Hive

To create the temp table in Spark:

CREATE TEMPORARY TABLE tweets_temp
USING org.apache.spark.sql.json
OPTIONS (path '[the path to the JSON dataset]')

Read more about this feature here.

avatar
Super Collaborator

I suppose is the issue with loading data.

Try to create external table instead..

create EXTERNAL table tweets 
....

row format serde 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/tmp/tweets_staging/';