Getting all NULLS when selecting from a Hive JSON table
Labels: Apache Hive
Created on 05-04-2016 01:46 PM - edited 08-19-2019 02:26 AM
I'm trying to get some Twitter JSON to show up in Hive and I'm not having any luck. All I get back are NULLs, with no errors. I've tried both the native JSON SerDe and the OpenX SerDe, with the same results.
LOAD DATA INPATH '/tmp/tweets_staging/' OVERWRITE INTO TABLE tweets;
ADD JAR /hadoop/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
SELECT * FROM tweets LIMIT 100;
Created 05-04-2016 02:22 PM
You might want to check that the JSON is valid. You can do this with Spark (spark-sql). Here's what I'd do in spark-sql:
- Create a temp table over the JSON files
- See if a "select *" statement returns the desired results
- Describe the temp table and sanity-check its schema against the one you were using in Hive
To create the temp table in Spark:
CREATE TEMPORARY TABLE tweets_temp USING org.apache.spark.sql.json OPTIONS (path '[the path to the JSON dataset]')
Read more about this feature here.
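If Spark isn't handy, a rough local check in Python can catch the same problems. This is a minimal sketch, not the Spark approach above. It assumes that the JSON SerDes expect one complete JSON object per line and match Hive column names to top-level JSON keys case-insensitively. Under those assumptions it flags lines that don't parse as a standalone object and Hive columns that never appear as a key. Either condition could explain rows full of NULLs. The function name `check_json_lines` and the example column list are hypothetical, because the original table DDL isn't shown. The script reads a local copy of a sample file, for example one fetched with `hdfs dfs -get`.

```python
import json
from typing import Iterable


def check_json_lines(lines: Iterable[str], hive_columns: Iterable[str]) -> dict:
    """Hypothetical helper: report lines that are not standalone JSON objects,
    and Hive column names that never appear as a top-level key in the data.

    Assumption: the SerDe reads one JSON object per line and matches column
    names to keys case-insensitively (Hive stores column names in lower case).
    """
    bad_lines = []      # 1-based line numbers that are not a single JSON object
    seen_keys = set()   # lower-cased top-level keys found in the data
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue    # ignore blank lines
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            bad_lines.append(lineno)   # e.g. part of a pretty-printed object
            continue
        if not isinstance(record, dict):
            bad_lines.append(lineno)   # a bare array or scalar is not a row
            continue
        seen_keys.update(key.lower() for key in record)
    missing = sorted(c.lower() for c in hive_columns if c.lower() not in seen_keys)
    return {"bad_lines": bad_lines, "missing_columns": missing}


if __name__ == "__main__":
    sample = [
        '{"id": 1, "text": "hello", "user": {"screen_name": "a"}}',
        '{',           # a pretty-printed object spread over several lines
        '  "id": 2',
        '}',
    ]
    # Example column names only; substitute the columns from your own DDL.
    print(check_json_lines(sample, ["id", "text", "created_at"]))
    # To check a real file instead (path is a placeholder):
    # with open("tweets_sample.json", encoding="utf-8") as f:
    #     print(check_json_lines(f, ["id", "text", "created_at"]))
```

For the sample above it reports lines 2 to 4 as bad and `created_at` as a missing column. A non-empty `bad_lines` suggests the files are not one object per line. A non-empty `missing_columns` suggests the Hive column names don't match the JSON keys.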
Created 05-04-2016 06:46 PM
I suspect the issue is with loading the data.
Try creating an external table instead:
CREATE EXTERNAL TABLE tweets .... ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION '/tmp/tweets_staging/';
