You might want to check that the JSON is valid. You can do this with Spark SQL (the spark-sql shell). Here's what I'd do in spark-sql:
- Create a temporary table over the JSON files
- Check whether a SELECT statement returns the results you expect
- Describe the temporary table and sanity-check its schema against the one you were using in Hive
To create the temporary table in Spark SQL:
CREATE TEMPORARY TABLE tweets_temp
USING org.apache.spark.sql.json
OPTIONS (path '[the path to the JSON dataset]');
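Once the table exists, the remaining checks from the list might look something like the sketch below. It assumes the table is named tweets_temp as in the statement above and that Spark inferred the schema from the files (no explicit schema given). Keep in mind that Spark's JSON source, at least in the 1.x releases this syntax comes from, expects each line of the file to hold one complete JSON object, so pretty-printed multi-line JSON will not parse the way you might expect.

```sql
-- Assumes tweets_temp was created with the CREATE TEMPORARY TABLE statement above.

-- Look at a handful of rows to see whether the data parsed as expected.
SELECT * FROM tweets_temp LIMIT 10;

-- Print the inferred schema so you can compare it with your Hive table definition.
DESCRIBE tweets_temp;

-- Only run this if DESCRIBE lists a _corrupt_record column: with schema inference,
-- Spark adds that column when some lines could not be parsed as JSON, and it holds
-- the raw text of those lines. If the column is absent, this query will fail.
SELECT _corrupt_record FROM tweets_temp WHERE _corrupt_record IS NOT NULL LIMIT 10;
```

If the schema from DESCRIBE differs from what your Hive table declares (for example a field inferred as a string where Hive expects a struct), that mismatch is a likely place to look first.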
Read more about this feature in the JSON datasets section of the Spark SQL programming guide.