How can you query JSON data in HDP?

The customer wants to use something like Apache Drill to query JSON data in HDP, because JSON is self-describing.

Accepted Solution

One of our prospects recently evaluated Drill. While it worked for structured and self-describing formats without creating a schema, their experience was that data type resolution slowed performance down. In any case, Hortonworks (HWX) does not officially support Drill, so the onus will be on the customer to resolve any Drill-related issues when using it with HDP.

On the other hand, what I tell customers is that Hive provides a consistent approach, using semantics that database developers already know. Additionally, broad community involvement and the product's maturity have hardened Hive over a number of years.

JSONSerde is the easy-to-use way to handle JSON in HDP. In return for a one-time table creation, you get better performance than with Drill, which does not seem like a bad trade-off at all.
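To make the "one-time table creation" concrete, here is a minimal Python sketch that builds the kind of Hive DDL you would run once. It assumes HCatalog's org.apache.hive.hcatalog.data.JsonSerDe is available to Hive (on some setups the hive-hcatalog-core jar has to be added first) and that the files hold one JSON object per line. The helper name, table name, columns, and HDFS path are all hypothetical.

```python
# Hypothetical helper: builds a one-time Hive DDL statement for an external
# table over newline-delimited JSON files. Table name, columns and HDFS path
# are illustrative assumptions, not taken from the original post.

def hive_json_table_ddl(table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement that uses HCatalog's JsonSerDe.

    columns: list of (name, hive_type) pairs, e.g. [("id", "BIGINT")].
    """
    # Backtick-quote column names so reserved words are still valid identifiers.
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    print(hive_json_table_ddl(
        "events_json",
        [("id", "BIGINT"), ("user_name", "STRING"), ("tags", "ARRAY<STRING>")],
        "/data/events/json",
    ))
```

Once the table exists, the JSON files can be queried with ordinary HiveQL, for example SELECT user_name FROM events_json WHERE id = 42; and the schema does not have to be resolved again on every query.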

Replies

Take a look at Spark (and Spark SQL). It can automatically infer the schema of a JSON dataset.
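In Spark 1.4 the entry point is sqlContext.read.json(path), and df.printSchema() then prints the inferred schema. That inference is also the extra work behind the "data type resolution" cost: every record is parsed and conflicting field types are reconciled. As a rough, hypothetical illustration of the idea (not Spark's actual implementation, and with simplified type names), the Python sketch below scans newline-delimited JSON and widens conflicting top-level field types.

```python
import json


def _json_type(value):
    """Map a decoded JSON value to a simplified type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "struct"  # a JSON object; nested fields are not inferred in this sketch


def _merge(a, b):
    """Widen two observed types into one (simplified rules, assumed for illustration)."""
    if a == b or b == "null":
        return a
    if a == "null":
        return b
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"  # incompatible types fall back to string


def infer_schema(lines):
    """Infer a flat {field: type} schema from newline-delimited JSON records."""
    schema = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for field, value in record.items():
            schema[field] = _merge(schema.get(field, "null"), _json_type(value))
    return schema


if __name__ == "__main__":
    sample = [
        '{"id": 1, "name": "a", "score": 3}',
        '{"id": 2, "name": null, "score": 4.5}',
        '{"id": "x3", "tags": ["t"]}',
    ]
    print(infer_schema(sample))
    # {'id': 'string', 'name': 'string', 'score': 'double', 'tags': 'array'}
```

The sample records are made up. The point is that the schema is only known after reading the data, which is convenient but costs a pass over the records that a predefined table schema avoids.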

https://spark.apache.org/docs/1.4.1/sql-programming-guide.html#json-datasets

Apache Drill supports JSON as a self-describing data format; you can find the usage here. In Hive, HCatalog supports JSON through a SerDe for reading and writing table data.
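When a JSON SerDe table is stored as text files, Hive reads it line by line, so data written for it is typically newline-delimited JSON: one complete object per line. A minimal Python sketch, assuming your rows are plain dicts whose keys match the table's column names (a hypothetical setup; the function name is illustrative):

```python
import json


def to_json_lines(rows):
    """Serialize dicts as newline-delimited JSON: one compact object per line."""
    # json.dumps escapes newlines inside strings as \n, so each record stays on
    # a single physical line, which line-oriented readers require.
    return "".join(json.dumps(row, separators=(",", ":")) + "\n" for row in rows)


if __name__ == "__main__":
    print(to_json_lines([{"id": 1, "note": "multi\nline"}, {"id": 2}]), end="")
```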
