
How can you query using JSON against HDP?

New Contributor

The customer wants to use something like Apache Drill to query JSON data on HDP, because JSON is self-describing.

1 ACCEPTED SOLUTION


Re: How can you query using JSON against HDP?

One of our prospects recently evaluated Drill. It worked for structured, self-describing formats without requiring a schema up front, but in their experience resolving data types at query time slowed performance. In any case, HWX does not officially support Drill, so the onus will be on the customer to resolve any Drill-related issues when using it with HDP.

On the other hand, my comment to customers is that Hive provides a consistent approach, with semantics that database developers already know. Additionally, larger community involvement and the maturity of the product have hardened Hive over a number of years.

JSONSerde is the easy way to handle JSON in HDP. In return for a one-time table creation, you get better performance than with Drill, which does not seem like a bad trade-off at all.
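
To make the "one-time table creation" concrete, here is a minimal sketch that builds the HiveQL for a JSON-backed external table. It assumes the HCatalog SerDe class `org.apache.hive.hcatalog.data.JsonSerDe`; the helper `json_table_ddl`, the table name `events`, its columns, and the HDFS path are all made up for illustration, and depending on the install you may need to add the hive-hcatalog-core jar to the session first.

```python
# Hypothetical helper: builds the one-time HiveQL DDL for a JSON-backed table.
# The SerDe class below is the HCatalog JSON SerDe; everything else is illustrative.
JSON_SERDE = "org.apache.hive.hcatalog.data.JsonSerDe"


def json_table_ddl(table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement for newline-delimited JSON files.

    columns is a list of (name, hive_type) pairs; location is an HDFS directory.
    """
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE `{table}` (\n{cols}\n)\n"
        f"ROW FORMAT SERDE '{JSON_SERDE}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Assumed example table: one JSON object per line under /data/events_json.
ddl = json_table_ddl(
    "events",
    [("id", "BIGINT"), ("name", "STRING"), ("tags", "ARRAY<STRING>")],
    "/data/events_json",
)
print(ddl)
```

Once the generated statement has been run in Hive, the JSON files can be queried with ordinary HiveQL (for example `SELECT name FROM events WHERE id = 1`), with column types fixed at table creation rather than resolved per query.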

3 REPLIES

Re: How can you query using JSON against HDP?

Take a look at Spark (and Spark SQL). It can automatically infer the schema of a JSON dataset:

https://spark.apache.org/docs/1.4.1/sql-programming-guide.html#json-datasets
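
To give a feel for what inferring a schema from JSON involves, here is a toy sketch in plain Python. It is not Spark's implementation: the functions `infer_type`, `merge`, and `infer_schema`, the type names, and the conflict rules (widen integers to doubles, otherwise fall back to string) are assumptions chosen for illustration. It does show why self-describing data costs something: every record has to be scanned and its types reconciled.

```python
import json


def infer_type(value):
    """Map one parsed JSON value to a simple type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "struct"


def merge(a, b):
    """Reconcile two types observed for the same field (assumed rules)."""
    if a == b or b == "null":
        return a
    if a == "null":
        return b
    if {a, b} == {"bigint", "double"}:
        return "double"  # widen integers to doubles
    return "string"  # any other conflict falls back to string


def infer_schema(lines):
    """Scan every JSON record and build a field -> type mapping."""
    schema = {}
    for line in lines:
        for field, value in json.loads(line).items():
            t = infer_type(value)
            schema[field] = merge(schema[field], t) if field in schema else t
    return schema


# Made-up sample records with a missing field, a null, and a type conflict.
sample = [
    '{"id": 1, "score": 3, "name": "a"}',
    '{"id": 2, "score": 4.5, "tag": null}',
    '{"id": "x3", "score": 1}',
]
schema = infer_schema(sample)
print(schema)
```

With Hive and a JSON SerDe this reconciliation is done once by hand when the table is declared; with schema inference it happens when the data is read.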

Re: How can you query using JSON against HDP?

Master Collaborator

Apache Drill supports JSON as a self-describing data format; you can find the usage here. In Hive, HCatalog provides a JSON SerDe for reading and writing table data.
