
Create Hive table to read parquet files from parquet/avro schema

Rising Star

Hello Experts!

We are looking for a way to create an external Hive table that reads data from Parquet files according to a Parquet/Avro schema.

In other words, how can we generate a Hive table from a Parquet/Avro schema?

Thanks 🙂

tazimehdi.com
1 ACCEPTED SOLUTION

Rising Star

The solution is to dynamically create a table from the Avro schema, and then create a new Parquet-format table from the Avro one.

Here is the HiveQL code; hopefully it helps you:

-- Create a table whose definition is derived from the Avro schema file.
CREATE TABLE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='myHost/myAvroSchema.avsc');

-- Reuse that definition for an external Parquet table over the existing files.
CREATE EXTERNAL TABLE parquet_test LIKE avro_test
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath';
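
Once both tables exist, you can check the result from spark-shell as well. A minimal sketch, assuming a Hive-enabled Spark session and the parquet_test table created above:

// Query the external Parquet table through Spark's Hive integration.
val spark = org.apache.spark.sql.SparkSession.builder()
  .enableHiveSupport()
  .getOrCreate()

// The columns should match the ones declared in myAvroSchema.avsc.
spark.sql("DESCRIBE parquet_test").show()
spark.sql("SELECT * FROM parquet_test LIMIT 10").show()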
tazimehdi.com


10 REPLIES

Explorer

With newer versions of Spark, the sqlContext is not loaded by default; you have to instantiate it explicitly:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6179af64

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> sqlContext.sql("describe mytable")
res2: org.apache.spark.sql.DataFrame = [col_name: string, data_type: string ... 1 more field]

I'm working with Spark 2.3.2.
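
The deprecation warning above comes from SQLContext itself: since Spark 2.0 the preferred entry point is SparkSession, which spark-shell already exposes as the built-in spark value. A minimal equivalent sketch, using the same placeholder table name mytable:

import org.apache.spark.sql.SparkSession

// SparkSession replaces SQLContext as of Spark 2.0; in spark-shell,
// getOrCreate() simply returns the already-running session behind `spark`.
val spark = SparkSession.builder()
  .enableHiveSupport()   // needed to resolve Hive tables such as mytable
  .getOrCreate()

import spark.implicits._

// Same query as above, without the deprecation warning.
spark.sql("DESCRIBE mytable").show()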