Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Create Hive table to read Parquet files from a Parquet/Avro schema

Rising Star

Hello Experts!

We are looking for a way to create an external Hive table that reads data from Parquet files according to a Parquet/Avro schema.

In other words, how can we generate a Hive table from a Parquet/Avro schema?

Thanks 🙂

tazimehdi.com
1 ACCEPTED SOLUTION

Rising Star

The solution is to first create a table dynamically from the Avro schema, and then create a new Parquet-format table from the Avro one.

Here is the Hive code; hopefully this helps you:

CREATE TABLE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='myHost/myAvroSchema.avsc');

CREATE EXTERNAL TABLE parquet_test LIKE avro_test
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath';
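
As a quick sanity check, you can compare the schema Hive derives from the Avro schema with the schema embedded in the Parquet files themselves. This is only a minimal sketch: it assumes a Spark 2.x shell with Hive support, and reuses the placeholder path and table name from the DDL above.

// Sketch only: assumes spark-shell 2.x with Hive support enabled.
// "hdfs://myParquetFilesPath" and parquet_test are the placeholders from the DDL above.
val df = spark.read.parquet("hdfs://myParquetFilesPath")
df.printSchema()                          // schema read from the Parquet file footers
spark.sql("DESCRIBE parquet_test").show() // schema Hive built from the Avro schema
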
tazimehdi.com


10 REPLIES

Explorer

With newer versions of Spark, the sqlContext is not loaded by default; you have to create it explicitly:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6179af64

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> sqlContext.sql("describe mytable")
res2: org.apache.spark.sql.DataFrame = [col_name: string, data_type: string ... 1 more field]

I'm working with Spark 2.3.2.
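
Note that in Spark 2.x the preferred entry point is the SparkSession, which spark-shell already exposes as spark. Here is a minimal standalone sketch of the equivalent, assuming Hive support and the same mytable from the transcript above:

import org.apache.spark.sql.SparkSession

// Sketch: build a Hive-enabled session outside spark-shell.
// In spark-shell 2.x this session already exists as `spark`.
val spark = SparkSession.builder()
  .appName("describe-table")
  .enableHiveSupport()
  .getOrCreate()

// Equivalent of the sqlContext call above; mytable is the
// table name used in the transcript.
spark.sql("DESCRIBE mytable").show()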