Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Create Hive table to read Parquet files from a Parquet/Avro schema

Rising Star

Hello Experts!

We are looking for a way to create an external Hive table that reads data from Parquet files according to a Parquet/Avro schema.

In other words, how can we generate a Hive table from a Parquet/Avro schema?

Thanks 🙂

tazimehdi.com
1 ACCEPTED SOLUTION

Rising Star

The solution is to first create a table dynamically from the Avro schema, and then create a new Parquet-format table from the Avro one.

Here is the Hive code; hopefully this helps you:

CREATE TABLE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='myHost/myAvroSchema.avsc');

CREATE EXTERNAL TABLE parquet_test LIKE avro_test
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath';
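
As a quick sanity check, you can compare the schema Hive derives from the Avro schema with the schema embedded in the Parquet files themselves. This is only a minimal sketch: it assumes a Spark 2.x shell with Hive support, and reuses the placeholder path and table name from the DDL above.

// Sketch only: assumes spark-shell 2.x with Hive support enabled.
// "hdfs://myParquetFilesPath" and parquet_test are the placeholders from the DDL above.
val df = spark.read.parquet("hdfs://myParquetFilesPath")
df.printSchema()                          // schema read from the Parquet file footers
spark.sql("DESCRIBE parquet_test").show() // schema Hive built from the Avro schema
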
tazimehdi.com


10 REPLIES

Explorer

With newer versions of Spark, the sqlContext is not loaded by default; you have to create it explicitly:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6179af64

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> sqlContext.sql("describe mytable")
res2: org.apache.spark.sql.DataFrame = [col_name: string, data_type: string ... 1 more field]

I'm working with Spark 2.3.2.
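
Note that in Spark 2.x the preferred entry point is the SparkSession, which spark-shell already exposes as spark. Here is a minimal standalone sketch of the equivalent, assuming Hive support and the same mytable from the transcript above:

import org.apache.spark.sql.SparkSession

// Sketch: build a Hive-enabled session outside spark-shell.
// In spark-shell 2.x this session already exists as `spark`.
val spark = SparkSession.builder()
  .appName("describe-table")
  .enableHiveSupport()
  .getOrCreate()

// Equivalent of the sqlContext call above; mytable is the
// table name used in the transcript.
spark.sql("DESCRIBE mytable").show()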