Created 12-10-2015 01:02 PM
Hello Experts!
We are looking for a way to create an external Hive table that reads data from Parquet files according to a Parquet/Avro schema.
In other words, how can we generate a Hive table from a Parquet/Avro schema?
Thanks 🙂
Created 02-02-2016 03:08 PM
The solution is to dynamically create a table from the Avro schema, and then create a new Parquet-format table from the Avro one.
Here is the Hive code; hope this helps you:
CREATE TABLE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='myHost/myAvroSchema.avsc');

CREATE EXTERNAL TABLE parquet_test LIKE avro_test
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath';
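If you want to sanity-check the result from Spark, here is a minimal sketch (assuming a SparkSession named spark built with .enableHiveSupport(); the table name and path are just the placeholders from the statements above):

// Read the Parquet files directly and through the new Hive table,
// then compare: both schemas should show the Avro-derived columns.
val fromFiles = spark.read.parquet("hdfs://myParquetFilesPath")
val fromTable = spark.table("parquet_test")

fromFiles.printSchema()
fromTable.printSchema()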
Created 02-19-2020 10:49 PM
With newer versions of Spark, the sqlContext is not loaded by default; you have to create it explicitly:
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6179af64
scala> import sqlContext.implicits._
import sqlContext.implicits._
scala> sqlContext.sql("describe mytable")
res2: org.apache.spark.sql.DataFrame = [col_name: string, data_type: string ... 1 more field]
I'm working with Spark 2.3.2.
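For reference, the deprecation warning above appears because SQLContext was superseded by SparkSession in Spark 2.x. A minimal sketch of the newer idiom ("mytable" is the placeholder table name from the transcript; in spark-shell a session is already available as spark, so the builder step can be skipped there):

import org.apache.spark.sql.SparkSession

// Build a session with Hive support so tables in the Hive metastore
// are visible; in spark-shell this is done for you as "spark".
val spark = SparkSession.builder()
  .appName("describe-table")
  .enableHiveSupport()
  .getOrCreate()

// Equivalent of sqlContext.sql("describe mytable") above.
spark.sql("describe mytable").show(truncate = false)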