Save a dataframe to Hive without creating a custom table using HiveWarehouseConnector.

As I read here, I have to create a table first, naming every column, and only then write to it.

Create newTable:

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

hive.createTable("newTable")
  .ifNotExists()
  .column("ws_sold_time_sk", "bigint")
  .column("ws_ship_date_sk", "bigint")
  .create()

Write to newTable:

df.write.format(HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "newTable")
  .save()

How can I automatically create a table with the same columns as the dataframe? My dataframe has many columns, and I can't write out every column one by one. Is there any way?

I tried:

// Read df from a path. Parquet files store their own schema,
// so the CSV-only options "inferSchema" and "header" are not needed.
val df = spark
    .read
    .format("parquet")
    .load(dataPath)

// Write df into newTable; this does not create the table
df.write.format(HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "newTable")
  .save()
1 REPLY

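import org.apache.spark.sql.DataFrame

// Creates a Hive table whose columns mirror the DataFrame schema, then writes the DataFrame into it.
// Assumes `hive` is the HiveWarehouseSession built as in the question.
def save_table_hwc(df: DataFrame, database: String, tableName: String): Unit = {
    hive.setDatabase(database)
    // Drop any existing table first (ifExists = true, usePurge = false)
    hive.dropTable(tableName, true, false)
    // Add one column per DataFrame field to the table builder
    val table_builder = df.schema.fields.foldLeft(hive.createTable(tableName)) { (builder, field) =>
        // Keep only letters and digits in the column name (this also removes underscores)
        val name = field.name.replaceAll("[^\\p{L}\\p{Nd}]+", "")
        builder.column(name, field.dataType.sql)
    }
    table_builder.create()
    df.write.format(HIVE_WAREHOUSE_CONNECTOR).option("table", tableName).save()
}

// Usage
save_table_hwc(df1, "default", "table_test1")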
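
To preview which columns save_table_hwc would declare before touching Hive, the same name clean-up and type rendering can be run on a schema alone. This is only a sketch for spark-shell: the schema below is a hypothetical stand-in for df.schema, and the "customer name" field is invented to show what the regex does to a space.

```scala
import org.apache.spark.sql.types._

// Hypothetical schema standing in for df.schema
val schema = StructType(Seq(
  StructField("ws_sold_time_sk", LongType),
  StructField("ws_ship_date_sk", LongType),
  StructField("customer name", StringType)
))

// Same name clean-up and SQL type rendering as in save_table_hwc
val columns = schema.fields.map { f =>
  (f.name.replaceAll("[^\\p{L}\\p{Nd}]+", ""), f.dataType.sql)
}.toSeq

// Print one "name type" pair per line for inspection
columns.foreach { case (name, sqlType) => println(s"$name $sqlType") }
```

Note that the regex strips underscores as well as spaces, so ws_sold_time_sk is declared as wssoldtimesk. If the Hive table should keep the original names, adding `_` to the character class (an adjustment not in the reply above) would preserve them.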