Save a dataframe to Hive without creating a custom table using HiveWarehouseConnector.
Explorer
Created ‎11-12-2018 06:01 PM
As I read here, I have to create the table first, naming every column, and only then write to it.
Create newTable:
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.createTable("newTable")
  .ifNotExists()
  .column("ws_sold_time_sk", "bigint")
  .column("ws_ship_date_sk", "bigint")
  .create()
Write to newTable:
df.write.format(HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "newTable")
  .save()
How can I create a table with the same columns as the DataFrame automatically? My DataFrame has many columns, and I can't write out every column one by one. Is there any way?
I tried:
// Read df from a path
var df = (spark
  .read
  .format("parquet")
  .option("inferSchema", "true")
  .option("header", "true")
  .load(dataPath))

// Write df into a newTable but it doesn't create the table
df.write.format(HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "newTable")
  .save()
1 REPLY
Explorer
Created ‎11-21-2018 11:58 PM
import org.apache.spark.sql.DataFrame

// Assumes `hive` (the HiveWarehouseSession) and HIVE_WAREHOUSE_CONNECTOR are in scope, as in the question.
def save_table_hwc(df: DataFrame, database: String, tableName: String): Unit = {
  hive.setDatabase(database)
  // Drop any existing table first (ifExists = true, purge = false).
  hive.dropTable(tableName, true, false)
  // Add one column per DataFrame field. Non-alphanumeric characters (including "_") are stripped from the names.
  var table_builder = hive.createTable(tableName)
  for (field <- df.schema.fields) {
    val name = field.name.replaceAll("[^\\p{L}\\p{Nd}]+", "")
    table_builder = table_builder.column(name, field.dataType.sql)
  }
  table_builder.create()
  df.write.format(HIVE_WAREHOUSE_CONNECTOR).option("table", tableName).save()
}

save_table_hwc(df1, "default", "table_test1")
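
One thing to watch with the name clean-up in the function above: the regex keeps only letters and digits, so underscores are removed too, and two different DataFrame columns can end up with the same table column name. The sketch below is only an illustration of that check, not part of HiveWarehouseConnector. ColumnSpec and its columns helper are hypothetical names, and the schema is modelled as plain (name, type) pairs so that it runs without Spark; with a real DataFrame the pairs could be taken from df.schema.fields.

```scala
// Minimal sketch, assuming the schema is given as (columnName, sqlType) pairs,
// e.g. df.schema.fields.map(f => (f.name, f.dataType.sql)) in Spark.
object ColumnSpec {
  // Same regex as in the reply: drop everything that is not a letter or digit.
  def sanitize(name: String): String =
    name.replaceAll("[^\\p{L}\\p{Nd}]+", "")

  // Hypothetical helper: returns the sanitized (name, type) pairs, or the
  // sanitized names that are empty or occur more than once.
  def columns(schema: Seq[(String, String)]): Either[Seq[String], Seq[(String, String)]] = {
    val cleaned = schema.map { case (name, sqlType) => (sanitize(name), sqlType) }
    val names = cleaned.map(_._1)
    val bad = names.filter(n => n.isEmpty || names.count(_ == n) > 1).distinct
    if (bad.isEmpty) Right(cleaned) else Left(bad)
  }
}
```

For example, ColumnSpec.sanitize("ws_sold_time_sk") gives "wssoldtimesk", and a schema containing both "a_b" and "a-b" is reported as a collision on "ab". Running a check like this before table_builder.create() would surface the problem early. If the underscores should survive, the regex would need to allow them explicitly.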
