I am loading a JSON file with Spark in order to insert it into Hive, and this works very well.
Dataset<Row> testjson = spark.read().json("file:///root/test.json").withColumn("timestamp", new Column("timestamp").cast("timestamp"));
My JSON looks like this (simplified for readability):
"name": "joe", "age": 30, "hair": "brown", "knowledge": {"java": "average", "php": "good"},
and so on...
As you can see, this JSON contains a nested object that is inserted as a struct by default:
"knowledge": {"java": "average", "php": "good"}
Now to the problem: I want the knowledge part of my JSON to be inserted into Hive as map<string, string> instead of what it is now, struct<java:string,php:string>. I thought I could do it like this:
.withColumn("knowledge", new Column("knowledge").cast("Map")); // "Map", "Map<String, String>" or similar
However, this does not work, because a struct cannot be cast to a map.
This has been bothering me for a while now and I cannot find a solution. I would really appreciate any help!
Here is the whole code:
public static void main(String[] args) {
SparkSession spark = SparkSession
.builder()
.appName("Test Spark")
.master("local[*]")
.config("hive.metastore.uris", "thrift://localhost:9083")
.enableHiveSupport()
.getOrCreate();
// Here I cast the timestamp field in my JSON to a Hive timestamp; this works fine
Dataset<Row> testjson = spark.read().json("file:///root/test.json").withColumn("timestamp", new Column("timestamp").cast("timestamp"));
testjson.createOrReplaceTempView("testjson");
Dataset<Row> showAll = spark.sql("SELECT * FROM testjson");
showAll.write().mode("overwrite").saveAsTable("finaljson");
showAll.show();
spark.stop();
}