02-08-2018 03:15 PM
I am loading a JSON file with Spark in order to insert it into Hive, and this works very well:

Dataset<Row> testjson = spark.read().json("file:///root/test.json").withColumn("timestamp", new Column("timestamp").cast("timestamp"));

My JSON looks like this (simplified for readability):

"name": "joe", "age": 30, "hair": "brown", "knowledge": {"java": "average", "php": "good"}, and so on...

As you can see, this JSON contains an object that is inserted as a struct by default:

"knowledge": {"java": "average", "php": "good"}

Now to the problem: I want the knowledge part of my JSON to be inserted into Hive as map<string, string> instead of, as it is now, struct<java:string,php:string>. I thought I could do it like this:

.withColumn("knowledge", new Column("knowledge").cast("Map")); // Map, or Map<String, String>, or similar

but this does not work, because a struct cannot be cast to a map. This has been bothering me for a while and I cannot find a solution, so I would appreciate any help.

Here is the whole code:

public static void main(String[] args) {
    SparkSession spark = SparkSession
            .builder()
            .appName("Test Spark")
            .master("local[*]")
            .config("hive.metastore.uris", "thrift://localhost:9083")
            .enableHiveSupport()
            .getOrCreate();

    // Here I cast timestamp in my JSON to timestamp in hive, working good
    Dataset<Row> testjson = spark.read()
            .json("file:///root/test.json")
            .withColumn("timestamp", new Column("timestamp").cast("timestamp"));

    testjson.createOrReplaceTempView("testjson");
    Dataset<Row> showAll = spark.sql("SELECT * FROM testjson");
    showAll.write().mode("overwrite").saveAsTable("finaljson");
    showAll.show();
    spark.stop();
}
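Since a direct cast from struct to map is rejected, one possible direction is to build the map explicitly in the SQL query that the code above already runs, using Spark SQL's built-in map(key, value, ...) function. The sketch below is plain Java and only assembles the expression string; the class name StructToMapExpr, the helper structToMapExpr, and the alias knowledge_map are hypothetical names chosen for illustration. It assumes the struct's field names are known in advance (here java and php), are simple identifiers, and all hold string values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper (not part of Spark): builds a Spark SQL map(...) expression
// from a struct column name and its field names.
class StructToMapExpr {

    // For ("knowledge", ["java", "php"]) this returns:
    // map('java', `knowledge`.`java`, 'php', `knowledge`.`php`)
    public static String structToMapExpr(String structColumn, List<String> fieldNames) {
        StringJoiner args = new StringJoiner(", ", "map(", ")");
        for (String field : fieldNames) {
            // Key: the field name as a string literal (assumes no quotes in the name).
            args.add("'" + field + "'");
            // Value: the struct field, with backtick-quoted identifiers.
            args.add("`" + structColumn + "`.`" + field + "`");
        }
        return args.toString();
    }

    public static void main(String[] args) {
        String expr = structToMapExpr("knowledge", Arrays.asList("java", "php"));
        // The resulting query text could be passed to spark.sql(...) in place of
        // "SELECT * FROM testjson"; knowledge_map is an illustrative alias.
        System.out.println("SELECT *, " + expr + " AS knowledge_map FROM testjson");
    }
}
```

If this approach fits, the generated column should come out as map<string,string> when the result is saved as a table, while the original struct column can be left out of the SELECT list. An alternative worth checking is to pass an explicit schema to the JSON reader that declares knowledge as a map type, which avoids listing the field names.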
Labels:
- Apache Hive
- Apache Spark