
Java Spark: issues casting/converting a struct to a map from JSON data before inserting into Hive


I am loading a JSON file with Spark in order to insert it into Hive, and this works very well.

Dataset<Row> testjson = spark.read().json("file:///root/test.json").withColumn("timestamp", new Column("timestamp").cast("timestamp")); 

My JSON looks like this (simplified for readability):

{"name": "joe", "age": 30, "hair": "brown", "knowledge": {"java": "average", "php": "good"}, ...}

As you can see, this JSON contains a nested object, which Spark infers as a struct by default:

"knowledge": {"java": "average", "php": "good"} 

Now to the problem: I want the knowledge part of my JSON to be inserted into Hive as map<string,string> instead of the current struct<java:string,php:string>. I thought I could do it like this:

.withColumn("knowledge", new Column("knowledge").cast("Map")); // also tried "Map<String, String>" and similar

But this does not work, because a struct cannot be cast to a map.
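Since Spark refuses the struct-to-map cast, one possible workaround is to build the map column yourself from the struct's fields with functions.map, which takes alternating key and value columns. This is a minimal sketch, assuming Spark 2.2 or later (the demo uses spark.read().json on a Dataset<String>); the class StructToMap and the helper structToMap are hypothetical names, and the inline JSON record stands in for file:///root/test.json.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class: replaces a struct column with a map column
// whose keys are the struct's field names.
public class StructToMap {

    static Dataset<Row> structToMap(Dataset<Row> df, String colName) {
        StructType struct = (StructType) df.schema().apply(colName).dataType();
        List<Column> keyValues = new ArrayList<>();
        for (String field : struct.fieldNames()) {
            keyValues.add(functions.lit(field));                   // map key: the field name
            keyValues.add(functions.col(colName).getField(field)); // map value: the field value
        }
        // functions.map expects key1, value1, key2, value2, ...
        return df.withColumn(colName, functions.map(keyValues.toArray(new Column[0])));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("struct-to-map sketch")
                .master("local[*]")
                .getOrCreate();

        // In-memory stand-in for the JSON file from the question
        Dataset<String> lines = spark.createDataset(Arrays.asList(
                "{\"name\": \"joe\", \"age\": 30, \"knowledge\": {\"java\": \"average\", \"php\": \"good\"}}"),
                Encoders.STRING());
        Dataset<Row> testjson = spark.read().json(lines);

        Dataset<Row> converted = structToMap(testjson, "knowledge");
        converted.printSchema(); // knowledge is now a map column instead of a struct
        converted.show(false);
        spark.stop();
    }
}
```

Note that the keys come from the inferred struct, which covers every key seen anywhere in the file, so a record missing a key gets that key mapped to null. The struct fields should also share one type (all strings here) for the map values to line up.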

This has been bothering me for a while now and I cannot find a solution, so I would really appreciate any help!

Here is the whole code:

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TestSpark {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("Test Spark")
                .master("local[*]")
                .config("hive.metastore.uris", "thrift://localhost:9083")
                .enableHiveSupport()
                .getOrCreate();

        // Here I cast the timestamp in my JSON to a Hive timestamp; this works fine
        Dataset<Row> testjson = spark.read().json("file:///root/test.json")
                .withColumn("timestamp", new Column("timestamp").cast("timestamp"));

        testjson.createOrReplaceTempView("testjson");
        Dataset<Row> showAll = spark.sql("SELECT * FROM testjson");
        showAll.write().mode("overwrite").saveAsTable("finaljson");

        showAll.show();
        spark.stop();
    }
}
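Another way to get a map from the struct is to serialize it back to a JSON string with to_json and parse that string with from_json using a MapType schema, just before saveAsTable. This is a sketch assuming Spark 2.4 or later (to my knowledge, from_json only accepts a map as the top-level type from that version on); the class JsonRoundTrip and the method knowledgeAsMap are hypothetical names, and the inline record stands in for the real file.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.MapType;

// Hypothetical helper class: struct -> JSON string -> map<string,string>
public class JsonRoundTrip {

    // The Hive type the question asks for: map<string,string>
    static final MapType STRING_MAP =
            DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType);

    static Dataset<Row> knowledgeAsMap(Dataset<Row> df) {
        return df.withColumn("knowledge",
                functions.from_json(functions.to_json(functions.col("knowledge")), STRING_MAP));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Test Spark")
                .master("local[*]")
                .getOrCreate();

        // In-memory stand-in for file:///root/test.json
        Dataset<String> lines = spark.createDataset(Arrays.asList(
                "{\"name\": \"joe\", \"knowledge\": {\"java\": \"average\", \"php\": \"good\"}}"),
                Encoders.STRING());
        Dataset<Row> testjson = knowledgeAsMap(spark.read().json(lines));

        testjson.printSchema(); // knowledge should now be map<string,string>
        // With Hive support enabled, the table would be written as in the question:
        // testjson.write().mode("overwrite").saveAsTable("finaljson");
        spark.stop();
    }
}
```

Because to_json skips null fields by default, keys that a record does not contain should be absent from that record's map, rather than present with a null value.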

Re: Java Spark: issues casting/converting a struct to a map from JSON data before inserting into Hive


I have exactly the same problem. Does anyone know the solution?
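A way to avoid the conversion entirely is to skip schema inference and hand the JSON reader an explicit schema in which knowledge is already a MapType; the reader then parses the nested object straight into a map. This is a sketch under assumptions: the schema below (class ReadWithSchema, constant SCHEMA) is hypothetical and lists only the fields shown in the question, so any further fields in the real file must be added or they will be dropped, and timestamp is read as a string and then cast as in the original code.

```java
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ReadWithSchema {

    // Hypothetical schema: only the fields from the question's example
    static final StructType SCHEMA = new StructType()
            .add("name", DataTypes.StringType)
            .add("age", DataTypes.LongType)
            .add("hair", DataTypes.StringType)
            .add("knowledge", DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType))
            .add("timestamp", DataTypes.StringType); // cast to timestamp below

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Test Spark")
                .master("local[*]")
                .config("hive.metastore.uris", "thrift://localhost:9083")
                .enableHiveSupport()
                .getOrCreate();

        // With a user-supplied schema, "knowledge" is parsed as a map, not a struct
        Dataset<Row> testjson = spark.read().schema(SCHEMA).json("file:///root/test.json")
                .withColumn("timestamp", new Column("timestamp").cast("timestamp"));

        // The Hive table should then get a map<string,string> column
        testjson.write().mode("overwrite").saveAsTable("finaljson");
        spark.stop();
    }
}
```

The trade-off is that the schema has to be maintained by hand, but the table layout no longer depends on whatever keys happen to appear in the input file.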
