Member since 07-18-2018 | 4 Posts | 0 Kudos Received | 0 Solutions
08-07-2018 01:37 PM
I'm using Spark with Java. I have a Dataset with the following column:
    creationDate
    15/06/2018 09:15:28
I select this column and cast it to a date:
    Dataset<Row> ds = dataframe.select(new Column("creationDate").as("mydate").cast("date"));
Then I write it with:
    ds.write().mode(mode).save(hdfsDirectory);
I also tried:
    ds.write().option("dateFormat", "dd/MM/yyyy HH:mm:ss").mode(mode).save(hdfsDirectory);
But when I look at my table, the mydate column is null. How can I write my date into my Hive table? As far as I know the default date format is yyyy-MM-dd, but my text is in dd/MM/yyyy format and I can't change it. Any suggestions? Thanks.
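A minimal sketch of the likely cause, using only the JDK so it runs without a cluster: a string such as 15/06/2018 09:15:28 only becomes a date if it is parsed with its own pattern first, and a plain cast("date") has no way to know that pattern. The class and method names below are hypothetical. In Spark 2.2 or later, `org.apache.spark.sql.functions.to_date(Column, String)` should accept the same kind of pattern, but that call is not exercised here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: illustrates the parsing step only, not the Spark job itself.
class DateCastSketch {
    // Pattern of the incoming text, taken from the question: dd/MM/yyyy HH:mm:ss
    static final DateTimeFormatter SOURCE_FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss");

    // Parses the raw text with its real pattern and returns the ISO date (yyyy-MM-dd).
    static String toIsoDate(String raw) {
        return LocalDateTime.parse(raw, SOURCE_FORMAT).toLocalDate().toString();
    }

    public static void main(String[] args) {
        // prints 2018-06-15
        System.out.println(toIsoDate("15/06/2018 09:15:28"));
    }
}
```

Under the same assumption, the select in the question would pass the pattern explicitly (for example through `to_date` with "dd/MM/yyyy HH:mm:ss") instead of relying on the cast. The "dateFormat" write option appears to apply to text-based writers such as CSV and JSON, so it probably does not affect the cast.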
Labels: Apache Spark
07-26-2018 02:12 PM
That's exactly what I did. I selected only the columns whose names match my ORC table, and the data still isn't written to those columns.
07-19-2018 06:39 AM
How can I do it?
07-18-2018 07:58 AM
I have a Hive table (STORED AS ORC) and JSON data. I write the JSON to my ORC table as follows:
    // Get the JSON strings from the RDD
    JavaRDD<String> jsonData = rdd.map(t -> t.value());
    Dataset<String> jsonSet = sparkSession.createDataset(jsonData.rdd(), org.apache.spark.sql.Encoders.STRING());
    Dataset<Row> df = sparkSession.read().json(jsonSet);
    // cols is a Column[] containing the columns I need from the JSON
    Dataset<Row> dfSelect = df.select(cols);
    dfSelect.write().format("orc").mode("append").save(path);
This code runs and inserts the data into my table, but the data does not land in the correct columns. Say my ORC table looks like this:
    colA    colB    colC    colD    colE
    ------------------------------------------------
            valueB          valueD  valueE
When I select colB, colD and colE from the JSON and insert, the values end up in the first three columns (colA, colB, colC) instead of colB, colD and colE. How can I solve this? Do I need to get the schema from the JSON dataset? Thanks.
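The symptom suggests the table reads the written files by column position rather than by name. One possible workaround, sketched here under that assumption, is to write every table column in table order and fill the ones missing from the JSON with NULL. The helper below is hypothetical and JDK-only: it just builds the expression list that could be handed to `Dataset.selectExpr(String...)`, and it assumes the missing columns are STRING, which may not match the real table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: builds a projection that matches the table's column order.
class OrcProjectionSketch {
    // For each table column, keep it if the JSON has it, else emit a typed NULL.
    // Assumption: missing columns are STRING; adjust the type to the real schema.
    static List<String> alignToTable(List<String> tableColumns, Set<String> jsonColumns) {
        List<String> exprs = new ArrayList<>();
        for (String col : tableColumns) {
            if (jsonColumns.contains(col)) {
                exprs.add(col);
            } else {
                exprs.add("CAST(NULL AS STRING) AS " + col);
            }
        }
        return exprs;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("colA", "colB", "colC", "colD", "colE");
        Set<String> json = new HashSet<>(Arrays.asList("colB", "colD", "colE"));
        // prints [CAST(NULL AS STRING) AS colA, colB, CAST(NULL AS STRING) AS colC, colD, colE]
        System.out.println(alignToTable(table, json));
    }
}
```

With such a list, the write in the question would use something like `df.selectExpr(exprs.toArray(new String[0]))` in place of `df.select(cols)`, so the output has five columns in the table's order.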
Labels: Apache Hadoop, Apache Spark