<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Help with spark partition syntax (scala) in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Help-with-spark-partition-syntax-scala/m-p/190233#M152322</link>
    <description>&lt;P&gt;
	Thanks &lt;A rel="user" href="https://community.cloudera.com/users/24173/hmatta.html" nodeid="24173"&gt;@hmatta&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;Printing schema for sqlDFProdDedup:
root
 |-- time_of_event_day: date (nullable = true)
 |-- endpoint_id: integer (nullable = true)
 ...
 |-- time_of_event: integer (nullable = true)
 ...
 |-- source_file_name: string (nullable = true)
Printing schema for deviceData:
root
...
 |-- endpoint_id: integer (nullable = true)
 |-- source_file_name: string (nullable = true)
 ...
 |-- start_dt_unix: long (nullable = true)
 |-- end_dt_unix: long (nullable = true)
Printing schema for incrementalKeyed (result of joining 2 sets above):
root
 |-- source_file_name: string (nullable = true)
 |-- ingest_timestamp: timestamp (nullable = false)
 ...
 |-- endpoint_id: integer (nullable = true)
 ...
 |-- time_of_event: integer (nullable = true)
...
 |-- time_of_event_day: date (nullable = true)
&lt;/PRE&gt;</description>
    <pubDate>Thu, 12 Jul 2018 17:29:36 GMT</pubDate>
    <dc:creator>zack_riesland</dc:creator>
    <dc:date>2018-07-12T17:29:36Z</dc:date>
    <item>
      <title>Help with spark partition syntax (scala)</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Help-with-spark-partition-syntax-scala/m-p/190231#M152320</link>
      <description>&lt;P&gt;
	I have a Hive table (in the Glue metastore in AWS) like this:
&lt;/P&gt;
&lt;PRE&gt;
  CREATE EXTERNAL TABLE `events_keyed`(
  `source_file_name` string, 
  `ingest_timestamp` timestamp, 
   ...
  `time_of_event` int
  ...)
PARTITIONED BY ( 
  `time_of_event_day` date)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'my_location'
TBLPROPERTIES (
  'PARQUET.COMPRESSION'='SNAPPY', 
  'transient_lastDdlTime'='1531187782')
&lt;/PRE&gt;
&lt;P&gt;
	I want to append data to it from Spark:
&lt;/P&gt;
&lt;PRE&gt;
val deviceData = hiveContext.table(deviceDataDBName + "." + deviceDataTableName)
val incrementalKeyed = sqlDFProdDedup.join(broadcast(deviceData),
    $"prod_clean.endpoint_id" === $"$deviceDataTableName.endpoint_id"
    &amp;amp;&amp;amp; $"prod_clean.time_of_event" &amp;gt;= $"$deviceDataTableName.start_dt_unix"
      &amp;amp;&amp;amp; $"prod_clean.time_of_event" &amp;lt;= coalesce($"$deviceDataTableName.end_dt_unix"),
"inner")
.select(
    $"prod_clean.source_file_name",
    $"prod_clean.ingest_timestamp",
    ...
    $"prod_clean.time_of_event",
    ...
    $"prod_clean.time_of_event_day"
)
// this shows good data:
incrementalKeyed.show(20, false)
incrementalKeyed.repartition($"time_of_event_day")
  .write
  .partitionBy("time_of_event_day")
  .format("hive")
  .mode("append")
  .saveAsTable(outputDBName + "." + outputTableName + "_keyed")
&lt;/PRE&gt;

But this gives me a failure:

Exception encountered reading prod data:
org.apache.spark.SparkException: Requested partitioning does not match the events_keyed table:
Requested partitions:
Table partitions: time_of_event_day

What am I doing wrong? How can I get the append operation I'm attempting to work?</description>
      <pubDate>Thu, 12 Jul 2018 01:38:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Help-with-spark-partition-syntax-scala/m-p/190231#M152320</guid>
      <dc:creator>zack_riesland</dc:creator>
      <dc:date>2018-07-12T01:38:30Z</dc:date>
    </item>
    <item>
      <title>Re: Help with spark partition syntax (scala)</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Help-with-spark-partition-syntax-scala/m-p/190232#M152321</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2593/zackriesland.html" nodeid="2593"&gt;@Zack Riesland&lt;/A&gt;  Can you provide schema of sqlDFProdDedup and deviceData dataframes ?&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jul 2018 16:23:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Help-with-spark-partition-syntax-scala/m-p/190232#M152321</guid>
      <dc:creator>hmatta</dc:creator>
      <dc:date>2018-07-12T16:23:39Z</dc:date>
    </item>
    <item>
      <title>Re: Help with spark partition syntax (scala)</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Help-with-spark-partition-syntax-scala/m-p/190233#M152322</link>
      <description>&lt;P&gt;
	Thanks &lt;A rel="user" href="https://community.cloudera.com/users/24173/hmatta.html" nodeid="24173"&gt;@hmatta&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;Printing schema for sqlDFProdDedup:
root
 |-- time_of_event_day: date (nullable = true)
 |-- endpoint_id: integer (nullable = true)
 ...
 |-- time_of_event: integer (nullable = true)
 ...
 |-- source_file_name: string (nullable = true)
Printing schema for deviceData:
root
...
 |-- endpoint_id: integer (nullable = true)
 |-- source_file_name: string (nullable = true)
 ...
 |-- start_dt_unix: long (nullable = true)
 |-- end_dt_unix: long (nullable = true)
Printing schema for incrementalKeyed (result of joining 2 sets above):
root
 |-- source_file_name: string (nullable = true)
 |-- ingest_timestamp: timestamp (nullable = false)
 ...
 |-- endpoint_id: integer (nullable = true)
 ...
 |-- time_of_event: integer (nullable = true)
...
 |-- time_of_event_day: date (nullable = true)
&lt;/PRE&gt;</description>
      <pubDate>Thu, 12 Jul 2018 17:29:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Help-with-spark-partition-syntax-scala/m-p/190233#M152322</guid>
      <dc:creator>zack_riesland</dc:creator>
      <dc:date>2018-07-12T17:29:36Z</dc:date>
    </item>
    <item>
      <title>Re: Help with spark partition syntax (scala)</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Help-with-spark-partition-syntax-scala/m-p/190234#M152323</link>
      <description>&lt;P&gt;I was able to get this to work by using the insertInto() function, rather than the saveAsTable() function.&lt;/P&gt;</description>
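The accepted fix above can be sketched as follows. This is a minimal sketch, not the poster's exact code: it assumes the `spark` session, the `incrementalKeyed` DataFrame, and the `outputDBName`/`outputTableName` variables from the question, and that the target table already exists with `time_of_event_day` as its partition column.

```scala
// Sketch of the insertInto() approach, assuming the SparkSession (`spark`),
// the incrementalKeyed DataFrame, and the outputDBName / outputTableName
// variables from the question above.
import spark.implicits._

// Dynamic partitioning into an existing Hive table usually requires
// nonstrict mode:
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

// insertInto() writes into an *existing* table and takes the partitioning
// from the table's own definition, so partitionBy() is not called here.
// Columns are matched by position, not by name, so the select list must
// put the partition column (time_of_event_day) last, as the question does.
incrementalKeyed
  .repartition($"time_of_event_day")   // cluster rows by partition value
  .write
  .mode("append")
  .insertInto(outputDBName + "." + outputTableName + "_keyed")
```

The key difference from saveAsTable() is that insertInto() defers entirely to the catalog's existing table definition for partitioning, which is why the "Requested partitioning does not match" error does not arise.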
      <pubDate>Thu, 12 Jul 2018 20:58:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Help-with-spark-partition-syntax-scala/m-p/190234#M152323</guid>
      <dc:creator>zack_riesland</dc:creator>
      <dc:date>2018-07-12T20:58:39Z</dc:date>
    </item>
  </channel>
</rss>