<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark Conflicting partition schema parquet files in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Conflicting-partition-schema-parquet-files/m-p/27698#M6063</link>
    <description>&lt;P&gt;Thanks for sharing your solution!&amp;nbsp;&lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://community.cloudera.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 20 May 2015 12:35:46 GMT</pubDate>
    <dc:creator>cjervis</dc:creator>
    <dc:date>2015-05-20T12:35:46Z</dc:date>
    <item>
      <title>Spark Conflicting partition schema parquet files</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Conflicting-partition-schema-parquet-files/m-p/27690#M6061</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using Spark 1.3.1 and my data is stored in Parquet format; the Parquet files were created by Impala.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After I added a partition "server" to my partition schema (it was year,month,day and is now&lt;SPAN&gt; year,month,day,server&lt;/SPAN&gt;), Spark is having trouble reading the data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I get the following error:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;java.lang.AssertionError: assertion failed: Conflicting partition column names detected:&lt;BR /&gt;ArrayBuffer(year, month, day)&lt;BR /&gt;ArrayBuffer(year, month, day, server)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does Spark keep some data in cache/temp dirs with the old schema, which is causing a mismatch?&lt;/P&gt;&lt;P&gt;Any ideas on how to fix this issue?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;directory layout sample:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=17&lt;BR /&gt;drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=17/server=ns1&lt;BR /&gt;drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=18&lt;BR /&gt;drwxr-xr-x - impala hive 0 2015-05-19 14:02 /user/hive/queries/year=2015/month=05/day=18/server=ns1&lt;BR /&gt;drwxr-xr-x - impala hive 0 2015-05-20 09:01 /user/hive/queries/year=2015/month=05/day=19&lt;BR /&gt;drwxr-xr-x - impala hive 0 2015-05-20 09:01 /user/hive/queries/year=2015/month=05/day=19/server=ns1&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;complete stacktrace:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;java.lang.AssertionError: assertion failed: Conflicting partition column names
detected:&lt;BR /&gt;ArrayBuffer(year, month, day)&lt;BR /&gt;ArrayBuffer(year, month, day, server)&lt;BR /&gt;at scala.Predef$.assert(Predef.scala:179)&lt;BR /&gt;at org.apache.spark.sql.parquet.ParquetRelation2$.resolvePartitions(newParquet.scala:933)&lt;BR /&gt;at org.apache.spark.sql.parquet.ParquetRelation2$.parsePartitions(newParquet.scala:851)&lt;BR /&gt;at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$7.apply(newParquet.scala:311)&lt;BR /&gt;at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$7.apply(newParquet.scala:303)&lt;BR /&gt;at scala.Option.getOrElse(Option.scala:120)&lt;BR /&gt;at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:303)&lt;BR /&gt;at org.apache.spark.sql.parquet.ParquetRelation2.&amp;lt;init&amp;gt;(newParquet.scala:391)&lt;BR /&gt;at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:540)&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:19)&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:24)&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:26)&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:28)&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:30)&lt;BR /&gt;at $iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:32)&lt;BR /&gt;at $iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:34)&lt;BR /&gt;at $iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:36)&lt;BR /&gt;at &amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:38)&lt;BR /&gt;at .&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:42)&lt;BR /&gt;at .&amp;lt;clinit&amp;gt;(&amp;lt;console&amp;gt;)&lt;BR /&gt;at .&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:7)&lt;BR /&gt;at .&amp;lt;clinit&amp;gt;(&amp;lt;console&amp;gt;)&lt;BR /&gt;at $print(&amp;lt;console&amp;gt;)&lt;BR /&gt;at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:606)&lt;BR /&gt;at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)&lt;BR /&gt;at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)&lt;BR /&gt;at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)&lt;BR /&gt;at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)&lt;BR /&gt;at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)&lt;BR /&gt;at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)&lt;BR /&gt;at 
org.apache.spark.repl.Main$.main(Main.scala:31)&lt;BR /&gt;at org.apache.spark.repl.Main.main(Main.scala)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:606)&lt;BR /&gt;at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)&lt;BR /&gt;at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)&lt;BR /&gt;at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)&lt;BR /&gt;at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)&lt;BR /&gt;at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:29:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Conflicting-partition-schema-parquet-files/m-p/27690#M6061</guid>
      <dc:creator>crcerror</dc:creator>
      <dc:date>2022-09-16T09:29:47Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Conflicting partition schema parquet files</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Conflicting-partition-schema-parquet-files/m-p/27691#M6062</link>
      <description>&lt;P&gt;Found the problem.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There were some "old style" Parquet files in a hidden directory named .impala_insert_staging.&lt;/P&gt;&lt;P&gt;After removing these directories, Spark could load the data.&lt;/P&gt;&lt;P&gt;Impala will recreate the directory when I do a new insert into the table. Why there were some Parquet files left in that directory is not clear to me; it was some pretty old data, so maybe something went wrong during an insert a while ago.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 20 May 2015 09:37:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Conflicting-partition-schema-parquet-files/m-p/27691#M6062</guid>
      <dc:creator>crcerror</dc:creator>
      <dc:date>2015-05-20T09:37:37Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Conflicting partition schema parquet files</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Conflicting-partition-schema-parquet-files/m-p/27698#M6063</link>
      <description>&lt;P&gt;Thanks for sharing your solution!&amp;nbsp;&lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://community.cloudera.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 20 May 2015 12:35:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Conflicting-partition-schema-parquet-files/m-p/27698#M6063</guid>
      <dc:creator>cjervis</dc:creator>
      <dc:date>2015-05-20T12:35:46Z</dc:date>
    </item>
  </channel>
</rss>