<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark throws &quot;Invalid Sync&quot; Error when trying to Read an Avro File in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-throws-quot-Invalid-Sync-quot-Error-when-trying-to/m-p/236835#M85193</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/16912/manikandanjeyabal029.html" nodeid="16912"&gt;@Manikandan Jeyabal&lt;/A&gt; Please review the mailing-list thread linked below and let me know if it helps. The way the Avro file is being written may be what is causing the problem.&lt;/P&gt;&lt;P&gt;&lt;A href="http://mail-archives.apache.org/mod_mbox/avro-user/201105.mbox/%3CCA03B5F3.5891%25Matt.Pouttu-Clarke@icrossing.com%3E" target="_blank"&gt;http://mail-archives.apache.org/mod_mbox/avro-user/201105.mbox/%3CCA03B5F3.5891%25Matt.Pouttu-Clarke@icrossing.com%3E&lt;/A&gt;&lt;/P&gt;&lt;P&gt;HTH&lt;/P&gt;</description>
    <pubDate>Tue, 27 Nov 2018 03:04:58 GMT</pubDate>
    <dc:creator>falbani</dc:creator>
    <dc:date>2018-11-27T03:04:58Z</dc:date>
    <item>
      <title>Spark throws "Invalid Sync" Error when trying to Read an Avro File</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-throws-quot-Invalid-Sync-quot-Error-when-trying-to/m-p/236834#M85192</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I'm trying to write 430000 records into an Avro file, and I'm using the following Avro writer dependency:&lt;/P&gt;&lt;P&gt;&amp;lt;&lt;STRONG&gt;dependency&lt;/STRONG&gt;&amp;gt;&lt;BR /&gt;
 &amp;lt;&lt;STRONG&gt;groupId&lt;/STRONG&gt;&amp;gt;org.apache.parquet&amp;lt;/&lt;STRONG&gt;groupId&lt;/STRONG&gt;&amp;gt;&lt;BR /&gt;
 &amp;lt;&lt;STRONG&gt;artifactId&lt;/STRONG&gt;&amp;gt;parquet-avro&amp;lt;/&lt;STRONG&gt;artifactId&lt;/STRONG&gt;&amp;gt;&lt;BR /&gt;
 &amp;lt;&lt;STRONG&gt;version&lt;/STRONG&gt;&amp;gt;1.9.0&amp;lt;/&lt;STRONG&gt;version&lt;/STRONG&gt;&amp;gt;&lt;BR /&gt;
&amp;lt;/&lt;STRONG&gt;dependency&lt;/STRONG&gt;&amp;gt;&lt;/P&gt;&lt;P&gt;The file was written successfully, but when I try to read the data from Spark using any Avro-supporting library, such as Databricks Avro, SparkAvro, or Apache Avro, I get the error below. (One important detail: with up to 200000 records, I can read the data without the error.)&lt;/P&gt;&lt;P&gt;18/11/26 15:48:23 ERROR TaskSetManager: Task 3 in stage 0.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 1 times, most recent failure: Lost task 3.0 in stage 0.0 (TID 3, localhost, executor driver): org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
        at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
        at org.apache.spark.sql.avro.AvroFileFormat$anonfun$buildReader$1$anon$1.hasNext(AvroFileFormat.scala:202)
        at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:409)
        at org.apache.spark.sql.execution.datasources.FileScanRDD$anon$1.hasNext(FileScanRDD.scala:101)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$anonfun$11$anon$1.hasNext(WholeStageCodegenExec.scala:619)
        at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:409)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
        at org.apache.spark.scheduler.Task.run(Task.scala:121)
        at org.apache.spark.executor.Executor$TaskRunner$anonfun$10.apply(Executor.scala:402)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Invalid sync!
        at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:297)
        at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
        ... 18 more
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$failJobAndIndependentStages(DAGScheduler.scala:1887)
  at org.apache.spark.scheduler.DAGScheduler$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
  at org.apache.spark.scheduler.DAGScheduler$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
  at org.apache.spark.scheduler.DAGScheduler$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
  at org.apache.spark.scheduler.DAGScheduler$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
  at org.apache.spark.util.EventLoop$anon$1.run(EventLoop.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
  at org.apache.spark.rdd.RDD$anonfun$collect$1.apply(RDD.scala:945)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
  at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:299)
  at org.apache.spark.sql.Dataset$anonfun$count$1.apply(Dataset.scala:2831)
  at org.apache.spark.sql.Dataset$anonfun$count$1.apply(Dataset.scala:2830)
  at org.apache.spark.sql.Dataset$anonfun$53.apply(Dataset.scala:3365)
  at org.apache.spark.sql.execution.SQLExecution$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
  at org.apache.spark.sql.Dataset.count(Dataset.scala:2830)
  ... 49 elided
Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
  at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
  at org.apache.spark.sql.avro.AvroFileFormat$anonfun$buildReader$1$anon$1.hasNext(AvroFileFormat.scala:202)
  at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$anon$1.hasNext(FileScanRDD.scala:101)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$anonfun$11$anon$1.hasNext(WholeStageCodegenExec.scala:619)
  at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
  at org.apache.spark.scheduler.Task.run(Task.scala:121)
  at org.apache.spark.executor.Executor$TaskRunner$anonfun$10.apply(Executor.scala:402)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Invalid sync!
  at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:297)
  at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
  ... 18 more&lt;/P&gt;&lt;P&gt;Even from Hive, I can run a select * from query, but I cannot run a count(*) operation.&lt;/P&gt;&lt;P&gt;Please help me out with this issue.&lt;/P&gt;&lt;P&gt;Cheers&lt;/P&gt;&lt;P&gt;MJ&lt;/P&gt;&lt;P&gt;(+91) - 9688 514 443&lt;/P&gt;</description>
      <pubDate>Mon, 26 Nov 2018 18:20:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-throws-quot-Invalid-Sync-quot-Error-when-trying-to/m-p/236834#M85192</guid>
      <dc:creator>manikandanjeyab</dc:creator>
      <dc:date>2018-11-26T18:20:10Z</dc:date>
    </item>
    <item>
      <title>Re: Spark throws "Invalid Sync" Error when trying to Read an Avro File</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-throws-quot-Invalid-Sync-quot-Error-when-trying-to/m-p/236835#M85193</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/16912/manikandanjeyabal029.html" nodeid="16912"&gt;@Manikandan Jeyabal&lt;/A&gt; Please review the mailing-list thread linked below and let me know if it helps. The way the Avro file is being written may be what is causing the problem.&lt;/P&gt;&lt;P&gt;&lt;A href="http://mail-archives.apache.org/mod_mbox/avro-user/201105.mbox/%3CCA03B5F3.5891%25Matt.Pouttu-Clarke@icrossing.com%3E" target="_blank"&gt;http://mail-archives.apache.org/mod_mbox/avro-user/201105.mbox/%3CCA03B5F3.5891%25Matt.Pouttu-Clarke@icrossing.com%3E&lt;/A&gt;&lt;/P&gt;&lt;P&gt;HTH&lt;/P&gt;</description>
      <pubDate>Tue, 27 Nov 2018 03:04:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-throws-quot-Invalid-Sync-quot-Error-when-trying-to/m-p/236835#M85193</guid>
      <dc:creator>falbani</dc:creator>
      <dc:date>2018-11-27T03:04:58Z</dc:date>
    </item>
    <item>
      <title>Re: Spark throws "Invalid Sync" Error when trying to Read an Avro File</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-throws-quot-Invalid-Sync-quot-Error-when-trying-to/m-p/236836#M85194</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/11048/falbani.html" nodeid="11048"&gt;@Felix Albani&lt;/A&gt;, thanks for your response.&lt;/P&gt;&lt;P&gt;I referred to this site, and I think my issue is related to the one described there.&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;MJ&lt;/P&gt;</description>
      <pubDate>Tue, 27 Nov 2018 16:10:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-throws-quot-Invalid-Sync-quot-Error-when-trying-to/m-p/236836#M85194</guid>
      <dc:creator>manikandanjeyab</dc:creator>
      <dc:date>2018-11-27T16:10:11Z</dc:date>
    </item>
  </channel>
</rss>

