<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: org.apache.spark.SparkException: Task failed while writing rows. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98786#M12102</link>
    <description>&lt;P&gt;ORC is only supported in HiveContext, but here SQLContext is used.&lt;/P&gt;</description>
    <pubDate>Wed, 16 Dec 2015 04:52:33 GMT</pubDate>
    <dc:creator>zzhang</dc:creator>
    <dc:date>2015-12-16T04:52:33Z</dc:date>
    <item>
      <title>org.apache.spark.SparkException: Task failed while writing rows.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98782#M12098</link>
      <description>&lt;P&gt;
	Hi,&lt;/P&gt;&lt;P&gt;
	I am using HDP 2.3.2 with Spark 1.4.1 and am trying to insert data into a Hive table using HiveContext.&lt;/P&gt;&lt;P&gt;Below is the sample code:&lt;/P&gt;
&lt;PRE&gt;spark-shell   --master yarn-client --driver-memory 512m --executor-memory 512m
// Sample code
import org.apache.spark.sql.SQLContext
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val people = sc.textFile("/user/spark/people.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};
val schema =
  StructType(
    schemaString.split(" ").map(fieldName =&amp;gt; StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p =&amp;gt; Row(p(0), p(1).trim))
// Create the HiveContext
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Apply the schema to the RDD of Rows
val df = hiveContext.createDataFrame(rowRDD, schema);
val options = Map("path" -&amp;gt;  "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/personhivetable")
df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").options(options).saveAsTable("personhivetable")
&lt;/PRE&gt;&lt;P&gt;
	I am getting the error below:&lt;/P&gt;
&lt;PRE&gt;org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:191)
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(&amp;lt;console&amp;gt;:29)
	at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(&amp;lt;console&amp;gt;:29)
	at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:182)
	... 8 more
&lt;/PRE&gt;&lt;P&gt;Is it a configuration issue?&lt;/P&gt;&lt;P&gt;When I searched for this error, I found that an environment variable named HIVE_CONF_DIR should be set in spark-env.sh.&lt;/P&gt;&lt;P&gt;I checked spark-env.sh in HDP 2.3.2, but I couldn't find the HIVE_CONF_DIR environment variable.&lt;/P&gt;&lt;P&gt;Do I need to add this variable to insert Spark output data into Hive tables?&lt;/P&gt;&lt;P&gt;I would really appreciate any pointers.&lt;/P&gt;&lt;P&gt;
	Thanks, &lt;/P&gt;&lt;P&gt;
	Divya&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 13:46:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98782#M12098</guid>
      <dc:creator>DivyaGehlot13</dc:creator>
      <dc:date>2015-12-11T13:46:56Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Task failed while writing rows.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98783#M12099</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/831/divyag.html" nodeid="831"&gt;@Divya Gehlot&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you want the table to be accessible from Hive as well, you cannot use saveAsTable: a table created with saveAsTable can only be used by Spark SQL.&lt;/P&gt;&lt;P&gt;There are two ways to create Hive-compatible ORC tables from Spark. I tested the code below on the HDP 2.3.2 sandbox with Spark 1.4.1.&lt;/P&gt;&lt;P&gt;1- Save ORC files from Spark and create the table directly in Hive:&lt;/P&gt;&lt;PRE&gt;spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m&lt;/PRE&gt;&lt;PRE&gt;import org.apache.spark.sql._
import org.apache.spark.sql.types._

val people = sc.textFile("/tmp/people.txt")
val schemaString = "name age"
val schema =
  StructType(
    schemaString.split(" ").map(fieldName =&amp;gt; StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p =&amp;gt; Row(p(0), p(1).trim))

val df = sqlContext.createDataFrame(rowRDD, schema);

sqlContext.sql("drop table if exists personhivetable")

sqlContext.sql("create external table personhivetable (name string, age string) stored as orc location '/tmp/personhivetable/'")

df.write.format("orc").mode("overwrite").save("/tmp/personhivetable")

sqlContext.sql("show tables").collect().foreach(println);

sqlContext.sql("select * from personhivetable").collect().foreach(println);
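
// Hypothetical helper (a sketch, assuming each input line should be "name,age"):
// the question's trace shows ArrayIndexOutOfBoundsException: 1, which is most
// likely p(1) failing on a line with no comma (e.g. a blank line), because
// split(",") then returns a one-element array. This helper skips such lines.
def parsePerson(line: String): Option[(String, String)] =
  line.split(",") match {
    case Array(name, age, _*) => Some((name, age.trim)) // extra fields ignored, as with p(0)/p(1)
    case _                    => None                   // malformed line: dropped
  }

// Possible replacement for the rowRDD line above:
// val rowRDD = people.flatMap(line => parsePerson(line).toList).map { case (name, age) => Row(name, age) }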
&lt;/PRE&gt;&lt;P&gt;2- Register your DataFrame as a temporary table and run CREATE TABLE AS SELECT (CTAS):&lt;/P&gt;&lt;PRE&gt;spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m&lt;/PRE&gt;&lt;PRE&gt;import org.apache.spark.sql._
import org.apache.spark.sql.types._

val people = sc.textFile("/tmp/people.txt")
val schemaString = "name age"
val schema =
  StructType(
    schemaString.split(" ").map(fieldName =&amp;gt; StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p =&amp;gt; Row(p(0), p(1).trim))


val df = sqlContext.createDataFrame(rowRDD, schema);

df.registerTempTable("personhivetable_tmp")

sqlContext.sql("drop table if exists personhivetable2")

sqlContext.sql("CREATE TABLE personhivetable2 STORED AS ORC AS SELECT * FROM personhivetable_tmp")

sqlContext.sql("show tables").collect().foreach(println);

sqlContext.sql("select * from personhivetable2").collect().foreach(println);

&lt;/PRE&gt;&lt;P&gt;Also, see this question for more discussion of ORC with Spark:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://community.hortonworks.com/questions/4292/how-do-i-create-an-orc-hive-table-from-spark.html"&gt;https://community.hortonworks.com/questions/4292/how-do-i-create-an-orc-hive-table-from-spark.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 21:25:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98783#M12099</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-11T21:25:56Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Task failed while writing rows.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98784#M12100</link>
      <description>&lt;P&gt;Hi Guil,&lt;/P&gt;&lt;P&gt;What is the benefit of running HiveQL queries in spark-shell?&lt;/P&gt;&lt;P&gt;Can't I just set hive.execution.engine to spark and do the querying there?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 14 Dec 2015 00:49:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98784#M12100</guid>
      <dc:creator>sachin_mca25</dc:creator>
      <dc:date>2015-12-14T00:49:13Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Task failed while writing rows.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98785#M12101</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/238/gbraccialli.html" nodeid="238"&gt;@Guilherme Braccialli&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Mon, 14 Dec 2015 00:51:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98785#M12101</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-12-14T00:51:44Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Task failed while writing rows.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98786#M12102</link>
      <description>&lt;P&gt;ORC is only supported in HiveContext, but here SQLContext is used.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Dec 2015 04:52:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98786#M12102</guid>
      <dc:creator>zzhang</dc:creator>
      <dc:date>2015-12-16T04:52:33Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Task failed while writing rows.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98787#M12103</link>
      <description>&lt;P&gt;HiveQL is richer in functionality than Spark SQL. When possible, HiveContext is recommended.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Dec 2015 04:53:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98787#M12103</guid>
      <dc:creator>zzhang</dc:creator>
      <dc:date>2015-12-16T04:53:22Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Task failed while writing rows.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98788#M12104</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/824/sachin-mca25.html" nodeid="824"&gt;@sandeep agarwal&lt;/A&gt; &lt;/P&gt;&lt;P&gt;See this post about Spark vs Tez:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://community.hortonworks.com/questions/5408/spark-vs-tez.html#comment-6248"&gt;https://community.hortonworks.com/questions/5408/spark-vs-tez.html#comment-6248&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Dec 2015 10:48:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/org-apache-spark-SparkException-Task-failed-while-writing/m-p/98788#M12104</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-16T10:48:31Z</dc:date>
    </item>
  </channel>
</rss>

