<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: save spark-csv output to hive in HDP 2.3.2 in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99067#M12362</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/831/divyag.html" nodeid="831"&gt;@Divya Gehlot&lt;/A&gt; Could you share the code here? I would love to test it in my environment.&lt;/P&gt;</description>
    <pubDate>Mon, 14 Dec 2015 18:57:28 GMT</pubDate>
    <dc:creator>nsabharwal</dc:creator>
    <dc:date>2015-12-14T18:57:28Z</dc:date>
    <item>
      <title>save spark-csv output to hive in HDP 2.3.2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99066#M12361</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am new to Spark and am using Spark 1.4.1.&lt;/P&gt;&lt;P&gt;How can I save output to Hive as an external table?&lt;/P&gt;&lt;P&gt;For instance, I have a CSV file that I am parsing with the spark-csv package, which gives me a DataFrame.&lt;/P&gt;&lt;P&gt;Now, how do I save this DataFrame as a Hive external table using HiveContext?&lt;/P&gt;&lt;P&gt;Would really appreciate your pointers/guidance.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Divya&lt;/P&gt;</description>
      <pubDate>Mon, 14 Dec 2015 17:31:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99066#M12361</guid>
      <dc:creator>DivyaGehlot13</dc:creator>
      <dc:date>2015-12-14T17:31:30Z</dc:date>
    </item>
    <item>
      <title>Re: save spark-csv output to hive in HDP 2.3.2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99067#M12362</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/831/divyag.html" nodeid="831"&gt;@Divya Gehlot&lt;/A&gt; Could you share the code here? I would love to test it in my environment.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Dec 2015 18:57:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99067#M12362</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-12-14T18:57:28Z</dc:date>
    </item>
    <item>
      <title>Re: save spark-csv output to hive in HDP 2.3.2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99068#M12363</link>
      <description>&lt;P&gt;Hey Divya,&lt;/P&gt;&lt;P&gt;There are a couple of ways to do this, but the main flow is the same:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Load/parse the data into a DataFrame. It sounds like you have already done this, but since you didn't share that snippet I'll make up an example. You mentioned you are using the spark-csv package, so the example does the same.&lt;/LI&gt;&lt;/UL&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;PRE&gt;import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true") // use the first line of each file as the header
    .option("inferSchema", "true") // automatically infer column types
    .load("cars.csv")&lt;/PRE&gt;
&lt;/LI&gt;&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;Write the DataFrame to the HDFS location where you plan to create the Hive external table, or to the directory backing an existing Hive table. (Note: in Scala, &lt;STRONG&gt;write&lt;/STRONG&gt; is a parameterless method, so it is called without parentheses.)&lt;PRE&gt;df.select("year", "model").write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("hdfs://hdfs_location/newcars.csv")&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;Create the external Hive table through a HiveContext&lt;PRE&gt;val hiveSQLContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Several other options can be passed in here for other formats, partitions, etc.
hiveSQLContext.sql("CREATE EXTERNAL TABLE cars(year INT, model STRING) STORED AS TEXTFILE LOCATION 'hdfs_location'")&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;Query the Hive table with whatever query you wish&lt;PRE&gt;// Queries are expressed in HiveQL
hiveSQLContext.sql("SELECT * FROM cars").collect().foreach(println)&lt;/PRE&gt;
&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Mon, 14 Dec 2015 21:02:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99068#M12363</guid>
      <dc:creator>jdyer</dc:creator>
      <dc:date>2015-12-14T21:02:29Z</dc:date>
    </item>
    <item>
      <title>Re: save spark-csv output to hive in HDP 2.3.2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99069#M12364</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;
&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/166/jdyer.html"&gt;@Neeraj Sabharwal&lt;/A&gt; &lt;A href="https://community.hortonworks.com/users/166/jdyer.html"&gt;@Jeremy Dyer&lt;/A&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Processing and inserting data in Hive without a schema&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;PRE&gt;
//Processing and inserting data in hive without schema 
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/tmp/cars.csv")
val selectedData = df.select("year", "model")
selectedData.write.format("orc").option("header", "true").save("/tmp/newcars_orc_cust17")
// Hit a permissions issue as user hive:
// org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode="/tmp/newcars_orc_cust17":hdfs:hdfs:drwxr-xr-x
// Fixed by updating the /tmp/newcars_orc_cust17 directory permissions
hiveContext.sql("create external table newcars_orc_ext_cust17(year string,model string) stored as orc location '/tmp/newcars_orc_cust17'")
hiveContext.sql("show tables").collect().foreach(println)
[cars_orc_ext,false]
[cars_orc_ext1,false]
[cars_orc_exte,false]
[newcars_orc_ext_cust17,false]
[sample_07,false]
[sample_08,false]
hiveContext.sql("select * from newcars_orc_ext_cust17").collect().foreach(println)
Took 1.459321 s
[2012,S]
[1997,E350]
[2015,Volt]
&lt;/PRE&gt;
&lt;P&gt;
Hive console
&lt;/P&gt;
&lt;PRE&gt;
hive&amp;gt; show tables ;
OK
cars_orc_ext
cars_orc_ext1
cars_orc_exte
newcars_orc_ext_cust17
sample_07
sample_08
Time taken: 12.185 seconds, Fetched: 6 row(s)
hive&amp;gt; select * from newcars_orc_ext_cust17 ;
OK
2012    S
1997    E350
2015    Volt
Time taken: 48.922 seconds, Fetched: 3 row(s)
&lt;/PRE&gt;
&lt;P&gt;
Now, when I try the same code with a custom schema defined, I get the errors below:
&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Processing and inserting data in Hive with a custom schema&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;PRE&gt;
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType};
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala&amp;gt; val customSchema = StructType( StructField("year", IntegerType, true),StructField("make", StringType, true),StructField("model", StringType, true),StructField("comment", StringType, true),StructField("blank", StringType, true))
&amp;lt;console&amp;gt;:24: error: overloaded method value apply with alternatives:
  (fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType &amp;lt;and&amp;gt;
  (fields: java.util.List[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType &amp;lt;and&amp;gt;
  (fields: Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
 cannot be applied to (org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField)
  val customSchema = StructType( StructField("year", IntegerType, true),StructField("make", StringType, true),StructField("model", StringType, true),StructField("comment", StringType, true),StructField("blank", StringType, true))
&lt;/PRE&gt;
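For context, the overload error above happens because StructType's apply method accepts a single Array, java.util.List, or Seq of StructFields rather than the fields as separate arguments. A minimal stdlib-only sketch with stand-in case classes (hypothetical mirrors, not Spark's real types) showing the shape that compiles:

```scala
// Stand-in case classes that mirror the relevant signatures (hypothetical,
// stdlib-only; the real types live in org.apache.spark.sql.types).
case class StructField(name: String, dataType: String, nullable: Boolean)
case class StructType(fields: Seq[StructField])

object SchemaSketch {
  def main(args: Array[String]): Unit = {
    // Passing five StructFields directly matches no overload; wrapping them
    // in Seq(...) matches the (fields: Seq[StructField]) signature.
    val customSchema = StructType(Seq(
      StructField("year", "IntegerType", nullable = true),
      StructField("make", "StringType", nullable = true),
      StructField("model", "StringType", nullable = true),
      StructField("comment", "StringType", nullable = true),
      StructField("blank", "StringType", nullable = true)
    ))
    println(customSchema.fields.map(_.name).mkString(","))
    // prints year,make,model,comment,blank
  }
}
```

With the real Spark types the same fix should apply: wrap the StructFields in Seq(...) before passing them to StructType.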
&lt;P&gt;
Any help/pointers would be appreciated.
&lt;/P&gt;
&lt;P&gt;
Thanks
&lt;/P&gt;</description>
      <pubDate>Thu, 17 Dec 2015 13:01:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99069#M12364</guid>
      <dc:creator>DivyaGehlot13</dc:creator>
      <dc:date>2015-12-17T13:01:13Z</dc:date>
    </item>
    <item>
      <title>Re: save spark-csv output to hive in HDP 2.3.2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99070#M12365</link>
      <description>&lt;P&gt;Thanks for the explanation.&lt;/P&gt;</description>
      <pubDate>Sun, 31 Jul 2016 08:44:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99070#M12365</guid>
      <dc:creator>subhrajit</dc:creator>
      <dc:date>2016-07-31T08:44:30Z</dc:date>
    </item>
    <item>
      <title>Re: save spark-csv output to hive in HDP 2.3.2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99071#M12366</link>
      <description>&lt;P&gt;StructType's apply method takes an Array of StructFields, a java.util.List of StructFields, or a Seq of StructFields, not the fields as individual arguments.&lt;/P&gt;&lt;P&gt;The simplest fix is to pass a Seq of StructFields, matching the overload (fields: Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType.&lt;/P&gt;&lt;P&gt;So:&lt;/P&gt;&lt;P&gt;val customSchema = StructType(Seq(StructField("year", IntegerType, true), StructField("make", StringType, true), StructField("model", StringType, true), StructField("comment", StringType, true), StructField("blank", StringType, true)))&lt;/P&gt;</description>
      <pubDate>Mon, 12 Feb 2018 01:07:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/save-spark-csv-output-to-hive-in-HDP-2-3-2/m-p/99071#M12366</guid>
      <dc:creator>MidwestMike</dc:creator>
      <dc:date>2018-02-12T01:07:37Z</dc:date>
    </item>
  </channel>
</rss>

