<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question create a parquet table in Hive from a dataframe in Scala, in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/create-a-parquet-table-in-Hive-from-a-dataframe-in-Scala/m-p/118106#M80889</link>
    <description>&lt;P&gt;1) Read data from a file in Hadoop into a DataFrame in Spark (Scala):&lt;/P&gt;&lt;P&gt;// sc is the SparkContext&lt;/P&gt;&lt;P&gt;val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)&lt;/P&gt;&lt;P&gt;val hadoopFileDataFrame = hiveContext.read.format("com.databricks.spark.csv").load(filePath)&lt;/P&gt;&lt;P&gt;2) Using the DataFrame schema, create a table in Hive in Parquet format and load the data from the DataFrame into the Hive table.
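&lt;/P&gt;&lt;P&gt;A minimal sketch of step 2, assuming Hive 0.13 or later (native STORED AS PARQUET support) and a temp table already registered from the DataFrame. HiveParquetSql and its methods are hypothetical helpers that only build HiveQL strings; each string would then be passed to hiveContext.sql in turn.&lt;/P&gt;&lt;PRE&gt;
```scala
// Hypothetical helper (not part of Spark or Hive): builds the HiveQL statements for a
// partitioned Parquet table and a dynamic-partition insert from a registered temp table.
// In Spark 1.x the (name, type) pairs could be taken from the DataFrame schema, e.g.
// df.schema.fields.map(f => (f.name, f.dataType.simpleString)) (assumed, not tested here).
object HiveParquetSql {
  type Col = (String, String) // (column name, Hive type)

  private def colList(cols: Seq[Col]): String =
    cols.map { case (name, tpe) => s"$name $tpe" }.mkString(", ")

  // CREATE TABLE ... PARTITIONED BY (...) STORED AS PARQUET
  def createTable(table: String, dataCols: Seq[Col], partCols: Seq[Col]): String =
    s"CREATE TABLE IF NOT EXISTS $table (${colList(dataCols)}) " +
      s"PARTITIONED BY (${colList(partCols)}) STORED AS PARQUET"

  // Dynamic partitioning: the partition columns go last in the SELECT list,
  // in the same order as in the PARTITION clause.
  def insertFrom(table: String, source: String, dataCols: Seq[Col], partCols: Seq[Col]): String = {
    val partNames = partCols.map(_._1)
    val selectList = (dataCols.map(_._1) ++ partNames).mkString(", ")
    s"INSERT OVERWRITE TABLE $table PARTITION (${partNames.mkString(", ")}) " +
      s"SELECT $selectList FROM $source"
  }

  // Hive session settings needed before a fully dynamic partition insert.
  val dynamicPartitionSettings: Seq[String] = Seq(
    "SET hive.exec.dynamic.partition=true",
    "SET hive.exec.dynamic.partition.mode=nonstrict"
  )
}
```
&lt;/PRE&gt;&lt;P&gt;For example, with EMP_ID, Org_Skill and EMP_Name as data columns and Organisation as the partition column, running dynamicPartitionSettings, then createTable("test", ...), then insertFrom("test", "temp", ...) through hiveContext.sql would write one Parquet partition per Organisation value. Listing the partition column last in the SELECT is what lets Hive route each row to its partition.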
 &lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;&lt;STRONG&gt;Issue 1: &lt;/STRONG&gt;I added a dependency on parquet-hive-bundle-1.6.0.jar in pom.xml and used the following code:&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;val query = "CREATE TABLE Test (EMP_ID string, Organisation string, Org_Skill string, EMP_Name string) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat' TBLPROPERTIES ('PARQUET.COMPRESS'='SNAPPY')"&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;val dataFrame = hiveContext.sql(query)&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;The code hangs. The table is created, but I am unable to run a SELECT query on it.
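&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;A possible alternative for Issue 1, assuming Hive 0.13 or later (where Parquet is a built-in storage format), is to declare the table with STORED AS PARQUET instead of the parquet-hive-bundle SerDe and input/output format classes. It is an assumption that this also avoids the hang. NativeParquetDdl is a hypothetical name, and the original TBLPROPERTIES clause is left out of this sketch.&lt;/P&gt;&lt;PRE&gt;
```scala
// Sketch assuming Hive 0.13 or later, where STORED AS PARQUET is built in, so the
// parquet-hive-bundle SerDe and Deprecated*Format classes are not referenced.
// NativeParquetDdl is a hypothetical name used only for this example.
object NativeParquetDdl {
  val createTest: String =
    "CREATE TABLE IF NOT EXISTS Test " +
      "(EMP_ID string, Organisation string, Org_Skill string, EMP_Name string) " +
      "STORED AS PARQUET"
  // In the original program this would be run as: hiveContext.sql(NativeParquetDdl.createTest)
}
```
&lt;/PRE&gt;&lt;P style="margin-left: 20px;"&gt;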
&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;&lt;STRONG&gt;Issue 2: &lt;/STRONG&gt;&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;hadoopFileDataFrame.registerTempTable("temp")
&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;val query = "CREATE TABLE TEST AS SELECT * FROM TEMP"&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;hiveContext.sql(query)&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;val dataFrame = hiveContext.sql("select * from test")&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;dataFrame.show()&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;&lt;STRONG&gt;Note: &lt;/STRONG&gt;According to the console logs, the data is loaded from the DataFrame into the Hive table successfully. But when I check the Hive table with the same SELECT statement, the table contains no data. What is causing this?&lt;/P&gt;&lt;P style="margin-left: 40px;"&gt;&lt;STRONG&gt;How can I copy the data from a DataFrame into a Hive table stored as Parquet, with dynamic partitioning, and make sure the data is actually written to the Hive table?&lt;/STRONG&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 07 Jul 2016 23:02:38 GMT</pubDate>
    <dc:creator>guptaneha_er</dc:creator>
    <dc:date>2016-07-07T23:02:38Z</dc:date>
  </channel>
</rss>

