<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: SPARK, Hive : ORC does not support type conversion from STRING to VARCHAR in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137056#M99705</link>
    <description>&lt;P&gt;I suggest you read this &lt;A target="_blank" href="https://community.hortonworks.com/questions/48260/hive-string-vs-varchar-performance.html"&gt;topic&lt;/A&gt;. It might be helpful.&lt;/P&gt;</description>
    <pubDate>Mon, 24 Jul 2017 19:06:23 GMT</pubDate>
    <dc:creator>alisa_houskova</dc:creator>
    <dc:date>2017-07-24T19:06:23Z</dc:date>
    <item>
      <title>SPARK, Hive : ORC does not support type conversion from STRING to VARCHAR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137049#M99698</link>
      <description>&lt;P&gt;I used Sqoop to import data from SQL Server and stored it in Hive as an ORC file, in a data warehouse table named testtable. I read the data into a Spark dataframe, added a column to the dataframe using withColumn, and issued an ALTER statement through hiveContext.sql to add the column to the table:&lt;/P&gt;&lt;P&gt;alter table testtable add columns (PatMD5 VARCHAR(50))&lt;/P&gt;&lt;P&gt;The statement changes the table, and I then saved the dataframe with the following:&lt;/P&gt;&lt;P&gt;dataframe.write.format("orc").mode(SaveMode.Overwrite).save("testtable")&lt;/P&gt;&lt;P&gt;The data is saved as ORC, but when I query the table from Hue or Beeline I get the following error:&lt;/P&gt;&lt;P&gt;ORC does not support type conversion from STRING to VARCHAR&lt;/P&gt;&lt;P&gt;I also tried&lt;/P&gt;&lt;P&gt;alter table testtable add columns (PatMD5 STRING)&lt;/P&gt;&lt;P&gt;With that, I can still save the file as ORC but cannot query it from Hive. Can anyone help?&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;Ram&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2016 10:24:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137049#M99698</guid>
      <dc:creator>ram_pratapa</dc:creator>
      <dc:date>2016-07-20T10:24:46Z</dc:date>
    </item>
    <item>
      <title>Re: SPARK, Hive : ORC does not support type conversion from STRING to VARCHAR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137050#M99699</link>
      <description>&lt;P&gt;It seems ORC doesn't support schema evolution.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2016 12:01:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137050#M99699</guid>
      <dc:creator>rajkumar_singh</dc:creator>
      <dc:date>2016-07-20T12:01:52Z</dc:date>
    </item>
    <item>
      <title>Re: SPARK, Hive : ORC does not support type conversion from STRING to VARCHAR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137051#M99700</link>
      <description>&lt;P&gt;ORC schema evolution was added in version 1.1, which is not available in HDP 2.4 or older.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2016 19:36:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137051#M99700</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-07-20T19:36:52Z</dc:date>
    </item>
    <item>
      <title>Re: SPARK, Hive : ORC does not support type conversion from STRING to VARCHAR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137052#M99701</link>
      <description>&lt;P&gt;Can you please run SHOW CREATE TABLE testtable; and post the output? That will show us exactly how your columns and the ORC format are defined.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2016 02:49:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137052#M99701</guid>
      <dc:creator>don_jernigan</dc:creator>
      <dc:date>2016-07-21T02:49:43Z</dc:date>
    </item>
    <item>
      <title>Re: SPARK, Hive : ORC does not support type conversion from STRING to VARCHAR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137053#M99702</link>
      <description>&lt;P&gt;Hi, thank you for your reply. I will post the results. In the meantime, I followed these steps:&lt;/P&gt;&lt;P&gt;a) Loaded the data from the existing table testtable into a dataframe using HiveContext&lt;/P&gt;&lt;P&gt;b) Added a column to the dataframe using withColumn&lt;/P&gt;&lt;P&gt;c) Created a new table (testtabletmp) using Spark SQL, with the new column, stored as ORC&lt;/P&gt;&lt;P&gt;d) Saved the dataframe as ORC: dataframe.write.format("orc").save("testtabletmp")&lt;/P&gt;&lt;P&gt;With the above steps, I am able to access the table from Hive. I will post the SHOW CREATE TABLE testtable results tomorrow.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Ram&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jul 2016 04:06:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137053#M99702</guid>
      <dc:creator>ram_pratapa</dc:creator>
      <dc:date>2016-07-22T04:06:29Z</dc:date>
    </item>
    <item>
      <title>Re: SPARK, Hive : ORC does not support type conversion from STRING to VARCHAR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137054#M99703</link>
      <description>&lt;P&gt;I executed the above statement and found that the table was created with:&lt;/P&gt;&lt;P&gt;TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false',
  'last_modified_by'='hdfs',
  'last_modified_time'='1469026541',
  'numFiles'='1',
  'numRows'='-1',
  &lt;STRONG&gt;'orc.compress'='SNAPPY'&lt;/STRONG&gt;,
  'rawDataSize'='-1',
  'totalSize'='11144909',
  'transient_lastDdlTime'='1469026541')&lt;/P&gt;&lt;P&gt;I noticed that I did not provide a compression option when storing the ORC file. I then used option("compression", "snappy") while saving the file, but the compression does not appear to be working. Can you please help?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Ram&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jul 2016 23:29:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137054#M99703</guid>
      <dc:creator>ram_pratapa</dc:creator>
      <dc:date>2016-07-22T23:29:43Z</dc:date>
    </item>
    <item>
      <title>Re: SPARK, Hive : ORC does not support type conversion from STRING to VARCHAR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137055#M99704</link>
      <description>&lt;P&gt;Here are the details:&lt;/P&gt;&lt;P&gt;a) The following is the SHOW CREATE TABLE result for the table created with Spark SQL:&lt;/P&gt;&lt;P&gt;CREATE TABLE `testtabletmp1`( `person_key` bigint, `pat_last` string, `pat_first` string, `pat_dob` timestamp, `pat_zip` string, `pat_gender` string, `pat_chksum1` bigint, `pat_chksum2` bigint, `dimcreatedgmt` timestamp, `pat_mi` string, `h_keychksum` string, `patmd5` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://hdp-cent7-01:8020/apps/hive/warehouse/datawarehouse.db/testtabledimtmp1' TBLPROPERTIES ( 'orc.compress'='SNAPPY', 'transient_lastDdlTime'='1469207216')&lt;/P&gt;&lt;P&gt;b) The original table, created when we imported the data from SQL Server using Sqoop import:&lt;/P&gt;&lt;P&gt;CREATE TABLE `testtabledim`( `person_key` bigint, `pat_last` varchar(35), `pat_first` varchar(35), `pat_dob` timestamp, `pat_zip` char(5), `pat_gender` char(1), `pat_chksum1` bigint, `pat_chksum2` bigint, `dimcreatedgmt` timestamp, `pat_mi` char(1), `h_keychksum` string, `patmd5` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://hdp-cent7-01:8020/apps/hive/warehouse/datawarehouse.db/testtabledim' TBLPROPERTIES ( 'COLUMN_STATS_ACCURATE'='false', 'last_modified_by'='hdfs', 'last_modified_time'='1469026541', 'numFiles'='1', 'numRows'='-1', 'orc.compress'='SNAPPY', 'rawDataSize'='-1', 'totalSize'='11144909', 'transient_lastDdlTime'='1469026541')&lt;/P&gt;&lt;P&gt;If I use the first script with Spark SQL and store the file as ORC with Snappy compression, it works. If I store the ORC file with Snappy compression and use Hive to create the table with script a), it also works fine. But if I alter an existing table to add a new column using the Spark Hive context and save as ORC with Snappy compression, I get the following error: ORC does not support type conversion from STRING to VARCHAR. If I use the same ORC file but create the table in Hive using the second script, I still get the same error.&lt;/P&gt;&lt;P&gt;I noticed some columns are defined as VARCHAR(35), and I think those columns may be the issue.&lt;/P&gt;&lt;P&gt;After I changed the VARCHAR and CHAR columns to STRING, it worked fine. I am still investigating the best way to handle VARCHAR/CHAR types through a Spark dataframe.&lt;/P&gt;&lt;P&gt;Please let me know if you need more information.&lt;/P&gt;&lt;P&gt;Thank you for your help.&lt;/P&gt;</description>
      <pubDate>Sat, 23 Jul 2016 01:03:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137055#M99704</guid>
      <dc:creator>ram_pratapa</dc:creator>
      <dc:date>2016-07-23T01:03:44Z</dc:date>
    </item>
    <item>
      <title>Re: SPARK, Hive : ORC does not support type conversion from STRING to VARCHAR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137056#M99705</link>
      <description>&lt;P&gt;I suggest you read this &lt;A target="_blank" href="https://community.hortonworks.com/questions/48260/hive-string-vs-varchar-performance.html"&gt;topic&lt;/A&gt;. It might be helpful.&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jul 2017 19:06:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-Hive-ORC-does-not-support-type-conversion-from-STRING/m-p/137056#M99705</guid>
      <dc:creator>alisa_houskova</dc:creator>
      <dc:date>2017-07-24T19:06:23Z</dc:date>
    </item>
  </channel>
</rss>

