<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: HiveContext is not reading schema of an Orcfile in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161644#M124023</link>
    <description>&lt;P&gt;I figured out what the problem was. It was the way I was creating the test data. I was under the impression that if I run the following commands:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;create table mydb.mytable1 (empno int, name VARCHAR(20), deptno int) stored as orc;

INSERT INTO mydb.mytable1(empno, name, deptno) VALUES (1, 'EMP1',100);
INSERT INTO mydb.mytable1(empno, name, deptno) VALUES (2, 'EMP2',50);
INSERT INTO mydb.mytable1(empno, name, deptno) VALUES (3, 'EMP3',200);&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Data would be created in the ORC format at: &lt;STRONG&gt;/apps/hive/warehouse/mydb.db/mytable1&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Turns out that's not the case. Even though I specified 'stored as orc', the INSERT statements didn't save the column names. Not sure if that's expected behavior. In any case, it all works now. Apologies for the confusion, but hopefully this will help someone in the future :-)&lt;/P&gt;</description>
    <pubDate>Thu, 04 Aug 2016 04:35:15 GMT</pubDate>
    <dc:creator>jay_ch</dc:creator>
    <dc:date>2016-08-04T04:35:15Z</dc:date>
    <item>
      <title>HiveContext is not reading schema of an Orcfile</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161640#M124019</link>
      <description>&lt;P&gt;When I run the following:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;val df1 = sqlContext.read.format("orc").load(myPath)
df1.columns.foreach(println)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The columns are printed as '_col0', '_col1', '_col2', etc., as opposed to their real names such as 'empno', 'name', 'deptno'.&lt;/P&gt;&lt;P&gt;When I run 'describe mytable1' in Hive it prints the column names correctly, but 'orcfiledump' shows _col0, _col1, _col2 as well. Do I have to specify 'schema on read' or something? If so, how do I do that in Spark/Scala?&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;hive --orcfiledump /apps/hive/warehouse/mydb.db/mytable1
.....
fieldNames:"_col0"
fieldNames:"_col1"
fieldNames:"_col2"&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;As suggested elsewhere I've added '--files' BEFORE '--jars' as follows:
&lt;/P&gt;&lt;PRE&gt;spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class xxx.xxxx.MyDriver \
  --files hive-site.xml \
  --jars datanucleus-api-jdo-3.2.6.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar \
  --name MyDriver \
  --num-executors 1 \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
   ./my-utils-1.0-SNAPSHOT.jar 
&lt;/PRE&gt;&lt;P&gt;Note: I created the table as follows:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;create table mydb.mytable1 (empno int, name VARCHAR(20), deptno int) stored as orc;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note: This is not a duplicate of &lt;A href="http://stackoverflow.com/questions/30094604/hadoop-orc-file-how-it-works-how-to-fetch-metadata"&gt;Hadoop ORC file - How it works - How to fetch metadata&lt;/A&gt;, because that answer tells me to use Hive, and I am already using HiveContext as follows:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;By the way, I am using my own hive-site.xml, which contains the following:&lt;/P&gt;&lt;PRE&gt;&amp;lt;configuration&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;hive.metastore.uris&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;thrift://sandbox.hortonworks.com:9083&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
&amp;lt;/configuration&amp;gt;
&lt;/PRE&gt;</description>
      <pubDate>Wed, 03 Aug 2016 02:26:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161640#M124019</guid>
      <dc:creator>jay_ch</dc:creator>
      <dc:date>2016-08-03T02:26:34Z</dc:date>
    </item>
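A workaround not spelled out in the thread, sketched here for completeness: since the ORC footer in this case carries only the internal names, the real names can be overlaid after loading with toDF. The path and column names below are taken from the question; everything else is an assumption (in particular, that sqlContext is an active HiveContext).

```scala
// Sketch only: rename the synthetic _col0/_col1/_col2 columns after load.
// Assumes a live HiveContext named sqlContext and the path from the question.
val myPath = "/apps/hive/warehouse/mydb.db/mytable1"

val raw = sqlContext.read.format("orc").load(myPath)

// toDF(names...) returns a new DataFrame with the given column names,
// assigned positionally, so they must match the table's declared order.
val df1 = raw.toDF("empno", "name", "deptno")

df1.columns.foreach(println)
```

This only papers over the naming; it does not fix the metadata stored in the file itself.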
    <item>
      <title>Re: HiveContext is not reading schema of an Orcfile</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161641#M124020</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/12251/chitreajay.html" nodeid="12251"&gt;@Jay Ch&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Great question. You must register your df1 as a temporary table, like:&lt;/P&gt;&lt;PRE&gt;val table = sqlContext.read.format("orc").load("/apps/hive/warehouse/yourtable")

table.registerTempTable("yourtable")&lt;/PRE&gt;&lt;P&gt;and then run:&lt;/P&gt;&lt;PRE&gt;val tester = sqlContext.sql("select * from yourtable")

tester.columns
&lt;/PRE&gt;&lt;P&gt;You'll get the actual column names.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 04:50:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161641#M124020</guid>
      <dc:creator>RyanCicak</dc:creator>
      <dc:date>2016-08-03T04:50:10Z</dc:date>
    </item>
    <item>
      <title>Re: HiveContext is not reading schema of an Orcfile</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161642#M124021</link>
      <description>&lt;P&gt;Not sure I understand the answer. Do I need to run "select * from yourtable" to get the column names populated? Perhaps "select * from yourtable limit 1". I can try this, but shouldn't the column names be populated from the Metastore as soon as I do a 'load'?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 06:48:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161642#M124021</guid>
      <dc:creator>jay_ch</dc:creator>
      <dc:date>2016-08-03T06:48:19Z</dc:date>
    </item>
    <item>
      <title>Re: HiveContext is not reading schema of an Orcfile</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161643#M124022</link>
      <description>&lt;P&gt;Tried it, but it doesn't work. The columns are still '_col0', '_col1', '_col2'.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 09:21:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161643#M124022</guid>
      <dc:creator>jay_ch</dc:creator>
      <dc:date>2016-08-03T09:21:41Z</dc:date>
    </item>
    <item>
      <title>Re: HiveContext is not reading schema of an Orcfile</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161644#M124023</link>
      <description>&lt;P&gt;I figured out what the problem was. It was the way I was creating the test data. I was under the impression that if I run the following commands:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;create table mydb.mytable1 (empno int, name VARCHAR(20), deptno int) stored as orc;

INSERT INTO mydb.mytable1(empno, name, deptno) VALUES (1, 'EMP1',100);
INSERT INTO mydb.mytable1(empno, name, deptno) VALUES (2, 'EMP2',50);
INSERT INTO mydb.mytable1(empno, name, deptno) VALUES (3, 'EMP3',200);&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Data would be created in the ORC format at: &lt;STRONG&gt;/apps/hive/warehouse/mydb.db/mytable1&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Turns out that's not the case. Even though I specified 'stored as orc', the INSERT statements didn't save the column names. Not sure if that's expected behavior. In any case, it all works now. Apologies for the confusion, but hopefully this will help someone in the future :-)&lt;/P&gt;</description>
      <pubDate>Thu, 04 Aug 2016 04:35:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161644#M124023</guid>
      <dc:creator>jay_ch</dc:creator>
      <dc:date>2016-08-04T04:35:15Z</dc:date>
    </item>
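The root cause described above can be observed directly by reading the same data both ways. This is a sketch, not code from the thread; the table and path come from the original question, and a live HiveContext bound to the same metastore is assumed.

```scala
// Sketch: compare the schema in the ORC file footer (path-based read)
// with the schema in the Hive metastore (table-based read).
// Assumes sqlContext is a HiveContext on the cluster from the thread.
val fromFile = sqlContext.read.format("orc")
  .load("/apps/hive/warehouse/mydb.db/mytable1")
val fromTable = sqlContext.table("mydb.mytable1")

// Per the question, the footer carries internal names (_col0, _col1, _col2),
// while the metastore carries the declared names (empno, name, deptno):
println(fromFile.columns.mkString(", "))
println(fromTable.columns.mkString(", "))
```

If the two lines differ, the real names exist only in the metastore, which is why path-based loads cannot recover them.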
    <item>
      <title>Re: HiveContext is not reading schema of an Orcfile</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161645#M124024</link>
      <description>&lt;P&gt;I have the same issue.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 02:42:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161645#M124024</guid>
      <dc:creator>talktojulio</dc:creator>
      <dc:date>2017-01-24T02:42:01Z</dc:date>
    </item>
    <item>
      <title>Re: HiveContext is not reading schema of an Orcfile</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161646#M124025</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;You should use &lt;CODE&gt;val df = hiveContext.read.table("table_name")&lt;/CODE&gt; instead. That way, the columns are displayed properly.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Mar 2017 22:41:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HiveContext-is-not-reading-schema-of-an-Orcfile/m-p/161646#M124025</guid>
      <dc:creator>tenke_iu8</dc:creator>
      <dc:date>2017-03-01T22:41:34Z</dc:date>
    </item>
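The suggestion in the reply above can be sketched as follows, substituting the table name from the original question. This is an illustrative sketch; it assumes sqlContext is a HiveContext pointed at the metastore that knows the table.

```scala
// Sketch: read through the metastore so column names come from Hive's
// catalog instead of the ORC file footer. DataFrameReader.table and
// sqlContext.table are equivalent here.
val df = sqlContext.read.table("mydb.mytable1")

// The schema now shows the declared names (empno, name, deptno)
// rather than the footer's _col0/_col1/_col2.
df.printSchema()
```

Going through the metastore also picks up the declared types (e.g. the VARCHAR(20) on name), which a raw path-based load cannot see.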
  </channel>
</rss>

