<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question [Hive 3.1.0] Missing lines when requesting ORC external table in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-3-1-0-Missing-lines-when-requesting-ORC-external-table/m-p/293533#M216744</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;We are facing a strange issue with an external ORC Hive table: some rows can't be retrieved.&lt;/P&gt;
&lt;P&gt;The context is HDP 3.1.0. The ORC files are generated by a Spark 2.3.2 job into a partitioned HDFS directory, and an external table is mapped onto that directory. The table partitions have been added.&lt;/P&gt;
&lt;P&gt;When filtering this table on a specific column value, Hive returns no result; but when the same files are read from the Spark interpreter of a Zeppelin notebook, the filter shows the expected rows.&lt;/P&gt;
&lt;P&gt;Hive LLAP is not used for these queries; with LLAP the behaviour is different (the data is retrieved).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The SQL query (run from a JDBC tool or beeline) that doesn't return any result:&lt;/P&gt;
&lt;PRE&gt;select * from tracabilite.tg1pivot where sscc="330232926251080606" and `application`="LOGUSI" and month="04" and day="01";&lt;/PRE&gt;
&lt;P&gt;Another SQL query that returns 2 results (within the same partition):&lt;/P&gt;
&lt;PRE&gt;select * from tracabilite.tg1pivot where sscc="330232926636794272" and `application`="LOGUSI" and month="04" and day="01";&lt;/PRE&gt;
&lt;P&gt;The Zeppelin notebook sample:&lt;/P&gt;
&lt;PRE&gt;&lt;EM&gt;val logusi = spark.read.format("orc").option("header",true).load("/DEV/smart_data/TG/application=LOGUSI/year=2020/month=04/day=01")&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;println(logusi.count)&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;logusi.where(col("sscc")==="330232926251080606").show()&lt;/EM&gt;&lt;/PRE&gt;
&lt;P&gt;Please help us find out why some rows are not readable from Hive, and how to fix it.&lt;/P&gt;
&lt;P&gt;Do you have any leads to follow, Hive/Tez parameters to check, or known bugs to point us to?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;
&lt;P&gt;Olivier&lt;/P&gt;</description>
    <pubDate>Wed, 08 Apr 2020 18:39:55 GMT</pubDate>
    <dc:creator>olivier_drouin_</dc:creator>
    <dc:date>2020-04-08T18:39:55Z</dc:date>
    <item>
      <title>[Hive 3.1.0] Missing lines when requesting ORC external table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-3-1-0-Missing-lines-when-requesting-ORC-external-table/m-p/293533#M216744</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;We are facing a strange issue with an external ORC Hive table: some rows can't be retrieved.&lt;/P&gt;
&lt;P&gt;The context is HDP 3.1.0. The ORC files are generated by a Spark 2.3.2 job into a partitioned HDFS directory, and an external table is mapped onto that directory. The table partitions have been added.&lt;/P&gt;
&lt;P&gt;When filtering this table on a specific column value, Hive returns no result; but when the same files are read from the Spark interpreter of a Zeppelin notebook, the filter shows the expected rows.&lt;/P&gt;
&lt;P&gt;Hive LLAP is not used for these queries; with LLAP the behaviour is different (the data is retrieved).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The SQL query (run from a JDBC tool or beeline) that doesn't return any result:&lt;/P&gt;
&lt;PRE&gt;select * from tracabilite.tg1pivot where sscc="330232926251080606" and `application`="LOGUSI" and month="04" and day="01";&lt;/PRE&gt;
&lt;P&gt;Another SQL query that returns 2 results (within the same partition):&lt;/P&gt;
&lt;PRE&gt;select * from tracabilite.tg1pivot where sscc="330232926636794272" and `application`="LOGUSI" and month="04" and day="01";&lt;/PRE&gt;
&lt;P&gt;The Zeppelin notebook sample:&lt;/P&gt;
&lt;PRE&gt;&lt;EM&gt;val logusi = spark.read.format("orc").option("header",true).load("/DEV/smart_data/TG/application=LOGUSI/year=2020/month=04/day=01")&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;println(logusi.count)&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;logusi.where(col("sscc")==="330232926251080606").show()&lt;/EM&gt;&lt;/PRE&gt;
&lt;P&gt;Please help us find out why some rows are not readable from Hive, and how to fix it.&lt;/P&gt;
&lt;P&gt;Do you have any leads to follow, Hive/Tez parameters to check, or known bugs to point us to?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;
&lt;P&gt;Olivier&lt;/P&gt;</description>
      <pubDate>Wed, 08 Apr 2020 18:39:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-3-1-0-Missing-lines-when-requesting-ORC-external-table/m-p/293533#M216744</guid>
      <dc:creator>olivier_drouin_</dc:creator>
      <dc:date>2020-04-08T18:39:55Z</dc:date>
    </item>
    <item>
      <title>Re: [Hive 3.1.0] Missing lines when requesting ORC external table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-3-1-0-Missing-lines-when-requesting-ORC-external-table/m-p/293569#M216760</link>
      <description>&lt;P&gt;The cause has been identified: a wrong table property, 'skip.header.line.count'='1', inherited from other CSV-format tables, causes this strange behaviour on the ORC external table: missing rows, empty counts (due to null values?), ...&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2020 06:47:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-3-1-0-Missing-lines-when-requesting-ORC-external-table/m-p/293569#M216760</guid>
      <dc:creator>olivier_drouin_</dc:creator>
      <dc:date>2020-04-09T06:47:37Z</dc:date>
    </item>
  </channel>
</rss>

