<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark 1.3.1 not pulling all hive metadata when executing query - header row of CSV datafile not ignored in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120582#M83345</link>
    <description>&lt;P&gt;It looks like this may be a bug: &lt;A href="https://issues.apache.org/jira/browse/SPARK-11374" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-11374&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Sat, 06 Feb 2016 08:29:26 GMT</pubDate>
    <dc:creator>jmeyer</dc:creator>
    <dc:date>2016-02-06T08:29:26Z</dc:date>
    <item>
      <title>Spark 1.3.1 not pulling all hive metadata when executing query - header row of CSV datafile not ignored</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120575#M83338</link>
      <description>&lt;P&gt;I am using Spark 1.3.1 to create a Hive table from a CSV file whose first row is a header row. I have set the Hive table property to skip the header row:&lt;/P&gt;&lt;P&gt; 
               TBLPROPERTIES ("skip.header.line.count"="1") &lt;/P&gt;&lt;P&gt;I confirmed with "show create table BOP" that the table property is set. When I execute "select count(*) from mytable", I get the correct count from Hue/Beeswax/Beeline. However, the same query run through Spark returns count+1 (i.e. it counts the header row as a data row). Why does Spark read the Hive metadata but still not skip the header row?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 07:14:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120575#M83338</guid>
      <dc:creator>mqazi</dc:creator>
      <dc:date>2016-02-02T07:14:35Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 1.3.1 not pulling all hive metadata when executing query - header row of CSV datafile not ignored</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120576#M83339</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2517/mqazi.html" nodeid="2517"&gt;@Maleeha Qazi&lt;/A&gt; have you tried with Spark 1.4.1 in the latest sandbox environment?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 07:18:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120576#M83339</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-02T07:18:05Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 1.3.1 not pulling all hive metadata when executing query - header row of CSV datafile not ignored</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120577#M83340</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/393/aervits.html" nodeid="393"&gt;@Artem Ervits&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Here are steps to reproduce in 1.4.1 sandbox.  Still getting the issue, too many characters for a comment.  Thoughts?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;STEPS:&lt;/P&gt;&lt;P&gt;$pyspark&lt;/P&gt;&lt;PRE&gt;sqlContext.sql("create table names(name string, age int) row format delimited fields terminated by ',' stored as textfile TBLPROPERTIES('skip.header.line.count'='1')")&lt;/PRE&gt;
&lt;PRE&gt;sqlContext.sql("LOAD DATA INPATH '/user/root/test.txt' overwrite into table names")&lt;/PRE&gt;
&lt;PRE&gt;sqlContext.sql("Select count(*) from names").show()
3&lt;/PRE&gt;
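&lt;P&gt;The Spark count is off by exactly one line, the "name,age" header. Below is a minimal pure-Python sketch of what 'skip.header.line.count'='1' asks a table reader to do, using the three lines of test.txt from this post. It is not Spark or Hive code, and parse_rows and test_txt are hypothetical names chosen for illustration:&lt;/P&gt;&lt;PRE&gt;
```python
# Hypothetical illustration in plain Python (not Spark or Hive code) of what
# TBLPROPERTIES('skip.header.line.count'='1') asks a table reader to do.
# parse_rows and test_txt are made-up names for this sketch.


def parse_rows(lines, skip_header_lines=0):
    """Turn comma-delimited lines into (name, age) rows.

    The first skip_header_lines lines are dropped; Hive applies this skip
    to the start of each data file. An age that is not all digits becomes
    None, the way the int column reads the header text "age" as null.
    """
    rows = []
    for line in lines[skip_header_lines:]:
        name, age = line.split(",")
        rows.append((name, int(age) if age.strip().isdigit() else None))
    return rows


# The three lines of test.txt from this post.
test_txt = ["name,age", "aaa,1", "bbb,2"]

print(len(parse_rows(test_txt)))                       # 3: header read as data
print(len(parse_rows(test_txt, skip_header_lines=1)))  # 2: header skipped
```
&lt;/PRE&gt;&lt;P&gt;Reading the header as data would also explain the name/null row in the Spark output: the text "age" cannot be cast to the int column, so that value comes back as null.&lt;/P&gt;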
&lt;PRE&gt;sqlContext.sql("Select * from names").show()
+-----+-----+
| name|  age|
+-----+-----+
| name| null|
|  aaa|    1|
|  bbb|    2|
+-----+-----+&lt;/PRE&gt;&lt;PRE&gt;hive&amp;gt; select count(*) from names;
2
&lt;/PRE&gt;&lt;P&gt;INPUT FILE: "test.txt", with contents:&lt;/P&gt;
&lt;PRE&gt;name,age
aaa,1
bbb,2
&lt;/PRE&gt;</description>
      <pubDate>Wed, 03 Feb 2016 06:12:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120577#M83340</guid>
      <dc:creator>mqazi</dc:creator>
      <dc:date>2016-02-03T06:12:58Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 1.3.1 not pulling all hive metadata when executing query - header row of CSV datafile not ignored</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120578#M83341</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/325/azeltov.html" nodeid="325"&gt;@azeltov&lt;/A&gt; help&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 06:22:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120578#M83341</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-03T06:22:31Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 1.3.1 not pulling all hive metadata when executing query - header row of CSV datafile not ignored</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120579#M83342</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/332/vshukla.html" nodeid="332"&gt;@vshukla&lt;/A&gt; ??&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 10:17:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120579#M83342</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-03T10:17:31Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 1.3.1 not pulling all hive metadata when executing query - header row of CSV datafile not ignored</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120580#M83343</link>
      <description>&lt;P&gt;Can you please try using HiveContext and report back? &lt;A rel="user" href="https://community.cloudera.com/users/2517/mqazi.html" nodeid="2517"&gt;@Maleeha Qazi&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 10:27:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120580#M83343</guid>
      <dc:creator>vshukla</dc:creator>
      <dc:date>2016-02-03T10:27:11Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 1.3.1 not pulling all hive metadata when executing query - header row of CSV datafile not ignored</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120581#M83344</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/332/vshukla.html" nodeid="332"&gt;@vshukla
&lt;/A&gt;I just recreated the scenario &lt;A rel="user" href="https://community.cloudera.com/users/2517/mqazi.html" nodeid="2517"&gt;@Maleeha Qazi&lt;/A&gt; mentioned using a HiveContext in both pyspark, and spark-shell with the sandbox and spark 1.4.1.  Still getting the same erroneous output that was mentioned.  I created the table using the HiveContext.  The show create table looks good in hive.  Looks like when spark-sql queries the table, its not handling the header correctly.  Not respecting the table property when querying.&lt;/P&gt;&lt;P&gt;Hive is handling the header just fine.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 13:48:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120581#M83344</guid>
      <dc:creator>jwiden</dc:creator>
      <dc:date>2016-02-03T13:48:40Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 1.3.1 not pulling all hive metadata when executing query - header row of CSV datafile not ignored</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120582#M83345</link>
      <description>&lt;P&gt;It looks like this may be a bug: &lt;A href="https://issues.apache.org/jira/browse/SPARK-11374" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-11374&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 06 Feb 2016 08:29:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-1-3-1-not-pulling-all-hive-metadata-when-executing/m-p/120582#M83345</guid>
      <dc:creator>jmeyer</dc:creator>
      <dc:date>2016-02-06T08:29:26Z</dc:date>
    </item>
  </channel>
</rss>

