<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Read csv files from HDFS into R using fread() and grep — lost column names in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-csv-files-from-HDFS-into-R-using-fread-and-grep-lost/m-p/170599#M37335</link>
    <description>&lt;P&gt;I've been trying to read large CSV files from HDFS into R with the data.table package, since in my experience it is much faster than the rhdfs package.&lt;/P&gt;&lt;P&gt;I can read entire files successfully with the following command:&lt;/P&gt;&lt;PRE&gt;data &amp;lt;- fread("/usr/bin/hadoop fs -text /path/to/the/file.csv", fill=TRUE)
&lt;/PRE&gt;&lt;P&gt;Next, I would like to read in only the rows that contain the value "2MS-US". I tried to do this with &lt;CODE&gt;grep&lt;/CODE&gt;:&lt;/P&gt;&lt;PRE&gt;data &amp;lt;- fread("/usr/bin/hadoop fs -text /path/to/the/file.csv | grep '2MS-US'", fill=TRUE)
&lt;/PRE&gt;&lt;P&gt;This returns the correct number of rows, but the headers are lost: the columns are now named "V1", "V2", etc.&lt;/P&gt;&lt;P&gt;According to this &lt;A href="http://stackoverflow.com/questions/28602337/r-data-table-fread-using-named-colclasses-without-header-e-g-no-col-names"&gt;thread&lt;/A&gt;, the issue with losing column names when using &lt;CODE&gt;grep&lt;/CODE&gt; was resolved in data.table 1.9.6, but I am still experiencing it even though I am using 1.9.7. Any thoughts on this? Thanks!&lt;/P&gt;</description>
    <pubDate>Tue, 09 Aug 2016 22:40:33 GMT</pubDate>
    <dc:creator>jingjing_yang</dc:creator>
    <dc:date>2016-08-09T22:40:33Z</dc:date>
    <item>
      <title>Read csv files from HDFS into R using fread() and grep — lost column names</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-csv-files-from-HDFS-into-R-using-fread-and-grep-lost/m-p/170599#M37335</link>
      <description>&lt;P&gt;I've been trying to read large CSV files from HDFS into R with the data.table package, since in my experience it is much faster than the rhdfs package.&lt;/P&gt;&lt;P&gt;I can read entire files successfully with the following command:&lt;/P&gt;&lt;PRE&gt;data &amp;lt;- fread("/usr/bin/hadoop fs -text /path/to/the/file.csv", fill=TRUE)
&lt;/PRE&gt;&lt;P&gt;Next, I would like to read in only the rows that contain the value "2MS-US". I tried to do this with &lt;CODE&gt;grep&lt;/CODE&gt;:&lt;/P&gt;&lt;PRE&gt;data &amp;lt;- fread("/usr/bin/hadoop fs -text /path/to/the/file.csv | grep '2MS-US'", fill=TRUE)
&lt;/PRE&gt;&lt;P&gt;This returns the correct number of rows, but the headers are lost: the columns are now named "V1", "V2", etc.&lt;/P&gt;&lt;P&gt;According to this &lt;A href="http://stackoverflow.com/questions/28602337/r-data-table-fread-using-named-colclasses-without-header-e-g-no-col-names"&gt;thread&lt;/A&gt;, the issue with losing column names when using &lt;CODE&gt;grep&lt;/CODE&gt; was resolved in data.table 1.9.6, but I am still experiencing it even though I am using 1.9.7. Any thoughts on this? Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 09 Aug 2016 22:40:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-csv-files-from-HDFS-into-R-using-fread-and-grep-lost/m-p/170599#M37335</guid>
      <dc:creator>jingjing_yang</dc:creator>
      <dc:date>2016-08-09T22:40:33Z</dc:date>
    </item>
    <item>
      <title>Re: Read csv files from HDFS into R using fread() and grep — lost column names</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-csv-files-from-HDFS-into-R-using-fread-and-grep-lost/m-p/170600#M37336</link>
      <description>&lt;P&gt;I fixed the issue by using &lt;CODE&gt;sed&lt;/CODE&gt; instead:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;fread("hadoop fs -text /path/to/the/file.csv | sed -n '1p;/2MS-US/p'", fill=TRUE)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The &lt;CODE&gt;1p&lt;/CODE&gt; part prints the first line, which contains the headers, and &lt;CODE&gt;/2MS-US/p&lt;/CODE&gt; prints every line that matches the string, so the headers are kept along with the matching rows.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2016 01:39:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-csv-files-from-HDFS-into-R-using-fread-and-grep-lost/m-p/170600#M37336</guid>
      <dc:creator>jingjing_yang</dc:creator>
      <dc:date>2016-08-10T01:39:14Z</dc:date>
    </item>
  </channel>
</rss>

