<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Insert a new column with value based on file title - HDFS in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Insert-a-new-column-with-value-based-on-file-title-HDFS/m-p/135496#M98155</link>
    <description>&lt;P&gt;Get the date from the file name&lt;/P&gt;&lt;P&gt;There are ways to get at the file name in MapReduce, but it's difficult, because MapReduce by design abstracts file names away. You have a few options:&lt;/P&gt;&lt;P&gt;1) Use a small preprocessing script (Python, Java, shell, whatever) OUTSIDE Hadoop that adds a field with the date, taken from the file name, to each row of each file. Easy, but not very scalable.&lt;/P&gt;&lt;P&gt;2) Write your own RecordReader.&lt;/P&gt;&lt;P&gt;3) Pig seems to provide an option called tagsource that can do the same:&lt;/P&gt;&lt;P&gt;&lt;A href="http://stackoverflow.com/questions/9751480/how-can-i-incorporate-the-current-input-filename-into-my-pig-latin-script"&gt;http://stackoverflow.com/questions/9751480/how-can-i-incorporate-the-current-input-filename-into-my-pig-latin-script&lt;/A&gt;&lt;/P&gt;&lt;P&gt;4) Hive has a hidden (virtual) column for the file name, INPUT__FILE__NAME, so you could use that to compute a date column:&lt;/P&gt;&lt;P&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 13 Jun 2016 20:29:45 GMT</pubDate>
    <dc:creator>bleonhardi</dc:creator>
    <dc:date>2016-06-13T20:29:45Z</dc:date>
    <item>
      <title>Insert a new column with value based on file title - HDFS</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Insert-a-new-column-with-value-based-on-file-title-HDFS/m-p/135495#M98154</link>
      <description>&lt;P&gt;Hi,

I have multiple files (in HDFS) with the same schema, and I want to combine all of them into a single Hive table. Each file represents a date, but that information appears only in the file name.

What is the best way to insert the file name (the date) as a new column in these files? Java? NiFi?

Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 20:13:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Insert-a-new-column-with-value-based-on-file-title-HDFS/m-p/135495#M98154</guid>
      <dc:creator>prodgers125</dc:creator>
      <dc:date>2016-06-13T20:13:04Z</dc:date>
    </item>
    <item>
      <title>Re: Insert a new column with value based on file title - HDFS</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Insert-a-new-column-with-value-based-on-file-title-HDFS/m-p/135496#M98155</link>
      <description>&lt;P&gt;Get the date from the file name&lt;/P&gt;&lt;P&gt;There are ways to get at the file name in MapReduce, but it's difficult, because MapReduce by design abstracts file names away. You have a few options:&lt;/P&gt;&lt;P&gt;1) Use a small preprocessing script (Python, Java, shell, whatever) OUTSIDE Hadoop that adds a field with the date, taken from the file name, to each row of each file. Easy, but not very scalable.&lt;/P&gt;&lt;P&gt;2) Write your own RecordReader.&lt;/P&gt;&lt;P&gt;3) Pig seems to provide an option called tagsource that can do the same:&lt;/P&gt;&lt;P&gt;&lt;A href="http://stackoverflow.com/questions/9751480/how-can-i-incorporate-the-current-input-filename-into-my-pig-latin-script"&gt;http://stackoverflow.com/questions/9751480/how-can-i-incorporate-the-current-input-filename-into-my-pig-latin-script&lt;/A&gt;&lt;/P&gt;&lt;P&gt;4) Hive has a hidden (virtual) column for the file name, INPUT__FILE__NAME, so you could use that to compute a date column:&lt;/P&gt;&lt;P&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns&lt;/A&gt;&lt;/P&gt;</description>
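      <content:encoded><![CDATA[
Option 1 in the reply (a preprocessing script run outside Hadoop) could look roughly like the sketch below. It is an illustration under stated assumptions, not the poster's code: it assumes delimited text files whose names contain a date in YYYY-MM-DD form (for example a hypothetical "sales_2016-06-13.csv"), and the helper names date_from_filename and add_date_column are made up for this example.

```python
import csv
import re
from pathlib import Path

# Assumption: the file name contains a date written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2})")


def date_from_filename(path):
    """Return the first YYYY-MM-DD string found in the file name."""
    match = DATE_PATTERN.search(Path(path).name)
    if match is None:
        raise ValueError(f"no date found in file name: {path}")
    return match.group(1)


def add_date_column(src, dst, delimiter=","):
    """Copy src to dst, appending the date from src's name to every row."""
    file_date = date_from_filename(src)
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter=delimiter)
        writer = csv.writer(fout, delimiter=delimiter)
        for row in reader:
            # The new column goes last so existing column positions stay put.
            writer.writerow(row + [file_date])
```

The rewritten files can then be uploaded to HDFS and loaded into one table with an extra date column. As the reply notes, this runs on a single machine, so it suits modest data volumes better than very large ones.
]]></content:encoded>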
      <pubDate>Mon, 13 Jun 2016 20:29:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Insert-a-new-column-with-value-based-on-file-title-HDFS/m-p/135496#M98155</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-06-13T20:29:45Z</dc:date>
    </item>
  </channel>
</rss>

