<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question how to identify certain keywords from a flat file, row by row in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-identify-certain-keywords-from-a-flat-file-row-by-row/m-p/165181#M53741</link>
    <description>&lt;P&gt;In a flat file, I have certain sensitive keywords, and I would like to identify them row by row. These keywords could appear in any column of the flat file.&lt;/P&gt;&lt;P&gt;Any help is appreciated.&lt;/P&gt;&lt;P&gt;Either Hive or Pig is fine.&lt;/P&gt;</description>
    <pubDate>Wed, 08 Feb 2017 13:47:14 GMT</pubDate>
    <dc:creator>PentaReddy</dc:creator>
    <dc:date>2017-02-08T13:47:14Z</dc:date>
    <item>
      <title>how to identify certain keywords from a flat file, row by row</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-identify-certain-keywords-from-a-flat-file-row-by-row/m-p/165181#M53741</link>
      <description>&lt;P&gt;In a flat file, I have certain sensitive keywords, and I would like to identify them row by row. These keywords could appear in any column of the flat file.&lt;/P&gt;&lt;P&gt;Any help is appreciated.&lt;/P&gt;&lt;P&gt;Either Hive or Pig is fine.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Feb 2017 13:47:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-identify-certain-keywords-from-a-flat-file-row-by-row/m-p/165181#M53741</guid>
      <dc:creator>PentaReddy</dc:creator>
      <dc:date>2017-02-08T13:47:14Z</dc:date>
    </item>
    <item>
      <title>Re: how to identify certain keywords from a flat file, row by row</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-identify-certain-keywords-from-a-flat-file-row-by-row/m-p/165182#M53742</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/14949/reddyppr.html" nodeid="14949"&gt;@Reddy&lt;/A&gt; The easiest way in my opinion to do this is via NiFi. Ingest your file via nifi, do a split text essentailly creating flow file for each line in file.  Load your senstive keywords in the nifi distributed map cache.  Do a lookup for each value in the row against DMC (which stores your sensitive key words).  If any of the fields match the sensitive key words, you can route on text and do what ever you wish..ie store that record in a hdfs location.  You can also instead of storing indivial records (the ones which have sensitive key words) on hdfs, use mergecontent to merge x number of records into a file and then store on hdfs.&lt;/P&gt;</description>
      <pubDate>Sun, 26 Feb 2017 11:52:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-identify-certain-keywords-from-a-flat-file-row-by-row/m-p/165182#M53742</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2017-02-26T11:52:58Z</dc:date>
    </item>
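    <!--
The NiFi flow described in the answer above (SplitText, a lookup against the distributed map cache, RouteText, MergeContent) can be prototyped outside NiFi to check the matching logic first. This is a minimal Python sketch, not NiFi code. It assumes comma-delimited rows, an in-memory set of lowercase keywords standing in for the distributed map cache, and whole-field (not substring) matching. `route_rows` and `merge` are hypothetical helper names.

```python
import csv


def route_rows(text: str, sensitive: set[str], delimiter: str = ",") -> tuple[list[str], list[str]]:
    """Split text into lines (like SplitText) and route each line to
    'matched' if any field equals a sensitive keyword, else 'unmatched'."""
    matched: list[str] = []
    unmatched: list[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = next(csv.reader([line], delimiter=delimiter))
        # A set lookup stands in for the distributed map cache lookup.
        # Keywords in `sensitive` are assumed to be lowercase.
        if any(field.strip().lower() in sensitive for field in fields):
            matched.append(line)
        else:
            unmatched.append(line)
    return matched, unmatched


def merge(lines: list[str], batch_size: int) -> list[str]:
    """Group routed lines into files of at most batch_size records (like MergeContent)."""
    return ["\n".join(lines[i:i + batch_size]) for i in range(0, len(lines), batch_size)]
```

Each string returned by `merge` corresponds to one file that the real flow would write to HDFS.
    -->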
    <item>
      <title>Re: how to identify certain keywords from a flat file, row by row</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-identify-certain-keywords-from-a-flat-file-row-by-row/m-p/165183#M53743</link>
      <description>&lt;P&gt;For Pig and Hive implementations, I'd suggest you create a UDF. If this is new territory for you, here are some quick blog posts on creating (simple) UDFs for Pig and Hive: &lt;A href="https://martin.atlassian.net/wiki/x/C4BRAQ" target="_blank"&gt;https://martin.atlassian.net/wiki/x/C4BRAQ&lt;/A&gt; and &lt;A href="https://martin.atlassian.net/wiki/x/GoBRAQ" target="_blank"&gt;https://martin.atlassian.net/wiki/x/GoBRAQ&lt;/A&gt;. Good luck!&lt;/P&gt;</description>
      <pubDate>Mon, 27 Feb 2017 02:18:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/how-to-identify-certain-keywords-from-a-flat-file-row-by-row/m-p/165183#M53743</guid>
      <dc:creator>LesterMartin</dc:creator>
      <dc:date>2017-02-27T02:18:25Z</dc:date>
    </item>
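    <!--
As an alternative to the Java UDFs linked in the answer above, Hive can also stream rows through an external script with TRANSFORM, for example `ADD FILE flag_sensitive.py;` followed by `SELECT TRANSFORM (col1, col2) USING 'python3 flag_sensitive.py' AS (col1, col2, matched) FROM t;`. This swaps in the TRANSFORM technique rather than a UDF. It is a minimal sketch that assumes Hive's default tab-separated TRANSFORM rows on stdin and a hard-coded keyword set. `flag_sensitive.py`, `t`, `col1`, `col2`, `matched`, `SENSITIVE`, and `flag_rows` are hypothetical names.

```python
import sys
from typing import Iterable, Iterator

# Hypothetical keyword list (lowercase); a real script might load it from a file shipped with ADD FILE.
SENSITIVE = {"ssn", "password", "salary"}


def flag_rows(lines: Iterable[str], keywords: set[str]) -> Iterator[str]:
    """For each tab-separated row, append a column listing the sensitive
    keywords found in any field (comma-joined, empty if none)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Case-insensitive whole-field match against the lowercase keyword set.
        hits = sorted({f.strip().lower() for f in fields} & keywords)
        yield "\t".join(fields + [",".join(hits)])


def main() -> None:
    """Entry point for Hive TRANSFORM: read rows from stdin, write flagged rows to stdout."""
    for out in flag_rows(sys.stdin, SENSITIVE):
        print(out)
```

To deploy it, the script would end with the usual `if __name__ == "__main__": main()` guard. Rows whose last column is non-empty contain at least one sensitive keyword.
    -->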
  </channel>
</rss>

