<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Ranger audit to HDFS creates corrupt JSON in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145076#M48385</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/11295/slachterman.html" nodeid="11295"&gt;@slachterman&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;many thanks for this hint. Could you please send me the details of the processor config that drops the lines if they are invalid?&lt;/P&gt;&lt;P&gt;Thanks and regards...&lt;/P&gt;</description>
    <pubDate>Mon, 12 Dec 2016 19:06:44 GMT</pubDate>
    <dc:creator>geko</dc:creator>
    <dc:date>2016-12-12T19:06:44Z</dc:date>
    <item>
      <title>Ranger audit to HDFS creates corrupt JSON</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145070#M48379</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I configured Ranger to write audit logs to HDFS only. Now I have directories such as:&lt;/P&gt;&lt;PRE&gt;/ranger/audit/hiveServer2/20161206
/ranger/audit/hiveServer2/20161207

...same for hdfs, hbase...&lt;/PRE&gt;&lt;P&gt;Finally, I collect all the individual files per day (from every service) into one common folder and put a Hive table on top.&lt;/P&gt;&lt;P&gt;This is similar to what is described &lt;A target="_blank" href="https://community.hortonworks.com/articles/60802/ranger-audit-in-hive-table-a-sample-approach-1.html"&gt;here in HCC&lt;/A&gt;, just extended by collecting all the files from the same day into a common directory that the partition points to.&lt;/P&gt;&lt;P&gt;Unfortunately, the HiveQL select statement fails with a JSON parse error because some of the created log files contain invalid JSON: the last line is simply cut off, for example:&lt;/P&gt;&lt;PRE&gt;hdfs dfs -cat /ranger/audit/hiveServer2/20161207/hiveServer2_ranger_audit_&amp;lt;hostname&amp;gt;.log

...
{"repoType":3,"repo":"hdp_hive","reqUser":"xxxxxx","evtTime":"2016-12-07 08:13:20.276","access":"SELECT","resource":"xxxxxxx","resType":"@column","action":"QUERY&lt;/PRE&gt;&lt;P&gt;but the first file from the same day looks fine:&lt;/P&gt;&lt;PRE&gt;hdfs dfs -cat /ranger/audit/hiveServer2/20161207/hiveServer2_ranger_audit_&amp;lt;hostname&amp;gt;.1.log

...

{"repoType":3,"repo":"hdp_hive","reqUser":"xxxxx","evtTime":"2016-12-07 12:16:24.474","access":"USE","resource":"xxxx","resType":"@database","action":"SWITCHDATABASE","result":1,"policy":17,"enforcer":"ranger-acl","sess":"bf9a9f2e-ee90-4784-9d82-87008ad2e7fa","cliType":"HIVESERVER2","cliIP":"xxxxxx","reqData":"USE dbname","agentHost":"xxxxxxx","logType":"RangerAudit","id":"5b0b00ed-ed60-4817-85e0-e1c629952414","seq_num":213,"event_count":1,"event_dur_ms":0}&lt;/PRE&gt;&lt;P&gt;What can cause these corrupt files? Or what can I do to be able to query the final Hive table without issues?&lt;/P&gt;&lt;P&gt;Environment: HDP 2.3.4; Ranger policies for HDFS, Hive, and HBase enabled, all configured to store audit logs in the HDFS folder "/ranger/audit".&lt;/P&gt;&lt;P&gt;Thanks for any hints...&lt;/P&gt;</description>
      <pubDate>Fri, 09 Dec 2016 00:53:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145070#M48379</guid>
      <dc:creator>geko</dc:creator>
      <dc:date>2016-12-09T00:53:28Z</dc:date>
    </item>
    <item>
      <title>Re: Ranger audit to HDFS creates corrupt JSON</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145071#M48380</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1198/koenigbodensee.html" nodeid="1198"&gt;@Gerd Koenig&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Does this happen often, or was it a one-off? Generally this would mean the writing application did not sync the data completely to HDFS. So it looks like you have an incomplete JSON record and Hive is not able to parse it.&lt;/P&gt;</description>
      <pubDate>Sat, 10 Dec 2016 02:27:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145071#M48380</guid>
      <dc:creator>aengineer</dc:creator>
      <dc:date>2016-12-10T02:27:37Z</dc:date>
    </item>
    <item>
      <title>Re: Ranger audit to HDFS creates corrupt JSON</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145072#M48381</link>
      <description>&lt;P&gt;Hi @aengineer,&lt;/P&gt;&lt;P&gt;It happens frequently. I created an Oozie job that collects the logs from the day before each night. The logs from yesterday have the same issue.&lt;/P&gt;&lt;P&gt;The Oozie job runs at 3 am; by that time the logs from the day before should have been closed correctly, I guess.&lt;/P&gt;</description>
      <pubDate>Sat, 10 Dec 2016 03:28:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145072#M48381</guid>
      <dc:creator>geko</dc:creator>
      <dc:date>2016-12-10T03:28:43Z</dc:date>
    </item>
    <item>
      <title>Re: Ranger audit to HDFS creates corrupt JSON</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145073#M48382</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/518/aengineer.html" nodeid="518"&gt;@aengineer&lt;/A&gt; I saw this consistently as well when creating &lt;A href="https://community.hortonworks.com/articles/67060/ranger-audit-analytics-with-nifi-and-zeppelin.html"&gt;this HCC article&lt;/A&gt;. It seems like the Ranger plugin doesn't always write the last record in the file completely. In the NiFi flow described in that article, I simply dropped these invalid records, as this was appropriate for the purposes of the analysis in question.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Dec 2016 09:01:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145073#M48382</guid>
      <dc:creator>slachterman</dc:creator>
      <dc:date>2016-12-12T09:01:10Z</dc:date>
    </item>
    <item>
      <title>Re: Ranger audit to HDFS creates corrupt JSON</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145074#M48383</link>
      <description>&lt;A href="https://community.hortonworks.com/questions/70583/ranger-audit-to-hdfs-creates-corrupt-json.html#"&gt;@slachterman&lt;/A&gt; Thanks for sharing your experience. &lt;A rel="user" href="https://community.cloudera.com/users/1198/koenigbodensee.html" nodeid="1198"&gt;@Gerd Koenig&lt;/A&gt;&lt;P&gt;Sorry to hear that this is happening quite often. This might be an issue in Ranger, as mentioned by &lt;A href="https://community.hortonworks.com/questions/70583/ranger-audit-to-hdfs-creates-corrupt-json.html#"&gt;@slachterman&lt;/A&gt;. If you have enough details, please feel free to open an Apache Ranger JIRA so that the Ranger team gets a chance to look at this.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Dec 2016 09:07:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145074#M48383</guid>
      <dc:creator>aengineer</dc:creator>
      <dc:date>2016-12-12T09:07:47Z</dc:date>
    </item>
    <item>
      <title>Re: Ranger audit to HDFS creates corrupt JSON</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145075#M48384</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/518/aengineer.html" nodeid="518"&gt;@aengineer&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;many thanks, I'll try to gather the necessary details and open a ticket there.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Dec 2016 19:04:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145075#M48384</guid>
      <dc:creator>geko</dc:creator>
      <dc:date>2016-12-12T19:04:34Z</dc:date>
    </item>
    <item>
      <title>Re: Ranger audit to HDFS creates corrupt JSON</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145076#M48385</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/11295/slachterman.html" nodeid="11295"&gt;@slachterman&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;many thanks for this hint. Could you please send me the details of the processor config that drops the lines if they are invalid?&lt;/P&gt;&lt;P&gt;Thanks and regards...&lt;/P&gt;</description>
      <pubDate>Mon, 12 Dec 2016 19:06:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145076#M48385</guid>
      <dc:creator>geko</dc:creator>
      <dc:date>2016-12-12T19:06:44Z</dc:date>
    </item>
    <item>
      <title>Re: Ranger audit to HDFS creates corrupt JSON</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145077#M48386</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/1198/koenigbodensee.html" nodeid="1198"&gt;@Gerd Koenig&lt;/A&gt;, please see my linked HCC article in the parent comment. The template XML is attached to that post.&lt;/P&gt;&lt;P&gt;Essentially, the ReplaceText processor will fail, so FlowFiles that contain an incomplete JSON record will get routed to the PutFile processor within the exception flow.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Dec 2016 23:54:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145077#M48386</guid>
      <dc:creator>slachterman</dc:creator>
      <dc:date>2016-12-12T23:54:18Z</dc:date>
    </item>
    <item>
      <title>Re: Ranger audit to HDFS creates corrupt JSON</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145078#M48387</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/11295/slachterman.html" nodeid="11295"&gt;@slachterman&lt;/A&gt;, that's perfect. I missed the attached XML when I first read your article &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2016 15:37:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145078#M48387</guid>
      <dc:creator>geko</dc:creator>
      <dc:date>2016-12-13T15:37:55Z</dc:date>
    </item>
    <item>
      <title>Re: Ranger audit to HDFS creates corrupt JSON</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145079#M48388</link>
      <description>&lt;P&gt;There is a solution available for this; please refer to &lt;A href="https://issues.apache.org/jira/browse/RANGER-1310" target="_blank"&gt;https://issues.apache.org/jira/browse/RANGER-1310&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2017 09:55:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Ranger-audit-to-HDFS-creates-corrupt-JSON/m-p/145079#M48388</guid>
      <dc:creator>rmani</dc:creator>
      <dc:date>2017-02-15T09:55:32Z</dc:date>
    </item>
  </channel>
</rss>

