Created 12-08-2016 04:53 PM
Hi,
I configured Ranger to write audit logs to HDFS only. Now I have directories such as
/ranger/audit/hiveServer2/20161206
/ranger/audit/hiveServer2/20161207
...and the same for hdfs, hbase, and so on.
In the end, I collect all the individual files per day (from every service) into one common folder and put a Hive table on top.
This is similar to what is described here in HCC, extended by collecting all the files from the same day into a common directory that the partition points to.
Unfortunately, the HiveQL SELECT statement fails with a JSON parse error because some of the log files are corrupt: they contain invalid JSON because the last line is cut off, for example:
hdfs dfs -cat /ranger/audit/hiveServer2/20161207/hiveServer2_ranger_audit_<hostname>.log
...
{"repoType":3,"repo":"hdp_hive","reqUser":"xxxxxx","evtTime":"2016-12-07 08:13:20.276","access":"SELECT","resource":"xxxxxxx","resType":"@column","action":"QUERY
but the first file from the same day looks fine:
hdfs dfs -cat /ranger/audit/hiveServer2/20161207/hiveServer2_ranger_audit_<hostname>.1.log
...
{"repoType":3,"repo":"hdp_hive","reqUser":"xxxxx","evtTime":"2016-12-07 12:16:24.474","access":"USE","resource":"xxxx","resType":"@database","action":"SWITCHDATABASE","result":1,"policy":17,"enforcer":"ranger-acl","sess":"bf9a9f2e-ee90-4784-9d82-87008ad2e7fa","cliType":"HIVESERVER2","cliIP":"xxxxxx","reqData":"USE dbname","agentHost":"xxxxxxx","logType":"RangerAudit","id":"5b0b00ed-ed60-4817-85e0-e1c629952414","seq_num":213,"event_count":1,"event_dur_ms":0}
What can cause these corrupt files? Or what can I do to be able to query the final Hive table without errors?
Environment: HDP 2.3.4, Ranger policies for HDFS, Hive, and HBase enabled, all configured to store audit logs in the HDFS folder "/ranger/audit".
Thanks for any hints...
Created 12-09-2016 06:27 PM
Does this happen often, or is it a one-off? Generally this would mean the writing application did not sync the data completely to HDFS. So it looks like you have incomplete JSON and Hive is not able to parse it.
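To check how often this occurs, one option is to test whether the last record of each file parses. The following is only a minimal Python sketch, assuming the audit files contain one JSON object per line (as in the samples above); the function name `last_line_is_complete` is made up for illustration and is not part of Ranger or Hive.

```python
import json


def last_line_is_complete(lines):
    """Return True if the last non-empty line parses as a JSON object.

    `lines` is any iterable of strings, e.g. the lines of one audit file.
    Assumption: every audit record is a single-line JSON object.
    """
    non_empty = [line for line in lines if line.strip()]
    if not non_empty:
        # An empty file has no truncated record.
        return True
    try:
        return isinstance(json.loads(non_empty[-1]), dict)
    except json.JSONDecodeError:
        # A record that was cut off mid-write ends up here.
        return False
```

The lines could come from the output of `hdfs dfs -cat` on a single audit file, captured in whatever way suits your job.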
Created 12-09-2016 07:28 PM
Hi @aengineer ,
It happens frequently. I created an Oozie job that collects the previous day's logs each night. The logs from yesterday have the same issue.
The Oozie job runs at 3 am; by that time the logs from the day before should have been closed correctly, I guess.
Created 12-12-2016 01:01 AM
@aengineer I saw this consistently as well when creating this HCC article. It seems like the Ranger plugin isn't always writing complete records for the last record in the file. In the NiFi flow described in that article, I just dropped these invalid records as this was appropriate for the purposes of the analysis in question.
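For anyone not using NiFi, the same "drop the invalid records" idea can be approximated in plain Python before the files are moved under the Hive table. This is a rough sketch rather than the flow from the article; it assumes one JSON object per line, and `split_records` is a hypothetical helper name.

```python
import json


def split_records(lines):
    """Split lines into (valid, invalid) lists of stripped record strings.

    A line counts as valid only if it parses as a JSON object.
    Blank lines are skipped. Assumption: one audit record per line.
    """
    valid, invalid = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            is_object = isinstance(json.loads(text), dict)
        except json.JSONDecodeError:
            is_object = False
        if is_object:
            valid.append(text)
        else:
            invalid.append(text)
    return valid, invalid
```

Only the `valid` list would be written to the directory the partition points to; keeping the `invalid` list somewhere separate makes it possible to see how many records were dropped.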
Created 12-12-2016 11:06 AM
Hi @slachterman ,
many thanks for this hint. Could you please send me the details of the processor configuration that drops the invalid lines?
Thanks and regards...
Created 12-12-2016 03:54 PM
Hi @Gerd Koenig, please see my linked HCC article in the parent comment. The template XML is attached to that post.
Essentially, the ReplaceText processor will fail, so FlowFiles that contain an incomplete JSON record will get routed to the PutFile processor within the exception flow.
Created 12-13-2016 07:37 AM
Thanks @slachterman, that's perfect. I missed the attached XML on my first read of your article 😉
Created 12-12-2016 01:07 AM
Sorry to hear that this is happening quite often. This might be an issue in Ranger, as mentioned by @slachterman. If you have enough details, please feel free to open an Apache Ranger JIRA so that the Ranger team gets a chance to look at this.
Created 12-12-2016 11:04 AM
Hi @aengineer ,
many thanks, I'll try to gather the necessary details and open a ticket there.
Created 02-15-2017 01:55 AM
There is a solution for this; please refer to https://issues.apache.org/jira/browse/RANGER-1310