Ranger audit to HDFS creates corrupt JSON

Guru

Hi,

I configured Ranger to write its audit logs to HDFS only. Now I have directories such as

/ranger/audit/hiveServer2/20161206
/ranger/audit/hiveServer2/20161207

...and the same for hdfs, hbase, etc.

Finally, I collect all the individual files of each day (from every service) into one common folder and put a Hive table on top.

This is similar to what is described here in HCC, just extended by collecting all files from the same day into a common directory, to which the partition points.
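The daily collection step boils down to mapping each service's directory for one day onto a single common directory. Below is a minimal Python sketch of that path mapping only. It assumes the per-service directory names hiveServer2, hdfs and hbase, plus a hypothetical common root /ranger/audit_all with a Hive-style `day=YYYYMMDD` partition directory. The actual copy (e.g. with `hdfs dfs -cp`) is left out.

```python
from typing import List, Tuple

AUDIT_ROOT = "/ranger/audit"        # audit root from the post
COMMON_ROOT = "/ranger/audit_all"   # hypothetical common root (assumption)


def daily_paths(services: List[str], day: str) -> Tuple[List[str], str]:
    """Map one YYYYMMDD day to its per-service source dirs and one common target dir."""
    # e.g. /ranger/audit/hiveServer2/20161207
    sources = [f"{AUDIT_ROOT}/{svc}/{day}" for svc in services]
    # Hive-style partition directory; the partition column name "day" is hypothetical
    target = f"{COMMON_ROOT}/day={day}"
    return sources, target


if __name__ == "__main__":
    srcs, dst = daily_paths(["hiveServer2", "hdfs", "hbase"], "20161207")
    for src in srcs:
        print(f"{src} -> {dst}")
```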

Unfortunately, the HiveQL SELECT statement fails with a JSON parse error, because some of the created log files are corrupt (invalid JSON): the last line is simply cut off, for example:

hdfs dfs -cat /ranger/audit/hiveServer2/20161207/hiveServer2_ranger_audit_<hostname>.log

...
{"repoType":3,"repo":"hdp_hive","reqUser":"xxxxxx","evtTime":"2016-12-07 08:13:20.276","access":"SELECT","resource":"xxxxxxx","resType":"@column","action":"QUERY

But the first file from the same day looks fine:

hdfs dfs -cat /ranger/audit/hiveServer2/20161207/hiveServer2_ranger_audit_<hostname>.1.log

...

{"repoType":3,"repo":"hdp_hive","reqUser":"xxxxx","evtTime":"2016-12-07 12:16:24.474","access":"USE","resource":"xxxx","resType":"@database","action":"SWITCHDATABASE","result":1,"policy":17,"enforcer":"ranger-acl","sess":"bf9a9f2e-ee90-4784-9d82-87008ad2e7fa","cliType":"HIVESERVER2","cliIP":"xxxxxx","reqData":"USE dbname","agentHost":"xxxxxxx","logType":"RangerAudit","id":"5b0b00ed-ed60-4817-85e0-e1c629952414","seq_num":213,"event_count":1,"event_dur_ms":0}
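Before querying Hive, one option for finding the affected files is to parse every line as JSON and report the lines that fail. Below is a minimal Python sketch. It assumes one audit record per line, as in the listings above, and that the files have first been copied locally (for example with `hdfs dfs -get`). The function names and the script name check_audit.py are hypothetical.

```python
import json
import sys
from typing import Dict, List


def invalid_lines(text: str) -> List[int]:
    """Return 1-based numbers of non-blank lines that are not valid JSON."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)  # e.g. a record cut off in the middle of a value
    return bad


def corrupt_files(files: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each file name to its invalid line numbers, keeping only files with problems."""
    report = {}
    for name, text in files.items():
        bad = invalid_lines(text)
        if bad:
            report[name] = bad
    return report


if __name__ == "__main__":
    # Usage (hypothetical): python check_audit.py local_copy_1.log local_copy_2.log
    contents = {}
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            contents[path] = fh.read()
    for name, bad in corrupt_files(contents).items():
        print(f"{name}: invalid JSON on line(s) {bad}")
```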

What can cause these corrupt files? Or what can I do to be able to query the final Hive table without errors?

Environment: HDP 2.3.4; Ranger policies for HDFS, Hive and HBase enabled, all configured to store audit logs in the HDFS folder "/ranger/audit".

Thanks for any hints...

9 REPLIES

Expert Contributor

@Gerd Koenig

Does this happen often, or is it a one-off? Generally this would mean the writing application did not sync the data completely to HDFS. So it looks like you have an incomplete JSON record that Hive is not able to parse.
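One way to read a log that may not have been fully synced is to treat only newline-terminated lines as complete records and to set the unterminated tail aside. Below is a minimal Python sketch. It assumes each Ranger audit record is written as a single line ending in a newline, which this thread does not confirm. `split_complete` is a hypothetical helper.

```python
from typing import List, Tuple


def split_complete(data: str) -> Tuple[List[str], str]:
    """Split newline-delimited text into complete lines and an unterminated tail.

    Assumes every record is written as one line ending in a newline character.
    Anything after the last newline may be a record whose bytes were not (yet)
    fully synced, so it is returned separately instead of as a record.
    """
    *complete, tail = data.split("\n")
    # Drop empty strings produced by blank lines.
    return [line for line in complete if line], tail
```

Suppose the file is known to be closed and the tail is still non-empty. Under this assumption, that tail is a cut-off record like the hiveServer2 example in the question.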

Guru

Hi @aengineer,

It happens frequently. I created an Oozie job that collects the logs of the previous day each night, and the logs from yesterday have the same issue.

The Oozie job runs at 3 a.m.; by then the logs from the day before should have been closed correctly... I guess.
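The nightly job has to work out the directory name of the previous day. Below is a minimal Python sketch of that date arithmetic. It assumes the YYYYMMDD directory naming shown in the question (e.g. 20161207). `previous_day` is a hypothetical helper; the thread does not say how the Oozie job itself derives the date.

```python
from datetime import datetime, timedelta


def previous_day(now: datetime) -> str:
    """Return the YYYYMMDD directory name for the day before `now`."""
    return (now.date() - timedelta(days=1)).strftime("%Y%m%d")


if __name__ == "__main__":
    # A run at 3 a.m. on 2016-12-08 collects the 20161207 directories.
    run_time = datetime(2016, 12, 8, 3, 0)
    print(previous_day(run_time))  # 20161207
```

Using `timedelta` rather than subtracting 1 from the day number handles month, year and leap-day boundaries.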

@aengineer I saw this consistently as well when writing this HCC article. It seems the Ranger plugin doesn't always write the last record in the file completely. In the NiFi flow described in that article, I simply dropped these invalid records, as that was appropriate for the analysis in question.
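The same "drop invalid records" idea can also be sketched outside NiFi. This is not the NiFi flow from the article. It is only a hypothetical stand-alone Python filter, assuming one JSON record per line.

```python
import json
from typing import Iterable, Iterator


def valid_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield only lines that parse as JSON; skip blank lines and broken records."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank line
        try:
            json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. the cut-off last record: drop it
        yield line


if __name__ == "__main__":
    sample = [
        '{"repoType":3,"action":"SWITCHDATABASE","result":1}\n',
        '{"repoType":3,"action":"QUERY',
    ]
    print(list(valid_records(sample)))
```

One possible setup is to feed the filter from `hdfs dfs -cat` on stdin and write its output back with `hdfs dfs -put - <target>`. The Hive table would then see only well-formed records, at the cost of silently losing the truncated ones.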

Guru

Hi @slachterman,

Many thanks for this hint. Could you please send me the details of the processor configuration that drops the invalid lines?

Thanks and regards...

Hi @Gerd Koenig, please see my linked HCC article in the parent comment. The template XML is attached to that post.

Essentially, the ReplaceText processor will fail, so FlowFiles that contain an incomplete JSON record will get routed to the PutFile processor within the exception flow.

Guru

Thanks @slachterman, that's perfect. I missed the attached XML on my first read of your article 😉

Expert Contributor (accepted solution)
@slachterman Thanks for sharing your experience. @Gerd Koenig

Sorry to hear that this is happening quite often. This might be an issue in Ranger, as @slachterman mentioned. If you have enough details, please feel free to open an Apache Ranger JIRA so that the Ranger team gets a chance to look at this.

Guru

Hi @aengineer,

Many thanks, I'll try to gather the necessary details and open a ticket there.

Super Collaborator

A solution has been put in place for this; please refer to https://issues.apache.org/jira/browse/RANGER-1310