Hue Directory Monitoring

Explorer

Hello,

Is it possible to monitor a Hue directory using, say, a User Defined Function?
If so, can you provide an example?

Many thanks.


Expert Contributor

@phir1,

Can you give us more context on the use case, please? What type of monitoring are you planning to implement? And, by Hue directory, which path are you referring to?

Explorer


I'd like to write Scala/Java or Python code which would monitor Home Directories (e.g., /Users/phil) for:

- files larger than, say, 10GB

- files containing private data, such as bank accounts, email addresses, etc.

Expert Contributor

I'm not sure if we have example scripts or custom functions, but the general approach for what you're trying to do would be:

1. For large files, refer to the HDFS reports in Cloudera Manager (CM), or parse the fsimage.
2. For private data, write a MapReduce/Spark job to scan the files, or, if the data files are mapped to Hive tables, run a query in Hive/Impala/SparkSQL.
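
As a rough illustration of both steps, here is a minimal Python sketch, not a tested production script. It assumes the fsimage has already been exported to a tab-delimited text file with the Offline Image Viewer (`hdfs oiv -p Delimited -i <fsimage> -o fsimage.tsv`) and that the export starts with a header row containing `Path` and `FileSize` columns; please check this against the output of your Hadoop version. The function names, the `/Users/` prefix, and the two regular expressions are hypothetical and only loosely approximate email addresses and IBAN-style account numbers.

```python
import csv
import re

TEN_GB = 10 * 1024**3  # 10 GiB threshold, taken from the question

def find_large_files(delimited_lines, prefix="/Users/", min_bytes=TEN_GB):
    """Yield (path, size) for files under `prefix` larger than `min_bytes`.

    `delimited_lines` is any iterable of text lines (for example an open
    file) in tab-delimited form. ASSUMPTION: the first line is a header
    that includes "Path" and "FileSize" columns.
    """
    reader = csv.DictReader(delimited_lines, delimiter="\t")
    for row in reader:
        size = int(row["FileSize"])
        if row["Path"].startswith(prefix) and size > min_bytes:
            yield row["Path"], size

# ASSUMPTION: illustrative patterns only; real PII detection needs far
# more careful rules and will still produce false positives/negatives.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def find_sensitive(lines):
    """Yield (line_number, label) for each pattern found on each line."""
    for lineno, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                yield lineno, label
```

For example, `find_large_files(open("fsimage.tsv"))` would list the oversized files under `/Users/`, and the same regular expressions could be applied per line inside a Spark job to scan file contents at scale rather than on a single machine.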

If you're planning to protect the sensitive data from unauthorised access, you can also implement access controls through Ranger, enable HDFS transparent data encryption (TDE) using Ranger KMS, tag sensitive Hive columns (containing PII, PCI, PHI) using Atlas classifications and assign tag-based masking policies from Ranger, use Navigator Encrypt (navencrypt) to encrypt the spill files, etc.

Community Manager

@phir1, did @Sean464's responses help resolve your query? If they did, kindly mark the relevant reply as the solution, as it will help others find the answer more easily in the future.



Regards,

Vidya Sargur,
Community Manager

