Created 01-30-2024 01:02 AM
Hello,
Is it possible to monitor a Hue directory using, say, a User Defined Function?
If so, can you provide an example?
Many thanks.
Created 01-30-2024 02:33 AM
@phir1 ,
Can you give us more context on the use case, please? What type of monitoring are you planning to implement? And by "Hue directory", which path are you referring to?
Created 01-30-2024 04:07 AM
I'd like to write Scala/Java or Python code which would monitor Home Directories (e.g., /Users/phil) for:
- files larger than, say, 10GB
- files containing private data, such as bank accounts, email addresses, etc.
Created on 01-31-2024 09:39 AM - edited 01-31-2024 09:40 AM
I'm not sure if we have example scripts or custom functions, but the general idea to achieve what you're trying to do would be -
1. Refer to the HDFS reports in CM, or parse the fsimage, to find large files.
2. Write a MapReduce/Spark job to scan files, or run a query in Hive/Impala/SparkSQL to see if data files are mapped to Hive tables.
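For step 1, here is a rough Python sketch of the fsimage approach, not an official script. It assumes you have already dumped the fsimage to a tab-delimited text file with the Offline Image Viewer (`hdfs oiv -p Delimited`) and that the dump has a header row with `Path` and `FileSize` columns; please check that against the output on your cluster. The function name `find_large_files` and the `/user/` prefix are my own placeholders, so adjust the prefix to wherever your home directories live.

```python
import csv
import sys

TEN_GB = 10 * 1024**3  # threshold in bytes


def find_large_files(lines, prefix="/user/", threshold=TEN_GB):
    """Yield (path, size_in_bytes) for entries under `prefix` above `threshold`.

    `lines` is any iterable of text lines from a tab-delimited fsimage dump.
    Assumption: the first line is a header containing `Path` and `FileSize`.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        path = row["Path"]
        size = int(row["FileSize"] or 0)  # directories report a size of 0
        if path.startswith(prefix) and size > threshold:
            yield path, size


if __name__ == "__main__":
    # Usage: python find_large_files.py fsimage_dump.tsv
    with open(sys.argv[1], newline="", encoding="utf-8") as dump:
        for path, size in find_large_files(dump):
            print(f"{size / 1024**3:.1f} GiB\t{path}")
```

Because this reads an offline dump rather than querying the NameNode, it is only as fresh as the fsimage you fetched, so you would need to schedule it if you want ongoing monitoring.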
If you're planning to protect the sensitive data from unauthorised access, you can also implement access controls through Ranger, enable transparent data encryption (TDE) using Ranger KMS, tag sensitive Hive columns (containing PII, PCI, PHI) using Atlas classifications and assign tag-based masking policies from Ranger, implement navencrypt to encrypt the spill files, etc.
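Coming back to the scan in step 2, the per-file check itself could look something like the Python sketch below. The two regular expressions (an email pattern and an IBAN-like pattern) are purely illustrative assumptions on my part, will produce false positives and misses, and are no substitute for a proper data classification tool. `PATTERNS` and `scan_text` are hypothetical names, and the code only handles text lines; how you read the files out of HDFS (a Spark job, for example) is left open.

```python
import re

# Illustrative patterns only; tune or replace them for your own data.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}


def scan_text(lines):
    """Return {pattern_name: [(line_number, matched_text), ...]} for `lines`."""
    hits = {name: [] for name in PATTERNS}
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits[name].append((number, match.group()))
    return hits


if __name__ == "__main__":
    sample = [
        "contact: phil@example.com",
        "nothing to see here",
        "IBAN GB82WEST12345698765432",
    ]
    for name, found in scan_text(sample).items():
        print(name, found)
```

In a distributed job you would call something like `scan_text` once per file and collect only the file path and hit counts, so the sensitive values themselves are not copied into the report.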
Created 02-04-2024 11:59 PM
@phir1, Did @Sean464's responses assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.
Regards,
Vidya Sargur,