Member since
06-20-2016
488
Posts
433
Kudos Received
118
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3231 | 08-25-2017 03:09 PM
 | 2075 | 08-22-2017 06:52 PM
 | 3590 | 08-09-2017 01:10 PM
 | 8298 | 08-04-2017 02:34 PM
 | 8341 | 08-01-2017 11:35 AM
12-09-2016
02:25 PM
For those who may be on a back-level HDF version, as we are, a good workaround is to use SplitContent instead, as it supports many of the attributes Matt has documented above for the SplitJson processor.
12-06-2016
11:52 AM
There are both push (Reporting Task API) and pull (REST API) ways to automate metrics collection in NiFi. See this post for an overview: https://community.hortonworks.com/questions/69004/nifi-monitoring-processor-and-nifi-service.html
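A minimal sketch of the "pull" approach: polling NiFi's REST API for system diagnostics. The base URL and the response shape assumed below correspond to an unsecured NiFi instance on localhost:8080; adjust both for your deployment.

```python
import json
import urllib.request


def heap_utilization(diagnostics: dict) -> str:
    """Extract the aggregate heap utilization from a system-diagnostics payload."""
    return diagnostics["systemDiagnostics"]["aggregateSnapshot"]["heapUtilization"]


def fetch_diagnostics(base_url: str = "http://localhost:8080") -> dict:
    """Pull the system-diagnostics document from a running NiFi instance."""
    with urllib.request.urlopen(f"{base_url}/nifi-api/system-diagnostics") as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        print("Heap utilization:", heap_utilization(fetch_diagnostics()))
    except OSError as err:  # no NiFi reachable at the assumed URL
        print("Could not reach NiFi:", err)
```

The same polling could be scheduled (cron, a monitoring agent) to feed a metrics store; the push alternative is a Reporting Task configured inside NiFi itself.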
12-05-2016
11:24 PM
1 Kudo
Greg, yes, there is always a possibility, depending on the services deployed in the environment. For instance, if you have an audit server deployed on a so-called master node, a predefined audit policy (on a Hive query, say) may cause sensitive data to be written to a local folder that is not in HDFS. For cases like that, you need to set up redaction, encryption, and/or service-level authentication/authorization strategies to protect sensitive data such as PII, PCI, and SSNs.

As for YARN, Oozie, and ZooKeeper in particular, you should be fine. YARN stores all the AM and container logs on the datanodes; only high-level RM logs are stored on the RM nodes. Furthermore, you should configure job history logs to be written to a directory in HDFS, and apply HDFS native encryption on that folder if needed, /user/history/ for instance.

Oozie service logs shouldn't contain any sensitive information either, as they only contain high-level information, such as which part of a workflow failed; you would need to drill down into the individual service logs for more insight. ZooKeeper is the same: only high-level information is stored in znodes, depending on the services deployed in your environment, such as Solr schemas, Kafka topic offsets, etc.

Hope that helps. Derek
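Applying HDFS native encryption to the job history directory mentioned above might look like the following. This is a sketch only: it assumes Hadoop KMS is already configured and you have the required privileges, and "historykey" is a hypothetical key name.

```shell
# Create an encryption key in the KMS (name is a placeholder)
hadoop key create historykey

# Turn /user/history into an encryption zone backed by that key
hdfs crypto -createZone -keyName historykey -path /user/history

# Verify the zone was created
hdfs crypto -listZones
```

Note that an encryption zone can only be created on an empty directory, so this is easiest to do before the history server starts writing logs there.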
12-01-2016
04:27 AM
Thank you @Matt Burgess and @greyKeys. Previously I used a timestamp, which resulted in an error; after I changed it to ${uuid}, it works fine.
11-30-2016
03:26 PM
1 Kudo
See comment to answer above on how to get configs to local.
04-02-2018
04:21 PM
Zhen Zang, thanks for the updated document and information! I'm still struggling with the best way to determine how much of the JVM CPU, memory, I/O, etc. each NiFi processor is using. Thanks,
12-01-2016
09:59 AM
I solved the problem: when a new file was added to the FTP server, its modification date was before listing.timestamp. Thank you for helping me.
11-27-2016
01:16 AM
Thank you @Greg Keys - that helps a lot!
11-23-2016
07:17 PM
2 Kudos
Definitely not advisable, nor worth considering, and it would not be supported under a Hortonworks support license. The minimum cluster size for a production environment is typically seen as 3 management nodes that hold master services (namenode, ZooKeeper, etc.) plus 4 data nodes that hold the data in HDFS and run the slave services. The sandbox is a single node with everything on it: great for installing quickly, learning skills, and perhaps doing simple demos, but not for production high availability, throughput, or processing. See this post for a discussion of a minimal deployment: https://community.hortonworks.com/questions/48572/physical-layout-of-architecture.html
11-20-2016
04:03 AM
1 Kudo
@Karthik is correct that the provenance, content, and flowfile repositories are stored on disk unencrypted. Current recommendations are to restrict access to said repositories using OS-level access control (e.g. POSIX) and to use encrypted storage volumes. There is an existing security feature roadmap entry for transparent data encryption of the various repositories so that the values are never written to the file system in an unencrypted form. Obviously there are performance implications to take into consideration when developing this feature and an admin choosing to enable it. Just because the repository format on disk is "human unreadable" binary does not preclude the security concerns here -- an arbitrary process with OS permission can read those files, and the serialization logic is open source.
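The OS-level restriction recommended above can be sketched as follows. The repository paths here are hypothetical placeholders (a temp directory stands in for the NiFi install); the real locations are whatever nifi.properties points at, e.g. nifi.provenance.repository.directory.default.

```shell
# Placeholder base dir; substitute the directories from your nifi.properties
REPO_BASE="$(mktemp -d)"

for repo in flowfile_repository content_repository provenance_repository; do
    mkdir -p "$REPO_BASE/$repo"
    # Owner-only read/write/execute (POSIX); in practice the owner would be
    # the service account NiFi runs as, not the current user
    chmod 700 "$REPO_BASE/$repo"
done

stat -c '%a' "$REPO_BASE/provenance_repository"   # prints 700
```

Combined with an encrypted storage volume underneath, this limits exposure until transparent repository encryption lands in NiFi itself.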