Member since
06-20-2016
488
Posts
433
Kudos Received
118
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3231 | 08-25-2017 03:09 PM
 | 2075 | 08-22-2017 06:52 PM
 | 3590 | 08-09-2017 01:10 PM
 | 8298 | 08-04-2017 02:34 PM
 | 8341 | 08-01-2017 11:35 AM
12-09-2016
02:25 PM
For those who may be on a back-level HDF version, as we are, a good workaround is to use SplitContent instead, as it supports many of the attributes Matt has documented above for the SplitJson processor.
12-06-2016
11:52 AM
There are both push (Reporting Task API) and pull (REST API) ways to automate metrics collection in NiFi. See this post for an overview: https://community.hortonworks.com/questions/69004/nifi-monitoring-processor-and-nifi-service.html
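A minimal sketch of the "pull" approach: polling NiFi's REST API for system diagnostics. The base URL and the response shape assumed below correspond to an unsecured NiFi instance on localhost:8080; adjust both for your deployment.

```python
import json
import urllib.request


def heap_utilization(diagnostics: dict) -> str:
    """Extract the aggregate heap utilization from a system-diagnostics payload."""
    return diagnostics["systemDiagnostics"]["aggregateSnapshot"]["heapUtilization"]


def fetch_diagnostics(base_url: str = "http://localhost:8080") -> dict:
    """Pull the system-diagnostics document from a running NiFi instance."""
    with urllib.request.urlopen(f"{base_url}/nifi-api/system-diagnostics") as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        print("Heap utilization:", heap_utilization(fetch_diagnostics()))
    except OSError as err:  # no NiFi reachable at the assumed URL
        print("Could not reach NiFi:", err)
```

The same polling could be scheduled (cron, a monitoring agent) to feed a metrics store; the push alternative is a Reporting Task configured inside NiFi itself.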
12-05-2016
11:24 PM
1 Kudo
Greg, yes, there is always a possibility, depending on the services deployed in the environment. For instance, if you have an audit server deployed on a so-called master node, a predefined audit policy (on a Hive query, say) may cause sensitive data to be written to a local folder that is not in HDFS. For cases like that, you need to set up redaction, encryption, and/or service-level authentication/authorization strategies to protect sensitive data such as PII, PCI, and SSNs.

As for YARN, Oozie, and ZooKeeper in particular, you should be fine. YARN stores all the AM and container logs on the datanodes; only high-level RM logs are stored on the RM nodes. Furthermore, you should configure job history logs to be written to a directory in HDFS, and apply HDFS native encryption on that folder if needed, /user/history/ for instance.

Oozie service logs shouldn't contain any sensitive information either, as they only contain high-level information, such as which part of a workflow failed; you would need to drill down into the individual service logs for more insight. ZooKeeper is the same: only high-level information is stored in znodes, depending on the services deployed in your environment, such as Solr schemas, Kafka topic offsets, etc.

Hope that helps. Derek
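Applying HDFS native encryption to the job history directory mentioned above might look like the following. This is a sketch only: it assumes Hadoop KMS is already configured and you have the required privileges, and "historykey" is a hypothetical key name.

```shell
# Create an encryption key in the KMS (name is a placeholder)
hadoop key create historykey

# Turn /user/history into an encryption zone backed by that key
hdfs crypto -createZone -keyName historykey -path /user/history

# Verify the zone was created
hdfs crypto -listZones
```

Note that an encryption zone can only be created on an empty directory, so this is easiest to do before the history server starts writing logs there.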
12-01-2016
04:27 AM
Thank you @Matt Burgess and @greyKeys. Previously I used a timestamp, which resulted in an error; after I changed it to ${uuid}, it works fine.
11-30-2016
03:26 PM
1 Kudo
See comment to answer above on how to get configs to local.
04-02-2018
04:21 PM
Zhen Zang, thanks for the updated document and information! I'm still struggling with the best way to determine how much of the JVM CPU, memory, I/O, etc. each NiFi processor is using. Thanks,
12-01-2016
09:59 AM
I solved the problem: when a new file was added to the FTP server, its modification date was before listing.timestamp. Thank you for helping me.
11-27-2016
01:16 AM
Thank you @Greg Keys - that helps a lot!
11-23-2016
07:17 PM
2 Kudos
Definitely not advisable, nor worth considering, and it would not be supported under a Hortonworks support license. The minimum cluster size for a production environment is typically seen as 3 management nodes that hold master services (namenode, ZooKeeper, etc.) plus 4 data nodes that hold the data in HDFS and run the slave services. The sandbox is a single node with everything on it: great for installing quickly, learning skills, and perhaps doing simple demos, but not for production high availability, throughput, or processing. See this post for a discussion of a minimal deployment: https://community.hortonworks.com/questions/48572/physical-layout-of-architecture.html
11-20-2016
04:03 AM
1 Kudo
@Karthik is correct that the provenance, content, and flowfile repositories are stored on disk unencrypted. Current recommendations are to restrict access to said repositories using OS-level access control (e.g. POSIX) and to use encrypted storage volumes. There is an existing security feature roadmap entry for transparent data encryption of the various repositories so that the values are never written to the file system in an unencrypted form. Obviously there are performance implications to take into consideration when developing this feature and an admin choosing to enable it. Just because the repository format on disk is "human unreadable" binary does not preclude the security concerns here -- an arbitrary process with OS permission can read those files, and the serialization logic is open source.
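The OS-level restriction recommended above can be sketched as follows. The repository paths here are hypothetical placeholders (a temp directory stands in for the NiFi install); the real locations are whatever nifi.properties points at, e.g. nifi.provenance.repository.directory.default.

```shell
# Placeholder base dir; substitute the directories from your nifi.properties
REPO_BASE="$(mktemp -d)"

for repo in flowfile_repository content_repository provenance_repository; do
    mkdir -p "$REPO_BASE/$repo"
    # Owner-only read/write/execute (POSIX); in practice the owner would be
    # the service account NiFi runs as, not the current user
    chmod 700 "$REPO_BASE/$repo"
done

stat -c '%a' "$REPO_BASE/provenance_repository"   # prints 700
```

Combined with an encrypted storage volume underneath, this limits exposure until transparent repository encryption lands in NiFi itself.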