Member since 07-30-2019
333 Posts, 357 Kudos Received, 76 Solutions
My Accepted Solutions
Views | Posted |
---|---|
10104 | 02-17-2017 10:58 PM |
2398 | 02-16-2017 07:55 PM |
8121 | 12-21-2016 06:24 PM |
1801 | 12-20-2016 01:29 PM |
1271 | 12-16-2016 01:21 PM |
10-05-2015 01:31 PM (1 Kudo)
This is a known issue with Ambari 2.1.1. When performing this rolling upgrade of HDP, make sure you are using Ambari 2.1.2.
10-03-2015 03:19 PM
Thanks Joe. As I understand it, in this scenario we could leave the provenance and flowfile repositories on local disks (regular application-server sizing), but mount a big fat SAN/NAS/you-name-it for the content repository and configure HDF to point to it. Are expiration policies configurable per repository in that case?
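For reference, `nifi.properties` keeps separate settings per repository, so the content archive and the provenance repository can each get their own location and retention. A minimal Python sketch that renders such overrides; the property names are NiFi's, while the `/mnt/san/content_repo` mount and all values are hypothetical and should be checked against your HDF version.

```python
# Hypothetical nifi.properties overrides: content repository on a SAN/NAS mount,
# flowfile and provenance repositories left on local disks.
REPO_SETTINGS = {
    # Flowfile repository stays on local disk (hypothetical path).
    "nifi.flowfile.repository.directory": "./flowfile_repository",
    # Content repository points at the large shared mount (hypothetical path).
    "nifi.content.repository.directory.default": "/mnt/san/content_repo",
    # Archive retention for content, tuned independently of provenance.
    "nifi.content.repository.archive.enabled": "true",
    "nifi.content.repository.archive.max.retention.period": "12 hours",
    "nifi.content.repository.archive.max.usage.percentage": "50%",
    # Provenance repository stays local with its own limits.
    "nifi.provenance.repository.directory.default": "./provenance_repository",
    "nifi.provenance.repository.max.storage.time": "24 hours",
    "nifi.provenance.repository.max.storage.size": "1 GB",
}


def render_properties(settings: dict[str, str]) -> str:
    """Render settings as key=value lines, sorted by key for a stable diff."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    print(render_properties(REPO_SETTINGS))
```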
10-03-2015 01:10 PM
Hi, what are the recommended approaches for the following scenario? NiFi is ingesting lots of files (say, pulling them from a remote system into the flow), and we only care about each file as a whole: the flowfile content is the file, with no further splits or row-by-row processing. File sizes can vary from a few MBs to GBs, which is not a problem in itself, but what happens when millions of files are ingested this way? Say they end up in HDFS at the end of the dataflow. Given that file content is kept in the content repository to enable data provenance, disk space may become an issue. Is there any way to control purging/expiration at a finer-grained level than the instance-wide journal settings?
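A back-of-envelope estimate shows why disk space becomes the constraint: archived content grows roughly with ingest rate × average file size × retention window. A minimal sketch; the load figures are hypothetical, and the formula ignores compression and any usage-percentage cap that would trigger cleanup earlier.

```python
def archive_footprint_bytes(files_per_hour: float, avg_file_bytes: float,
                            retention_hours: float) -> float:
    """Rough content-archive size for a steady ingest rate.

    Assumes every file's content stays archived for the whole retention
    window, i.e. no usage-percentage cap triggers cleanup earlier.
    """
    return files_per_hour * avg_file_bytes * retention_hours


if __name__ == "__main__":
    MB, TB = 10**6, 10**12
    # Hypothetical load: ~1M files/day, 50 MB average, 12 h retention.
    need = archive_footprint_bytes(1_000_000 / 24, 50 * MB, 12)
    print(f"~{need / TB:.1f} TB of archived content")  # ~25.0 TB
```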
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
10-01-2015 11:31 PM (1 Kudo)
Are you referring to the hadoop-policy section in core-site and hdfs-site? These do not control security the way you'd expect. For proper ACLs on HDFS, do one or more of the following:
- Secure (Kerberize) your cluster. Ambari automates this.
- Add Ranger and enable HDFS policies.
- If accessing via the REST API (WebHDFS), restrict direct DataNode access with a firewall and only allow access via Knox. Knox, in turn, can map an incoming user to an actual role (still, full control with auditing requires adding Ranger).

Andrew
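For the Ranger option, HDFS policies can also be managed through Ranger Admin's public REST API. A minimal sketch that only builds the policy payload; it assumes a Ranger release exposing the v2 policy API (`/service/public/v2/api/policy`), and the service name `cl1_hadoop`, the path `/data/landing`, and the `etl` group are hypothetical.

```python
import json


def hdfs_read_policy(service: str, name: str, path: str, groups: list[str]) -> dict:
    """Build a Ranger HDFS policy granting read/execute on `path`, recursively."""
    return {
        "service": service,        # Ranger HDFS service (repository) name
        "name": name,
        "isEnabled": True,
        "isAuditEnabled": True,    # record every access decision for auditing
        "resources": {
            "path": {"values": [path], "isRecursive": True, "isExcludes": False},
        },
        "policyItems": [{
            "groups": groups,
            "users": [],
            "accesses": [
                {"type": "read", "isAllowed": True},
                {"type": "execute", "isAllowed": True},
            ],
            "delegateAdmin": False,
        }],
    }


if __name__ == "__main__":
    policy = hdfs_read_policy("cl1_hadoop", "etl-read-landing", "/data/landing", ["etl"])
    # Assumption: POST this JSON to <ranger-admin>/service/public/v2/api/policy
    # (Ranger Admin typically listens on port 6080).
    print(json.dumps(policy, indent=2))
```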
10-01-2015 01:46 PM (1 Kudo)
Hi all, the use case is a Banana dashboard working with a SolrCloud instance/cluster. If we follow the default steps, we end up using the 'data_driven_schema' in Solr, which makes it easy to accept arbitrary data and try to index it. The problem comes later, however: the Banana table widget can't sort on many columns because Solr complains that those fields are multi-valued. In fact they are not (each holds a single, unique value, as checked via the admin section); they are merely declared multi-valued. What is the approach to address this, ideally without having to specify a complete new schema for a Solr index? Can one keep the benefit of flexible fields but default to single-valued?
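One possible approach, not confirmed in this thread: keep the data-driven config, but use the Solr Schema API (available for managed schemas in Solr 5.x) to declare the fields you sort on as single-valued before indexing. A minimal sketch that only builds the request; the host, the collection `banana_logs`, and the field `status` are hypothetical.

```python
import json
import urllib.request


def single_valued_field_request(solr_base: str, collection: str, field: str,
                                field_type: str = "string",
                                replace: bool = False) -> urllib.request.Request:
    """Build a Schema API request declaring `field` as single-valued.

    Use replace=True for a field the data-driven schema already guessed as
    multi-valued; existing documents must then be re-indexed.
    """
    command = "replace-field" if replace else "add-field"
    body = {command: {"name": field, "type": field_type,
                      "indexed": True, "stored": True, "multiValued": False}}
    return urllib.request.Request(
        f"{solr_base}/solr/{collection}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = single_valued_field_request("http://solr.example.com:8983", "banana_logs", "status")
    # Pass req to urllib.request.urlopen() to actually send it.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```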
Labels:
- Apache Solr
09-29-2015 05:16 PM
Thanks Mark, that was a great and exhaustive answer (I'm thinking of how to express it on a slide for an advanced-level deck). I guess HA for the NCM control plane itself is the next frontier; it will probably require some changes on the client side (e.g., specifying a list of failover NCM nodes to cycle through) as well as some UI updates to support it. Is my understanding correct?
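The client-side half of that idea can be sketched as a simple failover loop over a configured list of NCM endpoints. Everything here is hypothetical and is not an existing NiFi API: the endpoint URLs are made up, and `try_connect` stands in for whatever call the client would make.

```python
from typing import Callable, Sequence


class NoManagerAvailable(Exception):
    """Raised when every configured NCM endpoint failed."""


def connect_with_failover(endpoints: Sequence[str],
                          try_connect: Callable[[str], object]) -> tuple[str, object]:
    """Try each NCM endpoint in order and return the first successful connection.

    `try_connect` is a hypothetical callable that raises OSError on failure.
    """
    errors: dict[str, str] = {}
    for endpoint in endpoints:
        try:
            return endpoint, try_connect(endpoint)
        except OSError as exc:  # connection refused, timeout, DNS failure, ...
            errors[endpoint] = str(exc)
    raise NoManagerAvailable(f"all NCM endpoints failed: {errors}")


if __name__ == "__main__":
    def fake_connect(url: str) -> str:
        # Simulate the primary NCM being down and the standby answering.
        if "ncm1" in url:
            raise ConnectionRefusedError("ncm1 down")
        return f"session@{url}"

    print(connect_with_failover(["https://ncm1:8443", "https://ncm2:8443"], fake_connect))
```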
09-29-2015 04:39 PM (2 Kudos)
I highly recommend Knox's shell, which uses a DSL for those operations: http://knox.apache.org/books/knox-0-6-0/user-guide.html#WebHDFS It's a great way to interact with a cluster programmatically in a controlled and audited manner (a simpler DSL and a secured gateway endpoint, with no need to open every node's port). BTW, it's a Groovy DSL, which makes it trivial to run in any Java program.
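For comparison, the Knox shell DSL ultimately issues WebHDFS REST calls against the gateway rather than against individual cluster nodes. A minimal sketch of such a call built with Python's standard library (a different language from the Groovy DSL recommended above); the host `knox.example.com:8443`, the `default` topology, and the credentials are hypothetical, and the request is only constructed, not sent.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_request(gateway: str, topology: str, hdfs_path: str, op: str,
                         user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a WebHDFS request routed through a Knox gateway.

    Only the gateway's address is needed; Knox authenticates the caller
    (HTTP Basic here) and proxies the call, so NameNode and DataNode ports
    need not be reachable from the client.
    """
    path = urllib.parse.quote(hdfs_path.lstrip("/"))
    query = urllib.parse.urlencode({"op": op})
    url = f"https://{gateway}/gateway/{topology}/webhdfs/v1/{path}?{query}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    req = knox_webhdfs_request("knox.example.com:8443", "default", "/tmp",
                               "LISTSTATUS", "someuser", "secret")
    # Pass req to urllib.request.urlopen() to actually send it.
    print(req.full_url)
```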
09-29-2015 04:08 PM
Hi, I'd like to understand how the receiving end of the site-to-site protocol works. The sending side drops a remote process group on the canvas and is mostly done. The receiving side is simple in the case of a single NiFi node. In a cluster, though, we still need to specify an FQDN to connect to. What is the best practice there? If we put a load balancer in front, would it break batch affinity (when site-to-site batches sends for you)?
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
09-29-2015 01:49 PM (3 Kudos)
The HDFS Balancer can run in the background, and the bandwidth it consumes is controllable. In general, on a large cluster it can run continuously, and running it after adding new nodes is a must for a healthy system. Note that on large clusters a single convergence run can take a full day or more (that shouldn't scare you away, though); let it run. Also, some customers reported a more stable experience when adding nodes in small batches of a few at a time instead of, for example, adding a full rack at once.
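Rough arithmetic shows where "a full day or more" comes from: convergence takes at least the bytes to move divided by the aggregate balancing bandwidth. A sketch with hypothetical numbers; it assumes the new DataNodes are the bottleneck, each capped by the balancer bandwidth (`dfs.datanode.balance.bandwidthPerSec`, adjustable at runtime with `hdfs dfsadmin -setBalancerBandwidth`), and ignores other traffic. The command helpers only build argument lists (e.g. for `subprocess.run`); nothing is executed.

```python
def min_balance_hours(bytes_to_move: float, receiving_nodes: int,
                      per_node_bytes_per_sec: float) -> float:
    """Lower bound on balancer convergence time, in hours.

    Assumes the new (receiving) DataNodes are the bottleneck and each is capped
    at `per_node_bytes_per_sec` of balancing traffic.
    """
    return bytes_to_move / (receiving_nodes * per_node_bytes_per_sec) / 3600


def balancer_cmd(threshold_pct: int = 10) -> list[str]:
    """Argument list for a balancer run; nothing runs here."""
    return ["hdfs", "balancer", "-threshold", str(threshold_pct)]


def set_bandwidth_cmd(bytes_per_sec: int) -> list[str]:
    """Argument list that changes the balancer bandwidth on live DataNodes."""
    return ["hdfs", "dfsadmin", "-setBalancerBandwidth", str(bytes_per_sec)]


if __name__ == "__main__":
    # Hypothetical: 100 TB to move onto 10 new nodes at 50 MB/s each.
    print(f"{min_balance_hours(100e12, 10, 50e6):.1f} h")  # 55.6 h
    print(" ".join(set_bandwidth_cmd(50 * 10**6)))
    print(" ".join(balancer_cmd(10)))
```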
09-28-2015 09:45 PM (3 Kudos)
I would strongly recommend against reusing another ZK quorum for this purpose. The risk of network partitioning is too high, and the benefits aren't clear. As David mentions above, the NN doesn't put a high load on ZK for leader election. Have each NN HA pair (each cluster, for that matter) talk to its own ZK quorum within the same network segment.
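The partition risk follows from ZooKeeper's majority rule: an ensemble of n servers stays available only while a strict majority can reach each other, so it tolerates (n - 1) // 2 failures. A minimal sketch; the 5-server ensemble split 3/2 across two network segments is hypothetical.

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """How many servers can be lost while a majority remains."""
    return (ensemble - 1) // 2


def side_has_quorum(reachable: int, ensemble: int) -> bool:
    """True if the servers reachable from one side of a partition form a majority."""
    return reachable >= quorum_size(ensemble)


if __name__ == "__main__":
    # Hypothetical shared ensemble: 3 servers in segment A, 2 in segment B.
    ensemble = 5
    for segment, reachable in (("A", 3), ("B", 2)):
        print(segment, side_has_quorum(reachable, ensemble))
    # If the A-B link drops, NameNodes that can only reach segment B lose
    # access to a ZK quorum, so automatic failover cannot proceed for them.
```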