Member since
04-11-2016
471
Posts
325
Kudos Received
118
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2140 | 03-09-2018 05:31 PM |
| | 2712 | 03-07-2018 09:45 AM |
| | 2613 | 03-07-2018 09:31 AM |
| | 4475 | 03-03-2018 01:37 PM |
| | 2516 | 10-17-2017 02:15 PM |
02-10-2017
10:31 AM
@Raj B - https://pierrevillard.com/2017/02/10/haproxy-load-balancing-in-front-of-apache-nifi
02-10-2017
09:18 AM
1) Correct.

2) It is really tied to your use case. Let's say you want to get the content of a database table that contains a column "jobId", and that your input is a file listing all the job IDs for which you want to retrieve the associated data. You could have GetFile -> SplitText -> ExtractText to get one flow file per job ID, with the job ID in an attribute; all of this would run on the primary node. Then you distribute the flow files over the cluster with a Remote Process Group (RPG). Finally, you can use ExecuteSQL with a query containing "WHERE jobId = ${jobId}". This way you have concurrent queries against the table, but each query accesses its own data. Again, it really depends on the use case and on how you can split the queries.

3) Not that I am aware of. The main reason is that it really depends on the use case; there is no general rule.

Hope this helps.
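To make step 2 concrete, the ExecuteSQL "SQL select query" property could hold something like the following (table and column names are hypothetical; ${jobId} is the flow file attribute extracted earlier, resolved per flow file by NiFi's Expression Language before the query is sent to the database):

```sql
-- Hypothetical value for ExecuteSQL's "SQL select query" property
SELECT * FROM job_results WHERE jobId = ${jobId}
```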
02-08-2017
06:18 PM
You're right. If you ensure that each ExecuteSQL takes care of its own part of the data on each node of the cluster, then you have load balancing. But if the same request is executed by every ExecuteSQL instance, this won't be load balanced (unless your DB takes care of that for you).

GenerateTableFetch is meant to be executed on the primary node to generate multiple flow files that together retrieve all the data of a table, each flow file taking care of one "page" of data. The flow files are then distributed over the cluster, and ExecuteSQL actually fetches the data. In this case each flow file contains a query for a different piece of the same table, so you won't duplicate data and the queries are load balanced. ExecuteSQL on its own can achieve the same kind of thing; it really depends on your use case and how you define your queries.
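The paging idea can be sketched as follows. This is a minimal illustration of the concept only, not NiFi's actual implementation; the table name, ordering column, and page size are made up:

```python
# Minimal sketch of the paging idea behind GenerateTableFetch
# (illustration only, not NiFi's implementation).
def page_queries(table, row_count, page_size):
    """Split a full-table fetch into one query per page of rows."""
    return [
        f"SELECT * FROM {table} ORDER BY id LIMIT {page_size} OFFSET {offset}"
        for offset in range(0, row_count, page_size)
    ]

# Each query targets a distinct slice of rows, so the queries can run
# concurrently on different cluster nodes without duplicating data.
for query in page_queries("mytable", 25000, 10000):
    print(query)
```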
02-08-2017
05:17 PM
ExecuteSQL is not a passive processor (it does not wait for clients to send data); it is an active processor in charge of getting the data. As long as your remote DB accepts concurrent access/requests, it's fine. When you retrieve data from a table and you want to load balance the queries, have a look at the GenerateTableFetch and QueryDatabaseTable processors.
02-08-2017
02:48 PM
3 Kudos
Hi @Michal R, I'd recommend having a look at the variable registry: https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua.html Please note that this only applies to processor properties that support expression language and that, at the moment, the properties are loaded when NiFi starts: you cannot change the values while NiFi is running. The variable registry will evolve in the near future. Hope this helps.
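For illustration, a custom properties file for the variable registry could look like this (the file path, property names, and values below are made-up examples; the linked article covers the exact setup):

```properties
# ./conf/custom.properties -- hypothetical custom variable file,
# referenced from nifi.properties, e.g.:
#   nifi.variable.registry.properties=./conf/custom.properties
input.directory=/data/incoming
target.table=events
```

A processor property that supports expression language can then reference `${input.directory}`; as noted above, these values are read once at NiFi startup.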
02-08-2017
07:58 AM
1 Kudo
Hi @Andy Liang,

If you want to apply "Need Authentication" (check that the user is correctly authenticated), then you need to fill in the SSL Context Service property. For that, you have to create a StandardSSLContextService where you provide the keystore and truststore, available on the NiFi host, that you want to use. The keystore contains the key used by the server; the truststore contains the certificates of the clients the server should trust. In your case, the truststore should contain the certificate of the client you want to authenticate (or, if you have one, you could use a Certificate Authority so that you don't need to add a new certificate for each new client).

You may want to read:

- https://community.hortonworks.com/articles/27033/https-endpoint-in-nifi-flow.html
- https://community.hortonworks.com/questions/19476/connecting-to-facebook-graph-api-using-nifi-postht.html

The latter is not about HandleHttpRequest but may bring some clarity around the SSL context service.

As a final side note: if you need to perform a login/password authentication like Basic Authentication (with HTTP headers), then you don't need any of this; you just need to check the attributes sent by the user in the request with a RouteOnAttribute processor and the expression language.

Hope this helps a bit.
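Regarding the Basic Authentication side note, the check that a RouteOnAttribute expression would perform on the request's authorization header attribute can be sketched in Python (a hypothetical illustration, not NiFi code; the attribute name, user, and password are made up):

```python
import base64

# Hypothetical sketch of validating a "Authorization: Basic <base64>"
# header value, as RouteOnAttribute would do on the corresponding
# flow file attribute (credentials below are made up).
def is_authorized(auth_header, expected_user, expected_password):
    prefix = "Basic "
    if not auth_header or not auth_header.startswith(prefix):
        return False
    try:
        decoded = base64.b64decode(auth_header[len(prefix):]).decode("utf-8")
    except Exception:
        return False
    user, _, password = decoded.partition(":")
    return user == expected_user and password == expected_password

header = "Basic " + base64.b64encode(b"alice:secret").decode("utf-8")
print(is_authorized(header, "alice", "secret"))
```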
02-08-2017
07:47 AM
Well, hard to tell... but having two NameNodes should prevent you from getting into this situation. I believe we will support more than two NameNodes in future versions of HDP. Also, just to clarify: if one of your NameNodes fails, the other one automatically becomes the active NameNode; you don't have to handle anything regarding the failover. But as soon as a NameNode fails, it is recommended to take care of it by looking at the logs and fixing the root cause.
02-07-2017
08:55 PM
Just did the following in my env:

- installed Ranger and enabled the Hive plugin;
- created a table "student" with a single column "name";
- created a user "test" and a /user/test directory in HDFS;
- created a rule in Ranger to only allow read access (select) on the table "student" for user "test".

And here are my commands: https://gist.github.com/pvillard31/528d0d186d05422b0b9d1f3b94a85a02 It seems to be working as expected. In Audit / Plugins, can you check that the policies have been correctly synced with Hive?
02-07-2017
08:38 PM
AFAIK, HDF does not provide a solution for load balancing (unless you implement something yourself through ZooKeeper). You need specific hardware equipment or a software solution like HAProxy (in a Docker container, for example).
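As a rough sketch, an HAProxy setup in front of two NiFi nodes could look like the fragment below. All host names, ports, and section names are assumptions for illustration, not a tested configuration:

```
# haproxy.cfg fragment -- hypothetical load balancing for two NiFi nodes
frontend nifi_http
    bind *:8080
    default_backend nifi_nodes

backend nifi_nodes
    balance roundrobin
    server nifi1 nifi-node1.example.com:8080 check
    server nifi2 nifi-node2.example.com:8080 check
```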
02-07-2017
08:02 PM
You can ask your sending system to send data to Kafka and then have NiFi pull the data from Kafka, but the problem between your sender and Kafka will probably be the same (you will probably need to give the sender the list of your Kafka brokers, and this is not ideal when you scale up/down). Regarding NiFi and Kafka, I recommend the following article: http://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
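To give a rough idea of the NiFi side, a ConsumeKafka processor would be configured along these lines (the broker addresses, topic, and group id below are made-up examples, and property names may differ between ConsumeKafka versions):

```
# Hypothetical ConsumeKafka processor settings
Kafka Brokers : kafka1.example.com:9092,kafka2.example.com:9092
Topic Name(s) : sensor-data
Group ID      : nifi-consumers
Offset Reset  : latest
```

The sender only needs to know the Kafka brokers; NiFi nodes can then be added or removed without the sender noticing.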