About SagarKanani

SagarKanani · ‎08-21-2020

If you are using Kerberos for authentication, then when a job is submitted the user's permissions are evaluated first by Ranger. Only once that authorization succeeds is the Kerberos ticket delegated to the hive user, and the hive user then starts the execution. So, as long as the user who is submitting the job has a policy in Ranger, it should work as expected. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.

SagarKanani · ‎08-21-2020

Hi, this should work for you: https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#alldelineatedvalues Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.

SagarKanani · ‎08-21-2020

Partitioning and bucketing are ways to improve Hive performance. Neither is mandatory, but both are good to have. How you partition and bucket depends a lot on what the table looks like. If the table has millions or billions of rows, or is very wide with hundreds of columns, query performance is impacted greatly. To answer your question, the only effect I see is performance degradation. Then again, if the table is small (my assumption: roughly 10-15 million rows), one bucket versus more than one bucket will not bring a significant improvement. But with millions of rows it is always good to bucket, so that the query is evaluated only on the rows within one or two buckets, which results in better performance. When the table has billions of rows and is wide as well, ideally it is both bucketed and partitioned. There is no perfect solution; it always differs depending on the scenario. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
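
To make the one-bucket versus many-buckets point concrete, here is a rough Python sketch of the idea (not Hive itself). It assumes rows are assigned to buckets by `key % num_buckets`, which is a simplified stand-in for Hive's real hash function, and that a lookup on the bucketing key only has to read the bucket that key lands in. The function names are made up for illustration, and whether Hive actually skips the other buckets depends on your version, engine and query.

```python
from collections import defaultdict


def bucket_for(key: int, num_buckets: int) -> int:
    # Simplified assumption: the real Hive bucketing hash is not a plain modulo.
    return key % num_buckets


def build_buckets(keys, num_buckets: int):
    # Group row keys by the bucket they would be written to.
    buckets = defaultdict(list)
    for key in keys:
        buckets[bucket_for(key, num_buckets)].append(key)
    return buckets


def rows_scanned_for_lookup(buckets, key: int, num_buckets: int) -> int:
    # Rows that must be read if only the bucket holding `key` is scanned.
    return len(buckets[bucket_for(key, num_buckets)])


keys = range(100_000)
one = build_buckets(keys, 1)     # a single bucket: every lookup reads the whole table
many = build_buckets(keys, 32)   # 32 buckets: a lookup reads about 1/32 of the rows

print(rows_scanned_for_lookup(one, 42, 1))
print(rows_scanned_for_lookup(many, 42, 32))
```

With a single bucket the lookup still touches every row, which is why one bucket brings no benefit over an unbucketed table in this simplified picture.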

SagarKanani · ‎08-21-2020

If the goal is only to update the flowfiles with the group id, I would suggest using the script mentioned by @mburgess here: https://community.cloudera.com/t5/Support-Questions/Get-the-processor-group-name-in-NIFI-flow/td-p/213662 . As mentioned in the post, there is a caveat, so please double check. I would still recommend using this script anyway; if the custom processor has more to do, you can always pull the attributes into the custom processor very easily. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.

SagarKanani · ‎08-20-2020

You need to provide more information here. Is the Updatedate processor an UpdateAttribute processor? What does the invokescriptor processor do? Are you storing an attribute called state, or are you using the processor's state? Assuming it is just a variable in the flowfile, your attempt 2 should work. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.

SagarKanani · ‎08-20-2020

Assuming that you will create the table only once, it is very difficult, since you need to parse the columns and data types. The amount of effort is not worth it for a one-time thing, so I would suggest creating the table manually. If you are going to create a table dynamically every time, I would be interested in how you did this, or if you have made some progress I would be happy to help. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.

SagarKanani · ‎08-10-2020

Your repository server isn't accepting your request. HTTP 403 means Forbidden, and that is the main problem here. If you have access to the repo server, please check that the URL is correct. Also double check that the user you are running as has sufficient privileges. If you do not have access, you need to contact your administrator.

SagarKanani · ‎08-10-2020

Can you provide the errors from the HDFS log file? It seems likely to be a handshake problem, but the logs can give more insight. You should double check the keystore and truststore (I am not sure about Cloudera Manager) to confirm they are set up correctly.

SagarKanani · ‎08-07-2020

How are you trying to enable SSL? Are you using self-signed certificates or certificates signed by a Certificate Authority? Are you doing one-way SSL or two-way SSL? A general rule is that you should import the server certificates into the clients' truststores. HDFS daemons act as both servers and clients, so they require additional setup to import the certificates on both hosts. Hope this helps.

SagarKanani · ‎08-07-2020

Please provide more info from the Ambari logs. One thing to check is your backend DB connection.

Last Visited	‎10-21-2024 07:58 AM

Member Since	‎07-13-2020 05:50 AM
Posts	58
Kudos received	2

Cloudera Community

Re: Different files with different columns to be l...

Re: Convert Json to Avro in Nifi

Re: bucketing table with just one bucket vs partio...

Re: External table not loading data after Alter

Re: How to access the updateattributes inside groo...

Re: How to restrict yarn queue access when Hive Im...

Re: How to compare two Arrays in Nifi ?

Re: bucketing table with just one bucket vs partio...

Re: NIFI --How to get the Current Processor group ...

Re: How to access the updateattributes inside groo...

Re: How to create table from executeQuery in NiFi

Re: AMBARI 2.7.3 UI not progressing to next step d...

Re: name nodes not working after Ssl encryption

Re: name nodes not working after Ssl encryption

Re: AMBARI 2.7.3 UI not progressing to next step d...