About araujo

araujo · ‎03-25-2022

Whoa! CDH 5.4.3 is *really* old. Unfortunately, I don't have a cluster running that version here to test with. Hive has come a long way since then; we're already using Hive 3 on CDP 7.x. I'd recommend upgrading your system, if possible. Cheers, André

araujo · ‎03-25-2022

@Boss , These are upper-bound values that ensure the services running on the machine won't hit limits on the number of processes or open file descriptors.

IMO, these parameters matter most on gateway servers, where tens or hundreds of users connect to run their own processes and you want to make sure that no single user can run rogue processes that starve everyone else of resources.

The hosts in a CDP cluster environment are typically not hosts that users should connect to directly. The services and processes that run on those hosts are well known and managed by the administrator. In this scenario, these parameters are not as critical, and we usually set them to a value that gets them "out of the way", so that we never reach them.

Specifically to answer your question, though:

"nofile" is the limit on open file descriptors. Note that file descriptors are not only associated with files; for example, they are also used to refer to open network sockets/ports and pipes. You can check the file descriptors currently open using the command "lsof".

"nproc" is the limit on running processes. You can check that with the command "ps".

Cheers, André
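As a rough illustration of the "nofile" and "nproc" limits discussed above, here is a minimal Python sketch, assuming a Linux host (it reads /proc/self/fd). The function names are hypothetical and chosen only for this example. It reports the soft and hard limits of the current process and shows that a pipe and a network socket each consume file descriptors, just like regular files.

```python
import os
import resource
import socket


def describe_limits():
    """Return the soft and hard limits for open files and processes."""
    limits = {}
    for name, key in (("nofile", resource.RLIMIT_NOFILE),
                      ("nproc", resource.RLIMIT_NPROC)):
        soft, hard = resource.getrlimit(key)
        limits[name] = {"soft": soft, "hard": hard}
    return limits


def count_open_fds():
    """Count descriptors open in this process (Linux-only: /proc/self/fd)."""
    return len(os.listdir("/proc/self/fd"))


def fds_used_by_pipe_and_socket():
    """Open one pipe (two descriptors) and one socket, report the increase."""
    before = count_open_fds()
    read_end, write_end = os.pipe()
    sock = socket.socket()
    during = count_open_fds()
    os.close(read_end)
    os.close(write_end)
    sock.close()
    return during - before


if __name__ == "__main__":
    for name, values in describe_limits().items():
        print(name, values)
    print("descriptors used by a pipe plus a socket:",
          fds_used_by_pipe_and_socket())
```

A limit reported as resource.RLIM_INFINITY means "unlimited". The limits shown are those of the process running the script, which may differ from the limits applied to a service started by the cluster manager.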

araujo · ‎03-25-2022

How did you create the impala_test principal?

araujo · ‎03-25-2022

@pandu2022 , Please check the servicePrincipalName (SPN) property of the AD user. It should be impala_test/<host>@realm. André

araujo · ‎03-24-2022

@Aditya-Moghe , Would you be able to better explain what you're trying to achieve? Your post is not very clear. Cheers, André

araujo · ‎03-24-2022

@AndreDre1 , did the above answer your question? André -- Was your question answered? Please take some time to click on "Accept as Solution" below this post. If you find a reply useful, say thanks by clicking on the thumbs up button.

araujo · ‎03-24-2022

@pandu2022 , The KDC does not need to connect to Impala servers. Do you happen to have multiple realms in your environment with cross-realm trust configured between them? Could you please run the below commands and share the output?

    kinit <your_user>
    kvno impala/<host_fqdn>@<REALM>
    kvno impala_test/<host_fqdn>@<REALM>

Cheers, André

araujo · ‎03-24-2022

@wazzu62 , "Connection refused" errors usually mean that the server is reachable but not accepting connections on that port. If it were a firewall issue, you would have seen a "connection timeout" type of error instead. This might mean that either (a) the service on the target server is not actually running or (b) it has been configured with a different port. Have you tried telnetting to that port to check if it's open at all? Try this both locally from the server and remotely from where the ATS is running. Cheers, André
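The difference between "refused" and "timeout" described above can also be checked without telnet. The following is a minimal Python sketch; probe_port is a hypothetical helper name, and the host and port you pass in are whatever your service is supposed to listen on.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Classify a TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # Something is listening and accepted the connection.
            return "open"
    except ConnectionRefusedError:
        # Host is reachable, but nothing accepts connections on this port.
        return "refused"
    except socket.timeout:
        # No answer at all, which is typical of a firewall dropping packets.
        return "timeout"
    except OSError as exc:
        # Anything else, for example a name resolution failure.
        return f"error: {exc}"
```

Running it once on the server itself (against 127.0.0.1 or the host's own name) and once from the remote machine gives the same local-versus-remote comparison suggested in the post.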

araujo · ‎03-24-2022

@CJ-Llanes , The expression ${id} refers to a flowfile attribute called "id", not to the "id" field in your flowfile content. You need to extract the "id" from the flowfile content first, using an EvaluateJsonPath processor between your FlattenJson and PutDynamoDB processors. Cheers, André
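To make the attribute-versus-content distinction concrete, here is a toy Python model. It is not NiFi code: a flowfile is modelled as a dictionary of attributes plus a JSON content payload, and both function names are hypothetical. It also assumes that an expression referring to a missing attribute resolves to an empty string.

```python
import json
import re


def resolve_expression(expression, attributes):
    """Resolve ${name} references against flowfile attributes only."""
    def lookup(match):
        # The content is never consulted here, only the attributes.
        return attributes.get(match.group(1), "")
    return re.sub(r"\$\{(\w+)\}", lookup, expression)


def extract_to_attribute(attributes, content, attribute_name, json_key):
    """Toy stand-in for an extraction step: copy one top-level JSON field
    from the content into a new attribute, leaving the content unchanged."""
    document = json.loads(content)
    updated = dict(attributes)
    updated[attribute_name] = str(document[json_key])
    return updated


if __name__ == "__main__":
    attributes = {"filename": "record.json"}
    content = b'{"id": 42, "name": "example"}'

    # Before extraction there is no "id" attribute to resolve.
    print(repr(resolve_expression("${id}", attributes)))

    # After extraction the same expression finds the value.
    attributes = extract_to_attribute(attributes, content, "id", "id")
    print(repr(resolve_expression("${id}", attributes)))
```

In the real flow, the extraction step corresponds to the EvaluateJsonPath processor writing its result to a flowfile attribute, so that PutDynamoDB can reference it as ${id}.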

araujo · ‎03-24-2022

@mystefied_ , Which version of Hive are you using? That query works well on my cluster. Nevertheless, you should be able to run the below, which is pretty much the same:

    select
      yr,
      mth,
      month_total,
      month_total / lag(month_total, 1) over (order by yr, mth) as percentage_over_previous_month,
      sum(month_total) over (order by yr, mth) as running_sum,
      sum(month_total) over (order by yr, mth) / sum(month_total) over (partition by 1) as running_percentage
    from (
      select
        year(stock_date) as yr,
        month(stock_date) as mth,
        sum(stock_price) as month_total
      from table4
      group by year(stock_date), month(stock_date)
    ) x

Cheers, André
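For readers unfamiliar with window functions, the following Python sketch reproduces what each column of the query above computes, starting from the already aggregated monthly totals. It is only an illustration with made-up sample figures, not a replacement for the Hive query. The function name monthly_metrics is hypothetical, and the sketch assumes the overall total is non-zero.

```python
from itertools import accumulate


def monthly_metrics(month_totals):
    """Compute the query's derived columns from (yr, mth, month_total) rows."""
    rows = sorted(month_totals)                  # order by yr, mth
    totals = [total for _, _, total in rows]
    grand_total = sum(totals)                    # sum(...) over (partition by 1)
    running = list(accumulate(totals))           # sum(...) over (order by yr, mth)

    result = []
    for i, (yr, mth, total) in enumerate(rows):
        previous = totals[i - 1] if i > 0 else None   # lag(month_total, 1)
        result.append({
            "yr": yr,
            "mth": mth,
            "month_total": total,
            # No previous month (or a zero one) yields None, like SQL NULL.
            "percentage_over_previous_month": total / previous if previous else None,
            "running_sum": running[i],
            "running_percentage": running[i] / grand_total,
        })
    return result


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [(2021, 2, 150.0), (2021, 1, 100.0), (2021, 3, 250.0)]
    for row in monthly_metrics(sample):
        print(row)
```

The first month has no preceding row, so its percentage_over_previous_month is empty, which matches lag() returning NULL for the first row in the ordered window.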

Member Since	‎06-26-2015 11:59 AM
Last Visited	‎07-21-2025 10:25 PM
Posts	515
Kudos received	139

Cloudera Community

Re: Is it possible to use Single User authenticati...

Re: Dynamically Assign an XSD File

Re: "error": "There is no mapped role for the grou...

Re: Read xml file content into an Attribute: How t...

Re: Nifi Lookup CSV values with SQL NULL values

Re: Hive query for functions

Re: How to set ulimit values

Re: Kerberos Authentication Failure : Catalog Serv...

Re: Kerberos Authentication Failure : Catalog Serv...

Re: Nifi http

Re: No show Users and Policies in Global Menu

Re: Kerberos Authentication Failure : Catalog Serv...

Re: ATSv2 HBase Application The HBase application ...

Re: Hash key value missing putdynamodb nifi

Re: Hive query for functions