About snm1523

snm1523 · ‎10-17-2022

Hi,

We are in the process of setting up a CDP 7.1.7 SP1 cluster. Knox is enabled and configured across all services, and it works fine for all of them except Livy.

When we attempt to submit a Spark session via the Livy service (Knox URL), it does not recognise the user the session is being submitted from, and so returns an "unauthorised" error. A log entry in the Knox gateway clearly shows that the user name is not detected. However, when we submit a Spark session directly via the Livy URL with the same user, it goes through. In that attempt, at the same place in the Knox gateway logs, there is a POST entry that clearly displays the name of the user who submitted the job. This confirms that something is not correct in the Knox / Livy configuration.

Digging further, we found that there is no entry for Livy in the simplified topology configuration. Not sure if this is an issue? The configuration is attached.

Secondly, we also found that KNOX_DATA_DIR/services contains a folder called Livy, which has 3 version folders with rewrite and services.xml in them. Honestly, I do not understand the actual significance of these folders. The article below mentions creating these files in the services folder, but we are not sure how exactly that would help.

Add custom service to existing descriptor in Apache Knox Proxy | CDP Private Cloud (cloudera.com)

Is there documentation that explains all the steps involved in configuring Livy to work with Knox, and the communications / interactions between these services? Additionally, how exactly do topologies in Knox have an impact on this? Any help in getting this setup correct would be really great.

Thanks
snm1523

snm1523 · ‎06-30-2022

Hello, Is there straightforward documentation available that would help with a side-car migration of Oozie jobs (50+) from HDP to CDP PB? I know that the properties file might need some modification to point to the CDP RM and relevant servers; however, I am finding it hard to understand / map the properties file, as we have 50+ workflows to migrate. Thanks snm1523

snm1523 · ‎06-30-2022

Thank you @araujo, Do we also get a field to enter these details while installing Cloudera Manager, on the page where we add custom repositories for Hadoop parcels? I don't remember one, hence asking. Thanks snm1523

snm1523 · ‎06-24-2022

Thank you for the response. I was able to find a way to fix this.

snm1523 · ‎06-16-2022

Hi, I just completed setting up CDP Private Base in a POC environment. I was in the process of attempting AD (LDAP based) integration so that users get authenticated via Active Directory. I am unsure if there was a misconfiguration; however, after restarting Cloudera Manager, I can't log in to it via the "Admin" (local) account. Thinking that AD had been integrated, I tried multiple accounts from AD, but none of them work either. Please help me re-enable the admin (local) account. Thanks snm1523

snm1523 · ‎06-16-2022

Hello, I am in the process of setting up a CDP cluster (Private Base) and am at the step of configuring local repositories. I have the Cloudera repos configured via our standard repository solution, Artifactory, since the servers will not have access to the public internet to reach the Cloudera archives. The URL given to me to access those repos has to be authenticated with a user ID and API key, so it ultimately looks something like this: https://<user>:<API Key>@Repo URL/

Questions are:

1. Is there a way I can configure an authenticated YUM repository for the Cloudera Manager packages without having to enter the credentials in plain text in the base URL field of the YUM repo?

2. Once we have Cloudera Manager installed, we can provide a custom repo link in Cloudera Manager to fetch the Cloudera Runtime parcels. At that stage, will Cloudera Manager accept environment variables in the URL, or does that again have to be plain text? If not, is there any other way to set up an authenticated Cloudera Runtime repo that can be used in Cloudera Manager, so we don't have to provide these credentials in plain text?

Thanks
snm1523
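For question 1, one possible direction is to keep the credentials out of the base URL and put them in yum's per-repository `username` and `password` options instead. This is only a minimal sketch under assumptions: the repo id, the `REPO_USER` / `REPO_API_KEY` environment variable names and the `render_repo` helper are hypothetical, and the credentials still end up on disk, just in a file whose permissions can be restricted rather than in the URL itself. It does not address question 2.

```shell
#!/usr/bin/env bash
# render_repo REPO_ID BASEURL
# Prints a yum .repo stanza to stdout. Credentials are read from the
# (hypothetical) environment variables REPO_USER and REPO_API_KEY, so they
# never appear in the base URL or on the command line.
render_repo() {
  local repo_id=$1 baseurl=$2
  if [[ -z ${REPO_USER:-} || -z ${REPO_API_KEY:-} ]]; then
    echo "REPO_USER and REPO_API_KEY must be set" >&2
    return 1
  fi
  cat <<EOF
[${repo_id}]
name=${repo_id}
baseurl=${baseurl}
enabled=1
username=${REPO_USER}
password=${REPO_API_KEY}
EOF
}

# Intended use (as root), writing the file readable by root only:
#   ( umask 077; render_repo cloudera-manager "https://<Repo URL>/" \
#       > /etc/yum.repos.d/cloudera-manager.repo )
```

The subshell with `umask 077` is there so the file is created with restrictive permissions from the start, instead of being tightened after the fact.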

snm1523 · ‎04-08-2022

Hi All, I was able to test my script on my 10-node DEV cluster. Below are the results:

1. All HDP core services started / stopped okay.
2. None of the Hive Server Interactive instances started, and hence the Hive service was not marked as STARTED, though HMS and HS2 started okay.
3. None of the SPARK2_THRIFTSERVER instances started.

Can anyone share some thoughts on points 2 and 3?

Thanks
snm1523

snm1523 · ‎03-09-2022

Additional info @steven-matison: I checked an API call for the SPARK2 service further and found a difference. Spark Thrift Server is reported as STARTED in the API output; however, in the Ambari UI it is stopped. See the screenshots.

Ambari UI:

API Output:

Any thoughts / suggestions?

Thanks
snm1523

snm1523 · ‎03-09-2022

@steven-matison Something like this. This time Spark, Kafka and Hive got stopped as expected, but YARN took longer to stop (its internal components were still being stopped). The moment the script got the service status of YARN as INSTALLED, it triggered HDFS, and HDFS was left waiting. Please refer to the screenshot:

What I want is to ensure that the previous service is completely started / stopped before the next one is triggered. Not sure if that is even possible.

Thanks
snm1523
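The "wait for the previous service before triggering the next" behaviour can be sketched as a generic polling helper with a timeout. This is only an illustration under assumptions: `wait_until`, `stop_service` and `service_settled` are hypothetical names, and the check passed to the helper would need to look at more than the service-level state, since that is exactly the value that flips too early here.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT_SECS INTERVAL_SECS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0, or gives up once TIMEOUT_SECS have
# elapsed. Returns 0 on success and 1 on timeout.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
}

# Intended use (stop_service and service_settled are hypothetical helpers;
# the service names and order follow the post):
#   for svc in SPARK2 KAFKA HIVE YARN HDFS; do
#     stop_service "$svc"
#     wait_until 900 10 service_settled "$svc" INSTALLED \
#       || { echo "$svc did not stop in time" >&2; exit 1; }
#   done
```

With this shape the script blocks on each service in turn and fails loudly on a timeout, rather than moving on while components are still stopping.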

snm1523 · ‎03-09-2022

Thank you for the response @steven-matison. I have 3 different scenarios so far:

1. Kafka: at times one or another Kafka broker doesn't come up quickly.
2. Spark: the Spark Thrift Server never starts on the first attempt when bundled in an all-services script. However, if called individually (only SPARK), it works as expected.
3. Hive: we have around 6 Hive Interactive servers. HSI by nature takes a little long to start (they do start eventually, though).

In all the scenarios mentioned above, the moment there is an API call to start or stop a service, the status in the ServiceInfo field of the API output changes to the target value (i.e. INSTALLED in case of stop and STARTED in case of start), while the underlying components are still doing their work to start / stop. Since I am checking the status at the service level (reference below), the condition passes and the script moves ahead. Ultimately I end up with one service still starting / stopping while the next one has already been triggered.

    str=$(curl -s -u "$USER:$PASS" "http://${HOST}/api/v1/clusters/$CLUSTER/services/$1")
    if [[ $str == *"INSTALLED"* ]]
    then
        finished=1
        echo -e "\n$1 Stopped...\n"
    fi

So far I have noticed this only for those 3 services. Hence, I am seeking suggestions on how to overcome this, or whether there is a way I could check the status of each component of each service.

Hope I was able to explain this better.

Thanks
snm1523
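A component-level check could look roughly like the sketch below. It assumes the Ambari REST API exposes per-host component state under a service's `components` resource (the `fields=host_components/HostRoles/state` query in the comment should be verified against the cluster's Ambari version), and `all_components_in_state` is a hypothetical helper name. The function only parses JSON from stdin, so it can be tried against saved API output before being wired into a script.

```shell
#!/usr/bin/env bash
# all_components_in_state WANTED_STATE
# Reads Ambari JSON on stdin. Succeeds only if at least one "state" field
# is present and every "state" value equals WANTED_STATE. Fields such as
# "desired_state" are not matched, because the pattern requires the exact
# key "state".
all_components_in_state() {
  local want=$1 states
  states=$(grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Z_]*"' \
             | grep -o '"[A-Z_]*"$' | tr -d '"')
  [[ -n $states ]] || return 1
  # grep -v -x selects any line that is not exactly WANTED_STATE.
  ! grep -qvx "$want" <<<"$states"
}

# Intended use for the stop case (endpoint and query are assumptions):
#   curl -s -u "$USER:$PASS" \
#     "http://${HOST}/api/v1/clusters/${CLUSTER}/services/$1/components?fields=host_components/HostRoles/state" \
#     | all_components_in_state INSTALLED && finished=1
```

For the start case this check is probably too strict as written: client components generally remain INSTALLED, so they would need to be filtered out before comparing against STARTED.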

Last Visited	‎10-09-2025 07:54 AM

Member Since	‎10-29-2015 07:36 PM
Posts	128
Kudos received	31

Cloudera Community

Re: YARN and HDFS monitoring via Grafana

Re: Enable Admin account for Cloudera Manager

Re: Datanode not starting: SIGTERM error

Re: MKDirs failed to create file

Knox / Livy communications / configuration

Migrating Oozie jobs from HDP to CDP PB

Re: Auth based local repository configuration for ...

Re: Enable Admin account for Cloudera Manager

Enable Admin account for Cloudera Manager

Auth based local repository configuration for CDP ...

Re: Scripted start / stop of HDP services

Re: Scripted start / stop of HDP services

Re: Scripted start / stop of HDP services

Re: Scripted start / stop of HDP services