Created on 01-19-2018 01:46 AM - edited 09-16-2022 05:45 AM
Hi everyone!
I have a cluster installed by Cloudera Manager Path B with Parcel CDH 5.13.1 activated. I have some doubts about how configuration files work in this environment (I was used to editing them manually in the Quickstart virtual machine provided by Cloudera).
For example: I need to add/modify a configuration property in Oozie, so I searched the node where the Oozie server is installed for the file "oozie-site.xml". I got the following results:
The content of the file in /opt is very different from the ones in /run (which all seem to have the same content):
/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/etc/oozie/conf.dist/oozie-site.xml
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-->
<configuration>
<!-- Refer to the oozie-default.xml file for the complete list of Oozie configuration properties and their default values. -->
<!-- Proxyuser Configuration -->
<!--
<property>
<name>oozie.service.ProxyUserService.proxyuser.#USER#.hosts</name>
<value>*</value>
<description>
List of hosts the '#USER#' user is allowed to perform 'doAs' operations. The '#USER#' must be replaced with the username o the user who is allowed to perform 'doAs' operations. The value can be the '*' wildcard or a list of hostnames. For multiple users copy this property and replace the user name in the property name.
</description>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.#USER#.groups</name>
<value>*</value>
<description>
List of groups the '#USER#' user is allowed to impersonate users from to perform 'doAs' operations. The '#USER#' must be replaced with the username o the user who is allowed to perform 'doAs' operations. The value can be the '*' wildcard or a list of groups. For multiple users copy this property and replace the user name in the property name.
</description>
</property>
-->
<!-- Default proxyuser configuration for Hue -->
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
<value>*</value>
</property>
</configuration>
/run/cloudera-scm-agent/process/49-oozie-OOZIE_SERVER/oozie-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>oozie_oozie_server</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>********</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:postgresql://master:7432/oozie_oozie_server</value>
</property>
<property>
<name>oozie.services.ext</name>
<value>org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService,org.apache.oozie.service.MetricsInstrumentationService</value>
</property>
<property>
<name>oozie.service.EventHandlerService.event.listeners</name>
<value></value>
</property>
<property>
<name>oozie.service.URIHandlerService.uri.handlers</name>
<value>org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler</value>
</property>
<property>
<name>oozie.service.HCatAccessorService.hcat.configuration</name>
<value>hive-conf/hive-site.xml</value>
</property>
<property>
<name>oozie.service.ActionService.executor.ext.classes</name>
<value></value>
</property>
<property>
<name>oozie.service.SchemaService.wf.ext.schemas</name>
<value></value>
</property>
<property>
<name>oozie.email.smtp.host</name>
<value>localhost</value>
</property>
<property>
<name>oozie.email.smtp.port</name>
<value>25</value>
</property>
<property>
<name>oozie.email.from.address</name>
<value>oozie@localhost</value>
</property>
<property>
<name>oozie.email.smtp.auth</name>
<value>false</value>
</property>
<property>
<name>oozie.service.CoordMaterializeTriggerService.lookup.interval</name>
<value>300</value>
</property>
<property>
<name>oozie.service.coord.normal.default.timeout</name>
<value>120</value>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>/user/oozie/share/lib</value>
</property>
<property>
<name>oozie.service.CallableQueueService.callable.concurrency</name>
<value>10</value>
</property>
<property>
<name>oozie.service.CallableQueueService.queue.size</name>
<value>10000</value>
</property>
<property>
<name>oozie.service.CallableQueueService.threads</name>
<value>50</value>
</property>
<property>
<name>oozie.service.PurgeService.older.than</name>
<value>30</value>
</property>
<property>
<name>oozie.service.PurgeService.coord.older.than</name>
<value>7</value>
</property>
<property>
<name>oozie.service.PurgeService.bundle.older.than</name>
<value>7</value>
</property>
<property>
<name>oozie.service.PurgeService.purge.old.coord.action</name>
<value>true</value>
</property>
<property>
<name>oozie.service.DBLiteWorkflowStoreService.status.metrics.collection.interval</name>
<value>1</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
<value>master:8020</value>
</property>
<property>
<name>oozie.actions.default.name-node</name>
<value>hdfs://master:8020</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
<value>master:8032</value>
</property>
<property>
<name>oozie.actions.default.job-tracker</name>
<value>master:8032</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=yarn-conf</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.action.configurations</name>
<value>*=yarn-conf</value>
</property>
<property>
<name>oozie.base.url</name>
<value>http://master:11000/oozie</value>
</property>
<property>
<name>oozie.service.GroupsService.hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
<property>
<name>oozie.service.CallbackService.base.url</name>
<value>http://master:11000/oozie/callback</value>
</property>
<property>
<name>oozie.service.SparkConfigurationService.spark.configurations</name>
<value>*=/etc/spark/conf</value>
</property>
<property>
<name>hadoop.security.credential.provider.path</name>
<value>localjceks://file//run/cloudera-scm-agent/process/49-oozie-OOZIE_SERVER/creds.localjceks</value>
</property>
</configuration>
Are the files in /run the ones used by the cluster? If I need to edit the configuration, can I do it manually (in which case, which file in /run should I modify?) or do I have to do it through the Cloudera Manager web console?
Thanks for any information!
Created 01-19-2018 07:53 AM
If you have a Cloudera-managed cluster, it is recommended to manage your configuration via Cloudera Manager.
Please do not edit the configuration files manually unless you are very familiar with them: copies of the same configuration file are maintained on different nodes, and in different locations on the same node, for various reasons.
Created 01-19-2018 08:13 AM
Thank you for your question.
When using Cloudera Manager to manage your cluster, configuration for all your services is stored centrally in Cloudera Manager. When a service role is started, Cloudera Manager assembles the necessary configuration which the agents will download and distribute to a unique "process" directory. Those are the /run/cloudera-scm-agent directories you found to contain oozie-site.xml.
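To see which generated directory a role is likely using, one option is to look for the process directory with the highest numeric prefix. The sketch below assumes that directories are named `<number>-<service>-<ROLE>` (as in `49-oozie-OOZIE_SERVER` above) and that a higher number means a more recent start; both are assumptions drawn from the paths in this thread, not a documented guarantee. The helper name `latest_process_dir` is made up for illustration.

```python
import re
from pathlib import Path


def latest_process_dir(base, role_suffix):
    """Return the directory under `base` named '<number>-<role_suffix>'
    with the highest number, or None if there is no match.

    Assumption: a higher numeric prefix means a more recently started process.
    """
    pattern = re.compile(r"^(\d+)-" + re.escape(role_suffix) + r"$")
    base_path = Path(base)
    if not base_path.is_dir():
        return None

    best_number = None
    best_dir = None
    for entry in base_path.iterdir():
        match = pattern.match(entry.name)
        if match and entry.is_dir():
            number = int(match.group(1))  # compare numerically, not as text
            if best_number is None or number > best_number:
                best_number, best_dir = number, entry
    return best_dir
```

For example, `latest_process_dir("/run/cloudera-scm-agent/process", "oozie-OOZIE_SERVER")` would point at the newest Oozie server directory on that host. The files there are only for inspection; changes still belong in Cloudera Manager.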
If you wish to change configuration for a service or role, you do so in Cloudera Manager itself so it can deploy the necessary configuration files and set the environment variables to run the process.
The oozie-site.xml file you found in the /opt/cloudera/parcels... directory is a "stock" file that ships with the parcels. It is not intended for use or editing and should not be modified unless instructed by Cloudera.
In order for you to run client commands on a host, you will need to have Cloudera Manager distribute Client Configuration files.
These topics are explained in more detail here:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_service_config_overview.html
Let us know if you have any other questions.
Created 01-19-2018 12:01 PM
Thank you very much for the information.
So, if I understand correctly, I should use Cloudera Manager for everything...
During the installation procedure with CM I included the Teradata Connector parcel (which I need in order to connect to a Teradata database through Oozie/Sqoop). In the CM web console I can see that the Teradata Connector parcel has been deployed and activated by CM, but a Sqoop job using the Teradata connector fails (it does not find the connector). According to https://www.cloudera.com/documentation/other/connectors/teradata/1-x/topics/cctd_topic_3.html#concep... the connector should be copied into the Sqoop library inside the parcel, but the file is not present. Also, the documentation says that in order to use a Sqoop action inside Oozie that performs a Teradata import, the following should be present in the oozie-site.xml file:
<configuration>
<property>
<name>sqoop.connection.factories</name>
<value>com.cloudera.connector.teradata.TeradataManagerFactory</value>
</property>
</configuration>
but this property is not set in any of the oozie-site.xml files inside the /run/cloudera-scm-agent/process/* folders, nor does it seem to be present in the Cloudera Manager configuration for Oozie in the web console. In this situation what should I do?
Created 01-19-2018 12:42 PM
Hi @ludof,
If a third party vendor instructs you to add or remove files in their parcel, that's just fine; it is best to follow their instructions.
Since you are using Cloudera Manager, you should follow the instructions here:
https://www.cloudera.com/documentation/other/connectors/teradata/1-x/topics/cctd_topic_3.html#concep...
The instructions you reference regarding the XML configuration apply to installation when Cloudera Manager is not managing the cluster.
If the documentation is correct, you should only need to create a Sqoop 1 gateway on any hosts that will be using the Teradata connector and then deploy client configuration. After that, distribute and activate the connector parcel.
Note the following that may be relevant to the problem you describe:
Important: The Sqoop 1 Client Gateway is required for the Teradata Connector to work correctly. Cloudera recommends installing the Sqoop 1 Client Gateway role on any host used to execute the Sqoop CLI. If you do not already have the Sqoop Client service running on your cluster, see Managing the Sqoop 1 Client for instructions on how to add the service using the Cloudera Manager Admin Console.
Created on 01-22-2018 07:58 AM - edited 01-22-2018 08:01 AM
Hi bgooley, thank you for the answer!
I've set the Sqoop 1 Client Gateway and deployed the configuration. The Teradata Connector parcel was already activated during the cluster configuration phase.
What puzzles me is that the Cloudera documentation, for the manual installation, says that the following property must be added to the sqoop-site.xml file in order to use the Teradata connector with Oozie:
<configuration>
<property>
<name>sqoop.connection.factories</name>
<value>com.cloudera.connector.teradata.TeradataManagerFactory</value>
</property>
</configuration>
I followed the Teradata connector installation path through Cloudera Manager, so I was expecting the property to be injected automatically by CM, but it seems to be missing from the sqoop-site.xml files:
/run/cloudera-scm-agent/process/ccdeploy_sqoop-conf_etcsqoopconf.cloudera.sqoop_client_1328941158299821742/sqoop-conf/sqoop-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>sqoop.connection.factories</name>
<value></value>
</property>
<property>
<name>sqoop.tool.plugins</name>
<value></value>
</property>
</configuration>
/etc/sqoop/conf.cloudera.sqoop_client/sqoop-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>sqoop.connection.factories</name>
<value></value>
</property>
<property>
<name>sqoop.tool.plugins</name>
<value></value>
</property>
</configuration>
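A quick way to confirm what each generated file actually contains is to parse it rather than read it by eye. The following is a minimal sketch using only the Python standard library; the helper names `load_properties` and `is_set` are made up for illustration, and it assumes the usual Hadoop-style layout of `<property>` elements with `<name>` and `<value>` children directly under `<configuration>`.

```python
import xml.etree.ElementTree as ET


def load_properties(path):
    """Parse a Hadoop-style *-site.xml file into a {name: value} dict.

    A missing or empty <value> element is stored as an empty string.
    """
    root = ET.parse(path).getroot()
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def is_set(path, name):
    """Return True only if the property exists and has a non-empty value."""
    return bool(load_properties(path).get(name))
```

Calling `is_set("/etc/sqoop/conf.cloudera.sqoop_client/sqoop-site.xml", "sqoop.connection.factories")` on the file shown above would return `False`, because the property is present but its value is empty.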
In this case, should I inject it manually through the configuration in the Cloudera Manager web console, as explained by @saranvisa?
Thanks for the help
Created 01-19-2018 01:05 PM
If my understanding is correct, you want to add a few new parameters that are not already present in the configuration. If so, Cloudera Manager generally provides an option for this.
Ex: Cloudera Manager -> Hive -> Configuration -> search for 'Advanced Configuration' -> identify the correct file, click the + symbol, and add the corresponding name and value.
If you need this specifically for Oozie, search for 'Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml' under Oozie -> Configuration and add the names and values as needed.
Also, please make sure to restart the corresponding service so that your changes take effect.
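If you prefer to paste XML rather than type names and values one by one, the property blocks can be generated. The sketch below renders name/value pairs as `<property>` elements using the Python standard library. The helper name `render_properties` is hypothetical, and the example pair is the Teradata property quoted earlier in this thread. Which safety valve field applies to sqoop-site.xml, and whether it accepts pasted XML in your Cloudera Manager version, are assumptions to verify in the console.

```python
import xml.etree.ElementTree as ET


def render_properties(pairs):
    """Render (name, value) pairs as Hadoop-style <property> blocks, one per line.

    Special characters in names and values are XML-escaped by ElementTree.
    """
    blocks = []
    for name, value in pairs:
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        blocks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(blocks)


# Example pair taken from the property quoted earlier in this thread.
snippet = render_properties([
    ("sqoop.connection.factories",
     "com.cloudera.connector.teradata.TeradataManagerFactory"),
])
print(snippet)
```

After saving the change in Cloudera Manager, redeploy the client configuration and restart the affected service, as described above.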