Member since: 09-24-2018
Posts: 10
Kudos Received: 1
Solutions: 0
10-10-2018
02:07 AM
Hi, @Predrag Minovic; The Hadoop version is 2.7.3. It seems the property 'mapreduce.job.hdfs-servers.token-renewal.exclude' is not available for Hadoop versions below 2.8.0. I've tried the method you provided, but got the same error.
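For anyone hitting the same wall: a quick way to confirm whether a given Hadoop build knows about this property is to look for the string inside the MapReduce client jar (a rough sketch; the jar path below assumes a plain $HADOOP_HOME binary layout):
hadoop version
# A count of 0 suggests the client has no knowledge of the property
# (it only appears from 2.8.0 onwards), so setting it would have no effect.
unzip -p $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-*.jar \
  | grep -a -c "mapreduce.job.hdfs-servers.token-renewal.exclude"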
10-09-2018
07:37 AM
Hi, @Saurabh Gupta; you can try reducing the replication factor. The information below may help you:
1, Modify ClusterB's HDFS replication factor:
vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
###
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
###
2, Run distcp to replicate ClusterA's data into ClusterB:
hadoop distcp -Ddfs.replication=2 -update -p hdfs://ClusterA:8020/ hdfs://ClusterB:8020/
3, If ClusterB already has existing data, you can run the command below to release space:
hdfs dfs -setrep -w 2 hdfs://ClusterB:8020/
That may reduce the total size of the HDFS data.
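If you want to verify the effect, a rough before/after check of consumed space could look like this (just a sketch; adjust the URI to your cluster):
# Summary of space used by the target path:
hdfs dfs -du -s -h hdfs://ClusterB:8020/
# Cluster-wide capacity and DFS used (run before and after the setrep):
hdfs dfsadmin -report | head -n 10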
10-09-2018
07:16 AM
Hi, @saichand akella; I've tested using the same cluster name for HBase replication and it worked. Thank you.
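In case it helps anyone else testing this, a quick way to confirm the peer is set up and replication is flowing (a sketch using standard hbase shell commands):
# Run on the source cluster.
hbase shell <<'EOF'
list_peers
status 'replication'
EOF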
10-09-2018
03:52 AM
Hi, @Predrag Minovic Thank you for your reply. I tried the method you suggested but got the same error. Below are the commands I ran:
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk -update -p hdfs://cluster_1:8020/tmp/ hdfs:/cluster_2:8020/tmp/
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk:8020,d002.server.edu.tk:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs:/cluster_2:8020/tmp/
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs:/cluster_2:8020/tmp/
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 -update -p hdfs://cluster_1:8020/tmp/ hdfs:/cluster_2:8020/tmp/
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs:/cluster_2:8020/tmp/
10-09-2018
03:52 AM
Hi, @Jonathan Sneep Thank you for your reply. I tried the method you suggested but got the same error. Below is the command I ran:
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk -update -p hdfs://cluster_1:8020/tmp/ hdfs:/cluster_2:8020/tmp/
10-09-2018
03:52 AM
4, Hadoop version is 2.7.3
10-09-2018
03:52 AM
Hi, all;
I have two Hadoop clusters, named cluster_1 and cluster_2, with separate ZooKeepers. Now I want to DistCp HDFS files from cluster_1 to cluster_2.
The clusters' information is as follows.
### cluster_1
1, active namenode: g001.server.edu.tk standby namenode: g002.server.edu.tk
2, zookeeper hosts: g003.server.edu.tk g004.server.edu.tk g005.server.edu.tk
### cluster_2
1, active namenode: d001.server.edu.tk standby namenode: d002.server.edu.tk
2, zookeeper hosts: d003.server.edu.tk d004.server.edu.tk d005.server.edu.tk
1, In order to distcp data from cluster_1 to cluster_2, I copied the whole Hadoop configuration from $HADOOP_HOME/etc/hadoop to /configurations/hadoop on cluster_1 and added the following properties to hdfs-site.xml on g001.server.edu.tk:
<property>
<name>dfs.nameservices</name>
<value>cluster_1,cluster_2</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.cluster_2</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.cluster_2</name>
<value>d001.server.edu.tk,d002.server.edu.tk</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster_2.d001.server.edu.tk</name>
<value>d001.server.edu.tk:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.cluster_2.d001.server.edu.tk</name>
<value>d001.server.edu.tk:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster_2.d001.server.edu.tk</name>
<value>d001.server.edu.tk:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.cluster_2.d001.server.edu.tk</name>
<value>d001.server.edu.tk:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster_2.d002.server.edu.tk</name>
<value>d002.server.edu.tk:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.cluster_2.d002.server.edu.tk</name>
<value>d002.server.edu.tk:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster_2.d002.server.edu.tk</name>
<value>d002.server.edu.tk:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.cluster_2.d002.server.edu.tk</name>
<value>d002.server.edu.tk:50470</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.cluster_1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.cluster_1</name>
<value>g001.server.edu.tk,g002.server.edu.tk</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster_1.g001.server.edu.tk</name>
<value>g001.server.edu.tk:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.cluster_1.g001.server.edu.tk</name>
<value>g001.server.edu.tk:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster_1.g001.server.edu.tk</name>
<value>g001.server.edu.tk:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.cluster_1.g001.server.edu.tk</name>
<value>g001.server.edu.tk:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.cluster_1.g002.server.edu.tk</name>
<value>g002.server.edu.tk:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.cluster_1.g002.server.edu.tk</name>
<value>g002.server.edu.tk:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.cluster_1.g002.server.edu.tk</name>
<value>g002.server.edu.tk:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.cluster_1.g002.server.edu.tk</name>
<value>g002.server.edu.tk:50470</value>
</property>
2, Then I ran the DistCp command on g001.server.edu.tk:
hdfs --config /configurations/hadoop distcp -update -p hdfs://cluster_1:8020/tmp/ hdfs:/cluster_2:8020/tmp/
3, But got errors like:
18/10/08 07:55:00 ERROR tools.DistCp: Exception encountered
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxx to YARN: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: hdfs:cluster_2, Ident: (HDFS_DELEGATION_TOKEN token 50168 for hdfs)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:183)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxxx to YARN: Failed to RENEW token: Kind: HDFS_DELEGATION_TOKEN, Service: hdfs:cluster_2, Ident: (HDFS_DELEGATION_TOKEN token 50168 for hdfs)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:291)
... 12 more
4, Hadoop version is 2.7.3
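For reference, with the dfs.nameservices above in place, the two clusters can also be addressed by nameservice alone so that the failover proxy provider resolves the active NameNode (a sketch only, not one of the commands I ran):
hdfs --config /configurations/hadoop distcp -update -p hdfs://cluster_1/tmp/ hdfs://cluster_2/tmp/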
Labels:
- Apache Hadoop
- Apache YARN
09-26-2018
09:38 PM
1 Kudo
Hi;
On the first day after we started the Spark Thrift Server with a keytab file and principal, we could use beeline to connect to the database and get data from tables.
[spark@xxx] $SPARK_HOME/sbin/start-thriftserver.sh --master yarn-client --keytab /keytab/spark_thrift.keytab --principal thriftuser/thrift.server.org@THRIFT.REALMS.ORG --hiveconf hive.server2.thrift.port=10102 --conf spark.hadoop.fs.hdfs.impl.disable.cache=true --hiveconf hive.server2.authentication.kerberos.principal=thriftuser/thrift.server.org@THRIFT.REALMS.ORG --hiveconf hive.server2.authentication.kerberos.keytab=/keytab/spark_thrift.keytab --hiveconf hive.server2.logging.operation.enabled=true
We renewed the principal every 18 hours:
while (true)
do
kinit -kt /keytab/spark_thrift.keytab thriftuser/thrift.server.org@THRIFT.REALMS.ORG
sleep 18h
done &
On the first day, we could use beeline normally:
[spark@xxx] beeline
beeline> !connect jdbc:hive2://hive.server.org:10102/database;principal=thriftuser/thrift.server.org@THRIFT.REALMS.ORG
beeline> select count(1) from table;
### This would show the table details.
But about one day later, when we tried to get data again, it threw errors like:
java.lang.ClassCastException: org.apache.hadoop.security.authentication.client.AuthenticationException cannot be cast to java.security.GeneralSecurityException
at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:189)
at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
at org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1381)
at org.apache.hadoop.hdfs.DFSClient.createWrappedInputStream(DFSClient.java:1451)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:305)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:251)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
What we've checked:
1, We renewed the principal we used every 18 hours.
2, Checked the Spark Thrift Server log; we found that the credentials stored in HDFS were refreshed roughly every 16 hours:
18/09/19 16:45:58 INFO Client: Credentials file set to : credentials-xxxxx
18/09/19 16:45:59 INFO Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN secure Distributed Cache.
18/09/19 16:46:10 INFO CredentialUpdater:Scheduling credentials refresh from HDFS in 57588753ms.
18/09/20 08:45:58 INFO CredentialUpdater:Reading new credentials from hdfs://cluster/user/thriftuser/.sparkStaging/application_xxx/credentials-xxyyx
18/09/20 08:45:58 INFO CredentialUpdater:Credentials updated from credentials files.
18/09/20 08:45:58 INFO CredentialUpdater:Scheduling credentials refresh from HDFS in 57588700ms.
3, Checked the Ranger KMS access log; the decrypt request was rejected with error code 403:
xxx.xxx.xxx.xxx - - [20/Sep/2018:10:57:50 +0800] "POST /kms/v1/keyversion/thriftuser_key%400/_eek?eek_op=decrypt HTTP/1.1" 403 410
4, But when we read the encrypted data directory (where the Hive data is stored) directly with the same principal active, the data could be read successfully:
[spark@xxx] hdfs dfs -cat /user/thriftuser/test.txt
test!
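One more quick check worth recording (a sketch; it assumes the kinit loop and beeline run as the same OS user with the default ticket cache): confirm the TGT is still valid at the time of the failure.
# 'Valid starting' / 'Expires' show whether the kinit loop actually refreshed the ticket in time.
klist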
Labels:
- Apache Ranger
- Apache Spark
09-26-2018
07:02 AM
@saichand akella Thank you for your quick response! Actually, I need to keep the same cluster name for HBase replication for some reasons. So, I want to know in detail what the disadvantages/drawbacks of using the same cluster name for HBase replication are. Thank you!
09-26-2018
03:07 AM
Hi, for the HBase replication process, I have two separate ZooKeepers. But do I need different cluster names for HBase replication, or can I use the same cluster name for both clusters? Thank you.
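For context, below is a minimal sketch of how a replication peer is defined in the hbase shell; the peer is identified by the destination's ZooKeeper quorum, client port, and znode parent (the hostnames and peer id are placeholders):
# Run on the source cluster; the peer points at the destination cluster's ZooKeeper ensemble.
hbase shell <<'EOF'
add_peer '1', CLUSTER_KEY => "zkhost1,zkhost2,zkhost3:2181:/hbase"
list_peers
EOF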
Tags:
- Data Processing
- HBase
Labels:
- Apache HBase