Member since 09-28-2015
51 Posts | 32 Kudos Received | 17 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1758 | 04-13-2018 11:36 PM |
|  | 4511 | 04-13-2018 11:03 PM |
|  | 1612 | 04-13-2018 10:56 PM |
|  | 4154 | 04-10-2018 03:12 PM |
|  | 5799 | 02-13-2018 07:23 PM |

10-05-2018 12:10 AM
1 Kudo

This seems to be a bogus replay exception when running the Solr service. Adding -Dsun.security.krb5.rcache=none to hadoop-env.sh or to the Solr JVM options should fix the problem. In hadoop-env.sh:

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.rcache=none ${HADOOP_OPTS}"
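
On the Solr side, a minimal sketch of the equivalent change, assuming your distribution drives Solr's JVM flags through solr.in.sh (the file name and location vary by install):

```sh
# solr.in.sh -- append the same Kerberos replay-cache flag to Solr's JVM options
SOLR_OPTS="$SOLR_OPTS -Dsun.security.krb5.rcache=none"
```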

08-27-2018 06:14 PM

@Daniel Muller, can you grep "Safe mode is" in the HDFS NameNode log? That will tell you directly why the NameNode has not exited safe mode.
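
For example, a minimal sketch assuming the common HDP log location (adjust the path and file name for your cluster):

```sh
# Show the most recent safe mode status messages from the active NameNode log
grep "Safe mode is" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -n 20
```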

04-13-2018 11:36 PM

Have you cleaned up the files under dfs.datanode.data.dir that were not written by HDFS as blocks? If not, the non-DFS used value won't change. A similar question has been answered here: https://community.hortonworks.com/questions/42122/hdfs-non-dfs-used.html
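
To verify, you can compare the per-DataNode numbers before and after the cleanup; this is the standard admin report, not anything specific to this thread:

```sh
# Print capacity and usage per datanode, including the Non DFS Used figure
hdfs dfsadmin -report | grep -E "^(Name|Configured Capacity|DFS Used|Non DFS Used):"
```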

04-13-2018 11:03 PM

@Vinit Pandey, encrypted files under an HDFS encryption zone can only be renamed within the same encryption zone. You may copy to/from an encryption zone, which has an additional decrypt (or encrypt) overhead compared with a rename. Please refer to the documentation for more details: "HDFS restricts file and directory renames across encryption zone boundaries. This includes renaming an encrypted file / directory into an unencrypted directory (e.g., hdfs dfs mv /zone/encryptedFile /home/bob), renaming an unencrypted file or directory into an encryption zone (e.g., hdfs dfs mv /home/bob/unEncryptedFile /zone), and renaming between two different encryption zones (e.g., hdfs dfs mv /home/alice/zone1/foo /home/alice/zone2). In these examples, /zone, /home/alice/zone1, and /home/alice/zone2 are encryption zones, while /home/bob is not. A rename is only allowed if the source and destination paths are in the same encryption zone, or both paths are unencrypted (not in any encryption zone)."
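
To make the rule concrete, a quick sketch using the same example paths as the quoted documentation (/zone is an encryption zone, /home/bob is not):

```sh
# Rename within one encryption zone: allowed
hdfs dfs -mv /zone/dir1/encryptedFile /zone/dir2/encryptedFile

# Rename across the zone boundary: rejected by HDFS
hdfs dfs -mv /zone/encryptedFile /home/bob/encryptedFile

# Copy across the boundary instead: allowed, at the cost of a decrypt on read
hdfs dfs -cp /zone/encryptedFile /home/bob/encryptedFile
```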

04-13-2018 10:56 PM

Like other MR jobs, you can specify the number of map tasks for the job via JobConf.setNumMapTasks(). However, this is only a hint, and the actual number of spawned map tasks depends on the number of input splits. If you set NumMapTasks=100 on 50 nodes, then the number of map tasks running in parallel per node is about 100/50 = 2, assuming the splits are evenly distributed across the nodes.
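
The same hint can also be passed on the command line as a configuration property; a minimal sketch with placeholder jar, class, and path names (the -D option is only picked up this way if the driver goes through ToolRunner/GenericOptionsParser):

```sh
# Ask for ~100 map tasks; the real count still follows the number of input splits
hadoop jar my-job.jar com.example.MyJob -D mapreduce.job.maps=100 /input /output
```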

04-10-2018 03:12 PM

@Saurabh Saurabh, have you checked the following HCC article to see if it applies to your case? https://community.hortonworks.com/articles/16144/write-or-append-failures-in-very-small-clusters-un.html

02-13-2018 07:23 PM

@Yog Prabhhu, you can get the file block information from the WebHDFS REST API, for example:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<FilePath>?op=GETFILEBLOCKLOCATIONS"

The corresponding Java API is FileSystem.getFileBlockLocations:

public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)

You will get an array of block locations like the one below:

[BlockLocation(offset: 0, length: BLOCK_SIZE, hosts: {"host1:9866", "host2:9866", "host3:9866"}, ...), ...]

02-01-2018 04:17 PM

Can you check your hadoop.kms.authentication.kerberos.name.rules setting in kms-site.xml? Try "DEFAULT" if you have a customized setting that is invalid. You mentioned that the KMS principal was changed. Can you also post your hadoop.kms.authentication.kerberos.principal and hadoop.security.auth_to_local settings from core-site.xml?
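
For reference, this is roughly how the suggested default would look in kms-site.xml (the property name is the one from the post; "DEFAULT" is just the fallback rule to try, not your current value):

```xml
<property>
  <name>hadoop.kms.authentication.kerberos.name.rules</name>
  <value>DEFAULT</value>
</property>
```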

01-08-2018 07:53 PM

This is a known limitation of wholeTextFiles as reported in https://issues.apache.org/jira/browse/SPARK-18965. Try using binaryFiles as suggested in https://issues.apache.org/jira/browse/SPARK-22225.

01-02-2018 06:29 PM

@Michael Bronson It depends on whether the components are going to use the new disks or not. If not, they don't need to restart. For the services that do need the new disks, some of them, such as the HDFS DataNode, support hot-swap, which means you can add disks with the following steps without restarting the DataNode service.

1> Change dfs.datanode.data.dir in hdfs-site.xml to include the new disk locations (e.g., /data/disk2).

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/disk1,/data/disk2</value>
</property>

2> Run the hdfs CLI to reconfigure the DataNode service without a restart.

hdfs dfsadmin -reconfig datanode dn1.hdp.com:9820 start

Other services might need a restart to use the new disks if hot-swap is not supported.
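
You can also check whether the DataNode has finished picking up the change with the status subcommand of the same CLI (host and port as in the start command above):

```sh
# Poll the reconfiguration task started above
hdfs dfsadmin -reconfig datanode dn1.hdp.com:9820 status
```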