Member since 08-14-2018 | 47 Posts | 4 Kudos Received | 2 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1495 | 01-26-2021 03:16 PM
 | 2062 | 08-28-2019 07:11 AM
01-26-2021
03:16 PM
1 Kudo
@mike_bronson7 This looks like a symptom of the default replication factor being set to 3, which HDFS uses for redundancy and processing capability. A cluster should have at least 3 data nodes so that it can hold 3 healthy replicas of each block, because HDFS will not write more than one replica of the same block to the same data node. In your scenario the blocks will be under-replicated: only 1 healthy replica is placed on the available data node.

You can run setrep [1] to change the replication factor. If you provide a path to a directory, the command recursively changes the replication factor of all files under the directory tree rooted at that path:

hdfs dfs -setrep -w 1 /user/hadoop/dir1

[1] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#setrep
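As a rough sketch of how the setrep change could be checked before and after, the helper below wraps three standard Hadoop CLI calls. The function name `set_replication` is hypothetical, and the path in the usage comment is only the example path from the post; adjust both to your environment.

```shell
# Hypothetical helper (the name is an assumption, not part of Hadoop).
# Usage: set_replication <factor> <hdfs-path>
set_replication() {
  local factor="$1" path="$2"

  # Print the replication factor currently recorded for the path (%r)
  # followed by its name (%n). Directories report 0.
  hdfs dfs -stat "%r %n" "$path"

  # Change the replication factor; -w waits until replication completes.
  hdfs dfs -setrep -w "$factor" "$path"

  # Report file and block health, including under-replicated blocks.
  hdfs fsck "$path" -files -blocks
}

# Example call, using the path from the post:
# set_replication 1 /user/hadoop/dir1
```

Note that setrep only changes files that already exist. Files written later still get the configured default (dfs.replication) unless the client overrides it.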
01-25-2021
08:10 AM
CDH 6.3.x supports Spark 2.4.0: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_63_packaging.html#c...

You can find the CSD and parcel here: https://docs.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html#packaging

Spark 3 is officially supported in CDP 7.1.5: https://docs.cloudera.com/cdp-private-cloud-base/7.1.5/cds-3/topics/spark-spark-3-overview.html
01-25-2021
07:29 AM
@Paop You will need an HDFS gateway role and a Spark gateway role on the node where you trigger the job. The class named in the error, "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream", is an HDFS class, which leads me to believe that the node you are running the command from does not have a gateway role.
08-28-2019
07:11 AM
1 Kudo
The slow get rate is the number of Gets that took over 1000 ms to complete.