Member since: 05-07-2020
Posts: 32
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2047 | 06-23-2020 01:13 AM |
07-31-2020 07:14 AM
I have seen other topics with the same or a similar subject, in particular this one. I followed the hints there, but they either do not solve my problem or it is unclear how to implement them, so I am opening this separate topic.
In a CDH 6.3.2 cluster I have an Anaconda parcel distributed and activated, which of course has the numpy module installed. However, the Spark nodes seem to ignore the CDH configuration and keep using the system-wide Python from /usr/bin/python.
Nevertheless, I installed numpy into the system-wide Python on all cluster nodes, but I still get "ImportError: No module named numpy".
I would appreciate any further advice on how to solve the problem.
I am also not sure how to implement the solution referred to in https://stackoverflow.com/questions/46857090/adding-pyspark-python-path-in-oozie. Any clarification would be much appreciated.
Here is the error extracted from a Jupyter notebook output:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure:
Aborting TaskSet 1.0 because task 0 (partition 0)
cannot run anywhere due to node and executor blacklist.
Most recent failure:
Lost task 0.0 in stage 1.0 (TID 1, blc-worker-03.novalocal, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/pyspark/worker.py", line 359, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/pyspark/worker.py", line 64, in read_command
command = serializer._read_with_length(file)
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/pyspark/serializers.py", line 172, in _read_with_length
return self.loads(obj)
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/pyspark/serializers.py", line 580, in loads
return pickle.loads(obj)
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/pyspark/mllib/__init__.py", line 28, in <module>
import numpy
ImportError: No module named numpy
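One thing that may be worth trying here is to pin the interpreter explicitly through Spark properties rather than relying on the cluster defaults. The snippet below is only a minimal sketch: it builds the three standard Spark properties that select the Python binary (`spark.pyspark.python`, `spark.yarn.appMasterEnv.PYSPARK_PYTHON`, `spark.executorEnv.PYSPARK_PYTHON`). The interpreter path is an assumption about where the Anaconda parcel lives and must be checked on the worker nodes; the helper names are made up for illustration.

```python
# Sketch: build Spark settings that pin the Python interpreter for PySpark.
# ASSUMPTION: the Anaconda parcel exposes its interpreter at this path;
# verify it on the worker nodes and adjust before use.
ANACONDA_PYTHON = "/opt/cloudera/parcels/Anaconda/bin/python"


def pyspark_python_conf(python_path=ANACONDA_PYTHON):
    """Return Spark properties that point PySpark at python_path.

    The three keys are standard Spark properties; the function name is
    a hypothetical helper, not part of any library.
    """
    return {
        "spark.pyspark.python": python_path,
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": python_path,
        "spark.executorEnv.PYSPARK_PYTHON": python_path,
    }


def as_submit_args(conf):
    """Render the properties as `--conf key=value` arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    return args


if __name__ == "__main__":
    # Print the arguments so they can be pasted into a spark-submit call.
    print(" ".join(as_submit_args(pyspark_python_conf())))
```

In a notebook the same dictionary could be applied with `SparkConf().setAll(conf.items())` before the SparkContext is created. A quick way to see which interpreter the executors really use is to run a small RDD job that returns `sys.executable` from each task.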
07-31-2020 05:54 AM
In a CDH 6.3.2 cluster I have an Anaconda parcel distributed and activated, which of course has the numpy module installed. However, the Spark nodes seem to ignore the CDH configuration and keep using the system-wide Python from /usr/bin/python. Nevertheless, I installed numpy into the system-wide Python on all cluster nodes, but I still get "ImportError: No module named numpy". I would appreciate any further advice on how to solve the problem. I am not sure how to implement the solution referred to in https://stackoverflow.com/questions/46857090/adding-pyspark-python-path-in-oozie.
07-31-2020 05:48 AM
@kernel8liang Could you please explain how to implement the solution?
07-28-2020 04:32 AM
@GangWar Please see the CDSW session command log and the hdfs-site.xml file contents enclosed.
!echo $PATH
/usr/lib/jvm/jre-openjdk/bin:/home/cdsw/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/conda/bin:/opt/cloudera/parcels/CDH/bin:/home/cdsw/.conda/envs/python3.6/bin
!which hdfs
/opt/cloudera/parcels/CDH/bin/hdfs
!/opt/cloudera/parcels/CDH/bin/hdfs dfs -ls /
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/28 11:28:47","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode43. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/28 11:28:47","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode57. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/28 11:28:47","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "blc-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over blc-control-03.novalocal:8020 after 1 failover attempts. Trying to failover after sleeping for 1424ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/28 11:28:49","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "blc-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over blc-control-02.novalocal:8020 after 2 failover attempts. Trying to failover after sleeping for 2662ms."}}
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>namenodeHA</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.namenodeHA</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled.namenodeHA</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>blc-control-01.novalocal:2181,blc-control-02.novalocal:2181,blc-control-03.novalocal:2181</value>
</property>
<property>
<name>dfs.ha.namenodes.namenodeHA</name>
<value>namenode43,namenode57</value>
</property>
<property>
<name>dfs.namenode.rpc-address.namenodeHA.namenode43</name>
<value>blc-control-02.novalocal:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.namenodeHA.namenode43</name>
<value>blc-control-02.novalocal:8022</value>
</property>
<property>
<name>dfs.namenode.http-address.namenodeHA.namenode43</name>
<value>blc-control-02.novalocal:9870</value>
</property>
<property>
<name>dfs.namenode.https-address.namenodeHA.namenode43</name>
<value>blc-control-02.novalocal:9871</value>
</property>
<property>
<name>dfs.namenode.rpc-address.namenodeHA.namenode57</name>
<value>blc-control-03.novalocal:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.namenodeHA.namenode57</name>
<value>blc-control-03.novalocal:8022</value>
</property>
<property>
<name>dfs.namenode.http-address.namenodeHA.namenode57</name>
<value>blc-control-03.novalocal:9870</value>
</property>
<property>
<name>dfs.namenode.https-address.namenodeHA.namenode57</name>
<value>blc-control-03.novalocal:9871</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>022</value>
</property>
<property>
<name>dfs.client.block.write.locateFollowingBlock.retries</name>
<value>7</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>false</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.skip.checksum</name>
<value>false</value>
</property>
<property>
<name>dfs.client.domain.socket.data.traffic</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
<value>ALWAYS</value>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
<value>true</value>
</property>
</configuration>
07-20-2020 01:22 AM
Let me bump this thread as a kind reminder that this support question is still open.
07-02-2020 02:10 AM
@GangWar I do confirm that I am able to list the HDFS files from the CDSW master node:
[root@cdsw-master-01 ~]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x - hbase hbase 0 2020-06-29 19:23 /hbase
drwxrwxrwt - hdfs supergroup 0 2020-06-29 21:05 /tmp
drwxr-xr-x - hdfs supergroup 0 2020-06-29 21:44 /user
I have re-deployed the client configurations, refreshed the cluster, and restarted the NameNode roles. I confirm that the HDFS gateway roles are available on the CDSW hosts.
Please clarify what you mean by "Form CDSW host doc a list on HDFS".
From a CDSW session input prompt I try to access HDFS, but I still get the error:
!hdfs dfs -ls /
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/02 09:08:35","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode43. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/02 09:08:35","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode57. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
I would therefore appreciate your further assistance with the troubleshooting.
07-01-2020 03:14 AM
@GangWar I do confirm that localhost resolves to 127.0.0.1, not to 127.0.0.0, which I believe is a typo, isn't it?
[root@cdsw-master-01 ~]# nslookup localhost
Server: 172.16.1.3
Address: 172.16.1.3#53
Non-authoritative answer:
Name: localhost
Address: 127.0.0.1
This is related to a CDSW proof-of-concept/trial on top of a CDH Enterprise R&D cluster, so I am unable to submit a support case, though I would be glad to do so. Please check your private messages inbox regarding the logs bundle.
06-29-2020 05:49 AM
Hi,
I would appreciate any advice on how to solve a problem with terminal access from a CDSW session. I can launch a session, but within the session I am unable to access the terminal; please see the attached screenshot showing HTTP ERROR 401.
The networking requirements are met, in particular:
- IPv6 is enabled
- The CDSW hosts are within the same subnet as the CDH cluster
- DNS is configured with the relevant A record for the domain name, a CNAME record for the wildcard domain, and a reverse PTR record (please see the enclosed response to the ping command, where DNS resolves the terminal's FQDN to the CDSW master node's IP)
- No iptables rules are enabled
- SELinux is disabled
[cloud-user@cdh-control-01 ~]$ ping -c1 tty-jidv65sd8630btx4.cdsw.<intranetdomain>
PING cdsw.<intranetdomain> (10.133.210.200) 56(84) bytes of data.
64 bytes from cdsw.<intranetdomain> (10.133.210.200): icmp_seq=1 ttl=60 time=0.884 ms
--- cdsw.<intranetdomain> ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.884/0.884/0.884/0.000 ms
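The DNS points listed in this post (A record, wildcard CNAME, reverse PTR) can also be re-checked from any host with a few lines of Python 3. This is only a sketch of such a check, and it says nothing about the HTTP 401 itself. The function name and the host names in the `__main__` part are placeholders invented for the example; substitute the real CDSW domain and one of the `tty-...` subdomains.

```python
import socket


def check_dns(fqdn, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve fqdn to an IPv4 address, then look up the PTR name for it.

    Returns a dict with the keys fqdn, ip and ptr; ip or ptr stays None
    when the corresponding lookup fails. The resolver functions can be
    swapped out, which keeps the helper testable without a network.
    """
    report = {"fqdn": fqdn, "ip": None, "ptr": None}
    try:
        report["ip"] = resolve(fqdn)
    except OSError:
        # Forward lookup failed, so there is nothing to reverse.
        return report
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        report["ptr"] = reverse(report["ip"])[0]
    except OSError:
        pass
    return report


if __name__ == "__main__":
    # PLACEHOLDER names: replace with the real CDSW domain and a
    # subdomain that should be covered by the wildcard record.
    for name in ("cdsw.example.com", "tty-test.cdsw.example.com"):
        print(check_dns(name))
```

Both names should come back with the CDSW master node's IP, and the PTR lookup should return the CDSW domain name.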