Member since: 03-09-2017
Posts: 6
Kudos Received: 2
Solutions: 1
03-05-2018
03:38 AM
Hi Aaron! Thanks for answering. In the end it wasn't a problem with Hadoop or the configuration (the credentials were correct and the config files were deployed on all nodes). IT was simply blocking all traffic to the private bucket. Even after I asked them to allow those IPs it still didn't work, so I installed CNTLM on all nodes and pointed S3A at the local proxy with:
-Dfs.s3a.proxy.host="localhost" -Dfs.s3a.proxy.port="3128"
After that I was able to move 3 TB in less than a day.
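For anyone landing here with the same issue, here is a rough sketch of how the two proxy options fit into a full distcp call. The helper name `build_distcp_command` is made up for illustration, and the paths are just the placeholders from the original question; the only real pieces are `hadoop distcp` and the `fs.s3a.proxy.host` / `fs.s3a.proxy.port` properties. It assumes the proxy listens on localhost:3128, as in my setup.

```python
import shlex


def build_distcp_command(src, dest, proxy_host="localhost", proxy_port=3128):
    """Build the argv for `hadoop distcp` with S3A traffic sent through a proxy.

    Hypothetical helper: it only assembles the command and does not run it.
    The -D generic options are placed before the source and destination paths.
    """
    return [
        "hadoop",
        "distcp",
        f"-Dfs.s3a.proxy.host={proxy_host}",
        f"-Dfs.s3a.proxy.port={proxy_port}",
        src,
        dest,
    ]


if __name__ == "__main__":
    # Placeholder paths taken from the question; replace with real ones.
    cmd = build_distcp_command("/blablabla", "s3a://bucket-name/")
    # Print the command so it can be reviewed before running it by hand,
    # or pass `cmd` to subprocess.run(cmd, check=True) on a cluster node.
    print(shlex.join(cmd))
```

The same two properties could presumably also go into core-site.xml next to the access and secret keys, so they don't have to be repeated on every command line.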
02-02-2018
12:14 AM
Hi! I am trying to move data from HDFS to an S3 bucket. I am using the latest version of CM/CDH (5.14.0). I have been able to copy data using the aws tool:
aws s3api put-object
and also with the Python SDK, but I cannot copy data with hadoop distcp. I have added the following extra properties to core-site.xml in the HDFS service:
<property>
<name>fs.s3a.access.key</name>
<value>X</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>X</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3.us-east-2.amazonaws.com</value>
</property>
When I execute a command like
hadoop distcp /blablabla s3a://bucket-name/
nothing is copied and no error is shown; it just hangs for a while (I guess it is retrying several times). The same thing happens when I try to just list the files in the bucket with
hadoop fs -ls s3a://bucket-name
I am sure it is not a credentials problem, since I can connect using the same access and secret key with the Python SDK and the aws tool. Is anyone facing a similar issue? Thanks!
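When a command hangs like this with no error, one thing that may be worth ruling out is plain network reachability from the cluster nodes to the S3 endpoint. Below is a minimal sketch of such a check using only the Python standard library. The function name `can_connect` and the 5 second timeout are my own choices, and the script only tests a direct TCP connection on port 443, so it says nothing about access through a proxy.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a direct TCP connection to host:port opens in time.

    Connection refusals, timeouts and DNS failures all surface as OSError
    (or a subclass of it), so they are all reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Endpoint from the core-site.xml snippet above.
    endpoint = "s3.us-east-2.amazonaws.com"
    status = "reachable" if can_connect(endpoint) else "NOT reachable"
    print(f"{endpoint}:443 is {status} from this node")
```

Running it on every node, not only on the gateway, would show whether the nodes that actually execute the distcp tasks can reach the endpoint directly.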
Labels: HDFS
12-12-2017
06:43 AM
You can follow the solution on the previous page; it works! By the way, I have just upgraded to CM and CDH 5.13.1 and this issue is still present. Will it be fixed?
05-04-2017
05:24 AM
1 Kudo
According to this documentation, when a query is run from a DataNode via impala-shell, the Impala daemon running on that node acts as the coordinator for that query, but in theory all nodes with Impala daemons work in parallel and send partial results back to it. In our cluster, though, this does not seem to be working properly: CPU usage stays at only 2% and queries take a long time to complete. Also, since the Llama role is deprecated as of CDH 5.10, what is the right way to manage Impala resources? Changing CPU shares in the configuration seems to have no effect.
Labels: Apache Impala, Cloudera Manager
04-05-2017
08:06 AM
Same problem after updating to CDH 5.10.1.
03-09-2017
06:30 AM
1 Kudo
Same problem here. Hope a solution is available soon.