I'm trying to load a file into a live Cloudera cluster (5.1) using Pentaho Kettle, aka PDI (5.2), and I'm getting this error:
File /user/pdi/weblogs/in/weblogs_parse.txt could only be replicated to 0 nodes instead of minReplication (=1).
Does anybody know how to fix this? I've formatted the datanode but it's still not working.
The 3 datanodes have plenty of space:
DFS Used%: 0.01%
DFS Remaining%: 95.18%
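For reference, those usage figures can be checked with the standard HDFS admin report, run as the hdfs superuser:
sudo -u hdfs hdfs dfsadmin -report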
Could it be a security issue? The ETL tool uses my OS user to load the data. At first I was getting an access error, which I solved by changing the destination folder's permissions with hadoop fs -chmod on the master node.
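The permission change was something like this (path taken from the error above; the exact mode may differ):
sudo -u hdfs hadoop fs -mkdir -p /user/pdi/weblogs/in
sudo -u hdfs hadoop fs -chmod -R 777 /user/pdi/weblogs/in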
Can you please tell me the commands and steps you used? Here are my commands and steps.
movies.txt is a 30 KB file.
My Pentaho job creates the file movies.txt under the /user/test1/ folder, but it is empty (0 bytes).
The following exception is raised:
Caused by: File /user/test1/movies.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
It's been a long time since I tried this and I don't remember exactly what was causing the exception, but these are the steps I ended up following:
Besides creating the folder and changing its permissions to 777, I had to give ownership to root:
sudo -u hdfs hadoop fs -chown -R root:root /user/test1
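You can verify the ownership change with hadoop fs -ls, and a quick manual upload from the master node is a useful sanity check that HDFS itself accepts writes before blaming the PDI client (the file name is just the example from above):
hadoop fs -ls /user
hadoop fs -put movies.txt /user/test1/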
Then the only thing that worked for me was uploading PDI to the master node and using Carte to run the jobs.
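If it helps, Carte ships in the PDI data-integration directory and is started with a hostname and port; the values below are only examples, and in Spoon you then define a slave server pointing at that host/port and run the job remotely:
sh carte.sh master01.example.com 8081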
Hope this helps, regards!
sudo -u hdfs hadoop fs -chown -R root:root /user/test1 did not work in my case either; I will check the Carte option to execute transformations and jobs remotely.
Thanks for your help.