Member since: 10-06-2015
Posts: 45
Kudos Received: 54
Solutions: 0
02-18-2016
06:52 PM
1 Kudo
Thanks for getting back to me. Yes, I'm aware of the -m option, but the documentation suggests that each mapper gets a list of HDFS-level files to work on. I'm trying to confirm that my understanding is accurate: unlike a typical MapReduce job that deals in individual blocks or splits, each distcp map task receives the URI of one or more entire files to copy. So a file might span hundreds of blocks, but if it's all one file, the same mapper handles all of them. Is that the case?
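For reference, this is the sort of invocation I'm talking about (the paths and namenode addresses are placeholders, not our real ones):

    # ask for up to 20 map tasks with -m; if distcp assigns whole files to
    # mappers, as I suspect, one huge file is still copied by a single mapper
    hadoop distcp -m 20 hdfs://nn1:8020/data/src hdfs://nn2:8020/data/dst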
02-18-2016
05:35 PM
2 Kudos
We see very few mappers created for distcp copies. Are these mappers allocated at the block level or at the file level? That is, does a mapper copy a physical block, or does it copy an entire logical file?
Labels:
- Apache Hadoop
02-12-2016
03:43 PM
1 Kudo
All: thanks for the very helpful answers. The real issue here is that values get changed after the original, correct installation. Then you get nailed by surprise later, because an arbitrary amount of time can pass before processes are restarted (that's what happens repeatedly here). It would be wonderful if Ambari had an option to periodically re-run the same checks it executes at install time to catch this kind of thing.
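In the meantime, a rough sketch of the kind of periodic re-check I mean, using THP as the example setting (the cron file name and alert command are made up; adapt to taste):

    # /etc/cron.d/thp-check: hourly, warn if THP has been switched back on
    0 * * * * root grep -q '\[never\]' /sys/kernel/mm/transparent_hugepage/enabled || logger -p user.warn "THP re-enabled on $(hostname)"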
02-05-2016
09:34 PM
2 Kudos
Aha. The problem turns out to be with the multiple directories named in the source list file. You can have many sources, but only one target. The behavior I was looking for is for distcp to create a separate tree under the target for each input directory. That does not seem to be how distcp works, but it's easy to script around.
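For anyone who hits the same thing, a minimal sketch of the wrapper (srcdirs.txt and the cluster addresses are placeholders):

    # srcdirs.txt lists one absolute HDFS source directory per line
    while read -r src; do
      name=$(basename "$src")
      # copying to a nonexistent target directory puts the source's
      # contents under it, giving each input its own tree
      hadoop distcp "hdfs://nn1:8020${src}" "hdfs://nn2:8020/backup/${name}"
    done < srcdirs.txt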
02-03-2016
07:41 PM
I must have been unclear. We definitely want to use distcp, and we cannot use Falcon for administrative reasons. The problem is that I can't get fully recursive behavior with distcp. There's probably a way to do it, but I'm having trouble getting it to build the full depth of the directory tree on the target when it goes more than one level deep.
02-03-2016
04:50 PM
1 Kudo
Falcon is not available in my environment, unfortunately. Is there no way to do this without it? This must come up fairly often with partitioned HDFS files and ORC.
02-02-2016
08:58 PM
2 Kudos
I have a cluster with THP (transparent huge pages) inadvertently left enabled. If I disable it, will the processes that are already running stop using it, or do they need to be restarted? Restarting is very inconvenient in this environment.
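For context, this is how we plan to disable it (the standard sysfs knobs; the exact paths can vary by distribution, e.g. RHEL 6 uses redhat_transparent_hugepage):

    # run as root: disable THP for new allocations and for the defrag daemon
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag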
02-02-2016
08:47 PM
4 Kudos
I need to take a list of HDFS directories and copy the contents of those directories to another HDFS cluster using distcp. The problem is getting the directory structure created recursively and automatically. These are large partitioned files, and the available options seem to preserve structure only one level deep. Can anyone provide an example?
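For concreteness, this is the kind of invocation I've been trying, using the -f source-list option (addresses and paths are placeholders):

    # srclist.txt, stored in HDFS, holds one source directory per line
    hadoop distcp -f hdfs://nn1:8020/user/me/srclist.txt hdfs://nn2:8020/backup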
Labels:
- Apache Hadoop
01-12-2016
06:56 PM
1 Kudo
One last detail: if the time runs out and the blocks go onto the replication queue, what happens when the node comes back online and reports in? Are the blocks removed from the queue? And what if they've already been re-replicated?
01-11-2016
08:10 PM
1 Kudo
The three staleness properties control how long it takes before nodes that have not been heard from are regarded as stale, and whether to read from or write to such nodes. I don't think that's what we're looking for. What I'm asking is whether it is possible to avoid re-replicating blocks that sit on nodes that are only temporarily offline. I found the property dfs.namenode.replication.interval, which is described as controlling "the periodicity with which the NN computes replication work for data nodes." It sounds like bumping it up temporarily might work. Opinions?
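Concretely, I'm imagining something like this in hdfs-site.xml (the 300-second value is only an illustration; the default is much lower, and I don't know whether raising it this far is safe):

    <!-- sketch: slow down how often the NN computes replication work -->
    <property>
      <name>dfs.namenode.replication.interval</name>
      <value>300</value>
    </property>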