Member since 08-05-2022 | Posts: 2 | Kudos Received: 0 | Solutions: 0
08-07-2022 06:02 PM
Hey, do you have any idea how I can locate and read a specific block using BlockLocation? I want to read the block byte by byte. (I understand it might be a remote read.)
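For illustration, here is a minimal sketch of the underlying idea: a block is just a byte range (offset plus length) within the file, so reading it means seeking to the offset and consuming that many bytes. The sketch uses a local file and plain Java I/O so it runs without a cluster; `BlockReader`, `readRange`, and `demo` are hypothetical names. With the Hadoop client, the analogous steps would presumably be `FileSystem.getFileBlockLocations(...)` to obtain each block's `getOffset()` and `getLength()`, then `FileSystem.open(path)` and `seek(offset)` on the returned `FSDataInputStream`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BlockReader {
    // Reads `length` bytes starting at `offset`. In HDFS the offset and length
    // would come from a BlockLocation; here they are passed in directly (assumption).
    static byte[] readRange(Path file, long offset, int length) {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(offset);          // jump to the first byte of the "block"
            byte[] buf = new byte[length];
            in.readFully(buf);        // fails with EOFException if the range is too long
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a 10-byte temp file and walks bytes 4..6 one at a time.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("block-demo", ".bin");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                StringBuilder sb = new StringBuilder();
                for (byte b : readRange(tmp, 4, 3)) {
                    sb.append((char) b);   // per-byte processing would go here
                }
                return sb.toString();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 456
    }
}
```

For real HDFS blocks, which can be far larger than this toy range, streaming through a fixed-size buffer is likely a better fit than allocating one array per block.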
08-05-2022 03:14 PM
Basically, I'm attempting to create a standalone YARN application that runs code on HDFS files at the block level for validation purposes. First, I need to find out how the file's blocks are distributed (HDFS file metadata). Then I will run validation code on every block and combine the results to determine whether the file is valid. I want to minimize or eliminate network overhead by running the validation code on the same node where the block resides (data locality).

So, for instance, a 1 GB file could be divided into 10 blocks. To run the same validation code on all blocks in parallel, I would want to run 10 different instances. My thinking is that I would launch a single ApplicationMaster with 10 containers. Does my approach make sense, and if not, what do you suggest I change? If it does make sense, how do I launch 10 different instances while (possibly) specifying a different host for each container?
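As a rough sketch of the planning step described above, the code below splits a file into block-sized byte ranges and pairs each range with the hosts that hold it, giving one task per block. It is plain Java with no Hadoop dependency; `BlockPlanner`, `BlockTask`, and `plan` are hypothetical names, and the host lists are passed in by hand. In a real application the hosts would presumably come from `BlockLocation.getHosts()`, and each task would become one `AMRMClient.ContainerRequest`, whose constructor accepts an array of node names and a `relaxLocality` flag.

```java
import java.util.ArrayList;
import java.util.List;

class BlockPlanner {
    // Hypothetical stand-in for one block's metadata plus its preferred hosts.
    static final class BlockTask {
        final long offset;
        final long length;
        final String[] preferredHosts;

        BlockTask(long offset, long length, String[] preferredHosts) {
            this.offset = offset;
            this.length = length;
            this.preferredHosts = preferredHosts;
        }
    }

    // Produces one task per block; hostsPerBlock[i] lists the nodes holding block i.
    static List<BlockTask> plan(long fileLength, long blockSize, String[][] hostsPerBlock) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<BlockTask> tasks = new ArrayList<>();
        int i = 0;
        for (long off = 0; off < fileLength; off += blockSize, i++) {
            // The final block may be shorter than blockSize.
            long len = Math.min(blockSize, fileLength - off);
            tasks.add(new BlockTask(off, len, hostsPerBlock[i]));
        }
        return tasks;
    }

    public static void main(String[] args) {
        String[][] hosts = {
            {"node1", "node2"}, {"node2", "node3"}, {"node3", "node1"}, {"node1", "node3"}
        };
        for (BlockTask t : plan(1000, 300, hosts)) {
            // In YARN, each task would translate into one container request
            // that names t.preferredHosts as the desired nodes.
            System.out.println(t.offset + "+" + t.length + " -> " + String.join(",", t.preferredHosts));
        }
    }
}
```

The number of containers follows from the file length and block size rather than being fixed in advance, so the planner derives it from the metadata instead of hard-coding a count.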
Labels: Apache Hadoop