Member since: 03-11-2016
Posts: 73
Kudos Received: 16
Solutions: 16
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1019 | 08-21-2019 06:03 AM |
 | 34231 | 05-24-2018 07:55 AM |
 | 4464 | 04-25-2018 08:38 AM |
 | 6174 | 01-23-2018 09:41 AM |
 | 1945 | 10-11-2017 09:44 AM |
08-21-2019
06:03 AM
1 Kudo
This might be caused by NIFI-5525. Check for double quotes in your CSV: either remove them, or update NiFi to 1.8.0 or later.
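For illustration, the kind of record to look for is one with double-quoted fields, e.g. (made-up data):

id,name,comment
1,John,"likes, commas"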
06-01-2018
08:27 AM
@RAUI wholeTextFile() is not part of the HDFS API. I assume you're using Spark, which I'm not too familiar with, so I suggest you post a separate question about it on HCC.
05-31-2018
02:48 PM
1 Kudo
@RAUI No, it won't create it; the target directory must exist. However, if the target directory doesn't exist, rename() won't throw an exception, it will only indicate the error via its return value (as described in the documentation). So 1) you should create the target directory before you call rename() and 2) you should check the return value, like this:

// assumes org.apache.hadoop.fs.FileSystem and org.apache.hadoop.fs.Path are imported
fs.mkdirs(new Path("/your/target/path"));
boolean result = fs.rename(
    new Path("/your/source/path/your.file"),
    new Path("/your/target/path/your.file"));
if (!result) {
    ... // handle the failed rename here
}
05-24-2018
07:55 AM
@RAUI The answer is no. Renaming is the way to move files on HDFS: FileSystem.rename(). In fact, that is exactly what the HDFS shell command "-mv" does as well; you can check it in the source code. If you think about it, it's pretty logical: when you move a file on a distributed file system, you don't really move any blocks of the file, you just update the file's "path" metadata in the NameNode.
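As a minimal sketch in code (the paths are made up; fs is an initialized org.apache.hadoop.fs.FileSystem, as in my other answer above):

boolean moved = fs.rename(
    new Path("/data/incoming/part-0000"),
    new Path("/data/archive/part-0000"));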
05-15-2018
07:49 AM
@Dinesh Chitlangia Unfortunately the native build on OS X is broken on trunk by HDFS-13403 at the moment. You have two options: if you don't need the native build, you can build Hadoop successfully without the -Pnative option. The build issue is fixed by HDFS-13534, but it hasn't been merged yet (at the time of writing this answer), so you can either wait until it gets merged, or apply the patch manually:

wget https://issues.apache.org/jira/secure/attachment/12922534/HDFS-13534.001.patch
git apply HDFS-13534.001.patch
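For the first option, the build command is simply the usual Maven invocation with -Pnative left off, e.g. (the exact flags depend on your setup; see Hadoop's BUILDING.txt):

mvn clean install -DskipTests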
04-25-2018
08:38 AM
1 Kudo
@Manikandan Jeyabal Your question is not quite clear to me. If you really want to fetch data from the YARN ResourceManager REST API in Java, all you need to do is open an HttpURLConnection and read the data from any endpoint, e.g.:

URL url = new URL("http://" + rmHost + ":8088/ws/v1/cluster/apps");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
... // read and process your data
conn.disconnect();

But there is a much easier way to get data from the RM in Java: YarnClient, which is basically a Java API for YARN.

YarnClient yarnClient = YarnClient.createYarnClient();
Configuration conf = new YarnConfiguration();
conf.set("yarn.resourcemanager.hostname", "your RM hostname");
yarnClient.init(conf);
yarnClient.start();
for (ApplicationReport applicationReport : yarnClient.getApplications()) {
    System.out.println(applicationReport.getApplicationId());
}
yarnClient.stop(); // stop the client when you're done with it
01-23-2018
02:01 PM
@Anton P I'm glad it works. I'm not sure how exactly the "fair" ordering policy works within a single queue, but preemption only happens between queues. I assume it will try to distribute resources equally among the applications/users in the same queue, but once a container is running, it will not be preempted. If you want that behavior, you should consider creating sub-queues.
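Such sub-queues could look something like this in capacity-scheduler.xml notation (the sub-queue names "a" and "b" are made up; the parent queue name follows my other answer below):

yarn.scheduler.capacity.root.test1.queues=a,b
yarn.scheduler.capacity.root.test1.a.capacity=50
yarn.scheduler.capacity.root.test1.b.capacity=50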
01-23-2018
09:41 AM
@Anton P You are doing everything just fine; this is by design. The "Ordering Policy" can indeed only be set for leaf queues, because it defines the ordering policy between applications within the same queue, so it has nothing to do with your use case. "I try to run two Yarn queues where if only one queue is active it will consume all the resources and once a job will arrive to the second queue Yarn will preempt some of the resources of the first queue to start the second job." To achieve this, you need to configure your queues like this (I think you already did this):

yarn.scheduler.capacity.root.queues=test1,test2
yarn.scheduler.capacity.root.test1.capacity=50
yarn.scheduler.capacity.root.test1.maximum-capacity=100
yarn.scheduler.capacity.root.test2.capacity=50
yarn.scheduler.capacity.root.test2.maximum-capacity=100

... and enable preemption (as described in the article you attached). This will let the first application in the first queue use all the resources until the second job arrives in the second queue; then the resources will be divided equally between the two queues. Hope this makes everything clear, give it a try 🙂
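Enabling preemption itself boils down to properties like these in yarn-site.xml (a sketch; the article you attached covers the details and the tuning knobs):

yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy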
12-12-2017
04:48 PM
@Amithesh Merugu Try using the IP address of the NameNode, and also add the port (the default is 8020).
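For example, with a made-up address: hdfs://192.168.1.10:8020/user/username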
12-12-2017
10:35 AM
@Amithesh Merugu Use the method copyFromLocalFile(Path src, Path dst). The first parameter is a path on your local disk (in your example /tmp/files) and the second is the HDFS path (hdfs://user/username). The documentation doesn't make it clear, but the source can also be a directory, in which case its whole content is copied to HDFS.

FileSystem fs = FileSystem.get(hdfsUri, conf);
fs.copyFromLocalFile(new Path("/tmp/files"), new Path("/user/username"));
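A slightly fuller sketch of the setup around this call (the NameNode URI is a placeholder):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// connect to the cluster's file system and copy the local directory up
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);
fs.copyFromLocalFile(new Path("/tmp/files"), new Path("/user/username"));
fs.close();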