Member since: 03-11-2016
Posts: 73
Kudos Received: 16
Solutions: 16

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 545 | 08-21-2019 06:03 AM
 | 24963 | 05-24-2018 07:55 AM
 | 2826 | 04-25-2018 08:38 AM
 | 4165 | 01-23-2018 09:41 AM
 | 1143 | 10-11-2017 09:44 AM
08-21-2019
06:03 AM
1 Kudo
This might be caused by NIFI-5525. Check for double quotes in your CSV. Either remove them or update NiFi to >=1.8.0.
06-01-2018
08:27 AM
@RAUI wholeTextFile() is not part of the HDFS API. I'm assuming you're using Spark, which I'm not too familiar with, so I suggest you post a separate question about this on HCC.
05-31-2018
02:48 PM
1 Kudo
@RAUI No, it won't create it, the target directory must exist. However, if the target directory doesn't exist, it won't throw an exception, it will only indicate the error via the return value (as described in the documentation). So 1) you should create the target directory before you call rename() and 2) you should check the return value, like this:
fs.mkdirs(new Path("/your/target/path"));
boolean result = fs.rename(
    new Path("/your/source/path/your.file"),
    new Path("/your/target/path/your.file"));
if (!result) {
    ...
}
05-24-2018
07:55 AM
@RAUI The answer is no. Renaming is the way to move files on HDFS: FileSystem.rename(). Actually, this is exactly what the HDFS shell command "-mv" does as well; you can check it in the source code. If you think about it, it's pretty logical: when you move a file on the distributed file system, you don't really move any blocks of the file, you just update the file's "path" metadata in the NameNode.
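For reference, the equivalent shell command looks like this (the paths are just placeholders):
hdfs dfs -mv /user/you/source/your.file /user/you/target/your.file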
05-17-2018
08:43 AM
You can figure that out from the RM UI, or the RM/application logs.
05-17-2018
08:27 AM
@Himanshu Kukreja It's hard to figure out from your screenshot what kind of applications these are. I'd recommend digging deeper and finding out as much about them as you can: more application info, application logs, etc. Then you should be able to stop them from spawning.
05-15-2018
07:49 AM
@Dinesh Chitlangia Unfortunately the native build on OS X is broken by HDFS-13403 at this moment on trunk. You have two options:
If you don't need the native build, you can simply build Hadoop without the -Pnative option.
The build issue is fixed by HDFS-13534, but it's not merged yet (at the time of writing this answer). You can either wait until it gets merged, or apply it manually:
wget https://issues.apache.org/jira/secure/attachment/12922534/HDFS-13534.001.patch
git apply HDFS-13534.001.patch
04-25-2018
08:38 AM
1 Kudo
@Manikandan Jeyabal Your question is not quite clear to me. If you really want to fetch data from the YARN Resource Manager REST API in Java, all you need to do is open an HttpURLConnection and get the data from any endpoint. E.g.:
URL url = new URL("http://" + rmHost + ":8088/ws/v1/cluster/apps");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
... // read and process your data
conn.disconnect();
But there is a much easier solution to get data from the RM in Java: YarnClient, which is basically a Java API for YARN.
YarnClient yarnClient = YarnClient.createYarnClient();
Configuration conf = new YarnConfiguration();
conf.set("yarn.resourcemanager.hostname", "your RM hostname");
yarnClient.init(conf);
yarnClient.start();
for (ApplicationReport applicationReport : yarnClient.getApplications()) {
    System.out.println(applicationReport.getApplicationId());
}
03-08-2018
09:55 AM
@Jon Page Could you please provide more information about what kind of "job" you are trying to run through YARN? Are you using Spark? A custom native YARN app? Distributed Shell? Knit? Nevertheless, you could run a simple distributed shell app to see which python version YARN picks up:
yarn jar path/to/hadoop-yarn-applications-distributedshell.jar -jar path/to/hadoop-yarn-applications-distributedshell.jar -shell_command python -shell_args -V
Or you can check the same with the framework you are using.
01-23-2018
02:01 PM
@Anton P I'm glad it works. I'm not sure how exactly the "fair" ordering policy works within one queue, but preemption only happens between queues. I assume it will try to give resources to the applications/users in the same queue equally, but once a container is running it will not preempt it. If you would like to achieve that, you should consider creating sub-queues.
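If you go the sub-queue route, a minimal sketch of the Capacity Scheduler properties could look like this (the queue names and percentages are just placeholders):
yarn.scheduler.capacity.root.yourqueue.queues=sub1,sub2
yarn.scheduler.capacity.root.yourqueue.sub1.capacity=50
yarn.scheduler.capacity.root.yourqueue.sub2.capacity=50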
01-23-2018
09:41 AM
@Anton P You are doing everything just fine, this is by design. The "Ordering Policy" can indeed only be set for leaf queues, because it defines the ordering policy between applications in the same queue. So it has nothing to do with your use case. "I try to run two Yarn queues where if only one queue is active it will consume all the resources and once a job will arrive to the second queue Yarn will preempt some of the resources of the first queue to start the second job." To achieve this, you need to configure your queues like this (I think you already did this):
yarn.scheduler.capacity.root.queues=test1,test2
yarn.scheduler.capacity.root.test1.capacity=50
yarn.scheduler.capacity.root.test1.maximum-capacity=100
yarn.scheduler.capacity.root.test2.capacity=50
yarn.scheduler.capacity.root.test2.maximum-capacity=100
... and enable preemption (as described in the article you attached; the relevant properties are sketched below). This will let the first application in the first queue use all the resources until the second job arrives in the second queue; then the resources will be divided equally between the two queues. Hope this makes everything clear, give it a try 🙂
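A rough sketch of the preemption-related properties for yarn-site.xml (I'm going from memory here, so double-check them against the article and your HDP version):
yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy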
01-03-2018
06:44 PM
@Jon Page You can't move just the usercache directory, but you can move its parent: set yarn.nodemanager.local-dirs to a different location.
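For example (the path is just a placeholder), in yarn-site.xml:
yarn.nodemanager.local-dirs=/new/disk/hadoop/yarn/local
The NodeManagers need a restart to pick up the change.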
01-03-2018
09:08 AM
@Nishant Verma In the first case your 'dummy' user can't authenticate to the Timeline Server to get the Delegation Token. In the second case you missed an important step: you need to kinit with your spark/hdfs user to get a TGT. You need to use a password or a keytab file to do so.
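For example, with a keytab (the keytab path and principal are just placeholders for your environment):
kinit -kt /etc/security/keytabs/spark.headless.keytab spark@YOUR.REALM
You can verify the resulting TGT with klist.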
12-12-2017
04:48 PM
@Amithesh Merugu Try to use the IP address of the NameNode. And also add the port (default is 8020).
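Something like this (the IP address is just a placeholder):
hdfs://10.0.0.1:8020/user/username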
12-12-2017
10:35 AM
@Amithesh Merugu Use this method: copyFromLocalFile(Path src, Path dst). The first parameter is a path on your local disk (in your example /tmp/files) and the second is the HDFS path (hdfs://user/username). The documentation doesn't make it clear, but the source can be a directory, in which case its whole content is copied to HDFS.
FileSystem fs = FileSystem.get(hdfsUri, conf);
fs.copyFromLocalFile(new Path("/tmp/files"), new Path("/user/username"));
12-12-2017
09:35 AM
@Gaurav Parmar Here is the documentation of the Cluster Applications API you are using. As you can see under "Query Parameters Supported", to list jobs for a particular time frame you can use 4 parameters:
startedTimeBegin
startedTimeEnd
finishedTimeBegin
finishedTimeEnd
All the parameters are specified in milliseconds since the epoch, so you have to convert your time interval to a Unix timestamp. For example, last week is 2017-12-04 00:00:01 = 1512345601000 to 2017-12-10 23:59:59 = 1512950399000. To list all the applications that were started and finished in that week, you can use:
http://hostname:8088/ws/v1/cluster/apps?startedTimeBegin=1512345601000&finishedTimeEnd=1512950399000&states=FINISHED
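If you need to compute such timestamps, GNU date can do the conversion for you, for example:
date -u -d "2017-12-04 00:00:01" +%s
This prints 1512345601; multiply by 1000 to get the milliseconds value 1512345601000.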
11-29-2017
02:45 PM
1 Kudo
@Joe Karau What is the exact HDP version you are using? In 2.6 the -filters option should be available to exclude certain files. It is documented as "The path to a file containing a list of pattern strings, one string per line, such that paths matching the pattern will be excluded from the copy. Supports regular expressions specified by java.util.regex.Pattern." However, it's questionable whether the filtering happens before the exception. Can you give it a try? If it doesn't work, unfortunately I think the easiest way to fix this is to specify only the "correct" files to be copied.
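If the option is available in your version, the invocation would look roughly like this, assuming a local patterns file with one java.util.regex pattern per line (the paths and the pattern are just examples):
echo '.*\.tmp$' > /tmp/exclude-patterns.txt
hadoop distcp -filters /tmp/exclude-patterns.txt hdfs://source-nn:8020/src/dir hdfs://target-nn:8020/dst/dir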
10-11-2017
09:44 AM
@Saikiran Parepally I don't think queue-level preemption metrics exist. However, they are fairly easy to calculate from the app-level metrics:
curl http://YOUR_RM_ADDRESS.com:8088/ws/v1/cluster/apps > /tmp/apps
queues=$(cat /tmp/apps | jq '.apps.app[].queue' | sort -u)
for queue in $queues; do
echo $queue
metrics="preemptedResourceMB preemptedResourceVCores numNonAMContainerPreempted numAMContainerPreempted"
for metric in $metrics; do
printf "%30s: " $metric
cat /tmp/apps | jq -r ".apps.app[] | select(.queue == $queue) .$metric" | paste -s -d+ - | bc
done
done
Most likely there are more efficient ways to do this calculation in a higher-level programming language, or in jq itself if you are a jq expert.
09-25-2017
02:42 PM
@raouia Based on your result.png, you are actually using Python 3 in Jupyter; you need the parentheses after print in Python 3 (and not in Python 2). To make sure, run this in your notebook:
import sys
print(sys.version)
09-11-2017
07:52 AM
@jpj To the best of my knowledge, MAPREDUCE-6304 is only part of HDP-2.6.2.0. Please check this document to upgrade.
08-23-2017
09:05 AM
1 Kudo
@pbarna I think the Java API should be the fastest:
FileSystem fs = FileSystem.get(URI.create(hdfsUri), conf);
class DirectoryThread extends Thread {
    private int from;
    private int count;
    private static final String basePath = "/user/d";

    public DirectoryThread(int from, int count) {
        this.from = from;
        this.count = count;
    }

    @Override
    public void run() {
        for (int i = from; i < from + count; i++) {
            Path path = new Path(basePath + i);
            try {
                fs.mkdirs(path);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
long startTime = System.currentTimeMillis();
int threadCount = 8;
Thread[] threads = new Thread[threadCount];
int total = 1000000;
int countPerThread = total / threadCount;
for (int j = 0; j < threadCount; j++) {
    Thread thread = new DirectoryThread(j * countPerThread, countPerThread);
    thread.start();
    threads[j] = thread;
}
for (Thread thread : threads) {
    thread.join();
}
long endTime = System.currentTimeMillis();
System.out.println("Total: " + (endTime - startTime) + " milliseconds");
Obviously, use as many threads as you can. But still, this takes 1-2 minutes; I wonder how @bkosaraju could "complete in few seconds with your code".
08-21-2017
08:06 AM
.*(?<!History)$
08-21-2017
07:57 AM
@swathi thukkaraju I'm not completely sure what you mean by 'incremental load format', but here are some hints.
To read FTP server files you can simply use the built-in Python module urllib, more specifically urlopen or urlretrieve.
To write to HDFS you can:
Use an external library, like HdfsCLI
Use the HDFS shell and call it from Python with subprocess
Mount your HDFS with HDFS NFS Gateway and simply write with the normal write() method. Beware that using this solution you won't be able to append!
Here's an implementation for you using urlopen and HdfsCli. To try it, first install HdfsCli with pip install hdfs.
from urllib.request import urlopen
from hdfs import InsecureClient
# You can also use KerberosClient or custom client
namenode_address = 'your namenode address'
webhdfs_port = 'your webhdfs port' # default for Hadoop 2: 50070, Hadoop 3: 9870
user = 'your user name'
client = InsecureClient('http://' + namenode_address + ':' + webhdfs_port, user=user)
ftp_address = 'your ftp address'
hdfs_path = 'where you want to write'
with urlopen(ftp_address) as response:
    content = response.read()
# You can also use append=True
# Further reference: https://hdfscli.readthedocs.io/en/latest/api.html#hdfs.client.Client.write
with client.write(hdfs_path) as writer:
    writer.write(content)
08-17-2017
06:38 AM
The default queue's AM limit is 6144 MB -> the default queue's capacity must be 7 GB (for 6 GB the limit would be 5 GB and for 8 GB it would be 7 GB, with a maximum-am-resource-percent of 0.9). Since default.capacity = 60, the whole cluster's capacity is about 100 / 60 * 7 ≈ 11.7 GB, which could indicate 12 or 13 GB in total, but the latter would be very unusual. Did you manage to overcome your issue with any of my suggestions?
08-09-2017
11:46 AM
@Dennis Hude Unfortunately I don't think there is; Hadoop is mainly designed for Linux clusters. But if you're interested, you can definitely try to write a ResourceCalculatorProcessTree implementation for Mac, or just open a Jira ticket for it and see if someone else is interested.
08-07-2017
11:05 AM
@Karan Alang Based on this information I assume you have 12 GBs of memory and the minimum allocation is set to 1024 MB; the default queue has a configured capacity of 60%, i.e. 7 GBs. The AM limit is 6 GBs (7.2 * 0.9 rounded to GB), and it is full; probably three other AMs are running. Please correct me if I'm wrong! To get more memory, you might try these things (see the property sketch below):
Add more memory to the cluster 😛
Increase the maximum-capacity of the default queue, so that it can use more resources when the LLAP queue doesn't use them
Increase the maximum-am-resource-percent of the default queue to 1
Decrease the minimum-allocation-mb: this way the other AMs (and containers) might use less resources (e.g. if you need 1.2 GBs - just for the sake of the example - then with the default 1 GB minimum allocation you still need to get a 2 GB container)
Kill other applications from the queue or wait until they finish
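A rough sketch of the settings mentioned above (the values are only examples, adjust them to your cluster; the first two belong to the Capacity Scheduler config, the last one to yarn-site.xml):
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.maximum-am-resource-percent=1
yarn.scheduler.minimum-allocation-mb=512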
08-07-2017
05:59 AM
@Karan Alang Could you please share (at least the relevant part of) your Capacity Scheduler config and tell us how much memory your default queue should have in total? Based on your error, your default queue's AM limit is in fact exceeded.
08-06-2017
07:50 AM
@Paul Yang You can simply use the UI of the JobHistory server remotely, can't you?
08-02-2017
09:41 AM
1 Kudo
@Dennis Hude
What operating system are you using? CPU_MILLISECONDS, PHYSICAL_MEMORY_BYTES and VIRTUAL_MEMORY_BYTES are collected by ResourceCalculatorProcessTree which has two implementations: ProcfsBasedProcessTree (for Linux that uses /proc) and WindowsBasedProcessTree. You can check the syslogs from your application containers to see which one is used, grep for this message: "Using ResourceCalculatorProcessTree :". Not having an initiated ResourceCalculatorProcessTree instance explains your 0 values. If you have one, there might be a problem with that which requires further investigation (the container logs can help with that, too).
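For example, on a NodeManager host something like this should find it (the log directory is just a guess, check your yarn.nodemanager.log-dirs setting):
grep -r "Using ResourceCalculatorProcessTree" /hadoop/yarn/log/application_*/container_*/syslog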
08-01-2017
07:11 AM
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_release-notes/content/upgrading_parent.html