Member since: 07-06-2018
Posts: 59
Kudos Received: 1
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2701 | 03-05-2019 07:20 AM
 | 2871 | 01-16-2019 09:15 AM
 | 1527 | 10-25-2018 01:46 PM
 | 1710 | 08-02-2018 12:34 PM
08-02-2018 12:34 PM
This is an open issue and is being tracked in CDH-22890
07-31-2018 10:48 AM
Hi all, is there a way, or perhaps a workaround, to see the user who submitted a job in Hive while the job/query is still in progress? Currently the user is displayed only once the job has finished or has been killed. Environment: CDH 5.10.x or higher, with Sentry enabled and impersonation disabled in Hive. If not directly, can this be tracked by application ID in the RM, NM, or HS2 logs? Please confirm. Also, if there is an open JIRA for this feature, please share it. Regards
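One avenue worth checking (a sketch, not a confirmed fix): while an application is live, the YARN ResourceManager REST API reports the submitting user per application. Note that with impersonation disabled, the YARN-level user will typically be the hive service user, so mapping back to the end user may still require correlating the application ID against the HS2 logs. The RM hostname and port below are placeholders, and a kerberized cluster would need SPNEGO authentication instead of a plain GET:

```python
# Hedged sketch: list the submitting user of in-flight YARN applications
# via the ResourceManager REST API. RM_URL is a placeholder.
import requests

RM_URL = "http://resourcemanager.example.com:8088"  # placeholder RM address

resp = requests.get(f"{RM_URL}/ws/v1/cluster/apps", params={"states": "RUNNING"})
resp.raise_for_status()

# "apps" is null in the JSON when nothing is running
apps = (resp.json().get("apps") or {}).get("app") or []
for app in apps:
    # "user" is the submitting user the RM tracks while the app is live
    print(app["id"], app["user"], app["name"], app["state"])
```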
Labels:
- Apache Hive
- Apache Sentry
- Apache YARN
07-31-2018 07:09 AM
@Harsh J No, we rarely run the balancer in this environment. I'll set it to 3 for now and observe for a while for any recurrence of those WARNs. (CM recommends setting it to a value equal to or greater than the replication factor and less than the number of DNs.) Regards
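For the observation period, a minimal log-scan sketch that counts the WARN per day; the NodeManager log location and the message pattern are assumptions based on the trace quoted in the 07-20-2018 post below, so adjust both for your deployment:

```python
# Minimal sketch: count daily occurrences of the BlockReaderFactory WARN
# after the config change. LOG_GLOB is an assumed NodeManager log path.
import glob
import re
from collections import Counter

LOG_GLOB = "/var/log/hadoop-yarn/*.log*"  # assumption; adjust to your NM logs
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) .*WARN.*BlockReaderFactory: "
    r"I/O error constructing remote block reader"
)

hits = Counter()
for path in glob.glob(LOG_GLOB):
    with open(path, errors="replace") as fh:
        for line in fh:
            m = PATTERN.search(line)
            if m:
                hits[m.group(1)] += 1  # key on the date prefix of the log line

for day in sorted(hits):
    print(day, hits[day])
```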
07-30-2018 10:52 AM
@Harsh J Thanks for your response; you pointed it out correctly. The DN logs do indicate the reason for these notifications, "Replica not found", and that relates to "mapreduce.client.submit.file.replication" because it is currently set to 1 (CM recommends 8). I can bump it up and check whether that alleviates or further decreases the occurrences. What are the repercussions if this value is set too high? Regards
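To verify the bumped value actually takes effect on new submissions, one quick check is the replication factor of a staging file via WebHDFS. A sketch, with the NameNode address, user, and jar path as placeholders (a secure cluster would need SPNEGO rather than user.name):

```python
# Hedged sketch: read the replication factor of a job staging file via the
# WebHDFS GETFILESTATUS operation. NN_URL and JAR are placeholders.
import requests

NN_URL = "http://namenode.example.com:50070"  # placeholder non-TLS WebHDFS endpoint
JAR = "/user/hive/.staging/job_xxx/libjars/some.jar"  # placeholder path from a live job

resp = requests.get(
    f"{NN_URL}/webhdfs/v1{JAR}",
    params={"op": "GETFILESTATUS", "user.name": "hdfs"},
)
resp.raise_for_status()

# A value of 1 would reproduce the "Replica not found" window seen in the DN logs
print("replication:", resp.json()["FileStatus"]["replication"])
```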
07-23-2018 12:45 PM
With the dfs.permissions.supergroup / dfs.permissions.superusergroup property, CM lets you control/assign the superuser group for the Hadoop environment. But if it is left as is, I notice directories owned by that group, e.g. hdfs:supergroup.
CM creates it at the time of Hadoop cluster installation, but who are all the users under this supergroup?
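One way to answer this on a default setup: HDFS resolves group membership on the NameNode host (ShellBasedUnixGroupsMapping by default), so the supergroup's members are whatever the NameNode's OS reports for that group name. Frequently the OS group doesn't exist at all, in which case nobody gains superuser rights through it; the user running the NameNode process (hdfs) is a superuser regardless. A minimal sketch to check on the NameNode host, assuming the default group name:

```python
# Minimal sketch, run on the NameNode host: with the default shell-based
# group mapping, HDFS superusers are the OS-level members of the group
# named by dfs.permissions.superusergroup ("supergroup" by default).
import grp

GROUP = "supergroup"  # substitute your value if overridden in CM

try:
    g = grp.getgrnam(GROUP)
    print(f"members of {GROUP}: {g.gr_mem or '(none)'}")
except KeyError:
    # Commonly the group doesn't exist as an OS group at all, so no user
    # gains superuser rights through it.
    print(f"OS group {GROUP!r} does not exist on this host")
```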
Labels:
- Cloudera Manager
- HDFS
07-20-2018 09:19 AM
Hi, I notice the occasional error message below in YARN jobs. If anyone has seen it, would you know the exact cause, apart from network latency etc.? Note: this doesn't contribute to job failures, because the download is retried (four attempts by default, I think) and usually goes through:

2018-07-19 09:36:46,433 WARN [ContainerLocalizer Downloader] org.apache.hadoop.hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR, self=/xx.xx.xxx.xxx:32010, remote=/xx.xx.xxx.xx:1004, for file /user/hive/.staging/job_xxxxxxxxxxx_xxxxx/libjars/sqoop-1.4.5-cdh5.4.4.jar, for pool BP-xxxxxxx-xx.xx.xxx.xx-xxxxxxxxxx block xxxxx_xxxx
    at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:467)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:890)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:768)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:660)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:956)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:121)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:265)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Labels:
- Apache YARN
- MapReduce
07-10-2018 09:28 AM
@bgooley here are the observations:

1. https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries?from=2018-07-09T12:59:32.776Z&to=2018-07-10&limit=1000 returns 1000 queries and ends with warning timestamp 2018-07-10T01:16:17.434Z.
2. Using that timestamp in the "to" clause of the next query moves back in time: https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries?from=2018-07-09T12:59:32.776Z&to=2018-07-10T01:16:17.434Z&limit=1000 returns 1000 queries and ends with warning timestamp 2018-07-09T21:26:17.434Z.
3. Using the timestamp from the first attempt's warning in the "from" clause alone moves ahead: https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries?from=2018-07-10T01:16:17.434Z&limit=1000 returns 1000 queries and ends with warning timestamp 2018-07-10T13:56:17.434Z.
4. Using that warning timestamp again in the "from" clause just stays at the same timestamp: https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries?from=2018-07-10T13:56:17.434Z&limit=1000 returns 1000 queries and ends with warning timestamp 2018-07-10T13:56:17.434Z.
5. Using it in the "to" clause instead hits a warning with an empty timestamp, which I assume means the most recent output: https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries?to=2018-07-10T13:56:17.434Z&limit=1000 returns 1000 queries and ends with warning timestamp [].

Does the above check out with your assumption?
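These observations suggest a backward-paging loop: keep moving the "to" bound down to the "Last end time considered" timestamp from each response's warning until the warning disappears or the cursor stops moving. A sketch under those assumptions; the host, credentials, and API version are placeholders:

```python
# Hedged sketch of the backward-paging loop implied by the observations
# above. BASE, AUTH and FROM are placeholders.
import re
import requests

BASE = "https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries"
AUTH = ("admin", "admin")          # placeholder CM credentials
FROM = "2018-07-09T12:59:32.776Z"  # lower bound, as in the posts above

to_ts = None
all_queries = []
while True:
    params = {"from": FROM, "limit": 1000}
    if to_ts:
        params["to"] = to_ts
    data = requests.get(BASE, params=params, auth=AUTH, verify=False).json()
    all_queries.extend(data.get("queries", []))
    # Extract "Last end time considered is <ts>" from the warning, if present
    warning = " ".join(data.get("warnings", []))
    m = re.search(r"Last end time considered is (\S+)", warning)
    if not m or m.group(1) == to_ts:
        break  # no more pages, or the cursor stopped moving
    to_ts = m.group(1)

print(len(all_queries), "queries fetched")
```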
07-09-2018 01:50 PM
Thanks. But unfortunately the from and to clauses don't help either 😞

https://hostname:7183/api/v18/clusters/cluster_name/services/impala/impalaQueries?from=2018-07-09T12:59:32.776Z&to=2018-07-09T17:04:32.776Z returns queries and ends with the warning:

"warnings" : [ "Impala query scan limit reached. Last end time considered is 2018-07-09T12:59:32.776Z" ]

Now, as mentioned, I use this timestamp for the next query:

https://hostname:7183/api/v18/clusters/cluster_name/services/impala/impalaQueries?from=2018-07-09T12:59:32.776Z&to=2018-07-10 returns queries and ends with the warning:

"warnings" : [ "Impala query scan limit reached. Last end time considered is 2018-07-09T17:04:32.776Z" ]

Back to square one: it keeps showing that as the final timestamp, so I can't really use the warning-message timestamp as a way to get all queries from some point in time up to the current date. After just one loop it hits this timestamp and enters a never-ending loop. I'm not using any other filter (e.g. user) because I need to fetch all the queries and feed them to a dashboard.
07-09-2018 12:59 PM
@bgooley Your query has "august" mentioned; that could be a typo. What I meant is: when I specify "2018-07-09T17:04:32.776Z" in the from clause, I still get query results (100, the default), but they still have this warning message at the end of the page:

"warnings" : [ "Impala query scan limit reached. Last end time considered is 2018-07-09T17:04:32.776Z" ]

My assumption was this: if I capture the date in the warning message at the end of each page and use it as the date in the from clause, the query should return the next result set and carry a warning with some later date/time, which I can use again until I reach the current date/time, at which point I'll stop the loop (I was fetching it via curl). But that doesn't seem to be happening.

The other question I had: what query string can I use to have both to and from in the same query string? I tried the below, but it didn't fetch any result:

https://hostname:7183/api/v17/clusters/cluster_name/services/impala/impalaQueries?filter=(from=2018-07-09 and to=`date +"%Y-%m-%dT%T"`)
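On the last point: from and to are top-level query parameters on the impalaQueries endpoint rather than filter predicates, so they can simply be combined with & (as the later posts above end up doing). A small sketch; host and credentials are placeholders:

```python
# from/to go in the query string as plain parameters, not inside filter=.
# Host and credentials are placeholders.
import requests

resp = requests.get(
    "https://hostname:7183/api/v17/clusters/cluster_name/services/impala/impalaQueries",
    params={"from": "2018-07-09T00:00:00.000Z", "to": "2018-07-09T17:04:32.776Z"},
    auth=("admin", "admin"),
    verify=False,
)
resp.raise_for_status()
print(resp.json().get("warnings"))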