Member since 07-06-2018
59 Posts
1 Kudos Received
4 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2709 | 03-05-2019 07:20 AM |
| | 2879 | 01-16-2019 09:15 AM |
| | 1531 | 10-25-2018 01:46 PM |
| | 1714 | 08-02-2018 12:34 PM |
08-02-2018
12:34 PM
This is an open issue and is being tracked in CDH-22890.
07-31-2018
10:48 AM
Hi all, is there a way (or perhaps a workaround) to see which user submitted a job in Hive while the job/query is still in progress? Currently the user is displayed only once the job has finished or has been killed. Environment (required for a CDH 5.10.x setup or higher): Sentry enabled and impersonation disabled in Hive. If this isn't possible directly, can it be tracked by application ID in the RM, NM or HS2 logs? Please confirm. Also, if there is an open JIRA for this feature, please share it. Regards
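One place to look while a query is still running is the ResourceManager REST API (`GET /ws/v1/cluster/apps?states=RUNNING`), whose per-application records include `id`, `user` and `name` fields. The sketch below only parses a response body that has already been fetched (e.g. with curl); the sample payload is made up. With impersonation disabled, the `user` field will most likely be the `hive` service user rather than the end user, so mapping an application back to the submitting user would still depend on the HS2 logs. Treat this as an assumption about this environment, not a confirmed answer.

```python
import json


def running_apps(rm_response_body: str) -> list:
    """Return (application id, user, name) for every application in a
    ResourceManager /ws/v1/cluster/apps response body."""
    data = json.loads(rm_response_body)
    # Guard against an empty result, where "apps" may come back as null.
    apps = (data.get("apps") or {}).get("app", [])
    return [(a.get("id"), a.get("user"), a.get("name")) for a in apps]


# Made-up sample body, shaped like GET /ws/v1/cluster/apps?states=RUNNING.
sample = """{"apps": {"app": [
  {"id": "application_1530000000000_0001", "user": "hive",
   "name": "SELECT COUNT(*) FROM t(Stage-1)", "state": "RUNNING"}
]}}"""

for app_id, user, name in running_apps(sample):
    print(app_id, user, name)
```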
Labels:
- Apache Hive
- Apache Sentry
- Apache YARN
07-31-2018
07:09 AM
@Harsh J No, we rarely run the balancer in this environment. I'll set it to 3 for now and watch for a while for any recurrence of those WARNs. (CM recommends setting it to a value equal to or greater than the replication factor and less than the number of DNs.) Regards
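The CM guidance quoted above can be written as a simple range check: the value must be at least the replication factor and below the number of DataNodes. The helper is hypothetical, and the replication factor of 3 and the 10-DataNode count are example values, not this cluster's.

```python
def submit_replication_ok(value: int, replication_factor: int, num_datanodes: int) -> bool:
    """Check a candidate mapreduce.client.submit.file.replication value
    against the range quoted from CM: >= the HDFS replication factor and
    < the number of DataNodes. (Hypothetical helper for illustration.)"""
    return replication_factor <= value < num_datanodes


# Example values only: replication factor 3, a 10-DataNode cluster.
print(submit_replication_ok(3, 3, 10))   # True
print(submit_replication_ok(1, 3, 10))   # False: below the replication factor
print(submit_replication_ok(10, 3, 10))  # False: not less than the DN count
```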
07-30-2018
10:52 AM
@Harsh J Thanks for your response, you pointed it out correctly. The DN logs do indicate the reason for these notifications, "Replica not found", and that relates to "mapreduce.client.submit.file.replication", because it is currently set to 1 [CM recommends 8]. I can bump it up and check whether that eliminates the occurrences or reduces them further. What are the repercussions if this value is set too high? Regards
07-23-2018
12:45 PM
With these properties, dfs.permissions.supergroup and dfs.permissions.superusergroup, CM lets you control/assign the superuser group for the Hadoop environment. But if they are left as is, I notice directories owned by that group, for example hdfs:supergroup.
CM creates this group at the time of Hadoop cluster installation, but who are all the users in this supergroup?
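If HDFS uses the default Unix-based group mapping (`hadoop.security.group.mapping`), membership of `supergroup` is resolved from the operating system on the NameNode host. One way to see who is in it is therefore to inspect that host's local group database. Separately, the user that runs the NameNode (typically `hdfs`) is always a superuser. The sketch below assumes that mapping and uses Python's `grp` and `pwd` modules (Unix only). With LDAP-based group mapping it would not reflect what HDFS sees; for a single user, `hdfs groups <user>` is the more direct check.

```python
import grp
import pwd


def local_group_members(group_name: str) -> list:
    """List users that belong to a local Unix group, either as a
    supplementary member or through their primary GID. Returns [] if the
    group does not exist on this host."""
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return []
    members = set(group.gr_mem)  # supplementary members only
    # Users whose primary group is this group are not listed in gr_mem.
    members.update(p.pw_name for p in pwd.getpwall() if p.pw_gid == group.gr_gid)
    return sorted(members)


# Run on the NameNode host; an empty list means no local users are in the group.
print(local_group_members("supergroup"))
```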
Labels:
- Cloudera Manager
- HDFS
07-20-2018
09:19 AM
Hi, I occasionally see the error message below in YARN jobs. If someone has seen this, would you know the exact cause, apart from network latency etc.? Note: this doesn't cause job failures, because I think it is retried (4 attempts by default) and usually goes through:

```
2018-07-19 09:36:46,433 WARN [ContainerLocalizer Downloader] org.apache.hadoop.hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR, self=/xx.xx.xxx.xxx:32010, remote=/xx.xx.xxx.xx:1004, for file /user/hive/.staging/job_xxxxxxxxxxx_xxxxx/libjars/sqoop-1.4.5-cdh5.4.4.jar, for pool BP-xxxxxxx-xx.xx.xxx.xx-xxxxxxxxxx block xxxxx_xxxx
	at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:467)
	at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432)
	at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:890)
	at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:768)
	at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:660)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:956)
	at java.io.DataInputStream.read(DataInputStream.java:100)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:121)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:265)
	at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:362)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:361)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
```
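To see whether these WARNs cluster on particular files or DataNodes, you can extract the remote address and file path from each `OP_READ_BLOCK` exception line and count them. The sketch below uses a regular expression modelled on the message above. It assumes the `remote=/host:port,` and `for file ...,` parts keep exactly that layout, and the sample line uses made-up addresses and paths.

```python
import re
from collections import Counter

# Pattern modelled on the OP_READ_BLOCK exception line quoted above.
OP_READ_BLOCK_RE = re.compile(
    r"Got error for OP_READ_BLOCK.*?remote=/(?P<remote>[^,\s]+),"
    r" for file (?P<file>[^,\s]+),"
)


def summarize(lines) -> Counter:
    """Count OP_READ_BLOCK errors per (remote DataNode, file)."""
    counts = Counter()
    for line in lines:
        m = OP_READ_BLOCK_RE.search(line)
        if m:
            counts[(m.group("remote"), m.group("file"))] += 1
    return counts


# Made-up sample line in the same shape as the log above.
sample = [
    "java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR, "
    "self=/10.0.0.5:32010, remote=/10.0.0.7:1004, "
    "for file /user/hive/.staging/job_1_0001/libjars/sqoop.jar, for pool BP-1 block 1_1",
]
for (remote, path), n in summarize(sample).items():
    print(n, remote, path)
```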
Labels:
- Apache YARN
- MapReduce
07-10-2018
09:28 AM
@bgooley Here is what I observe:
- https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries?from=2018-07-09T12:59:32.776Z&to=2018-07-10&limit=1000 returns 1000 queries and ends with the warning timestamp 2018-07-10T01:16:17.434Z.
- Using this timestamp in the "to" clause of the next query goes back in time: https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries?from=2018-07-09T12:59:32.776Z&to=2018-07-10T01:16:17.434Z&limit=1000 returns 1000 queries and ends with the warning timestamp 2018-07-09T21:26:17.434Z.
- Putting the warning timestamp from the first attempt alone in the "from" clause moves ahead: https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries?from=2018-07-10T01:16:17.434Z&limit=1000 returns 1000 queries and ends with the warning timestamp 2018-07-10T13:56:17.434Z.
- Putting that new warning timestamp in the "from" clause again, it just stays at the same timestamp: https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries?from=2018-07-10T13:56:17.434Z&limit=1000 returns 1000 queries and ends with the warning timestamp 2018-07-10T13:56:17.434Z.
- Putting the same timestamp in the "to" clause instead now gives a warning with an empty timestamp (is this the most recent output?): https://hostname:7183/api/v17/clusters/cluster/services/impala/impalaQueries?to=2018-07-10T13:56:17.434Z&limit=1000 returns 1000 queries and ends with the warning timestamp [].

Does the above match your assumption?
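The observations above suggest that, for a fixed "from", feeding the warning timestamp back in as "to" walks backwards through the window. The sketch below implements that loop and assumes exactly that behaviour. `fetch_page` is a hypothetical stand-in for the HTTP call (curl, `requests`, etc.) that returns the page's queries and the warning timestamp, or `None` when there is no warning. The loop stops when no warning comes back, when the timestamp stops moving, or after `max_pages`, which guards against the never-ending loop described in this thread. De-duplicating on `queryId` is also an assumption about the response fields.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical: fetch_page(from_ts, to_ts) -> (queries, last_end_time or None)
FetchPage = Callable[[str, str], Tuple[List[dict], Optional[str]]]


def fetch_window_backwards(fetch_page: FetchPage, from_ts: str, to_ts: str,
                           max_pages: int = 100) -> List[dict]:
    """Page backwards through [from_ts, to_ts] by reusing the
    'Last end time considered' timestamp as the next 'to' value."""
    collected: List[dict] = []
    for _ in range(max_pages):
        queries, last_end = fetch_page(from_ts, to_ts)
        collected.extend(queries)
        # No warning: the scan limit was not hit, so the window is exhausted.
        # Same timestamp as before: no progress, so stop instead of looping.
        if last_end is None or last_end == to_ts:
            break
        to_ts = last_end
    # Adjacent pages can overlap at the boundary; keep the first copy of each queryId.
    seen = set()
    unique: List[dict] = []
    for q in collected:
        qid = q.get("queryId")
        if qid not in seen:
            seen.add(qid)
            unique.append(q)
    return unique
```

In practice `fetch_page` would build the URL with both parameters, issue the request, and read the timestamp out of the `warnings` array.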
07-09-2018
01:50 PM
Thanks. Unfortunately the from and to clauses don't help either 😞 https://hostname:7183/api/v18/clusters/cluster_name/services/impala/impalaQueries?from=2018-07-09T12:59:32.776Z&to=2018-07-09T17:04:32.776Z returns queries and ends with the warning: "warnings" : [ "Impala query scan limit reached. Last end time considered is 2018-07-09T12:59:32.776Z" ]

Now, as mentioned, I use this timestamp for the next query: https://hostname:7183/api/v18/clusters/cluster_name/services/impala/impalaQueries?from=2018-07-09T12:59:32.776Z&to=2018-07-10, which returns queries and ends with the warning: "warnings" : [ "Impala query scan limit reached. Last end time considered is 2018-07-09T17:04:32.776Z" ]

Back to square one: it keeps showing that as the final timestamp, so I can't really use the warning-message timestamp as a way to get all queries from some point in time up to the current date. After just one iteration it hits this timestamp and enters a never-ending loop. I'm not using any other filter (for example, user) because I need to fetch all the queries and feed them to a dashboard.
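Any loop of this kind depends on extracting the timestamp from the `warnings` array, so that step should be done defensively. The sketch below parses an already-fetched response body and returns the timestamp, or `None` if there is no such warning. The regex assumes the warning wording quoted above stays the same. Comparing the returned value with the one from the previous page is what detects the repeat-forever case.

```python
import json
import re
from typing import Optional

LAST_END_RE = re.compile(r"Last end time considered is (\S+)")


def last_end_time(response_body: str) -> Optional[str]:
    """Return the timestamp from the 'Impala query scan limit reached'
    warning, or None if the response has no such warning."""
    data = json.loads(response_body)
    for warning in data.get("warnings") or []:
        m = LAST_END_RE.search(warning)
        if m:
            return m.group(1)
    return None


body = ('{"queries": [], "warnings": ["Impala query scan limit reached. '
        'Last end time considered is 2018-07-09T17:04:32.776Z"]}')
print(last_end_time(body))  # 2018-07-09T17:04:32.776Z
```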
07-09-2018
12:59 PM
@bgooley Your query mentions "august", which could be a typo. What I meant is that when I specify "2018-07-09T17:04:32.776Z" in the from clause, I still get query results (100, the default), but they still have this warning message at the end of the page: "warnings" : [ "Impala query scan limit reached. Last end time considered is 2018-07-09T17:04:32.776Z" ]

I was assuming that if I capture the date in the warning message at the end of each page and use it in the "from" clause, the query should return the next result set, ending with a warning that carries some later date. I could then use that date again, and so on until I reach the current date/time, at which point I'd stop the loop. (I was fetching it via curl.) But that doesn't seem to be happening.

My other question: what query string can I use to have both "to" and "from" in the same query string? I tried the following, but it didn't fetch any results: https://hostname:7183/api/v17/clusters/cluster_name/services/impala/impalaQueries?filter=(from=2018-07-09 and to=`date +"%Y-%m-%dT%T"`)
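On the last question: in the working URLs elsewhere in this thread, "from" and "to" are passed as separate query parameters (`?from=...&to=...`) rather than inside `filter`. The current time would therefore go into "to" directly. The sketch below builds such a URL, defaulting "to" to the current UTC time. `hostname`, `cluster_name`, port 7183 and the `v17` API version are placeholders copied from the thread, not confirmed values for any other cluster.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def impala_queries_url(host: str, cluster: str, from_ts: str,
                       to_ts: Optional[str] = None, limit: int = 1000,
                       api_version: str = "v17") -> str:
    """Build an impalaQueries URL with from/to as separate parameters."""
    if to_ts is None:
        # Current UTC time in the millisecond ISO-8601 style used in the thread.
        to_ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    params = {"from": from_ts, "to": to_ts, "limit": limit}
    return (f"https://{host}:7183/api/{api_version}/clusters/{cluster}"
            f"/services/impala/impalaQueries?{urlencode(params)}")


print(impala_queries_url("hostname", "cluster_name", "2018-07-09", "2018-07-10"))
```

`urlencode` percent-encodes the colons in full timestamps (`:` becomes `%3A`), which the server decodes as usual.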