Member since: 10-22-2020
Posts: 10
Kudos Received: 0
Solutions: 0
11-03-2022
10:15 PM
@AcharkiMed Is it not possible to compute incremental statistics on Kudu tables? Do I have to run 'COMPUTE STATS' every day to compute statistics for all the data in the table?
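For context on what I mean, a sketch of the two commands (the table names hdfs_table and kudu_table are hypothetical; my understanding, to be confirmed, is that incremental stats are per-partition and so only apply to partitioned HDFS-backed tables):

```sql
-- Partitioned HDFS table: only the new partition is scanned
COMPUTE INCREMENTAL STATS hdfs_table PARTITION (dt='2022-11-03');

-- Kudu table: no per-partition incremental option, so the whole table is scanned
COMPUTE STATS kudu_table;
```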
10-03-2022
06:48 PM
@clev Is 'RowsReturnedRate' measuring data read from disk? Compared to the fast 'HDFS Scanner Average Read Throughput' value of 988.1 MiB/s, the 'Threads: Network Receive Wait Time' value was slow at 59.5m, so I did not think it was disk I/O. Doesn't it look like the process reads quickly from disk but is slow sending the data over the network?
09-29-2022
07:45 AM
@ChethanYM I increased the dfs.namenode.delegation.token.max-lifetime setting to 8 days in YARN (yarn-site.xml, mapred-site.xml, core-site.xml) and HDFS (hdfs-site.xml). The workflow of test #1 ran for 7 days without any extra settings and ended successfully. The workflow of test #2 ran for 8 days with 'mapreduce.job.complete.cancel.delegation.tokens' set to 'TRUE', but ended in failure.
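For reference, the max-lifetime change above as an hdfs-site.xml entry (the value is in milliseconds, so 8 days = 8 * 24 * 3600 * 1000 = 691200000):

```xml
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <!-- 8 days, in milliseconds -->
  <value>691200000</value>
</property>
```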
09-29-2022
07:36 AM
Most of the same Impala queries respond within 20 seconds, but sometimes one takes more than 60 seconds. I see a hotspot in 'SCAN HDFS' of 'ExecSummary' in the slow query profile (avgTime 6s462ms, maxTime 54s766ms).

Low response time query info:
- Duration: 13.1s
- HDFS Scanner Average Read Throughput: 382.2 MiB/s
- Threads: Network Receive Wait Time: 10.0m

High response time query info:
- Duration: 1m 4s
- HDFS Scanner Average Read Throughput: 988.1 MiB/s
- Threads: Network Receive Wait Time: 59.5m

If HDFS SCAN is the cause of the slow response time, why is the 'HDFS Scanner Average Read Throughput' value higher in the slow query?

And one more question. I compared the fast and slow operators of the HDFS_SCAN_NODE stage in the slow query profile, and I wonder why the RowBatchQueueGetWaitTime (54.76s) and ScannerThreadsTotalWallClockTime (54.77s) metric values are both high.

Slow operator:
HDFS_SCAN_NODE (id=0)
  Hdfs split stats (<volume id>:<# splits>/<split lengths>): 1:1/4.72 MB
  ExecOption: PARQUET Codegen Enabled, Codegen enabled: 1 out of 1
  Hdfs Read Thread Concurrency Bucket: 0:100% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0%
  File Formats: PARQUET/SNAPPY:1
  - AverageHdfsReadThreadConcurrency: 0.00 (0.0)
  - AverageScannerThreadConcurrency: 1.00 (1.0)
  - BytesRead: 959.6 KiB (982621)
  - BytesReadDataNodeCache: 0 B (0)
  - BytesReadLocal: 959.6 KiB (982621)
  - BytesReadRemoteUnexpected: 0 B (0)
  - BytesReadShortCircuit: 959.6 KiB (982621)
  - CachedFileHandlesHitCount: 0 (0)
  - CachedFileHandlesMissCount: 3 (3)
  - CollectionItemsRead: 0 (0)
  - DecompressionTime: 2ms (2131729)
  - InactiveTotalTime: 0ns (0)
  - MaxCompressedTextFileLength: 0 B (0)
  - NumColumns: 2 (2)
  - NumDictFilteredRowGroups: 0 (0)
  - NumDisksAccessed: 1 (1)
  - NumRowGroups: 1 (1)
  - NumScannerThreadsStarted: 1 (1)
  - NumScannersWithNoReads: 0 (0)
  - NumStatsFilteredRowGroups: 0 (0)
  - PeakMemoryUsage: 3.3 MiB (3430040)
  - PerReadThreadRawHdfsThroughput: 1,011.8 MiB/s (1060904048)
  - RemoteScanRanges: 0 (0)
  - RowBatchQueueGetWaitTime: 54.76s (54763817619)
  - RowBatchQueuePutWaitTime: 0ns (0)
  - RowsRead: 52,319 (52319)
  - RowsReturned: 52,319 (52319)
  - RowsReturnedRate: 955 per second (955)
  - ScanRangesComplete: 1 (1)
  - ScannerThreadsInvoluntaryContextSwitches: 1 (1)
  - ScannerThreadsTotalWallClockTime: 54.77s (54768078433)
    - MaterializeTupleTime(*): 1ms (1634590)
    - ScannerThreadsSysTime: 1ms (1897000)
    - ScannerThreadsUserTime: 9ms (9870000)
  - ScannerThreadsVoluntaryContextSwitches: 14 (14)
  - TotalRawHdfsOpenFileTime(*): 3ms (3129744)
  - TotalRawHdfsReadTime(*): 926.21us (926211)
  - TotalReadThroughput: 10.8 KiB/s (11075)
  - TotalTime: 54.77s (54766111629)

Fast operator:
HDFS_SCAN_NODE (id=0)
  Hdfs split stats (<volume id>:<# splits>/<split lengths>): 3:1/4.70 MB
  ExecOption: PARQUET Codegen Enabled, Codegen enabled: 1 out of 1
  Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0%
  File Formats: PARQUET/SNAPPY:1
  - AverageHdfsReadThreadConcurrency: 0.00 (0.0)
  - AverageScannerThreadConcurrency: 0.00 (0.0)
  - BytesRead: 956.9 KiB (979887)
  - BytesReadDataNodeCache: 0 B (0)
  - BytesReadLocal: 956.9 KiB (979887)
  - BytesReadRemoteUnexpected: 0 B (0)
  - BytesReadShortCircuit: 956.9 KiB (979887)
  - CachedFileHandlesHitCount: 0 (0)
  - CachedFileHandlesMissCount: 3 (3)
  - CollectionItemsRead: 0 (0)
  - DecompressionTime: 1ms (1175394)
  - InactiveTotalTime: 0ns (0)
  - MaxCompressedTextFileLength: 0 B (0)
  - NumColumns: 2 (2)
  - NumDictFilteredRowGroups: 0 (0)
  - NumDisksAccessed: 1 (1)
  - NumRowGroups: 1 (1)
  - NumScannerThreadsStarted: 1 (1)
  - NumScannersWithNoReads: 0 (0)
  - NumStatsFilteredRowGroups: 0 (0)
  - PeakMemoryUsage: 3.2 MiB (3392266)
  - PerReadThreadRawHdfsThroughput: 1.1 GiB/s (1200419705)
  - RemoteScanRanges: 0 (0)
  - RowBatchQueueGetWaitTime: 8ms (8679355)
  - RowBatchQueuePutWaitTime: 0ns (0)
  - RowsRead: 52,115 (52115)
  - RowsReturned: 52,115 (52115)
  - RowsReturnedRate: 4868064 per second (4868064)
  - ScanRangesComplete: 1 (1)
  - ScannerThreadsInvoluntaryContextSwitches: 0 (0)
  - ScannerThreadsTotalWallClockTime: 11ms (11603409)
    - MaterializeTupleTime(*): 1ms (1465282)
    - ScannerThreadsSysTime: 2ms (2006000)
    - ScannerThreadsUserTime: 6ms (6044000)
  - ScannerThreadsVoluntaryContextSwitches: 14 (14)
  - TotalRawHdfsOpenFileTime(*): 2ms (2488801)
  - TotalRawHdfsReadTime(*): 816.29us (816287)
  - TotalReadThroughput: 0 B/s (0)
  - TotalTime: 10ms (10705486)
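A quick cross-check of the slow operator's numbers: RowsReturnedRate is just RowsReturned divided by TotalTime, so the low rate is dominated by the 54.77s wall-clock wait, not by raw read speed (PerReadThreadRawHdfsThroughput is over 1 GiB/s in both operators):

```shell
# Values copied from the slow HDFS_SCAN_NODE above
rows_returned=52319
total_time_ms=54766   # TotalTime: 54.77s

# RowsReturnedRate = RowsReturned / TotalTime, in rows per second
echo $(( rows_returned * 1000 / total_time_ms ))   # prints 955
```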
Labels: Apache Impala
09-19-2022
05:39 PM
@ChethanYM @VidyaSargur I'm reviewing the YARN-2694 and YARN-3055 patches. They look like fixes for a bug where the token is revoked when one app in the workflow completes, so the other apps cannot renew the token. I still don't understand how this bug relates to setting mapreduce.job.complete.cancel.delegation.tokens to true. Could you please explain the behavior you expect when mapreduce.job.complete.cancel.delegation.tokens is set to true? Shouldn't the token be prevented from being canceled by setting it to false instead?
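To make my question concrete: my understanding (which I'd like confirmed) is that false is the value that keeps a shared token alive when one job finishes, set in the action's configuration like this:

```xml
<property>
  <name>mapreduce.job.complete.cancel.delegation.tokens</name>
  <!-- false = do NOT cancel delegation tokens when this job completes,
       so other jobs in the same workflow can keep using them -->
  <value>false</value>
</property>
```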
09-15-2022
08:11 PM
@ChethanYM Please explain why we need to set 'mapreduce.job.complete.cancel.delegation.tokens' to 'TRUE' in our case. We checked the workflow we tested previously: we did not set this property, but it shows up as 'TRUE' in the job's metadata in Hue. That seems to be the default value, so do I still have to set it explicitly as a property in the workflow as you said?
09-14-2022
07:30 PM
@VidyaSargur @ChethanYM I tried increasing the HDFS delegation token max lifetime, but I also found a new approach in the CDP documentation: https://docs.cloudera.com/runtime/7.2.9/yarn-security/topics/yarn-long-running-applications.html This feature was patched in YARN-2704 and released in Hadoop 2.6.0: https://issues.apache.org/jira/browse/YARN-2704 Our cluster is CDH 5.14.2, which uses Hadoop 2.6.0, and it already has these settings:

YARN: yarn.resourcemanager.proxy-user-privileges.enabled: checked (true)
HDFS NameNode: hadoop.proxyuser.yarn.hosts: *
HDFS NameNode: hadoop.proxyuser.yarn.groups: *

According to this feature, the workflow should succeed because when the delegation token expires a new token is created. So why is the workflow still failing past the dfs.namenode.delegation.token.max-lifetime (7 days) setting?
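For completeness, the settings listed above as raw config entries (my understanding of where each one lives; please correct me if the placement is wrong):

```xml
<!-- yarn-site.xml (ResourceManager) -->
<property>
  <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml (NameNode) -->
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
```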
09-13-2022
11:47 PM
@VidyaSargur @ChethanYM The script only executes a sleep command:

# 24 * 7 + 16
for i in {1..184}
do
  sleep 3600 # 1h
done

I changed dfs.namenode.delegation.token.max-lifetime from 7d to 8d (issueDate=1661416856453, maxDate=1662108056453) and restarted the NameNode, Oozie, the YARN ResourceManager, and the YARN JobHistory Server. The workflow still fails.

Oozie log:
WARN org.apache.oozie.action.hadoop.ShellActionExecutor: SERVER[*****] USER[develop] GROUP[-] TOKEN[] APP[sleep7d] JOB[0000003-220825173452344-oozie-oozi-W] ACTION[0000003-220825173452344-oozie-oozi-W@shell-eb25] Exception in check(). Message[JA017: Could not lookup launched hadoop Job ID [job_1661416633783_0003] which was associated with action [0000003-220825173452344-oozie-oozi-W@shell-eb25]. Failing this action!]
org.apache.oozie.action.ActionExecutorException: JA017: Could not lookup launched hadoop Job ID [job_1661416633783_0003] which was associated with action [0000003-220825173452344-oozie-oozi-W@shell-eb25]. Failing this action!
at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1497)
at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:182)
at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:56)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

YARN log:
ERROR LogAggregationService
Failed to setup application log directory for application_......
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for develop: HDFS_DELEGATION_TOKEN owner=develop, renewer=yarn, realUser=oozie/*****, issueDate=1661416856453, maxDate=1662108056453, sequenceNumber=166609, masterKeyId=1547) can't be found in cache
at org.apache.hadoop.ipc.Client.call(Client.java:1504)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy27.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:786)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy28.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2167)
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1265)
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1261)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1277)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:265)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:68)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:278)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:384)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:337)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:463)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
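As a sanity check on the token in the InvalidToken message above, the issueDate/maxDate epoch-millisecond values can be converted to a lifetime with plain shell arithmetic:

```shell
# issueDate and maxDate copied from the InvalidToken message above (epoch ms)
issue=1661416856453
max=1662108056453

# lifetime in whole days: (ms difference) / (ms per day)
days=$(( (max - issue) / 86400000 ))
echo "token lifetime: ${days} days"   # prints: token lifetime: 8 days
```

So the token's lifetime really is 8 days, which means the new dfs.namenode.delegation.token.max-lifetime value did take effect on the NameNode, yet the token still "can't be found in cache".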
09-04-2022
08:28 PM
Running shell actions (only a sleep command) in Oozie workflows for more than 7 days throws a JA017 error on exit. We first saw this problem in a shell script that submits a Spark job: the Spark job succeeded but the submitting shell action failed. We then tested with a shell action that does a simple sleep, and we see the same failure.

I read this blog: https://blog.cloudera.com/hadoop-delegation-tokens-explained/ and tried to increase dfs.namenode.delegation.token.max-lifetime (hdfs-site.xml), then restarted the NameNode, Oozie, and the YARN ResourceManager, but it fails the same way.

If I run the Oozie workflow and look at the logs, I see that it gets 3 delegation tokens: RM_DELEGATION_TOKEN, MR_DELEGATION_TOKEN, and HDFS_DELEGATION_TOKEN. Because the dfs.namenode.delegation.token.max-lifetime setting value is increased, the maxDate of HDFS_DELEGATION_TOKEN is increased. The maxDate of the RM_DELEGATION_TOKEN and MR_DELEGATION_TOKEN tokens did not increase.

Why does an authentication problem occur due to expiration of a delegation token when the Oozie shell action terminates? Can't just increasing dfs.namenode.delegation.token.max-lifetime solve this problem? How can we run shell actions for long periods of time in Oozie?
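Since the RM_DELEGATION_TOKEN maxDate did not move, its lifetime may need raising separately. As far as I know (please confirm), the ResourceManager token has its own max-lifetime property in yarn-site.xml, and the JobHistory Server token is governed by an analogous property of its own:

```xml
<!-- yarn-site.xml: max lifetime of RM_DELEGATION_TOKEN, in ms (default 7 days) -->
<property>
  <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
  <!-- 8 days, to match the HDFS token lifetime -->
  <value>691200000</value>
</property>
```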
Labels: Apache Oozie