Member since: 08-22-2018
Posts: 10
Kudos Received: 0
Solutions: 0
06-04-2020
01:45 AM
Hello All, I am new to the Spark environment. I have converted a Hive query to Spark Scala, and I am now loading data and doing performance testing. Below are the details for loading 3 weeks of data. The cluster-level small-file average size is set to 128 MB.
1. The new temp table I am loading into is ORC formatted, since the current Hive table is stored as ORC.
2. Each Hive table partition folder is about 200 MB.
3. I am using repartition(1) in the Spark code so that it creates one 200 MB part file in each partition folder (to avoid the small-file issue). With this, the job completes in 23 to 26 mins.
4. If I don't use repartition(), the job completes in 12 to 13 mins, but the problem with this approach is that it creates 800 part files (each < 128 MB) in each partition folder.
I am not quite sure how to reduce processing time and avoid creating small files at the same time. Could anyone please help me in this situation?
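For reference, here is a simplified sketch of the write step I am using; the database, table, and column names are placeholders, not the real ones:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Load3WeeksData")            // placeholder app name
  .enableHiveSupport()
  .getOrCreate()

// Converted Hive query (placeholder query and table names)
val result = spark.sql(
  "SELECT * FROM source_db.source_table WHERE load_week IN ('wk1', 'wk2', 'wk3')")

// Option A (what I use now): one ~200 MB file per partition folder,
// but the job takes 23-26 mins because a single task writes all the data.
result.repartition(1)
  .write
  .mode("overwrite")
  .format("orc")
  .partitionBy("part_col")              // placeholder partition column
  .saveAsTable("temp_db.temp_table")

// Option B: no repartition -> 12-13 mins, but ~800 small files (< 128 MB)
// in each partition folder:
// result.write.mode("overwrite").format("orc")
//   .partitionBy("part_col").saveAsTable("temp_db.temp_table")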
11-26-2019
11:24 PM
Hello Community,
I am seeing a different value in Teradata and Hive when I run the same query.
Teradata:
SELECT CASE WHEN 6240.00 <> 0 AND (30/(6240.00)) > 0 THEN 'Y' ELSE 'N' END AS "Col1"
Output: N
Hive:
SELECT CASE WHEN 6240 <> 0 AND (30/6240) > 0 THEN 'Y' ELSE 'N' END AS "Col1"
Output: Y
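For clarity, here is how I understand the difference, sketched in spark-shell (Spark SQL follows the same integer/double division behaviour as Hive here, and the DECIMAL cast below is my assumption about what Teradata is effectively doing):

// `spark` is the SparkSession predefined in spark-shell.

// Hive/Spark: 30 / 6240 is done as double division, so the result is ~0.0048,
// the "> 0" check is true and the CASE returns 'Y'.
spark.sql("SELECT CASE WHEN 6240 <> 0 AND (30 / 6240) > 0 THEN 'Y' ELSE 'N' END AS Col1").show()

// Teradata: with 6240.00 the division is DECIMAL and the result keeps a small
// scale, so it truncates to 0.00, the check is false and it returns 'N'.
// Casting the quotient to a 2-digit scale should reproduce that on the Hive side:
spark.sql("SELECT CASE WHEN 6240 <> 0 AND CAST(30 / 6240 AS DECIMAL(18,2)) > 0 THEN 'Y' ELSE 'N' END AS Col1").show()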
Can anyone please help me solve this? I need the Hive result to match the Teradata one.
Thank you.
09-07-2018
05:17 AM
Both commands returned the same result. We established the hostname switch using DNS forwarding, and I can see the aliased old hostname when I do an nslookup on the new hostname. https://<New_FQDN>:8090/cluster is the link where I am able to access the new cluster's Resource Manager Web UI, but it gives the above error when I access it with the old hostname, https://<Old_FQDN>:8090/cluster. Also, principals with the old hostnames were already there in the KDC, but they are not showing up in Cloudera Manager; I only see principal names generated with the new hostnames.
09-06-2018
06:25 PM
We are migrating to a new cluster. We established the hostname switch using DNS forwarding, and we want to test hostnames for every service we have, so that users can keep using the same old hostnames to connect to the new cluster. Hostname testing for Impala worked as expected when checked with the impala-shell command. Currently we are testing Hive. HiveServer2 is running on <New_FQDN> in the new cluster, and the old server mapped to it is a different host, <Old_FQDN>. The Beeline connection string only works if we give the new server's principal name:
!connect jdbc:hive2://<Old_FQDN>:10000/default;principal=hive/<New_FQDN>@REALM
We are using Active Directory. How can we make this work with the old server's principal?
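For reference, the same thing expressed as a small Scala/JDBC sketch; the <...> placeholders stand for our actual hostnames and realm, and the second URL is the variant I would like to get working:

import java.sql.DriverManager

// Hive JDBC driver (assumes hive-jdbc and its dependencies are on the classpath,
// and that a Kerberos ticket has already been obtained with kinit).
Class.forName("org.apache.hive.jdbc.HiveDriver")

// Works today: old hostname in the URL, but the NEW server's principal.
val workingUrl = "jdbc:hive2://<Old_FQDN>:10000/default;principal=hive/<New_FQDN>@REALM"

// What we would like to work: old hostname with the OLD server's principal.
val desiredUrl = "jdbc:hive2://<Old_FQDN>:10000/default;principal=hive/<Old_FQDN>@REALM"

val conn = DriverManager.getConnection(workingUrl)   // succeeds
// DriverManager.getConnection(desiredUrl)           // currently fails
conn.close()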
09-05-2018
01:57 PM
I triggered an HDFS BDR job as the hdfs user. It failed with the below error message.
INFO distcp.DistCp: map 96% reduce 0% files 99% bytes 80% throughput 24.7 (MB/s) remaining time 25 mins 27 secs running mappers 1
INFO distcp.DistCp: map 96% reduce 32% files 99% bytes 80% throughput 24.7 (MB/s) remaining time 25 mins 27 secs running mappers 1
INFO ipc.Client: Retrying connect to server: FQDN/IP:PORT. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
INFO ipc.Client: Retrying connect to server: FQDN/IP:PORT. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
INFO ipc.Client: Retrying connect to server: FQDN/IP:PORT. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:PROXY) via <hdfs_principal_name> (auth:KERBEROS) cause:java.io.IOException: Job status not available
ERROR util.DistCpUtils: Exception encountered
java.io.IOException: Job status not available
    at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:334)
    at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:621)
    at com.cloudera.enterprise.distcp.DistCp.checkProgress(DistCp.java:471)
    at com.cloudera.enterprise.distcp.DistCp.execute(DistCp.java:461)
    at com.cloudera.enterprise.distcp.DistCp$1.run(DistCp.java:151)
    at com.cloudera.enterprise.distcp.DistCp$1.run(DistCp.java:148)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at com.cloudera.enterprise.distcp.DistCp.run(DistCp.java:148)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at com.cloudera.enterprise.distcp.DistCp.main(DistCp.java:843)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
18/09/05 16:32:09 INFO distcp.DistCp: Used diff: false
18/09/05 16:32:09 WARN distcp.DistCp: Killing submitted job job_1536157028747_0004
09-04-2018
10:22 PM
We are migrating to a new cluster. We performed hostname redirection by aliasing every old server in the old cluster to the corresponding new server in the new cluster, so that users can keep using the same old hostnames to connect to the new cluster. I am currently doing hostname testing for every service we have. Hostname testing for Impala worked as expected when checked with the impala-shell command. I am able to see all the jobs that ran using the Job History Server URL with the new server hostname, but when I use the equivalent old hostname that we mapped during hostname redirection, it returns this:
HTTP ERROR 403
Problem accessing /jobhistory. Reason:
GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)
Powered by Jetty://
The same happens when I try to access the Resource Manager and NameNode UIs.
08-23-2018
03:43 PM
Since we are running BDR jobs as the hdfs user, I think that is why '/user/history' has been set to 'hdfs:supergroup'. I changed it to mapred:hadoop and ran the job again, but it is still failing with the same error. I see it creates another directory inside '/user/history/done_intermediate/hdfs' named after whoever runs a MapReduce job, and it writes .jhist, .summary, and .xml files there with the permissions and ownership below:
-rwxrwx--- 1 hdfs supergroup
Can we change anything so that the files it is going to create in '/user/history/done_intermediate/hdfs' will have enough permissions to be read/written?
08-23-2018
03:20 PM
Are you talking about the destination cluster? Do you know how to add the mapred user to supergroup? I don't see a supergroup entry at all in /etc/group, but I do see the mapred user in the hadoop group. I need to add the mapred user to supergroup, as I am running BDR jobs as the hdfs user.
08-22-2018
06:57 PM
Hi,
The HDFS and Hive replication schedules are failing with the below error.
18/08/22 19:40:02 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:PROXY) via hdfs/<FQDN>@realm (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file hdfs://nameservice9:8020/user/history/done_intermediate/hdfs/job_1534964122122_0006-1534978377784-hdfs-HdfsReplication-1534981192465-20-1-SUCCEEDED-root.users.hdfs-1534978383044.jhist
    at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:199)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:218)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:208)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:204)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:204)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getJobReport(HistoryClientService.java:236)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122)
    at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2220)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2214)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file hdfs://nameservice9:8020/user/history/done_intermediate/hdfs/job_1534964122122_0006-1534978377784-hdfs-HdfsReplication-1534981192465-20-1-SUCCEEDED-root.users.hdfs-1534978383044.jhist
    at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:341)
    at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:101)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:479)
    at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:180)
    at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.access$000(CachedHistoryStorage.java:52)
    at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:103)
    at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:100)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
    at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
    at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:193)
    ... 18 more
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=READ, inode="/user/history/done_intermediate/hdfs/job_1534964122122_0006-1534978377784-hdfs-HdfsReplication-1534981192465-20-1-SUCCEEDED-root.users.hdfs-1534978383044.jhist":hdfs:supergroup:-rwxrwx---