Member since 03-23-2015 | 1288 Posts | 114 Kudos Received | 98 Solutions
08-13-2019
04:27 PM
Hi Nekkanti, In the Oozie Sqoop action you no longer need the parent "sqoop" command, only the sub-command. So change "<arg>sqoop job</arg>" to "<arg>job</arg>". Also, you should not create the job through Oozie, because the workflow might fail when it runs again. Create the job first, then have Oozie only execute it, so that Sqoop picks up the last value automatically. If it still fails, please share the Oozie launcher log so the error messages can be reviewed. Cheers Eric
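To illustrate the argument change, here is a minimal Python sketch that drops a leading "sqoop" token from the first <arg> of a Sqoop action. The function name `strip_sqoop_prefix`, the job name `my_saved_job`, and the simplified action XML (no Oozie namespace, no connection settings) are assumptions made for the example, not taken from the original workflow.

```python
import xml.etree.ElementTree as ET


def strip_sqoop_prefix(action_xml: str) -> str:
    """Remove a leading 'sqoop' token from the first <arg> of a Sqoop action.

    Assumes a simplified action element without an XML namespace.
    """
    root = ET.fromstring(action_xml)
    first = root.find("arg")  # first direct <arg> child, if any
    if first is not None and first.text:
        tokens = first.text.split()
        if tokens and tokens[0] == "sqoop":
            rest = tokens[1:]
            if rest:
                # "<arg>sqoop job</arg>" becomes "<arg>job</arg>"
                first.text = " ".join(rest)
            else:
                # "<arg>sqoop</arg>" on its own is dropped entirely
                root.remove(first)
    return ET.tostring(root, encoding="unicode")


# Hypothetical action that only executes an already created saved job.
example = "<sqoop><arg>sqoop job</arg><arg>--exec</arg><arg>my_saved_job</arg></sqoop>"
fixed = strip_sqoop_prefix(example)
print(fixed)
```

The sketch only touches the first <arg>; any other arguments are left as they are.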
08-12-2019
04:50 PM
@Sona, Sorry, I missed your question in May. For (1), please refer to my previous update. For (2), yes: all paths that store Hive databases/tables should be managed by Hive/Sentry, so those paths should be configured under the Sentry Synchronization Path Prefixes setting and need to be owned by "hive:hive". The idea of Sentry is to have everything managed by "hive", so that no one can make direct modifications without going through Hive/Sentry. Cheers Eric
08-12-2019
04:39 PM
1 Kudo
Hi @vinodnerella, As @Sona mentioned, after the job finishes you can find the user who ran it on Cloudera Manager's YARN applications list page. While the job is running, you can find the actual user by checking the YARN application's configuration setting called "hive.sentry.subject.name". If you access it through the RM, click the "Configuration" link on the left side of the job details page. If you access it through Hue, open the job details page, go to the "Metadata" tab and search for "hive.sentry.subject.name". This setting stores the original user who submitted the job, because impersonation is turned off once Sentry is enabled. Of course, this only works in a Sentry-enabled environment. Cheers Eric
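If you would rather look the setting up programmatically, the lookup can be sketched as below. This assumes you have already fetched the job configuration as JSON, and that it has the shape I believe the MapReduce job conf REST response uses (a "conf" object holding a "property" list of name/value pairs); please verify that against your version. The helper name `find_property`, the sample path and the user "alice" are made up for the example.

```python
from typing import Optional


def find_property(job_conf: dict, name: str) -> Optional[str]:
    """Return the value of a named property from a job-conf style dict, or None."""
    for prop in job_conf.get("conf", {}).get("property", []):
        if prop.get("name") == name:
            return prop.get("value")
    return None


# Hypothetical payload; path and values are invented for illustration.
sample_conf = {
    "conf": {
        "path": "hdfs://nameservice1/user/history/job.xml",
        "property": [
            {"name": "mapreduce.job.user.name", "value": "hive"},
            {"name": "hive.sentry.subject.name", "value": "alice"},
        ],
    }
}

submitter = find_property(sample_conf, "hive.sentry.subject.name")
print(submitter)
```

In the sample, the job itself runs as "hive" while the Sentry subject name still carries the original submitter.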
08-12-2019
04:04 PM
1 Kudo
Go to the RM web UI to see how much memory and how many vcores your cluster has, and check whether your job requires more than that. This will confirm whether you are out of resources. Cheers Eric
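The same comparison can be sketched in Python. This assumes the cluster totals come from the ResourceManager's cluster metrics REST response (the "clusterMetrics" object with totalMB, availableMB, totalVirtualCores and availableVirtualCores); the function name `classify_request` and all numbers below are made up. It ignores scheduler queue limits, which can cap a job well below the cluster total.

```python
def classify_request(request_mb: int, request_vcores: int, metrics: dict) -> str:
    """Compare a job's total resource request with cluster capacity."""
    m = metrics["clusterMetrics"]
    if request_mb > m["totalMB"] or request_vcores > m["totalVirtualCores"]:
        return "exceeds cluster total"
    if request_mb > m["availableMB"] or request_vcores > m["availableVirtualCores"]:
        return "fits the cluster, but must wait for free capacity"
    return "fits in currently available capacity"


# Hypothetical metrics for a small cluster.
sample_metrics = {
    "clusterMetrics": {
        "totalMB": 65536,
        "availableMB": 8192,
        "totalVirtualCores": 32,
        "availableVirtualCores": 4,
    }
}

print(classify_request(131072, 16, sample_metrics))
print(classify_request(16384, 8, sample_metrics))
print(classify_request(4096, 2, sample_metrics))
```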
08-12-2019
12:11 AM
Hi Nekkanti, You can use Sqoop's saved jobs feature, where Sqoop remembers the last incremental import and continues from where it left off. Please refer to the upstream doc below: http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_saved_jobs >> If a saved job is configured to perform an incremental import, state regarding the most recently imported rows is updated in the saved job to allow the job to continually import only the newest rows. Cheers Eric
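As a rough sketch of the "create once, execute many times" pattern, the two Sqoop command lines could be assembled like this. The job name, JDBC URL, table, check column and target directory are placeholders, the helper names are made up, and the choice of append mode with a starting value of 0 is an assumption for the example; adjust to your import.

```python
import shlex
from typing import List


def create_job_cmd(job_name: str, connect: str, table: str,
                   check_column: str, target_dir: str) -> List[str]:
    """Command that defines a saved incremental-append import (run once)."""
    return [
        "sqoop", "job", "--create", job_name, "--",
        "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", "0",  # starting point; Sqoop updates it after each run
    ]


def exec_job_cmd(job_name: str) -> List[str]:
    """Command that runs the saved job (repeatable, e.g. from a scheduler)."""
    return ["sqoop", "job", "--exec", job_name]


# Placeholder values, not from the original thread.
create_cmd = create_job_cmd("orders_incr", "jdbc:mysql://db.example.com/shop",
                            "orders", "id", "/user/etl/orders")
exec_cmd = exec_job_cmd("orders_incr")
print(shlex.join(create_cmd))
print(shlex.join(exec_cmd))
```

Only the second command needs to be scheduled; the first one is run a single time up front.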
08-11-2019
04:42 PM
Hi Vinod, You should NOT need to restart services on a weekly or monthly basis, unless you have scheduled maintenance work such as configuration changes. CDH services are expected to keep running without restarts, unless you hit issues like memory pressure or crashes, which would require further investigation. Regarding YARN aggregated logs not being cleared properly, please refer to the post and KB below: https://community.cloudera.com/t5/Support-Questions/Yarn-Aggregate-Log-Retention-Setting/m-p/81382 https://my.cloudera.com/knowledge/YARN-logs-under-tmp-logs-user-name-logs-not-cleared-properly?id=75330 Cheers Eric
08-09-2019
07:09 PM
1 Kudo
Hi, "HiveServer2 Enable Impersonation is setting to TRUE" is probably the reason. When impersonation is true, Hive impersonates the end user who runs the query when it submits jobs. Your ACL output showed that the directory is owned by "hive:hive" and, as @Tomas79 found out, the sticky bit is set. So when Hive impersonates the end user, that user cannot delete the path because he/she is not the owner. If impersonation is OFF, HS2 runs the query as the "hive" user (the user that runs the HS2 process), and you should not see this issue. I assume you have no Sentry? Sentry requires impersonation to be OFF on the HS2 side, so that all queries run as the "hive" user. To test the theory, try removing the sticky bit on this path and run the drop again in Hive. Cheers Eric
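The sticky-bit rule behind this can be sketched as a small Python model. It is a simplification of the usual rule (with the sticky bit set, only the superuser, the directory owner or the entry owner may delete an entry), not HDFS code; the function name `can_delete_entry` and the user "alice" are invented for the example.

```python
def can_delete_entry(user: str, dir_owner: str, entry_owner: str,
                     dir_writable_by_user: bool, sticky: bool,
                     is_superuser: bool = False) -> bool:
    """Simplified model of deleting an entry from a directory."""
    if is_superuser:
        return True
    if not dir_writable_by_user:
        return False  # no write access to the parent directory at all
    if not sticky:
        return True   # write access to the parent is enough
    # Sticky bit set: only the directory owner or the entry owner may delete.
    return user in (dir_owner, entry_owner)


# Impersonation ON: the query runs as the end user "alice" (hypothetical).
as_end_user = can_delete_entry("alice", dir_owner="hive", entry_owner="hive",
                               dir_writable_by_user=True, sticky=True)
# Impersonation OFF: the query runs as "hive".
as_hive = can_delete_entry("hive", dir_owner="hive", entry_owner="hive",
                           dir_writable_by_user=True, sticky=True)
# Sticky bit removed, still running as "alice".
no_sticky = can_delete_entry("alice", dir_owner="hive", entry_owner="hive",
                             dir_writable_by_user=True, sticky=False)
print(as_end_user, as_hive, no_sticky)
```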
08-09-2019
01:58 AM
Hi, Please also check what error was reported on the Impala side in the Impala Daemon logs, which are normally under /var/log/impalad. You need to know which host you connected to; if you use a load balancer, you will have to trace which host the connection was routed to, which can be hard to determine. Also check in your ODBC driver configuration whether the SSL connection option is ticked. Cheers Eric
08-09-2019
01:53 AM
Hi, I don't see any errors in the post, can you please share them again? In general, Hive has a locking mechanism enabled, so while an INSERT is running on a table, an exclusive lock is placed on that table and any other query against the same table has to wait. A timeout controls how long the wait can be: the query fails with an error if the timeout is exceeded, otherwise it keeps waiting until the lock is released. So to understand the issue better, I need to see what error was reported. Cheers Eric
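The wait-then-timeout behaviour can be illustrated with a small Python analogy. This is not Hive's lock manager, only a sketch of the same pattern using a plain in-process lock; the function name `run_query` and the 0.1 second timeout are arbitrary choices for the example.

```python
import threading

# Stands in for the exclusive lock on one table (analogy only).
table_lock = threading.Lock()


def run_query(name: str, timeout_s: float) -> str:
    """Wait up to timeout_s for the table lock, then run or fail."""
    if not table_lock.acquire(timeout=timeout_s):
        return f"{name}: failed, lock wait timed out"
    try:
        return f"{name}: ran"
    finally:
        table_lock.release()


# Simulate an INSERT that is still holding the exclusive lock.
table_lock.acquire()
result_blocked = run_query("SELECT", 0.1)

# The INSERT finishes and releases the lock; the next query goes through.
table_lock.release()
result_free = run_query("SELECT", 0.1)

print(result_blocked)
print(result_free)
```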