Member since: 12-11-2015
Posts: 206
Kudos Received: 30
Solutions: 30

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 524 | 08-14-2024 06:24 AM |
 | 1598 | 10-02-2023 06:26 AM |
 | 1401 | 07-28-2023 06:28 AM |
 | 8991 | 06-02-2023 06:06 AM |
 | 675 | 01-09-2023 12:20 PM |
02-26-2020
04:09 AM
Hi Ansar, in CDH 5.x the Spark service bundled with the CDH parcel was Spark 1.6, hence the need to add Spark2 as a separate parcel. You do not need to add Spark2 separately for CDH 6.2: when you simply add the Spark service in CDH 6.2, it installs Spark 2.4.0. For your reference, this link lists the individual services packaged inside CDH 6.2.x: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_62_packaging.html#cdh_621_packaging Alternatively, if you would like to validate the installed Spark version, you can run spark-shell, which will show the Spark version like this: Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.0-cdh6.3.3
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_141)
Type in expressions to have them evaluated.
02-25-2020
10:52 PM
Can you follow the steps mentioned in this link: https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cm_ig_extrnl_pstgrs.html#cmig_topic_5_6_1 Specifically these lines: if you want to access PostgreSQL from a different host, replace 127.0.0.1 with your IP address, and update postgresql.conf (typically found in the same place as pg_hba.conf) to include: listen_addresses = '*'
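To make the two edits above concrete, here is a minimal sketch. The file paths and the 10.0.0.5 client address are assumptions for illustration; adjust them for your host and network.

```
# postgresql.conf (often under the PostgreSQL data directory,
# e.g. /var/lib/pgsql/data/) -- accept connections on all interfaces
listen_addresses = '*'

# pg_hba.conf (same directory) -- allow the remote client host,
# not only 127.0.0.1
# TYPE  DATABASE  USER  ADDRESS        METHOD
host    all       all   10.0.0.5/32    md5
```

Restart the PostgreSQL service after making these changes so they take effect.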
02-21-2020
09:11 AM
Thanks for the awesome explanation! This comment from Spark explains the reason for allowing the insecure connection: https://issues.apache.org/jira/browse/SPARK-26019?focusedCommentId=16719231&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16719231
02-20-2020
10:07 PM
May I know the exact steps you followed to replicate the issue? Are you noticing this error when running any code snippet? Could you share a shorter version of the script so I can replicate it on my side and evaluate further?
02-20-2020
08:54 AM
A time series graph of which metric are you looking for? HTTPFS has the following metrics: https://docs.cloudera.com/documentation/enterprise/5-12-x/topics/cm_metrics_httpfs.html#cm_metrics_httpfs
02-19-2020
09:56 PM
You can set hive.mapred.mode = strict; Quoting from the doc https://blog.cloudera.com/improving-query-performance-using-partitioning-in-apache-hive/ : "If your partitioned table is very large, you could block any full table scan queries by putting Hive into strict mode using the set hive.mapred.mode=strict command. In this mode, when users submit a query that would result in a full table scan (i.e. queries without any partitioned columns), an error is issued."
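As a short sketch of what strict mode does, assuming a hypothetical table sales partitioned by a dt column (both names are placeholders):

```
-- Put Hive into strict mode for this session
SET hive.mapred.mode=strict;

-- Rejected in strict mode: no predicate on the partition column,
-- so this query would require a full table scan
SELECT COUNT(*) FROM sales;

-- Allowed: the filter on dt lets Hive prune to matching partitions
SELECT COUNT(*) FROM sales WHERE dt = '2020-02-19';
```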
02-19-2020
08:53 PM
1 Kudo
@JeffEvans You are right. In CDH we cherry-pick JIRAs to include in our Spark, so not all features available upstream are expected to be present in CDH Spark. The line you quoted was added in this JIRA https://issues.apache.org/jira/browse/SPARK-1087 and has not been back-ported to our Spark code base. This is one of the reasons we state the following in our documentation: "Although this document makes some references to the external Spark site, not all the features, components, recommendations, and so on are applicable to Spark when used on CDH. Always cross-check the Cloudera documentation before building a reliance on some aspect of Spark that might not be supported or recommended by Cloudera." Hope this clarifies.
02-19-2020
01:01 AM
ERROR 2020Feb19 02:01:21,086 main com.client.engineering.group.JOB.main.JOBMain: org.apache.hadoop.hbase.client.RetriesExhaustedException thrown: Can't get the location

Which particular table is this application trying to access? Did you validate that the user mcaf has permission to access the table in question? (https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_sg_hbase_authorization.html#topic_8_3_2 has the commands.) If the user lacks permission, grant the required privileges. If you find that the privileges required for mcaf are already in place, then checking the HBase Master logs during the issue timeframe would give further clues.

Qn: And do we need to execute 'kinit mcaf' every time before submitting the job? And how can we configure scheduled jobs?
Ans: Yes. And how are you scheduling the jobs? If it is a shell script, you can include a kinit command with mcaf's keytab, which avoids prompting for a password.
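For reference, a sketch of the permission check and grant from the HBase shell; the table name my_table is a placeholder for your actual table:

```
hbase shell

# Inside the shell -- list who currently has access to the table
user_permission 'my_table'

# Grant read/write/exec/create/admin privileges to mcaf on that table
grant 'mcaf', 'RWXCA', 'my_table'
```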
02-18-2020
10:12 PM
Yes, you are going in the right direction. You can set min.user.id to a lower value such as 500 and then re-submit the job.
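For reference, this setting lives in the LinuxContainerExecutor configuration; a minimal fragment, assuming you edit it via Cloudera Manager (YARN -> Configuration) or directly in container-executor.cfg on the NodeManager hosts:

```
# container-executor.cfg -- lowest numeric UID allowed to run YARN containers.
# Lowering it to 500 lets users whose UID is >= 500 submit jobs.
min.user.id=500
```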
02-18-2020
07:42 PM
"Actually we use mcaf as a user to execute the jobs, but why is the HTTP user coming into the picture?" --> By this do you mean you switch to the mcaf Unix user [su - mcaf] and then run the job? If so, that is not sufficient. After enabling Kerberos, HDFS and YARN recognise the user by the TGT, not by the Unix user ID. So even if you su to mcaf but hold a TGT for a different user [say HTTP], YARN/HDFS will recognise you by that TGT user. Can you kinit mcaf, then run klist [to ensure you have the mcaf TGT], and submit the job?
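For scheduled jobs, a minimal wrapper-script sketch; the keytab path, the EXAMPLE.COM realm, and the spark-submit invocation are placeholders to substitute with your own:

```
#!/bin/bash
# Obtain a TGT non-interactively from mcaf's keytab
# (path and realm below are placeholders)
kinit -kt /home/mcaf/mcaf.keytab mcaf@EXAMPLE.COM

# Confirm the cached ticket really belongs to mcaf before submitting
klist

# Submit the job as usual; YARN/HDFS will identify you by the mcaf TGT
spark-submit --class com.example.Job /path/to/job.jar
```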