Member since: 12-11-2015
Posts: 206
Kudos Received: 30
Solutions: 30

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 524 | 08-14-2024 06:24 AM |
 | 1598 | 10-02-2023 06:26 AM |
 | 1401 | 07-28-2023 06:28 AM |
 | 8991 | 06-02-2023 06:06 AM |
 | 675 | 01-09-2023 12:20 PM |
02-26-2020
04:09 AM
Hi Ansar, in CDH 5.x the Spark service bundled with the CDH parcel was Spark 1.6, hence the need to add Spark2 as a separate parcel. You do not need to add Spark2 separately for CDH 6.2: when you simply add the Spark service in CDH 6.2, it installs Spark 2.4.0. For your reference, this link lists the individual services packaged inside CDH 6.2.x: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_62_packaging.html#cdh_621_packaging Alternatively, if you would like to validate the installed Spark version, you can run spark-shell, which will show the Spark version like this: Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.0-cdh6.3.3
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_141)
Type in expressions to have them evaluated.
02-25-2020
10:52 PM
Can you follow the steps mentioned in this link: https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cm_ig_extrnl_pstgrs.html#cmig_topic_5_6_1 Specifically these lines: if you want to access PostgreSQL from a different host, replace 127.0.0.1 with your IP address, and update postgresql.conf (typically found in the same place as pg_hba.conf) to include: listen_addresses = '*'
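To make the two edits above concrete, here is a minimal sketch. The file paths and the 10.0.0.5 client address are assumptions for illustration; adjust them for your host and network.

```
# postgresql.conf (often under the PostgreSQL data directory,
# e.g. /var/lib/pgsql/data/) -- accept connections on all interfaces
listen_addresses = '*'

# pg_hba.conf (same directory) -- allow the remote client host,
# not only 127.0.0.1
# TYPE  DATABASE  USER  ADDRESS        METHOD
host    all       all   10.0.0.5/32    md5
```

Restart the PostgreSQL service after making these changes so they take effect.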
02-21-2020
09:11 AM
Thanks for the awesome explanation! This comment from Spark explains the reason for allowing the insecure connection: https://issues.apache.org/jira/browse/SPARK-26019?focusedCommentId=16719231&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16719231
02-20-2020
10:07 PM
May I know the exact steps you followed to replicate the issue? Are you noticing this error when running any code snippet? Could you share a shorter version of the script so I can replicate it on my side and evaluate further?
02-20-2020
08:54 AM
A time series graph of which metric are you looking for? HTTPFS has the following metrics: https://docs.cloudera.com/documentation/enterprise/5-12-x/topics/cm_metrics_httpfs.html#cm_metrics_httpfs
02-19-2020
09:56 PM
You can set hive.mapred.mode = strict; Quoting from the doc https://blog.cloudera.com/improving-query-performance-using-partitioning-in-apache-hive/ : "If your partitioned table is very large, you could block any full table scan queries by putting Hive into strict mode using the set hive.mapred.mode=strict command. In this mode, when users submit a query that would result in a full table scan (i.e. queries without any partitioned columns), an error is issued."
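As a short sketch of what strict mode does, assuming a hypothetical table sales partitioned by a dt column (both names are placeholders):

```
-- Put Hive into strict mode for this session
SET hive.mapred.mode=strict;

-- Rejected in strict mode: no predicate on the partition column,
-- so this query would require a full table scan
SELECT COUNT(*) FROM sales;

-- Allowed: the filter on dt lets Hive prune to matching partitions
SELECT COUNT(*) FROM sales WHERE dt = '2020-02-19';
```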
02-19-2020
08:53 PM
1 Kudo
@JeffEvans You are right. In CDH we cherry-pick JIRAs to include in our Spark, so not all features available upstream are expected to be present in CDH Spark. The line you quoted was added in this JIRA https://issues.apache.org/jira/browse/SPARK-1087 and has not been back-ported to our Spark code base. This is one of the reasons we state the following in our documentation: "Although this document makes some references to the external Spark site, not all the features, components, recommendations, and so on are applicable to Spark when used on CDH. Always cross-check the Cloudera documentation before building a reliance on some aspect of Spark that might not be supported or recommended by Cloudera." Hope this clarifies.
02-19-2020
01:01 AM
ERROR 2020Feb19 02:01:21,086 main com.client.engineering.group.JOB.main.JOBMain: org.apache.hadoop.hbase.client.RetriesExhaustedException thrown: Can't get the location

Which particular table is this application trying to access? Did you validate that the user mcaf has permission to access the table in question? (https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_sg_hbase_authorization.html#topic_8_3_2 has the commands.) If the user lacks permission, grant the required privileges. If you find that the privileges required for mcaf are already in place, then checking the HBase Master logs during the issue timeframe would give further clues.

Qn: And do we need to execute 'kinit mcaf' every time before submitting the job? And how can we configure scheduled jobs?
Ans: Yes. And how are you scheduling the jobs? If it is a shell script, you can include a kinit command with mcaf's keytab, which avoids prompting for a password.
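For reference, a sketch of the permission check and grant from the HBase shell; the table name my_table is a placeholder for your actual table:

```
hbase shell

# Inside the shell -- list who currently has access to the table
user_permission 'my_table'

# Grant read/write/exec/create/admin privileges to mcaf on that table
grant 'mcaf', 'RWXCA', 'my_table'
```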
02-18-2020
10:12 PM
Yes, you are going in the right direction. You can set min.user.id to a lower value such as 500 and then re-submit the job.
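For reference, this setting lives in the LinuxContainerExecutor configuration; a minimal fragment, assuming you edit it via Cloudera Manager (YARN -> Configuration) or directly in container-executor.cfg on the NodeManager hosts:

```
# container-executor.cfg -- lowest numeric UID allowed to run YARN containers.
# Lowering it to 500 lets users whose UID is >= 500 submit jobs.
min.user.id=500
```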
02-18-2020
07:42 PM
"Actually we use mcaf as a user to execute the jobs, but why is the HTTP user coming into the picture?" --> By this do you mean you switch to the mcaf Unix user [su - mcaf] and then run the job? If so, that is not sufficient. After enabling Kerberos, HDFS and YARN recognise the user by the TGT, not by the Unix user ID. So even if you su to mcaf but hold a TGT for a different user [say HTTP], YARN/HDFS will recognise you by that TGT user. Can you kinit mcaf, then run klist [to ensure you have the mcaf TGT], and submit the job?
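For scheduled jobs, a minimal wrapper-script sketch; the keytab path, the EXAMPLE.COM realm, and the spark-submit invocation are placeholders to substitute with your own:

```
#!/bin/bash
# Obtain a TGT non-interactively from mcaf's keytab
# (path and realm below are placeholders)
kinit -kt /home/mcaf/mcaf.keytab mcaf@EXAMPLE.COM

# Confirm the cached ticket really belongs to mcaf before submitting
klist

# Submit the job as usual; YARN/HDFS will identify you by the mcaf TGT
spark-submit --class com.example.Job /path/to/job.jar
```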