Member since
12-11-2015
206
Posts
30
Kudos Received
30
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 523 | 08-14-2024 06:24 AM
 | 1598 | 10-02-2023 06:26 AM
 | 1400 | 07-28-2023 06:28 AM
 | 8991 | 06-02-2023 06:06 AM
 | 674 | 01-09-2023 12:20 PM
02-17-2020
11:55 PM
The klist result shows you are submitting the job as the HTTP user:

[hostname.org:~:HADOOP QA]$ klist
Ticket cache: FILE:/tmp/krb5cc_251473
Default principal: HTTP/hostname.org@FQDN.COM

WARN security.UserGroupInformation: PriviledgedActionException as:HTTP/hostname.org@FQDN.COM (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x

The error means the HTTP user does not have write permission on the /user directory. So you can either grant write permission to "others" on /user in HDFS so that the HTTP user can write, or kinit as the user mcaf (which has write permission) and then run the job.
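Either fix could look like the following sketch (usernames and the /user path come from the error above; the keytab path is an assumption for illustration):

```shell
# Option 1 (preferred): authenticate as mcaf, which owns /user,
# then resubmit the job. The keytab path here is hypothetical.
kinit -kt /path/to/mcaf.keytab mcaf

# Option 2 (as an HDFS superuser): open /user to "others" so the
# HTTP user can write. Note this loosens permissions for everyone.
hdfs dfs -chmod o+w /user
```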
02-11-2020
02:30 AM
You will need to further isolate the issue to understand the root cause. There are four tables involved: rmt_demo.resume_convert, rmt_demo.job_description_convert, rmt_demo.skill_count and rmt_demo.education. Do you get results when you select from each of these tables individually? If yes, and the join still returns nothing, then no rows match the join criteria. If you cannot retrieve results from any of the tables, you need to inspect the table locations: run describe formatted <table_name> to get the location, then run hdfs dfs -ls <table_location> to check whether there is any data underneath it.
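The two checks above could look like this (a sketch; the table name is one of the four from the question, and the location placeholder must be filled in from step 1):

```shell
# 1. Find where the table's data lives (the Location line of the output)
hive -e "DESCRIBE FORMATTED rmt_demo.resume_convert;" | grep -i location

# 2. List the files under that location (substitute the path from step 1)
hdfs dfs -ls <table_location>
```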
02-10-2020
10:49 PM
What version of CDH/HDP are you trying this on? Per the query you shared, you are running an INSERT query, and an INSERT will not show any results on the console. Are you running another SELECT query after this INSERT which is not showing results? What results do you get for this query: SELECT n.Id,
t.job_id,
t.job_title,
n.Name,
n.Email,
n.Mobile_Number,
n.Education,
n.Total_Experiance,
n.project_id,
((count(n.new_skills)*100)/s.skill_count) Average
FROM
rmt_demo.resume_convert n
JOIN
rmt_demo.job_description_convert t ON n.new_skills = t.skills and n.job_position = t.job_title
JOIN
rmt_demo.skill_count s ON n.job_position = s.job_title
JOIN
rmt_demo.education e ON n.education = e.education
GROUP BY
n.Id,t.job_id,t.job_title, n.Name, n.Email, n.Mobile_Number, n.Education, n.Total_Experiance,n.project_id,s.skill_count;
02-10-2020
06:45 AM
1 Kudo
You can pass that as a command-line argument. General form:

hbase org.apache.hadoop.hbase.mapreduce.RowCounter -Dmapreduce.job.cache.files=/test '<table_name>'

Example:

hbase org.apache.hadoop.hbase.mapreduce.RowCounter -Dmapreduce.job.cache.files=/test 'hbase_table_t10'
02-09-2020
07:19 PM
Sorry, I've not come across any such scripts yet. For observability, the Cluster Utilization Report is something you can review to understand how the queue weights influenced the load. More details are in this link: https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/admin_cluster_util_report.html#concept_edr_ntt_2v
02-09-2020
06:10 PM
1 Kudo
The tuning of this property depends entirely on your use case.

yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent: queue-level AM share. For instance, say your cluster is primarily used for Oozie. For each Oozie action (except the ssh action) you will have an Oozie launcher application (a map-only job that starts the actual job) and an external application that actually does the work. In this case you need to run lots of applications and, in turn, lots of ApplicationMasters. If you want more parallelism, you can create a dedicated queue for launcher applications (oozie.launcher.mapred.job.queue.name can be used to direct all launcher applications to this dedicated queue) and another queue for the external applications. You can then set 0.5 on the launcher queue: each launcher has a single AM and a single mapper, so an equal split is a rational setting.

At the cluster level, yarn.scheduler.capacity.maximum-am-resource-percent: say you have the capacity to run 1000 containers and each of your applications runs 10 mappers on average. Setting this value to 10% would allow you to run 100 applications in parallel (100 ApplicationMasters and 900 mappers). If you set it to 20%, you get to run 200 applications in parallel (200 ApplicationMasters and 800 mapper containers), but each application will then run 2 containers short and will wait for other applications to finish, so the average runtime of your applications will be a little longer.
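The cluster-level arithmetic above can be sketched as follows (numbers taken from the example; this is a back-of-the-envelope calculation, not a scheduler simulation):

```shell
total=1000    # total container capacity in the example
am_pct=10     # yarn.scheduler.capacity.maximum-am-resource-percent, as a percent

ams=$(( total * am_pct / 100 ))   # containers reserved for ApplicationMasters
workers=$(( total - ams ))        # containers left over for mappers

echo "${ams} parallel applications, ${workers} worker containers"
# prints: 100 parallel applications, 900 worker containers
```

Raising am_pct to 20 gives 200 AMs but only 800 worker containers, which is the trade-off described above.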
04-08-2019
11:45 PM
1 Kudo
Yes, you have to upgrade to CDH 6.1.0 or higher to use Impala 3.1.0. It is not possible to selectively upgrade Impala alone: https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_61_packaging.html
02-12-2019
10:02 PM
What exact command did you use for impala-shell? Can you try impala-shell -i <haproxy-host> and let us know if it works? Also, do you have any overrides in hue.ini, hue_safety_valve.ini or hue_safety_valve_server.ini? If yes, what values are under the [impala] section? And is there any reason behind having both an ELB and HAProxy?
02-12-2019
01:32 AM
principal (string): impala/master2-impala-20.yodlee.com@YODLEEINSIGHTS.COM

Impalad expects the client to use this SPN when it connects. This is why it failed when you put the HAProxy host FQDN. When you enable HAProxy in CM > Impala > Configuration > Impala Daemons Load Balancer, CM will prepare a merged keytab containing the SPN of the load balancer and also change this principal field in the impalad configuration to the HAProxy SPN, after which you will be able to connect to impalad.
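One way to confirm the merged keytab contains both SPNs is to list its entries (a sketch; the keytab path varies by deployment and is an assumption here):

```shell
# List the principals in the keytab impalad is using. After enabling the
# load balancer you should see both the impalad host SPN and the
# load-balancer SPN in the output. The path below is hypothetical.
klist -kt /path/to/impala.keytab
```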
02-11-2019
09:36 PM
What value do you see for principal on the impalad web UI varz page? https://<impalad-hostname>:25000/varz Did you add the ELB or HAProxy details in CM > Impala > Configuration > Impala Daemons Load Balancer?