Member since
12-11-2015
206
Posts
30
Kudos Received
30
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 523 | 08-14-2024 06:24 AM
 | 1598 | 10-02-2023 06:26 AM
 | 1400 | 07-28-2023 06:28 AM
 | 8991 | 06-02-2023 06:06 AM
 | 674 | 01-09-2023 12:20 PM
02-17-2020
11:55 PM
The klist result shows you are submitting the job as the HTTP user:

[hostname.org:~:HADOOP QA]$ klist
Ticket cache: FILE:/tmp/krb5cc_251473
Default principal: HTTP/hostname.org@FQDN.COM

WARN security.UserGroupInformation: PriviledgedActionException as:HTTP/hostname.org@FQDN.COM (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x

The error means the HTTP user does not have write permission on the /user directory. So you can either grant write permission to "others" on /user in HDFS so that the HTTP user can write, or kinit as the user mcaf (which has write permission) and then run the job.
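Either fix could look like the following sketch (usernames and the /user path come from the error above; the keytab path is an assumption for illustration):

```shell
# Option 1 (preferred): authenticate as mcaf, which owns /user,
# then resubmit the job. The keytab path here is hypothetical.
kinit -kt /path/to/mcaf.keytab mcaf

# Option 2 (as an HDFS superuser): open /user to "others" so the
# HTTP user can write. Note this loosens permissions for everyone.
hdfs dfs -chmod o+w /user
```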
02-11-2020
02:30 AM
You will need to further isolate the issue to understand the root cause. There are four tables involved: rmt_demo.resume_convert, rmt_demo.job_description_convert, rmt_demo.skill_count and rmt_demo.education. Do you get results when you select from each of these tables individually? If yes, and the join still returns nothing, then no rows match the join criteria. If you cannot retrieve results from any of the tables, you need to inspect the table locations: run describe formatted <table_name> to get the location, then run hdfs dfs -ls <table_location> to check whether there is any data underneath it.
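The two checks above could look like this (a sketch; the table name is one of the four from the question, and the location placeholder must be filled in from step 1):

```shell
# 1. Find where the table's data lives (the Location line of the output)
hive -e "DESCRIBE FORMATTED rmt_demo.resume_convert;" | grep -i location

# 2. List the files under that location (substitute the path from step 1)
hdfs dfs -ls <table_location>
```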
02-10-2020
10:49 PM
What version of CDH/HDP are you trying this on? Per the query you shared, you are running an INSERT query, and an INSERT will not show any results on the console. Are you running another SELECT query after this INSERT which is not showing results? What results do you get for this query: SELECT n.Id,
t.job_id,
t.job_title,
n.Name,
n.Email,
n.Mobile_Number,
n.Education,
n.Total_Experiance,
n.project_id,
((count(n.new_skills)*100)/s.skill_count) Average
FROM
rmt_demo.resume_convert n
JOIN
rmt_demo.job_description_convert t ON n.new_skills = t.skills and n.job_position = t.job_title
JOIN
rmt_demo.skill_count s ON n.job_position = s.job_title
JOIN
rmt_demo.education e ON n.education = e.education
GROUP BY
n.Id,t.job_id,t.job_title, n.Name, n.Email, n.Mobile_Number, n.Education, n.Total_Experiance,n.project_id,s.skill_count;
02-10-2020
06:45 AM
1 Kudo
You can pass that as a command-line argument. General form:

hbase org.apache.hadoop.hbase.mapreduce.RowCounter -Dmapreduce.job.cache.files=/test '<table_name>'

Example:

hbase org.apache.hadoop.hbase.mapreduce.RowCounter -Dmapreduce.job.cache.files=/test 'hbase_table_t10'
02-09-2020
07:19 PM
Sorry, I've not come across any such scripts yet. For observability, the Cluster Utilization Report is something you can review to understand how the queue weights influenced the load. More details are in this link: https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/admin_cluster_util_report.html#concept_edr_ntt_2v
02-09-2020
06:10 PM
1 Kudo
The tuning of this property depends entirely on your use case.

yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent: queue-level AM share. For instance, say your cluster is primarily used for Oozie. For each Oozie action (except the ssh action) you will have an Oozie launcher application (a map-only job that starts the actual job) and an external application that actually does the work. In this case you need to run lots of applications and, in turn, lots of ApplicationMasters. If you want more parallelism, you can create a dedicated queue for launcher applications (oozie.launcher.mapred.job.queue.name can be used to direct all launcher applications to this dedicated queue) and another queue for the external applications. You can then set 0.5 on the launcher queue: each launcher has a single AM and a single mapper, so an equal split is a rational setting.

At the cluster level, yarn.scheduler.capacity.maximum-am-resource-percent: say you have the capacity to run 1000 containers and each of your applications runs 10 mappers on average. Setting this value to 10% would allow you to run 100 applications in parallel (100 ApplicationMasters and 900 mappers). If you set it to 20%, you get to run 200 applications in parallel (200 ApplicationMasters and 800 mapper containers), but each application will then run 2 containers short and will wait for other applications to finish, so the average runtime of your applications will be a little longer.
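The cluster-level arithmetic above can be sketched as follows (numbers taken from the example; this is a back-of-the-envelope calculation, not a scheduler simulation):

```shell
total=1000    # total container capacity in the example
am_pct=10     # yarn.scheduler.capacity.maximum-am-resource-percent, as a percent

ams=$(( total * am_pct / 100 ))   # containers reserved for ApplicationMasters
workers=$(( total - ams ))        # containers left over for mappers

echo "${ams} parallel applications, ${workers} worker containers"
# prints: 100 parallel applications, 900 worker containers
```

Raising am_pct to 20 gives 200 AMs but only 800 worker containers, which is the trade-off described above.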
04-08-2019
11:45 PM
1 Kudo
Yes, you have to upgrade to CDH 6.1.0 or higher to use Impala 3.1.0. It is not possible to selectively upgrade Impala alone: https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_61_packaging.html
02-12-2019
10:02 PM
What exact command did you use for impala-shell? Can you try impala-shell -i <haproxy-host> and let us know if it works? Also, do you have any overrides in hue.ini, hue_safety_valve.ini or hue_safety_valve_server.ini? If yes, what values are under the [impala] section? And is there any reason behind having both an ELB and HAProxy?
02-12-2019
01:32 AM
principal (string): impala/master2-impala-20.yodlee.com@YODLEEINSIGHTS.COM

Impalad expects the client to use this SPN when it connects. This is why it failed when you put the HAProxy host FQDN. When you enable HAProxy in CM > Impala > Configuration > Impala Daemons Load Balancer, CM will prepare a merged keytab containing the SPN of the load balancer and also change this principal field in the impalad configuration to the HAProxy SPN, after which you will be able to connect to impalad.
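One way to confirm the merged keytab contains both SPNs is to list its entries (a sketch; the keytab path varies by deployment and is an assumption here):

```shell
# List the principals in the keytab impalad is using. After enabling the
# load balancer you should see both the impalad host SPN and the
# load-balancer SPN in the output. The path below is hypothetical.
klist -kt /path/to/impala.keytab
```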
02-11-2019
09:36 PM
What value do you see for principal on the impalad web UI varz page? https://<impalad-hostname>:25000/varz Did you add the ELB or HAProxy details in CM > Impala > Configuration > Impala Daemons Load Balancer?