Member since 10-25-2016
- Posts: 18
- Kudos Received: 2
- Solutions: 3
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 3776 | 08-24-2017 07:32 AM |
|  | 52098 | 08-14-2017 08:00 AM |
|  | 13442 | 04-18-2017 10:11 AM |
			
    
	
		
		
11-07-2017 02:37 PM

Below is the error message I received:

2017-11-07 03:55:11,536 [INFO ] There are no more tasks to run at this time
Starting Impala Shell without Kerberos authentication
Server version: impalad version 2.6.0-cdh5.8.4 RELEASE (build 207450616f75adbe082a4c2e1145a2384da83fa6)
Invalidating Metadata
Query: invalidate metadata
Fetched 0 row(s) in 4.11s
Query: use `DBNAME`
Query: insert overwrite table Table partition(recordtype) select adid,seg,profile,livecount,
count(distinct mc) as nofs,stbnt,1 from table1 where livecount<>0 group by adid,seg,profile,livecount,stbnt
WARNINGS:
CatalogException: Table 'dbname.table' was modified while operation was in progress, aborting execution.
			
    
	
		
		
11-07-2017 10:17 AM

Did anyone look into this issue? I am also facing the same issue. I am using CDH 5.10.2.
			
    
	
		
		
08-24-2017 07:32 AM

The link below queries the API for the last 30 days of jobs, with the limit raised to 10,000 jobs at a time. Note: all times (e.g. endTime=1503438687495) are epoch timestamps in milliseconds, so adjust the time filters to your requirements. Also set the limit to however many jobs you want displayed.

http://cloudera-manager-host-ip/cmf/yarn/completedApplications?startTime=1500758462000&endTime=1503438687495&offset=0&limit=10000&serviceName=yarn&histogramAttributes=allocated_memory_seconds%2Callocated_vcore_seconds%2Ccpu_milliseconds%2Capplication_duration%2Chdfs_bytes_read%2Chdfs_bytes_written
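For convenience, here is a minimal shell sketch of how such a URL could be built for a rolling 30-day window. The host name and the admin:admin credentials are placeholders, and the authentication your Cloudera Manager instance requires may differ:

```bash
#!/usr/bin/env bash
# Sketch only: compute a 30-day window in epoch milliseconds and fetch the
# completed-applications list. CM_HOST and the credentials are placeholders.
CM_HOST="cloudera-manager-host-ip"
END=$(( $(date +%s) * 1000 ))             # now, in epoch milliseconds
START=$(( END - 30 * 24 * 3600 * 1000 ))  # 30 days earlier

curl -s -u admin:admin \
  "http://${CM_HOST}/cmf/yarn/completedApplications?startTime=${START}&endTime=${END}&offset=0&limit=10000&serviceName=yarn"
```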
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-22-2017 01:40 PM

Hi All

I am trying to get all Hive queries run on my cluster in the last 30 days. I selected all the jobs in the Cloudera YARN applications UI, filtered for the last 30 days, and also selected the hive_query_string attribute, which lets me see the actual query. The only issue is that Cloudera restricts the UI to showing just the last 100 jobs at a time, so I can't get details on all the other queries. I tried hitting the API http://cluster:8088/ws/v1/cluster/apps to get all the details, but there are two issues: there is no filter, and there is no hive_query_string attribute to filter on; it just shows me all the job details. Is there an API exposed by Cloudera where I can filter and get all the hive_query_string values, or can I configure the Cloudera UI to show me more than 100 jobs (or just export them)?

Let me know.

Thanks
AB

Labels:
- Cloudera Manager
- Cloudera Search
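For reference, a hedged sketch against the ResourceManager REST API mentioned above. Depending on the Hadoop version, its documented time-window parameters (finishedTimeBegin, in epoch milliseconds) and limit can narrow the result set, but the response carries no hive_query_string field, so this alone cannot recover the query text; the host is a placeholder:

```bash
# Sketch: RM REST API with a 30-day time filter (host is a placeholder;
# finishedTimeBegin/limit are standard Hadoop RM parameters, version permitting).
BEGIN=$(( ($(date +%s) - 30 * 24 * 3600) * 1000 ))
curl -s "http://cluster:8088/ws/v1/cluster/apps?finishedTimeBegin=${BEGIN}&limit=10000"
```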
 
			
    
	
		
		
08-14-2017 08:00 AM

2 Kudos

Hi All

Thanks @Yuexin Zhang for the response. I figured out the solution: below is the actual submit that worked for me. The catch is that a cluster-mode submit uploads the file to a staging dir on HDFS, so the path and name of the file on HDFS differ from what the program expects. To make the file available to the program, you have to give it an alias with '#', as shown below (that's the only trick); then everywhere you need to refer to that file, just use the alias from the spark-submit command. The links below, which I referred to, give the complete walkthrough and how I reached the solution.

- Issue also discussed here: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-File-not-found-error-works-fine-in-local-mode-but-failed/m-p/32306#M1149 (didn't actually help me resolve it, so I posted it separately)
- Section "Important notes" in http://spark.apache.org/docs/latest/running-on-yarn.html (kinda have to read between the lines)
- Blog explaining the reason: http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html (nice blog 🙂)

spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf#application.conf,/home/abhig/log4.properties#log4j \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=log4j" \
--conf spark.driver.extraJavaOptions="-Dconfig.file=application.conf -Dlog4j.configuration=log4j" \
/local/project/gateway/mypgm.jar

Hope this helps the next person facing a similar issue!
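To make the alias mechanics explicit, here is a minimal hedged sketch (the paths, alias, and class names are placeholders, not taken from the post above): the path before '#' is the file on the submitting host, and the name after '#' is the link each YARN container sees in its working directory:

```bash
# Sketch of the '#' alias (all paths and names are placeholders).
# /local/path/my.conf is shipped to the YARN staging dir; inside each
# container it appears as ./app.conf, so JVM options refer to 'app.conf'.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class example.Main \
  --files /local/path/my.conf#app.conf \
  --conf "spark.driver.extraJavaOptions=-Dconfig.file=app.conf" \
  /local/path/example.jar
```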
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-11-2017 10:15 AM

Hi All

I have been trying to submit the Spark job below in cluster mode through a bash shell. A client-mode submit works perfectly fine, but when I switch to cluster mode it fails with an error that no app file is present. The app file refers to the missing application.conf.

spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=/home/abhig/log4.properties" \
--driver-java-options "-Dconfig.file=/home/abhig/application.conf -Dlog4j.configuration=/home/abhig/log4.properties" \
/local/project/gateway/mypgm.jar

I followed this similar post: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-File-not-found-error-works-fine-in-local-mode-but-failed/m-p/32306#M1149 but the solution mentioned there is still not clear. I even tried

--files $CONFIG_FILE#application.conf

and it still doesn't work. Any help will be appreciated.

Thanks
AB

Labels:
- Apache Spark
 
			
    
	
		
		
04-18-2017 10:11 AM

Changing the default HDFS umask from 022 to 002 through the Cloudera Manager HDFS properties got child directories to inherit the permissions from the parent directory.
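A rough illustration of the effect (the path is a placeholder): with a 002 umask, the group write bit survives on newly created directories, which is what lets children keep the parent's group permissions:

```bash
# Sketch: override the umask for one command to compare the outcomes.
# With 002 the new dir comes out drwxrwxr-x instead of drwxr-xr-x.
hdfs dfs -Dfs.permissions.umask-mode=002 -mkdir /test/child
hdfs dfs -ls -d /test/child
```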
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
04-05-2017 01:51 PM

Hi All

How do I ensure that child dirs and files created by a member of a group with rwx permissions on HDFS get the same rwx permissions as the parent? I tried both chmod and ACLs, as suggested by Apache and Cloudera. All new dirs created by a user in a group with write permission still come out with r-x permissions instead of the rwx I want. I have also set dfs.namenode.posix.acl.inheritance.enabled to true and dfs.permissions to true, as mentioned in https://issues.apache.org/jira/browse/HDFS-6962.

fs.permissions.umask-mode=000
dfs.umaskmode, fs.permissions.umask-mode=022

[root@dev ~]# id abhig
uid=515(abhig) gid=519(abhig) groups=519(abhig),525(low_priority),528(devgrp)

[abhig@dev ~]$ hdfs dfs -setfacl -m default:group:devgrp:rwx /test
[abhig@dev ~]$ hdfs dfs -getfacl /test
# file: /test
# owner: abhig
# group: devgrp
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:devgrp:rwx
default:mask::rwx
default:other::r-x

[abhig@dev ~]$ hdfs dfs -mkdir /test/tst1
[abhig@dev ~]$ hdfs dfs -getfacl /test/tst1
# file: /test/tst1
# owner: abhig
# group: devgrp
user::rwx
group::r-x
group:devgrp:rwx #effective:r-x
mask::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:devgrp:rwx
default:mask::rwx
default:other::r-x

This doesn't help much: https://community.cloudera.com/t5/Storage-Random-Access-HDFS/HDFS-ACL-Inheritance/m-p/25494#M1092

Please give a workaround if any.

Labels:
- HDFS
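For context on the transcript above: the child's mask::r-x entry, derived from the create-time umask, is what caps the named group:devgrp entry to an effective r-x. A hedged stopgap for dirs that already exist is to widen the mask by hand (the umask change in the 04-18-2017 post above is the durable fix):

```bash
# Sketch: widen the ACL mask so group:devgrp:rwx becomes effective again.
hdfs dfs -setfacl -m mask::rwx /test/tst1
hdfs dfs -getfacl /test/tst1   # the '#effective:r-x' annotation should be gone
```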
 
			
    
	
		
		
04-04-2017 10:43 AM

Hi

Has anyone faced an issue with the Impala catalog server creating jar files and storing them in the /tmp dir on the node where the catalog server runs, like below? Every time I run invalidate metadata on any database, it creates two jar files like the ones listed and doesn't delete them after it finishes. These files keep increasing and over time start causing the issue below:

Query: invalidate metadata
ERROR:
FSError: java.io.IOException: No space left on device
CAUSED BY: IOException: No space left on device

This also eventually results in failure of the catalog server and the Impala daemon. It happens on only one of my clusters; all my clusters run the same version, and nothing was changed recently. Using CDH 5.8.3, Impala Shell v2.6.0-cdh5.8.3.

Jar files created by running the invalidate metadata command:

-rw-r--r-- 1 impala impala 68246195 Apr 4 12:54 0079d271-f044-46be-9580-7d98cd4fced2.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:53 1dd950d5-2db9-4a88-9043-02dff869697c.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:53 2c910b59-83fc-4cde-96d3-3ac086f9fcb2.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 13:02 318ef680-2ea2-47fb-b688-8df439af676c.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:50 3b15f36a-5353-4553-bee6-96c5ac807703.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:53 43e13701-dad8-4892-a19c-43125dbaf1e1.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:50 51a5b806-fb0c-444a-a989-87e832501711.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:55 54ef3cd7-ea8d-4932-a032-9b3f63f5a60b.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:58 81dd054d-d720-4d11-826a-eba2518e4381.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:53 93cc7752-c80c-4f70-a47c-14681fd5c487.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:58 9a25bded-1cd7-4773-8962-0fc41705f728.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:52 a70b6d71-191e-494a-a78e-63a03642e0e9.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:54 a7a102f8-9abe-4388-90a4-8abd85bb9c09.jar

Labels:
- Apache Impala
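A hedged monitoring sketch matching the listing above; it only measures the buildup, it does not address the root cause:

```bash
# Sketch: count the leaked jars owned by impala and total their size.
find /tmp -maxdepth 1 -name '*.jar' -user impala | wc -l
du -ch /tmp/*.jar 2>/dev/null | tail -1
```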
 
			
    
	
		
		
03-15-2017 06:24 PM

I was unable to redirect completed Spark jobs from YARN to the Spark history server even though all permissions and the Spark conf were set correctly. This might be useful.

The issue was that we were passing a spark.conf file while submitting the Spark job, expecting the config changes to be merged with the default parameters from the default spark.conf. It turns out it overrides the default Spark config file: even if you pass a blank Spark conf, the default spark.conf is not considered for the job. We had to add the three lines below to the custom Spark conf file to enable log aggregation at the Spark history server and make the URL at the resource manager point to the history server. This has to be done for every Spark job: if a job is submitted without the three params below, it will not be available in the Spark history server even if you restart everything.

```
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory
spark.yarn.historyServer.address=http://sparkhist-dev.visibleworld.com:18088
```

https://community.cloudera.com/t5/CDH-Manual-Installation/Permission-denied-user-mapred-access-WRITE-inode-quot-quot-hdfs/td-p/16318/page/2
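A minimal usage sketch (the conf path, class, and jar names are placeholders): a properties file passed via --properties-file replaces spark-defaults.conf rather than merging with it, so the custom file has to carry the three settings itself:

```bash
# Sketch: the custom properties file must repeat the history-server settings,
# because --properties-file replaces spark-defaults.conf for this job.
cat > /tmp/job-spark.conf <<'EOF'
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory
spark.yarn.historyServer.address=http://sparkhist-dev.visibleworld.com:18088
EOF

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --properties-file /tmp/job-spark.conf \
  --class example.Main \
  /local/path/example.jar
```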