Member since: 03-23-2016
Posts: 14
Kudos Received: 0
Solutions: 0
01-29-2019
05:17 PM
Found this: https://community.hortonworks.com/content/supportkb/228145/yarn-aggregation-log-deletion-service-is-unable-to.html
I've applied the suggested change but I still have log files going back to March. Does it take a while for the clean-up to trigger?
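For reference, these are the yarn-site.xml properties that, as I understand it, govern how long aggregated logs are kept and how often the deletion service checks for expired ones. The values below are only illustrative:

```
# yarn-site.xml -- aggregated-log retention (example values, not recommendations)
yarn.log-aggregation-enable=true
yarn.log-aggregation.retain-seconds=2592000                 # keep roughly 30 days of aggregated logs
yarn.log-aggregation.retain-check-interval-seconds=86400    # deletion service checks once a day

# As far as I can tell the deletion service runs inside the MapReduce JobHistory
# Server, so a JobHistory Server restart may be needed before the next clean-up
# pass actually happens.
```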
01-29-2019
04:03 PM
On HDFS I have 4TB of logs in /app-logs/hive/logs-ifile. Under that there look to be folders for individual applications going back to March 14th 2018. There are 202k folders; most are under 1MB, some are a few MB, and some run to GB, with one being 970GB.

Picking one of the smaller ones at random and looking at the files nested in the application directory, it looks like it relates to Hive2 Interactive (LLAP), and I think March was about when queries started being run on LLAP on this cluster. I've looked at the 970GB folder and it's made up of 88 files of between 10-12GB each. The file names are of the format [FQDN]_45454_1540505117980 and come from one of two hosts; at the time the files were created there would only have been two nodes in our LLAP config.

My questions are:
- is there somewhere I can set a retention policy for this? Ten months of logging seems excessive.
- can I just delete it, or could that bite me in the arse?
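In case it helps anyone reproduce the numbers above, this is roughly how I sized it up (paths as on my cluster):

```
# Total footprint of the aggregated Hive/LLAP logs
hdfs dfs -du -s -h /app-logs/hive/logs-ifile

# Largest per-application directories (sizes in bytes)
hdfs dfs -du /app-logs/hive/logs-ifile | sort -n | tail

# Directory and file counts
hdfs dfs -count /app-logs/hive/logs-ifile
```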
10-05-2018
03:58 PM
Hi,

So about a year ago we decided HBase wasn't really working for us, so we decided to remove it. It's not listed in the services list in Ambari any more, and I think we've run an HDP upgrade since too.

HDFS is a little low on disk, so I've been doing some digging about and found that /apps/hbase has a footprint that works out to about 4% of the cluster, split between /apps/hbase/data/archive/ and /apps/hbase/data/data/. If I dig into it I can see in the folder structure the names of tables that we used to run. In retrospect maybe it would have been an idea to delete all the data before uninstalling the service, but hindsight is 20/20.

What I want to know now is: are there any risks to just deleting the content? I'm leaning towards not deleting the root directory, so I don't screw anything up should we add HBase back some day, but in the folders /apps/hbase/data/archive/data/default/ and /apps/hbase/data/data/default/ deleting the folders with the table names I no longer care about. Does anyone have any advice on this?
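If it helps, this is the sort of thing I had in mind: size it up first, then remove only the per-table directories and leave the /apps/hbase root in place. The table name below is a placeholder:

```
# How much the leftover HBase data is actually costing us
hdfs dfs -du -s -h /apps/hbase/data/archive /apps/hbase/data/data

# Remove only the directories for tables we no longer care about
# ("old_table" is a placeholder). Without -skipTrash the data sits in
# the HDFS trash until it expires, so the space isn't freed immediately.
hdfs dfs -rm -r /apps/hbase/data/archive/data/default/old_table
hdfs dfs -rm -r /apps/hbase/data/data/default/old_table
```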
06-18-2018
09:20 AM
Hi,

I've got 8 nodes in the cluster running YARN and 4 of them have been allocated to Hive. In the YARN UI scheduler view I can see that 50% of the cluster has been allocated to Hive and 49% is being used (I found I needed to scale it back a smidge to ensure it launched). Load averages on the boxes that LLAP is running on don't go over 0.3, which seems odd to me. Whenever a Hive job runs it spins up large Tez containers in the default queue. When I look at the LLAP overview at `namenode:10502/llap.html` and dig into each node, it shows 9 executors present but "used" never goes over 0.

All this makes me think that while LLAP is there, nothing is using it, but I don't know why. Has anyone got any ideas, or is there a checklist I can go through to get this running right?

Many thanks

Ant
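In case it matters, these are the session-level settings I've been checking via beeline against HiveServer2 Interactive (10500 is, as I understand it, the default port for the interactive instance in HDP; the hostname is a placeholder). I'm not certain these are the only relevant ones:

```
# Confirm queries are actually being routed to the LLAP daemons
beeline -u "jdbc:hive2://hiveserver2-host:10500/default" \
  -e "set hive.execution.engine; set hive.execution.mode; set hive.llap.execution.mode;"

# What I'd expect to see for queries to run inside LLAP:
#   hive.execution.engine=tez
#   hive.execution.mode=llap
#   hive.llap.execution.mode=all   (or "only")
# If clients are connecting to the ordinary HiveServer2 (port 10000) instead of
# the interactive one, the work would land in plain Tez containers in the
# default queue, which is roughly what I'm seeing.
```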
05-16-2018
09:39 AM
I have a Pig job which runs as a step in a workflow. It had been running in a MapReduce container, but I recently moved it to a Tez container as it's a more efficient use of system resources and the job runs faster. If I look in Tez I can see the job completes successfully, and the workflow accepts that the stage completed and moves on to the next stage. However, if I watch the containers from the YARN ResourceManager UI I see that the Pig job gets reported as killed rather than finished. Digging into the error logs, it's because the yarn user seems not to be able to modify the Tez application master. I guess it's just a permissions thing, but I'm not sure how to correct it. The full error is attached below; if anyone knows how to resolve this it would be much appreciated.

2018-05-16 10:02:41,175 [WARN] [IPC Server handler 0 on 41321] |ipc.Server|: IPC Server handler 0 on 41321, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.shutdownSession from 192.168.0.25:60510 Call#2053 Retry#0
java.security.AccessControlException: User yarn (auth:SIMPLE) cannot perform AM modify operation
at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.shutdownSession(DAGClientAMProtocolBlockingPBServerImpl.java:195)
at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7638)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
2018-05-16 10:02:41,231 [INFO] [pool-1-thread-1] |app.DAGAppMaster|: DAGAppMasterShutdownHook invoked
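From the AccessControlException above, my understanding is that the Tez application master keeps its own view/modify ACLs, so what I'm thinking of trying is adding the yarn user to the AM modify ACL for this job. The property names are the documented Tez ones, but treat the values as an assumption on my part:

```
# tez-site.xml, or per-job in the Pig/Oozie action configuration.
# Format follows the usual Hadoop ACL style: comma-separated users, a space,
# then comma-separated groups ("*" would open it to everyone).
tez.am.view-acls=yarn
tez.am.modify-acls=yarn
```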
05-01-2018
03:26 PM
I've recently enabled compaction on Hive and I'm seeing a job appear regularly in the default queue. I'd like to portion off a sliver of the cluster to this function by creating a new queue for it, but how do I tell compaction which queue to use rather than having it go to default?
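For what it's worth, the property I've since found that looks like it directs compaction jobs to a specific queue is below. The queue name is just an example and would need to exist in the capacity scheduler first:

```
# hive-site.xml -- send Hive ACID compaction jobs to a dedicated YARN queue
hive.compactor.job.queue=compaction
```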
05-01-2018
08:53 AM
Running HDP 2.6.4, Ambari 2.6.1. I've inherited an issue.

The workflow coordinator has been deployed using Hue. First, I wanted to move away from Hue and keep things Hortonworks by creating the workflow and coordinator in the Workflow Manager view in Ambari. I've installed the view into Ambari, created the workflow, and set up all the actions in the job as they were in Hue. When I try to run the Pig portion, though, it fails. I'm not sure what logs I'm supposed to be looking at to find out why; all I'm seeing is info messages and no reason as to why it failed. It fails in under 20 seconds, so it looks like it's having issues creating the environment for Pig rather than something in the job itself. It does, however, run happily from the coordinator that was created in Hue.

Since I'm struggling to get Pig to run in the Workflow Manager view, I tried to tweak the job in Hue instead. I've set the property "mapred.job.queue.name"="dataprocessing" on the workflow, which worked for all the MapReduce-type jobs but not those that run in Tez (for Hive), which, when I stopped and thought about it, made sense. So I found the variable "tez.queue.name" and set that to "dataprocessing" too, but the Tez jobs are still going to default.

I want this running in dataprocessing so that everyone else can happily muck about in default until we get some better practices set up, while this job has its own queue with reserved resource and runs happily in the background regardless of what large jobs anyone else runs in the rest of the cluster.

I'm an infrastructure guy, so not dumb, but workflows and Pig are all a bit new to me, so this may be something glaringly obvious I'm just not seeing. Any advice would be greatly appreciated, both on getting the Tez stuff to run in the right queue and on getting the whole thing migrated to Workflow Manager so I can remove all this Hue stuff from the system.
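On the queue side, this is the set of properties I'm planning to try next, either in each action's configuration in workflow.xml or as workflow properties in Hue. I'm not certain all of them are needed, and "dataprocessing" is our queue name:

```
# Oozie launcher job itself
oozie.launcher.mapred.job.queue.name=dataprocessing
# MapReduce-based work spawned by the actions
mapreduce.job.queuename=dataprocessing
# Tez DAGs (Pig on Tez, Hive on Tez)
tez.queue.name=dataprocessing

# Fallback I've seen suggested: set it at the top of the Hive script itself,
# i.e.  set tez.queue.name=dataprocessing;
```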
03-06-2018
10:17 AM
I've been looking through Oozie, and the map tasks that fail all run on the same nodes; the ones that don't fail run on a different set of nodes, and there is no crossover. Clearly something on one set of nodes differs from the other, but I can't see what. I've tried cycling YARN on all the nodes with no gain.
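Since the underlying error is a NoSuchFieldError on HIVE_ORC_CACHE_STRIPE_DETAILS_SIZE, what I'm going to try next is comparing the HDP/Hive client bits installed on a node where the tasks succeed against one where they fail, in case an old jar survived the upgrade somewhere. The hostnames below are placeholders:

```
# Compare installed HDP versions and hive-exec jars between a good and a bad node
for host in good-node bad-node; do
  echo "== $host =="
  ssh "$host" 'hdp-select versions; ls /usr/hdp/current/hive-client/lib/hive-exec-*.jar'
done
```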
03-05-2018
03:55 PM
I've just upgraded to HDP 2.6.4 and I have a recurring workflow launched from Oozie. Sometimes the job runs; sometimes it dies after about a minute on a Pig job with the error java.lang.NoSuchFieldError: HIVE_ORC_CACHE_STRIPE_DETAILS_SIZE
Has anyone got any ideas about where I need to look to resolve this? It feels like the job is being handed to one box that can run it and another that can't, and when it goes to the one that can't it gets handed out again and again until it hits the one that can. I'm not sure where to look to identify this point of failure, though, so I can either remove that node or correct it.
03-03-2018
02:27 PM
Ambari Version: 2.6.1
HDP Version: 2.6.4

I stopped HBase from Ambari, then told it to delete the service and confirmed the delete action. I then got "Error 500 Status Code received on DELETE Method API: /api/V1/clusters/development1/services/HBASE Error Message: Service Error". At this point Ambari became unresponsive and I needed to stop and start ambari-server. I was then able to view Ambari, and HBase was no longer listed as a service. If I look on the name node I still see hbase folders in /etc/, though, so I'm not convinced it's all been removed. How can I be sure, and where can I look to get more info on what went wrong?

As the cluster name suggests, this is a testing cluster; I can restore it back to before the service was removed and re-run the process if there are alternative ways to do this to make it clean. I want to get a process designed on the development cluster before trying this on the main one.
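For the "how can I be sure" part, this is how I've been checking since: querying the Ambari REST API to confirm the service really is gone, and looking for leftovers on the nodes. Credentials and hostname are placeholders, and my understanding is that Ambari removes the service definition but not the OS packages or /etc config directories:

```
# Expect a 404 if the HBASE service was fully deleted from the cluster
curl -s -u admin:admin \
  "http://ambari-host:8080/api/v1/clusters/development1/services/HBASE"

# Leftover bits on each node
ls -d /etc/hbase* /usr/hdp/current/hbase-* 2>/dev/null
```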
02-14-2018
08:52 AM
Hi, I've got a cluster with a dozen datanodes, all running YARN NodeManagers as well. I've noticed that two of the nodes (along with the name node and history server) have YARN clients on them. Should the YARN client be on all of the data nodes, or only some of them?
09-25-2017
04:12 PM
Has anyone had any joy getting Hue 4 working with HDP 2.6.x? I've found mention of an Ambari service that I've tried to apply on a test cluster, but that only looks to go as high as Hue 3.11. If anyone has any information on how I might go about getting the latest version of Hue to work with the latest version of HDP, that would be most useful, as I'm told there are a great many features in the new release that would benefit my user base.
03-23-2016
02:13 PM
Not to worry, I just turned the whole environment off and on again and it looks to have resolved my issue. I'm guessing it was a cached value in NSCD that took longer to clear than the others.
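In case anyone hits the same thing and doesn't want a full restart, my understanding is the NSCD hosts cache can be flushed in place, something like:

```
# Invalidate the cached hosts table on the affected node, then restart the
# ambari-agent so it re-registers with fresh resolution
sudo nscd -i hosts
sudo ambari-agent restart
```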
03-23-2016
01:48 PM
I have just done this myself with our UAT environment, which we have brought in house. I changed the IP on all the boxes and made the changes in DNS (both forward and reverse) to reflect their new IPs. On starting Ambari, all the data nodes resolved fine, but the name node is still showing the old IP in the Ambari hosts data. If I ping the FQDN of the name node from the Ambari box, it resolves the correct IP though. Has anyone got any ideas what I need to do?