About guillaume_roger

guillaume_roger · ‎11-13-2018

Eventually, after a restart of everything (not only the services seen as requiring a restart) it went OK.

guillaume_roger · ‎11-01-2018

Hello, I installed a new (not an update) HDP 3.0.1 and seem to have many issues with the timeline server. 1) The first weird thing is that the Yarn tab in ambari keeps showing this error: ATSv2 HBase Application The HBase application reported a 'STARTED' state. Check took 2.125s 2) The second issue seems to be with oozie. Running a job, it starts but stalls with the following log repeated hundreds of times 2018-11-01 11:15:37,842 INFO [Thread-82] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING Then with: 2018-11-01 11:15:37,888 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Exception while publishing configs on JOB_SUBMITTED Event for the job : job_1541066376053_0066 org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548) at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.publishConfigsOnJobSubmittedEvent(JobHistoryEventHandler.java:1254) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1414) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) at java.lang.Thread.run(Thread.java:745) Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out 3) In hadoop-yarn-timelineserver-${hostname}.log I see, repeated many times: 2018-11-01 11:32:47,715 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1541066376053_0144_2 (TEZ_DAG_ID): 6 4) In hadoop-yarn-timelinereader-${hostname}.log I see, repeated many times: Thu Nov 01 11:34:10 CET 2018, RpcRetryingCaller{globalStartTime=1541068444076, pause=1000, maxAttempts=4}, java.net.ConnectException: Call to /192.168.x.x:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.x.x:17020 at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145) at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80) ... 3 more Caused by: java.net.ConnectException: Call to /192.168.x.x:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.x.x:17020 at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165) and indeed, there is nothing listening to port 17020 on 192.168.x.x. 5) I cannot find on any server a process named ats-hbase, this might be the reason for everything else. The queue set is just yarn_hbase_system_service_queue_name=default, which has no limit which would prevent Hbase to start. I am sure that something is very wrong here, and any help would be appreciated.

guillaume_roger · ‎01-16-2018

@Jordan Moore Not really relevant to the question but no this is not the point. The use case here is data export, where some clients have their own BI tools, processes and so on. They just need the data, csv in a zip file. Other clients do not have this in place and have a different access to this data.

guillaume_roger · ‎01-15-2018

The zip file is the output of the process, not to be read in hdfs anymore - it will just end up being downloaded and sent to a user. In this context using zip makes sense, as I am only looking at *compressing* multiple csv together, not reading them afterwards. Using beeline with formatted output is what I do currently, but I end up downloading locally multiple gigs, compress and re-upload. This is a waste and could actually fill my local disks up. Using coalesce in Spark is the best option I found, but the compression step is still not easy. Thanks!

guillaume_roger · ‎01-09-2018

My end goal is to run a few hive queries, get 1 csv file (with headers) per query, compress all those files together in one zip (not gzip or bzip, unfortunately, needs to open natively under windows) and hopefully get the zip back in hdfs. My current solution (CTAS) ends up creating one directory per table, with possibly multiple files under it (depending on number of reducers and presence/absence of UNION). I can easily generate as well a header file per table with only one line in it. Now how to put all that together? The only option I could find implies to do all the processing locally (hdfs dfs -getmerge followed by a an actual zip command). This adds a lot of overhead and could technically fill up the local disk. So my questions are: is there a way to concatenate files inside hdfs without getting them locally? is there a way to compress a bunch of files together (not individually) in zip, inside hdfs? Thanks

guillaume_roger · ‎10-12-2017

The answer pointed at https://community.hortonworks.com/questions/57795/how-to-fix-under-replicated-blocks-fasly-its-take.html is the good one. Those are undocumented features in hadoop 2.7 but they can be set up and used and now I do see that replication is speed up.

guillaume_roger · ‎10-12-2017

Hi, I had an issue with datanodes, resulting in having about 300k under replicated blocks. DNs are back, blocks are being replicated, but this is very slow, about 1 per second, and I am trying to find a way to speed replication up. I checked dfs.datanode.balance.bandwidthPerSec, which is set at about 6MB/seconds. My monitoring shows me that on average the rx/tx from each node is about 200k/seconds, so I am way below this limit. I followed this link https://community.hortonworks.com/articles/4427/fix-under-replicated-blocks-in-hdfs-manually.html which did not help (use setrep -w 3 on all underreplicated files ) This link https://community.hortonworks.com/questions/57795/how-to-fix-under-replicated-blocks-fasly-its-take.html is not fully applicable (hadoop 2.7) but I set fs.namenode.replication.work.multiplier.per.iteration to 100 (default is 2) without visible speed up. So my question is: what can I do to fasten replication up? Context: hdp2.6, AWS, 3 nodes and replication factor = 3.

guillaume_roger · ‎10-04-2017

1) was ok, 2) and 3) were not good and no it's not a kerberized cluster. After the fixes you suggested it all seems to work as expected. Thanks a million!

guillaume_roger · ‎10-03-2017

I am using hdp2.6 and would like to properly use Tez UI. The tez view is available, if I go there I see queries, can click on a query id and follow to the dag ID, but I do not have all I expect. DAG Details and DAG Counters look good. Graphical View tells me: Data not available to display graphical view! No vertex data found in YARN Timeline Server. All Vertices, All Tasks and All Task Attempts tell me: No records available! Vertex Swimlane tells me: Data not available to display swimlane! No vertex data found in YARN Timeline Server. I have seen the documentation relative to the manual install of HDP, saying to download a war file, but I do not believe this is what I should be doing here as I am using the ambari install from the cluster. tez.tez-ui.history-url.base is http://$ambari_ip:8080/#/main/view/TEZ/tez_cluster_instance which is indeed the URL where I can reach the tez view. Is there anything obvious I could have forgotten?

guillaume_roger · ‎09-28-2017

Fair enough about the *.period. As I did get metrics there is probably a smart default, but nice to have. I indeed found some messages in the service logs, and all looks good. To be honest, it all worked today. I then happily applied the settings to prod, and lo and behold, I only have 2 metrics there. Carrying on thinking, I understood is that in metrics2.properties I say that I want for instance node manager metrics, but I then actually need to restart the node manages to see those metrics. Indeed, the cluster I worked on yesterday has been rebooted (dev cluster, switched off at night). Now all works as expected. Thanks!

Online	Offline
Last Visited	‎11-22-2019 08:48 AM

Member Since	‎10-13-2016 09:10 AM
Last Visited	‎11-22-2019 08:48 AM
Posts	68
Kudos received	10

Cloudera Community

Re: Hive odbc with prepared statements: ParseExcep...

Re: Fix under replicated blocks very slow

Re: Adding a host to ambari from another DC behind...

Re: ATS hbase does not seem to start

ATS hbase does not seem to start

Re: Concatenate and zip files in hdfs

Re: Concatenate and zip files in hdfs

Concatenate and zip files in hdfs

Re: Fix under replicated blocks very slow

Fix under replicated blocks very slow

Re: Tez UI in HDP2.6

Tez UI in HDP2.6

Re: Hadoop metrics2 to Graphite: only 2 are receiv...