Member since 09-06-2016 · 6 Posts · 0 Kudos Received · 0 Solutions
08-22-2018
10:16 AM
Really sorry for the late reply. This somehow fell off my radar. To answer your question: for exchange node 6, you can sum TotalBytesSent across all of the DataStreamSender instances to get the total bytes shuffled from all instances of F00 to all instances of exchange node 6. Impala currently doesn't consider network bandwidth when deciding whether to compress row batches. It's true that if network bandwidth is plentiful, we may save some CPU time by skipping the compression/decompression step. For a multi-rack deployment, however, the network bandwidth is likely bound by the throughput of the top-of-rack switch, so compression usually helps in that case. Please let me know if I missed any other questions.
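In case it is useful, here is a minimal Python sketch of that summation, assuming you have saved the text form of the query profile to a file. The section and counter formats ("DataStreamSender (dst_id=6)", "TotalBytesSent: ... (<bytes>)") and the file name are assumptions about how the profile is printed, so adjust the patterns if your profile looks different.

    import re

    # Rough parser for a saved text query profile (hypothetical file name).
    # Assumes sender sections look like "DataStreamSender (dst_id=6)" and
    # counters look like "TotalBytesSent: 73.06 MB (76612345)".
    SENDER = re.compile(r"DataStreamSender \(dst_id=(\d+)\)")
    BYTES = re.compile(r"TotalBytesSent[^(]*\((\d+)\)")

    def bytes_shuffled_to_exchange(profile_text, exchange_node_id):
        total = 0
        chunks = SENDER.split(profile_text)
        # SENDER.split yields [prefix, dst_id, chunk, dst_id, chunk, ...]
        for dst_id, chunk in zip(chunks[1::2], chunks[2::2]):
            if int(dst_id) == exchange_node_id:
                match = BYTES.search(chunk)
                if match:
                    total += int(match.group(1))
        return total

    with open("query_profile.txt") as f:  # hypothetical path
        print(bytes_shuffled_to_exchange(f.read(), 6))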
08-15-2018
01:12 AM
The remote read statistics should be recorded in BytesReadRemoteUnexpected. Please note that the 27.57MB read from HDFS is the raw HDFS block data, and the actual table data may be compressed. The scan node will unpack the block, parse it based on the file format and convert it into row batches, so the 27.57MB read can expand into something larger. The 73.06MB recorded in the DataStreamSender is the total number of bytes sent across the network. If we are broadcasting the data to all destination exchange nodes, the total bytes sent will be (row batch size * number of destination nodes). If we are using hash or random partitioning, the number of bytes sent should equal the row batch size. I cannot tell from the quote above whether we are using hash partitioning or broadcasting; it may help if you attach the entire query profile. Please also note that the DataStreamSender will compress the row batches before sending them, so TotalBytesSent corresponds to the actual number of bytes which hit the network after compression. The total size before compression is recorded in UncompressedRowBatchSize. The TotalNetworkThroughput value seems a bit wonky and I filed IMPALA-7449 to fix it.
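As a back-of-envelope illustration of the arithmetic above (plain Python, not an Impala API; the row batch size and destination count in the example are hypothetical):

    def expected_total_bytes_sent(row_batch_bytes, num_destinations, broadcast):
        """Rough pre-compression estimate of a DataStreamSender's output.

        Broadcast: every destination gets a full copy, so the total scales
        with the number of destinations. Hash/random partitioning: each row
        goes to exactly one destination, so the total stays close to the
        original row batch size.
        """
        if broadcast:
            return row_batch_bytes * num_destinations
        return row_batch_bytes

    MB = 1024 * 1024
    # Hypothetical example: ~24 MB of row batches broadcast to 3 destinations
    # gives ~72 MB on the wire before compression.
    print(expected_total_bytes_sent(24 * MB, 3, broadcast=True) / MB)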
08-10-2018
02:01 PM
@ImpalaStorm wrote:
Q: I guess the exchange times do only include the hashing?!
A: The hashing (if any) is actually done in the DataStreamSender. I cannot tell from the profile whether the data shuffling strategy is broadcast or hash-partitioned.

Q: Is TotalBytesSent the amount of bytes this node sends to node5?
A: Yes, this is the total bytes sent to all instances of node5.

Q: What are BytesSent and the comma-separated list of MBs? What is the time in brackets?
A: This is a time series counter of the value TotalBytesSent. The value in brackets is the period at which the samples were taken. By default, samples are taken every 500ms. However, since there is a bound on the maximum number of samples kept in the time series, we start merging samples once the maximum length is reached, and that's when you'll see a value different from the default sampling period.

Q: If I add all six TotalBytesSent values up, I get ~400MB. Do I have to calculate the time myself (based on the given throughput)?
A: Yes, dividing the total bytes sent by the total throughput should give an approximate network time for a particular DataStreamSender. Please note that a DataStreamSender may send to multiple receivers in parallel, so the network time may not necessarily map to wall clock time.

Q: All six exchange nodes look roughly the same. What is the Total time of 1s488ms in this context?
A: This is the total time spent in the exchange node, including the time the receiver spent waiting for data to arrive. The non-child time essentially measures the active time spent executing any code in the exchange node; the rest will mostly be wait time. "DataWaitTime" in the profile records the amount of time the DataStreamReceiver spent waiting for data to arrive.
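For the timing question, the division is plain arithmetic; here is a hedged Python sketch (the ~400MB figure is from your profile, the throughput is an assumed number for illustration):

    def approx_network_time_seconds(total_bytes_sent, throughput_bytes_per_sec):
        """Approximate network time of one DataStreamSender.

        A sender can transmit to several receivers in parallel, so this is a
        cumulative figure that may exceed the wall-clock time actually spent
        on the network.
        """
        return total_bytes_sent / throughput_bytes_per_sec

    MB = 1024 * 1024
    # ~400 MB shuffled at an assumed 200 MB/s works out to about 2 seconds.
    print(approx_network_time_seconds(400 * MB, 200 * MB))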
02-17-2017
11:29 AM
The key part of the error is indicated here:

# SIGSEGV (0xb) at pc=0x0000000000b7b623, pid=6574, tid=140376947717888
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [impalad+0x77b623] boost::re_detail::perl_matcher<__gnu_cxx::__normal_iterator<char const*, std::string>, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<char const*, std::string> > >, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::~perl_matcher()+0x53

From the output alone it's hard to tell definitively whether the problem lies in Impala itself or in the C++ function of the UDF, but it appears to be the latter. To understand the problem, please enable core dumping with "ulimit -c unlimited" and, once the core file is located, run gdb -c <core-file> <impalad>.
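A small sketch of that workflow in case it helps (my own wrapper, not an Impala tool; both paths are placeholders you will need to point at the real core file and impalad binary):

    import subprocess

    # Placeholder paths: replace with the actual core file and impalad binary.
    CORE_FILE = "/var/tmp/core.6574"
    IMPALAD_BINARY = "/usr/lib/impala/sbin/impalad"

    # Run gdb non-interactively and dump a backtrace for every thread.
    result = subprocess.run(
        ["gdb", "--batch", "-ex", "thread apply all bt",
         IMPALAD_BINARY, "-c", CORE_FILE],
        capture_output=True,
        text=True,
    )
    print(result.stdout)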
02-17-2017
01:22 AM
We are still investigating the linking error. In the meantime, would you mind giving an older version of the UDF SDK a try? It should be mostly compatible with 5.9.0. Sorry for the trouble.
09-06-2016
10:27 AM
I would suggest looking in the log directory to see if there is any crash information in impalad.INFO or impalad.FATAL. If so, could you please share it?