Member since
04-26-2018
6
Posts
4
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2580 | 12-04-2018 06:27 AM |
 | 8516 | 05-01-2018 01:28 AM |
12-04-2018
06:27 AM
2 Kudos
Make sure that the nosuid flag isn't set on the /var (or /var/lib) mount point in /etc/fstab. As of this release, container-executor has moved to /var/lib/yarn-ce, which for many users will be on a different mount than it was previously (perhaps /opt or /usr). This should probably be in the release notes for v5.16, as it isn't clear that the default location of container-executor has moved, or what the potential implications are. Matt
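A quick way to check for this is to scan /etc/fstab for the mounts in question. The following is only a minimal sketch: the function name and the sample fstab entries are hypothetical, and it assumes /var/lib/yarn-ce lives on the /var or /var/lib mount. The check matters because container-executor is normally installed setuid, which a nosuid mount would ignore.

```python
def mounts_with_nosuid(fstab_text, targets=("/var", "/var/lib")):
    """Return the mount points from `targets` whose fstab options include nosuid."""
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # An fstab entry is: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4:
            continue
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point in targets and "nosuid" in options:
            flagged.append(mount_point)
    return flagged


# Hypothetical sample entries, for illustration only.
SAMPLE_FSTAB = """\
# device            mount fs   options         dump pass
/dev/mapper/vg-root /     ext4 defaults        1 1
/dev/mapper/vg-var  /var  ext4 defaults,nosuid 1 2
/dev/mapper/vg-opt  /opt  ext4 defaults        1 2
"""

print(mounts_with_nosuid(SAMPLE_FSTAB))  # ['/var']
```

On a real host you would pass in the contents of /etc/fstab instead of the sample. Note that fstab only describes the configured options; the options actually in effect are listed in /proc/mounts.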
05-17-2018
02:03 PM
2 Kudos
Even easier: disable the listeners that generate the lineage files by setting the following Spark properties: --conf spark.sql.queryExecutionListeners="" --conf spark.extraListeners=""
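If the job is launched from a script rather than typed at a shell, the same two properties can be passed as an argument list. This is a rough sketch only: the helper name and the job script name my_job.py are hypothetical. In an argument list the shell quotes disappear, so each property is simply set to an empty value.

```python
import shlex


def lineage_listener_overrides():
    """Return spark-submit arguments that set both listener properties to empty."""
    return [
        "--conf", "spark.sql.queryExecutionListeners=",
        "--conf", "spark.extraListeners=",
    ]


# my_job.py is a placeholder for the actual application.
cmd = ["spark-submit", *lineage_listener_overrides(), "my_job.py"]

# Print the command for inspection; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```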
05-01-2018
01:28 AM
The reason you were seeing HdfsParquetTableWriter::ColumnWriter is that I was testing the bug using the syntax:
CREATE TABLE db.newTable STORED AS PARQUET AS SELECT a.topLevelField, b.priceFromNestedField FROM db.table a LEFT JOIN a.nestedField b
This was purely to force the bug to occur. If you just ran the SELECT in Hue it would often succeed, because Hue only brings back the first 100 rows; to trigger the crash consistently I had to make Impala read from both Parquet files. No other query was running at the time.
Anyway, as Chris says, the bug appears to be fixed in 5.14.2. The job that originally triggered the crash consistently has now been running unchanged over the same source data for 20 hours without a hitch.
Thanks for your help
Matt
04-26-2018
04:12 PM
Yes, I think we can arrange to supply the minidump if we can generate it with mocked-up data, which I can do next week as I am out of the office tomorrow. I can consistently reproduce the crash using a table defined over just two Parquet files. One of the files has some nested columns in an ARRAY<STRUCT> field which are not defined in the other. Previously these would have been treated as NULL, but in the new release they trigger the crash. I will ask the DBA to get in touch tomorrow. Thanks
04-26-2018
02:19 PM
With SET NUM_NODES = 1 it now fails a single node only so is easier to debug. Full contents of ERROR file: Log file created at: 2018/04/26 21:47:36 Running on machine: dn.**.**.com Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg E0426 21:47:36.375561 60280 logging.cc:121] stderr will be logged to this file. tcmalloc: large alloc 1073741824 bytes == 0x7f4bb0d80000 @ 0x21c6f07 0x20483f2 Wrote minidump to /var/log/impala-minidumps/impalad/44e4d3fe-ee85-4b0c-80e45ba6-4f79bbbc.dmp tail of INFO file: I0426 21:55:59.223675 62106 impala-internal-service.cc:44] ExecQueryFInstances(): query_id=7747869aa15df350:3eb7678900000000 I0426 21:55:59.223722 62106 query-exec-mgr.cc:46] StartQueryFInstances() query_id=7747869aa15df350:3eb7678900000000 coord=dn.**.**.com:22000 I0426 21:55:59.223831 62106 query-state.cc:173] Buffer pool limit for 7747869aa15df350:3eb7678900000000: 109951162777 I0426 21:55:59.223939 62106 initial-reservations.cc:60] Successfully claimed initial reservations (143.75 MB) for query 7747869aa15df350:3eb7678900000000 I0426 21:55:59.224138 11838 query-state.cc:286] StartFInstances(): query_id=7747869aa15df350:3eb7678900000000 #instances=8 I0426 21:55:59.224687 11838 query-state.cc:299] descriptor table for query=7747869aa15df350:3eb7678900000000 tuples: Tuple(id=3 size=0 slots=[] tuple_path=[]) Tuple(id=2 size=471 slots=[Slot(id=63 type=STRING col_path=[] offset=0 null=(offset=468 mask=1) slot_idx=0 field_idx=-1), Slot(id=64 type=STRING col_path=[] offset=16 null=(offset=468 mask=2) slot_idx=1 field_idx=-1), Slot(id=65 type=STRING col_path=[] offset=32 null=(offset=468 mask=4) slot_idx=2 field_idx=-1), Slot(id=66 type=STRING col_path=[] offset=48 null=(offset=0 mask=0) slot_idx=3 field_idx=-1), Slot(id=67 type=INT col_path=[] offset=432 null=(offset=469 mask=10) slot_idx=38 field_idx=-1), Slot(id=68 type=DOUBLE col_path=[] offset=256 null=(offset=468 mask=8) slot_idx=16 field_idx=-1), Slot(id=69 type=BIGINT col_path=[] offset=264 
null=(offset=0 mask=0) slot_idx=17 field_idx=-1), Slot(id=70 type=STRING col_path=[] offset=64 null=(offset=0 mask=0) slot_idx=4 field_idx=-1), Slot(id=71 type=BIGINT col_path=[] offset=272 null=(offset=0 mask=0) slot_idx=18 field_idx=-1), Slot(id=72 type=STRING col_path=[] offset=80 null=(offset=0 mask=0) slot_idx=5 field_idx=-1), Slot(id=73 type=BIGINT col_path=[] offset=280 null=(offset=0 mask=0) slot_idx=19 field_idx=-1), Slot(id=74 type=STRING col_path=[] offset=96 null=(offset=0 mask=0) slot_idx=6 field_idx=-1), Slot(id=75 type=INT col_path=[] offset=436 null=(offset=469 mask=20) slot_idx=39 field_idx=-1), Slot(id=76 type=DOUBLE col_path=[] offset=288 null=(offset=468 mask=10) slot_idx=20 field_idx=-1), Slot(id=77 type=BIGINT col_path=[] offset=296 null=(offset=0 mask=0) slot_idx=21 field_idx=-1), Slot(id=78 type=STRING col_path=[] offset=112 null=(offset=0 mask=0) slot_idx=7 field_idx=-1), Slot(id=79 type=INT col_path=[] offset=440 null=(offset=469 mask=40) slot_idx=40 field_idx=-1), Slot(id=80 type=DOUBLE col_path=[] offset=304 null=(offset=468 mask=20) slot_idx=22 field_idx=-1), Slot(id=81 type=BIGINT col_path=[] offset=312 null=(offset=0 mask=0) slot_idx=23 field_idx=-1), Slot(id=82 type=STRING col_path=[] offset=128 null=(offset=0 mask=0) slot_idx=8 field_idx=-1), Slot(id=83 type=BIGINT col_path=[] offset=320 null=(offset=0 mask=0) slot_idx=24 field_idx=-1), Slot(id=84 type=STRING col_path=[] offset=144 null=(offset=0 mask=0) slot_idx=9 field_idx=-1), Slot(id=85 type=BIGINT col_path=[] offset=328 null=(offset=0 mask=0) slot_idx=25 field_idx=-1), Slot(id=86 type=STRING col_path=[] offset=160 null=(offset=0 mask=0) slot_idx=10 field_idx=-1), Slot(id=87 type=INT col_path=[] offset=444 null=(offset=469 mask=80) slot_idx=41 field_idx=-1), Slot(id=88 type=DOUBLE col_path=[] offset=336 null=(offset=468 mask=40) slot_idx=26 field_idx=-1), Slot(id=89 type=BIGINT col_path=[] offset=344 null=(offset=0 mask=0) slot_idx=27 field_idx=-1), Slot(id=90 type=STRING 
col_path=[] offset=176 null=(offset=0 mask=0) slot_idx=11 field_idx=-1), Slot(id=91 type=INT col_path=[] offset=448 null=(offset=470 mask=1) slot_idx=42 field_idx=-1), Slot(id=92 type=DOUBLE col_path=[] offset=352 null=(offset=468 mask=80) slot_idx=28 field_idx=-1), Slot(id=93 type=BIGINT col_path=[] offset=360 null=(offset=0 mask=0) slot_idx=29 field_idx=-1), Slot(id=94 type=STRING col_path=[] offset=192 null=(offset=0 mask=0) slot_idx=12 field_idx=-1), Slot(id=95 type=INT col_path=[] offset=452 null=(offset=470 mask=2) slot_idx=43 field_idx=-1), Slot(id=96 type=DOUBLE col_path=[] offset=368 null=(offset=469 mask=1) slot_idx=30 field_idx=-1), Slot(id=97 type=BIGINT col_path=[] offset=376 null=(offset=0 mask=0) slot_idx=31 field_idx=-1), Slot(id=98 type=STRING col_path=[] offset=208 null=(offset=0 mask=0) slot_idx=13 field_idx=-1), Slot(id=99 type=INT col_path=[] offset=456 null=(offset=470 mask=4) slot_idx=44 field_idx=-1), Slot(id=100 type=DOUBLE col_path=[] offset=384 null=(offset=469 mask=2) slot_idx=32 field_idx=-1), Slot(id=101 type=BIGINT col_path=[] offset=392 null=(offset=0 mask=0) slot_idx=33 field_idx=-1), Slot(id=102 type=STRING col_path=[] offset=224 null=(offset=0 mask=0) slot_idx=14 field_idx=-1), Slot(id=103 type=INT col_path=[] offset=460 null=(offset=470 mask=8) slot_idx=45 field_idx=-1), Slot(id=104 type=DOUBLE col_path=[] offset=400 null=(offset=469 mask=4) slot_idx=34 field_idx=-1), Slot(id=105 type=BIGINT col_path=[] offset=408 null=(offset=0 mask=0) slot_idx=35 field_idx=-1), Slot(id=106 type=STRING col_path=[] offset=240 null=(offset=0 mask=0) slot_idx=15 field_idx=-1), Slot(id=107 type=INT col_path=[] offset=464 null=(offset=470 mask=10) slot_idx=46 field_idx=-1), Slot(id=108 type=DOUBLE col_path=[] offset=416 null=(offset=469 mask=8) slot_idx=36 field_idx=-1), Slot(id=109 type=BIGINT col_path=[] offset=424 null=(offset=0 mask=0) slot_idx=37 field_idx=-1)] tuple_path=[]) Tuple(id=1 size=13647 slots=[Slot(id=16 type=STRING col_path=[] 
offset=13312 null=(offset=13644 mask=1) slot_idx=13 field_idx=-1), Slot(id=17 type=STRING col_path=[] offset=13328 null=(offset=13644 mask=2) slot_idx=14 field_idx=-1), Slot(id=18 type=STRING col_path=[] offset=13344 null=(offset=13644 mask=4) slot_idx=15 field_idx=-1), Slot(id=19 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=0 null=(offset=0 mask=0) slot_idx=0 field_idx=-1), Slot(id=20 type=INT col_path=[] offset=13608 null=(offset=13645 mask=10) slot_idx=38 field_idx=-1), Slot(id=21 type=FIXED_UDA_INTERMEDIATE(16) col_path=[] offset=13360 null=(offset=13644 mask=8) slot_idx=16 field_idx=-1), Slot(id=22 type=BIGINT col_path=[] offset=13504 null=(offset=0 mask=0) slot_idx=25 field_idx=-1), Slot(id=23 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=1024 null=(offset=0 mask=0) slot_idx=1 field_idx=-1), Slot(id=24 type=BIGINT col_path=[] offset=13512 null=(offset=0 mask=0) slot_idx=26 field_idx=-1), Slot(id=25 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=2048 null=(offset=0 mask=0) slot_idx=2 field_idx=-1), Slot(id=26 type=BIGINT col_path=[] offset=13520 null=(offset=0 mask=0) slot_idx=27 field_idx=-1), Slot(id=27 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=3072 null=(offset=0 mask=0) slot_idx=3 field_idx=-1), Slot(id=28 type=INT col_path=[] offset=13612 null=(offset=13645 mask=20) slot_idx=39 field_idx=-1), Slot(id=29 type=FIXED_UDA_INTERMEDIATE(16) col_path=[] offset=13376 null=(offset=13644 mask=10) slot_idx=17 field_idx=-1), Slot(id=30 type=BIGINT col_path=[] offset=13528 null=(offset=0 mask=0) slot_idx=28 field_idx=-1), Slot(id=31 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=4096 null=(offset=0 mask=0) slot_idx=4 field_idx=-1), Slot(id=32 type=INT col_path=[] offset=13616 null=(offset=13645 mask=40) slot_idx=40 field_idx=-1), Slot(id=33 type=FIXED_UDA_INTERMEDIATE(16) col_path=[] offset=13392 null=(offset=13644 mask=20) slot_idx=18 field_idx=-1), Slot(id=34 type=BIGINT col_path=[] offset=13536 null=(offset=0 mask=0) 
slot_idx=29 field_idx=-1), Slot(id=35 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=5120 null=(offset=0 mask=0) slot_idx=5 field_idx=-1), Slot(id=36 type=BIGINT col_path=[] offset=13544 null=(offset=0 mask=0) slot_idx=30 field_idx=-1), Slot(id=37 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=6144 null=(offset=0 mask=0) slot_idx=6 field_idx=-1), Slot(id=38 type=BIGINT col_path=[] offset=13552 null=(offset=0 mask=0) slot_idx=31 field_idx=-1), Slot(id=39 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=7168 null=(offset=0 mask=0) slot_idx=7 field_idx=-1), Slot(id=40 type=INT col_path=[] offset=13620 null=(offset=13645 mask=80) slot_idx=41 field_idx=-1), Slot(id=41 type=FIXED_UDA_INTERMEDIATE(16) col_path=[] offset=13408 null=(offset=13644 mask=40) slot_idx=19 field_idx=-1), Slot(id=42 type=BIGINT col_path=[] offset=13560 null=(offset=0 mask=0) slot_idx=32 field_idx=-1), Slot(id=43 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=8192 null=(offset=0 mask=0) slot_idx=8 field_idx=-1), Slot(id=44 type=INT col_path=[] offset=13624 null=(offset=13646 mask=1) slot_idx=42 field_idx=-1), Slot(id=45 type=FIXED_UDA_INTERMEDIATE(16) col_path=[] offset=13424 null=(offset=13644 mask=80) slot_idx=20 field_idx=-1), Slot(id=46 type=BIGINT col_path=[] offset=13568 null=(offset=0 mask=0) slot_idx=33 field_idx=-1), Slot(id=47 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=9216 null=(offset=0 mask=0) slot_idx=9 field_idx=-1), Slot(id=48 type=INT col_path=[] offset=13628 null=(offset=13646 mask=2) slot_idx=43 field_idx=-1), Slot(id=49 type=FIXED_UDA_INTERMEDIATE(16) col_path=[] offset=13440 null=(offset=13645 mask=1) slot_idx=21 field_idx=-1), Slot(id=50 type=BIGINT col_path=[] offset=13576 null=(offset=0 mask=0) slot_idx=34 field_idx=-1), Slot(id=51 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=10240 null=(offset=0 mask=0) slot_idx=10 field_idx=-1), Slot(id=52 type=INT col_path=[] offset=13632 null=(offset=13646 mask=4) slot_idx=44 field_idx=-1), 
Slot(id=53 type=FIXED_UDA_INTERMEDIATE(16) col_path=[] offset=13456 null=(offset=13645 mask=2) slot_idx=22 field_idx=-1), Slot(id=54 type=BIGINT col_path=[] offset=13584 null=(offset=0 mask=0) slot_idx=35 field_idx=-1), Slot(id=55 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=11264 null=(offset=0 mask=0) slot_idx=11 field_idx=-1), Slot(id=56 type=INT col_path=[] offset=13636 null=(offset=13646 mask=8) slot_idx=45 field_idx=-1), Slot(id=57 type=FIXED_UDA_INTERMEDIATE(16) col_path=[] offset=13472 null=(offset=13645 mask=4) slot_idx=23 field_idx=-1), Slot(id=58 type=BIGINT col_path=[] offset=13592 null=(offset=0 mask=0) slot_idx=36 field_idx=-1), Slot(id=59 type=FIXED_UDA_INTERMEDIATE(1024) col_path=[] offset=12288 null=(offset=0 mask=0) slot_idx=12 field_idx=-1), Slot(id=60 type=INT col_path=[] offset=13640 null=(offset=13646 mask=10) slot_idx=46 field_idx=-1), Slot(id=61 type=FIXED_UDA_INTERMEDIATE(16) col_path=[] offset=13488 null=(offset=13645 mask=8) slot_idx=24 field_idx=-1), Slot(id=62 type=BIGINT col_path=[] offset=13600 null=(offset=0 mask=0) slot_idx=37 field_idx=-1)] tuple_path=[]) Tuple(id=0 size=226 slots=[Slot(id=0 type=STRING col_path=[3] offset=0 null=(offset=224 mask=1) slot_idx=0 field_idx=-1), Slot(id=1 type=DOUBLE col_path=[4] offset=192 null=(offset=225 mask=10) slot_idx=12 field_idx=-1), Slot(id=2 type=DOUBLE col_path=[5] offset=200 null=(offset=225 mask=20) slot_idx=13 field_idx=-1), Slot(id=3 type=STRING col_path=[6] offset=16 null=(offset=224 mask=2) slot_idx=1 field_idx=-1), Slot(id=4 type=STRING col_path=[8] offset=32 null=(offset=224 mask=4) slot_idx=2 field_idx=-1), Slot(id=5 type=DOUBLE col_path=[9] offset=208 null=(offset=225 mask=40) slot_idx=14 field_idx=-1), Slot(id=6 type=DOUBLE col_path=[10] offset=216 null=(offset=225 mask=80) slot_idx=15 field_idx=-1), Slot(id=7 type=STRING col_path=[11] offset=48 null=(offset=224 mask=8) slot_idx=3 field_idx=-1), Slot(id=8 type=STRING col_path=[13] offset=64 null=(offset=224 mask=10) 
slot_idx=4 field_idx=-1), Slot(id=9 type=STRING col_path=[15] offset=80 null=(offset=224 mask=20) slot_idx=5 field_idx=-1), Slot(id=10 type=STRING col_path=[16] offset=96 null=(offset=224 mask=40) slot_idx=6 field_idx=-1), Slot(id=11 type=STRING col_path=[17] offset=112 null=(offset=224 mask=80) slot_idx=7 field_idx=-1), Slot(id=12 type=STRING col_path=[18] offset=128 null=(offset=225 mask=1) slot_idx=8 field_idx=-1), Slot(id=13 type=STRING col_path=[0] offset=144 null=(offset=225 mask=2) slot_idx=9 field_idx=-1), Slot(id=14 type=STRING col_path=[1] offset=160 null=(offset=225 mask=4) slot_idx=10 field_idx=-1), Slot(id=15 type=STRING col_path=[2] offset=176 null=(offset=225 mask=8) slot_idx=11 field_idx=-1)] tuple_path=[]) I0426 21:55:59.224853 11839 query-state.cc:377] Executing instance. instance_id=7747869aa15df350:3eb767890000004d fragment_idx=1 per_fragment_instance_idx=36 coord_state_idx=0 #in-flight=2 I0426 21:55:59.224995 11840 query-state.cc:377] Executing instance. instance_id=7747869aa15df350:3eb767890000004e fragment_idx=1 per_fragment_instance_idx=37 coord_state_idx=0 #in-flight=3 I0426 21:55:59.225100 11841 query-state.cc:377] Executing instance. instance_id=7747869aa15df350:3eb767890000004f fragment_idx=1 per_fragment_instance_idx=38 coord_state_idx=0 #in-flight=4 I0426 21:55:59.225247 11842 query-state.cc:377] Executing instance. instance_id=7747869aa15df350:3eb7678900000050 fragment_idx=1 per_fragment_instance_idx=39 coord_state_idx=0 #in-flight=5 I0426 21:55:59.225323 11843 query-state.cc:377] Executing instance. 
instance_id=7747869aa15df350:3eb7678900000025 fragment_idx=2 per_fragment_instance_idx=36 coord_state_idx=0 #in-fligWrote minidump to *** # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007f4d9f1c595b, pid=60280, tid=0x00007f49bca98700 # # JRE version: Java(TM) SE Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13) # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 compressed oops) # Problematic frame: # C [libc.so.6+0x8995b] memcpy+0x15b # # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again # # An error report file with more information is saved as: # *** # # If you would like to submit a bug report, please visit: # http://bugreport.java.com/bugreport/crash.jsp # ht=6 I0426 21:55:59.225462 11844 query-state.cc:377] Executing instance. instance_id=7747869aa15df350:3eb7678900000026 fragment_idx=2 per_fragment_instance_idx=37 coord_state_idx=0 #in-flight=7 I0426 21:55:59.226022 11845 query-state.cc:377] Executing instance. instance_id=7747869aa15df350:3eb7678900000027 fragment_idx=2 per_fragment_instance_idx=38 coord_state_idx=0 #in-flight=8 I0426 21:55:59.226142 11846 query-state.cc:377] Executing instance. instance_id=7747869aa15df350:3eb7678900000028 fragment_idx=2 per_fragment_instance_idx=39 coord_state_idx=0 #in-flight=9 I0426 21:55:59.276877 11838 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=7747869aa15df350:3eb7678900000000 refcnt=9 I0426 21:56:01.554430 11801 Frontend.java:948] Compiled query. I0426 21:56:01.555058 11445 Frontend.java:948] Compiled query. 
I0426 21:56:01.556988 11801 impala-server.cc:1506] Waiting for catalog version: 0 current version: 340853 I0426 21:56:01.561921 11801 impala-server.cc:981] Query f84b8846a78d9172:eb1a212100000000 has timeout of 30m I0426 21:56:01.562665 11801 impala-beeswax-server.cc:193] get_results_metadata(): query_id=f84b8846a78d9172:eb1a212100000000 I0426 21:56:01.563427 11801 impala-beeswax-server.cc:235] close(): query_id=f84b8846a78d9172:eb1a212100000000 I0426 21:56:01.563433 11801 impala-server.cc:992] UnregisterQuery(): query_id=f84b8846a78d9172:eb1a212100000000 I0426 21:56:01.563438 11801 impala-server.cc:1075] Cancel(): query_id=f84b8846a78d9172:eb1a212100000000 I0426 21:56:01.572669 11801 impala-server.cc:1796] Connection from client **.**.**.**:47836 closed, closing 1 associated session(s) I0426 21:56:01.604763 11445 impala-server.cc:1506] Waiting for catalog version: 0 current version: 340853 I0426 21:56:01.608330 11445 impala-server.cc:981] Query bc495dd9bfbdf2bb:9701f21000000000 has timeout of 30m I0426 21:56:01.609072 11445 impala-beeswax-server.cc:193] get_results_metadata(): query_id=bc495dd9bfbdf2bb:9701f21000000000 I0426 21:56:01.609866 11445 impala-beeswax-server.cc:235] close(): query_id=bc495dd9bfbdf2bb:9701f21000000000 I0426 21:56:01.609872 11445 impala-server.cc:992] UnregisterQuery(): query_id=bc495dd9bfbdf2bb:9701f21000000000 I0426 21:56:01.609877 11445 impala-server.cc:1075] Cancel(): query_id=bc495dd9bfbdf2bb:9701f21000000000 I0426 21:56:01.620052 11445 impala-server.cc:1796] Connection from client **.**.**.**:41424 closed, closing 1 associated session(s) I0426 21:56:01.741907 61515 data-stream-mgr.cc:228] Reduced stream ID cache from 100 items, to 96, eviction took: 0 I0426 21:56:01.746819 668 impala-server.cc:1171] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 5f4e007f54480601:93bc8a1900000000 I0426 21:56:01.747061 61124 impala-server.cc:1171] ReportExecStatus(): Received report for unknown query ID 
(probably closed or cancelled): ae479e297463e5ed:2babda700000000 I0426 21:56:02.088552 11846 query-state.cc:384] Instance completed. instance_id=7747869aa15df350:3eb7678900000028 #in-flight=8 status=OK I0426 21:56:02.088580 11846 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=7747869aa15df350:3eb7678900000000 refcnt=8 I0426 21:56:02.171540 11843 query-state.cc:384] Instance completed. instance_id=7747869aa15df350:3eb7678900000025 #in-flight=7 status=OK I0426 21:56:02.171561 11843 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=7747869aa15df350:3eb7678900000000 refcnt=7 I0426 21:56:02.190233 11844 query-state.cc:384] Instance completed. instance_id=7747869aa15df350:3eb7678900000026 #in-flight=6 status=OK I0426 21:56:02.190253 11844 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=7747869aa15df350:3eb7678900000000 refcnt=6 I0426 21:56:02.211194 11839 data-stream-mgr.cc:238] DeregisterRecvr(): fragment_instance_id=7747869aa15df350:3eb767890000004d, node=2 I0426 21:56:02.211228 11839 data-stream-recvr.cc:235] cancelled stream: fragment_instance_id_=7747869aa15df350:3eb767890000004d node_id=2 I0426 21:56:02.211513 11841 data-stream-mgr.cc:238] DeregisterRecvr(): fragment_instance_id=7747869aa15df350:3eb767890000004f, node=2 I0426 21:56:02.211539 11841 data-stream-recvr.cc:235] cancelled stream: fragment_instance_id_=7747869aa15df350:3eb767890000004f node_id=2 I0426 21:56:02.212172 11840 data-stream-mgr.cc:238] DeregisterRecvr(): fragment_instance_id=7747869aa15df350:3eb767890000004e, node=2 I0426 21:56:02.212190 11840 data-stream-recvr.cc:235] cancelled stream: fragment_instance_id_=7747869aa15df350:3eb767890000004e node_id=2 I0426 21:56:02.212222 11842 data-stream-mgr.cc:238] DeregisterRecvr(): fragment_instance_id=7747869aa15df350:3eb7678900000050, node=2 I0426 21:56:02.212245 11842 data-stream-recvr.cc:235] cancelled stream: fragment_instance_id_=7747869aa15df350:3eb7678900000050 node_id=2 I0426 21:56:02.214601 11845 query-st That isn't a cut/paste 
error, the INFO file literally stops halfway through the query-st string output.
I've had a look at the JIRA listed and, while it is related, the error message is very different.
The queries listed in that clip aren't the query that caused the crash. The only lines in INFO relating to the query that actually caused the crash are:
I0426 21:55:55.247063 10918 admission-controller.cc:510] Schedule for id=674ebfd8523e4b20:1918551d00000000 in pool_name=root.** cluster_mem_needed=24.00 MB PoolConfig: max_requests=50 max_queued=200 max_mem=-1.00 B
I0426 21:55:55.247102 10918 admission-controller.cc:531] Admitted query id=674ebfd8523e4b20:1918551d00000000
I0426 21:55:55.247117 10918 coordinator.cc:99] Exec() query_id=674ebfd8523e4b20:1918551d00000000 stmt=select c.* from db.table a LEFT JOIN a.nested c where a.partitionField = '***' limit 1
I0426 21:55:55.247274 10918 coordinator.cc:357] starting execution on 1 backends for query 674ebfd8523e4b20:1918551d00000000
I0426 21:55:55.248740 61547 impala-internal-service.cc:44] ExecQueryFInstances(): query_id=674ebfd8523e4b20:1918551d00000000
I0426 21:55:55.248770 61547 query-exec-mgr.cc:46] StartQueryFInstances() query_id=674ebfd8523e4b20:1918551d00000000 coord=dn.**.**.com:22000
I0426 21:55:55.248777 61547 query-state.cc:173] Buffer pool limit for 674ebfd8523e4b20:1918551d00000000: 109951162777
I0426 21:55:55.248823 61547 initial-reservations.cc:60] Successfully claimed initial reservations (0) for query 674ebfd8523e4b20:1918551d00000000
I0426 21:55:55.248982 11682 query-state.cc:286] StartFInstances(): query_id=674ebfd8523e4b20:1918551d00000000 #instances=1
I0426 21:55:55.249147 10918 coordinator.cc:370] started execution on 1 backends for query 674ebfd8523e4b20:1918551d00000000
I0426 21:55:55.249209 11682 query-state.cc:299] descriptor table for query=674ebfd8523e4b20:1918551d00000000
I0426 21:55:55.249514 11683 query-state.cc:377] Executing instance. instance_id=674ebfd8523e4b20:1918551d00000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=9
I0426 21:55:55.249629 11683 hdfs-scan-node.cc:160] Max row batch queue size for scan node '0' in fragment instance '674ebfd8523e4b20:1918551d00000000': 110
I0426 21:55:55.270134 11682 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=674ebfd8523e4b20:1918551d00000000 refcnt=3
I0426 21:55:55.270484 10918 impala-server.cc:981] Query 674ebfd8523e4b20:1918551d00000000 has timeout of 10m
I can get the DBA to supply the other files tomorrow.
Thanks
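For anyone wanting to pull out the lines for a single query from a busy INFO file, something like the following may help. It is a minimal sketch, not an Impala tool: the function name is hypothetical, the regular expression follows the "[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg" format stated in the log header, and the match is a plain substring search on the query ID. Fragment instance IDs that differ from the query ID in their trailing digits would not be matched.

```python
import re

# Matches the log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
LOG_LINE_RE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^:\s]+):(\d+)\] (.*)$"
)


def lines_for_query(log_text, query_id):
    """Return (time, source_file, message) for each log line mentioning query_id."""
    matches = []
    for line in log_text.splitlines():
        parsed = LOG_LINE_RE.match(line)
        if parsed and query_id in parsed.group(7):
            matches.append((parsed.group(3), parsed.group(5), parsed.group(7)))
    return matches


# Three lines taken from the logs in this thread.
SAMPLE_LOG = """\
I0426 21:55:55.247102 10918 admission-controller.cc:531] Admitted query id=674ebfd8523e4b20:1918551d00000000
I0426 21:55:59.223722 62106 query-exec-mgr.cc:46] StartQueryFInstances() query_id=7747869aa15df350:3eb7678900000000 coord=dn.**.**.com:22000
I0426 21:55:55.270484 10918 impala-server.cc:981] Query 674ebfd8523e4b20:1918551d00000000 has timeout of 10m
"""

for time, source, message in lines_for_query(SAMPLE_LOG, "674ebfd8523e4b20:1918551d00000000"):
    print(time, source, message)
```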
04-26-2018
07:37 AM
Since upgrading from 5.13 to 5.14 we have found that certain types of queries will consistently crash all Impala daemons in the cluster (at least all those which are running fragments of the queries in question).
These queries ran for more than six months as scheduled jobs on versions prior to 5.14, so it is not an issue with the query or table.
The queries that trigger the crash are accessing nested fields, of the format:
SELECT a.topLevelField, b.priceFromNestedField FROM db.table a LEFT JOIN a.nestedField b
The underlying tables are all Parquet, written by Spark. It happens on multiple tables and multiple partitions, so it is not a corrupt Parquet file. It doesn't affect nested fields in most other tables.
When the query fails, the logs show the following error:
tcmalloc: large alloc 1073741824 bytes == 0x7f62666e8000 @ 0x21c6f07 0x20483f2
Wrote minidump to /var/log/impala-minidumps/impalad/9fcf5df1-13c0-47bd-35ceaa92-72438843.dmp
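For scale, the allocation size in that tcmalloc message is exactly 1 GiB (2^30 bytes). A small sketch for converting the figure from such a line; the function name is hypothetical and the pattern is based only on the message shown here:

```python
import re

# Pattern based on the tcmalloc message quoted in this post.
TCMALLOC_RE = re.compile(r"tcmalloc: large alloc (\d+) bytes")


def large_alloc_gib(line):
    """Return the allocation size in GiB from a tcmalloc 'large alloc' line, or None."""
    match = TCMALLOC_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)) / 2**30


line = "tcmalloc: large alloc 1073741824 bytes == 0x7f62666e8000 @ 0x21c6f07 0x20483f2"
print(large_alloc_gib(line))  # 1.0
```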
Does anyone else have the same issue?
Thanks
Labels:
- Apache Impala
- Apache Spark