Member since: 11-03-2014
Posts: 46
Kudos Received: 8
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6087 | 11-01-2016 11:02 PM
 | 2061 | 01-05-2016 07:27 PM
 | 3670 | 05-27-2015 09:15 PM
 | 1689 | 04-01-2015 11:39 PM
 | 3685 | 03-17-2015 09:31 PM
05-30-2016
03:20 AM
It is definitely not present. Actually, I forgot to mention that the Spark Streaming job killed itself after the FileNotFoundException. Where is the job's temp directory, and where is it configured?
05-29-2016
09:30 PM
CDH 5.5.1 installed with parcels, CentOS 6.7.

I have a Spark Streaming job which uses Phoenix (jar phoenix-1.2.0-client.jar). After the job had run for a few days, it tried to reload the jar and got a FileNotFoundException.

Command used to start the job:

nohup spark-submit --master yarn --deploy-mode client --class com.myCompany.MyStreamProc --driver-class-path /opt/mycompany/my-spark.jar:/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-1.2.0-client.jar:... --jars /opt/mycompany/my-spark.jar,/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-1.2.0-client.jar,... my-spark.jar

Log entries around the FileNotFoundException in the driver log:

[INFO] 2016-05-28 15:28:00,052 org.apache.spark.scheduler.TaskSetManager logInfo - Starting task 69.0 in stage 27723.0 (TID 1692793, node3.mycompany.com, partition 69,NODE_LOCAL, 2231 bytes)
[INFO] 2016-05-28 15:28:00,205 org.apache.spark.storage.BlockManagerInfo logInfo - Added input-0-1464420480000 in memory on node1.mycompany.com:47601 (size: 15.0 KB, free: 302.0 MB)
[INFO] 2016-05-28 15:28:00,213 org.apache.spark.storage.BlockManagerInfo logInfo - Added input-0-1464420480000 in memory on node2.mycompany.com:42510 (size: 15.0 KB, free: 308.7 MB)
[INFO] 2016-05-28 15:28:00,351 org.apache.spark.scheduler.TaskSetManager logInfo - Starting task 70.0 in stage 27723.0 (TID 1692794, node2.mycompany.com, partition 70,NODE_LOCAL, 2231 bytes)
[WARN] 2016-05-28 15:28:00,391 org.apache.spark.scheduler.TaskSetManager logWarning - Lost task 69.0 in stage 27723.0 (TID 1692793, node2.mycompany.com): java.io.FileNotFoundException: http://192.168.88.28:55310/jars/phoenix-1.2.0-client.jar
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:556)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:356)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:397)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

(Note: node3.mycompany.com = 192.168.88.28)

According to the executor logs, when the job started (2016-05-09) the executors downloaded http://192.168.88.28:55310/jars/phoenix-1.2.0-client.jar successfully. It seems that Spark somehow wants to reload the jar, but the file is now missing. Any suggestions? Is the job simply running too long (nearly 20 days already)?
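As an editorial aside on the temp-directory question above, here is a minimal sketch (an assumption based on Spark 1.x / CDH 5.5 defaults, not part of the original post) for checking which base directory the driver-side HTTP file server would use for jars passed via --jars:

// Hedged sketch: in Spark 1.x the driver's HTTP file server keeps its copies of
// --jars under a temp dir derived from spark.local.dir, falling back to
// java.io.tmpdir (typically /tmp).
val localDir = sc.getConf
  .getOption("spark.local.dir")
  .getOrElse(System.getProperty("java.io.tmpdir"))
println(s"Driver local/temp base dir: $localDir")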
Labels:
- Apache Phoenix
04-27-2016
06:22 PM
That's it. Thanks.
04-27-2016
03:19 AM
CDH 5.2.0, CentOS 6.4.

The skeleton of decision_tree.scala is like this:
val raw_data = sqlContext.parquetFile("/path/to/raw/data/")
raw_data.registerTempTable("raw_data")
val raw_rdd = sqlContext.sql("select ... from raw_data where rec_type=3")
val filtered_rdd = raw_rdd.map{case Row(label: Integer, ...) =>
  LabeledPoint(label, Vectors.dense(...)) }
val splits = filtered_rdd.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))
val numClasses = 2
val categoricalFeaturesInfo = Map[Int, Int](0 -> 20, 1 -> 30)
val impurity = "gini"
val maxDepth = 12
val maxBins = 32
val model = DecisionTree.trainClassifier(trainingData, numClasses,
categoricalFeaturesInfo, impurity, maxDepth, maxBins)
...

When I invoke spark-shell with the command

$ spark-shell --executor-memory 2g --driver-memory 2g -deprecation -i decision_tree.scala

the job fails with the following error, even though maxBins was set to 32:

java.lang.IllegalArgumentException: requirement failed: maxBins (= 4) should be greater than max categories in categorical features (>= 20)
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$$anonfun$buildMetadata$2.apply(DecisionTreeMetadata.scala:91)
at org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$$anonfun$buildMetadata$2.apply(DecisionTreeMetadata.scala:90)
at scala.collection.immutable.Map$Map4.foreach(Map.scala:181)
at org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:90)
at org.apache.spark.mllib.tree.DecisionTree.train(DecisionTree.scala:66)
at org.apache.spark.mllib.tree.DecisionTree$.train(DecisionTree.scala:339)
at org.apache.spark.mllib.tree.DecisionTree$.trainClassifier(DecisionTree.scala:368)
at $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcVI$sp(<console>:124)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC$$iwC.<init>(<console>:160)
at $iwC$$iwC.<init>(<console>:162)
at $iwC.<init>(<console>:164)
at <init>(<console>:166)
at .<init>(<console>:170)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:846)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1119)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:672)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:703)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:667)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:819)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:864)
... (long chain of reallyInterpret$1 and interpretStartingWith)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:776)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:619)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:627)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:632)
at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:642)
at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:639)
at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:104)
at scala.reflect.io.File.applyReader(File.scala:82)
at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkILoop.scala:639)
at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:639)
at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:639)
at org.apache.spark.repl.SparkILoop.savingReplayStack(SparkILoop.scala:153)
at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1.apply$mcV$sp(SparkILoop.scala:638)
at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1.apply(SparkILoop.scala:638)
at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1.apply(SparkILoop.scala:638)
at org.apache.spark.repl.SparkILoop.savingReader(SparkILoop.scala:158)
at org.apache.spark.repl.SparkILoop.interpretAllFrom(SparkILoop.scala:637)
at org.apache.spark.repl.SparkILoop$$anonfun$loadCommand$1.apply(SparkILoop.scala:702)
at org.apache.spark.repl.SparkILoop$$anonfun$loadCommand$1.apply(SparkILoop.scala:701)
at org.apache.spark.repl.SparkILoop.withFile(SparkILoop.scala:695)
at org.apache.spark.repl.SparkILoop.loadCommand(SparkILoop.scala:701)
at org.apache.spark.repl.SparkILoop$$anonfun$standardCommands$7.apply(SparkILoop.scala:311)
at org.apache.spark.repl.SparkILoop$$anonfun$standardCommands$7.apply(SparkILoop.scala:311)
at scala.tools.nsc.interpreter.LoopCommands$LineCmd.apply(LoopCommands.scala:81)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
at org.apache.spark.repl.SparkILoop$$anonfun$loadFiles$1.apply(SparkILoop.scala:872)
at org.apache.spark.repl.SparkILoop$$anonfun$loadFiles$1.apply(SparkILoop.scala:870)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:870)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:957)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:907)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:907)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:907)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1002)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:331)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

If the filter criterion (rec_type=3) is removed from raw_rdd, the job runs to completion. Any idea?
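As an editorial aside (not from the original post), here is a hedged guard, reusing the skeleton's own variable names, that makes the requirement from the error message explicit: the maxBins value that reaches trainClassifier must be at least the largest categorical arity.

// Hedged sketch: categoricalFeaturesInfo maps feature index -> number of
// categories (20 and 30 above), so the effective maxBins must be >= 30.
val maxCategories = categoricalFeaturesInfo.values.max
val safeMaxBins = math.max(maxBins, maxCategories)
val model = DecisionTree.trainClassifier(trainingData, numClasses,
  categoricalFeaturesInfo, impurity, maxDepth, safeMaxBins)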
Labels:
- Apache YARN
01-05-2016
07:27 PM
Replying to myself: I worked around this with a sink group and a Null Sink. Relevant settings in flume.conf:

a1.sinks = hdfssink avrosink nullsink
a1.sinkgroups = avrosinkgroup
a1.sinkgroups.avrosinkgroup.sinks = avrosink nullsink
a1.sinkgroups.avrosinkgroup.processor.type = failover
a1.sinkgroups.avrosinkgroup.processor.priority.avrosink = 100
a1.sinkgroups.avrosinkgroup.processor.priority.nullsink = 10
a1.sinks.nullsink.type = null
a1.sinks.nullsink.channel = avrochannel
a1.sinks.nullsink.batchsize = 10000

The end result is that avrochannel uses the high-priority avrosink (priority=100) under normal conditions. If that sink fails, the processor fails over to the low-priority nullsink, which simply discards the events.

PS: I upgraded to CDH 5.5.1, which bundles Flume 1.6. This works with the Spark Streaming "Flume-style Push-based Approach" (sink type=avro), but not with the "Pull-based Approach using a Custom Sink" (sink type=org.apache.spark.streaming.flume.sink.SparkSink). I guess the custom sink refuses to admit failure because of its fault-tolerance guarantees. Reference: http://spark.apache.org/docs/latest/streaming-flume-integration.html
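For context, a minimal sketch of the two receiver styles mentioned above (assuming spark-streaming-flume on the classpath, an existing StreamingContext ssc, and placeholder host/port values):

import org.apache.spark.streaming.flume.FlumeUtils

// Push-based: Flume's avro sink pushes events into this Spark receiver;
// this is the style the null-sink failover above works with.
val pushStream = FlumeUtils.createStream(ssc, "spark-receiver-host", 41414)

// Pull-based: Spark polls Flume's SparkSink, which holds events transactionally,
// so that sink will not simply drop events while the job is down.
val pullStream = FlumeUtils.createPollingStream(ssc, "flume-agent-host", 41414)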
10-04-2015
09:19 PM
CDH 5.2 installed with Cloudera Manager and parcels.

Are Flume channels isolated from each other? It seems that when I have a problem with one channel, another channel is affected.

I want to record and process syslog data with Flume using two channel+sink pairs (the channel selector is replicating); a configuration sketch follows below:
- Memory Channel + HDFS Sink (hdfschannel + hdfssink), to write the raw syslog records to HDFS.
- Optional File Channel + Avro Sink (avrochannel + avrosink), to send the syslog records to Spark Streaming for further processing. Since the processing can be reproduced from the raw data, the Avro channel is optional.

When Spark Streaming is running, the above works well: data is handled correctly by both sinks. However, when the Spark Streaming job hung or stopped, the avrochannel got network-related exceptions and ChannelFullException. That is understandable, because the events could not be sent. The problem was that the amount of raw data logged by hdfschannel + hdfssink dropped to around 1-2% of normal. Is this expected? I don't understand why errors on an optional channel affect the others. (Note: the use of a File Channel was historical, but that does not seem to be the cause of the behaviour anyway.)
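A hedged flume.conf sketch of the layout described above (the agent and source names are assumptions; the point is the replicating fan-out and the optional-channel marking):

# Assumed agent "a1" with a syslog source replicated into both channels.
a1.sources = syslogsrc
a1.channels = hdfschannel avrochannel
a1.sinks = hdfssink avrosink

a1.sources.syslogsrc.channels = hdfschannel avrochannel
a1.sources.syslogsrc.selector.type = replicating
# Listing avrochannel as optional lets the source keep writing to hdfschannel
# even when puts into avrochannel fail (e.g. ChannelFullException).
a1.sources.syslogsrc.selector.optional = avrochannel

a1.channels.hdfschannel.type = memory
a1.channels.avrochannel.type = file

a1.sinks.hdfssink.type = hdfs
a1.sinks.hdfssink.channel = hdfschannel
a1.sinks.avrosink.type = avro
a1.sinks.avrosink.channel = avrochannel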
Labels:
- Apache Flume
06-25-2015
12:41 AM
It makes sense and provides a lot of insight into how Impala operates. Actually, I had always thought Impala would materialize inline views / CTEs before moving on to the next step. Thanks for the clarification! I would suggest that Cloudera consider discussing this in the performance guide (if it is not there already), or optimizing the materialization of intermediate data.
06-23-2015
12:03 AM
3-node cluster, CDH 5.2.0 and 5.4.2.

I have an interesting finding: a query with an ORDER BY clause runs much faster than the same query without the ORDER BY clause (4m17s vs 37m4s). The query has no selection criteria, but uses a complicated SQL expression with concat(), many regexps, and extract() to format a timestamp into a string of a specific format, essentially testing an import job by rebuilding the original data.

Create table statement:

create table impala_timestp_txt (
batch_id string,
setup_time_str string,
setup_time_ts timestamp)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'
stored as textfile;

Some sample data:

NCWEUI320150301.05;00:06:57.490 HKT Sun May 11 2008;2008-05-11 00:06:57.490000000
NCWEUI320150301.05;00:34:15.141 HKT Sun May 11 2008;2008-05-11 00:34:15.141000000
NCWEUI320150301.05;01:05:46.346 HKT Sun May 11 2008;2008-05-11 01:05:46.346000000
NCWEUI320150301.05;01:06:49.306 HKT Sun May 11 2008;2008-05-11 01:06:49.306000000
NCWEUI320150301.05;01:07:50.340 HKT Sun May 11 2008;2008-05-11 01:07:50.340000000
NCWEUI320150301.05;01:08:46.359 HKT Sun May 11 2008;2008-05-11 01:08:46.359000000
NCWEUI320150301.05;01:09:50.341 HKT Sun May 11 2008;2008-05-11 01:09:50.341000000
NCWEUI320150301.05;01:25:45.978 HKT Sun May 11 2008;2008-05-11 01:25:45.978000000
NCWEUI320150301.05;01:26:49.958 HKT Sun May 11 2008;2008-05-11 01:26:49.958000000
NCWEUI320150301.05;01:27:47.975 HKT Sun May 11 2008;2008-05-11 01:27:47.975000000

Number of rows: 3.67M
Partitioned: No (similar result if partitioned)
Rows per BATCH_ID: fluctuates between 100 and 100,000, mostly around 5,000

The following SQL was run:

select batch_id, concat(setup_time_prefix,
lpad(cast(hour (setup_time) as string), 2, '0'), ':',
lpad(cast(minute (setup_time) as string), 2, '0'), ':',
lpad(cast(second (setup_time) as string), 2, '0'), '.',
lpad(cast(extract(millisecond from setup_time) as string), 3, '0'), ' ',
setup_time_tz, ' ',
case dayofweek(v.setup_time)
when 1 then 'Sun ' when 2 then 'Mon ' when 3 then 'Tue ' when 4 then 'Wed '
when 5 then 'Thu ' when 6 then 'Fri ' when 7 then 'Sat '
end,
case month(v.setup_time)
when 1 then 'Jan ' when 2 then 'Feb ' when 3 then 'Mar ' when 4 then 'Apr '
when 5 then 'May ' when 6 then 'Jun ' when 7 then 'Jul ' when 8 then 'Aug '
when 9 then 'Sep ' when 10 then 'Oct ' when 11 then 'Nov ' when 12 then 'Dec '
end,
cast(day(v.setup_time) as string), ' ',
cast(year(v.setup_time) as string)
)
from ( select batch_id,
case when regexp_extract(setup_time_str, '(.*) (.*) (.* .* .* .*)', 2)='UTC' then
hours_add(setup_time_ts, -8) else setup_time_ts end setup_time,
regexp_extract(v.setup_time_str, '^([\*\.])', 1) setup_time_prefix,
regexp_extract(v.setup_time_str, '(.*) (.*) (.* .* .* .*)', 2) setup_time_tz
from impala_timestp_txt v
) v
order by batch_id /* Comment out for not sorted version */;

(Please ignore problems with the data structure or the time zone usage. However, suggestions on improving the formatting are appreciated.)

I generated the execution summary and profile for both runs. It seems that without the ORDER BY clause, all Impala daemons send raw data to the coordinator, so the coordinator has to do all the formatting. However, with the ORDER BY clause, besides sorting the data, the daemons also do the formatting, so the complicated work is distributed among all daemons. (But this cannot explain the over-9x difference in performance on a 3-node cluster.)

I can provide the complete data, or more sample data, to reproduce the case, but I think it is easy to generate similar data and reproduce it. The result was basically the same with the data files in Parquet.

Sorted query execution plan and execution summary (note that according to the summary, the SQL took only a few seconds):
F01:PLAN FRAGMENT [UNPARTITIONED]
02:MERGING-EXCHANGE [UNPARTITIONED]
order by: batch_id ASC
hosts=3 per-host-mem=unavailable
tuple-ids=2 row-size=63B cardinality=unavailable
F00:PLAN FRAGMENT [RANDOM]
DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]
01:SORT
| order by: batch_id ASC
| hosts=3 per-host-mem=0B
| tuple-ids=2 row-size=63B cardinality=unavailable
|
00:SCAN HDFS [anthony.impala_timestp_txt v, RANDOM]
partitions=1/1 files=1 size=282.02MB
table stats: unavailable
column stats: unavailable
hosts=3 per-host-mem=176.00MB
tuple-ids=0 row-size=46B cardinality=unavailable
+---------------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------------+
| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+---------------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------------+
| 02:MERGING-EXCHANGE | 1 | 753.03ms | 753.03ms | 3.67M | -1 | 0 B | -1 B | UNPARTITIONED |
| 01:SORT | 3 | 88.50s | 147.82s | 3.67M | -1 | 176.32 MB | 0 B | |
| 00:SCAN HDFS | 3 | 1.18s | 1.85s | 3.67M | -1 | 34.74 MB | 176.00 MB | impala_timestp_txt v |
+---------------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------------+

Unsorted query execution plan and execution summary:

F01:PLAN FRAGMENT [UNPARTITIONED]
01:EXCHANGE [UNPARTITIONED]
hosts=3 per-host-mem=unavailable
tuple-ids=0 row-size=46B cardinality=unavailable
F00:PLAN FRAGMENT [RANDOM]
DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED]
00:SCAN HDFS [anthony.impala_timestp_txt v, RANDOM]
partitions=1/1 files=1 size=282.02MB
table stats: unavailable
column stats: unavailable
hosts=3 per-host-mem=176.00MB
tuple-ids=0 row-size=46B cardinality=unavailable
+--------------+--------+----------+----------+-------+------------+----------+---------------+------------------------------+
| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+--------------+--------+----------+----------+-------+------------+----------+---------------+------------------------------+
| 01:EXCHANGE | 1 | 7.41s | 7.41s | 3.67M | -1 | 0 B | -1 B | UNPARTITIONED |
| 00:SCAN HDFS | 3 | 566.29ms | 1.12s | 3.67M | -1 | 34.68 MB | 176.00 MB | impala_timestp_txt v |
+--------------+--------+----------+----------+-------+------------+----------+---------------+------------------------------+

Profile values with significant differences:

Sorted query:

Query Runtime Profile:
Query (id=4446304fc3a13547:10fea8c6a4752287):
Summary:
Session ID: 9d4999d13980722c:981b5c7d788c3db2
Session Type: BEESWAX
Start Time: 2015-06-15 10:14:13.038578000
End Time: 2015-06-15 10:18:30.939273000
...
Query Timeline: 4m17s
- Start execution: 52.752us (52.752us)
- Planning finished: 200.34ms (199.981ms)
- Ready to start remote fragments: 201.467ms (1.432ms)
- Remote fragments started: 648.550ms (447.83ms)
- Rows available: 2m31s (2m30s)
- First row fetched: 2m31s (478.400ms)
- Unregister query: 4m17s (1m45s)
ImpalaServer:
- ClientFetchWaitTimer: 1m9s
- RowMaterializationTimer: 36s073ms
Execution Profile 4446304fc3a13547:10fea8c6a4752287:(Total: 2m32s, non-child: 0ns, % non-child: 0.00%)
...
Coordinator Fragment F01:(Total: 2m31s, non-child: 10.567ms, % non-child: 0.01%)
...
- TotalCpuTime: 4m17s
- TotalNetworkReceiveTime: 0ns
...
Averaged Fragment F00:(Total: 3m52s, non-child: 0ns, % non-child: 0.00%)
...
- TotalCpuTime: 2m55s
- TotalNetworkReceiveTime: 0ns
- TotalNetworkSendTime: 2m22s

Unsorted query:

Query Runtime Profile:
Query (id=794de9ed5ccc55ac:ab1391c885dbf2b9):
Summary:
Session ID: e14d21c04cf7dd00:bd027a1ab2b142b9
Session Type: BEESWAX
Start Time: 2015-06-15 11:21:09.472389000
End Time: 2015-06-15 11:58:13.872735000
...
Query Timeline: 37m4s
- Start execution: 81.543us (81.543us)
- Planning finished: 332.875ms (332.793ms)
- Ready to start remote fragments: 339.182ms (6.307ms)
- Remote fragments started: 1s386ms (1s047ms)
- Rows available: 1s388ms (2.486ms)
- First row fetched: 1s489ms (100.985ms)
- Unregister query: 37m4s (37m2s)
ImpalaServer:
- ClientFetchWaitTimer: 1m26s
- RowMaterializationTimer: 35m25s
Execution Profile 794de9ed5ccc55ac:ab1391c885dbf2b9:(Total: 8s577ms, non-child: 0ns, % non-child: 0.00%)
...
Coordinator Fragment F01:(Total: 7s439ms, non-child: 25.752ms, % non-child: 0.35%)
...
- TotalCpuTime: 36m56s
- TotalNetworkReceiveTime: 7s277ms
...
Averaged Fragment F00:(Total: 22m46s, non-child: 0ns, % non-child: 0.00%)
...
- TotalCpuTime: 21m59s
- TotalNetworkReceiveTime: 0ns
- TotalNetworkSendTime: 22m45s
Labels:
05-27-2015
09:15 PM
Resolved by upgrading coreutils to 8.4-37.el6 (the original CentOS 6.5 version was 8.4-31.el6).
05-27-2015
08:36 PM
1 Kudo
Thanks for the reply. It seems this is a general Linux issue when killing an "su" process (which is where the agent runs), not something caused by CM. I will update here when I have a meaningful finding.