Member since 06-09-2020
5 Posts
0 Kudos Received
0 Solutions
06-10-2020
06:29 AM
@stevenmatison Thanks for the prompt reply. I'm using HDP-3.0.1.0 with Ambari. Here's my current Hive config:
Tez Container Size: 3072 MB
HiveServer2 Heap Size: 4096 MB
Memory: 819.2 MB
Data per Reducer: 2042.9 MB
These are mostly the default values. Do they make sense? Any suggestions on which to increase or decrease for optimum performance?
06-10-2020
03:30 AM
FYI, my table is partitioned by Year + Month + Day. The total file size in HDFS is 10 TB, and the table holds 21 billion records. We have 8 data nodes in HDP. I'm using HDP-3.0.1.0 with Ambari. Here's my current Hive config:
Tez Container Size: 3072 MB
HiveServer2 Heap Size: 4096 MB
Memory: 819.2 MB
Data per Reducer: 2042.9 MB
These are mostly the default values. Do they make sense? Any suggestions on which to increase or decrease for optimum performance?
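To put the figures in this post in perspective, here is a rough back-of-the-envelope sketch. It assumes things the post does not state: that 10 TB means 10 × 2^40 bytes, that this is the size before HDFS replication, and that the data is spread evenly over the 8 data nodes. The function names are only for illustration.

```python
# Rough sizing from the numbers quoted in the post.
# Assumptions (NOT stated in the post): 1 TB = 2**40 bytes, the size is
# pre-replication, and data is evenly spread across the data nodes.
TOTAL_BYTES = 10 * 2**40        # "Total file size in HDFS is 10TB"
TOTAL_RECORDS = 21_000_000_000  # "21 Billion records"
DATA_NODES = 8                  # "8 data nodes"


def avg_record_bytes(total_bytes: int, records: int) -> float:
    """Average on-disk (compressed ORC) bytes per record."""
    return total_bytes / records


def per_node_gib(total_bytes: int, nodes: int) -> float:
    """Data held per node in GiB, ignoring replication."""
    return total_bytes / nodes / 2**30


if __name__ == "__main__":
    print(f"avg record size: {avg_record_bytes(TOTAL_BYTES, TOTAL_RECORDS):.0f} bytes")
    print(f"data per node:   {per_node_gib(TOTAL_BYTES, DATA_NODES):.0f} GiB")
```

Under those assumptions, each node holds well over a terabyte before replication, so a query that cannot prune partitions has to consider files across the whole table.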
06-09-2020
08:53 AM
I have a Hive table in ORC format. I used Zeppelin to query the table with jdbc(hive):
SELECT `timestamp`, url FROM events WHERE id='f9e43fc7b' ORDER BY `timestamp` DESC
However, the query runs into the error below:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1591402457216_0009_2_01, diagnostics=[Vertex vertex_1591402457216_0009_2_01 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: evt initializer failed, vertex=vertex_1591402457216_0009_2_01 [Map 1], java.lang.OutOfMemoryError: Java heap space
at java.util.regex.Matcher.<init>(Matcher.java:225)
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at org.apache.hadoop.hive.ql.io.AcidUtils$BucketMetaData.parse(AcidUtils.java:318)
at org.apache.hadoop.hive.ql.io.AcidUtils$BucketMetaData.parse(AcidUtils.java:332)
at org.apache.hadoop.hive.ql.io.AcidUtils.parseBucketId(AcidUtils.java:367)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.determineSplitStrategy(OrcInputFormat.java:2331)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.determineSplitStrategies(OrcInputFormat.java:2306)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1811)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1939)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:522)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:777)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1591402457216_0009_2_02, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1591402457216_0009_2_02 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:401)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:266)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:718)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:801)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any idea regarding this error? Any resource configuration I should look into?
Labels: Apache Ambari, Apache Hive, Apache Tez
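The diagnostics line in the post above already names the failed vertex, the failure reason, and the exception. The frames under it (HiveSplitGenerator.initialize called from RootInputInitializerManager) suggest the heap ran out while input splits were being generated, before any Map 1 task started. A small sketch for pulling those fields out of such a message follows; the `summarize` helper and its regular expressions are illustrative only and not part of any Hive or Tez tooling.

```python
import re

# Abbreviated copy of the diagnostics text quoted in the post.
DIAGNOSTICS = (
    "Vertex failed, vertexName=Map 1, "
    "vertexId=vertex_1591402457216_0009_2_01, diagnostics=[Vertex "
    "vertex_1591402457216_0009_2_01 [Map 1] killed/failed due to:"
    "ROOT_INPUT_INIT_FAILURE, Vertex Input: evt initializer failed, "
    "vertex=vertex_1591402457216_0009_2_01 [Map 1], "
    "java.lang.OutOfMemoryError: Java heap space"
)

# Hypothetical pattern: a java.* class ending in Error/Exception, then its message.
_JAVA_ERROR = r"(java\.[\w.]+(?:Error|Exception)): ([^\n\]]+)"


def _first(pattern: str, text: str, group: int = 1):
    """Return the requested capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(group) if match else None


def summarize(diag: str) -> dict:
    """Extract the vertex name, failure reason, and Java error from a Tez message."""
    return {
        "vertex": _first(r"vertexName=([^,]+)", diag),
        "reason": _first(r"killed/failed due to:(\w+)", diag),
        "error": _first(_JAVA_ERROR, diag, 1),
        "message": _first(_JAVA_ERROR, diag, 2),
    }


if __name__ == "__main__":
    for key, value in summarize(DIAGNOSTICS).items():
        print(f"{key}: {value}")
```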
06-09-2020
08:42 AM
I have a Hive table in ORC format, which I tried to query via Beeline:
SELECT `timestamp`, url FROM events WHERE id='0ef3c9ba6cb5' ORDER BY `timestamp` DESC;
However, this simple query failed with:
INFO : Compiling command(queryId=hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac): select `timestamp`, url from events where partyid='0:3pu60uagp0:db698229-272e-4a1c-a18a-0ef3c9ba6cb5' order by `timestamp` desc
INFO : Warning: Map Join MAPJOIN[16][bigTable=?] in task 'Map 1' is a cross product
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:timestamp, type:bigint, comment:null), FieldSchema(name:url, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac); Time taken: 0.724 seconds
INFO : Executing command(queryId=hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac): select `timestamp`, url from events where partyid='0:3pu60uagp0:db698229-272e-4a1c-a18a-0ef3c9ba6cb5' order by `timestamp` desc
INFO : Query ID = hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: select `timestamp`, url fr...desc (Stage-1)
INFO : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO : Status: Running (Executing on YARN cluster with App id application_1586459578755_0105)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
VERTICES: 00/00 [>>--------------------------] 0% ELAPSED TIME: 9428.98 s
----------------------------------------------------------------------------------------------
ERROR : Status: Failed
ERROR : Application application_1586459578755_0105 failed 2 times due to ApplicationMaster for attempt appattempt_1586459578755_0105_000002 timed out. Failing the application.
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Application application_1586459578755_0105 failed 2 times due to ApplicationMaster for attempt appattempt_1586459578755_0105_000002 timed out. Failing the application.
INFO : Completed executing command(queryId=hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac); Time taken: 9433.73 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Application application_1586459578755_0105 failed 2 times due to ApplicationMaster for attempt appattempt_1586459578755_0105_000002 timed out. Failing the application. (state=08S01,code=2)
Any clue what this error is about? Other queries also take a long time, if they succeed at all. Any resource configuration I should look into?
Labels: Apache Ambari, Apache Hive, Apache Tez
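The Beeline log above reports its timings in raw seconds, and the progress line shows VERTICES: 00/00 the whole time, so it helps to see how long the session sat there before the ApplicationMaster timed out. A minimal sketch for reading the "Time taken" value from such a line follows; the helper names are made up for illustration.

```python
import re

# Log line copied from the post.
LOG_LINE = (
    "INFO : Completed executing command(queryId="
    "hive_20200605073915_22eb45aa-25f6-419a-9b55-57a0d98e3dac); "
    "Time taken: 9433.73 seconds"
)


def time_taken_seconds(line: str):
    """Return the 'Time taken' value as a float, or None if the line has none."""
    match = re.search(r"Time taken: ([\d.]+) seconds", line)
    return float(match.group(1)) if match else None


def as_hms(seconds: float) -> str:
    """Format a duration as hours, minutes, seconds (fraction truncated)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    print(as_hms(time_taken_seconds(LOG_LINE)))
```

For the line above this gives a little over two and a half hours with no vertex ever reported as started.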