Member since: 11-20-2018
Posts: 19
Kudos Received: 0
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
| 16099 | 01-20-2019 02:59 AM
| 1295 | 01-16-2019 09:07 PM
| 10385 | 01-16-2019 04:49 PM
| 4407 | 01-11-2019 04:06 AM
| 1881 | 11-20-2018 05:11 PM
01-20-2019
02:59 AM
After some trial and error, I was able to resolve this issue by removing the JSON SerDe from the ORC table.
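For reference, a minimal sketch of what the corrected ORC table definition could look like (assuming the same columns and location as the TestJson1ORC table from the original question). ORC stores its own binary format, so the table can rely on the built-in OrcSerde via STORED AS ORC instead of declaring the JSON SerDe:

```sql
-- Sketch of the fix: define the ORC table WITHOUT "ROW FORMAT SERDE ... JsonSerDe";
-- Hive then uses OrcSerde implicitly through STORED AS ORC.
CREATE EXTERNAL TABLE IF NOT EXISTS TestJson1ORC (
  jsondata array<struct<id:int,nm:varchar(30),cty:varchar(30),hse:varchar(30),yrs:varchar(20)>>
)
STORED AS ORC
LOCATION '/data/3rdPartyData/Hive/TestJson1ORC';

-- The JSON SerDe stays only on the TEXTFILE source table; the insert is unchanged:
INSERT OVERWRITE TABLE TestJson1ORC SELECT * FROM TestJson1;
```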
01-18-2019
09:25 PM
I am getting the below error when loading JSON data from a Hive table of type TEXTFILE into an ORC-type Hive table. Please share your thoughts on how to overcome this issue. Thanks.

Input sample file TestFile1.json:

{"jsondata":[{ "id":1, "m":"Edward the Elder", "cty":"Uited Kigdom", "hse":"House of Wessex", "yrs":"899925" },{ "id":2, "m":"Edward the Elder", "cty":"Uited Kigdom", "hse":"House of Wessex", "yrs":"899925" }]}

Cluster version: HDP 2.6.5.0-292

TEXTFILE Hive table schema:

CREATE EXTERNAL TABLE IF NOT EXISTS TestJson1 (jsondata array<struct<id:int,nm:varchar(30),cty:varchar(30),hse:varchar(30),yrs:varchar(20)>>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Location '/data/3rdPartyData/Hive/TestJson1';

Load the JSON into the above table:

load data local inpath '/data/Hive/TestFile1.json' overwrite into table TestJson1;

ORC Hive table schema:

CREATE EXTERNAL TABLE IF NOT EXISTS TestJson1ORC (jsondata array<struct<id:int,nm:varchar(30),cty:varchar(30),hse:varchar(30),yrs:varchar(20)>>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED as ORC
Location '/data/3rdPartyData/Hive/TestJson1ORC';

Insert statement:

INSERT OVERWRITE TABLE TestJson1ORC SELECT * FROM TestJson1;

I get the below error while executing the above statement.

Error log:
<small> java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1547835302497_0003_1_00, diagnostics=[Task failed, taskId=task_1547835302497_0003_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"jsondata":[{"id":1,"nm":null,"cty":"Uited Kigdom","hse":"House of Wessex","yrs":"899925"},{"id":2,"nm":null,"cty":"Uited Kigdom","hse":"House of Wessex","yrs":"899925"}]}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"jsondata":[{"id":1,"nm":null,"cty":"Uited Kigdom","hse":"House of Wessex","yrs":"899925"},{"id":2,"nm":null,"cty":"Uited Kigdom","hse":"House of Wessex","yrs":"899925"}]}
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:96)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:73)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"jsondata":[{"id":1,"nm":null,"cty":"Uited Kigdom","hse":"House of Wessex","yrs":"899925"},{"id":2,"nm":null,"cty":"Uited Kigdom","hse":"House of Wessex","yrs":"899925"}]}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:88)
... 17 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:81)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:763)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
... 18 more
], TaskAttempt 1, TaskAttempt 2, and TaskAttempt 3 failed with the identical stack trace (Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1547835302497_0003_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
</small>
Labels:
- Apache Hive
01-18-2019
08:53 PM
As per my analysis, I understand that org.apache.hive.hcatalog.data.JsonSerDe doesn't support JSON that starts with a square bracket "[", i.e. a top-level array such as:

[{ "id":1, "nm":"Edward the Elder", "cty":"United Kingdom", "hse":"House of Wessex", "yrs":"899925" }, { "id":2, "nm":"Edward the Elder", "cty":"United Kingdom", "hse":"House of Wessex", "yrs":"899925" }]
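If rewriting the files turns out to be acceptable after all, one hedged option (assuming python3 is available on the edge node; the file paths and sample records below are purely illustrative) is to wrap each top-level array into the single-line {"jsondata": [...]} object form before loading it into Hive:

```shell
# Illustrative workaround: wrap a top-level JSON array into a single-line
# {"jsondata": [...]} object that the hcatalog JsonSerDe can parse.
# Create a small sample input file (stands in for a real source file):
printf '[{"id":1,"nm":"Edward the Elder"},{"id":2,"nm":"Athelstan"}]' > /tmp/TestFile.json

python3 -c '
import json
with open("/tmp/TestFile.json") as f:
    data = json.load(f)                      # parse the top-level JSON array
with open("/tmp/TestFile_wrapped.json", "w") as f:
    json.dump({"jsondata": data}, f)         # write one object on one line
'

cat /tmp/TestFile_wrapped.json
```

The wrapped file then matches a table schema of the form `jsondata array<struct<...>>`, as used elsewhere in this thread.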
01-16-2019
09:07 PM
Since this was odd behavior, I rechecked all the memory usage and configuration and noticed that the Tez memory was set to the YARN maximum. After reducing the Tez memory, the issue was fixed.
01-16-2019
05:42 PM
I have noticed that it is not only the ORC table; the same happens for a normal table. It occurs when loading data from another table, whereas loading data from a source file into a table works fine.
01-16-2019
04:57 PM
Issue: loading data into a Hive ORC table never completes; I have to kill the load process manually. I am trying to load data into an ORC Hive table from another Hive TEXTFILE table. Since the source files are TXT/JSON, I load the data into the TEXT table first and then into the ORC table.

Cluster: HDP 2.6.5-292
Hive version: 1.2.1000.2.6.5.0-292

Here is the Hive TEXTFILE table schema:

Create external table if not exists TEXTTable(ID bigint, DOCUMENT_ID bigint, NUM varchar(20), SUBMITTER_ID bigint, FILING string, CODE varchar(10), RECEIPTNUM varchar(20))
row format delimited fields terminated by '|'
Location '/data/3rdPartyData/Hive/TEXTTable'
TBLPROPERTIES ('skip.header.line.count'='1');

Load data into the TEXTFILE table:

load data local inpath '/data/TextFile.txt' overwrite into table TEXTTable;

Here is the Hive ORC table schema:

Create external table if not exists ORCTable(ID bigint, DOCUMENT_ID bigint, NUM varchar(20), SUBMITTER_ID bigint, FILING TIMESTAMP, CODE varchar(10), RECEIPTNUM varchar(20))
row format delimited fields terminated by '|'
STORED as ORC
Location '/data/3rdPartyData/Hive/ORCTable'
TBLPROPERTIES ('orc.compress'='SNAPPY');

Load data into the ORC table:

Insert overwrite table ORCTable
select ID, DOCUMENT_ID, NUM, SUBMITTER_ID, from_unixtime(unix_timestamp(FILING, "yyyy-MM-dd'T'HH:mm:ss")) as FILING, CODE, RECEIPTNUM
from TEXTTable;
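One observation worth noting (an observation, not a confirmed fix for the hang): `row format delimited fields terminated by '|'` describes a text layout, so combining it with STORED AS ORC is at best ignored. A cleaner sketch of the ORC table, assuming the same columns and location as above:

```sql
-- Sketch: ORC table without the text-oriented ROW FORMAT clause
Create external table if not exists ORCTable (
  ID bigint, DOCUMENT_ID bigint, NUM varchar(20), SUBMITTER_ID bigint,
  FILING TIMESTAMP, CODE varchar(10), RECEIPTNUM varchar(20)
)
STORED as ORC
Location '/data/3rdPartyData/Hive/ORCTable'
TBLPROPERTIES ('orc.compress'='SNAPPY');
```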
Labels:
- Apache Hive
01-16-2019
04:52 PM
@atrivedi Thank you. Do you mean starting with a square bracket "[" instead of a curly bracket "{"?
01-16-2019
04:49 PM
Thank you @jbarnett. I have worked with a similar JSON structure before using Spark, but I am now checking the possibility of ingesting the data using only shell scripts and Hive scripts.
01-15-2019
12:02 AM
Thank you @jbarnett. Yes, after correcting the schema the simple JSON format worked for me as well, but the other format, where the JSON starts with a square bracket, requires a tweak to make it work. We receive hundreds of files, with thousands of array elements in each file, in the format below:

[{ "id":1, "nm":"Edward the Elder", "cty":"United Kingdom", "hse":"House of Wessex", "yrs":"899925" }, { "id":2, "nm":"Edward the Elder", "cty":"United Kingdom", "hse":"House of Wessex", "yrs":"899925" }]

However, the JSON SerDe only supports single-line JSON, meaning each JSON document has to be on its own line; otherwise the array of JSON objects has to be in the updated format below:

{ "jsondata":[{ "id":1, "nm":"Edward the Elder", "cty":"United Kingdom", "hse":"House of Wessex", "yrs":"899925" }] }

And the schema:

CREATE EXTERNAL TABLE IF NOT EXISTS TestJson1 (jsondata array<struct<id:int,nm:varchar(30),cty:varchar(30),hse:varchar(30),yrs:varchar(20)>>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Location '/data/3rdPartyData/Hive';

Any thoughts on how to manage the SerDe/schema without updating the source files?
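Once the data is in the wrapped `{"jsondata":[...]}` form, the array elements can be queried individually with LATERAL VIEW. A sketch against the TestJson1 table defined in this thread (column names as declared in that schema):

```sql
-- Flatten the jsondata array: one output row per array element
SELECT elem.id, elem.nm, elem.cty, elem.hse, elem.yrs
FROM TestJson1
LATERAL VIEW explode(jsondata) j AS elem;
```

This keeps the table schema as a single array column while still exposing the nested fields as ordinary columns at query time.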
01-12-2019
04:08 AM
@jbarnett Thank you! Please find the requested details below.

Hive schema:

CREATE TABLE IF NOT EXISTS TestJson (id int, nm varchar(30), cty varchar(30), hse varchar(30), yrs varchar(20))
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

Sample JSON:

[ { "id": 1, "nm": "Edward the Elder", "cty": "United Kingdom", "hse": "House of Wessex", "yrs": "899-925" }, { "id": 2, "nm": "Athelstan", "cty": "United Kingdom", "hse": "House of Wessex", "yrs": "925-940" }, ]

Also tried with the below JSON format:

{ "id": 1, "nm": "Edward the Elder", "cty": "United Kingdom", "hse": "House of Wessex", "yrs": "899-925" }
{ "id": 2, "nm": "Athelstan", "cty": "United Kingdom", "hse": "House of Wessex", "yrs": "925-940" }

Error stack trace:

org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)
at org.apache.ambari.view.hive20.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)
at org.apache.ambari.view.hive20.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:78)
at org.apache.ambari.view.hive20.actor.HiveActor.onReceive(HiveActor.java:38)
at akka.actor.UntypedActor$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:416)
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:243)
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:793)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy29.fetchResults(Unknown Source)
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:523)
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:709)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1617)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1602)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:520)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:427)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1782)
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:411)
... 25 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:186)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:501)
... 29 more
Caused by: java.io.IOException: Start token not found where expected
at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:172)
... 30 more
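For reference, the layout the hcatalog JsonSerDe expects with the flat schema above is one complete JSON object per line, with no enclosing array and no trailing comma, along the lines of:

```
{"id": 1, "nm": "Edward the Elder", "cty": "United Kingdom", "hse": "House of Wessex", "yrs": "899-925"}
{"id": 2, "nm": "Athelstan", "cty": "United Kingdom", "hse": "House of Wessex", "yrs": "925-940"}
```

The "Start token not found" error typically means a line does not begin with "{", for example when the file starts with "[" or when a pretty-printed object spans multiple lines.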
01-11-2019
06:18 AM
I am getting an error when querying a Hive table over JSON data. I think the root cause is the JSON SerDe. Do I have to change the Hive native JSON SerDe to another JSON SerDe? Please advise.

Error: Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
Labels:
- Apache Hive
01-11-2019
04:06 AM
This issue was resolved by tuning the container size (increasing the memory for the parameters yarn.scheduler.minimum-allocation-mb and tez.task.resource.memory.mb).
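As a rough sketch, such parameters can also be tried per session before changing them cluster-wide in Ambari. The values below are illustrative only, not recommendations; appropriate numbers depend on the node memory available:

```sql
-- Illustrative session-level overrides; pick values that fit your cluster
SET hive.tez.container.size=1024;       -- MB per Tez container
SET tez.task.resource.memory.mb=1024;   -- memory requested per Tez task
```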
01-08-2019
07:15 PM
Hive table: internal table, ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS TEXTFILE. Data ingestion is successful, but we are facing memory issues while querying. Input file: JSON.

While researching memory configuration, I found some information here on calculating the recommended memory configuration against the available resources. Our POC cluster size: 4 nodes, each with 8 GB RAM, 2 cores, and 2 disks. Please refer to the information below and let me know whether the memory needs to be extended, or whether the Tez memory configuration needs to be changed, to overcome the "out of memory" issue.

Calculated memory configuration using yarn-utils.py:

$ python yarn-utils.py -c 2 -m 8 -d 2 -k True
Using cores=2 memory=8GB disks=2 hbase=True
Profile: cores=2 memory=5120MB reserved=3GB usableMem=5GB disks=2
Num Container=4
Container Ram=1024MB
Used Ram=4GB
Unused Ram=3GB
yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=4096
yarn.nodemanager.resource.memory-mb=4096
mapreduce.map.memory.mb=1024
mapreduce.map.java.opts=-Xmx819m
mapreduce.reduce.memory.mb=2048
mapreduce.reduce.java.opts=-Xmx1638m
yarn.app.mapreduce.am.resource.mb=2048
yarn.app.mapreduce.am.command-opts=-Xmx1638m
mapreduce.task.io.sort.mb=409

Error log:
<small> java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1542737559534_0026_1_00, diagnostics=[Task failed, taskId=task_1542737559534_0026_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:88)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:73)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:88)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:73)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more</small>
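Per the accepted solution above (removing the JSON SerDe from the ORC table), here is a minimal sketch of the corrected ORC DDL; the table and column definitions are taken from the original post:

```sql
-- ORC is a self-describing columnar format, so the ORC table does not need
-- (and should not declare) the JSON SerDe; only the TEXTFILE source table does.
CREATE EXTERNAL TABLE IF NOT EXISTS TestJson1ORC (
  jsondata array<struct<id:int,nm:varchar(30),cty:varchar(30),hse:varchar(30),yrs:varchar(20)>>
)
STORED AS ORC
LOCATION '/data/3rdPartyData/Hive/TestJson1ORC';

-- The JSON parsing happens while reading the text table; the insert then
-- writes native ORC.
INSERT OVERWRITE TABLE TestJson1ORC SELECT * FROM TestJson1;
```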
Labels:
- Apache Hive
- Apache Tez
- Apache YARN
11-21-2018
06:51 PM
Is there any other way to resume the Workflow Manager on Ambari? I am following the article below, but could not delete the entries mentioned above. Link: https://community.hortonworks.com/articles/138689/oozie-jobs-get-stuck-in-prep-status.html
11-21-2018
05:33 PM
As the Workflow Manager is hung, I am trying to delete entries from the Oozie database, which is configured on Derby. However, deleting records from the table fails with: "Error: An SQL data change is not permitted for a read-only connection, user or database. (state=25502,code=30000)". I launched Derby from the command line using: java -cp .:/usr/hdp/2.6.5.0-292/oozie/libserver/derby-10.10.1.1.jar:/usr/hdp/2.6.5.0-292/phoenix/bin/../phoenix-4.7.0.2.6.5.0-292-thin-client.jar sqlline.SqlLine -d org.apache.derby.jdbc.EmbeddedDriver -u jdbc:derby:/hadoop/oozie/data/oozie-db -n none -p none --color=true --fastConnect=false --verbose=true --isolation=TRANSACTION_READ_UNCOMMITTED It might be because Derby was launched from a non-admin account. Your inputs would be greatly appreciated. Thank you.
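The deletes being attempted follow the linked article's approach of removing the stuck workflow rows directly. A rough sketch of the kind of statements involved, assuming the standard Oozie schema (verify the WF_JOBS and WF_ACTIONS table names against your own database first; the job id shown is the one from the error logs in this thread):

```sql
-- List workflows stuck in PREP (hypothetical: check table and column names
-- against the actual Oozie schema before running).
SELECT id, app_name, status FROM WF_JOBS WHERE status = 'PREP';

-- Delete the stuck workflow's actions first, then the workflow itself.
DELETE FROM WF_ACTIONS WHERE wf_id = '0000034-181115165802518-oozie-oozi-W';
DELETE FROM WF_JOBS WHERE id = '0000034-181115165802518-oozie-oozi-W';
```

Note that these deletes will keep failing while the connection is read-only; the error above suggests the database files are not writable by the launching OS user, or that Oozie still holds the embedded Derby database open.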
Labels:
- Apache Oozie
11-20-2018
05:11 PM
Resolved this issue! It worked with a Hive2 action using the "Local cluster" cluster configuration instead of a custom one.
11-20-2018
05:16 AM
Here is the workflow:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="TestHive1">
<start to="hive_1"/>
<action name="hive_1">
<hive xmlns="uri:oozie:hive-action:0.6">
<job-tracker>${resourceManager}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/user/admin/apps/oozie/conf/hive-site.xml</job-xml>
<configuration>
<property>
<name>oozie.libpath</name>
<value>http://host1:8020/user/oozie/share/lib/</value>
</property>
</configuration>
<script>/user/admin/apps/oozie/conf/test.hql</script>
<file>/user/admin/apps/oozie/conf/test.hql</file>
</hive>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>${wf:errorMessage(wf:lastErrorNode())}</message>
</kill>
<end name="end"/>
</workflow-app>
11-20-2018
05:16 AM
The above error occurs with the Query option, whereas the Script option (.hql) throws the error below. I already created /user/admin with the required permissions (chown admin:hdfs) but still get the error. USER[admin] GROUP[-] TOKEN[] APP[TestHive1] JOB[0000034-181115165802518-oozie-oozi-W] ACTION[0000034-181115165802518-oozie-oozi-W@hive_1] Error starting action [hive_1]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: Directory/File does not exist /user/admin/.staging/job_1542322485217_0046/job.split]
org.apache.oozie.action.ActionExecutorException: JA009: Directory/File does not exist /user/admin/.staging/job_1542322485217_0046/job.split
at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:437)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1258)
at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1440)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:234)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:65)
at org.apache.oozie.command.XCommand.call(XCommand.java:287)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:331)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:260)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
11-20-2018
12:42 AM
A simple Hive query ("use dbname; show tables;") is failing. Your thoughts would be a great help. Thanks.
USER[admin] GROUP[-] TOKEN[] APP[CreateHiveSchema] JOB[0000030-181115165802518-oozie-oozi-W] ACTION[0000030-181115165802518-oozie-oozi-W@hive_1] Error starting action [hive_1]. ErrorType [ERROR], ErrorCode [IllegalArgumentException], Message [IllegalArgumentException: Can not create a Path from an empty string]
org.apache.oozie.action.ActionExecutorException: IllegalArgumentException: Can not create a Path from an empty string
at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:446)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1258)
at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1440)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:234)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:65)
at org.apache.oozie.command.XCommand.call(XCommand.java:287)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:331)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:260)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Tags:
- workflow-manager
Labels:
- Apache Hadoop
- Apache Hive
- Apache Oozie