02-07-2017
05:03 PM
1 Kudo
Hi, we have a Kerberized Hadoop cluster. It has been running for a while without any issues, but suddenly all our weekly batch jobs failed with a delegation token error. Here is the error message:

Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.TezException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1485103343268_0137 to YARN : Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.19.8.245:8190, Ident: (owner=hive, renewer=yarn, realUser=, issueDate=1486486486195, maxDate=1487091286195, sequenceNumber=5130, masterKeyId=627)
at org.apache.tez.client.TezClient.start(TezClient.java:414)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:271)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1485103343268_0137 to YARN : Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.19.8.245:8190, Ident: (owner=hive, renewer=yarn, realUser=, issueDate=1486486486195, maxDate=1487091286195, sequenceNumber=5130, masterKeyId=627)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
at org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72)
at org.apache.tez.client.TezClient.start(TezClient.java:409)
... 6 more
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)
It would be great if anyone has an idea what is triggering this issue all of a sudden. Thanks, Arun
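For reference, decoding the token's issueDate and maxDate from the error (they are epoch milliseconds) shows the token was issued the same day the jobs failed and has a seven-day maximum lifetime, so the token itself has not expired; it is the renewal call from YARN to the Timeline Server at 10.19.8.245:8190 that is failing. A quick sanity-check sketch:

from datetime import datetime, timezone

# Timestamps copied from the error message above (epoch milliseconds)
issue_date_ms = 1486486486195
max_date_ms = 1487091286195

print(datetime.fromtimestamp(issue_date_ms / 1000.0, tz=timezone.utc))  # 2017-02-07 16:54:46 UTC
print(datetime.fromtimestamp(max_date_ms / 1000.0, tz=timezone.utc))    # 2017-02-14 16:54:46 UTC
print((max_date_ms - issue_date_ms) / 86400000.0, 'days')               # exactly 7.0 days

That points the investigation at the Timeline Server (or its Kerberos state) rather than at token lifetimes.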
Labels:
- Apache Hive
10-21-2016
03:44 PM
Hi Dan, thank you for your sample code. I was able to get this working by using flowfile attributes and a local filesystem path. I ended up using an os.system call because I was facing issues with the Python libraries (requests, urllib, etc.) for the WebHDFS op command options. The issue with this approach is that I have to keep the source files in place for my GetFile processor: if I keep them, subsequent iterations complain that the file already exists, and if I don't keep them, GetFile removes the source file once it completes, so it is no longer available for the ExecuteScript processor's curl command to reference. Here is my new code:

import sys
import os
import java.io
from org.apache.commons.io import IOUtils

flowFile = session.get()
if flowFile != None:
    filename = flowFile.getAttribute('filename')
    filepath = flowFile.getAttribute('absolute.path')
    file = filepath + filename
    print 'filename is ::' + file
    os.system('curl -ku user:password -L -T ' + file + ' -X PUT "https://gatewayhost:8443/gateway/default/webhdfs/v1/user/' + filename + '?op=CREATE"')
    session.remove(flowFile)
session.commit()
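If keeping the source file around is the only blocker, one workaround is to write the flowfile content to a temporary file inside the script before calling curl, so GetFile can be left to delete the original. A minimal sketch (untested; the /tmp staging path is an assumption):

import os
import java.io
from org.apache.commons.io import IOUtils
from org.apache.nifi.processor.io import InputStreamCallback

# Write the flowfile content to a local temp file so curl can upload it
# even after GetFile has removed the original source file.
class WriteToFileCallback(InputStreamCallback):
    def __init__(self, path):
        self.path = path
    def process(self, inputStream):
        out = java.io.FileOutputStream(self.path)
        try:
            IOUtils.copy(inputStream, out)
        finally:
            out.close()

flowFile = session.get()
if flowFile != None:
    filename = flowFile.getAttribute('filename')
    tmpfile = '/tmp/' + filename  # assumed staging location
    session.read(flowFile, WriteToFileCallback(tmpfile))
    os.system('curl -ku user:password -L -T ' + tmpfile + ' -X PUT "https://gatewayhost:8443/gateway/default/webhdfs/v1/user/' + filename + '?op=CREATE"')
    os.remove(tmpfile)
    session.remove(flowFile)
session.commit()

With this, GetFile's "Keep Source File" property can stay false.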
10-20-2016
03:32 PM
@Matt Burgess I did try those two processors, but they didn't work for me. I need to run the curl command on the incoming flowfile.
ExecuteProcess - I don't see an option to feed it the incoming flowfile.
ExecuteStreamCommand - I tried running this by specifying the curl command, with a static target filename for now and @- to pass stdin to curl:

curl -iku user:password -L --data-binary @- -x PUT https://gatewayhost:8443/gateway/default/webhdfs/v1/user/user/ad_test1020.txt?op=CREATE

But it doesn't work for me; it fails with the error below. It seems it doesn't understand the @-.

org.apache.nifi.processor.exception.ProcessException: java.io.IOException: Cannot run program " curl -iku user:password -L --data-binary @- -x PUT https://gatewayhost:8443/gateway/default/webhdfs/v1/user/user/ad_test1020.txt?op=CREATE": error=2, No such file or directory
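The "Cannot run program" error actually suggests something else: the entire command string, arguments included, was placed in ExecuteStreamCommand's Command Path property, so NiFi tried to execute it as a single program name. A configuration along these lines should separate the executable from its arguments, which are delimited by ';' by default (a sketch, not verified against this flow); note also that curl's request-method flag is uppercase -X, while lowercase -x sets a proxy:

Command Path: /usr/bin/curl
Command Arguments: -iku;user:password;-L;--data-binary;@-;-X;PUT;https://gatewayhost:8443/gateway/default/webhdfs/v1/user/user/ad_test1020.txt?op=CREATE

ExecuteStreamCommand pipes the incoming flowfile content to the command's stdin, which is exactly what the @- placeholder reads.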
10-20-2016
02:50 PM
1 Kudo
I have a simple NiFi flow that picks up files from a landing directory (GetFile processor) and places them on HDFS. To place the files on HDFS I am using WebHDFS over Knox.
As NiFi does not have a WebHDFS processor, I am using a Python ExecuteScript processor to run the curl command for the WebHDFS PUT, but I am getting an error while running the code below. Here is my Python script:

import sys
import os
import traceback
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback
from org.python.core.util import StringUtil

class readFirstLineCallback(InputStreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream):
        try:
            input_text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            os.system('touch /u/itlaked/process_outputs/' + input_text)
            print 'AD_TEST'
            os.system('echo ' + input_text + ' | curl -iku user:password -L --data-binary @- -x PUT https://gatewayhost:8443/gateway/default/webhdfs/v1/user/user/ad_test1020.txt?op=CREATE')
        except:
            traceback.print_exc(file=sys.stdout)
            raise

flowFile = session.get()
if flowFile != None:
    readCallback = readFirstLineCallback()
    session.read(flowFile, readCallback)
    session.remove(flowFile)
session.commit()
ERROR Message:
2016-10-20 09:36:15,106 ERROR [NiFi logging handler] org.apache.nifi.StdErr /bin/sh: -c: line 1: syntax error near unexpected token `|'
2016-10-20 09:36:15,106 ERROR [NiFi logging handler] org.apache.nifi.StdErr /bin/sh: -c: line 1: ` | curl -iku user:password -L --data-binary @- -x PUT https://gatewayhost:8443/gateway/default/webhdfs/v1/user/user/ad_test1020.txt?op=CREATE'
Please let me know if anybody has any thoughts. I am very new to NiFi, and I chose ExecuteScript because this curl command runs fine on the command line.
I'm not sure if there is another way to achieve this.
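One alternative worth mentioning (a suggestion, not verified against this Knox setup): NiFi's built-in InvokeHTTP processor can send the flowfile content as the body of an HTTP PUT, which avoids shelling out to curl entirely. Roughly:

GetFile -> InvokeHTTP
    HTTP Method: PUT
    Remote URL: https://gatewayhost:8443/gateway/default/webhdfs/v1/user/${filename}?op=CREATE
    Follow Redirects: true

InvokeHTTP supports expression language in the URL (so ${filename} picks up the flowfile's filename attribute), basic authentication properties for the -ku part, and an SSL Context Service for HTTPS; Follow Redirects covers curl's -L, since WebHDFS CREATE replies with a redirect to the target datanode location.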
Labels:
- Apache NiFi
04-26-2016
02:32 PM
I tried creating a table with LINES TERMINATED BY set to Control-A to see how it works, but it failed with the error message below.

Create statement:

create table multiline_test
(
num int,
desc string
)
row format delimited
fields terminated by ','
lines terminated by '\001'
location '/test/multiline_test';

Error:

Error while compiling statement: FAILED: SemanticException 8:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\001''

It seems Hive only supports '\n' as the line terminator, even though the DDL syntax suggests it is configurable. There is an open Hive JIRA for this: https://issues.apache.org/jira/browse/HIVE-11996
04-26-2016
02:16 PM
Thanks Ravi and Divakar for your responses. Right now my field delimiter is Control-A (\001), and I cannot use a comma or tab because my data may contain those standard delimiters. Is there another delimiter I can use for line termination?
04-25-2016
03:35 PM
Hi, I am trying to store multiline character fields in a Hive table. I tried using the CSV SerDe, but the data is shown as multiple records.

Table description:

CREATE EXTERNAL TABLE `serde_test1`(
`num` string COMMENT 'from deserializer',
`name` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES (
'quoteChar'='\"',
'separatorChar'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'HDFS location';

Data stored in HDFS:

"300","test newline
add testings"

If I select from the Hive table, it shows NULLs:

serde_test1.num    serde_test1.name
300                (null)
(null)             (null)

Does anybody have an idea how to store multiline fields in a Hive table?
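The underlying problem is that TextInputFormat splits records on '\n' before the SerDe ever sees them, so embedded newlines in quoted fields generally have to be removed or escaped before the data lands in HDFS. A minimal preprocessing sketch (an illustration only; input.csv and output.csv are placeholder paths, and the input is assumed to be a quoted CSV like the sample above):

import csv

# Rewrite the CSV so each logical record occupies one physical line:
# newlines embedded inside quoted fields are escaped as the two
# characters '\' and 'n'. csv.reader handles the multiline quoting.
with open('input.csv', 'r', newline='') as src, open('output.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow([field.replace('\n', '\\n') for field in row])

Hive then sees one physical line per record, and a view or downstream job can translate the '\n' placeholder back where needed.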
Labels:
- Apache Hive
03-17-2016
03:58 PM
1 Kudo
I was able to successfully import a few tables from Netezza to HDFS. The failing tables have a primary key constraint on Netezza, and I see that the Sqoop split-by is using the primary key column. I tried changing the split-by to a different column and increased the split count as well, but I am still getting the following error for a few tables.

16/03/15 14:00:23 INFO mapreduce.Job: Task Id : attempt_1456951008977_0160_m_000000_0, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.netezza.sql.NzConnection.receiveDbosTuple(NzConnection.java:739)
at org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:177)
at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:73)
at org.netezza.sql.NzConnection.execute(NzConnection.java:2688)
at org.netezza.sql.NzStatement._execute(NzStatement.java:849)
at org.netezza.sql.NzPreparedStatament.executeQuery(NzPreparedStatament.java:169)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

16/03/15 14:00:43 INFO mapreduce.Job: Task Id : attempt_1456951008977_0160_m_000000_1, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException
at org.netezza.sql.NzConnection.receiveDbosTuple(NzConnection.java:739)
at org.netezza.internal.QueryExecutor.update(QueryExecutor.java:340)
at org.netezza.sql.NzConnection.updateResultSet(NzConnection.java:2704)
at org.netezza.sql.NzResultSet.next(NzResultSet.java:1924)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:237)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
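In case it helps narrow this down, a first diagnostic step (a sketch; the host, database, credentials, and paths are placeholders) is to rerun the import with a single mapper, so Sqoop issues one unsplit query and no split-by bounds are generated:

sqoop import \
  --connect jdbc:netezza://netezza-host:5480/MYDB \
  --username myuser -P \
  --table MY_TABLE \
  --target-dir /user/arun/my_table_test \
  --num-mappers 1

If the single-mapper run succeeds, the failure is likely in how the split-by bounded queries interact with the Netezza driver rather than in the table data itself.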
Labels:
- Apache Hadoop
- Apache Sqoop