02-07-2017
05:03 PM
1 Kudo
Hi, we have a Kerberized Hadoop cluster. It has been running for a while without any issues, but suddenly all our weekly batch jobs failed with a delegation token error. Here is the error message:

Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.TezException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1485103343268_0137 to YARN : Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.19.8.245:8190, Ident: (owner=hive, renewer=yarn, realUser=, issueDate=1486486486195, maxDate=1487091286195, sequenceNumber=5130, masterKeyId=627)
at org.apache.tez.client.TezClient.start(TezClient.java:414)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:271)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1485103343268_0137 to YARN : Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.19.8.245:8190, Ident: (owner=hive, renewer=yarn, realUser=, issueDate=1486486486195, maxDate=1487091286195, sequenceNumber=5130, masterKeyId=627)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
at org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72)
at org.apache.tez.client.TezClient.start(TezClient.java:409)
... 6 more
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)
It would be great if anyone has an idea what is triggering this issue all of a sudden. Thanks, Arun
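For reference, decoding the token's issueDate and maxDate from the error (they are epoch milliseconds) shows the token was issued the same day the jobs failed and has a seven-day maximum lifetime, so the token itself has not expired; it is the renewal call from YARN to the Timeline Server at 10.19.8.245:8190 that is failing. A quick sanity-check sketch:

from datetime import datetime, timezone

# Timestamps copied from the error message above (epoch milliseconds)
issue_date_ms = 1486486486195
max_date_ms = 1487091286195

print(datetime.fromtimestamp(issue_date_ms / 1000.0, tz=timezone.utc))  # 2017-02-07 16:54:46 UTC
print(datetime.fromtimestamp(max_date_ms / 1000.0, tz=timezone.utc))    # 2017-02-14 16:54:46 UTC
print((max_date_ms - issue_date_ms) / 86400000.0, 'days')               # exactly 7.0 days

That points the investigation at the Timeline Server (or its Kerberos state) rather than at token lifetimes.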
Labels:
- Apache Hive
10-21-2016
03:44 PM
Hi Dan, thank you for your sample code. I was able to get this working by using flowfile attributes and a local filesystem path. I ended up using an os.system call because I was facing issues with the Python libraries (requests, urllib, etc.) for the WebHDFS op command options. The issue with this approach is that I have to keep the source files in place for my GetFile processor: if I keep them, subsequent iterations complain that the file already exists, and if I don't keep them, GetFile removes the source file once it completes, so it is no longer available for the ExecuteScript processor's curl command to reference. Here is my new code:

import sys
import os
import java.io
from org.apache.commons.io import IOUtils

flowFile = session.get()
if flowFile != None:
    filename = flowFile.getAttribute('filename')
    filepath = flowFile.getAttribute('absolute.path')
    file = filepath + filename
    print 'filename is ::' + file
    os.system('curl -ku user:password -L -T ' + file + ' -X PUT "https://gatewayhost:8443/gateway/default/webhdfs/v1/user/' + filename + '?op=CREATE"')
    session.remove(flowFile)
session.commit()
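If keeping the source file around is the only blocker, one workaround is to write the flowfile content to a temporary file inside the script before calling curl, so GetFile can be left to delete the original. A minimal sketch (untested; the /tmp staging path is an assumption):

import os
import java.io
from org.apache.commons.io import IOUtils
from org.apache.nifi.processor.io import InputStreamCallback

# Write the flowfile content to a local temp file so curl can upload it
# even after GetFile has removed the original source file.
class WriteToFileCallback(InputStreamCallback):
    def __init__(self, path):
        self.path = path
    def process(self, inputStream):
        out = java.io.FileOutputStream(self.path)
        try:
            IOUtils.copy(inputStream, out)
        finally:
            out.close()

flowFile = session.get()
if flowFile != None:
    filename = flowFile.getAttribute('filename')
    tmpfile = '/tmp/' + filename  # assumed staging location
    session.read(flowFile, WriteToFileCallback(tmpfile))
    os.system('curl -ku user:password -L -T ' + tmpfile + ' -X PUT "https://gatewayhost:8443/gateway/default/webhdfs/v1/user/' + filename + '?op=CREATE"')
    os.remove(tmpfile)
    session.remove(flowFile)
session.commit()

With this, GetFile's "Keep Source File" property can stay false.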
10-20-2016
03:32 PM
@Matt Burgess I did try those two processors, but they didn't work for me. I need to run the curl command on the incoming flowfile.
ExecuteProcess - I don't see an option to feed it the incoming flowfile.
ExecuteStreamCommand - I tried running this by specifying the curl command, with a static target filename for now and @- to pass stdin to curl:

curl -iku user:password -L --data-binary @- -x PUT https://gatewayhost:8443/gateway/default/webhdfs/v1/user/user/ad_test1020.txt?op=CREATE

But it doesn't work for me; it fails with the error below. It seems it doesn't understand the @-.

org.apache.nifi.processor.exception.ProcessException: java.io.IOException: Cannot run program " curl -iku user:password -L --data-binary @- -x PUT https://gatewayhost:8443/gateway/default/webhdfs/v1/user/user/ad_test1020.txt?op=CREATE": error=2, No such file or directory
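The "Cannot run program" error actually suggests something else: the entire command string, arguments included, was placed in ExecuteStreamCommand's Command Path property, so NiFi tried to execute it as a single program name. A configuration along these lines should separate the executable from its arguments, which are delimited by ';' by default (a sketch, not verified against this flow); note also that curl's request-method flag is uppercase -X, while lowercase -x sets a proxy:

Command Path: /usr/bin/curl
Command Arguments: -iku;user:password;-L;--data-binary;@-;-X;PUT;https://gatewayhost:8443/gateway/default/webhdfs/v1/user/user/ad_test1020.txt?op=CREATE

ExecuteStreamCommand pipes the incoming flowfile content to the command's stdin, which is exactly what the @- placeholder reads.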
10-20-2016
02:50 PM
1 Kudo
I have a simple NiFi flow that picks up files from a landing directory (GetFile processor) and places them on HDFS. To place the files on HDFS I am using WebHDFS over Knox.
As NiFi does not have a WebHDFS processor, I am using a Python ExecuteScript processor to run the curl command for the WebHDFS PUT, but I am getting an error while running the code below. Here is my Python script:

import sys
import os
import traceback
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback
from org.python.core.util import StringUtil

class readFirstLineCallback(InputStreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream):
        try:
            input_text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            os.system('touch /u/itlaked/process_outputs/' + input_text)
            print 'AD_TEST'
            os.system('echo ' + input_text + ' | curl -iku user:password -L --data-binary @- -x PUT https://gatewayhost:8443/gateway/default/webhdfs/v1/user/user/ad_test1020.txt?op=CREATE')
        except:
            traceback.print_exc(file=sys.stdout)
            raise

flowFile = session.get()
if flowFile != None:
    readCallback = readFirstLineCallback()
    session.read(flowFile, readCallback)
    session.remove(flowFile)
session.commit()
ERROR Message:
2016-10-20 09:36:15,106 ERROR [NiFi logging handler] org.apache.nifi.StdErr /bin/sh: -c: line 1: syntax error near unexpected token `|'
2016-10-20 09:36:15,106 ERROR [NiFi logging handler] org.apache.nifi.StdErr /bin/sh: -c: line 1: ` | curl -iku user:password -L --data-binary @- -x PUT https://gatewayhost:8443/gateway/default/webhdfs/v1/user/user/ad_test1020.txt?op=CREATE'
Please let me know if anybody has any thoughts. I am very new to NiFi, and I chose ExecuteScript because this curl command runs fine on the command line.
I'm not sure if there is another way to achieve this.
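One alternative worth mentioning (a suggestion, not verified against this Knox setup): NiFi's built-in InvokeHTTP processor can send the flowfile content as the body of an HTTP PUT, which avoids shelling out to curl entirely. Roughly:

GetFile -> InvokeHTTP
    HTTP Method: PUT
    Remote URL: https://gatewayhost:8443/gateway/default/webhdfs/v1/user/${filename}?op=CREATE
    Follow Redirects: true

InvokeHTTP supports expression language in the URL (so ${filename} picks up the flowfile's filename attribute), basic authentication properties for the -ku part, and an SSL Context Service for HTTPS; Follow Redirects covers curl's -L, since WebHDFS CREATE replies with a redirect to the target datanode location.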
Labels:
- Apache NiFi
04-26-2016
02:32 PM
I tried creating a table with LINES TERMINATED BY set to Control-A to see how it works, but it failed with the error message below.

Create statement:

create table multiline_test
(
num int,
desc string
)
row format delimited
fields terminated by ','
lines terminated by '\001'
location '/test/multiline_test';

Error:

Error while compiling statement: FAILED: SemanticException 8:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\001''

It seems Hive only supports '\n' as the line terminator, even though the DDL syntax suggests it is configurable. There is an open Hive JIRA for this: https://issues.apache.org/jira/browse/HIVE-11996
04-26-2016
02:16 PM
Thanks Ravi and Divakar for your responses. Right now my field delimiter is Control-A (\001), and I cannot use a comma or tab because my data may contain those standard delimiters. Is there another delimiter I can use for line termination?
04-25-2016
03:35 PM
Hi, I am trying to store multiline character fields in a Hive table. I tried using the CSV SerDe, but the data is shown as multiple records.

Table description:

CREATE EXTERNAL TABLE `serde_test1`(
`num` string COMMENT 'from deserializer',
`name` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES (
'quoteChar'='\"',
'separatorChar'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'HDFS location';

Data stored in HDFS:

"300","test newline
add testings"

If I select from the Hive table, it shows NULLs:

serde_test1.num    serde_test1.name
300                (null)
(null)             (null)

Does anybody have an idea how to store multiline fields in a Hive table?
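The underlying problem is that TextInputFormat splits records on '\n' before the SerDe ever sees them, so embedded newlines in quoted fields generally have to be removed or escaped before the data lands in HDFS. A minimal preprocessing sketch (an illustration only; input.csv and output.csv are placeholder paths, and the input is assumed to be a quoted CSV like the sample above):

import csv

# Rewrite the CSV so each logical record occupies one physical line:
# newlines embedded inside quoted fields are escaped as the two
# characters '\' and 'n'. csv.reader handles the multiline quoting.
with open('input.csv', 'r', newline='') as src, open('output.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow([field.replace('\n', '\\n') for field in row])

Hive then sees one physical line per record, and a view or downstream job can translate the '\n' placeholder back where needed.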
Labels:
- Apache Hive
03-17-2016
03:58 PM
1 Kudo
I was able to successfully import a few tables from Netezza to HDFS. The failing tables have a primary key constraint on Netezza, and I see that the Sqoop split-by is using the primary key column. I tried changing the split-by to a different column and increased the split count as well, but I am still getting the following error for a few tables.

16/03/15 14:00:23 INFO mapreduce.Job: Task Id : attempt_1456951008977_0160_m_000000_0, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.netezza.sql.NzConnection.receiveDbosTuple(NzConnection.java:739)
at org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:177)
at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:73)
at org.netezza.sql.NzConnection.execute(NzConnection.java:2688)
at org.netezza.sql.NzStatement._execute(NzStatement.java:849)
at org.netezza.sql.NzPreparedStatament.executeQuery(NzPreparedStatament.java:169)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

16/03/15 14:00:43 INFO mapreduce.Job: Task Id : attempt_1456951008977_0160_m_000000_1, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException
at org.netezza.sql.NzConnection.receiveDbosTuple(NzConnection.java:739)
at org.netezza.internal.QueryExecutor.update(QueryExecutor.java:340)
at org.netezza.sql.NzConnection.updateResultSet(NzConnection.java:2704)
at org.netezza.sql.NzResultSet.next(NzResultSet.java:1924)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:237)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
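In case it helps narrow this down, a first diagnostic step (a sketch; the host, database, credentials, and paths are placeholders) is to rerun the import with a single mapper, so Sqoop issues one unsplit query and no split-by bounds are generated:

sqoop import \
  --connect jdbc:netezza://netezza-host:5480/MYDB \
  --username myuser -P \
  --table MY_TABLE \
  --target-dir /user/arun/my_table_test \
  --num-mappers 1

If the single-mapper run succeeds, the failure is likely in how the split-by bounded queries interact with the Netezza driver rather than in the table data itself.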
Labels:
- Apache Hadoop
- Apache Sqoop