Member since: 04-22-2016
Posts: 931
Kudos Received: 46
Solutions: 26

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1855 | 10-11-2018 01:38 AM |
|  | 2217 | 09-26-2018 02:24 AM |
|  | 2246 | 06-29-2018 02:35 PM |
|  | 2915 | 06-29-2018 02:34 PM |
|  | 6096 | 06-20-2018 04:30 PM |
03-27-2018
07:48 PM
I used the Sqoop command with the --outdir option, which produced the .avsc schema file and the Avro data files, but I am not clear on what to do next to create the Hive external table. The outdir also contains a generated Java file.
[hdfs@hadoop1 ~]$ hdfs dfs -ls /sqoop-avro
Found 4 items
-rw-r--r-- 3 hdfs hdfs 28808 2018-03-27 15:40 /sqoop-avro/part-m-00000.avro
-rw-r--r-- 3 hdfs hdfs 3127 2018-03-27 15:42 /sqoop-avro/part-m-00001.avro
-rw-r--r-- 3 hdfs hdfs 3474 2018-03-27 15:42 /sqoop-avro/part-m-00002.avro
-rw-r--r-- 3 hdfs hdfs 682403 2018-03-27 15:43 /sqoop-avro/part-m-00003.avro
[hdfs@hadoop1 ~]$
[hdfs@hadoop1 ~]$ ls -ltr /tmp/sqoop
total 40
-rw-r--r-- 1 hdfs hadoop 34553 Mar 27 15:39 PATRON_TAB4.java
-rw-r--r-- 1 hdfs hadoop 1838 Mar 27 15:39 PATRON_TAB4.avsc
[hdfs@hadoop1 ~]$
[hdfs@hadoop1 ~]$ sqoop import -Dmapreduce.job.user.classpath.first=true --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" --username PATRON --password xxxx --as-avrodatafile --incremental append --check-column PUR_TRANS_DATE --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID --compression-codec snappy --target-dir /sqoop-avro --outdir /tmp/sqoop
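For anyone finding this later, a minimal sketch of one way to use that .avsc, assuming the schema file is first copied into HDFS (the /sqoop-avro-schema path and the table name are illustrative, not from the thread):

```
# Copy the Sqoop-generated Avro schema into HDFS (target path is illustrative)
hdfs dfs -mkdir -p /sqoop-avro-schema
hdfs dfs -put /tmp/sqoop/PATRON_TAB4.avsc /sqoop-avro-schema/

# Create an external Avro table whose columns come from the .avsc itself,
# so the column names cannot drift out of sync with the data files
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS patron_tab4_avro
STORED AS AVRO
LOCATION '/sqoop-avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///sqoop-avro-schema/PATRON_TAB4.avsc');
"
```

The generated .java file is just the record class Sqoop compiled for the MapReduce import; it is not needed for the Hive table.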
03-27-2018
07:01 PM
If I add "STORED AS AVRO" then I get all NULLs: NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
Time taken: 0.303 seconds, Fetched: 29999 row(s)
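All-NULL rows from an Avro-backed table usually mean the column names declared in the DDL do not match the field names recorded inside the Avro files. One way to check the embedded field names, assuming an avro-tools jar is available somewhere on the node (the jar path and version below are illustrative):

```
# Pull one of the Sqoop output files locally and print the schema embedded in it
hdfs dfs -get /sqoop-avro/part-m-00000.avro /tmp/
java -jar /tmp/avro-tools-1.8.2.jar getschema /tmp/part-m-00000.avro
```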
03-27-2018
06:41 PM
I have imported table data as Avro files using Sqoop. Now I want to map an external table to it, but it is not working; this command gives me binary output:
CREATE EXTERNAL TABLE IF NOT EXISTS sqoop_text (ACCT_NUM STRING, PUR_ID STRING, PUR_DET_ID STRING,
PRODUCT_PUR_PRODUCT_CODE STRING, PROD_AMT STRING, PUR_TRANS_DATE STRING, ACCTTYPE_ACCT_TYPE_CODE STRING,
ACCTSTAT_ACCT_STATUS_CODE STRING, EMP_EMP_CODE STRING, PLAZA_PLAZA_ID STRING, PURSTAT_PUR_STATUS_CODE STRING)
location '/sqoop-avro'
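A side note on the binary output: the DDL above has no STORED AS clause, so Hive defaults to a text table and reads the Avro files as raw bytes. A quick way to confirm which input format the table ended up with:

```
# DESCRIBE FORMATTED shows the SerDe and InputFormat Hive is using for /sqoop-avro;
# for the DDL above it will report a plain text table, not an Avro one
hive -e "DESCRIBE FORMATTED sqoop_text;"
```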
03-27-2018
06:37 PM
Thanks, that worked, but it is not working in a sqoop job; probably wrong syntax?
[hdfs@hadoop1 ~]$ sqoop job --create incjob4 -- import -Dmapreduce.job.user.classpath.first=true --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" --username PATRON --password patvps --as-avrodatafile --incremental append --check-column PUR_TRANS_DATE --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID --compression-codec snappy --target-dir /sqoop-avro
Warning: /usr/hdp/2.6.3.0-235/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/03/27 14:36:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.3.0-235
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dmapreduce.job.user.classpath.first=true
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Unrecognized argument: --connect
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Unrecognized argument: jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Unrecognized argument: PATRON
18/03/27
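Hadoop generic options such as -D are normally expected before the tool-specific arguments, so one variant worth trying is to pass the property to sqoop job itself instead of placing it after the -- separator. This is an untested sketch against this HDP build, and the property may need to be repeated when the job is executed:

```
# Untested sketch: hand the generic -D option to sqoop job directly,
# both when creating and when executing the saved job
sqoop job -Dmapreduce.job.user.classpath.first=true --create incjob4 -- import \
  --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" \
  --username PATRON --password patvps \
  --as-avrodatafile --compression-codec snappy \
  --incremental append --check-column PUR_TRANS_DATE \
  --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID \
  --target-dir /sqoop-avro

sqoop job -Dmapreduce.job.user.classpath.first=true --exec incjob4
```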
03-27-2018
06:11 PM
I narrowed down the error: it is caused by the "--as-avrodatafile" option. If I remove this option, Sqoop works fine, but I do want this option since I want to create Avro files.
03-27-2018
05:29 PM
The following sqoop command is failing with the error shown below:
sqoop job --create incjob -- import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" --username PATRON --password patvps --as-avrodatafile --incremental append --check-column PUR_TRANS_DATE --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID --compression-codec snappy --target-dir /sqoop-avro
sqoop job --exec incjob command error:
18/03/27 13:23:24 INFO mapreduce.Job: Task Id : attempt_1522170856018_0003_m_000000_2, Status : FAILED
Error: org.apache.avro.reflect.ReflectData.addLogicalTypeConversion(Lorg/apache/avro/Conversion;)V
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
YARN application log output:
echo "Copying debugging information"
# Creating copy of launch script
cp "launch_container.sh" "/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/launch_container.sh"
chmod 640 "/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
ls -l 1>>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
find -L . -maxdepth 5 -ls 1>>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.6.3.0-235 -Xmx1228m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.100.44.20 59891 attempt_1522170856018_0003_m_000000_2 8796093022213 1>/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/stdout 2>/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/stderr "
End of LogType:launch_container.sh.This log file belongs to a running container (container_e08_1522170856018_0003_01_000005) and so may not be complete.
************************************************************************************
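That NoSuchMethodError on ReflectData.addLogicalTypeConversion is the usual symptom of an older Avro jar on the cluster classpath being picked up ahead of the one --as-avrodatafile needs. The workaround the later posts in this thread settle on is to make MapReduce prefer the job's own jars; as a plain import (leaving the sqoop job syntax question aside) it looks like:

```
# Prefer the jars shipped with the job (including the newer Avro) over the
# cluster-provided ones; this is the same command used in the 07:48 PM post above
sqoop import -Dmapreduce.job.user.classpath.first=true \
  --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" \
  --username PATRON --password xxxx \
  --as-avrodatafile --compression-codec snappy \
  --incremental append --check-column PUR_TRANS_DATE \
  --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID \
  --target-dir /sqoop-avro
```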
03-23-2018
03:37 PM
It's working now. It brought all 1000 rows into one ORC file. I will increase the source data and see if it creates more smaller files. Thanks a lot for your help.
03-23-2018
02:39 PM
flow-diag.jpg qdb-properties.jpg qdb-schedule.jpg Hi Shu, I made the modifications you suggested: I changed the PutHDFS relationships so that only failure loops back and success is auto-terminated, and I added Maximum-value Columns = pur_trans_date on the QueryDatabaseTable processor. But I must be doing something wrong, because when I run the flow it creates just one ORC file and then stops. Attaching the flow and processor details, please see.
03-22-2018
09:51 PM
OK, I have this flow working: it reads the table from the database and creates ORC files in HDFS. It is running now and I can see the number of files increasing; for a 1000-row table it has created 44 ORC files so far. How do I know when the process will stop? Can I know how many files will be created for my table? Will the QueryDatabaseTable processor stop once all 1000 table rows have been read? capture.jpg
03-22-2018
07:02 PM
I am using NiFi 1.2. If you look at the flow, it is not showing any value for "in" on the PutHiveStreaming processor. Why, even though at this point I can see 29 records in the database? The inbound queue count has increased to 10, but no more records have been added to the database yet (I know they will be later). Both the QueryDatabaseTable and PutHiveStreaming processors have their run schedule set to 30 seconds. How can you explain this behavior? (Please see attached.) capture.jpg