Member since: 04-22-2016
Posts: 931
Kudos Received: 46
Solutions: 26

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1855 | 10-11-2018 01:38 AM |
|  | 2217 | 09-26-2018 02:24 AM |
|  | 2246 | 06-29-2018 02:35 PM |
|  | 2915 | 06-29-2018 02:34 PM |
|  | 6096 | 06-20-2018 04:30 PM |
03-27-2018
07:48 PM
I used the Sqoop command with the --outdir option, which produced the .avsc schema file and the Avro data files, but I am not clear on what to do next to create the Hive external table. The outdir also contains a generated Java file.
[hdfs@hadoop1 ~]$ hdfs dfs -ls /sqoop-avro
Found 4 items
-rw-r--r-- 3 hdfs hdfs 28808 2018-03-27 15:40 /sqoop-avro/part-m-00000.avro
-rw-r--r-- 3 hdfs hdfs 3127 2018-03-27 15:42 /sqoop-avro/part-m-00001.avro
-rw-r--r-- 3 hdfs hdfs 3474 2018-03-27 15:42 /sqoop-avro/part-m-00002.avro
-rw-r--r-- 3 hdfs hdfs 682403 2018-03-27 15:43 /sqoop-avro/part-m-00003.avro
[hdfs@hadoop1 ~]$
[hdfs@hadoop1 ~]$ ls -ltr /tmp/sqoop
total 40
-rw-r--r-- 1 hdfs hadoop 34553 Mar 27 15:39 PATRON_TAB4.java
-rw-r--r-- 1 hdfs hadoop 1838 Mar 27 15:39 PATRON_TAB4.avsc
[hdfs@hadoop1 ~]$
[hdfs@hadoop1 ~]$ sqoop import -Dmapreduce.job.user.classpath.first=true --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" --username PATRON --password xxxx --as-avrodatafile --incremental append --check-column PUR_TRANS_DATE --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID --compression-codec snappy --target-dir /sqoop-avro --outdir /tmp/sqoop
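For anyone finding this later, a minimal sketch of one way to use that .avsc, assuming the schema file is first copied into HDFS (the /sqoop-avro-schema path and the table name are illustrative, not from the thread):

```
# Copy the Sqoop-generated Avro schema into HDFS (target path is illustrative)
hdfs dfs -mkdir -p /sqoop-avro-schema
hdfs dfs -put /tmp/sqoop/PATRON_TAB4.avsc /sqoop-avro-schema/

# Create an external Avro table whose columns come from the .avsc itself,
# so the column names cannot drift out of sync with the data files
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS patron_tab4_avro
STORED AS AVRO
LOCATION '/sqoop-avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///sqoop-avro-schema/PATRON_TAB4.avsc');
"
```

The generated .java file is just the record class Sqoop compiled for the MapReduce import; it is not needed for the Hive table.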
03-27-2018
07:01 PM
If I add "STORED AS AVRO" then I get all NULLs: NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
Time taken: 0.303 seconds, Fetched: 29999 row(s)
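All-NULL rows from an Avro-backed table usually mean the column names declared in the DDL do not match the field names recorded inside the Avro files. One way to check the embedded field names, assuming an avro-tools jar is available somewhere on the node (the jar path and version below are illustrative):

```
# Pull one of the Sqoop output files locally and print the schema embedded in it
hdfs dfs -get /sqoop-avro/part-m-00000.avro /tmp/
java -jar /tmp/avro-tools-1.8.2.jar getschema /tmp/part-m-00000.avro
```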
03-27-2018
06:41 PM
I have imported table data as Avro files using Sqoop. Now I want to map an external table to it, but it is not working; this command gives me binary output:
CREATE EXTERNAL TABLE IF NOT EXISTS sqoop_text (ACCT_NUM STRING, PUR_ID STRING, PUR_DET_ID STRING,
PRODUCT_PUR_PRODUCT_CODE STRING, PROD_AMT STRING, PUR_TRANS_DATE STRING, ACCTTYPE_ACCT_TYPE_CODE STRING,
ACCTSTAT_ACCT_STATUS_CODE STRING, EMP_EMP_CODE STRING, PLAZA_PLAZA_ID STRING, PURSTAT_PUR_STATUS_CODE STRING)
location '/sqoop-avro'
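A side note on the binary output: the DDL above has no STORED AS clause, so Hive defaults to a text table and reads the Avro files as raw bytes. A quick way to confirm which input format the table ended up with:

```
# DESCRIBE FORMATTED shows the SerDe and InputFormat Hive is using for /sqoop-avro;
# for the DDL above it will report a plain text table, not an Avro one
hive -e "DESCRIBE FORMATTED sqoop_text;"
```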
03-27-2018
06:37 PM
Thanks, that worked, but it is not working in a sqoop job; probably wrong syntax?
[hdfs@hadoop1 ~]$ sqoop job --create incjob4 -- import -Dmapreduce.job.user.classpath.first=true --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" --username PATRON --password patvps --as-avrodatafile --incremental append --check-column PUR_TRANS_DATE --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID --compression-codec snappy --target-dir /sqoop-avro
Warning: /usr/hdp/2.6.3.0-235/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/03/27 14:36:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.3.0-235
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dmapreduce.job.user.classpath.first=true
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Unrecognized argument: --connect
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Unrecognized argument: jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
18/03/27 14:36:50 ERROR tool.BaseSqoopTool: Unrecognized argument: PATRON
18/03/27
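Hadoop generic options such as -D are normally expected before the tool-specific arguments, so one variant worth trying is to pass the property to sqoop job itself instead of placing it after the -- separator. This is an untested sketch against this HDP build, and the property may need to be repeated when the job is executed:

```
# Untested sketch: hand the generic -D option to sqoop job directly,
# both when creating and when executing the saved job
sqoop job -Dmapreduce.job.user.classpath.first=true --create incjob4 -- import \
  --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" \
  --username PATRON --password patvps \
  --as-avrodatafile --compression-codec snappy \
  --incremental append --check-column PUR_TRANS_DATE \
  --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID \
  --target-dir /sqoop-avro

sqoop job -Dmapreduce.job.user.classpath.first=true --exec incjob4
```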
03-27-2018
06:11 PM
I narrowed down the error: it is caused by the "--as-avrodatafile" option. If I remove this option, Sqoop works fine, but I do want this option since I want to create Avro files.
03-27-2018
05:29 PM
The following sqoop command is failing with the error shown below:
sqoop job --create incjob -- import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" --username PATRON --password patvps --as-avrodatafile --incremental append --check-column PUR_TRANS_DATE --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID --compression-codec snappy --target-dir /sqoop-avro
sqoop job --exec incjob command error:
18/03/27 13:23:24 INFO mapreduce.Job: Task Id : attempt_1522170856018_0003_m_000000_2, Status : FAILED
Error: org.apache.avro.reflect.ReflectData.addLogicalTypeConversion(Lorg/apache/avro/Conversion;)V
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
YARN application log output:
echo "Copying debugging information"
# Creating copy of launch script
cp "launch_container.sh" "/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/launch_container.sh"
chmod 640 "/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
ls -l 1>>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
find -L . -maxdepth 5 -ls 1>>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/directory.info"
echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.6.3.0-235 -Xmx1228m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.100.44.20 59891 attempt_1522170856018_0003_m_000000_2 8796093022213 1>/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/stdout 2>/hadoop/yarn/log/application_1522170856018_0003/container_e08_1522170856018_0003_01_000005/stderr "
End of LogType:launch_container.sh.This log file belongs to a running container (container_e08_1522170856018_0003_01_000005) and so may not be complete.
************************************************************************************
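That NoSuchMethodError on ReflectData.addLogicalTypeConversion is the usual symptom of an older Avro jar on the cluster classpath being picked up ahead of the one --as-avrodatafile needs. The workaround the later posts in this thread settle on is to make MapReduce prefer the job's own jars; as a plain import (leaving the sqoop job syntax question aside) it looks like:

```
# Prefer the jars shipped with the job (including the newer Avro) over the
# cluster-provided ones; this is the same command used in the 07:48 PM post above
sqoop import -Dmapreduce.job.user.classpath.first=true \
  --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" \
  --username PATRON --password xxxx \
  --as-avrodatafile --compression-codec snappy \
  --incremental append --check-column PUR_TRANS_DATE \
  --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID \
  --target-dir /sqoop-avro
```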
03-23-2018
03:37 PM
It's working now. It brought all 1000 rows into one ORC file. I will increase the source data and see if it creates more smaller files. Thanks a lot for your help.
03-23-2018
02:39 PM
flow-diag.jpg qdb-properties.jpg qdb-schedule.jpg Hi Shu, I made the modifications you suggested: I changed the PutHDFS relationships so that only failure loops back and success is auto-terminated, and I added Maximum-value Columns = pur_trans_date on the QueryDatabaseTable processor. But I must be doing something wrong, because when I run the flow it creates just one ORC file and then stops. Attaching the flow and processor details, please see.
03-22-2018
09:51 PM
OK, I have this flow working: it reads the table from the database and creates ORC files in HDFS. It is running now and I can see the number of files increasing; for a 1000-row table it has created 44 ORC files so far. How do I know when the process will stop? Can I know how many files will be created for my table? Will the QueryDatabaseTable processor stop once all 1000 table rows have been read? capture.jpg
03-22-2018
07:02 PM
I am using NiFi 1.2. If you look at the flow, it is not showing any value for "in" on the PutHiveStreaming processor. Why, even though at this point I can see 29 records in the database? The inbound queue count has increased to 10, but no more records have been added to the database yet (I know they will be later). Both the QueryDatabaseTable and PutHiveStreaming processors have their run schedule set to 30 seconds. How can you explain this behavior? (Please see attached.) capture.jpg