Created 03-27-2018 06:41 PM
I have imported table data as Avro files using Sqoop. Now I want to map an external table to it, but it's not working.
This command gives me binary output:
CREATE EXTERNAL TABLE IF NOT EXISTS sqoop_text (
  ACCT_NUM STRING, PUR_ID STRING, PUR_DET_ID STRING, PRODUCT_PUR_PRODUCT_CODE STRING,
  PROD_AMT STRING, PUR_TRANS_DATE STRING, ACCTTYPE_ACCT_TYPE_CODE STRING,
  ACCTSTAT_ACCT_STATUS_CODE STRING, EMP_EMP_CODE STRING, PLAZA_PLAZA_ID STRING,
  PURSTAT_PUR_STATUS_CODE STRING)
LOCATION '/sqoop-avro';
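Binary output here is expected: the files under /sqoop-avro are Avro containers, not text, so a text-backed external table cannot decode them. One quick way to confirm, as a sketch (Avro data files begin with the magic bytes "Obj"; the part file name is taken from later in this thread):

# print the first four bytes of one data file; an Avro container starts with 'O' 'b' 'j' 0x01
hdfs dfs -cat /sqoop-avro/part-m-00000.avro | head -c 4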
Created 03-27-2018 07:22 PM
You are creating the table wrong!
There are two types of files when we talk about Avro: the data files (*.avro) and the schema files (*.avsc).
When you do a Sqoop import, you can find your avsc files on your machine somewhere, probably in the outdir if you specified one. If you are not able to spot the avsc files, follow these steps to extract the avsc file from the Avro data and then create the table using it.
# Take a few lines from your avro file
hdfs dfs -cat <your avro file name> | head --bytes 10K > SAMPLE_FILE
# Extract the avro schema from your avro data file
java -jar $AVRO_TOOLS_PATH/avro-tools-1.7.7.jar getschema SAMPLE_FILE > AVRO_SCHEMA_FILE
# Upload the schema to hdfs
hdfs dfs -put AVRO_SCHEMA_FILE $AVRO_SCHEMA_DIR
-- Create the hive table using the avro schema
CREATE EXTERNAL TABLE sample_table
STORED AS AVRO
LOCATION 'hdfs:///user/hive/'
TBLPROPERTIES ('avro.schema.url'='<your avro schema path here>');
Refer to the AvroSerDe documentation for more details.
PS - If you already have the avro schema files, you can skip all the schema-extraction steps and simply use the last step to create your table.
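Once the table is created, a quick sanity check (a sketch; sample_table is the placeholder name from the steps above):

-- confirm the columns were derived from the avsc, then spot-check a few rows
DESCRIBE sample_table;
SELECT * FROM sample_table LIMIT 5;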
Created 03-27-2018 07:01 PM
If I add "STORED AS AVRO" then I get all NULLs:
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
Time taken: 0.303 seconds, Fetched: 29999 row(s)
Created 03-27-2018 07:48 PM
I used the Sqoop command with the --outdir option, which produced the avsc file along with the data files, but I am not clear on what to do next to create the Hive external table. In the outdir a Java file was also produced.
[hdfs@hadoop1 ~]$ hdfs dfs -ls /sqoop-avro
Found 4 items
-rw-r--r--   3 hdfs hdfs  28808 2018-03-27 15:40 /sqoop-avro/part-m-00000.avro
-rw-r--r--   3 hdfs hdfs   3127 2018-03-27 15:42 /sqoop-avro/part-m-00001.avro
-rw-r--r--   3 hdfs hdfs   3474 2018-03-27 15:42 /sqoop-avro/part-m-00002.avro
-rw-r--r--   3 hdfs hdfs 682403 2018-03-27 15:43 /sqoop-avro/part-m-00003.avro
[hdfs@hadoop1 ~]$ ls -ltr /tmp/sqoop
total 40
-rw-r--r-- 1 hdfs hadoop 34553 Mar 27 15:39 PATRON_TAB4.java
-rw-r--r-- 1 hdfs hadoop  1838 Mar 27 15:39 PATRON_TAB4.avsc
[hdfs@hadoop1 ~]$ sqoop import -Dmapreduce.job.user.classpath.first=true --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" --username PATRON --password xxxx --as-avrodatafile --incremental append --check-column PUR_TRANS_DATE --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID --compression-codec snappy --target-dir /sqoop-avro --outdir /tmp/sqoop
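The missing step is getting the avsc file onto HDFS: avro.schema.url is resolved against the cluster's default filesystem, so a path under your local /tmp generally won't be found. A minimal sketch, where the target directory /user/hive/schemas is an assumption:

# upload the schema generated via --outdir to a location Hive can read from every node
hdfs dfs -mkdir -p /user/hive/schemas
hdfs dfs -put /tmp/sqoop/PATRON_TAB4.avsc /user/hive/schemas/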
Created 03-27-2018 07:54 PM
I tried this but it failed:
hive> CREATE EXTERNAL TABLE sqoop-avro
    > STORED AS AVRO
    > LOCATION 'hdfs:///sqoop-avro'
    > TBLPROPERTIES ('avro.schema.url'='/tmp/sqoop/PATRON_TAB4.avsc');
NoViableAltException(306@[202:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
    at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
    at org.antlr.runtime.DFA.predict(DFA.java:116)
    at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:4942)
    at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:49834)
    at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:6431)
    at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:4012)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1786)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1152)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:438)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1265)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1151)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
FAILED: ParseException line 1:27 cannot recognize input near 'sqoop' '-' 'avro' in table name
hive> quit;
[hdfs@hadoop1 ~]$ hdfs dfs -ls 'hdfs:///sqoop-avro'
Found 4 items
-rw-r--r--   3 hdfs hdfs  28808 2018-03-27 15:40 hdfs:///sqoop-avro/part-m-00000.avro
-rw-r--r--   3 hdfs hdfs   3127 2018-03-27 15:42 hdfs:///sqoop-avro/part-m-00001.avro
-rw-r--r--   3 hdfs hdfs   3474 2018-03-27 15:42 hdfs:///sqoop-avro/part-m-00002.avro
-rw-r--r--   3 hdfs hdfs 682403 2018-03-27 15:43 hdfs:///sqoop-avro/part-m-00003.avro
[hdfs@hadoop1 ~]$
Created 03-27-2018 08:15 PM
OK, I found the issue with my statement: I was using "-" in the table name, which Hive didn't like.
Thanks for your help.
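For anyone hitting the same parse error: Hive's parser rejects "-" in a bare table name (as the ParseException above shows), so use an underscore instead. A sketch of the working pattern, where the table name and HDFS schema path are assumptions based on this thread:

CREATE EXTERNAL TABLE sqoop_avro
STORED AS AVRO
LOCATION 'hdfs:///sqoop-avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hive/schemas/PATRON_TAB4.avsc');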