hive external table pointing to AVRO files

Super Collaborator

I have imported table data as Avro files using Sqoop. Now I want to map an external table to it, but it's not working.

This command creates the table, but querying it gives me binary output:

CREATE EXTERNAL TABLE IF NOT EXISTS sqoop_text (ACCT_NUM STRING, PUR_ID STRING, PUR_DET_ID STRING,
PRODUCT_PUR_PRODUCT_CODE STRING, PROD_AMT STRING, PUR_TRANS_DATE STRING, ACCTTYPE_ACCT_TYPE_CODE STRING,
ACCTSTAT_ACCT_STATUS_CODE STRING, EMP_EMP_CODE STRING, PLAZA_PLAZA_ID STRING, PURSTAT_PUR_STATUS_CODE STRING)
LOCATION '/sqoop-avro';
1 ACCEPTED SOLUTION

@Sami Ahmad

You are creating the table wrong!

There are two types of files when we talk about Avro:

  • .avro files - the data files
  • .avsc files - Avro schema files (a minimal example is sketched below)

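For reference, an .avsc file is plain JSON describing a record. A minimal sketch with two hypothetical fields:

{
  "type": "record",
  "name": "sqoop_import",
  "fields": [
    {"name": "ACCT_NUM", "type": ["null", "string"], "default": null},
    {"name": "PUR_ID",   "type": ["null", "string"], "default": null}
  ]
}
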
When you do a Sqoop import, you can find your .avsc files somewhere on your local machine, probably in the outdir if you specified one. If you cannot spot the .avsc files, follow the steps below to extract the schema from the Avro data and then create the table using it.

# Take a few lines from your avro file
hdfs dfs -cat <your avro file name> | head --bytes 10K > SAMPLE_FILE

# Extract the avro schema from your avro data file
java -jar $AVRO_TOOLS_PATH/avro-tools-1.7.7.jar getschema SAMPLE_FILE > AVRO_SCHEMA_FILE

# Upload the schema to hdfs
hdfs dfs -put AVRO_SCHEMA_FILE $AVRO_SCHEMA_DIR

-- Create the hive table using the avro schema
CREATE EXTERNAL TABLE sample_table
STORED AS AVRO
LOCATION 'hdfs:///user/hive/'
TBLPROPERTIES ('avro.schema.url'='<your avro schema path here>');
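
Once created, a quick sanity check (table name as above) confirms that the columns were derived from the schema rather than declared by hand:

DESCRIBE sample_table;
SELECT * FROM sample_table LIMIT 5;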

Refer to the AvroSerDe documentation for more details.

PS - If you already have the Avro schema files, you can skip the schema extraction steps and simply use the last step to create your table.
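
As an aside, a small schema can also be embedded inline via the avro.schema.literal table property instead of avro.schema.url; a minimal sketch with a hypothetical one-field schema:

CREATE EXTERNAL TABLE sample_table_inline
STORED AS AVRO
LOCATION 'hdfs:///user/hive/'
TBLPROPERTIES ('avro.schema.literal'='{"type":"record","name":"sample","fields":[{"name":"ACCT_NUM","type":["null","string"],"default":null}]}');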


5 REPLIES

Super Collaborator

If I add "STORED AS AVRO" then I get all NULLs:

NULL NULL NULL NULL  NULL NULL  NULL NULL  NULL NULL    NULL
NULL NULL NULL NULL  NULL NULL  NULL NULL  NULL NULL    NULL
Time taken: 0.303 seconds, Fetched: 29999 row(s)
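
All-NULL rows with an explicitly declared column list usually mean the declared column names do not match the field names inside the Avro files, since the Avro SerDe resolves columns by name. One way to inspect the actual field names, assuming the avro-tools jar is available locally (file name taken from the listing later in this thread):

# Pull one data file down and print the schema embedded in it
hdfs dfs -get /sqoop-avro/part-m-00000.avro .
java -jar $AVRO_TOOLS_PATH/avro-tools-1.7.7.jar getschema part-m-00000.avro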

Super Collaborator

I used the Sqoop command with the --outdir qualifier, which produced the .avsc file alongside the data files, but I am not clear on what to do next to create the Hive external table. A .java file was also produced in the outdir.

[hdfs@hadoop1 ~]$ hdfs dfs -ls /sqoop-avro
Found 4 items
-rw-r--r--   3 hdfs hdfs      28808 2018-03-27 15:40 /sqoop-avro/part-m-00000.avro
-rw-r--r--   3 hdfs hdfs       3127 2018-03-27 15:42 /sqoop-avro/part-m-00001.avro
-rw-r--r--   3 hdfs hdfs       3474 2018-03-27 15:42 /sqoop-avro/part-m-00002.avro
-rw-r--r--   3 hdfs hdfs     682403 2018-03-27 15:43 /sqoop-avro/part-m-00003.avro
[hdfs@hadoop1 ~]$
[hdfs@hadoop1 ~]$ ls -ltr /tmp/sqoop
total 40
-rw-r--r-- 1 hdfs hadoop 34553 Mar 27 15:39 PATRON_TAB4.java
-rw-r--r-- 1 hdfs hadoop  1838 Mar 27 15:39 PATRON_TAB4.avsc
[hdfs@hadoop1 ~]$
[hdfs@hadoop1 ~]$ sqoop import -Dmapreduce.job.user.classpath.first=true \
    --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" \
    --username PATRON --password xxxx \
    --as-avrodatafile --compression-codec snappy \
    --incremental append --check-column PUR_TRANS_DATE \
    --table PATRON.TAB4 --split-by TAB4.PUR_DET_ID \
    --target-dir /sqoop-avro --outdir /tmp/sqoop
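
Given that listing, the missing piece is making the generated .avsc visible to Hive, e.g. by copying it from the local outdir to HDFS (the target directory below is an assumption):

# Copy the Sqoop-generated schema from the local outdir to HDFS
hdfs dfs -mkdir -p /user/hive/schemas
hdfs dfs -put /tmp/sqoop/PATRON_TAB4.avsc /user/hive/schemas/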

Super Collaborator

I tried this, but it failed:

hive>
    >
    > CREATE EXTERNAL TABLE sqoop-avro
    > STORED AS AVRO
    > LOCATION 'hdfs:///sqoop-avro'
    > TBLPROPERTIES ('avro.schema.url'='/tmp/sqoop/PATRON_TAB4.avsc');
NoViableAltException(306@[202:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
  at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
  at org.antlr.runtime.DFA.predict(DFA.java:116)
  at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:4942)
  at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:49834)
  at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:6431)
  at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:4012)
  at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1786)
  at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1152)
  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211)
  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:438)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1265)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1151)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
FAILED: ParseException line 1:27 cannot recognize input near 'sqoop' '-' 'avro' in table name
hive> quit
    > ;
[hdfs@hadoop1 ~]$ hdfs dfs -ls 'hdfs:///sqoop-avro'
Found 4 items
-rw-r--r--   3 hdfs hdfs      28808 2018-03-27 15:40 hdfs:///sqoop-avro/part-m-00000.avro
-rw-r--r--   3 hdfs hdfs       3127 2018-03-27 15:42 hdfs:///sqoop-avro/part-m-00001.avro
-rw-r--r--   3 hdfs hdfs       3474 2018-03-27 15:42 hdfs:///sqoop-avro/part-m-00002.avro
-rw-r--r--   3 hdfs hdfs     682403 2018-03-27 15:43 hdfs:///sqoop-avro/part-m-00003.avro
[hdfs@hadoop1 ~]$

Super Collaborator

OK, I found the issue with my statement: I was using "-" in the table name, which Hive didn't like.

Thanks for your help!
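
For reference, a corrected version of the failing statement: hyphens are not valid in unquoted Hive table names, so an underscore is used instead. Note also that the original avro.schema.url pointed at a local /tmp path, which Hive would typically try to resolve on the default filesystem; the HDFS path below assumes the schema was copied there as sketched earlier.

CREATE EXTERNAL TABLE sqoop_avro
STORED AS AVRO
LOCATION 'hdfs:///sqoop-avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hive/schemas/PATRON_TAB4.avsc');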