
Issue moving sqoop data from HDFS to Phoenix

Contributor

@Sandeep Nemuri

I recently moved data into HDFS using this command:

sqoop import --connect "jdbc:sqlserver://my_server:1433;database=APS_AMI" --username boop -P --table "my_table" --target-dir "hdfs://my_server/test/300M.csv" --split-by "pk" --m 12;

Now I'm trying to move it into Phoenix and I'm getting this error:

Error: java.lang.RuntimeException: org.apache.phoenix.schema.IllegalDataException: java.sql.SQLException: ERROR 201 (22000): Illegal data.
...
Caused by: org.apache.phoenix.schema.IllegalDataException: java.sql.SQLException: ERROR 201 (22000): Illegal data.
...
Caused by: java.lang.NumberFormatException: For input string: "\N"

It looks like Phoenix is having some issues parsing the HDFS csv. Do you know how I can fix it?

FOR REFERENCE

The Phoenix schema I'm using is:

CREATE IMMUTABLE TABLE my_table (
pk VARCHAR(50), 
id CHAR(10),
height_value FLOAT, 
read_date INTEGER,
read_time SMALLINT, 
units CHAR(1), 
is_estimate CHAR(1), 
UTC_offset SMALLINT, 
Match_flag CHAR(1)
CONSTRAINT pk PRIMARY KEY (id))
IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS,
COLUMN_ENCODED_BYTES = 1;

The columns, already loaded in Hive, are:

| table.pk  | table.id  | table.height_value  | table.read_date  | table.read_time  | table.units  | table.is_estimate  | table.utc_offset  | table.match_flag |

This is the command I used to move the data into Phoenix:

HADOOP_CLASSPATH=/usr/hdp/current/hbase-master/lib/hbase-protocol.jar:/usr/hdp/current/hbase-master/conf hadoop jar /usr/hdp/current/phoenix-client/phoenix-5.0.0.3.0.0.0-1634-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table my_table --input /test/300M.csv
4 Replies

@Daniel Zafar

How does your data look in the CSV file? Would it be possible to paste a couple of lines here?

Contributor
@Sandeep Nemuri

Thanks for your response. Here is the head of one of the blocks from HDFS:

$ head C://Users/dzafar/Downloads/000000_0
75000001,2220901556,1.2,20180602,2100,n,A,-700,n
75000002,4631103346,2.13,20180602,2100,n,A,-700,n
75000003,810079025,8.26,20180602,2100,n,A,-700,n
75000004,4991209767,3.95,20180602,2100,n,A,-700,n
75000005,4161180101,0.48,20180602,2100,n,A,-700,n
75000006,8450216737,1.14,20180602,2100,n,A,-700,n
75000007,4170587823,1.86,20180602,2100,n,A,-700,n
75000008,4920666845,10.39,20180602,2100,n,A,-700,n
75000009,840684899,0.29,20180602,2100,n,A,-700,n
75000010,3180799190,2.66,20180602,2100,n,A,-700,n

Contributor
@Sandeep Nemuri

I think this may be because of the way sqoop writes null values from the RDBMS into HDFS. I was seeing some "null" strings in the FLOAT column, so I set --null-non-string to "" in sqoop and am experimenting to see if Phoenix still gets held up. I suspect the "\N" issue is related: sqoop may be writing these values in place of the float values the parser is looking for.

But it's confusing because there's no obvious reason why it would write "\N" in the float column.
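
Concretely, the re-import I'm experimenting with looks something like this (same connection details as my original command; adding --null-string as well is my own guess, since the CHAR/VARCHAR columns could contain nulls too):

sqoop import --connect "jdbc:sqlserver://my_server:1433;database=APS_AMI" \
  --username boop -P \
  --table "my_table" \
  --target-dir "hdfs://my_server/test/300M.csv" \
  --split-by "pk" --num-mappers 12 \
  --null-string '' \
  --null-non-string ''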

If I use the -g,--ignore-errors option, what happens?
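
I.e., something like this (the same bulk-load command as above with the flag appended; I haven't verified this invocation):

HADOOP_CLASSPATH=/usr/hdp/current/hbase-master/lib/hbase-protocol.jar:/usr/hdp/current/hbase-master/conf \
hadoop jar /usr/hdp/current/phoenix-client/phoenix-5.0.0.3.0.0.0-1634-client.jar \
org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table my_table --input /test/300M.csv -g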

@Daniel Zafar

You are right, '\N' is because of the null values in the source data. We'd need to adjust the --null-non-string value accordingly. Also, I don't see a -g option in sqoop import; in any case, this won't be considered an error during the import.
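
As a quick sanity check after re-running the import, you could count how many lines in the exported files still contain \N (directory and part-file names assumed from your earlier --target-dir; adjust to whatever sqoop actually produced):

hdfs dfs -cat /test/300M.csv/part-m-* | grep -c '\\N'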