I recently moved data into HDFS using this command:
sqoop import \
  --connect "jdbc:sqlserver://my_server:1433;database=APS_AMI" \
  --username boop -P \
  --table "my_table" \
  --target-dir "hdfs://my_server/test/300M.csv" \
  --split-by "pk" \
  -m 12
Now I'm trying to move into Phoenix and getting this error:
Error: java.lang.RuntimeException: org.apache.phoenix.schema.IllegalDataException: java.sql.SQLException: ERROR 201 (22000): Illegal data. ... Caused by: org.apache.phoenix.schema.IllegalDataException: java.sql.SQLException: ERROR 201 (22000): Illegal data. ... Caused by: java.lang.NumberFormatException: For input string: "\N"
It looks like Phoenix is having some issues parsing the HDFS csv. Do you know how I can fix it?
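For context, Sqoop's default marker for SQL NULL in plain-text output is the literal two-character string \N, which no numeric parser will accept; that is exactly the string in the NumberFormatException above. A quick local check (hypothetical file path, one made-up row) shows how to spot those markers in a CSV extract:

```shell
# Write one sample row that uses Sqoop's default null marker \N
# in the FLOAT column (hypothetical local file, not the real HDFS block).
printf '75000001,2220901556,\\N,20180602,2100,n,A,-700,n\n' > /tmp/sample.csv

# Count lines carrying a literal \N; any hit will make Phoenix's CSV
# loader throw NumberFormatException when the field maps to a numeric column.
grep -c '\\N' /tmp/sample.csv
```

Against the real data you would run the same grep over `hdfs dfs -cat /test/300M.csv/part-m-*` output.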
The Phoenix schema I'm using is:
CREATE IMMUTABLE TABLE my_table (
    pk VARCHAR(50),
    id CHAR(10),
    height_value FLOAT,
    read_date INTEGER,
    read_time SMALLINT,
    units CHAR(1),
    is_estimate CHAR(1),
    UTC_offset SMALLINT,
    Match_flag CHAR(1)
    CONSTRAINT pk PRIMARY KEY (id)
)
IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS,
COLUMN_ENCODED_BYTES = 1;
The columns, already loaded in Hive, are:
| table.pk | table.id | table.height_value | table.read_date | table.read_time | table.units | table.is_estimate | table.utc_offset | table.match_flag |
This is the command I used to move the data into Phoenix:
HADOOP_CLASSPATH=/usr/hdp/current/hbase-master/lib/hbase-protocol.jar:/usr/hdp/current/hbase-master/conf \
hadoop jar /usr/hdp/current/phoenix-client/phoenix-184.108.40.206.0.0.0-1634-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table my_table \
  --input /test/300M.cs
Thanks for your response, here is the head of one of the blocks on HDFS:
$ head C://Users/dzafar/Downloads/000000_0
75000001,2220901556,1.2,20180602,2100,n,A,-700,n
75000002,4631103346,2.13,20180602,2100,n,A,-700,n
75000003,810079025,8.26,20180602,2100,n,A,-700,n
75000004,4991209767,3.95,20180602,2100,n,A,-700,n
75000005,4161180101,0.48,20180602,2100,n,A,-700,n
75000006,8450216737,1.14,20180602,2100,n,A,-700,n
75000007,4170587823,1.86,20180602,2100,n,A,-700,n
75000008,4920666845,10.39,20180602,2100,n,A,-700,n
75000009,840684899,0.29,20180602,2100,n,A,-700,n
75000010,3180799190,2.66,20180602,2100,n,A,-700,n
I think this may be caused by the way Sqoop writes null values from the RDBMS into HDFS. I was seeing some "null" strings in the FLOAT column, so I set --null-non-string to "" in Sqoop and am experimenting to see whether Phoenix still gets held up. I suspect the "\N" issue is related: Sqoop may be writing these markers in place of the float values the parser expects.
But it's confusing, because there is no obvious reason it would write "\N" in the float column.
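A sketch of the re-import with both null handlers set to empty strings, assuming the same connection details as the original import (--null-string covers the VARCHAR/CHAR columns, --null-non-string the numeric ones; this talks to the live database, so it is illustrative only):

```shell
# Re-run the import, writing empty fields instead of \N for SQL NULLs.
# --null-string     : replacement for NULL in string columns
# --null-non-string : replacement for NULL in non-string (e.g. FLOAT) columns
sqoop import \
  --connect "jdbc:sqlserver://my_server:1433;database=APS_AMI" \
  --username boop -P \
  --table "my_table" \
  --target-dir "hdfs://my_server/test/300M.csv" \
  --split-by "pk" \
  --null-string '' \
  --null-non-string '' \
  -m 12
```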
If I use the -g,--ignore-errors option, what happens?
You are right: '\N' comes from the null values in the source data, so you'd need to adjust --null-non-string accordingly. Also, I don't see a -g option in sqoop import; in any case, these null markers are not treated as errors during the import itself.
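If re-running the import is expensive, the files already landed on HDFS could instead be scrubbed before the bulk load. A minimal local sketch (hypothetical file names; against the real data you would stream through hdfs dfs -cat and write back) that rewrites \N markers as empty fields, which Phoenix's CSV loader should accept as NULL rather than failing to parse:

```shell
# Two hypothetical rows with the \N null marker in numeric positions.
printf 'a,\\N,1.2\nb,3.4,\\N\n' > /tmp/in.csv

# Replace every literal \N with an empty field so numeric columns
# receive "" (treated as NULL) instead of an unparseable string.
sed 's/\\N//g' /tmp/in.csv > /tmp/out.csv

cat /tmp/out.csv
```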