Created 12-23-2016 10:23 AM
Hi,
I am attaching a sample CSV that I am trying to load into HBase using importtsv. The command completes successfully, but I can't see any records in the table.
1) CSV file name: warehouse.dat (comma-separated). The first column in the CSV is the unique key.
2) HBase table: create 'warehouse','mycf'
3) importtsv command:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,mycf:w_warehouse_id,mycf:w_warehouse_name,mycf:w_warehouse_sq_ft,mycf:w_street_number,mycf:w_street_name,mycf:w_street_type,mycf:w_suite_number,mycf:w_city,mycf:w_county,mycf:w_state,mycf:w_zip,mycf:w_country,mycf:w_gmt_offset" warehouse /user/tcs_ge_user/warehouse/warehouse.csv
4) Scanning the table shows the output below:
hbase(main):027:0> scan 'warehouse'
ROW    COLUMN+CELL
0 row(s) in 0.0210 seconds
I need urgent help with this.
Created 12-23-2016 07:08 PM
I used your warehouse.csv and your load command. While it does appear to finish successfully, this is what I see at the end:
ImportTsv
    Bad Lines=5
File Input Format Counters
    Bytes Read=590
File Output Format Counters
    Bytes Written=0
As you can see, there were 5 bad lines, which is the total line count in your file. That means the job ran, but every line was rejected, so the problem is in the data rather than in the command.
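It took a little bit of effort to find the issue, but the problem is that your CSV file has an extra comma at the end of every line. The trailing comma adds an empty 15th field to each line, while -Dimporttsv.columns defines only 14 columns, so ImportTsv counts each line as bad and skips it.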
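A field-count check like this can be scripted before running the import. The following is a minimal sketch in Python, not part of ImportTsv itself: the function name find_mismatched_lines is hypothetical, and it assumes fields are split on the separator with no quoting. It reports any line whose field count differs from the column list in the command above.

```python
# Column spec copied from the importtsv command in the question.
COLUMNS = (
    "HBASE_ROW_KEY,mycf:w_warehouse_id,mycf:w_warehouse_name,"
    "mycf:w_warehouse_sq_ft,mycf:w_street_number,mycf:w_street_name,"
    "mycf:w_street_type,mycf:w_suite_number,mycf:w_city,mycf:w_county,"
    "mycf:w_state,mycf:w_zip,mycf:w_country,mycf:w_gmt_offset"
)
EXPECTED = len(COLUMNS.split(","))  # 14 columns, including the row key


def find_mismatched_lines(lines, expected=EXPECTED, sep=","):
    """Return (line_number, field_count) for every line whose number of
    separator-delimited fields differs from `expected`.

    Hypothetical helper: a plain split with no quoting rules (assumption).
    """
    mismatches = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(sep)
        if len(fields) != expected:
            mismatches.append((number, len(fields)))
    return mismatches


if __name__ == "__main__":
    good = (
        "1,AAAAAAAABAAAAAAA,Conventional childr,977787,651,6th ,Parkway,"
        "Suite 470,Fairview,Williamson County,TN,35709,United States,-5"
    )
    bad = good + ","  # same line with the trailing comma from the file
    print(find_mismatched_lines([bad, good]))  # prints [(1, 15)]
```

To check a local copy of the file, the same function can be called on an open file object, for example find_mismatched_lines(open("warehouse.csv")).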
Here is an example from your file:
1,AAAAAAAABAAAAAAA,Conventional childr,977787,651,6th ,Parkway,Suite 470,Fairview,Williamson County,TN,35709,United States,-5,
It should look like this:
1,AAAAAAAABAAAAAAA,Conventional childr,977787,651,6th ,Parkway,Suite 470,Fairview,Williamson County,TN,35709,United States,-5
Notice that I removed the trailing comma. After doing that for every line, the data loaded:
ImportTsv
    Bad Lines=0
File Input Format Counters
    Bytes Read=585
File Output Format Counters
    Bytes Written=0
And here is the scan:
hbase(main):011:0> scan 'warehouse'
ROW    COLUMN+CELL
 1     column=mycf:w_city, timestamp=1482521720833, value=Fairview
 1     column=mycf:w_country, timestamp=1482521720833, value=United States
 1     column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 1     column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 1     column=mycf:w_state, timestamp=1482521720833, value=TN
 1     column=mycf:w_street_name, timestamp=1482521720833, value=6th
 1     column=mycf:w_street_number, timestamp=1482521720833, value=651
 1     column=mycf:w_street_type, timestamp=1482521720833, value=Parkway
 1     column=mycf:w_suite_number, timestamp=1482521720833, value=Suite 470
 1     column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAABAAAAAAA
 1     column=mycf:w_warehouse_name, timestamp=1482521720833, value=Conventional childr
 1     column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=977787
 1     column=mycf:w_zip, timestamp=1482521720833, value=35709
 2     column=mycf:w_city, timestamp=1482521720833, value=Fairview
 2     column=mycf:w_country, timestamp=1482521720833, value=United States
 2     column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 2     column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 2     column=mycf:w_state, timestamp=1482521720833, value=TN
 2     column=mycf:w_street_name, timestamp=1482521720833, value=View First
 2     column=mycf:w_street_number, timestamp=1482521720833, value=600
 2     column=mycf:w_street_type, timestamp=1482521720833, value=Avenue
 2     column=mycf:w_suite_number, timestamp=1482521720833, value=Suite P
 2     column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAACAAAAAAA
 2     column=mycf:w_warehouse_name, timestamp=1482521720833, value=Important issues liv
 2     column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=138504
 2     column=mycf:w_zip, timestamp=1482521720833, value=35709
 3     column=mycf:w_city, timestamp=1482521720833, value=Fairview
 3     column=mycf:w_country, timestamp=1482521720833, value=United States
 3     column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 3     column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 3     column=mycf:w_state, timestamp=1482521720833, value=TN
 3     column=mycf:w_street_name, timestamp=1482521720833, value=Ash Laurel
 3     column=mycf:w_street_number, timestamp=1482521720833, value=534
 3     column=mycf:w_street_type, timestamp=1482521720833, value=Dr.
 3     column=mycf:w_suite_number, timestamp=1482521720833, value=Suite 0
 3     column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAADAAAAAAA
 3     column=mycf:w_warehouse_name, timestamp=1482521720833, value=Doors canno
 3     column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=294242
 3     column=mycf:w_zip, timestamp=1482521720833, value=35709
 4     column=mycf:w_city, timestamp=1482521720833, value=Fairview
 4     column=mycf:w_country, timestamp=1482521720833, value=United States
 4     column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 4     column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 4     column=mycf:w_state, timestamp=1482521720833, value=TN
 4     column=mycf:w_street_name, timestamp=1482521720833, value=Wilson Elm
 4     column=mycf:w_street_number, timestamp=1482521720833, value=368
 4     column=mycf:w_street_type, timestamp=1482521720833, value=Drive
 4     column=mycf:w_suite_number, timestamp=1482521720833, value=Suite 80
 4     column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAAEAAAAAAA
 4     column=mycf:w_warehouse_name, timestamp=1482521720833, value=Bad cards must make.
 4     column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=621234
 4     column=mycf:w_zip, timestamp=1482521720833, value=35709
 5     column=mycf:w_city, timestamp=1482521720833, value=Fairview
 5     column=mycf:w_country, timestamp=1482521720833, value=United States
 5     column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 5     column=mycf:w_gmt_offset, timestamp=1482521720833, value=
 5     column=mycf:w_state, timestamp=1482521720833, value=TN
 5     column=mycf:w_street_name, timestamp=1482521720833, value=
 5     column=mycf:w_street_number, timestamp=1482521720833, value=
 5     column=mycf:w_street_type, timestamp=1482521720833, value=
 5     column=mycf:w_suite_number, timestamp=1482521720833, value=
 5     column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAAFAAAAAAA
 5     column=mycf:w_warehouse_name, timestamp=1482521720833, value=
 5     column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=
 5     column=mycf:w_zip, timestamp=1482521720833, value=35709
5 row(s) in 0.3110 seconds
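For a larger file, removing the trailing comma by hand is not practical. The following is a minimal sketch in Python of one way to do it on a local copy of the file before uploading it to HDFS. The function names strip_trailing_separator and clean_file are hypothetical, and the sketch assumes every line carries exactly one unwanted trailing comma, as in the sample above.

```python
def strip_trailing_separator(line, sep=","):
    """Remove a single trailing separator from `line`, keeping its line ending.

    Only one separator is removed, so a row whose last real column is empty
    (ending in two separators) keeps that empty field.
    """
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve "\n" or "\r\n" as found
    if body.endswith(sep):
        body = body[: -len(sep)]
    return body + ending


def clean_file(src_path, dst_path, sep=","):
    """Copy src_path to dst_path, stripping one trailing separator per line.

    Both paths are local files chosen by the caller (hypothetical names).
    """
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        for line in src:
            dst.write(strip_trailing_separator(line, sep))


if __name__ == "__main__":
    print(strip_trailing_separator("TN,35709,United States,-5,"))
    # prints TN,35709,United States,-5
```

After cleaning, the new file would replace the original in HDFS and the same importtsv command can be run again.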
Created 12-23-2016 10:26 AM
Sorry, the CSV file name is warehouse.csv.