No data shown in HBase after importtsv

Expert Contributor

Hi,

I am attaching a sample CSV that I am trying to load into HBase using importtsv. The command executes successfully; however, I can't see any records in the table.

1) CSV file name: warehouse.dat (comma-separated). The first column in the CSV is the unique key.

2) HBase table : create 'warehouse','mycf'

3) importtsv command : hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,mycf:w_warehouse_id,mycf:w_warehouse_name,mycf:w_warehouse_sq_ft,mycf:w_street_number,mycf:w_street_name,mycf:w_street_type,mycf:w_suite_number,mycf:w_city,mycf:w_county,mycf:w_state,mycf:w_zip,mycf:w_country,mycf:w_gmt_offset" warehouse /user/tcs_ge_user/warehouse/warehouse.csv

4) Scanning the table shows the output below.

hbase(main):027:0> scan 'warehouse'
ROW                              COLUMN+CELL
0 row(s) in 0.0210 seconds

Need your URGENT help on this.

1 ACCEPTED SOLUTION

Super Guru

@rajdip chaudhuri

I used your warehouse.csv and your load command. While it does appear to finish successfully, this is what I see at the end:

    ImportTsv
        Bad Lines=5
    File Input Format Counters
        Bytes Read=590
    File Output Format Counters
        Bytes Written=0

As you can see, there were 5 bad lines, which is every line in your file. That means the command ran, but ImportTsv rejected the data instead of loading it.
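A quick way to catch this kind of silent failure is to check the Bad Lines counter after each run. The following is a minimal sketch, assuming the job output was saved to a file (the name importtsv.log and both function names are hypothetical, not part of HBase):

```shell
#!/bin/sh
# Print the value of the "Bad Lines" counter found in saved ImportTsv output.
# Prints nothing if the counter is absent.
bad_lines_count() {
    sed -n 's/^[[:space:]]*Bad Lines=\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Report whether the saved job output shows rejected lines.
# Returns 0 when no bad lines were reported, 1 when some were,
# and 2 when the counter is missing from the file.
check_import_log() {
    count=$(bad_lines_count "$1")
    if [ -z "$count" ]; then
        echo "no Bad Lines counter found"
        return 2
    fi
    if [ "$count" -gt 0 ]; then
        echo "import rejected $count line(s)"
        return 1
    fi
    echo "no bad lines reported"
    return 0
}

# Hypothetical usage, assuming the job output was saved to importtsv.log:
#   check_import_log importtsv.log
```

With the counters shown above, this would report 5 rejected lines even though the job itself finished without an error.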

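It took a little bit of effort to find the issue, but the problem was that your CSV file has an extra comma at the end of each line. Your -Dimporttsv.columns setting lists 14 columns (the row key plus 13 qualifiers), but the trailing comma gives every line a 15th, empty field, so no line matches the column list.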
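To spot a mismatch like this before running the import, you can compare the number of fields per line with the number of entries in the column list. This is only a sketch, assuming a plain CSV with no quoted commas inside values; the function names are made up for illustration:

```shell
#!/bin/sh
# Print "<field count> <number of lines>" for each distinct field count in a CSV.
# A trailing comma counts as one extra, empty field.
field_count_histogram() {
    awk -F',' '{ counts[NF]++ } END { for (n in counts) print n, counts[n] }' "$1" | sort -n
}

# Print how many columns an importtsv.columns value declares.
expected_columns() {
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Hypothetical usage against a local copy of the file:
#   field_count_histogram warehouse.csv
#   expected_columns "HBASE_ROW_KEY,mycf:w_warehouse_id,mycf:w_warehouse_name"
```

If the histogram shows a field count different from what the column list declares, those lines are likely to be counted as bad.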

Here is an example from your file:

1,AAAAAAAABAAAAAAA,Conventional childr,977787,651,6th ,Parkway,Suite 470,Fairview,Williamson County,TN,35709,United States,-5,

It should look like this:

1,AAAAAAAABAAAAAAA,Conventional childr,977787,651,6th ,Parkway,Suite 470,Fairview,Williamson County,TN,35709,United States,-5

Notice that I removed the trailing comma. Now the data was loaded:

    ImportTsv
        Bad Lines=0
    File Input Format Counters
        Bytes Read=585
    File Output Format Counters
        Bytes Written=0
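For a larger file, removing the trailing comma by hand is not practical. One possible way to do it, sketched here under the assumption that the file uses Unix line endings and that a final comma is never meaningful data (the function name and the cleaned file name are hypothetical):

```shell
#!/bin/sh
# Write a copy of the input CSV with a single trailing comma removed from each line.
# Lines without a trailing comma are copied unchanged.
strip_trailing_comma() {
    sed 's/,$//' "$1" > "$2"
}

# Hypothetical usage on a local copy, followed by re-uploading to HDFS
# (path taken from the original command):
#   strip_trailing_comma warehouse.csv warehouse_clean.csv
#   hdfs dfs -put -f warehouse_clean.csv /user/tcs_ge_user/warehouse/warehouse.csv
```

After re-uploading, the same ImportTsv command can be run again unchanged.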

And here is the scan:

hbase(main):011:0> scan 'warehouse'
ROW                              COLUMN+CELL
 1                               column=mycf:w_city, timestamp=1482521720833, value=Fairview
 1                               column=mycf:w_country, timestamp=1482521720833, value=United States
 1                               column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 1                               column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 1                               column=mycf:w_state, timestamp=1482521720833, value=TN
 1                               column=mycf:w_street_name, timestamp=1482521720833, value=6th
 1                               column=mycf:w_street_number, timestamp=1482521720833, value=651
 1                               column=mycf:w_street_type, timestamp=1482521720833, value=Parkway
 1                               column=mycf:w_suite_number, timestamp=1482521720833, value=Suite 470
 1                               column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAABAAAAAAA
 1                               column=mycf:w_warehouse_name, timestamp=1482521720833, value=Conventional childr
 1                               column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=977787
 1                               column=mycf:w_zip, timestamp=1482521720833, value=35709
 2                               column=mycf:w_city, timestamp=1482521720833, value=Fairview
 2                               column=mycf:w_country, timestamp=1482521720833, value=United States
 2                               column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 2                               column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 2                               column=mycf:w_state, timestamp=1482521720833, value=TN
 2                               column=mycf:w_street_name, timestamp=1482521720833, value=View First
 2                               column=mycf:w_street_number, timestamp=1482521720833, value=600
 2                               column=mycf:w_street_type, timestamp=1482521720833, value=Avenue
 2                               column=mycf:w_suite_number, timestamp=1482521720833, value=Suite P
 2                               column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAACAAAAAAA
 2                               column=mycf:w_warehouse_name, timestamp=1482521720833, value=Important issues liv
 2                               column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=138504
 2                               column=mycf:w_zip, timestamp=1482521720833, value=35709
 3                               column=mycf:w_city, timestamp=1482521720833, value=Fairview
 3                               column=mycf:w_country, timestamp=1482521720833, value=United States
 3                               column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 3                               column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 3                               column=mycf:w_state, timestamp=1482521720833, value=TN
 3                               column=mycf:w_street_name, timestamp=1482521720833, value=Ash Laurel
 3                               column=mycf:w_street_number, timestamp=1482521720833, value=534
 3                               column=mycf:w_street_type, timestamp=1482521720833, value=Dr.
 3                               column=mycf:w_suite_number, timestamp=1482521720833, value=Suite 0
 3                               column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAADAAAAAAA
 3                               column=mycf:w_warehouse_name, timestamp=1482521720833, value=Doors canno
 3                               column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=294242
 3                               column=mycf:w_zip, timestamp=1482521720833, value=35709
 4                               column=mycf:w_city, timestamp=1482521720833, value=Fairview
 4                               column=mycf:w_country, timestamp=1482521720833, value=United States
 4                               column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 4                               column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 4                               column=mycf:w_state, timestamp=1482521720833, value=TN
 4                               column=mycf:w_street_name, timestamp=1482521720833, value=Wilson Elm
 4                               column=mycf:w_street_number, timestamp=1482521720833, value=368
 4                               column=mycf:w_street_type, timestamp=1482521720833, value=Drive
 4                               column=mycf:w_suite_number, timestamp=1482521720833, value=Suite 80
 4                               column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAAEAAAAAAA
 4                               column=mycf:w_warehouse_name, timestamp=1482521720833, value=Bad cards must make.
 4                               column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=621234
 4                               column=mycf:w_zip, timestamp=1482521720833, value=35709
 5                               column=mycf:w_city, timestamp=1482521720833, value=Fairview
 5                               column=mycf:w_country, timestamp=1482521720833, value=United States
 5                               column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 5                               column=mycf:w_gmt_offset, timestamp=1482521720833, value=
 5                               column=mycf:w_state, timestamp=1482521720833, value=TN
 5                               column=mycf:w_street_name, timestamp=1482521720833, value=
 5                               column=mycf:w_street_number, timestamp=1482521720833, value=
 5                               column=mycf:w_street_type, timestamp=1482521720833, value=
 5                               column=mycf:w_suite_number, timestamp=1482521720833, value=
 5                               column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAAFAAAAAAA
 5                               column=mycf:w_warehouse_name, timestamp=1482521720833, value=
 5                               column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=
 5                               column=mycf:w_zip, timestamp=1482521720833, value=35709
5 row(s) in 0.3110 seconds


2 REPLIES

Expert Contributor

Sorry, the CSV file name is warehouse.csv.

warehouse.zip
