No data shown in HBase after importtsv

Expert Contributor

Hi,

I am attaching a sample CSV that I am trying to load into HBase using importtsv. The command completes successfully, but I can't see any records in the table.

1) CSV file name: warehouse.dat (comma separated). The first column in the CSV is the unique key.

2) HBase table : create 'warehouse','mycf'

3) importtsv command : hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,mycf:w_warehouse_id,mycf:w_warehouse_name,mycf:w_warehouse_sq_ft,mycf:w_street_number,mycf:w_street_name,mycf:w_street_type,mycf:w_suite_number,mycf:w_city,mycf:w_county,mycf:w_state,mycf:w_zip,mycf:w_country,mycf:w_gmt_offset" warehouse /user/tcs_ge_user/warehouse/warehouse.csv

4) Scanning the table shows the output below.

hbase(main):027:0> scan 'warehouse'
ROW                              COLUMN+CELL
0 row(s) in 0.0210 seconds

Need your URGENT help on this.

1 ACCEPTED SOLUTION

Super Guru

@rajdip chaudhuri

I used your warehouse.csv and your load command. While it does appear to finish successfully, this is what I see at the end:

    ImportTsv
        Bad Lines=5
    File Input Format Counters
        Bytes Read=590
    File Output Format Counters
        Bytes Written=0

As you can see, there were 5 bad lines, which is every line in your file. That means the command ran, but every line was rejected because of a problem with the data, so nothing was written to the table.
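One way to narrow down a case like this is to compare each line's field count with the number of entries in -Dimporttsv.columns (14 here: HBASE_ROW_KEY plus 13 mycf columns). The sketch below is only an illustration: it recreates one line from the file in a temporary location, and the variable names are made up for the example.

```shell
# Recreate one line from the thread in a scratch file (the location is illustrative).
sample="$(mktemp)"
printf '%s\n' '1,AAAAAAAABAAAAAAA,Conventional childr,977787,651,6th ,Parkway,Suite 470,Fairview,Williamson County,TN,35709,United States,-5,' > "$sample"

# Number of entries in -Dimporttsv.columns: HBASE_ROW_KEY plus 13 mycf:* columns.
expected=14

# Report every line whose comma-separated field count differs from the expected one.
mismatches="$(awk -F',' -v n="$expected" 'NF != n { print "line " NR ": " NF " fields" }' "$sample")"
echo "$mismatches"
```

Run against the real file instead of the scratch copy, any line this prints has a shape that does not match the column list.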

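It took a little effort to find the issue: every line in your CSV file ends with an extra comma. Your importtsv.columns setting names 14 columns (HBASE_ROW_KEY plus 13 mycf columns), but the trailing comma gives each line a 15th, empty field, so ImportTsv counts the line as bad.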

Here is an example from your file:

1,AAAAAAAABAAAAAAA,Conventional childr,977787,651,6th ,Parkway,Suite 470,Fairview,Williamson County,TN,35709,United States,-5,

It should look like this:

1,AAAAAAAABAAAAAAA,Conventional childr,977787,651,6th ,Parkway,Suite 470,Fairview,Williamson County,TN,35709,United States,-5

Notice that I removed the trailing comma. After doing that for every line, the data loaded:

    ImportTsv
        Bad Lines=0
    File Input Format Counters
        Bytes Read=585
    File Output Format Counters
        Bytes Written=0

And here is the scan:

hbase(main):011:0> scan 'warehouse'
ROW                              COLUMN+CELL
 1                               column=mycf:w_city, timestamp=1482521720833, value=Fairview
 1                               column=mycf:w_country, timestamp=1482521720833, value=United States
 1                               column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 1                               column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 1                               column=mycf:w_state, timestamp=1482521720833, value=TN
 1                               column=mycf:w_street_name, timestamp=1482521720833, value=6th
 1                               column=mycf:w_street_number, timestamp=1482521720833, value=651
 1                               column=mycf:w_street_type, timestamp=1482521720833, value=Parkway
 1                               column=mycf:w_suite_number, timestamp=1482521720833, value=Suite 470
 1                               column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAABAAAAAAA
 1                               column=mycf:w_warehouse_name, timestamp=1482521720833, value=Conventional childr
 1                               column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=977787
 1                               column=mycf:w_zip, timestamp=1482521720833, value=35709
 2                               column=mycf:w_city, timestamp=1482521720833, value=Fairview
 2                               column=mycf:w_country, timestamp=1482521720833, value=United States
 2                               column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 2                               column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 2                               column=mycf:w_state, timestamp=1482521720833, value=TN
 2                               column=mycf:w_street_name, timestamp=1482521720833, value=View First
 2                               column=mycf:w_street_number, timestamp=1482521720833, value=600
 2                               column=mycf:w_street_type, timestamp=1482521720833, value=Avenue
 2                               column=mycf:w_suite_number, timestamp=1482521720833, value=Suite P
 2                               column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAACAAAAAAA
 2                               column=mycf:w_warehouse_name, timestamp=1482521720833, value=Important issues liv
 2                               column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=138504
 2                               column=mycf:w_zip, timestamp=1482521720833, value=35709
 3                               column=mycf:w_city, timestamp=1482521720833, value=Fairview
 3                               column=mycf:w_country, timestamp=1482521720833, value=United States
 3                               column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 3                               column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 3                               column=mycf:w_state, timestamp=1482521720833, value=TN
 3                               column=mycf:w_street_name, timestamp=1482521720833, value=Ash Laurel
 3                               column=mycf:w_street_number, timestamp=1482521720833, value=534
 3                               column=mycf:w_street_type, timestamp=1482521720833, value=Dr.
 3                               column=mycf:w_suite_number, timestamp=1482521720833, value=Suite 0
 3                               column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAADAAAAAAA
 3                               column=mycf:w_warehouse_name, timestamp=1482521720833, value=Doors canno
 3                               column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=294242
 3                               column=mycf:w_zip, timestamp=1482521720833, value=35709
 4                               column=mycf:w_city, timestamp=1482521720833, value=Fairview
 4                               column=mycf:w_country, timestamp=1482521720833, value=United States
 4                               column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 4                               column=mycf:w_gmt_offset, timestamp=1482521720833, value=-5
 4                               column=mycf:w_state, timestamp=1482521720833, value=TN
 4                               column=mycf:w_street_name, timestamp=1482521720833, value=Wilson Elm
 4                               column=mycf:w_street_number, timestamp=1482521720833, value=368
 4                               column=mycf:w_street_type, timestamp=1482521720833, value=Drive
 4                               column=mycf:w_suite_number, timestamp=1482521720833, value=Suite 80
 4                               column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAAEAAAAAAA
 4                               column=mycf:w_warehouse_name, timestamp=1482521720833, value=Bad cards must make.
 4                               column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=621234
 4                               column=mycf:w_zip, timestamp=1482521720833, value=35709
 5                               column=mycf:w_city, timestamp=1482521720833, value=Fairview
 5                               column=mycf:w_country, timestamp=1482521720833, value=United States
 5                               column=mycf:w_county, timestamp=1482521720833, value=Williamson County
 5                               column=mycf:w_gmt_offset, timestamp=1482521720833, value=
 5                               column=mycf:w_state, timestamp=1482521720833, value=TN
 5                               column=mycf:w_street_name, timestamp=1482521720833, value=
 5                               column=mycf:w_street_number, timestamp=1482521720833, value=
 5                               column=mycf:w_street_type, timestamp=1482521720833, value=
 5                               column=mycf:w_suite_number, timestamp=1482521720833, value=
 5                               column=mycf:w_warehouse_id, timestamp=1482521720833, value=AAAAAAAAFAAAAAAA
 5                               column=mycf:w_warehouse_name, timestamp=1482521720833, value=
 5                               column=mycf:w_warehouse_sq_ft, timestamp=1482521720833, value=
 5                               column=mycf:w_zip, timestamp=1482521720833, value=35709
5 row(s) in 0.3110 seconds
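For a larger file, removing the trailing commas by hand is not practical. A minimal sketch of doing it with sed follows, assuming every line ends in exactly one unwanted comma. The scratch directory and the warehouse_clean.csv name are illustrative, and the second sample line is reconstructed from the scan output above rather than copied from the original file.

```shell
# Scratch copy with two lines shaped like those in the thread (paths are illustrative).
workdir="$(mktemp -d)"
printf '%s\n' \
  '1,AAAAAAAABAAAAAAA,Conventional childr,977787,651,6th ,Parkway,Suite 470,Fairview,Williamson County,TN,35709,United States,-5,' \
  '2,AAAAAAAACAAAAAAA,Important issues liv,138504,600,View First,Avenue,Suite P,Fairview,Williamson County,TN,35709,United States,-5,' \
  > "$workdir/warehouse.csv"

# Remove one comma at the end of each line; commas elsewhere are left alone.
sed 's/,$//' "$workdir/warehouse.csv" > "$workdir/warehouse_clean.csv"

# Collect the distinct field counts; a single value of 14 matches importtsv.columns.
field_counts="$(awk -F',' '{ print NF }' "$workdir/warehouse_clean.csv" | sort -u)"
echo "$field_counts"
```

The cleaned file would then need to be uploaded to HDFS again before rerunning the ImportTsv command.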


2 REPLIES

Expert Contributor

Sorry, the CSV file name is warehouse.csv, not warehouse.dat.

warehouse.zip
