Suggestions for Bulk Loading Large Files into HBase Tables

Explorer

Dear Cloudera Community,

We are currently conducting warehouse testing with Apache HBase and need to load large files into HBase tables. Could you suggest any tools or utilities specifically designed for bulk loading large datasets into HBase?

Thank You!


Expert Contributor

Hi @Amandi 

The blog post below explains how to use bulk loading in HBase, with examples:

https://blog.cloudera.com/how-to-use-hbase-bulk-loading-and-why/
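
For readers who want the general shape before reading the post, here is a minimal sketch of the usual two-step bulk load: ImportTsv writes HFiles to a staging directory, then a second tool moves them into the table. The column family `cf` and the staging path `/tmp/my_1_hfiles` are assumptions made for illustration, and the class `org.apache.hadoop.hbase.tool.BulkLoadHFilesTool` assumes HBase 2.2 or later (older releases ship `LoadIncrementalHFiles` instead). The target table is also assumed to exist already with that column family.

```shell
#!/usr/bin/env bash
# Sketch only: "cf" and HFILE_DIR are hypothetical; adjust to your schema.
TABLE="my_1"
INPUT="/hbase/test2.txt"            # TSV file already in HDFS
HFILE_DIR="/tmp/my_1_hfiles"        # hypothetical staging directory for HFiles
COLUMN_SPEC="HBASE_ROW_KEY,cf:Name,cf:Age,cf:Gender"

if command -v hbase >/dev/null 2>&1; then
  # Step 1: generate HFiles instead of sending Puts to the table.
  hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.columns="$COLUMN_SPEC" \
    -Dimporttsv.bulk.output="$HFILE_DIR" \
    "$TABLE" "$INPUT"

  # Step 2: hand the generated HFiles over to the table's regions.
  hbase org.apache.hadoop.hbase.tool.BulkLoadHFilesTool "$HFILE_DIR" "$TABLE"
else
  echo "hbase not found on PATH; nothing was run" >&2
fi
```

Without `-Dimporttsv.bulk.output`, ImportTsv writes to the table directly through the normal write path, which is typically slower for very large files.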


Explorer

Hello Shubham,

I am encountering errors while running the following command:

/home/super/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,Name,Age,Gender my_1 /hbase/test2.txt

The error message is:

2024-09-11 15:53:44,808 INFO [main] impl.YarnClientImpl: Submitted application application_1725184331906_0018
2024-09-11 15:53:44,847 INFO [main] mapreduce.Job: The url to track the job: http://dc1-apache-hbase.mobitel.lk:8088/proxy/application_1725184331906_0018/
2024-09-11 15:53:44,848 INFO [main] mapreduce.Job: Running job: job_1725184331906_0018
2024-09-11 15:53:52,941 INFO [main] mapreduce.Job: Job job_1725184331906_0018 running in uber mode : false
2024-09-11 15:53:52,943 INFO [main] mapreduce.Job: map 0% reduce 0%
2024-09-11 15:53:52,952 INFO [main] mapreduce.Job: 2]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


[2024-09-11 15:53:51.942]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


For more detailed output, check the application tracking page: http://___________________________________________/cluster/app/application_1725184331906_0018 Then click on links to logs of each attempt.
. Failing the application.
2024-09-11 15:53:52,967 INFO [main] mapreduce.Job: Counters: 0
[super@dc1-apache-hbase mapreduce-job]$

What are the possible solutions, and how can I fix this?
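
Independently of the container failure above, it can help to confirm that the input file matches the column list passed to ImportTsv. The sketch below is only an illustration: the sample rows are hypothetical (the real contents of test2.txt are not shown in this thread), and it assumes the default tab separator. It counts lines whose field count differs from the four columns in `-Dimporttsv.columns`.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for a local copy of test2.txt.
SAMPLE="$(mktemp)"
printf 'row1\tAlice\t30\tF\n' >  "$SAMPLE"
printf 'row2\tBob\t41\tM\n'   >> "$SAMPLE"

# HBASE_ROW_KEY,Name,Age,Gender means four tab-separated fields per line.
EXPECTED_FIELDS=4

# Count lines whose number of tab-separated fields is not the expected one.
BAD_LINES="$(awk -F '\t' -v n="$EXPECTED_FIELDS" 'NF != n' "$SAMPLE" | wc -l | tr -d ' ')"
echo "lines with wrong field count: $BAD_LINES"

rm -f "$SAMPLE"
```

To run the same check on real data, point the `awk` command at a local copy of the file instead of the temporary sample.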

Expert Contributor

Hi @Amandi 

The failure appears to occur in the pre-launch stage, while YARN is launching the container.

Since ImportTsv runs as a MapReduce job on YARN, could you please confirm that a simple MapReduce pi job runs without any issues from a YARN gateway node:

# hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100

 

Explorer

Hi @shubham_sharma,

I ran the simple job as you requested, and it completed without any issues. The output is below for your reference.

Output:

[super@dc1-apache-hbase mapreduce]$ /home/super/hadoop/bin/hadoop jar /home/super/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi 10 100
Number of Maps = 10
Samples per Map = 100
2024-09-12 08:25:59,664 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2024-09-12 08:26:01,082 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2024-09-12 08:26:01,580 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/super/.staging/job_1725184331906_0019
2024-09-12 08:26:01,730 INFO input.FileInputFormat: Total input files to process : 10
2024-09-12 08:26:01,775 INFO mapreduce.JobSubmitter: number of splits:10
2024-09-12 08:26:01,931 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1725184331906_0019
2024-09-12 08:26:01,931 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-09-12 08:26:02,186 INFO conf.Configuration: resource-types.xml not found
2024-09-12 08:26:02,187 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-09-12 08:26:02,284 INFO impl.YarnClientImpl: Submitted application application_1725184331906_0019
2024-09-12 08:26:02,317 INFO mapreduce.Job: The url to track the job: http://dc1-apache-hbase.mobitel.lk:8088/proxy/application_1725184331906_0019/
2024-09-12 08:26:02,318 INFO mapreduce.Job: Running job: job_1725184331906_0019
2024-09-12 08:26:10,443 INFO mapreduce.Job: Job job_1725184331906_0019 running in uber mode : false
2024-09-12 08:26:10,445 INFO mapreduce.Job: map 0% reduce 0%
2024-09-12 08:26:20,637 INFO mapreduce.Job: map 60% reduce 0%
2024-09-12 08:26:27,696 INFO mapreduce.Job: map 100% reduce 0%
2024-09-12 08:26:29,713 INFO mapreduce.Job: map 100% reduce 100%
2024-09-12 08:26:31,745 INFO mapreduce.Job: Job job_1725184331906_0019 completed successfully
2024-09-12 08:26:31,912 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=67
FILE: Number of bytes written=3407061
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2650
HDFS: Number of bytes written=215
HDFS: Number of read operations=45
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=68883
Total time spent by all reduces in occupied slots (ms)=6614
Total time spent by all map tasks (ms)=68883
Total time spent by all reduce tasks (ms)=6614
Total vcore-milliseconds taken by all map tasks=68883
Total vcore-milliseconds taken by all reduce tasks=6614
Total megabyte-milliseconds taken by all map tasks=70536192
Total megabyte-milliseconds taken by all reduce tasks=6772736
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=250
Input split bytes=1470
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=250
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1565
CPU time spent (ms)=6180
Physical memory (bytes) snapshot=3867213824
Virtual memory (bytes) snapshot=28280057856
Total committed heap usage (bytes)=3555196928
Peak Map Physical memory (bytes)=369606656
Peak Map Virtual memory (bytes)=2578915328
Peak Reduce Physical memory (bytes)=284368896
Peak Reduce Virtual memory (bytes)=2575523840
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 31.068 seconds
Estimated value of Pi is 3.14800000000000000000

Expert Contributor

Hi @Amandi 

We need to check the job logs to find out why it is failing:

http://dc1-apache-hbase.mobitel.lk:8088/proxy/application_1725184331906_0018/

We can also check the ResourceManager logs for any issues with permissions or with launching containers.
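
If the tracking page is not reachable, the same container logs can usually be pulled from the command line. The following is a minimal sketch, assuming log aggregation is enabled on the cluster (`yarn.log-aggregation-enable=true`) and that the `yarn` client is on the PATH; the output file name is arbitrary. It derives the application ID from the job ID shown in the failed run and searches the aggregated logs for errors.

```shell
#!/usr/bin/env bash
# Job ID taken from the failed ImportTsv run earlier in this thread.
JOB_ID="job_1725184331906_0018"

# A YARN application ID is the job ID with the "job_" prefix swapped.
APP_ID="${JOB_ID/#job_/application_}"
echo "$APP_ID"

if command -v yarn >/dev/null 2>&1; then
  # Fetch the aggregated logs for every attempt of the application.
  yarn logs -applicationId "$APP_ID" > "${APP_ID}.log"

  # Show the first matches for exceptions or errors, with line numbers.
  grep -n -i -E 'exception|error' "${APP_ID}.log" | head -n 40
fi
```

The ApplicationMaster's stderr and syslog sections in that file are the most likely place to find the reason for the non-zero exit code.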