Member since: 10-18-2017
Posts: 52
Kudos Received: 2
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1202 | 01-27-2022 01:11 AM |
| | 8635 | 05-03-2021 08:03 AM |
| | 4778 | 02-06-2018 02:32 AM |
| | 6264 | 01-26-2018 07:36 AM |
| | 4092 | 01-25-2018 01:29 AM |
05-12-2021
03:08 AM
I am wondering what difference in IO can be expected for HBase with storage in the cloud vs. storage on HDFS. I would expect that when data is retrieved from HDFS, it will be a lot faster than from the cloud (like ADLS; in my specific case ADLS Gen2 = ABFS). Is there somewhere I can test this, or find a previous study on this? If that is the case, then one would expect that current HBase read performance is a bit lower than some years ago, when everything was on premise and HDFS was typically used. Maybe I am missing something obvious here, so any insight is appreciated!
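One way to get a first feel for the difference yourself is to time a batch of random gets against the same table on each storage backend. Below is a minimal sketch, assuming the happybase client and a running HBase Thrift server; the host, table name, and row-key scheme are placeholders, and it measures end-to-end client latency rather than raw storage IO.

```python
import random
import time

import happybase  # client for the HBase Thrift server (assumed to be available)

HOST = "my-hbase-thrift-host"   # placeholder
TABLE = "my_table"              # placeholder
ROW_KEYS = [f"row-{i:08d}".encode() for i in range(100_000)]  # adapt to your key scheme

connection = happybase.Connection(HOST)
table = connection.table(TABLE)

# Time 1000 random point gets; repeat with hbase.rootdir on HDFS and on ABFS to compare.
samples = random.sample(ROW_KEYS, 1000)
start = time.perf_counter()
for key in samples:
    table.row(key)  # one random get per key
elapsed = time.perf_counter() - start

print(f"{len(samples)} random gets in {elapsed:.2f}s "
      f"({1000 * elapsed / len(samples):.1f} ms/get on average)")
```

Block cache hits will mask the storage backend, so the numbers mean more with a cold cache or with keys that are not already cached.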
Labels:
- Apache HBase
05-03-2021
08:03 AM
For future reference: I would like to add that the reason for the observed behaviour was an overcommit of memory. While I am writing, the memory used on the box at some point comes so close to the maximum available on the regionservers that the problems start. In my example, at the start of writing I use about 24 of 31 GB on the regionserver; after a while this becomes more than 30 of 31 GB and eventually the failures start. I had to take away a bit of memory from both the off-heap bucket cache and the regionserver's heap. Then the process starts at 17 of 31 GB used, and after writing for an hour it maxes out at about 27 GB, but the failure was not observed anymore. The reason I was trying to use as much of the memory as possible is that when reading, I would like to have the best performance; making use of all resources then does not lead to errors. While writing, however, it does. Lesson learned: when going from a period that is write-intensive to a period that is read-intensive, it can be advisable to change the HBase config. Hope this can help others! PS: although the reply of @smdas was of very high quality and led me to many new insights, I believe the explanation above in the current post should be marked as the solution. I sincerely want to thank you for your contribution, as your comments, in combination with the current answer, will help others in the future.
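To make this kind of overcommit easier to spot up front, here is a rough back-of-the-envelope check. It is a sketch only: the numbers are illustrative values from this thread, and the headroom reserved for the OS and any co-located processes is an assumption you have to fill in for your own boxes.

```python
# Rough memory-budget check for a regionserver box (illustrative values only).
# Idea: the fixed allocations (regionserver JVM heap + off-heap bucket cache) plus
# whatever the OS and other processes need should stay comfortably below total RAM,
# otherwise a write-heavy period can push the box into the failure mode described above.

TOTAL_RAM_GB = 31          # usable memory on the box (from this thread)
regionserver_heap_gb = 10  # regionserver JVM heap
bucket_cache_gb = 16       # off-heap bucket cache
os_and_other_gb = 4        # assumed headroom for the OS and other processes (adjust!)

committed = regionserver_heap_gb + bucket_cache_gb + os_and_other_gb
spare = TOTAL_RAM_GB - committed
print(f"committed: {committed} GB of {TOTAL_RAM_GB} GB ({spare} GB spare)")

if spare < 0.1 * TOTAL_RAM_GB:
    print("WARNING: little or no headroom left; a write burst may overcommit the box")
```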
03-19-2021
10:22 AM
Hello, when posting I had never hoped to get such a fast and remarkably clear and useful answer! It really helped me to think more about the problem. Hereby some comments:

SOLUTION 1: Indeed, allowing some more failures might be a quick fix. Will try. But the true fix probably lies in solving #2 below.

SOLUTION 2: If I understand correctly, when the JVM heap is full, GC takes place to clean up, and if this is really urgent, the JVM actually pauses. If this pause lasts longer than the ZooKeeper timeout (= 60 seconds), then the regionserver is believed to have died, and the master will move all its regions to other healthy regionservers. (I am not an expert on GC, but I see that my regionserver starts with "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC".) But I had expected to see this mentioned clearly somewhere in the regionserver's logs or in Cloudera Manager, and I fail to do so. When my Spark job says "regionserver worker-x not available" at that exact timestamp, I see no ERROR in the worker-x regionserver log.

Here is some more info with respect to your comments:
1) Regionserver out of memory: I assume this should definitely show up as an error/warning in /var/log/hbase/regionserver*out. This seems not to be the case.
2) I believe that in case of a JVM pause, this would show up in the regionserver's logs as "Detected pause in JVM or host machine (eg GC): pause of approximately 129793ms No GCs detected" (see https://community.cloudera.com/t5/Support-Questions/Hbase-region-server-went-down-because-of-a-JVM-pause-what/td-p/231140). I see no such message.
4) Note that 32 GB is the total memory of the server; in fact I was wrong: the regionserver heap size is 10 GB (not 20 GB). You make a very good point: the other 29 days of the month we want read efficiency, so that is why the memstore only receives 0.25. I should change it to 0.4 when writing and see if the error still persists.
5) I have defined my table to have 3x more shards than there are regionservers; I think this should avoid hotspotting. Bulk load would indeed bypass the need for memory; I understand it would then directly create the HFiles. But I am using some external libraries related to geo stuff and I am not sure it is possible.

Thanks again for your valuable contribution!
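Since point 4 above talks about raising the memstore fraction for the write period, a small sanity check may help. This is a sketch only: it just encodes the commonly documented rule that hfile.block.cache.size plus hbase.regionserver.global.memstore.size must stay at or below 0.8 of the heap (a regionserver normally refuses to start otherwise); verify the exact limit for your HBase version.

```python
HEAP_GB = 10  # regionserver heap from this thread (corrected value)

def check(block_cache, memstore, limit=0.8):
    """Report whether the two on-heap fractions fit under the configured limit."""
    total = block_cache + memstore
    verdict = "OK" if total <= limit else f"exceeds the {limit} limit"
    print(f"block cache {block_cache:.2f} + memstore {memstore:.2f} = {total:.2f} ({verdict}); "
          f"{block_cache * HEAP_GB:.1f} GB cache, {memstore * HEAP_GB:.1f} GB memstore "
          f"of a {HEAP_GB} GB heap")

check(0.55, 0.25)  # current read-oriented setup
check(0.55, 0.40)  # raising only the memstore share would be rejected
check(0.40, 0.40)  # one possible write-period setup
```

So raising the memstore share to 0.4 for the write-intensive period would also mean lowering hfile.block.cache.size at the same time.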
03-18-2021
04:08 AM
I have a question about regionservers that go into bad health when I am writing data from Spark. My question is: how do I know (in which logs to look exactly) what the cause of the bad health is?

BACKGROUND: My Spark job processes a month of data and writes it to HBase. This runs once a month. For most months there are no problems. For some months (probably with slightly higher traffic), I notice the regionservers go into bad health. The master notices this and, when the server goes down, it moves all regions to another regionserver, after which it becomes healthy again. But as my writing goes on, the same happens to other regionservers and eventually my Spark job fails. I am confident the write failure is not due to corrupt data (like an impossible UTC time), since when that happens I clearly see "caused by value out of bounds" in my Spark logs. Now I see:

21/03/17 15:54:22 INFO client.AsyncRequestFutureImpl: id=5, table=sometable, attempt=10/11, failureCount=2048ops, last exception=org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server myregionserver4,16020,1615996455401 is not running yet

LOGS: In the logs of the master I mainly see that it starts to identify all regions on the regionserver that is dying, and moves them. In the logs of the regionserver around the time of failure I noticed "org.apache.hadoop.hbase.io.hfile.LruBlockCache: totalSize=3.39 GB, freeSize=2.07 GB, max=5.46 GB, blockCount=27792, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=1049, evicted=0, evictedPerRun=0.0", but this might not be relevant.

The chart of the number of regions for the regionserver that dies shows it failing twice (at 3:54 and 4:15); each time the number of regions collapses as they are moved to other regionservers. The CPU chart looks OK to me. The memory, however, looks like it comes close to the maximum available. Maybe it fails because of memory? As if all writes are first written to memory and the available memory has been used up? But I had expected to see that kind of error message somewhere (in the logs or in Cloudera Manager).

These are the HBase settings (on CDP 7.2, HBase 2.2):
- 32 GB regionservers
- Java heap size of the HBase regionserver = 20 GB
- hfile.block.cache.size = 0.55
- hbase.regionserver.global.memstore.size = 0.25
- hbase.bucketcache.size = 16 GB

I am wondering how I can better understand the reason the regionserver fails; then I can change the HBase config accordingly.
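For the "which logs to look in" part of the question, one quick way to triage is to scan the regionserver logs for the usual suspects around the failure timestamps. Below is a minimal sketch; the log directory and file pattern are assumptions to adjust for your installation, and the three patterns correspond to long JVM/host pauses, heap exhaustion, and the "not running yet" state the client reported above.

```python
import glob
import re

# Messages that typically accompany the failure modes discussed in this thread.
PATTERNS = [
    r"Detected pause in JVM or host machine",
    r"java\.lang\.OutOfMemoryError",
    r"ServerNotRunningYetException",
]

LOG_GLOB = "/var/log/hbase/*REGIONSERVER*"  # assumed location/naming, adjust as needed

for path in sorted(glob.glob(LOG_GLOB)):
    with open(path, errors="replace") as f:
        for lineno, line in enumerate(f, 1):
            if any(re.search(p, line) for p in PATTERNS):
                print(f"{path}:{lineno}: {line.rstrip()}")
```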
Labels:
- Apache HBase
09-25-2020
04:07 AM
We are trying to use HBase with ADLS as a storage layer. Is this still possible in CDH 6.1?
This is described in the docs for CDH 5.12:
https://docs.cloudera.com/documentation/enterprise/5-12-x/topics/admin_using_adls_storage_with_hbase.html#hbase_adls_configuration_steps
but we are using CDH 6.1, and I don't see a similar page there anymore.
Of course I ask because I have an issue when following the 5.12 docs. When I want to change, in the HBase configuration in Cloudera Manager,
hbase.root.dir = adl://ourproject.azuredatalakestore.net/hbase
I cannot save this, as I get an error saying the value is not in the allowed format:
HDFS Root Directory: Path adls://ourproject.azuredatalakestore.net/hbase does not conform to the pattern "(/[-+=_.a-zA-Z0-9]+)+(/)*".
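For what it is worth, the rejection itself can be reproduced outside Cloudera Manager: the pattern quoted in the error only accepts plain absolute paths, so any URI with a scheme such as adl:// or abfs:// will fail it. A small illustration (the pattern is copied from the error message above; it says nothing about where a full URI would have to be configured instead):

```python
import re

# Pattern quoted in the Cloudera Manager error for "HDFS Root Directory".
PATTERN = re.compile(r"(/[-+=_.a-zA-Z0-9]+)+(/)*")

for candidate in [
    "/hbase",
    "adl://ourproject.azuredatalakestore.net/hbase",
]:
    ok = PATTERN.fullmatch(candidate) is not None
    print(f"{candidate!r}: {'accepted' if ok else 'rejected'} by the pattern")
```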
07-10-2019
06:46 AM
Thank you for your interest! We are using CDH 6.2, Impala 3.2.0-cdh6.2.0, and Hive 2.1.1-cdh6.2.0.

Some updated info on the case above: I notice the files are not corrupted. So the files are created in some table in Hive, but when the table is queried, Impala sometimes shows special characters. I have also seen it show, on one line, a concatenation of two lines that are not related to each other. I could even trace that it was taking one line from one of the files that make up the table and incorrectly combining it with the end of another line that was even in a different file! A situation like this:

file 1 contains: some_id_1, data1, data2, data3
file 10 contains: some_id_2, otherdata1, otherdata2, otherdata3

SELECT * FROM &lt;problematictable&gt; WHERE id='some_id_1'
should return: some_id_1, data1, data2, data3
actually returns: some_id_1, data1, data2herdata2,otherdata3

When I restart the Impala services and the table is queried, it shows the results as expected. When you create a new external table with a different location, cp the files to that location, and query this new table, the results are as expected. It might have to do with the metadata store? Maybe Impala has problems knowing where it needs to retrieve the data, and after a restart of the services everything is flushed and it does this correctly.
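For cases like the example above, it can help to confirm what is actually on disk for a given id, independently of Impala. A minimal sketch, assuming the table's files have first been copied locally (for example with hdfs dfs -get), are plain text, and use ',' as the field delimiter; the directory and id are placeholders.

```python
import glob

TABLE_DIR = "/tmp/problematictable_files"  # local copy of the table's files (assumption)
WANTED_ID = "some_id_1"                    # placeholder id from the example above

# Print every line, in every file, whose first field matches the id,
# so the raw file contents can be compared with what Impala returns.
for path in sorted(glob.glob(f"{TABLE_DIR}/*")):
    with open(path, errors="replace") as f:
        for line in f:
            if line.split(",")[0].strip() == WANTED_ID:
                print(f"{path}: {line.rstrip()}")
```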
07-01-2019
11:57 PM
I noticed my title is wrong (I did not find the edit button). It should be: Table created in Hive as text and queried by Impala shows special characters in Impala, not in Hive.
07-01-2019
08:24 AM
Dear forum,
Another case to think about!
I created a table in Hive as a text file. When I query it in Hive, it looks fine for all records.
Next, in Impala, I run the INVALIDATE METADATA statement and afterwards query the table. Now, for a couple of records, Impala shows me a question mark as if there are special characters (�). I notice the data in these fields cannot be used anywhere else in subsequent steps (instead of reading the values, Impala complains that the values are not valid).
When I examine the text file from HDFS (in a text editor like Sublime with UTF-8 encoding), I see no special characters and all characters look as expected. As said, neither INVALIDATE METADATA nor REFRESH fixes the issue, but after a restart of the Impala services the data is available as expected in Impala.
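A text editor can silently hide or replace bad bytes, so a byte-level check may be more conclusive than inspecting the file visually. Here is a minimal sketch that reports lines which are not valid UTF-8 in a local copy of the file; the path is a placeholder, and it only checks the raw bytes, saying nothing about the metadata behaviour described above.

```python
PATH = "/tmp/000000_0"  # local copy of the Hive text file (assumption)

# Report every line that does not decode as UTF-8, with the offending byte offset.
with open(PATH, "rb") as f:
    for lineno, raw in enumerate(f, 1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            print(f"line {lineno}: invalid UTF-8 at byte {err.start}: {raw[:80]!r}")
```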
Currently we create the table as a text file and get the behaviour described above. Before, we created the table as a Parquet file, but then got the error:
-->
File 'hdfs://ourcluster/user/hive/warehouse/tmp.db/thetable/000000_0' has an invalid version number: <some value> This could be due to stale metadata. Try running "refresh tmp.thetable".
-->
Note that this &lt;some value&gt; would always be something that comes from the data (a part of a string that is in the data). The REFRESH would not fix it (and, as said, we already do an INVALIDATE METADATA). Note that when we restart the Impala service, this error goes away and the data can be queried; the files then seem to be "uncorrupted". I have read a similar post elsewhere suggesting the data would be corrupted when one encounters this error.
Note: we use a collect_set function in Hive to create the field that gives the problem during the creation of the table. Our current train of thought is that in some cases (15 out of several million) this gives problematic results, but what exactly happens is not understood.
Thanks for any input!
Labels:
- Apache Hive
- Apache Impala
- Apache Spark
10-09-2018
08:50 AM
A Kudu table.
10-02-2018
08:21 AM
Note that this was resolved by restarting the Impala and Sentry services in Cloudera Manager.