Member since: 07-17-2019
Posts: 738
Kudos Received: 433
Solutions: 111
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2613 | 08-06-2019 07:09 PM |
| | 2853 | 07-19-2019 01:57 PM |
| | 4037 | 02-25-2019 04:47 PM |
| | 4028 | 10-11-2018 02:47 PM |
| | 1343 | 09-26-2018 02:49 PM |
10-25-2018 03:48 PM
1 Kudo
The more backups you have, the more information the MapReduce job needs to read to execute correctly. Similarly, the more incremental backups you accumulate without a full backup, the more files the MR job has to read and hold. The solution is to increase the Java heap that you provide to the job. You should also spend some time understanding the difference between Java heap and physical memory, as your analysis suggests you are conflating the two.
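A minimal sketch of bumping the mapper memory for such a job, assuming you build the Configuration programmatically (the values are illustrative, and the same properties can be passed on the command line with -D flags). Note that the Java heap (-Xmx) has to fit inside the container's physical memory allocation, which is exactly the heap-vs-physical-memory distinction above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BackupJobMemoryConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Physical memory (container size) requested for each map task, in MB.
        conf.set("mapreduce.map.memory.mb", "5120");
        // JVM heap inside that container; keep it comfortably smaller than the container.
        conf.set("mapreduce.map.java.opts", "-Xmx4096m");
        // Hand 'conf' to the MR job you submit; equivalently, pass
        // -Dmapreduce.map.memory.mb=5120 -Dmapreduce.map.java.opts=-Xmx4096m on the CLI.
    }
}
```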
10-11-2018 02:47 PM
1 Kudo
"waitTime=60001, operationTimeout=60000 expired" You need to include hbase-site.xml on the classpath for your application. It is obvious from the error that the hbase.rpc.timeout and phoenix.query.timeoutMs are not being respected from this error.
10-11-2018 02:46 PM
1 Kudo
"org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0" refers to the location of a Region ("replica 0" is the primary replica). If the client can't find that location, the Region is not being hosted (it is in transition). Make sure all of the Regions for your table are assigned.
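A quick client-side check, sketched against the HBase 2.x Java API (the table name is a placeholder; on 1.x the method names differ slightly, and the Master UI or hbck remains the authoritative view):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;

public class RegionAssignmentCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("my_table"))) {
            // Reads hbase:meta; a null or stale server name points at an unassigned Region.
            for (HRegionLocation loc : locator.getAllRegionLocations()) {
                System.out.println(loc.getRegion().getRegionNameAsString()
                        + " -> " + loc.getServerName());
            }
        }
    }
}
```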
09-26-2018 02:49 PM
1 Kudo
No; by definition, the superuser has permission to perform every action on the system. You can change who the superuser is, if you choose. Without strong authentication via Kerberos, you're wasting your time trying to apply any kind of authorization rules to your system, because anyone will be able to masquerade as whomever they want.
09-04-2018 08:07 PM
You have DataNodes being marked as dead, which prevents HBase from reaching the replication level you have configured. Investigate why the DataNodes are being marked as failed: JVM GC pauses and networking issues are common suspects.
08-29-2018 03:32 PM
1 Kudo
Sounds like you're hitting https://issues.apache.org/jira/browse/PHOENIX-4489. This was fixed in HDP-2.6.5. However, it seems like you are using a version of Phoenix which is not included in HDP, so you are on your own to address that issue.
08-07-2018 03:31 PM
You don't need to scan the entire table if you can enumerate the salt values you used. For example, if you only use salts 000 through 009, you would have to execute 10 Gets to look for the data: 000:rowkey1, 001:rowkey1, 002:rowkey1, 003:rowkey1, ... If you used a stable hashing algorithm to choose the salt based on the rowkey value, you will know the exact salt value to use (e.g. rowkey1 always generates salt "004"). At the end of the day, HBase is only storing bytes -- it's up to you to know how you inserted the data and how to retrieve it.
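A sketch of both approaches with the HBase Java client (the table name, rowkey, 10-bucket salt, and the hash used to derive the salt are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedGetExample {
    // Stable hash: the same rowkey always maps to the same salt bucket.
    static String saltFor(String rowkey, int buckets) {
        return String.format("%03d", Math.floorMod(rowkey.hashCode(), buckets));
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("my_salted_table"))) {

            // Case 1: the salt is derivable from the rowkey, so a single Get suffices.
            Result single = table.get(new Get(Bytes.toBytes(saltFor("rowkey1", 10) + ":rowkey1")));

            // Case 2: the salt is not derivable, so issue one Get per possible bucket (000-009).
            List<Get> gets = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                gets.add(new Get(Bytes.toBytes(String.format("%03d", i) + ":rowkey1")));
            }
            Result[] results = table.get(gets); // at most one of these should be non-empty
        }
    }
}
```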
08-06-2018 03:42 PM
Does your data actually span all of the Regions you created split points for? Or, when this finishes generating the HFiles, does the client end up having to split them (rather than just load them)? The only thing I can guess is that the HBaseStorageHandler isn't doing something right. Generating only one HFile when you have 10 Regions is definitely suboptimal.
08-03-2018 02:36 PM
When you are generating HFiles for HBase, the typical pattern is one Reducer per Region, because an HFile must only contain data for a single Region. As such, controlling the number of Reducers you get is really a matter of pre-splitting your table to create more Regions (more Reducers) or merging Regions (fewer Reducers).
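For reference, a minimal sketch of creating a pre-split table with the HBase 2.x Java API (the table name, column family, and split points are placeholders); four split points yield five Regions, and therefore five Reducers for the HFile-generation job:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PresplitTableExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Four split points -> five Regions -> five Reducers when generating HFiles.
            byte[][] splitPoints = {
                Bytes.toBytes("b"), Bytes.toBytes("d"),
                Bytes.toBytes("f"), Bytes.toBytes("h")
            };
            admin.createTable(
                TableDescriptorBuilder.newBuilder(TableName.valueOf("my_table"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
                    .build(),
                splitPoints);
        }
    }
}
```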
08-01-2018 02:59 PM
How are you using Hive (MapReduce, Tez, LLAP)? Can you add some context about where you think the slowness is? For example: how long does it take just to read that data from Hive (run a SELECT)? Can you tell from the logs how much time is actually spent writing data to HBase? If you rerun the same INSERT, does it always take this much time? If you change the LIMIT, do 2000 rows take twice as long to insert?