Member since: 08-05-2016
Posts: 52
Kudos Received: 1
Solutions: 1

My Accepted Solutions

Title | Views | Posted |
---|---|---|
| 1062 | 07-21-2017 12:22 PM |
01-17-2021
12:41 PM
Hi @vjain, to configure the BucketCache, the description mentions two JVM properties. Which one should be used, please: HBASE_OPTS or HBASE_REGIONSERVER_OPTS? The documentation says: "In the hbase-env.sh file for each RegionServer, or in the hbase-env.sh file supplied to Ambari, set the -XX:MaxDirectMemorySize argument for HBASE_REGIONSERVER_OPTS to the amount of direct memory you wish to allocate to HBase. In the configuration for the example discussed above, the value would be 241664m. (-XX:MaxDirectMemorySize accepts a number followed by a unit indicator; m indicates megabytes.)" Yet the example line it shows is:
HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=241664m"
Thanks,
Helmi KHALIFA
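PS: my current reading, as a sketch of what I would put in hbase-env.sh if HBASE_REGIONSERVER_OPTS is the right property (241664m is just the example figure from the documentation, not a recommendation):

# hbase-env.sh on each RegionServer (or the copy managed by Ambari)
# Hypothetical sizing: reuses the 241664m example value from the BucketCache docs
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=241664m"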
01-11-2021
01:34 AM
Same issue here with the default configuration (once in 7 days). Any suggestions?
2021-01-11 03:27:19,028 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed major compaction of 4 (all) file(s) in Info of CallLogs,\xE9\x7F\x9EJ\x10\x06L\xF7\x9A\xBF+\xCD\xA8\xB7\x9D\xBB,1608101963434.f0d1a5f4e816118ac167fe9730258102. into a54298ecc9594f9aa0cf6657a795bb54(size=6.0 G), total size for store is 6.0 G. This selection was in queue for 0sec, and took 56sec to execute.
2021-01-11 03:55:46,943 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed major compaction of 5 (all) file(s) in Info of CallLogs,M}\xC4;,1609043919090.8c704f7385c3c3b07bc3aa4be1adc577. into 1d3f086c88bc4972a2e550dd093e9824(size=5.7 G), total size for store is 5.7 G. This selection was in queue for 0sec, and took 1mins, 29sec to execute.
2021-01-11 04:24:56,056 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed major compaction of 4 (all) file(s) in Info of CallLogs,.\x81\xC6\x99e1K\x00\xAE\xB3@\x14g \x0Av,1608158031158.4998d8db979dfea2751136bf1767fb1b. into 4b1c6db0ed5d440d9adb58bf00109b57(size=5.6 G), total size for store is 5.6 G. This selection was in queue for 0sec, and took 1mins, 33sec to execute.
2021-01-11 05:36:34,562 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed major compaction of 5 (all) file(s) in Info of CallLogs,\x19~A\x8F\xD3^G\xFB\xB5!.\x8C6\xCB\xC7t,1607673667302.6c08c1a2f5648c5f190bc378f628a838. into 71d8c8268fcc46d4ad1be29a6c6ce880(size=5.9 G), total size for store is 5.9 G. This selection was in queue for 0sec, and took 1mins, 42sec to execute.
2021-01-11 05:38:13,268 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed major compaction of 4 (all) file(s) in Info of CallLogs,\xB9~\x8EX\xA5cH\xBE\x94g\xFF\xB76\xD6\x80/,1608131740376.47868b4d2475ef1fef1f23fea51b2e0f. into 21e524b8483047b7a6529ff20ea56602(size=5.9 G), total size for store is 5.9 G. This selection was in queue for 0sec, and took 1mins, 17sec to execute.
2021-01-11 07:11:21,277 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed major compaction of 3 (all) file(s) in Info of CallLogs,\x9E\x7F\xD9\xADe\x81H\x8C\x8E\x80\x87)\xE0G\xD7\xFE,1608533709336.c152e8f046c8f75ca265c1fc9c742909. into f4dc8f12dcee4f118acc779ae000ff6b(size=5.8 G), total size for store is 5.8 G. This selection was in queue for 0sec, and took 1mins, 19sec to execute.
2021-01-11 08:51:46,548 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed major compaction of 5 (all) file(s) in Info of CallLogs,\x7F\x81\x01M8{O\x92\x9C\xC2J\x01\xB7r8\xF4,1608529936723.34e0ffe299134cd8ad22145ae2314d3e. into 246537faa92741ea865437eebf2e1e9a(size=6.0 G), total size for store is 6.0 G. This selection was in queue for 0sec, and took 1mins, 29sec to execute.
12-08-2020
09:23 AM
Hello @lihao, this is an old post, yet we can use the "-skip" flag of the HBCK2 Tool to ensure the tool doesn't check the Master version; a rough invocation sketch follows below. The "-skip" flag is documented via Link [1], which is the Git page of the HBCK2 Tool. - Smarak [1] https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
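A rough invocation sketch (the jar path, command and arguments are placeholders; check the README above for the exact syntax of your version):

# Hypothetical jar path; --skip (short form -s) bypasses the Master version check
hbase hbck -j /path/to/hbase-hbck2.jar --skip <command> [<args>]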
04-13-2020
11:55 PM
Hi @jsensharma, my shell setup is correct and the outputs of all commands look exactly the same as the ones you mentioned.
02-24-2020
06:14 PM
I tried spark.streaming.backpressure.pid.minRate and it works as expected. My configuration:
spark.shuffle.service.enabled: "true"
spark.streaming.kafka.maxRatePerPartition: "600"
spark.streaming.backpressure.enabled: "true"
spark.streaming.concurrentJobs: "1"
spark.executor.extraJavaOptions: "-XX:+UseConcMarkSweepGC"
batch.duration: 5
spark.streaming.backpressure.pid.minRate: 2000

With this configuration the first batch starts with 15 (total number of partitions) x 600 (maxRatePerPartition) x 5 (batch duration) = 45,000 records, but it is not able to process that many records in 5 seconds, so the rate drops to ~10,000 = 2000 (pid.minRate) x 5 (batch duration). So spark.streaming.backpressure.pid.minRate is in total records per second. Just set spark.streaming.backpressure.pid.minRate and leave the following configs at their defaults:
spark.streaming.backpressure.pid.integral
spark.streaming.backpressure.pid.proportional
spark.streaming.backpressure.pid.derived
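For reference, a minimal spark-submit sketch carrying the same settings (the class, jar and master are placeholders; the 5-second batch duration itself is set in the application code, not via a conf key):

spark-submit \
  --master yarn \
  --class com.example.StreamingApp \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.streaming.kafka.maxRatePerPartition=600 \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.concurrentJobs=1 \
  --conf spark.streaming.backpressure.pid.minRate=2000 \
  --conf spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC \
  streaming-app.jar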
11-26-2019
10:46 AM
Hi,
I have an HBase table with one million rows, and when we query it using a non-existent rowkey value the query takes more than 50 seconds. Example:
table : test
rowkey 1 : AB1234
query 1 : get 'test', 'AB12345'
rowkey 2 : DF1234
query 2 : get 'test', 'DF12345'
rowkey 3 : BC1234
query 3 : get 'test', 'BC12345'
Queries 1, 2 and 3 all take more than 50 seconds.
Any ideas, please?
best,
Helmi KHALIFA
11-14-2019
05:07 AM
Hi!
I have some problems managing the HBase major compaction.
I configured major compaction to run between 1 and 4 am, but we still see major compactions executed at any hour.
Here are the two configurations I tried:
First configuration:
hbase.hregion.majorcompaction=7 Days 0 Hours
hbase.offpeak.start.hour=1
hbase.offpeak.end.hour=4
Second configuration:
hbase.hregion.majorcompaction=0 Days 0 Hours
hbase.offpeak.start.hour=1
hbase.offpeak.end.hour=4
Did I miss something, please?
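In the meantime, the workaround I am considering is to keep the time-based trigger disabled (the second configuration above) and trigger major compactions myself inside the window, roughly like this (the table name, script path and schedule are only an example):

# Hypothetical cron entry, runs weekly at 01:30, inside the 1-4 am window:
# 30 1 * * 0 /opt/scripts/weekly_major_compact.sh
echo "major_compact 'mytable'" | hbase shell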
Thank you for your answer.
Best,
Helmi KHALIFA
11-14-2019
02:54 AM
Hey @avengers, just thought this could add some more value to the question here. Spark SQL uses a Hive Metastore to manage the metadata of persistent relational entities (e.g. databases, tables, columns, partitions) in a relational database (for fast access) [1]. Also, I don't think there would be a Metastore crash if we use it along with Hive on Spark. [1] https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-hive-metastore.html
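A small command-line sketch of Spark SQL going through the Hive Metastore (it assumes a hive-site.xml with hive.metastore.uris is on Spark's classpath, which is how HDP normally wires it up; the statements are placeholders):

# Uses the Hive-backed catalog and runs two placeholder statements against the metastore
spark-sql \
  --conf spark.sql.catalogImplementation=hive \
  -e "SHOW DATABASES; SHOW TABLES IN default;"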
11-12-2019
06:22 AM
Hi, we need to see the cluster resource usage during the time frame when the jobs are in the ACCEPTED state. If the entire memory and all allocated vcores have been used, the job does not have sufficient resources to run the application. Please look at the RM web UI and share a Scheduler screenshot of all the queues to view the usage of the cluster. Thanks, AKR
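PS: if it helps, the same information can be pulled from the ResourceManager REST API (the hostname is a placeholder; 8088 is the usual default port):

# Cluster-wide memory and vcore usage
curl -s "http://<rm-host>:8088/ws/v1/cluster/metrics"
# Per-queue capacity, usage and pending applications
curl -s "http://<rm-host>:8088/ws/v1/cluster/scheduler"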
11-05-2019
01:21 AM
Hi @Rak, here is the script:

CREATE EXTERNAL TABLE IF NOT EXISTS sample_date (
  sc_code string,
  ddate timestamp,
  co_code DECIMAL,
  high DECIMAL,
  low DECIMAL,
  open DECIMAL,
  close DECIMAL,
  volume DECIMAL,
  no_trades DECIMAL,
  net_turnov DECIMAL,
  dmcap DECIMAL,
  return DECIMAL,
  factor DECIMAL,
  ttmpe DECIMAL,
  yepe DECIMAL,
  flag string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ' '
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/lab/itim/ccbd/helmi/sampleDate'
TBLPROPERTIES ('skip.header.line.count'='1');

ALTER TABLE sample_date SET SERDEPROPERTIES ("timestamp.formats"="MM/dd/yyyy");

Could you accept the answer, please?

Best,
Helmi KHALIFA
- Tags:
- Hive
10-25-2019
01:18 AM
1 Kudo
Hi @RNN, the best solution is to convert the months to integers, like:
-Oct- => -10-
-Dec- => -12-
That is what I tested, as you can see in my file below:

$ hdfs dfs -cat /lab/helmi/test_timestamp_MM.txt
1,2019-10-14 20:00:01.027898
2,2019-12-10 21:00:01.023
3,2019-11-25 20:00:01.03
4,2019-01-06 20:00:01.123

Create a Hive table:

hive> CREATE EXTERNAL TABLE ttime(id int, t string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/lab/helmi/';
hive> select * from ttime;
OK
1 2019-10-14 20:00:01.027898
2 2019-12-10 21:00:01.023
3 2019-11-25 20:00:01.03
4 2019-01-06 20:00:01.123
Time taken: 0.566 seconds, Fetched: 4 row(s)

Finally I created another table with the right format:

hive> create table mytime as select id, from_utc_timestamp(date_format(t,'yyyy-MM-dd HH:mm:ss.SSSSSS'),'UTC') as datetime from ttime;

Best,
Helmi KHALIFA
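PS: if the raw data still contains month abbreviations, a small sketch of the conversion step itself (the input file name is hypothetical; the output is the file used above):

# Rewrite -Oct- style month abbreviations into -10- style numbers before loading into Hive
sed -e 's/-Jan-/-01-/g' -e 's/-Feb-/-02-/g' -e 's/-Mar-/-03-/g' \
    -e 's/-Apr-/-04-/g' -e 's/-May-/-05-/g' -e 's/-Jun-/-06-/g' \
    -e 's/-Jul-/-07-/g' -e 's/-Aug-/-08-/g' -e 's/-Sep-/-09-/g' \
    -e 's/-Oct-/-10-/g' -e 's/-Nov-/-11-/g' -e 's/-Dec-/-12-/g' \
    raw_input.txt > test_timestamp_MM.txt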
10-24-2019
09:37 PM
Hive works in UTF-8 by default. CREATE TABLE AS SELECT with this condition works well, but a view cannot be created correctly with the same condition.
10-24-2019
07:55 PM
Unfortunately it doesn't. My result CSV looks like this:
id,age,name
1,29,
2,17,
09-24-2019
02:18 AM
Hi @hadoopguy, yes, there is an impact: you will have longer processing times and the operations will be queued. You have to handle the timeouts in your jobs carefully. Best, @helmi_khalifa
09-24-2019
02:11 AM
Hi Suresh, There is no command but you can easily find the information on the HBase Web UI. http://host:16010/master-status#baseStats Best, Helmi KHALIFA
05-27-2019
09:56 AM
Hi, because of too-frequent HBase major compactions, I am trying to run major compaction manually on all tables using a script; what I have so far is sketched below. Is there an easier way of doing this? Best, Helmi KHALIFA
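The sketch (it only assumes the hbase shell is on the PATH and that the user may compact every table; in the shell's JRuby API, 'list' returns the array of table names):

hbase shell <<'EOF'
# Loop over every table name returned by 'list' and major-compact it
list.each { |t| major_compact t }
EOF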
05-19-2019
04:02 PM
The above was originally posted in the Community Help track. On Sun May 19 16:00 UTC 2019, the HCC moderation staff moved it to the Hadoop Core track. The Community Help Track is intended for questions about using the HCC site itself.
03-07-2019
02:37 PM
Hi, I installed:
zeppelin 0.8.0
HDP-3.1.0.0 (3.1.0.0-78)
Then I configured zeppelin.server.port=8080. The problem now is that it works randomly. When the status shows green, everything is OK and I see my notebooks; but when I log in and it is still showing red with the message "WebSocket Disconnected", my notebooks disappear and I can't work on or create anything! Any help please? Thanks. Best, Helmi KHALIFA
03-18-2019
09:08 PM
@Josh Elser Can you please guide me on how and where to set the Java heap space on the client? I have Windows machines where my app runs, and the Phoenix queries are triggered from these Windows systems. I see no logs on the server side, so I believe the query is failing on the client side itself.
06-07-2018
10:26 AM
Here are some hints: http://hbase.apache.org/0.94/book/secondary.indexes.html In most cases you'll have to create a second index table, roughly as sketched below.
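A rough sketch of the second-table pattern in the hbase shell (the table and column names are made up for illustration):

hbase shell <<'EOF'
# Data table keyed by user id, plus an index table keyed by the secondary attribute (email)
create 'users', 'cf'
create 'users_by_email', 'cf'
put 'users', 'u123', 'cf:email', 'a@b.com'
# The index row points back at the data rowkey; the client must keep both puts in sync
put 'users_by_email', 'a@b.com', 'cf:user', 'u123'
# Look up by email first, then fetch the user row
get 'users_by_email', 'a@b.com'
EOF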
01-30-2018
08:27 PM
thank you @Josh Elser 🙂
01-17-2018
12:11 AM
Are there any best practices for tuning the optimal number of Spark Streaming executors vs. the number of Kafka partitions? NB: we have 20 Kafka partitions (1 TB of logs per day) and 21 Spark Streaming executors. Unfortunately, this configuration blocks 400 GB of RAM even when there are no events. Thanks.
10-19-2017
08:56 AM
Yes, it works! Thank you @Aditya Sirna 🙂
10-11-2017
06:42 PM
Thank you @Matt Burgess 🙂
07-22-2017
09:35 AM
I am using the same syntax as yours but it doesn't work. There were some missing properties in the hive-site.xml file. I added these properties, as in my comment below, and it works now:
mapred.input.dir.recursive
hive.mapred.supports.subdirectories
Thanks
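PS: the session-level equivalent of those two properties can be tested before touching hive-site.xml, roughly like this (the query and table name are placeholders):

hive -e "
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;
SELECT COUNT(*) FROM my_table_with_subdirectories;
"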
12-20-2018
01:36 PM
Hi Muji, great job 🙂 Just missing a ',' after B_df("_c1").cast(StringType).as("S_STORE_ID"):

// Assign column names to the Region dataframe
val storeDF = B_df.select(
  B_df("_c0").cast(IntegerType).as("S_STORE_SK"),
  B_df("_c1").cast(StringType).as("S_STORE_ID"),
  B_df("_c5").cast(StringType).as("S_STORE_NAME")
)
12-16-2016
10:09 AM
Thank you! 🙂