Member since
11-18-2014
196
Posts
18
Kudos Received
8
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6435 | 03-16-2016 05:54 AM
 | 2627 | 02-05-2016 04:49 AM
 | 1878 | 01-08-2016 06:55 AM
 | 13648 | 09-29-2015 01:31 AM
 | 1089 | 05-06-2015 01:50 AM
09-26-2016
01:33 PM
2 Kudos
Another suggestion: give requirements such that a cluster is certifiable in production/development, etc. Hope this helps 🙂
09-26-2016
01:29 PM
3 Kudos
About the conferences: I have to register every time. It would be nice to have access with the Cloudera account.
09-17-2016
09:37 AM
I have the same issue here, after an update from CDH 5.3...
09-02-2016
02:51 AM
Hello, Your problem is not linked to the Impala scheduler, but to the shell. In fact, Oozie cannot find your shell file.
1. What are the permissions on the file shell-impala-invalidate.sh? Does Oozie have access to it?
2. What are the permissions on the folder /data/4/yarn/nm/usercache/*******/appcache/application_1463053085953_30120/container_e49_1463053085953_30120_01_000002? (This folder is on one of your workers.)
Alina
08-24-2016
01:25 AM
Hello, I have a small problem with the ROUND function: when I add the number of decimals, it is not rounding at all. I saw that there was a problem with the rounding function in https://issues.apache.org/jira/browse/HIVE-4523 , but I'm on Hive 1.3.1 (CDH 5.3.4), and normally it should be fixed since Hive 1.3.0... Thank you!
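Since it can be hard to tell whether the engine or the expectation is off, here is a quick sanity check of half-up rounding outside Hive (Python is used purely for illustration, and the helper name is made up for this sketch):

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, decimals):
    """Round to `decimals` places, halves away from zero -- the behavior
    SQL-style ROUND(x, d) is generally expected to follow."""
    exp = Decimal(1).scaleb(-decimals)  # e.g. Decimal('0.01') for decimals=2
    return float(Decimal(str(value)).quantize(exp, rounding=ROUND_HALF_UP))

# Note: plain float round(2.675, 2) yields 2.67 due to binary representation,
# which is a different effect from the Hive bug discussed above.
print(round_half_up(2.675, 2))  # 2.68
```

Comparing these reference values against what the Hive query returns makes it easier to report exactly which inputs misbehave.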
04-12-2016
08:17 AM
Thank you! Indeed, I recreated all the tables... Since I have the trash disabled, I had nothing in the trash... However, this is a very complete reply. Thank you!
04-12-2016
08:12 AM
I'm not sure that I can change all the sources in order to post to all my Flume agents, but this is an interesting solution. Thank you!
04-02-2016
03:27 PM
Hello, Thank you for your reply. In my case the Flume source is HTTP, and I wanted to know if there is a way to ensure that if the machine with the Flume source goes down, I can still receive the data (HA). However, I can only imagine a solution with two sources and a load-balancer machine in front of the 2 machines... and I was searching more for a solution within the Hadoop cluster (as it is done with YARN and HBase). Thank you, Alina
03-29-2016
09:40 PM
Hello, I had a problem: my job failed because HBase could not find an existing table. I then ran: sudo -u hbase hbase hbck -repair and now all my tables are gone (besides one)!! I cannot see my old data in the hbase folder! Is there a way to recover all this? Please help! Thank you!
Labels:
- HBase
03-29-2016
02:54 AM
Hello, Is there a way to deploy Flume in HA? Thank you,
Labels:
- Flume
03-16-2016
05:54 AM
What's strange is that it worked when I did cat file_with_all_split_commands | hbase shell ... Alina
03-16-2016
03:11 AM
Hello, I would like to change my HBase table and add some splits for keys that don't exist yet. I tried the following without success:
split 'table_name', 'key_that_does_not_exist_yet'
alter 'table_name', {SPLITS=>['my_new_split1','my_new_split2']}
However, I have the feeling that it is not working. When I check the splits, I have the same splits as before in the browser, and when I do a hdfs dfs -ls /hbase/data/default/table_name I also have no additional folders... Note: I'm using CDH 5.3. Thank you!
- Tags:
- split_regions
Labels:
- HBase
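One way to pre-split on many boundary keys at once is to generate the shell commands into a file and pipe it through `hbase shell`. A minimal sketch; the table name and boundary keys below are placeholders:

```python
# Write one `split` command per desired boundary key, then run:
#   hbase shell < split_commands.txt
# 'table_name' and the row-key boundaries are hypothetical for this sketch.
boundary_keys = ["row_%02d" % i for i in range(25, 100, 25)]  # row_25, row_50, row_75

with open("split_commands.txt", "w") as f:
    for key in boundary_keys:
        f.write("split 'table_name', '%s'\n" % key)
```

Batch-generating the file also leaves a record of exactly which splits were requested, which helps when checking the region list afterwards.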
03-14-2016
09:04 AM
Hello, I would like to set the following to true for all my Impala queries: set APPX_COUNT_DISTINCT=true; and I can't find any way to do it... I need this because I have some Impala queries that I send over the Impala JDBC Driver and that I would like to optimize. Note: this topic is somewhat linked to http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/use-set-command-through-Impala-JDBC-Driver/td-p/37455 Thank you! Alina
- Tags:
- jdbc
- set_command
03-07-2016
04:13 AM
Hello, In CDH 5.6 there are both Hive on Spark and Impala. How should we choose between these 2 services? Are there any benchmarks that compare them? Thank you! 🙂
- Tags:
- Hive on Spark
- impala
02-25-2016
05:31 AM
Hello, If it's for test purposes, and since you have that much disk, I suppose you could add some swap. That should help (not with performance, but with getting access to the platform). Requirements: http://www.cloudera.com/downloads/quickstart_vms/5-4.html If you count the OS, the browser that you may have open, and the VM if you are running in a VM... I think you may be at the limit of your memory.
02-24-2016
09:43 PM
Hello, I think this may happen if Cloudera doesn't have enough resources (memory, CPU). What is your machine? I think that in order to run Cloudera Manager you need at least about 9 GB on your PC. Check this out: https://community.cloudera.com/t5/Cloudera-Manager-Installation/CDH-5-1-2-Unable-to-issue-query-the-Host-Monitor-is-not-running/td-p/18404 Hope this helps,
02-22-2016
02:29 AM
There was a difference in the amount of average load. Since it's computed when we run the command, it may vary... I forgot to mention one question: what is the difference in the information that I get from the commands status and status 'replication'?
02-22-2016
01:53 AM
Hello, I'm trying to understand the meaning of the HBase status command. However, I can't find any documentation on this...
hbase(main):003:0> status
4 servers, 0 dead, 487.2500 average load
hbase(main):004:0> status 'replication'
4 servers, 0 dead, 512.0000 average load
hbase(main):006:0> status 'summary'
4 servers, 0 dead, 512.0000 average load
- Is the average load in bytes? KB?
- Is the average taken over a day? A week?
- Why is the average different from plain status, yet the same for status 'summary' and status 'replication'?
status 'simple'
4 live servers
ip1:60020 1456128551442
requestsPerSecond=0.0, numberOfOnlineRegions=216, usedHeapMB=1963, maxHeapMB=4062, numberOfStores=216, numberOfStorefiles=5321, storefileUncompressedSizeMB=133145, storefileSizeMB=31861, compressionRatio=0,2393, memstoreSizeMB=0, storefileIndexSizeMB=43, readRequestsCount=88432, writeRequestsCount=0, rootIndexSizeKB=64669, totalStaticIndexSizeKB=503682, totalStaticBloomSizeKB=17227, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
ip2:60020 1456128544596
requestsPerSecond=0.0, numberOfOnlineRegions=631, usedHeapMB=2778, maxHeapMB=4062, numberOfStores=631, numberOfStorefiles=16192, storefileUncompressedSizeMB=293264, storefileSizeMB=75410, compressionRatio=0,2571, memstoreSizeMB=0, storefileIndexSizeMB=92, readRequestsCount=266918, writeRequestsCount=0, rootIndexSizeKB=158970, totalStaticIndexSizeKB=1215591, totalStaticBloomSizeKB=35553, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
ip3:60020 1456128568835
requestsPerSecond=0.0, numberOfOnlineRegions=489, usedHeapMB=2520, maxHeapMB=4062, numberOfStores=489, numberOfStorefiles=12905, storefileUncompressedSizeMB=279315, storefileSizeMB=69048, compressionRatio=0,2472, memstoreSizeMB=0, storefileIndexSizeMB=91, readRequestsCount=751324, writeRequestsCount=0, rootIndexSizeKB=144189, totalStaticIndexSizeKB=930584, totalStaticBloomSizeKB=26132, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
ip4:60020 1456128516858
requestsPerSecond=174.0, numberOfOnlineRegions=712, usedHeapMB=2893, maxHeapMB=4062, numberOfStores=712, numberOfStorefiles=17456, storefileUncompressedSizeMB=617248, storefileSizeMB=131324, compressionRatio=0,2128, memstoreSizeMB=1, storefileIndexSizeMB=98, readRequestsCount=22001272, writeRequestsCount=2047, rootIndexSizeKB=167594, totalStaticIndexSizeKB=1776285, totalStaticBloomSizeKB=149625, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
0 dead servers
Aggregate load: 174, regions: 2048
- What is the meaning of the 'aggregate load' indicator?
status 'detailed'
"test_all,,1453275538489.ed10387c718e4b42573b6720a407b155." numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=1, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
...
- Does compactionProgressPct correspond to major_compact?
- What is the meaning of totalCompactingKVs and currentCompactedKVs?
Thank you!
- Tags:
- Status command
Labels:
- HBase
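A sanity check on the figures in the question: the 'average load' reported by status would line up with the average number of regions per live region server, not bytes, since the four numberOfOnlineRegions values in the simple output reproduce both the region total and the 512 average exactly:

```python
# numberOfOnlineRegions for the four servers in the `status 'simple'` output above
regions_per_server = [216, 631, 489, 712]

total_regions = sum(regions_per_server)                   # matches "regions: 2048"
average_load = total_regions / len(regions_per_server)    # matches "512.0000 average load"
print(total_regions, average_load)  # 2048 512.0
```

That also explains why the number changes between invocations: regions move between servers, so the per-server average is recomputed each time the command runs.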
02-05-2016
04:49 AM
I found out why this was happening. Since I was on a DEV cluster, I stopped and started the services every day. Also, the data from the table to which I was writing was moving to a single machine from time to time (due to services failing, starts and stops, etc.). After I balanced the HBase table, the script's load was distributed.
01-22-2016
04:52 AM
Most of them I did not find in CM, even with the correct prefixes (in their corresponding service). However, the safety valve (Custom Configuration [1]) is a good idea.
01-22-2016
01:54 AM
1 Kudo
Our test cluster (on Amazon):
- 5 workers: m4.xlarge, 250 GB magnetic disk (we increased the disk to 1 TB afterwards) * we used one of the 5 machines just for Flume (Kafka)
- 2 masters: m4.2xlarge, 125 GB SSD (we decreased the memory and CPU afterwards ==> m4.xlarge)
This was perfect for us for testing purposes.
01-22-2016
01:44 AM
Hello, I wanted to follow these slides in order to optimize HBase: http://fr.slideshare.net/lhofhansl/h-base-tuninghbasecon2015ok However, there are some configurations that I didn't find in Cloudera Manager (in their respective services):
namenode.avoid.read.stale.datanode = true
namenode.avoid.write.stale.datanode = true
namenode.stale.datanode.interval = 30000
client.read.shortcircuit.buffer.size = 131072
regionserver.checksum.verify = true
server.tcpnodelay = true
client.tcpnodelay = true
hregion.majorcompaction.jitter = 0.5 (½ week, default)
hstore.min.locality.to.skip.major.compact
master.wait.on.regionservers.timeout
ipc.client.tcpnodelay
zookeeper.useMulti
Why aren't these available? Thank you!
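One possible reason some of these don't appear: several are HDFS-side settings (the slides drop the dfs. prefix), so they would live under the HDFS service rather than HBase, and any that CM doesn't expose directly can usually go into a safety valve. A sketch of what hdfs-site.xml safety-valve entries might look like; the property names below assume the slide's keys carry the standard dfs. prefix, so verify them against your CDH version before applying:

```xml
<!-- Hypothetical entries for the HDFS Service Advanced Configuration
     Snippet (Safety Valve) for hdfs-site.xml; names assume the slide's
     keys carry the standard dfs. prefix. -->
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>30000</value>
</property>
```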
01-08-2016
06:55 AM
I found a partial answer to my question: http://hadoop-hbase.blogspot.fr/2013/07/hbase-and-data-locality.html Since I am in DEV, I start and stop the machines every day. Also, we restart the services from time to time.
01-08-2016
06:38 AM
Hello, I also have a quick question. On the HBase interface http://<hbase ip>:60010/table.jsp?name=table_name we can see where each region is located. However, normally I have a replication factor of 3. How is this done? For a region X, is the whole region duplicated on 2 other servers? How can I find out the 2 other servers on which I can find this region X?
01-07-2016
07:48 AM
Hello, I want to write a script that decides where to move each region. For that I want to use the command: move
ERROR: wrong number of arguments (0 for 1)
Here is some help for this command:
Move a region. Optionally specify target regionserver else we choose one
at random. NOTE: You pass the encoded region name, not the region name so
this command is a little different to the others. The encoded region name
is the hash suffix on region names: e.g. if the region name were
TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396. then
the encoded region name portion is 527db22f95c8a9e0116f0cc13c680396
A server name is its host, port plus startcode. For example:
host187.example.com,60020,1289493121758
Examples:
hbase> move 'ENCODED_REGIONNAME'
hbase> move 'ENCODED_REGIONNAME', 'SERVER_NAME'
For that, I started to use hdfs dfs -du "/hbase/data/default_table_name/" in order to find the region names. However, I do not know if there is a shell command to find out the startcode for each region (?). Thank you!
- Tags:
- Region startcode
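Both pieces the move command needs can be pulled out with a little string handling: the encoded region name is the trailing hash of the full region name (per the shell help above), and the server name is host,port,startcode, whose parts appear as the first two fields of each server line in status 'simple'. A sketch using the example values from this thread:

```python
# Encoded region name = the hash suffix of the full region name
# (example name taken from the `move` help text above).
full_name = "TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396."
encoded = full_name.rstrip(".").rsplit(".", 1)[-1]

# Server name = host,port,startcode; a `status 'simple'` server line starts
# with "host:port startcode ..." (example values from the help text above).
status_line = "host187.example.com:60020 1289493121758 requestsPerSecond=0.0"
host_port, startcode = status_line.split()[:2]
server_name = "%s,%s" % (host_port.replace(":", ","), startcode)

print(encoded)      # 527db22f95c8a9e0116f0cc13c680396
print(server_name)  # host187.example.com,60020,1289493121758
```

So one way to get the startcodes is to script around the output of status 'simple' rather than a dedicated shell command.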
01-07-2016
05:36 AM
Hello, I have several HBase tables (each with a bunch of regions), and I do not understand why, but HBase tends to put all the regions of the same table on the same machine. However, I already moved the regions in order to have regions from each table on all workers (with the move command), and the day after I saw that the regions were again either on fewer machines or on one machine. I tried 'balance_switch false' but with no success (the regions were still moved onto one machine). I do not want to have them on fewer machines, because if all the regions are on one machine, then only one machine is working... Am I getting this wrong? Should I let HBase move the regions onto one machine / fewer machines? Thank you!
Labels:
- HBase
01-06-2016
06:01 AM
Hello, Impala has better performance on partitioned tables (if they are big enough). However, I use Impala on an external HBase table. My HBase table is partitioned. Therefore, I wonder if I can match the HBase partitions with the Impala/Hive ones. If yes, how can I do this? Thank you!
12-17-2015
01:38 AM
Hello, Yes, I can see my application_id logs. However, firstly it shows the message: Logs not available at /tmp/logs/hdfs/logs/application_1449728267224_0138 Log aggregation has not completed or is not enabled. and then it reloads the page and shows the correct logs: SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/disk2/yarn/nm/filecache/7107/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Dec 15, 2015 10:16:44 AM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Dec 15, 2015 10:16:44 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Dec 15, 2015 10:16:44 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Dec 15, 2015 10:16:44 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering
.......
(What I'm searching for..) Thank you! Alina
12-16-2015
01:12 AM
I didn't find any OutOfMemory error in the indicated logs (I did a grep). However, changing the heap helped. So it really was a heap problem 🙂 Thank you! Alina