Member since
11-18-2014
196
Posts
17
Kudos Received
8
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
| 5601 | 03-16-2016 05:54 AM
| 2274 | 02-05-2016 04:49 AM
| 1495 | 01-08-2016 06:55 AM
| 12308 | 09-29-2015 01:31 AM
| 864 | 05-06-2015 01:50 AM
01-07-2016
05:36 AM
Hello, I have several HBase tables (each with a bunch of regions), and I do not understand why, but HBase tends to put all the regions of the same table on the same machine. I already moved the regions (with the move command) so that each table had regions spread across all the workers, but the day after I saw that the regions had been moved back onto fewer machines, or even a single one. I tried 'balance_switch false' but with no success (the regions were still moved onto one machine...). I do not want them on fewer machines, because if all the regions are on one machine, then only one machine is working... Am I getting this wrong? Should I let HBase move the regions onto one machine / fewer machines? Thank you!
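For reference, a minimal sketch of the commands involved, run from a bash session (the region and server names are placeholders, not taken from the post):
hbase shell <<'EOF'
# disable the automatic balancer so manual placement is not undone
balance_switch false
# move one region to a chosen RegionServer (both names are hypothetical)
move 'ENCODED_REGION_NAME', 'worker1.example.com,60020,1447053312111'
# later, re-enable balancing and trigger a run
# balance_switch true
# balancer
EOF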
... View more
Labels:
01-06-2016
06:01 AM
Hello, Impala/Hive performs better on partitioned tables (if they are big enough). However, I use Impala on an external HBase table, and my HBase table is partitioned (pre-split into regions). Therefore, I wonder if I can match the HBase partitions with the Impala/Hive ones? If yes, how can I do this? Thank you!
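For context, a hedged sketch of how an HBase-backed table is usually exposed to Hive/Impala (all table, column-family, and column names below are made up). HBase regions are split by row key rather than by Hive-style partition columns, so a row-key range predicate is what lets Impala/Hive narrow the scan to a subset of regions:
hive -e "
CREATE EXTERNAL TABLE my_hbase_view (row_key STRING, col1 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:col1')
TBLPROPERTIES ('hbase.table.name' = 'my_hbase_table');
"
# a row-key range predicate limits the scan to the matching regions
impala-shell -q "SELECT count(*) FROM my_hbase_view WHERE row_key >= 'A' AND row_key < 'B'"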
... View more
12-30-2015
12:50 AM
The show create table output (anonymised): CREATE TABLE database_name.table_name ( a STRING, b STRING, c STRING, d STRING, e TIMESTAMP, f TIMESTAMP, g STRING, h TIMESTAMP, i STRING, j BIGINT, k STRING, l STRING, m STRING, n STRING, o STRING, p STRING, r STRING, s STRING, t STRING, x STRING, y TIMESTAMP, z STRING, aa STRING, bb STRING, cc STRING, dd STRING, ee STRING, ff BIGINT, gg BIGINT, hh TIMESTAMP, ii TIMESTAMP, jj STRING, kk STRING, ll STRING, mm STRING, nn STRING, oo STRING, pp STRING, qq STRING, rr STRING, ss STRING, tt STRING, uu STRING, xx STRING, yy STRING, zz STRING, aaa STRING, bbb STRING, ccc STRING, ddd STRING, eee STRING, fff BIGINT, ggg STRING, hhh STRING, iii STRING, jjj STRING, kkk STRING, lll STRING, mmm STRING, nnn STRING, ooo STRING, ppp STRING, qqq STRING, rrr STRING, sss STRING, ttt STRING, uuu STRING, xxx STRING, yyy STRING, zzz STRING )
PARTITIONED BY ( abc STRING, abcde STRING ) WITH SERDEPROPERTIES ('serialization.format'='1')
STORED AS TEXTFILE LOCATION 'hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name'
TBLPROPERTIES ('STATS_GENERATED_VIA_STATS_TASK'='true', 'transient_lastDdlTime'='1450869167', 'numRows'='106717515')
Query:
select count(distinct a, b, c, d, e), a
from table_name
group by a
Impala error and logs: Bad status for request 9360: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=5, errorMessage=None, sqlState=None, errorCode=None)
Query 364e9ad30338567c:3bb748508f6431af: 33% Complete (1370 out of 4149)
Backend 6:For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-12/000000_0_copy_42.snappy offset 134217728
For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-12/000000_0_copy_41.snappy offset 134217728
For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-11/000001_0.snappy offset 134217728
Backend 7:For better performance, snappy, gzip and bzip-compressed files should not be split into multiple hdfs-blocks. file=hdfs://HaNameNode/user/hive/warehouse/database_name.db/table_name/abc=value_abc/abcd=2015-12/000000_0_copy_43.snappy offset 134217728
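The warning comes from Snappy-compressed text files spanning more than one HDFS block. One way this is commonly avoided is to rewrite the data into Parquet, which splits cleanly; a sketch, assuming the data can be rewritten (the _parquet table name is made up):
# create an empty Parquet copy of the table, keeping its partitioning
impala-shell -q "CREATE TABLE database_name.table_name_parquet LIKE database_name.table_name STORED AS PARQUET"
# copy the data with dynamic partitioning (partition columns are last in the schema, so SELECT * works)
impala-shell -q "INSERT INTO database_name.table_name_parquet PARTITION (abc, abcde) SELECT * FROM database_name.table_name"
# recompute statistics on the new table
impala-shell -q "COMPUTE STATS database_name.table_name_parquet"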
... View more
12-30-2015
12:30 AM
Hello, I noticed that the average IO wait on my workers is around 24%. The workers are Amazon machines of type m4.xlarge (4 vCPUs with Hyperthreading). Is there any configuration that can help me reduce the IO wait? Is this normal for a Hadoop/Cloudera cluster? Thank you!
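To pin down whether the wait comes from the local data disks or from EBS throughput limits, something like this can be run on a worker (a plain monitoring sketch; it assumes the sysstat package is installed):
# per-device utilisation, await and queue depth, refreshed every 5 seconds
iostat -x 5
# overall CPU breakdown including %iowait, 3 samples of 5 seconds
sar -u 5 3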
... View more
12-21-2015
09:19 AM
Hello, I have a Pig job that writes into HBase. However, from time to time, even for a job reported as successful, the logs show: Input(s):
Successfully read 2588027 records (1523635920 bytes) from: "my_database.my_table"
Output(s):
Successfully stored 2588027 records in: "hdfs://StandbyNameNode/user/agherman/my_write_table"
Counters:
Total records written : 2588027
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
And when I have Total bytes written : 0, in fact my data wasn't written. However, when I run the same job a second time it sometimes works. Could you please let me know what this means? What could I do to stop it? And how could I identify that the job in fact didn't work (besides checking that bytes were written)? Thank you!
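As a crude post-job check that rows really landed, the target HBase table can be counted before and after the run (a sketch; the table name stands in for whatever table the job writes to, and the interval/cache values are arbitrary):
# slow on big tables, but enough to spot a write that silently did nothing
echo "count 'my_write_table', INTERVAL => 100000, CACHE => 10000" | hbase shell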
... View more
- Tags:
- HBase
- write-0-byte
Labels:
12-21-2015
02:26 AM
Hello, Today my job failed because: Cannot obtain block length for LocatedBlock{BP-1623273649-<IP>-1419337015794:blk_1076750425_3012802; getBlockSize()=16188; corrupt=false; offset=0; locs=[<IP-Worker1>:50010, <IP-Worker-2>:50010, <IP-WORKER-3>:50010]} I tried to find the filename: hdfs fsck / -files -blocks | grep 'BP-1623273649-<IP>-1419337015794:blk_1076750425' but there was nothing (no file found with this block ID). I restarted the job twice and the second time it worked. I didn't change anything. The cluster is in HA and the NN didn't change. Would you please explain to me what could have happened? Thank you,
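"Cannot obtain block length" usually points at an input file that was never properly closed (for example one still being written by an ingest process). A hedged way to look for such files (the path is a placeholder for the job's input directory):
# list files that are still open for write under the input path
hdfs fsck /path/to/job/input -files -openforwrite
# on newer Hadoop releases (2.7+), lease recovery can be forced on a suspect file
hdfs debug recoverLease -path /path/to/job/input/stuck_file.tmp -retries 3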
... View more
12-17-2015
01:38 AM
Hello, Yes, I can see my application_id logs. However, first it shows the message: Logs not available at /tmp/logs/hdfs/logs/application_1449728267224_0138 Log aggregation has not completed or is not enabled. and then the page reloads and shows the correct logs: SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/disk2/yarn/nm/filecache/7107/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Dec 15, 2015 10:16:44 AM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Dec 15, 2015 10:16:44 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Dec 15, 2015 10:16:44 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Dec 15, 2015 10:16:44 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering
.......
(What I'm searching for..) Thank you! Alina
... View more
12-16-2015
01:12 AM
I didn't find any OutOfMemory error in the indicated logs (I did a grep). However, changing the heap helped, so it really was a heap problem 🙂 Thank you! Alina
... View more
12-15-2015
07:51 AM
Hello, I tried to do: export HADOOP_USER_NAME=my_user
load_events=`yarn logs -applicationId $application_id` I also tried: export HADOOP_USER_NAME=hdfs
load_events=`yarn logs -applicationId $application_id` However I get: Logs not available at /tmp/logs/hdfs/logs/application_1449728267224_0138 Log aggregation has not completed or is not enabled. And this is the same message that I get when I run this command with an unauthorised user...
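A quick way to tell whether aggregation actually finished for that application is to look directly at the aggregated-log directory in HDFS (the path below just mirrors the one in the error message, which assumes the default remote-app-log-dir):
# if this listing is empty, the NodeManagers have not uploaded the logs yet
hdfs dfs -ls /tmp/logs/hdfs/logs/application_1449728267224_0138
# confirm aggregation is enabled in the client configuration
grep -A1 'yarn.log-aggregation-enable' /etc/hadoop/conf/yarn-site.xml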
... View more
12-15-2015
05:33 AM
My Hive version: Hive 0.13.1-cdh5.3.5. I tried the select * from VERSION command in Hive but it is not working...
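A note on why that fails: VERSION is a table in the metastore's backing database, not a Hive table, so it cannot be selected through HiveQL. The client version can be read more simply (a sketch):
# print the Hive client/CDH version
hive --version
# the Hadoop stack version, which embeds the CDH release
hadoop version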
... View more
12-15-2015
02:34 AM
Hello, Suddenly one of the 3 Flume agents that are on the same machine is not starting anymore. All I have in the logs is: DEBUG December 15 2015 10:16 AM Shell
Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:327)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:269)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:246)
at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:323)
at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:317)
at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:557)
at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The health test result for FLUME_AGENT_SCM_HEALTH has become concerning: This role's process exited while starting. A retry is in process.
The health test result for FLUME_AGENT_SCM_HEALTH has become bad: This role's process is starting. This role is supposed to be started.
However, the files remain in .tmp (they are never rolled anymore)... I cannot understand how this agent gets the hadoop home dir error while the other 2 don't... Thank you! Alina
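Note that the "Failed to detect a valid hadoop home directory" message is logged at DEBUG level and is often harmless on its own. If the missing HADOOP_HOME really is the blocker on that one host, it can be pointed at the CDH libraries in the agent's environment (a sketch; the parcel path assumes a default parcel install and should be verified on the host):
# e.g. in flume-env.sh, or via the Flume agent environment safety valve in Cloudera Manager
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop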
... View more
- Tags:
- Flume
12-14-2015
11:48 PM
Hello, I told myself that if I cannot add an attachment, I'll just add a link to my log files from HDFS. So I did: yarn application -list -appStates FINISHED | grep 'my_workflow_name' | grep -Po 'application_\d+_\d+' | sed 's/.*application://' | tail -n 1 in order to find the application id ($my_application_id) that I needed. Afterwards I wanted to do: yarn logs -applicationId $my_application_id However, this doesn't return any logs if it is not executed by a user that has the rights to read the logs. So I wanted to change it into: sudo -u hdfs yarn logs -applicationId $application_id but then I got the error: sudo: sorry, you must have a tty to run sudo Is there a proper way to get the logs without lowering the security level? (http://unix.stackexchange.com/questions/122616/why-do-i-need-a-tty-to-run-sudo-if-i-can-sudo-without-a-password ) Thank you!
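A hedged sketch of the same thing as one script, relying on HADOOP_USER_NAME instead of sudo (this only works on a non-kerberized cluster, and the workflow name is a placeholder):
#!/bin/bash
# last finished application for the workflow
application_id=$(yarn application -list -appStates FINISHED \
  | grep 'my_workflow_name' \
  | grep -Po 'application_\d+_\d+' \
  | tail -n 1)
# read the aggregated logs as the owning user, without sudo/tty
export HADOOP_USER_NAME=hdfs
yarn logs -applicationId "$application_id" > "${application_id}.log"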
... View more
12-14-2015
11:24 PM
Hello, I wanted to install CDAP (to try it out) so I followed the http://blog.cloudera.com/blog/2015/02/how-to-install-and-use-cask-data-application-platform-alongside-impala/ and http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cm_mc_addon_services.html documentation. Everything worked fine until I pushed the download button in Cloudera Manager (parcels), when I got a local parcel error for parcel CDAP-3.2.1-1-el6: Parcel not available for OS Distribution RHEL6. Note: - I'm using CDH 5.3 - OS: CentOS 6 Is there a site where we can find all the parcels and their availability on the different OS distributions? (I just can't find it...) Thank you! Alina
... View more
12-14-2015
05:28 AM
You were right, this is linked to: OOZIE-2160 - Oozie email action now supports attachments with an <attachment> element. I am on CDH 5.3. Thank you!
... View more
- Tags:
- Oozie
12-10-2015
11:41 PM
Hello, I saw that there is an attachment element; however, I cannot add it in Hue... Thank you!
... View more
11-27-2015
02:32 AM
Thank you for your answer. I will try to write a shell script to gather the logs. However, is there a way to add the logs as an attachment? Thank you! Alina
... View more
11-20-2015
07:59 AM
Hello, Is there a way to attach, or add to the content of the email (email action in Oozie), the job logs? (the logs from all the actions of the job) I didn't find any workflow parameter that could help... Thank you!
... View more
- Tags:
- email action
- Oozie
Labels:
11-16-2015
07:34 AM
Hello, My fault. I had a timestamp problem. Since the hour changed in France (to UTC+1) while our servers are still on UTC+2, I thought it was not changing. Thank you!
... View more
11-16-2015
07:28 AM
Yes I did, but with that tutorial I didn't manage to configure it correctly... Thank you,
... View more
11-15-2015
06:48 PM
Hello, I have the same error. No error shows up when I execute the show command. The same command works with Hive, so the file is not corrupt. Thank you, Best regards, Alina GHERMAN
... View more
11-10-2015
06:46 AM
Hello, I have a coordinator that is executed every day and writes into HBase tables. Yesterday the job failed because: 12477375 [hconnection-0x17444a28-shared--pool954-t772] INFO org.apache.hadoop.hbase.client.AsyncProcess - #3792, table=table_name, attempt=31/35 failed 1 ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region table_name,dsqdqs|122A48C3-,1439883135077.f07d81b4d4ff8e9d4170cce187fc2027. is not online on <IP>,60020,1447053312111
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2762)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4268)
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3476)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30069)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
at java.lang.Thread.run(Thread.java:745) I did the following checks: - hbase hbck ==> no error - hdfs fsck / ==> no error - major_compact 'table_name' ==> I managed to run the job However, even if the workflow finished successfully, there is no data written to the HBase tables. I tried: - flush 'table_name' ==> didn't change anything. Do you have any suggestions on why the data is not written? (I tried the flush command because I supposed that the files were not written)
... View more
Labels:
11-08-2015
09:32 AM
I upgraded to CDH 5.3.5 and now I can delete projects/add projects/add components to projects.
... View more
11-08-2015
07:29 AM
Hello, Cloudera Manager does not start if you don't have enough RAM. I know that one year ago, with the Quickstart VM, Cloudera Manager didn't start on a VM with 6 GB of memory but it did on a VM with 9 GB. However, if you don't have more memory, you can still play with the installed VM, just that: - you have to start/stop all the services (daemon by daemon) on the command line http://www.cloudera.com/content/www/en-us/documentation/archive/cdh/4-x/4-7-1/CDH4-Installation-Guide/cdh4ig_topic_27_3.html - I suggest that you start at least the Hue interface in order to be able to do all the queries from there. If not, you can do the queries in each service's shell (hbase shell, hive, impala-shell, etc.). Best regards, Alina GHERMAN
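For the daemon-by-daemon route, the package init scripts referenced in that guide look roughly like this (a sketch; start only the services you actually need, and the exact script names may differ per CDH release and install type):
# core storage and compute first
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-datanode start
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start
# then the query/UI layers
sudo service hive-metastore start
sudo service hive-server2 start
sudo service hue start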
... View more
11-08-2015
07:03 AM
Hello, I am just trying to understand HDFS Federation better. If I get it right: - we should use it in order to split, for example, the real-time space and the batch space. - if we want to split the namespace into N namespaces, then we have to have N NameNodes. Thank you! Alina
... View more
Labels:
11-05-2015
09:43 AM
Hello, Thank you for your answer. The problem is that classic Pig scripts (no access to Hive tables, nor to HBase) run in a distributed way (they have mappers and reducers). However, this one runs on only one node (in Cloudera Manager -> Hosts, all nodes have a load average of 0.* and one node has 9.* as load). Since you say that normally, even if only mappers are created, the script should still run in a distributed way, I will post an anonymised version of my script. SET mapreduce.fileoutputcommitter.marksuccessfuljobs false;
SET output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
SET hbase.zookeeper.quorum '${ZOOKEEPER_QUORUM}';
SET oozie.use.system.libpath true
SET oozie.libpath '${PATH_LIB_OOZIE}'
------------------------------------------------------------
-- hcat
register 'hive-hcatalog-core-0.13.1-cdh5.3.0.jar';
register 'hive-hcatalog-core.jar';
register 'hive-hcatalog-pig-adapter-0.13.1-cdh5.3.0.jar';
register 'hive-hcatalog-pig-adapter.jar';
register 'hive-metastore-0.13.1-cdh5.3.0.jar';
register 'datanucleus-core-3.2.10.jar';
register 'datanucleus-api-jdo-3.2.6.jar';
register 'datanucleus-rdbms-3.2.9.jar';
register 'commons-dbcp-1.4.jar';
register 'commons-pool-1.5.4.jar';
register 'jdo-api-3.0.1.jar';
-- UDF
REGISTER 'MyStoreUDF-0.3.8.jar';
------------------------------------------------------------------------------------------------------------
----------------------------------------------- input data -------------------------------------------------
var_a= LOAD 'my_database.my_table' USING org.apache.hcatalog.pig.HCatLoader() as
(
a:chararray ,
b:chararray,
c:chararray,
d:chararray,
e:chararray,
f:long,
g:chararray,
h:chararray,
i:long,
j:chararray,
k:bag{((name:chararray,value:chararray))},
l:chararray,
m:chararray );
var_a_filtered = FILTER var_a BY (a == 'abcd');
var_a_proj = FOREACH var_a_filtered GENERATE
a,
b,
c,
d;
STORE var_a_proj INTO 'hbaseTableName'
USING MyStoreUDF('-hbaseTableName1 hbaseTableName1 -hbaseTableName2 -hbaseTableName2 '); Thank you! Alina GHERMAN
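To confirm whether the launched MapReduce job really got a single map task (rather than many tasks all landing on one node), the job can be inspected while the script runs (a sketch; the job id below is a placeholder):
# list running jobs started by the Pig script
mapred job -list
# show map/reduce task counts and progress for the suspect job (hypothetical id)
mapred job -status job_1449728267224_0138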
... View more
11-05-2015
01:55 AM
Hello, In http://<IP>/oozie/list_oozie_coordinators/ the Next Submission field is never updated. In fact it always contains the first submission of the job. Thank you! Alina GHERMAN
... View more
11-05-2015
01:42 AM
Hello, This returns an error: TRUNCATE TABLE database_name.table_name PARTITION (date_by_day='2015-08-19'); While this works: USE database_name;
TRUNCATE TABLE table_name PARTITION (date_by_day='2015-08-19'); Note: I'm using Cloudera 5.3 Thank you!
... View more
- Tags:
- Hive
- Truncate Table
11-04-2015
11:29 PM
Hello, I have a Pig job that I schedule with Oozie. This Pig job reads data from a Hive table and writes into 3 HBase tables (via a UDF). The problem is that only one node is working. I noticed that this job has only mappers and no reducers. Is this the problem? I'm asking because of the thread: https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Execute-Shell-script-through-oozie-job-in-all-node/m-p/33136#M1765 where @Sue said "The Oozie shell action is run as a Hadoop job with one map task and zero reduce tasks - the job runs on one arbitrary node in the cluster." Is there a way to force the cluster to use all the nodes? Thank you!
... View more
10-20-2015
09:56 AM
Hello, I'm searching for a tutorial about how to integrate Impala with YARN (and Llama). - What configurations should I change in my existing HA cluster? Thank you!
... View more