Member since: 07-16-2015
Posts: 177
Kudos Received: 28
Solutions: 19

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9494 | 11-14-2017 01:11 AM |
| | 54548 | 11-03-2017 06:53 AM |
| | 3544 | 11-03-2017 06:18 AM |
| | 11690 | 09-12-2017 05:51 AM |
| | 1368 | 09-08-2017 02:50 AM |
01-16-2018
01:33 AM
Hi, it's been a while! If I remember correctly, we did not find any solution back then (with CDH 5.3.0), at least other than recreating the collection and re-indexing the data. But after upgrading CDH to a version whose Solr supports the "ADDREPLICA" and "DELETEREPLICA" actions in the Collections API, you can add another replica and then delete the one which is down. Regards, Mathieu
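For reference, the two Collections API calls would look roughly like this (the host, collection, shard and replica names below are placeholders; the dead replica's core_node name can be found with action=CLUSTERSTATUS):

    # Add a new replica of the affected shard on a healthy node
    curl "http://solr-host:8983/solr/admin/collections?action=ADDREPLICA&collection=my_collection&shard=shard1&node=healthy-node:8983_solr"
    # Once the new replica is active, delete the dead one
    curl "http://solr-host:8983/solr/admin/collections?action=DELETEREPLICA&collection=my_collection&shard=shard1&replica=core_node3"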
12-08-2017
03:09 AM
Hello, the ticket you acquire from the keytab has an expiry date and a maximum renewable lifetime. So if you see that error after a few days, it might just be that (the ticket expired or reached its maximum renewable lifetime). You need to handle these cases. Regards, Mathieu
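A common way to handle it is to re-run kinit from the keytab (for example from cron) before the ticket expires; the principal and keytab path below are placeholders:

    # Re-acquire a fresh ticket from the keytab
    kinit -kt /etc/security/keytabs/myapp.keytab myapp@EXAMPLE.COM
    # Check the ticket's "Expires" and "renew until" dates
    klist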
11-29-2017
02:53 AM
Hi, how are these jobs scheduled? If they use Oozie coordinators then it is more an Oozie issue, and I don't think Oozie handles daylight saving time well. I guess the workaround is to "reinstall" the coordinators. Regards, Mathieu
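Reinstalling a coordinator from the command line is basically a kill followed by a resubmit (the Oozie URL, coordinator ID and properties file below are placeholders):

    # Kill the existing coordinator
    oozie job -oozie http://oozie-host:11000/oozie -kill 0000001-171129000000000-oozie-oozi-C
    # Resubmit it from its job.properties (with corrected start/end times)
    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run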
11-21-2017
07:58 AM
1 Kudo
This error (exit code 143) usually means that the container was killed because it tried to use more memory than configured:

    Container killed on request. Exit code is 143
    Container exited with a non-zero exit code 143

A misconfiguration of YARN could lead to this. Check your configuration of container memory and task memory (map & reduce). Regards, Mathieu
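As an illustration (the values are arbitrary, and the driver is assumed to use ToolRunner so -D properties are honored), the map/reduce container sizes and their JVM heaps are the usual suspects:

    # Per-job override of task container memory; the -Xmx heap must stay below the container size
    hadoop jar my-job.jar MyDriver \
      -Dmapreduce.map.memory.mb=2048 \
      -Dmapreduce.map.java.opts=-Xmx1638m \
      -Dmapreduce.reduce.memory.mb=4096 \
      -Dmapreduce.reduce.java.opts=-Xmx3276m \
      /input /output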
11-16-2017
06:57 AM
For this particular case we have used HAProxy + Keepalived. The cluster shouldn't have to know which instance is active, but your load balancer needs to know.
11-15-2017
03:16 AM
With Cloudera I don't think it is supported. Check whether "slider" appears in the latest version of CDH, but I don't think so.
11-14-2017
01:11 AM
Hi, well, for deleting corrupted blocks there is an option on the hdfs fsck command: add the option "-delete" and it should delete all corrupted (or missing) files. You might need to leave safe mode before deleting the corrupted files. If you want to "restore" them instead, try to follow the guidance here: https://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files Most corrupted files cannot be restored. Regards, Mathieu
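A minimal sequence would be (run as a user with HDFS superuser rights):

    # List the corrupted/missing files first
    hdfs fsck / -list-corruptfileblocks
    # Leave safe mode if the namenode is still in it
    hdfs dfsadmin -safemode leave
    # Delete the corrupted files (their data is lost)
    hdfs fsck / -delete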
11-13-2017
09:14 AM
Great! Nice debugging.
11-10-2017
01:11 AM
First: why do you suspect that HiveServer2 needs to be up and running for Impala queries to work? Did you observe that Impala queries worked while HiveServer2 was running? By the way, the Hive Metastore is running, right? Second: you should go look at the Hue server log files; you should see a more useful error there. Regards, Mathieu
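Assuming a default CDH layout (the path may differ on your install), tailing the Hue error log while reproducing the query usually shows the real cause:

    # Watch the Hue server error log while re-running the failing Impala query
    tail -f /var/log/hue/error.log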
11-06-2017
02:52 AM
Is the log you showed correlated with an observed restart of the agent? If yes, I would investigate this "flood" service that seems to restart constantly. A possible cause of a never-ending restart loop: out of memory > the agent kills the service > the agent restarts the service > out of memory > repeat. Regards, Mathieu
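Assuming the default log location, the Cloudera Manager agent log should show the kill/restart loop for that role:

    # Watch the agent log for repeated restarts of the flooding role (default path)
    tail -f /var/log/cloudera-scm-agent/cloudera-scm-agent.log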
11-03-2017
10:15 AM
For the HDFS command, try explicitly targeting the active namenode: hdfs dfs -ls hdfs://host:8020/
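If you are not sure which one is active, you can ask HDFS directly (nn1/nn2 are the default namenode service IDs and may differ in your configuration):

    # Show which namenode currently holds the active role
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2
    # Then target the active one explicitly
    hdfs dfs -ls hdfs://active-nn-host:8020/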
11-03-2017
08:39 AM
1 Kudo
Before fixing the situation, I would try to start only one namenode (the one with data in its directory). It should be considered the active namenode if it is alone, as long as it can start successfully.
11-03-2017
06:53 AM
1 Kudo
The timestamp column is not "suitable" for a partition (unless you want thousands and thousands of partitions). What is suitable:
- create a Hive table on top of the current, non-partitioned data,
- create a second Hive table for hosting the partitioned data (the same columns + the partition column),
- finally, load the data from the first table into the second one using a query that "parses" the timestamp column and extracts a suitable value for the partition column (for example the year, or the year-and-month, ...).

Example:

    INSERT INTO TABLE my_partitioned_table PARTITION (part_col_name) SELECT *, year(to_date(my_timestamp_column)) FROM my_not_partitioned_table;

You don't have to put the partition value in the INSERT statement if you enable dynamic partitioning in Hive:

    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;

And your sample is not working properly because you didn't parse the timestamp column, you used it as is. Each unique value creates a partition, and for timestamps almost every value is unique.
11-03-2017
06:45 AM
1 Kudo
Did you check the supervisor log?
11-03-2017
06:43 AM
This issue just means that your shell action has exited with an error code (different from 0). If you want to know the reason, you need to add logging inside the shell script to find out what happened. Be aware that the script executes locally on a data-node; any log file the script writes will be on that particular data-node.
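One simple approach (a sketch, with made-up paths) is to write to stdout/stderr, since those streams are captured in the YARN container logs of the shell action:

    #!/bin/bash
    # Fail fast and trace each step so the container log explains any non-zero exit code
    set -e
    echo "step 1: checking input directory" >&2
    hdfs dfs -test -d /tmp/my_input || { echo "input dir missing" >&2; exit 1; }
    echo "step 1 done" >&2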
11-03-2017
06:38 AM
Alternatively you could look into "YARN queues" and resource allocation. This will not "restrict" the number of mappers or reducers, but it will control how many can run concurrently by giving the job access to only a subset of the available resources.
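For example, once a dedicated queue/pool exists (the name below is a placeholder, and the driver is assumed to use ToolRunner so -D is honored), submitting the job to it caps how much of the cluster it can use at once:

    # Submit the job to a small dedicated YARN queue
    hadoop jar my-job.jar MyDriver -Dmapreduce.job.queuename=small_pool /input /output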
11-03-2017
06:31 AM
First: save the namenode dir content. Second: can you launch only the second namenode? Does it start? If yes, you should be able to start the data-nodes and get access to the data.
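Saving the content can be as simple as archiving the directory while the namenode is stopped (the path below is a placeholder; use the dfs.namenode.name.dir value from your configuration):

    # Back up the namenode metadata directory before changing anything
    tar -czf /root/namenode-dir-backup-$(date +%F).tar.gz /dfs/nn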
11-03-2017
06:27 AM
CM is composed of the CM server and the CM agents (one agent per node managed by CM). When you use the UI to restart a service, it will:
- take the command at the CM server level,
- dispatch the command to the correct host(s) (the CM server asks the relevant CM agent(s) to execute the task),
- the task is then executed by the relevant CM agent(s).
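Both pieces run as ordinary system services, so on a systemd host you can check them like this (service names are the defaults):

    # On the Cloudera Manager host
    sudo systemctl status cloudera-scm-server
    # On every managed node
    sudo systemctl status cloudera-scm-agent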
11-03-2017
06:18 AM
1 Kudo
Hi, the concept of a Hive partition does not map to HBase tables, so if you want HBase as the storage you will need to work around your use case. You could try to use one HBase table with a row key constructed from the partition value; that way you can query your HBase table using the row key and avoid a full scan of the table. Or you could have one HBase table per "partition" (which also means one Hive table per partition). Or you could conclude that HBase does not answer your need and stay in Hive? Regards, Mathieu
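As an illustration of the first option (the table name and key layout are made up): if the row key starts with the "partition" value, a prefix scan reads only that slice instead of the whole table:

    # Row keys like "2017-11|order123" allow a per-month prefix scan
    echo "scan 'my_table', {ROWPREFIXFILTER => '2017-11'}" | hbase shell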
11-03-2017
06:11 AM
Cloudera Search provides a utility for administering Solr (solrctl). https://www.cloudera.com/documentation/enterprise/latest/topics/search_solrctl_ref.html One of its commands uploads a collection's configuration files into ZooKeeper: "instancedir --create".
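A typical sequence looks like this (the config and collection names are placeholders):

    # Generate a template configuration directory locally
    solrctl instancedir --generate $HOME/my_config
    # Upload it into ZooKeeper under the name "my_config"
    solrctl instancedir --create my_config $HOME/my_config
    # Create a collection that uses that configuration
    solrctl collection --create my_collection -s 2 -c my_config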
10-25-2017
02:57 AM
I think what you are looking for is a configuration located inside the "core-site.xml" file (in the HDFS configuration). Search for "proxyuser" in the Cloudera documentation. Regards, Mathieu
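The properties follow the hadoop.proxyuser.<user>.hosts / hadoop.proxyuser.<user>.groups pattern; to see what is already deployed on a client host (the path below is the usual client config location):

    # Show existing proxyuser entries in the deployed client configuration
    grep -A1 "hadoop.proxyuser" /etc/hadoop/conf/core-site.xml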
10-25-2017
02:51 AM
For Impala I don't know, but for Hive yes, that was the case when I last tested. You need to give proper permissions to the "hive" user in HBase, since all access from Hive to HBase (a Hive table using the HBase storage handler) is performed as the "hive" user. You can then handle end-user permissions on the Hive side for this use case. Regards, Mathieu
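Granting those permissions is done in the HBase shell, roughly like this (the table name is a placeholder; scope the grant to what you actually need):

    # Give the hive user read/write access to the backing HBase table
    echo "grant 'hive', 'RW', 'my_hbase_table'" | hbase shell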
10-25-2017
02:44 AM
Hi, unless it has changed since I last used Sentry (which is possible), it is a little different from how Ranger works. In Ranger you can explicitly define security rules for HDFS. In Sentry, there is a plugin that synchronizes the Hive/Impala security rules with HDFS ACLs (on a list of HDFS directories). What does that mean?
- If you grant "SELECT" permission on a table to a group, it will give "read" permission on HDFS on the folder of that table.
- If you grant "INSERT" permission on a database to a group, it will give "write" permission on HDFS on the root folder of the database.
- etc.
https://www.cloudera.com/documentation/enterprise/latest/topics/sg_hdfs_sentry_sync.html Regards, Mathieu
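You can observe the result of that synchronization directly on HDFS (the warehouse path below is the CDH default and may differ):

    # Show the ACLs the Sentry HDFS sync plugin maintains on a table directory
    hdfs dfs -getfacl /user/hive/warehouse/my_db.db/my_table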
10-13-2017
12:47 AM
Hi, A "Connection Refused" error message can means two things mainly : - There is no service behind the "IP:PORT" you have specified (wrong ip, wrong port, the service is currently down, ...) - There is a firewall blocking the connection. It doesn't seem to be related to running the job as hbase user (or some other user).
09-19-2017
08:55 AM
My question 1: is there any way to solve the problem that asks me for a password when I use the command "su hdfs", because I have no idea what the default password is?
>> Ask your administrator. I'm not sure he will be willing to give you that.
My question 2: I did some research, and I have seen people say that the lack of a default password is on purpose, because common users shouldn't have permission to log in as hdfs. So if I'm a common user called "permission" and I don't have any directory in HDFS (as in the screenshot), but I want to create a directory just like the other users, what can I do?
>> Ask the administrator of the platform to provide you a directory with the correct permissions so that you can work in it.
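On the administrator side, that typically amounts to something like this (the user name is taken from the question; the /user/<name> path is the usual convention):

    # As the HDFS superuser: create the user's home directory and hand it over
    sudo -u hdfs hdfs dfs -mkdir -p /user/permission
    sudo -u hdfs hdfs dfs -chown permission:permission /user/permission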
09-12-2017
06:03 AM
I don't know if such a function exists, but if not, you can create your own UDF to do this quickly. I do agree that you should not use a "CASE" expression in the query. Too complex 🙂
09-12-2017
05:58 AM
Never heard of that. I would guess the architect didn't bother placing them explicitly, or just forgot them. Regards, Mathieu
09-12-2017
05:51 AM
Not sure this information is available. You could go with the "yarn logs" command, or go the basic way using the command line:
- pdsh to run the same command on every data-node,
- a find on the container ID.
Both are sketched below. Regards, Mathieu
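Something along these lines (the application/container IDs, node list and log directory are placeholders):

    # Pull the aggregated logs for the whole application
    yarn logs -applicationId application_1504000000000_0042
    # Or look for the container's log directory on every data-node
    pdsh -w dn[01-20] "find /yarn/container-logs -maxdepth 2 -name '*_0042_01_000003*'"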
09-12-2017
05:39 AM
Hi, 1/ These articles describe some useful information about the rowkey and its design:
- http://archive.cloudera.com/cdh5/cdh/5/hbase-0.98.6-cdh5.3.8/book/rowkey.design.html
- https://www.linkedin.com/pulse/performance-tuning-hbase-part-1-rowkey-crux-kuldeep-deshpande/
2/ If you need to query your data in HBase by a cell's value, it will be totally inefficient. Cloudera Search can help you in these cases, but you will need to index the data into Cloudera Search.
3/ Well, I don't have the answer to that, but I would recommend you stick to querying by the row key only if you need some performance. Hope this helps.
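To illustrate point 2/ (the table name and values are made up): a get by row key touches a single row, while filtering on a cell value forces a full table scan:

    # Fast: direct lookup by row key
    echo "get 'my_table', 'row-000123'" | hbase shell
    # Slow: every row is read just to evaluate the value filter
    echo "scan 'my_table', {FILTER => \"ValueFilter(=, 'binary:some_value')\"}" | hbase shell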
09-08-2017
02:50 AM
I believe this wait time of 30 seconds is hard-coded into the Cloudera agent. I don't think we can alter it other than with a really dirty modification, which I wouldn't recommend. Regards, Mathieu