Member since: 01-25-2019
Posts: 56
Kudos Received: 9
Solutions: 13

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 108 | 02-25-2021 02:10 AM |
| | 142 | 02-23-2021 11:31 PM |
| | 161 | 02-18-2021 10:18 PM |
| | 196 | 02-11-2021 10:08 PM |
| | 368 | 02-01-2021 01:47 PM |
02-25-2021
09:42 AM
1 Kudo
Hello @marccasajus
Yes, this has been documented internally as a BUG (OPSAPS-53043) and is currently not fixed.
Also, it looks like you have already applied the changes that would address this.
02-25-2021
02:10 AM
Hello @SajawalSultan It seems you are running the job as the user cloudera_user, which needs access to a /user/<username> directory to create its scratch directories. It cannot create them because "cloudera_user" does not have permission on /user:

hdfs:supergroup:drwxr-xr-x /user

Run hdfs dfs -chmod 777 /user as the hdfs user to make sure the job gets the required access to the /user directory. Let me know if this solves your sqoop import.
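For example, the check and fix would look roughly like this (run as the hdfs superuser; "cloudera_user" is just the user name from your output, adjust it and the group to your environment):

# check the current permissions on /user
hdfs dfs -ls -d /user

# the fix suggested above: open up /user so scratch directories can be created
hdfs dfs -chmod 777 /user

# a tighter alternative: create a home directory owned by the job user instead
hdfs dfs -mkdir -p /user/cloudera_user
hdfs dfs -chown cloudera_user:cloudera_user /user/cloudera_user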
02-25-2021
02:05 AM
Hello @Sample Without the Hadoop ecosystem, Hive and Impala would not exist in the first place. If you have Hive on one side (essentially the Hadoop ecosystem) and MySQL on the other, and you want to import data into Hive from MySQL, you will have to use Sqoop to do so, and vice versa. Let me know if the above answers all your questions.
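As a rough sketch (the hostname, database, table and credentials below are placeholders, not from your environment), a Sqoop import from MySQL into Hive looks like this:

# import a MySQL table into a Hive table; -P prompts for the password
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/salesdb \
  --username sqoop_user -P \
  --table customers \
  --hive-import \
  --hive-database default \
  --hive-table customers \
  -m 1

The reverse direction works the same way with sqoop export.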
02-25-2021
12:27 AM
Hello @saamurai Thanks for the confirmation. Cheers! Was your question answered? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
02-24-2021
11:44 PM
Hello @saamurai We have separate drivers for Impala and Hive, and I am not sure why you intend to use the Hive driver for Impala. We do connect to Impala from edge nodes via beeline, which is JDBC, but only to test whether connectivity works. We do not recommend using beeline for Impala, as impala-shell is designed for that purpose. Cloudera recommends using the specific driver, with a compatible version, for each component.
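To illustrate (the hostnames below are examples, adjust to your cluster and security settings), this is how each service is normally reached:

# Impala: impala-shell, or the Cloudera Impala JDBC/ODBC driver from applications
impala-shell -i impalad-host:21000

# Hive: beeline with the HiveServer2 JDBC URL (or the Cloudera Hive JDBC driver)
beeline -u "jdbc:hive2://hs2-host:10000/default" -n <username>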
02-23-2021
11:31 PM
1 Kudo
Hello @Benj1029 You need to go to the path below on the host that is running the HiveServer2 process:

cd /var/log/hive/
vi hiveserver2.log

Look at the stack trace logged just before the shutdown; that should give you some pointers.
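For example, something along these lines usually surfaces the relevant stack trace (the log directory may differ if it was customized):

cd /var/log/hive/
ls -lt hiveserver2.log*                                   # most recent log first
grep -inE "error|exception" hiveserver2.log | tail -20    # last errors before the shutdown
less hiveserver2.log                                      # then jump to the matching stack trace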
02-18-2021
10:18 PM
1 Kudo
Well @ryu, my understanding is that when you are storing data on HDFS, and especially Hive data, it is best to use managed tables, keeping in mind that CDP now ships compaction features that automatically address the small-files issue. Compaction does not happen on external tables. One would prefer external tables if the data is stored outside HDFS, for example on S3. This is my understanding, but it can vary from customer to customer based on their use cases.
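As a small sketch of the two options (the table names, HS2 URL and S3 path are just examples):

# managed, full ACID table: data lives in the Hive warehouse and is eligible for compaction
beeline -u "jdbc:hive2://hs2-host:10000/default" \
  -e "CREATE TABLE sales_managed (id INT, amount DOUBLE) STORED AS ORC;"

# external table: data lives outside the warehouse (e.g. S3) and is not compacted
beeline -u "jdbc:hive2://hs2-host:10000/default" \
  -e "CREATE EXTERNAL TABLE sales_external (id INT, amount DOUBLE) STORED AS ORC LOCATION 's3a://my-bucket/warehouse/sales_external';"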
02-18-2021
10:38 AM
Hello @ryu There is no single "best" path, but it should obviously not be /tmp. You can create a path such as /user/external_tables and create the tables there. Again, it depends entirely on your design and your use case.
02-18-2021
10:34 AM
Hello @bb9900m The answer in your scenario is to use a load balancer. You can put an external load balancer in front of the Impala coordinators and balance across them. That way, the client only sees the load balancer IP (a virtual IP), which then distributes requests according to the load-balancing mechanism you configure. Attaching the link with a sample configuration for your reference: https://docs.cloudera.com/runtime/7.2.6/impala-manage/topics/impala-load-balancer-configure.html When it comes to SSL, make sure the LB hostname is added as a SAN name on all the Impala certificates. Let me know if you have issues with any of the points above.
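Once the balancer is up, a quick sanity check is to point the clients at the VIP instead of an individual coordinator (the hostname and port below are examples; use whatever frontend ports you configure on the LB):

# shell access through the load balancer VIP
impala-shell -i impala-lb.example.com:21000

# JDBC/ODBC clients use the LB hostname the same way, typically on the 21050 frontend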
02-18-2021
10:26 AM
Hello @saamurai Could you please share the link where you read that? Meanwhile, you can use the latest driver links below to connect to Hive and Impala respectively on CDP.

// For Hive
https://docs.cloudera.com/documentation/other/connectors/hive-jdbc/2-6-13/Cloudera-JDBC-Driver-for-Apache-Hive-Install-Guide.pdf

// For Impala
https://docs.cloudera.com/documentation/other/connectors/impala-jdbc/latest/Cloudera-JDBC-Driver-for-Impala-Install-Guide.pdf

There are separate drivers for Hive and Impala. Let me know if the above helps.
02-17-2021
01:20 AM
Hello @uk_travler Compaction will not honour hive.compactor.job.queue. Compaction works differently for full ACID tables and insert-only tables. For full ACID tables, a manual/auto compaction spawns two jobs: an MR job that performs the compaction and honours the compaction queue, and a Tez job that computes statistics and is submitted to the default queue. For insert-only tables, a manual/auto compaction spawns a single Tez job, which is submitted to the default queue. There is a jira raised for this which is being worked on; bug details for your reference: HIVE-24781. Let me know if you have any doubts about the above.
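For reference, this is how you can trigger a compaction manually and watch the jobs described above (the database/table names and HS2 URL are examples):

# kick off a major compaction and check its progress from beeline
beeline -u "jdbc:hive2://hs2-host:10000/default" \
  -e "ALTER TABLE db1.acid_tbl COMPACT 'major'; SHOW COMPACTIONS;"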
02-11-2021
10:08 PM
1 Kudo
Hello @ryu The purpose of Ranger is to give users the necessary authorization to access tables/databases. If you allow a certain user access to a particular table/database, that user will be able to perform those actions on it, and an unauthorized user will automatically be unable to remove the table. Let's say there are two users, test1 and test2. If I allow test1 access to table t1 and test2 access to table t2, test1 will not be able to see table t2 and test2 will not be able to see table t1. You can also add granularity as to which actions each user can perform on a table. This authorization is checked via the Ranger hook present in HiveServer2. Let me know if the above answers your queries.
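To make it concrete (the user and table names are made up for the example), with a Ranger policy that grants test1 SELECT on t1 only:

# as test1: allowed by the Ranger policy
beeline -u "jdbc:hive2://hs2-host:10000/default" -n test1 -e "SELECT * FROM t1 LIMIT 5;"

# as test2: the same query is rejected with a permission denied error,
# and t1 is filtered out of SHOW TABLES for test2
beeline -u "jdbc:hive2://hs2-host:10000/default" -n test2 -e "SELECT * FROM t1 LIMIT 5;"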
02-10-2021
09:45 PM
Hello @ryu Well, you can make that user similar to the hive user. The hive user is mapped to the hadoop group, and you can make alterations so a normal user behaves like the hive user, but as I mentioned earlier, you'll have to spend time managing it and will eventually end up spending even more time troubleshooting when things break. Remember, Hadoop is a complex setup with multiple components talking to each other. 🙂
02-10-2021
09:41 PM
Hello @dwill @Srivatsan CDH 5.7.3 is a very old version and there have been a lot of fixes since then. Coming back to the error: generally, when you see a query in the CREATED state, we need to check where exactly the query is waiting. For example, a query can be in the CREATED state if it is unable to fetch from the catalog server the metadata needed to submit it. It can also be in the CREATED state if resources are scarce and the query is queued.
02-09-2021
07:46 AM
Hello @ryu If you run the job as the end user, you will eventually end up managing internal permissions and job-submission permissions yourself, and in my experience you will have difficulty integrating things. But if you submit the job and let the hive user take care of file creation and management in the backend, the admin's life becomes easier, and you will be able to hook/integrate things more cleanly. The above is just a gist; the recommendation is to authenticate as the end user but keep impersonation off and let hive take care of things in the backend.
02-01-2021
09:06 PM
@anujseeker And the HMS logs?
02-01-2021
01:52 PM
Hello @pphot You can migrate the HMS and HS2 instances to any other hosts: add new HS2 and HMS instances on other hosts and remove the previous ones once the new ones are in place and functioning normally. For the backend database, if you migrate it you must update the configuration in CM so that HMS knows which host to contact for its backend DB. Please note this is mandatory, as Hive stores all of its metadata in the backend DB. Let me know if the above helps. Regards, Tushar
02-01-2021
01:47 PM
Hello @BhaveshP The error you are seeing is because the certificate CN/SAN name does not match the hostname. Let me try to explain with an example.

client [A] --> Balancer [B] --> NiFi Nodes [C, D, E]

Here you have a client A who wants to access NiFi nodes C, D and E via B. When you create SSL certificates for C, D and E, you create three certificates with three different Common Names: C, D and E respectively. When you connect to the NiFi nodes C, D or E directly from client A, you will not see the issue. But when you access C, D or E via the balancer or any proxy B, you are likely to get the error.

WHY? Client A is trying to talk to NiFi node C, D or E, but from the client's point of view the NiFi node is B. During the SSL handshake, the certificate is presented by the NiFi server to A, and the client gets confused because it wanted to talk to B while the certificate it received is from C. The names don't match.

FIX: Make use of SAN names (Subject Alternative Names) in the certificates. Issue the certificates of C, D and E such that each also carries B as a SAN name. In other words, the certificate on NiFi node C should contain both names C and B, node D both D and B, and node E both E and B. This way, when the client talks to B and receives the certificate from C, it will not get confused, because the certificate has both B and C present in its SAN list.

Let me know if the above gives some clarity as to what exactly is happening.
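To verify the SAN entries on a node's certificate (the file name, hostnames and port below are examples):

# inspect a certificate file directly
openssl x509 -in nifi-node-c.pem -noout -text | grep -A1 "Subject Alternative Name"

# or check what the node actually presents on its HTTPS port
echo | openssl s_client -connect nifi-node-c.example.com:8443 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

Both B and C should show up in the SAN list of node C's certificate.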
02-01-2021
01:34 PM
Hello @anujseeker It seems you are using the wrong hive path. The command below works for me:

hive --orcfiledump -d --rowindex 5 /warehouse/tablespace/managed/hive/tkathpal.db/orctable/delta_0000001_0000001_0000/bucket_00000

In my case, hive points to the actual parcel:

[root@c2511-node1 ~]# which hive
/usr/bin/hive
[root@c2511-node1 ~]# ls -lt /usr/bin/hive
lrwxrwxrwx 1 root root 22 Aug 3 2020 /usr/bin/hive -> /etc/alternatives/hive
[root@c2511-node1 ~]# ls -lt /etc/alternatives/hive
lrwxrwxrwx 1 root root 62 Aug 3 2020 /etc/alternatives/hive -> /opt/cloudera/parcels/CDH-7.1.1-1.cdh7.1.1.p0.3266817/bin/hive

So when I run hive, the jars are picked up from the right path. Could you please check the same on your end?
02-01-2021
01:27 PM
Hello @AHassan Did you try increasing the Hive Tez container memory to see whether it fixes your issue? After connecting to beeline, check the current Tez container size first:

SET hive.tez.container.size;

Once found, try increasing the container size to twice the current value. For example, if the container size is 5 GB, try setting it to 10 GB (the value is in MB) and re-run the query:

SET hive.tez.container.size=10240;

The reason I am asking you to increase the container size is that I can see the attempts failing due to OOM. If the above still fails, tune the following:

tez.runtime.io.sort.mb should not be more than 2 GB (ideally it should be 40% of the Tez container size)
tez.runtime.unordered.output.buffer.size-mb=1000 (ideally it should be 10% of the Tez container size)
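Put together, the session-level changes for the 5 GB -> 10 GB example would be (the HS2 URL is a placeholder; values are in MB and should be tuned to your cluster):

beeline -u "jdbc:hive2://hs2-host:10000/default" <<'EOF'
SET hive.tez.container.size;                            -- check the current value first
SET hive.tez.container.size=10240;                      -- double 5 GB to 10 GB
SET tez.runtime.io.sort.mb=2000;                        -- capped at ~2 GB (otherwise ~40% of the container)
SET tez.runtime.unordered.output.buffer.size-mb=1000;   -- ~10% of the container
-- re-run the failing query in this same session
EOF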
02-01-2021
01:16 PM
Hello @anujseeker The best option for submitting queries to Hive is to use HiveServer2, not the hive cli; the hive cli is deprecated. Coming to your main question about being unable to create a database: you need to check the hive cli logs (under /tmp/<userid>/hive) to gather more information on why the path was not created, as well as the Hive Metastore logs (under /var/log/hive/). This is because the hive cli (client) talks directly to HMS and bypasses HiveServer2. If you could share the details from the hive cli log and the HMS logs, it would be easier to guide you through the next steps. Regards, Tushar https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
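For example (the URL, user and database name below are placeholders), the same statement through HiveServer2 would be:

beeline -u "jdbc:hive2://hs2-host:10000/default" -n <username> -e "CREATE DATABASE testdb;"

If it still fails there, the corresponding stack trace will be in the HiveServer2/HMS logs under /var/log/hive/.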
01-26-2021
07:34 PM
Hey @saurabh707 Could you please try the below. Update the Hive log4j configs through CM:

Hive on Tez -> Configuration -> HiveServer2 Logging Advanced Configuration Snippet (Safety Valve)
Hive -> Configuration -> Hive Metastore Server Logging Advanced Configuration Snippet (Safety Valve)

Add the following to the config:

appender.DRFA.strategy.action.type=DELETE
appender.DRFA.strategy.action.basepath=${log.dir}
appender.DRFA.strategy.action.maxdepth=1
appender.DRFA.strategy.action.PathConditions.glob=${log.file}.*
appender.DRFA.strategy.action.PathConditions.type=IfFileName
appender.DRFA.strategy.action.PathConditions.nestedConditions.type=IfAccumulatedFileCount
appender.DRFA.strategy.action.PathConditions.nestedConditions.exceeds=10

Let me know if the above addresses it.
11-08-2020
08:06 PM
@avlasenko It seems Impala is having trouble communicating with HMS. Could you please check if you are able to perform the same from Hive?
11-08-2020
07:15 PM
@ateka_18 As mentioned before, a view is just a query statement stored in HMS; it's not actually a table. If you want to know the size of the underlying table, the best approach is:

hdfs dfs -du -s -h /path/to/table

Let's say you have a table named test (/user/hive/warehouse/default.db/test) with several partitions under it (part1, part2, part3). To get the size of the table:

hdfs dfs -du -s -h /user/hive/warehouse/default.db/test

Let me know if the above helps.
11-07-2020
07:45 AM
@ateka_18 A VIEW is just a query statement saved in HMS, so what you actually gather is the size of the underlying table, and the best option for that is:

hdfs dfs -ls <path of the table>

With regards to the error you are observing, it clearly says that ANALYZE TABLE is not supported for views; you can run it on tables.
11-04-2020
08:51 AM
1 Kudo
Hey @banshidhar_saho Glad to hear it works. Regarding permissions, it depends on the ACLs you grant. In the previous comment I set rwx for the group; you can set something for "other" as well:

hdfs dfs -setfacl -m default:group:<group-name>:rwx <Path>
hdfs dfs -setfacl -m default:other::- <path>

Can you try the above and let me know if it works?
11-04-2020
06:54 AM
@pphot Yes, HDFS ACLs come into the picture even if you use Impala; after all, Impala is a client of the HDFS service. If the HDFS path's permissions do not allow the impala user, say, then Impala will be unable to read the data from HDFS and your query will eventually fail with a permission denied error. Let me know if the above clarifies your doubt.
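A quick way to confirm what the impala user can actually see on the path (the path below is an example):

# check POSIX permissions and any ACL entries on the table location
hdfs dfs -ls -d /warehouse/tablespace/external/hive/db1.db/t1
hdfs dfs -getfacl /warehouse/tablespace/external/hive/db1.db/t1

# if needed, grant the impala user read+execute via an ACL
hdfs dfs -setfacl -R -m user:impala:r-x /warehouse/tablespace/external/hive/db1.db/t1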
11-04-2020
06:51 AM
1 Kudo
@banshidhar_saho This is the expected behavior: in Hadoop there are no groups based on the user name. It does not use the OS-level groups; instead, a new file inherits its group from the parent directory. The reason is that group resolution happens on the NameNode, where the groups related to the user may not exist. To fix this in your environment, modify the owner of the parent directory so that new files use the correct group. Run (as the hdfs user):

hdfs dfs -chown -R username:groupname <PATH to DIR>

This recursively changes the owner and group for the given <PATH>. You can further use ACLs to ensure the groups you are targeting have access to it:

hdfs dfs -setfacl -m default:group:<group-name>:rwx <Path>

Let me know if this helps.
11-04-2020
01:32 AM
@drgenious Could you please connect to impala-shell and submit the same query, just to confirm that the error is not coming from Impala?
11-02-2020
12:35 AM
Could you please post the error you are observing so that I can help you?