Member since: 07-17-2017
Posts: 143
Kudos Received: 16
Solutions: 17
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 758 | 07-03-2019 02:49 AM |
| | 893 | 04-22-2019 03:13 PM |
| | 800 | 01-30-2019 10:21 AM |
| | 5564 | 07-25-2018 09:45 AM |
| | 4752 | 05-31-2018 10:21 AM |
08-08-2019
12:09 AM
Hi @Zane - This is an OOM (Out Of Memory) error; it simply means this query needs more memory to complete. It usually occurs when the cluster is under heavy load and the request has exceeded the available memory. The solution is to add more memory to your nodes, or to add more nodes. Otherwise, try to optimize your queries, and make sure your Impala settings are optimal too: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_performance.html Good luck.
07-04-2019
07:29 AM
Hi @punshi Try reading this to get more info about HDFS caching in Impala: https://www.cloudera.com/documentation/enterprise/5-16-x/topics/impala_perf_hdfs_caching.html
07-04-2019
07:05 AM
Have you made any manual changes to the metastore user or database permissions? It looks like the DBS table is not found in the metastore DB! Remark: check that Cloudera Manager points to the correct metastore DB with the pertinent user.
07-04-2019
02:22 AM
Hi @andreas It looks like you have a connectivity issue with the Hive Metastore. Try turning off the firewall and testing again. Otherwise, please share these log files with us: /var/log/hive/hadoop-cmf-hive-HIVEMETASTORE-XXXX.log.out and /var/log/impalad/impalad.hdm1.emd.impala.log.INFO.YYYYYYY Good luck.
07-04-2019
02:16 AM
Hi @punshi Yes, you can change it by editing this parameter: Maximum Memory Used for Caching (dfs.datanode.max.locked.memory). But keep in mind that cached data is moved from HDFS into memory (RAM), so you cannot increase it considerably!
07-03-2019
11:03 AM
Hi @punshi How much cache space have you configured? Please try this HDFS command to display details of the cache configured and used: hdfs dfsadmin -report
07-03-2019
02:49 AM
Hi @punshi Do you use the QuickStart VM for CDH 5.13? The VM is intended only for testing! To unlock the real power of Impala (CDH), you should have a cluster, so that you can benefit from the synergy of several nodes. Anyway, you can improve your query time by: 1- Setting PARQUET_FILE_SIZE = 256MB instead of 512MB. 2- Minimizing the number of partitions (in your case, I think partitioning by year is sufficient). NB: I think that on a cluster with more than 10 nodes, this query would not exceed 2 to 4 seconds. Good luck.
05-06-2019
04:42 PM
Hi @anis447 Can you bring us the results of these queries in both SQL Server and Impala:
Select avg(tagno) from tag;
Select avg(tagno) from has_tag;
Select count(*) from tag where tagno is null;
Select count(*) from has_tag where tagno is null;
Also try adding this to the Impala query, and let us know if there is any change:
...
Inner join has_tags hit on (s.tagno = hit.tagno and s.categorycode = hit.categorycode)
...
Good luck.
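The NULL counts above matter because SQL's AVG() skips NULLs rather than counting them as zero, so two engines can appear to disagree on an average when they really differ in how many NULL rows each table holds. A minimal sketch of that behavior (the values below are made up, not from the original thread):

```python
def sql_avg(values):
    """Mimic SQL AVG(): NULL (None) rows are excluded from both sum and count."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) / len(non_null) if non_null else None

tagno = [10, 20, None, 30]                      # one NULL row
print(sql_avg(tagno))                           # 20.0 -- NULL excluded entirely
print(sum(v or 0 for v in tagno) / len(tagno))  # 15.0 -- what treating NULL as 0 would give
```

If the NULL counts differ between the two systems, the averages will differ too, even over "the same" data.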
04-28-2019
03:35 AM
1 Kudo
Hi @bridgor It looks like you have a firewall issue between your nodes. Try checking iptables. Please share the impalad log files with us. Good luck.
04-25-2019
08:14 AM
Hi @Ravikiran Normally, most software on top of HDFS reads only one replica at a time, so memory is charged with the real size of the data, not size * replication_factor. I think the YARN service, which is in charge of resource management, takes this role, and the freest and fastest DataNode is always chosen to pull its data into memory to be processed by impalad. Good luck.
04-22-2019
03:13 PM
Hi, the unix_timestamp function returns a number of seconds, but it seems the ipid.bk_eff_strt_dt column was inserted with a number of milliseconds!? For your query, try this: SELECT COUNT(*) FROM ipid WHERE unix_timestamp('20190124',"yyyyMMdd")*1000 BETWEEN ipid.BK_EFF_STRT_DT AND ipid.BK_EFF_END_DT; Good luck.
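The seconds-vs-milliseconds mismatch described above can be sketched like this: Impala's unix_timestamp('20190124','yyyyMMdd') yields epoch seconds, so a column populated in milliseconds is 1000x larger, and the literal must be multiplied by 1000 before comparing.

```python
from datetime import datetime, timezone

# 2019-01-24 at midnight UTC, the same date the query above converts.
dt = datetime(2019, 1, 24, tzinfo=timezone.utc)

epoch_seconds = int(dt.timestamp())  # what unix_timestamp() returns
epoch_millis = epoch_seconds * 1000  # the scale a milliseconds column stores

print(epoch_seconds)  # 1548288000
print(epoch_millis)   # 1548288000000
```

Comparing the raw seconds value against a milliseconds column would silently match nothing, which is why the `*1000` is needed in the BETWEEN clause.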
04-21-2019
04:48 AM
Hi, "thread_network_send_wait_time" is the sum of the time spent waiting to send data over the network by all threads of the query, so it depends mainly on your nodes' and/or client's network. Try using "profile" to see which node's slow network is causing the problem, or maybe it's just a load issue.
04-21-2019
04:35 AM
Hi, Optimization is not an easy task; please read this document and explore the real Impala query optimization methods: https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_performance.html Good luck.
02-09-2019
04:47 AM
Hi @Tim Armstrong While IMPALA-1618 is still open and unresolved, I have confirmed that this "workaround" is safe and efficient (I have been using it at large scale for more than 9 months), and it is the only solution I have found to solve, or at least get around, this big problem. I hope the main problem will be fixed ASAP. Thanks for the remark.
02-09-2019
04:29 AM
Hi @Bishnup If you still have the same problem, please share your URL string with us.
02-09-2019
04:01 AM
1 Kudo
Hi @zeni86cit Try changing the file content to this: SELECT CONCAT('invalidate metadata ', trim(table_name), '; refresh ', trim(table_name), ';') FROM my_table; or SELECT CONCAT("invalidate metadata ", trim(table_name), "; refresh ", trim(table_name), ";") FROM my_table; Good luck.
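A small sketch of what the CONCAT above produces, using a hypothetical list of table names in place of rows from my_table (the names are placeholders, not from the original thread):

```python
table_names = [" web_logs ", "sales"]  # made-up examples; note the untrimmed whitespace

# TRIM() each name, then build the paired statements, mirroring the CONCAT.
statements = [
    f"invalidate metadata {t.strip()}; refresh {t.strip()};"
    for t in table_names
]
print(statements[0])  # invalidate metadata web_logs; refresh web_logs;
```

The trim() matters because table_name values padded with whitespace would otherwise produce statements Impala cannot parse.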
02-02-2019
06:49 AM
Hi @kampith The first step in troubleshooting a Cloudera service is to look at its log file, so please share it with us. The log file paths:
SCM agent: /var/log/cloudera-scm-agent/cloudera-scm-agent.log
SCM server: /var/log/cloudera-scm-server/cloudera-scm-server.log
Impala daemon: /var/log/impalad
Good luck.
01-30-2019
10:21 AM
Hi @Rr, Please give us more details, error messages, or screenshots so we can help you.
10-01-2018
05:03 AM
Hi, Please try changing all three of these parameters:
TSaslTransportBufSize=4000;
RowsFetchedPerBlock=60536;
SSP_BATCH_SIZE=60536;
09-29-2018
12:25 PM
Hi @Bishnup Configuring Server-Side Properties: when connecting to a server running Impala 2.0 or later, you can use the driver to apply configuration properties to the server by setting the properties in the connection URL. https://www.cloudera.com/documentation/other/connectors/impala-jdbc/latest/Cloudera-JDBC-Driver-for-Impala-Install-Guide.pdf Good luck.
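As a hedged sketch of what the guide above describes: server-side properties are passed in the connection URL with an "SSP_" prefix. The host, port, and property value below are placeholders for illustration, not values from the original post.

```python
def impala_jdbc_url(host, port, ssp_props):
    """Build a JDBC URL carrying server-side properties (SSP_ prefix)."""
    props = ";".join(f"SSP_{key}={value}" for key, value in ssp_props.items())
    return f"jdbc:impala://{host}:{port}" + (";" + props if props else "")

# Hypothetical host and property value:
url = impala_jdbc_url("example-host", 21050, {"MEM_LIMIT": "2g"})
print(url)  # jdbc:impala://example-host:21050;SSP_MEM_LIMIT=2g
```

Check the exact property names against the driver guide; the prefix convention is what matters here.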
07-26-2018
02:17 AM
Hi @lonetiger You can do it with two kinds of scripts:
1- Show all tables: SHOW TABLES; then run this query on each table: DESCRIBE FORMATTED tableX; From the results, extract the owner and, if it is the desired one, drop the table.
2- Connect to your Hive Metastore DB and get the list of tables for the owner you want: SELECT "TBL_NAME"
FROM "TBLS"
WHERE "OWNER" = 'ownerX'; Then drop them. Good luck.
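The second option can be sketched as follows: given (table, owner) rows fetched from the metastore TBLS table, generate the DROP statements for one owner. The rows and owner name below are made-up examples, not from the original thread.

```python
# Hypothetical (table_name, owner) rows, as returned by the TBLS query above.
rows = [("web_logs", "ownerX"), ("sales", "admin"), ("tmp_stage", "ownerX")]

# Keep only the target owner's tables and emit a DROP per table.
drops = [f"DROP TABLE {name};" for name, owner in rows if owner == "ownerX"]
print(drops)  # ['DROP TABLE web_logs;', 'DROP TABLE tmp_stage;']
```

Generating the statements first, rather than dropping inline, also gives you a reviewable script before anything destructive runs.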
07-25-2018
09:45 AM
Hi @Tomas79 Check the values of these parameters in odbc.ini:
SSL=
UseSASL=
07-25-2018
09:38 AM
Hi @lonetiger What do you mean by table owner? If you use Apache Sentry, just log in to impala-shell as the user concerned (the owner) and execute SHOW TABLES; you'll see only that user's tables. Good luck.
07-25-2018
09:24 AM
@saranvisa Thank you, I think your idea is very good and applies in several use cases for identifying the modified Hive/Impala tables. But I think there is a misunderstanding: by the 56 tables, I mean the metastore (Postgres) database tables! The backup approach I use is based on 2 steps: 1- Back up the updated HDFS tables with DistCp (done). 2- Back up the metastore (Postgres) datastores (the current question).
07-25-2018
07:23 AM
Hi, Thanks @saranvisa for your response. Yes, but the question is how I can identify which delta/impacted tables to back up among the roughly 56 tables in the metastore database. Thanks again.
07-25-2018
02:36 AM
Hi all, I use DistCp every X minutes to transfer the HDFS data to a hot backup cluster. Should I replicate the whole Hive Metastore database (manually or using DB HA, etc.) to accomplish the backup/restore, or do I just need to import/export some specific Hive Metastore tables? Thanks in advance.
Labels:
- Apache Hive
- HDFS
07-25-2018
02:27 AM
I'm not using BDR. I want to make a hot backup cluster, so I'll test replicating the whole metastore database every X minutes and see the results. Thanks for your replies.
07-25-2018
02:16 AM
Thanks @alexmc6 for your reply. The note in the documentation is clear: "you must run the Impala INVALIDATE METADATA statement on the destination cluster to prevent queries from failing". I have some tables with a large number of partitions, and I use them in real-time cases, so there is no time to execute that query first and wait for minutes.
07-20-2018
02:52 AM
Hi @Derek Try this workaround: 1- Create a new table with the original table's data. 2- Delete all data from the old table (using DELETE). 3- Insert the data from the new table into the old table. 4- Drop the new table. Good luck.
07-18-2018
04:21 AM
Hi, My question is: after using DistCp to transfer all the HDFS data to a second cluster, can I replicate the whole Hive Metastore database (manually or using DB HA, etc.) to accomplish the backup/restore, or should I import/export only some specific Hive Metastore tables? Thanks.
Labels:
- Apache Hive
- Cloudera Manager