Member since: 03-28-2016
Posts: 194
Kudos Received: 18
Solutions: 0
07-03-2017
07:32 AM
Hi @suresh krish, this is now fixed in Ambari 2.5.1: https://issues.apache.org/jira/browse/AMBARI-20868
11-17-2017
06:05 PM
Also enable the YARN ACLs under Yarn > Configs > Resource Manager: set yarn.acl.enable=true and add the user to yarn.admin.acl=<add the user to this list>.
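For reference, the same settings expressed as yarn-site.xml properties might look like this (a minimal sketch; the user name admin_user is a placeholder, and the admin ACL format is "users groups", comma-separated within each part):

```xml
<!-- yarn-site.xml: illustrative values, adjust the user list for your cluster -->
<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.admin.acl</name>
  <!-- comma-separated users, then a space, then comma-separated groups -->
  <value>yarn,admin_user</value>
</property>
```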
01-24-2017
06:01 PM
Hi @suresh krish Check whether there are any localhost entries under the Ranger configs in Ambari and change them to the appropriate hostname. Also check that the HDFS plugin is correctly installed with all the necessary access.
03-13-2017
09:43 PM
2 Kudos
Could you try this: https://hortonworks.secure.force.com/articles/en_US/Issue/java-io-IOException-ORC-does-not-support-type-conversion-from-VARCHAR-to-STRING-while-inserting-into-table
02-11-2019
06:46 AM
https://stackoverflow.com/questions/40405538/how-to-enable-setup-log4j-for-oozi-java-workflows This suggests you can try adding an oozie-log4j.properties file in your Oozie application directory (where workflow.xml is).
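A minimal sketch of such a properties file, assuming a standard log4j 1.x console appender (the logger and pattern below are generic log4j conventions, not Oozie-mandated values):

```properties
# oozie-log4j.properties — placed next to workflow.xml
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{1} - %m%n
```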
03-05-2018
07:53 PM
@Geoffrey Shelton Okot Can I use OpenLDAP instead of AD? I mean, create users and groups in OpenLDAP and use it as the backend for Kerberos? Is that good practice?
04-27-2017
11:43 AM
Replacing /dev/random with /dev/../dev/urandom in the java.security file worked for me!
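Concretely, the change is to the securerandom.source property in the JRE's java.security file (the path varies by JDK layout; the /dev/../dev/urandom spelling exists because some older JDKs treat a literal file:/dev/urandom specially and fall back to /dev/random):

```properties
# $JAVA_HOME/jre/lib/security/java.security
# before:
#   securerandom.source=file:/dev/random
# after:
securerandom.source=file:/dev/../dev/urandom
```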
08-25-2016
12:24 AM
5 Kudos
@suresh krish The answer from Santhosh B Gowda could be helpful, but that is brute force with roughly a 50/50 chance of success. You need to understand the query execution plan: how much data is processed and how many tasks execute the job. Each task has a container allocated. You could increase the RAM allocated to the container, but if a single task performs the map and the data is larger than the container's allocated memory, you will still see "Out of memory". What you have to do is understand how much data is processed and how to chunk it for parallelism. Increasing the size of the container is not always needed; it is almost like saying that instead of tuning a bad SQL statement, let's throw more hardware at it. It is better to have reasonably sized containers and enough of them to process your query's data.

For example, take a cross-join of two small tables with 1,000,000 records each. The cartesian product is 1,000,000 x 1,000,000 = 1,000,000,000,000 rows. That is a large input for a mapper; you need to translate it into GB to understand how much memory is needed. Assuming the memory requirement is 10 GB and tez.grouping.max-size is set to the default of 1 GB, 10 mappers will be needed, using 10 containers. Now assume each container is set to 6 GB: you would be allocating 60 GB for a 10 GB need, and in that specific case it would actually be better to have 1 GB containers. Conversely, if your data is 10 GB and you have only one 6 GB container, that will generate "Out of memory": if the execution plan has one mapper, one container is allocated, and if that is not big enough you get the error. However, if you reduce tez.grouping.max-size to a lower value, that forces the execution plan to use multiple mappers; you will have one container for each, and those tasks will work in parallel, reducing the time and meeting the data requirements.

You can override the global tez.grouping.max-size for your specific query. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-ffec9e6b-41f4-47de-b5cd-1403b4c4a7c8.1.html describes the Tez parameters, and some of them could help; for your case, give tez.grouping.max-size a shot.

Summary:
- Understand the data volume that needs to be processed.
- Use EXPLAIN SqlStatement to understand the execution plan: tasks and containers.
- Use the ResourceManager UI to see how many containers and how much of the cluster's resources this query uses; the Tez View can also give you a good understanding of the mapper and reducer tasks involved. The more of them, the more resources are used, but the better the response time. Balance that to use reasonable resources for a reasonable response time.
- Set tez.grouping.max-size to a value that makes sense for your query; by default it is 1 GB, and that is a global value.
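The per-query override can be sketched in a Hive session like this (the value is in bytes; 256 MB and the table names are illustrative placeholders, not taken from the question):

```sql
-- Lower the max grouping size so the input is split across more mappers
SET tez.grouping.max-size=268435456;

-- Inspect the execution plan to count mapper/reducer tasks before tuning
EXPLAIN
SELECT a.*, b.*
FROM small_table_a a
CROSS JOIN small_table_b b;
```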
07-21-2016
09:46 PM
Can you run the below command and paste the output? show partitions <table name>;
07-01-2016
07:26 PM
@suresh krish Yes, you will not be able to set this at runtime unless you have included it in the whitelist; until that is done and you have restarted the Hive service along with HiveServer2, it will not take effect.
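A sketch of the whitelist change, assuming the standard Hive property for appending to the runtime config whitelist (the value is a regex of parameter names; myparam.* is a placeholder since the original thread does not name the parameter):

```xml
<!-- hive-site.xml: append to the SQL-standard-auth runtime whitelist,
     then restart Hive/HiveServer2 for it to take effect -->
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>myparam.*</value>
</property>
```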