Member since 03-16-2016
707 Posts
1753 Kudos Received
203 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5129 | 09-21-2018 09:54 PM |
 | 6495 | 03-31-2018 03:59 AM |
 | 1969 | 03-31-2018 03:55 AM |
 | 2179 | 03-31-2018 03:31 AM |
 | 4833 | 03-27-2018 03:46 PM |
09-20-2016
04:14 PM
Do you mean 50 Mbps per mapper or for the cluster as a whole? (I assume you mean the former, as the latter would imply almost two days to read a TB of S3 data.) Assuming you do mean 50 Mbps per mapper, what is the S3 throughput limit for the whole cluster? That is the key information. Do you have a ballpark number for this?
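For reference, a back-of-the-envelope sketch (purely illustrative) of where the "almost two days" figure comes from, assuming a decimal terabyte and a single sustained 50 Mbps stream:

```python
# Rough check of the "almost two days" estimate above.
terabyte_bits = 1e12 * 8      # 1 TB (decimal) expressed in bits
throughput_bps = 50e6         # 50 Mbps sustained

seconds = terabyte_bits / throughput_bps
print(f"{seconds:,.0f} s = {seconds / 3600:.1f} h = {seconds / 86400:.2f} days")
# -> 160,000 s = 44.4 h = 1.85 days
```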
12-06-2016
01:17 PM
Hi Diego, could you please explain how exactly you got your issue resolved? I am facing the same issue, and I am using my personal network (not my company's).
07-17-2018
08:31 AM
@mb I am facing the same issue. Could you please advise how to work around or troubleshoot this problem? Thanks, Nam
09-01-2016
06:32 PM
@deepak sharma Crazily enough, I just reached out to this customer and a simple restart of the Kafka service addressed the issue. Kerberos was enabled recently, and this service was probably not restarted afterwards. Not much to learn. Your symlink suggestion is an interesting approach which, while not applicable here, is worth remembering for other situations. Thank you for the suggestion.
08-26-2016
05:14 PM
That did the trick! Thanks @Constantin Stanca!
08-26-2016
07:03 PM
Not sure why, but when a user "x" was created in IPA, there was an entry for x under users and also under groups. This could have led to ambiguity when the search tried to locate the right user "x" (arun in my case). To resolve the ambiguity, I chose to refer to users by their uid rather than the default cn, which could conflict.
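Purely as an illustration of the ambiguity, here is a sketch of the two search filters against a FreeIPA-style tree using the ldap3 package; the hostname, base DN, and credentials are placeholders, not values from this setup:

```python
# Illustrative only: compare how many entries a cn-based vs. uid-based search
# returns for the same name in a FreeIPA-style directory.
# Hostname, base DN, and credentials below are placeholders.
from ldap3 import Server, Connection, ALL

server = Server("ldaps://ipa.example.com", get_info=ALL)
conn = Connection(server,
                  user="uid=admin,cn=users,cn=accounts,dc=example,dc=com",
                  password="***",
                  auto_bind=True)

# Search base covering both the users and groups subtrees.
base = "cn=accounts,dc=example,dc=com"

# cn=arun can match a user entry *and* a group entry -> ambiguous.
conn.search(base, "(cn=arun)", attributes=["cn", "objectClass"])
print("cn matches:", len(conn.entries))

# uid=arun is set only on the user entry -> unambiguous.
conn.search(base, "(uid=arun)", attributes=["cn", "objectClass"])
print("uid matches:", len(conn.entries))
```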
08-25-2016
12:24 AM
5 Kudos
@suresh krish The answer from Santhosh B Gowda could be helpful, but it is brute force with a 50-50 chance of luck. You need to understand the query execution plan: how much data is processed and how many tasks execute the job. Each task has a container allocated. You could increase the RAM allocated to the container, but if a single task performs the map and the data is larger than the container's memory, you will still see "Out of memory". What you have to do is understand how much data is processed and how to chunk it for parallelism. Increasing the size of the container is not always needed; it is almost like saying that instead of tuning a bad SQL statement, let's throw more hardware at it. It is better to have reasonably sized containers and enough of them to process your query data.

For example, take a cross-join of two small tables of 1,000,000 records each. The Cartesian product will be 1,000,000 x 1,000,000 = 1,000,000,000,000 rows. That is a large input for a mapper, and you need to translate it into GB to understand how much memory is needed. Assuming the memory requirement is 10 GB and tez.grouping.max-size is set to the default 1 GB, 10 mappers will be needed, using 10 containers. Now assume that each container is set to 6 GB: you would be allocating 60 GB for a 10 GB need, and in that specific case it would actually be better to have 1 GB containers. Conversely, if your data is 10 GB and you have only one 6 GB container, that will generate "Out of memory": if the execution plan of the query has one mapper, one container is allocated, and if that is not big enough you will get your out-of-memory error. However, if you reduce tez.grouping.max-size to a lower value, that forces the execution plan to have multiple mappers; you will have one container for each, and those tasks will work in parallel, reducing the time and meeting the memory requirements. You can override the global tez.grouping.max-size for your specific query. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-ffec9e6b-41f4-47de-b5cd-1403b4c4a7c8.1.html describes the Tez parameters and some of them could help, but for your case give tez.grouping.max-size a shot.

Summary:
- Understand the data volume that needs to be processed.
- EXPLAIN SqlStatement to understand the execution plan: tasks and containers.
- Use the ResourceManager UI to see how many containers and how much of the cluster's resources are used by this query; Tez View can also give you a good understanding of the mapper and reducer tasks involved. The more of them, the more resources are used, but the better the response time. Balance that to use reasonable resources for a reasonable response time.
- Set tez.grouping.max-size to a value that makes sense for your query; by default it is set to 1 GB, and that is a global value.
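A quick arithmetic sketch of the sizing trade-off above, using the numbers from the example; the container size property name (hive.tez.container.size) and the ceiling-division estimate of the split count are simplifying assumptions, since Tez grouping also honours tez.grouping.min-size:

```python
import math

# Illustrative numbers from the example above.
input_gb = 10            # data the mappers must process
grouping_max_gb = 1      # tez.grouping.max-size (default ~1 GB)
container_gb = 6         # assumed hive.tez.container.size per container

mappers = math.ceil(input_gb / grouping_max_gb)   # ~10 splits -> ~10 containers
allocated_gb = mappers * container_gb             # 10 x 6 GB = 60 GB reserved
print(f"{mappers} mappers, {allocated_gb} GB allocated for ~{input_gb} GB of input")

# With 1 GB containers instead, the same 10 mappers reserve only ~10 GB,
# while a single 6 GB container for all 10 GB would hit "Out of memory".
```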
08-23-2016
10:28 PM
3 Kudos
@Kumar Veerappana
Assuming that you are only interested in who has access to Hadoop services, extract all OS users from all nodes by checking the /etc/passwd file content. Some of them are legitimate users needed by Hadoop tools, e.g. hive, hdfs, etc. For HDFS, they will have a /user/username folder in HDFS; you can see that with hadoop fs -ls /user executed as a user that is a member of the hadoop group. If they have access to the Hive client, they are also able to perform DDL and DML actions in Hive.

The above will let you understand the current state; however, this is your opportunity to improve security even without the bells and whistles of Kerberos/LDAP/Ranger. You can force users to access Hadoop ecosystem client services via a few client/edge nodes where only client services are running, e.g. the Hive client. Users other than power users should not have accounts on the NameNode, admin node, or DataNodes. Any user that can access the nodes where client services are running can access those services, e.g. HDFS or Hive.
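A rough sketch of that first audit step, assuming it runs on a node with the hadoop client installed, as a member of the hadoop group; the UID >= 1000 cut-off for non-system accounts is an assumption about a typical Linux setup:

```python
# Rough audit sketch: list local OS accounts and flag those that also have
# an HDFS home directory under /user.
import pwd
import subprocess

# Local accounts (UID >= 1000 filters out most system accounts on typical Linux).
os_users = {p.pw_name for p in pwd.getpwall() if p.pw_uid >= 1000}

# HDFS home directories, parsed from `hadoop fs -ls /user` output.
out = subprocess.run(["hadoop", "fs", "-ls", "/user"],
                     capture_output=True, text=True, check=True).stdout
hdfs_users = {line.rsplit("/", 1)[-1] for line in out.splitlines() if "/user/" in line}

print("OS accounts with an HDFS home dir:", sorted(os_users & hdfs_users))
print("OS accounts without one:", sorted(os_users - hdfs_users))
```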
12-02-2016
06:57 PM
What error(s) are you seeing? If the error mentions Avro and your column names are in Chinese, it is likely that Avro does not accept them. This may be alleviated in NiFi 1.1.0 with NIFI-2262, but that would just replace non-Avro-compatible characters with underscores, so you may then face a "duplicate field" exception. In that case you would need column aliases in your SELECT statement to give the columns Avro-compatible names.
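For reference, a small sketch of the kind of check and substitution involved; the regex follows the Avro specification's name rule, while the underscore substitution paraphrases the NIFI-2262 behaviour described above rather than NiFi's exact code:

```python
import re

# Avro record/field names must match [A-Za-z_][A-Za-z0-9_]* per the Avro spec.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def avro_safe(name: str) -> str:
    """Return an Avro-compatible version of a column name via underscore substitution."""
    if AVRO_NAME.match(name):
        return name
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    return cleaned if re.match(r"^[A-Za-z_]", cleaned) else "_" + cleaned

cols = ["姓名", "年龄", "id"]
print([avro_safe(c) for c in cols])
# -> ['__', '__', 'id']: the first two collide, hence the "duplicate field" error,
#    which is why explicit SELECT aliases are the safer fix.
```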