About mathieu.d

mathieu.d · ‎11-03-2017

Alternatively, you could search for "YARN queue" and resource allocation. This will not restrict the number of mappers or reducers, but it will control how many can run concurrently by giving the job access to only a subset of the available resources.
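To make the effect of a queue concrete, here is a rough back-of-the-envelope sketch. It assumes the queue is capped at a fixed percentage of cluster memory and that every task container requests the same amount of memory; the function name and all figures are hypothetical, and it ignores vcores, the ApplicationMaster container, and any elasticity the scheduler may allow.

```python
def max_concurrent_containers(cluster_memory_mb: int,
                              queue_capacity_percent: int,
                              container_memory_mb: int) -> int:
    """Estimate how many equally sized containers fit in a queue at once.

    Assumption: the queue can use at most `queue_capacity_percent` of the
    cluster memory, and memory is the only limiting resource.
    """
    queue_memory_mb = cluster_memory_mb * queue_capacity_percent // 100
    return queue_memory_mb // container_memory_mb


# Hypothetical cluster: 100 GB of container memory, a queue capped at 25%,
# and 2 GB per mapper/reducer container.
print(max_concurrent_containers(102400, 25, 2048))
```

A job with hundreds of mappers would still create all of them; they would simply run in waves of roughly this size.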

mathieu.d · ‎11-03-2017

Hi, The concept of a Hive partition does not map to HBase tables. So if you want HBase as the storage, you will need to work around this for your use case. You could use a single HBase table whose row key is constructed with the partition value. That way you should be able to query the HBase table by row key and avoid a full scan of the table. Or you could have one HBase table per "partition" (this also means one Hive table per partition). Or you could decide that HBase does not answer your need and stay in Hive. regards, Mathieu
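A minimal sketch of the first option, assuming the row key is simply the partition value, a separator, and a record id. The helper names, the "|" separator, and the sample values are hypothetical; the point is only that all rows of one "partition" share a key prefix, so a range scan between a start and stop row can replace a full table scan.

```python
def make_row_key(partition_value: str, record_id: str, sep: str = "|") -> bytes:
    """Build a row key that starts with the partition value."""
    return f"{partition_value}{sep}{record_id}".encode("utf-8")


def prefix_scan_bounds(partition_value: str, sep: str = "|") -> tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering every key of one partition.

    The stop row is the prefix with its last byte incremented, so the range
    [start_row, stop_row) contains exactly the keys with that prefix.
    Assumes the separator is a single ASCII character (never 0xFF).
    """
    start = f"{partition_value}{sep}".encode("utf-8")
    stop = start[:-1] + bytes([start[-1] + 1])
    return start, stop


key = make_row_key("2017-11-03", "42")
start, stop = prefix_scan_bounds("2017-11-03")
print(key, start <= key < stop)
```

The start and stop rows would then be passed to whatever HBase client is in use as the scan range.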

mathieu.d · ‎10-25-2017

I think what you are looking for is a configuration located in the "core-site.xml" file (in the HDFS configuration). Search for "proxyuser" in the Cloudera documentation. regards, Mathieu
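For orientation, the proxyuser settings in core-site.xml follow the naming pattern `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups`. The sketch below only generates such a fragment so the shape is visible; the user name `hive`, the host, and the group are assumptions for illustration, and on a managed cluster the values would normally be set through the management UI rather than by editing the file.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user: str, hosts: list[str], groups: list[str]) -> dict[str, str]:
    """Return the two proxyuser properties for one impersonating user."""
    return {
        f"hadoop.proxyuser.{user}.hosts": ",".join(hosts),
        f"hadoop.proxyuser.{user}.groups": ",".join(groups),
    }


def to_core_site_xml(props: dict[str, str]) -> str:
    """Render properties as a <configuration> fragment."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: allow "hive" to impersonate members of "analysts"
# when connecting from one gateway host.
props = proxyuser_properties("hive", ["gateway01.example.com"], ["analysts"])
print(to_core_site_xml(props))
```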

csguna · ‎10-19-2017

There are a couple of places that need tuning at the query level:
1. Table statistics are a must for good performance.
2. When joining two tables, make sure the larger table comes last and the first table is the smaller one.
3. You can also use hints to improve query performance.
4. The Hive table's file format is a big factor.
5. Choose when to use partitioning vs. bucketing.
6. Allocate enough memory to HiveServer2 and the metastore.
7. Heap size.
8. A load balancer on the host: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_cm_ha_hosts.html#concept_qkr_bfd_pr

oraman · ‎10-14-2017

Do you need the --override? I reran my Tutorial 1 and it didn't append new records... I thought it would. Why do you think it allowed it?

sunilosunil · ‎09-15-2017

Finding logs manually on the machine sounds very brute force; I was thinking more of an API or CLI option to find logs. Anyway, the main issue we're trying to solve is giving all developers access to logs in the prod environment. Our node managers are locked down and not accessible (on any port or web UI) to developers, and that is unlikely to change. So we're trying to find a way to proxy the logs. I discovered that there is a jobhistory proxy for looking at completed jobs / YARN apps, but I couldn't get it working for a running app. Is there any trick or way to access a running app's logs like above? http://resourcemanager.xyz.com:19888/jobhistory/logs//dataNode.com:8041/container_id_000001/container_id_000001/root
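For reference, the URL above can be assembled from its parts as follows. This is only a sketch that mirrors the shape of the URL in this post (history host and port, node host and port, the container id twice, then the user); the function name is hypothetical, and it does not address the open question of logs for running apps.

```python
def history_log_url(history_host: str, node_host: str, container_id: str,
                    user: str, history_port: int = 19888,
                    node_port: int = 8041) -> str:
    """Build a jobhistory log URL in the same shape as the one quoted above."""
    return (
        f"http://{history_host}:{history_port}/jobhistory/logs//"
        f"{node_host}:{node_port}/{container_id}/{container_id}/{user}"
    )


# Reproduces the example URL from the post.
print(history_log_url("resourcemanager.xyz.com", "dataNode.com",
                      "container_id_000001", "root"))
```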

mathieu.d · ‎09-08-2017

I believe this wait time of 30s is hard-coded into the Cloudera agent. I don't think we can alter it other than by making a really dirty modification, which I wouldn't recommend. regards, Mathieu

chriswalton007 · ‎08-11-2017

Thank you for the detailed answer

pdvorak · ‎07-25-2017

The hbase-indexer morphlines.conf is managed by CM and will automatically be distributed to each node in the /var/run/cloudera-scm-agent/process directory when hbase-indexer starts. You'll want to specify a relative path name in the morphline-hbase-mapper.xml, and it will pick it up from the process directory: https://www.cloudera.com/documentation/enterprise/latest/topics/search_hbase_batch_indexer.html#concept_q3l_2tb_4r -pd

mathieu.d · ‎06-12-2017

From my understanding, when you use the Sentry HDFS synchronization plugin you only need to set the following ownership and permissions: hive:hive / 771 https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_hiveserver2_security.html#concept_vxf_pgx_nm https://www.cloudera.com/documentation/enterprise/latest/topics/sg_sentry_service_config.html#concept_z5b_42s_p4__section_lvc_4g4_rp Then it is the plugin that manages the other permissions, according to the permissions granted in Sentry. If you set the permissions yourself, then there is no point in using the Sentry HDFS synchronization plugin.
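As a quick reading aid for the 771 mode, the snippet below decodes it into the familiar rwx notation using Python's standard library. It assumes the mode is applied to a directory and says nothing about how Sentry itself evaluates access.

```python
import stat


def describe_dir_mode(mode: int) -> str:
    """Render an octal permission mode as an `ls -l` style string for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


# 771: owner (hive) rwx, group (hive) rwx, everyone else execute (traverse) only.
print(describe_dir_mode(0o771))
```

In other words, users outside the hive group can traverse the directory but cannot list or write to it unless another mechanism grants them access.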

Last Visited	‎01-17-2018 02:52 AM

Member Since	‎07-16-2015 01:41 AM
Posts	177
Kudos received	28

Cloudera Community

Re: Unable to delete HDFS Corrupt files

Re: Hive partitions based on date from timestamp

Re: Partition Hive Table to Hbase Handler ?

Re: yarn logs location on disk

Re: Increase Flume graceful restart time

Re: Hive limit number of mappers and reducers

Re: Delegation UID with Hive

Re: Adding nodes will improve performance ?

Re: Exercise 1 Sqoop import fails

Re: Should Impala release memory after use?

Re: hbase-indexer's configuration in Zookeeper?

Re: What are the ideal ACL's that need to be appli...