Member since: 07-06-2018
Posts: 59
Kudos Received: 1
Solutions: 4
My Accepted Solutions
Views | Posted
---|---
2694 | 03-05-2019 07:20 AM
2864 | 01-16-2019 09:15 AM
1523 | 10-25-2018 01:46 PM
1709 | 08-02-2018 12:34 PM
07-22-2020
04:31 AM
@Prav You can leverage the CM API to track parcel distribution status:

/api/v19/clusters/{clusterName}/parcels - lists the parcel names and versions the cluster has access to
/api/v19/clusters/{clusterName}/parcels/products/{product}/versions/{version} - reports the distribution status of a specific parcel

Refer to the link below for more details:
http://cloudera.github.io/cm_api/apidocs/v19/path__clusters_-clusterName-_parcels_products_-product-_versions_-version-.html

Hope this helps,
Paras

Was your question answered? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
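A minimal Python sketch of those two calls (the CM host, credentials, cluster name, and parcel version below are placeholders, not values from this thread):

-----------------
import requests

# Placeholders - substitute your CM host, admin credentials, and cluster name.
CM = "http://cm-host.example.com:7180"
AUTH = ("admin", "admin")
CLUSTER = "cluster"

# 1. List every parcel (product + version) the cluster has access to.
parcels = requests.get(f"{CM}/api/v19/clusters/{CLUSTER}/parcels", auth=AUTH).json()
for p in parcels.get("items", []):
    print(p["product"], p["version"], p["stage"])

# 2. Check the distribution stage of one specific parcel (version string is a placeholder).
r = requests.get(
    f"{CM}/api/v19/clusters/{CLUSTER}/parcels/products/CDH/versions/5.16.1-1.cdh5.16.1.p0.3",
    auth=AUTH,
)
print(r.json()["stage"])  # e.g. DOWNLOADED, DISTRIBUTED, ACTIVATED
-----------------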
09-24-2019
12:10 PM
Queries are in the "waiting to be closed" stage if they are in the EXCEPTION state or if all the rows from the query have been read. In either case, the query needs to be explicitly closed for it to be "completed". https://community.cloudera.com/t5/Support-Questions/Query-Cancel-and-idle-query-timeout-is-not-working/td-p/58104 might be useful as well.
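For illustration, a minimal sketch from a client using the impyla DB API (the package choice and connection details are assumptions, not from the original thread):

-----------------
from impala.dbapi import connect

# Placeholder host/port - point these at your impalad.
conn = connect(host="impalad.example.com", port=21050)
cur = conn.cursor()

cur.execute("SELECT * FROM some_table LIMIT 10")
rows = cur.fetchall()  # all rows read -> the query is now "waiting to be closed"

# The query only counts as "completed" once the client closes it explicitly.
cur.close()
conn.close()
-----------------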
08-21-2019
11:28 AM
Thanks, that does show more information. What I find weird, though, is that the same query ran with a large load earlier (with the same config params) and has now failed (from the logs: java.lang.OutOfMemoryError: Java heap space). Regards
08-09-2019
08:45 AM
Thanks for confirming that. We'll enable it for Impala as well, but only after a week or so; in the meantime, I wanted to know whether it would still work or not.
08-02-2019
09:05 AM
Thanks. To make sure we're on the same page, consider the scenario below:

An HDFS snapshottable location /a/b/ contains a file c that is captured in a snapshot. The file c is then deleted from HDFS via the CLI (hdfs dfs -rm -r -skipTrash, so the NameNode transaction happens and the file no longer shows up in the CLI), and afterwards a new file with the same name, content, and size is created.

- What gets stored in HDFS, and what delta does the snapshot add in this case? Is it just that the snapshot still holds c's blocks in HDFS, in addition to the newly created file?
- Does the NameNode then use heap to maintain metadata for both of them? Is that all, or is there more to it?

Regards
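For reference, a small sketch that reproduces the scenario above via the hdfs CLI (the paths, file name, and snapshot name are illustrative):

-----------------
import subprocess

def hdfs(*args):
    # Thin wrapper around the hdfs CLI; assumes it is on the PATH.
    subprocess.run(["hdfs", *args], check=True)

# Make /a/b snapshottable, load the file, and take a snapshot.
hdfs("dfsadmin", "-allowSnapshot", "/a/b")
hdfs("dfs", "-put", "c.txt", "/a/b/c")
hdfs("dfs", "-createSnapshot", "/a/b", "s0")

# Delete the snapshotted file, then recreate an identical one.
hdfs("dfs", "-rm", "-r", "-skipTrash", "/a/b/c")
hdfs("dfs", "-put", "c.txt", "/a/b/c")

# The original blocks remain reachable through the snapshot path.
hdfs("dfs", "-ls", "/a/b/.snapshot/s0")
-----------------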
07-30-2019
11:52 PM
1 Kudo
Since the "list" commands gets the apps from the ResourceManager and doesn't set any explicit filters and limits (except those provided with it) on the request, technically it returns all the applications which are present with RM at the moment. That number is controlled by "yarn.resourcemanager.max-completed-applications" config. Hope that clarifies.
06-06-2019
07:56 AM
1 Kudo
@Prav, This appears to have been listed as a bug of Hive since version 0.12, although it is actually a long-standing limitation: files and directories whose names start with _ or . are treated as "hidden" by FileInputFormat in Hadoop:

https://issues.apache.org/jira/browse/HIVE-6431
https://stackoverflow.com/questions/19830264/which-files-are-ignored-as-input-by-mapper

If these files need to be visible, please consider using a pre-process script that renames them after loading. Thanks,
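A minimal sketch of such a pre-process step, assuming the hdfs CLI is on the PATH (the table directory is a placeholder):

-----------------
import subprocess

# Placeholder table location.
TABLE_DIR = "/user/hive/warehouse/mytable"

# -C prints bare paths, one per line.
out = subprocess.run(
    ["hdfs", "dfs", "-ls", "-C", TABLE_DIR],
    check=True, capture_output=True, text=True,
).stdout

for path in out.splitlines():
    name = path.rsplit("/", 1)[-1]
    if name.startswith(("_", ".")):
        # Strip the leading characters so FileInputFormat no longer hides the file.
        new_path = path.rsplit("/", 1)[0] + "/" + name.lstrip("_.")
        subprocess.run(["hdfs", "dfs", "-mv", path, new_path], check=True)
-----------------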
03-05-2019
07:20 AM
Ways to change the pools via the API today:

Use a PUT call to http://$HOSTNAME:7180/api/v19/clusters/<cluster>/services/<yarn>/config to change yarn_fs_scheduled_allocations, followed by a POST to refresh the pools (http://$HOSTNAME:7180/api/v19/clusters/<cluster>/commands/poolsRefresh). A sketch of the two calls follows below.

Pros:
- It does update the pools, as desired.
- It does NOT affect the web UI.

Cons:
- The JSON is complex and prone to typos.
- A typo could mess up all pools and cause issues on the cluster.
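A minimal sketch of that PUT-then-POST sequence with Python requests, assuming the full fair-scheduler JSON sits in a local file (host, credentials, and service names are placeholders):

-----------------
import requests

# Placeholders - substitute your CM host, credentials, cluster, and YARN service name.
CM = "http://cm-host.example.com:7180"
AUTH = ("admin", "admin")
CLUSTER = "cluster"
YARN = "yarn"

# 1. Push the new pool definitions (the entire fair-scheduler JSON document).
with open("fair-scheduler.json") as f:
    allocations = f.read()

requests.put(
    f"{CM}/api/v19/clusters/{CLUSTER}/services/{YARN}/config",
    auth=AUTH,
    json={"items": [{"name": "yarn_fs_scheduled_allocations", "value": allocations}]},
).raise_for_status()

# 2. Refresh the pools so the new definitions take effect.
requests.post(
    f"{CM}/api/v19/clusters/{CLUSTER}/commands/poolsRefresh", auth=AUTH
).raise_for_status()
-----------------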
12-13-2018
12:08 PM
Hi @Prav,
Unfortunately, there is no officially supported way to increase the number of tables loaded in Hue. However, we do currently have a feature request to improve this behavior.
In the meantime, you can work around this by:
1. distributing the tables across multiple DBs (recommended)
2. manually adjusting the 'max_rows' limit in hive_server2_lib.py, as shown below. However, keep these implications in mind before doing that:
- The more the limit is increased, the more it will impact the performance of Hue.
- If something goes wrong with Hue, this change could make troubleshooting more difficult.
- The next time CDH is upgraded (even to a maintenance release), a new copy of hive_server2_lib.py will be installed, and the change will have to be made again.
- Before making the change, back up hive_server2_lib.py.
Here is the sample code, for reference, from /opt/cloudera/parcels/CDH/lib/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py:
-----------------
def get_tables(self, database, table_names, table_types=None):
    if not table_types:
        table_types = self.DEFAULT_TABLE_TYPES
    req = TGetTablesReq(schemaName=database, tableName=table_names, tableTypes=table_types)
    res = self.call(self._client.GetTables, req)

    # max_rows caps how many table names Hue fetches per database;
    # this is the limit to raise, with the caveats noted above.
    results, schema = self.fetch_result(res.operationHandle, orientation=TFetchOrientation.FETCH_NEXT, max_rows=5000)
    self.close_operation(res.operationHandle)

    return HiveServerTRowSet(results.results, schema.schema).cols(('TABLE_NAME',))
-----------------
Hope this helps,
Li
Cloudera Employee
10-25-2018
01:46 PM
Found a solution to this: I had to fetch the configuration at the role level, which prints everything set in CM. https://hostname:7183/api/v19/clusters/cluster/services/sentry/roles/role-name/config?view=full
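A minimal sketch of that call with Python requests (host, credentials, and role name are placeholders; verify=False is only for a self-signed CM certificate):

-----------------
import requests

# Placeholders - substitute your CM host, credentials, and Sentry role name.
URL = "https://hostname:7183/api/v19/clusters/cluster/services/sentry/roles/role-name/config"
AUTH = ("admin", "admin")

# view=full returns every config value, including CM's defaults.
cfg = requests.get(URL, params={"view": "full"}, auth=AUTH, verify=False).json()
for item in cfg.get("items", []):
    print(item.get("name"), "=", item.get("value", item.get("default")))
-----------------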