Member since
01-09-2019
401
Posts
163
Kudos Received
80
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2604 | 06-21-2017 03:53 PM |
| | 4310 | 03-14-2017 01:24 PM |
| | 2404 | 01-25-2017 03:36 PM |
| | 3842 | 12-20-2016 06:19 PM |
| | 2103 | 12-14-2016 05:24 PM |
05-20-2016
01:58 PM
There is no additional charge from Hortonworks for selecting the Sandbox VM. However, after the 30-day free trial, Azure may bill you for resource usage time (this is a general Azure usage charge, not a charge for selecting the Hortonworks Sandbox specifically).
05-18-2016
04:10 PM
You can take a look at http://hortonworks.com/blog/hortonworks-sandbox-azure/. If you are new to Azure, you get a one-month free trial that you can use to try the Hortonworks Sandbox on Azure.
05-18-2016
03:41 PM
You can use either a self-join or RANK to get only the latest extraction date. The result can then go either into a view on top of your table or into a new table that does not have duplicates. The query for the view/new table creation would use the select below (xyz is a placeholder for your key column or columns).
SELECT
<columns>
FROM
(SELECT *, RANK() over (partition by xyz
order by DateExtraction desc) as rank
FROM onetable) ranked_data
WHERE ranked_data.rank=1;
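To see the RANK approach end to end, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite only stands in for Hive here, and the sample rows, the payload column, and the rnk alias are assumptions made for the illustration; only onetable, xyz, and DateExtraction come from the query above.

```python
import sqlite3

# SQLite stands in for Hive; the schema and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE onetable (xyz TEXT, DateExtraction TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO onetable VALUES (?, ?, ?)",
    [
        ("a", "2016-05-01", "old"),
        ("a", "2016-05-10", "new"),
        ("b", "2016-05-03", "only"),
    ],
)

# Rank the rows of each key by extraction date, newest first, and keep rank 1.
DEDUP_QUERY = """
SELECT xyz, DateExtraction, payload
FROM (SELECT *, RANK() OVER (PARTITION BY xyz
                             ORDER BY DateExtraction DESC) AS rnk
      FROM onetable) ranked_data
WHERE ranked_data.rnk = 1
ORDER BY xyz
"""

latest_rows = conn.execute(DEDUP_QUERY).fetchall()
print(latest_rows)
```

Note that RANK keeps ties: if two rows for the same key share the latest DateExtraction, both come back with rank 1.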
05-17-2016
05:20 PM
You can take a look at http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/, which says: yarn.nodemanager.local-dirs: This is a comma-separated list of local directories that one can configure to be used for copying files during localization. The idea behind allowing multiple directories is to use multiple disks for localization – it helps both fail-over (one or a few disks going bad doesn’t affect all containers) and load balancing (no single disk is bottlenecked with writes). Thus, individual directories should be configured, if possible, on different local disks. You can follow the same approach for container logs as well.
05-17-2016
01:53 PM
It is generally not a good idea to use /var/log for yarn.nodemanager.log-dirs, which holds the container logs. Typically, we direct these logs to all the data mount points (like /grid/N/yarn/log). The same goes for the YARN local directories (/grid/N/yarn/local). This helps reduce the IO going to your OS disk (where you typically have /var/log).
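As an illustration of that layout, the sketch below builds the comma-separated values for both properties from a list of data mounts. The mount points /grid/0 to /grid/2 and the helper name yarn_dir_list are assumptions for the example; substitute the mounts that exist on your own nodes.

```python
def yarn_dir_list(mount_points, suffix):
    """Join one directory per data mount into a comma-separated property value."""
    return ",".join(f"{mount.rstrip('/')}/{suffix}" for mount in mount_points)


# Hypothetical data mounts, one per physical disk.
data_mounts = ["/grid/0", "/grid/1", "/grid/2"]

yarn_site = {
    "yarn.nodemanager.local-dirs": yarn_dir_list(data_mounts, "yarn/local"),
    "yarn.nodemanager.log-dirs": yarn_dir_list(data_mounts, "yarn/log"),
}

for name, value in yarn_site.items():
    print(f"{name}={value}")
```

The printed values are what you would paste into the matching yarn-site properties (for example through Ambari).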
05-17-2016
01:20 PM
Instead of a single folder, you can configure a comma-separated list of local/log directories that sit on different disks, so that one location does not fill up. It is not a good idea to try to write these to HDFS (even if it's possible, which I doubt).
05-17-2016
01:16 PM
1 Kudo
In your configs for Hue, I think you have 'namenode' set to hdfs://irxvlndchad1.corp.irco.com.namenode.host:8020. Please change it to the right one (hdfs://irxvlndchad1.corp.irco.com:8020).
05-16-2016
06:26 PM
1 Kudo
Which version of Ambari is this? On the current latest Ambari version (2.2.2.0) I see an explicit check that keeps it from going to that code:
elif OSCheck.is_redhat7():
    return PG_HBA_ROOT_DEFAULT
05-16-2016
02:46 PM
We did a comparative run between Hive on Tez and Spark SQL and ran into multiple outliers on Spark SQL that took a long time. Are you seeing these issues with a single query, or have you run into this on multiple runs? If you are using ORC, you can set spark.sql.orc.filterPushdown to true. You can also try increasing executor memory. But you need to look at the logs to see where it's spending this time and whether there are any GC issues.
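One way to apply both settings is to pass them as --conf flags to spark-submit. The sketch below only assembles that command line. The 8g executor memory, the application file name my_query_job.py, and the helper name spark_submit_command are assumptions for illustration, not recommended values.

```python
import shlex


def spark_submit_command(app_path, conf):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


tuning_conf = {
    "spark.sql.orc.filterPushdown": "true",  # push filters down into the ORC reader
    "spark.executor.memory": "8g",  # assumed value; size it for your cluster
}

# my_query_job.py is a hypothetical application file.
command = spark_submit_command("my_query_job.py", tuning_conf)
print(shlex.join(command))
```

Change one setting at a time and compare the logs between runs, so you can tell which change actually affected the slow queries.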