About ravi1

ravi1 · ‎05-20-2016

Hortonworks does not charge extra for selecting the sandbox VM. However, once the 30-day free trial ends, Azure may bill you for resource usage time (the charge is for the Azure resources you consume, not for choosing the Hortonworks sandbox specifically).

ravi1 · ‎05-18-2016

You can take a look at http://hortonworks.com/blog/hortonworks-sandbox-azure/. If you are new to Azure, you get a one-month free trial that you can use to try the Hortonworks sandbox on Azure.

ravi1 · ‎05-18-2016

You can use either a self-join or RANK() to get only the latest extraction date. The result can go into a view on top of your table or into a new table that does not have duplicates. The query for the view or new table would use the select below: SELECT <columns> FROM (SELECT *, RANK() OVER (PARTITION BY xyz ORDER BY DateExtraction DESC) AS rank FROM onetable) ranked_data WHERE ranked_data.rank = 1;
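To make the two options concrete, here is a minimal sketch that runs both against an in-memory SQLite database from Python. SQLite stands in for Hive only so the example is runnable (it needs SQLite 3.25 or later for window functions); the table contents, the payload column, and the rnk alias are assumptions for illustration, not part of the original question.

```python
import sqlite3

# Hypothetical data; xyz is the key and DateExtraction is an ISO date string,
# so string ordering matches date ordering.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE onetable (xyz TEXT, DateExtraction TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO onetable VALUES (?, ?, ?)",
    [
        ("a", "2016-05-01", "old"),
        ("a", "2016-05-10", "new"),
        ("b", "2016-05-03", "only"),
    ],
)

# Option 1: rank rows within each key, newest first, and keep rank 1.
ranked_query = """
SELECT xyz, DateExtraction, payload
FROM (
    SELECT *, RANK() OVER (PARTITION BY xyz ORDER BY DateExtraction DESC) AS rnk
    FROM onetable
) ranked_data
WHERE rnk = 1
ORDER BY xyz
"""
latest_by_rank = conn.execute(ranked_query).fetchall()

# Option 2: join the table back to the maximum date found for each key.
self_join_query = """
SELECT t.xyz, t.DateExtraction, t.payload
FROM onetable t
JOIN (
    SELECT xyz, MAX(DateExtraction) AS max_date
    FROM onetable
    GROUP BY xyz
) m ON t.xyz = m.xyz AND t.DateExtraction = m.max_date
ORDER BY t.xyz
"""
latest_by_join = conn.execute(self_join_query).fetchall()

print(latest_by_rank)
print(latest_by_join)
```

Note that both variants keep every row that ties on the latest date for a key; if ties are possible and you need exactly one row per key, ROW_NUMBER() in place of RANK() is the usual substitute.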

ravi1 · ‎05-17-2016

You can take a look at http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/. yarn.nodemanager.local-dirs: This is a comma-separated list of local directories that one can configure to be used for copying files during localization. The idea behind allowing multiple directories is to use multiple disks for localization: it helps both fail-over (one or a few disks going bad doesn't affect all containers) and load balancing (no single disk is bottlenecked with writes). Thus, individual directories should be configured on different local disks where possible. You can follow the same approach for container logs as well.

ravi1 · ‎05-17-2016

It is generally not a good idea to use /var/log for yarn.nodemanager.log-dirs, which holds the container logs. Typically, we direct these logs to all the data mount points (like /grid/N/yarn/log), and do the same for the YARN local directories (/grid/N/yarn/local). This keeps that IO off your OS disk (where you typically have /var/log).
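As a rough illustration of that layout, the sketch below builds the comma-separated values for both properties from a list of data mounts. The three mount points and the yarn_dirs helper are hypothetical; substitute the disks that actually exist on your NodeManagers.

```python
def yarn_dirs(mounts, subdir):
    """Join one directory per data mount into a comma-separated property value."""
    return ",".join(f"{mount}/yarn/{subdir}" for mount in mounts)


# Hypothetical data mount points, one per physical disk.
data_mounts = ["/grid/0", "/grid/1", "/grid/2"]

yarn_site = {
    "yarn.nodemanager.local-dirs": yarn_dirs(data_mounts, "local"),
    "yarn.nodemanager.log-dirs": yarn_dirs(data_mounts, "log"),
}

# Print the values in a form that can be pasted into the YARN configuration.
for name, value in yarn_site.items():
    print(f"{name}={value}")
```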

ravi1 · ‎05-17-2016

Instead of a single folder, you can set a comma-separated list of local/log directories that sit on different disks, so that one location does not fill up. It is not a good idea to try to write these to HDFS (even if it is possible, which I doubt).

ravi1 · ‎05-17-2016

In your Hue configs, I think you have 'namenode' set to hdfs://irxvlndchad1.corp.irco.com.namenode.host:8020. Please change it to the right one (hdfs://irxvlndchad1.corp.irco.com:8020).

ravi1 · ‎05-16-2016

Which version of Ambari is this? On the current latest Ambari version (2.2.2.0) I see an explicit check that keeps it from going into that code: elif OSCheck.is_redhat7(): return PG_HBA_ROOT_DEFAULT

ravi1 · ‎05-16-2016

Correct.

ravi1 · ‎05-16-2016

We did a comparative run between Hive on Tez and Spark SQL and ran into multiple outliers on Spark SQL that took a long time. Are you seeing this with a single query, or across multiple runs? If you are using ORC, you can set spark.sql.orc.filterPushdown to true. You can also try increasing executor memory. But you need to look at the logs to see where the time is going and whether there are any GC issues.
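One way to try those two settings is to pass them as --conf arguments to spark-submit. The sketch below only renders the arguments; the 8g memory value is an assumed placeholder to tune for your cluster, and to_submit_flags is a hypothetical helper, not part of Spark.

```python
# spark.sql.orc.filterPushdown comes from the suggestion above;
# the executor memory value is a placeholder, not a recommendation.
spark_conf = {
    "spark.sql.orc.filterPushdown": "true",
    "spark.executor.memory": "8g",
}


def to_submit_flags(conf):
    """Render settings as --conf key=value arguments for spark-submit."""
    flags = []
    for key, value in sorted(conf.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


print(" ".join(to_submit_flags(spark_conf)))
```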

Last Visited	‎12-18-2021 05:54 PM

Member Since	‎01-09-2019 05:01 PM
Posts	401
Kudos received	163

Cloudera Community

Re: 2 hosts not running master services

Re: ambari restart and service restart updating kr...

Re: How to automate sqoop incremental import using...

Re: Path to core-site.xml in sandbox?

Re: Curious to know why majority of the people are...

Re: Will i be charged to my credit card for choosi...

Re: I have to practise hadoop(hive,pig,sqoop,oozie...

Re: Can't delete rows in Hive table + complex hive...

Re: Can we change yarn.nodemanager.log-dirs value ...

Re: Can we change yarn.nodemanager.log-dirs value ...

Re: Can we change yarn.nodemanager.log-dirs value ...

Re: Hue Oozie: UnknownHostException error during s...

Re: While starting ambari server, I am getting Ind...

Re: how many hdfs users do we have?

Re: Spark SQL query execution is very very slow wh...