Member since
02-01-2019
650
Posts
143
Kudos Received
117
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2822 | 04-01-2019 09:53 AM |
|  | 1463 | 04-01-2019 09:34 AM |
|  | 6927 | 01-28-2019 03:50 PM |
|  | 1577 | 11-08-2018 09:26 AM |
|  | 3802 | 11-08-2018 08:55 AM |
09-09-2018
09:17 AM
1 Kudo
@Daniel Zafar Apache Tez replaces MapReduce as the default Hive execution engine in HDP 3.0; MapReduce is no longer supported. You may want to check what the actual issue with Tez is and fix it. Ref: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/hive-overview/content/hive-apache-hive-3-architecturural-overview.html
09-07-2018
12:36 PM
@Kumar Veerappan Sizing HiveServer2 Heap Memory
The following are general recommendations for sizing the heap memory of a HiveServer2 instance:
1 to 20 concurrent executing queries: set a 6 GB heap size.
21 to 40 concurrent executing queries: set a 12 GB heap size.
More than 40 concurrent executing queries: create a new HiveServer2 instance. See "Multiple HiveServer2 Instances for Different Workloads" for how to add a HiveServer2 instance.
Please refer to the official doc: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_hive-performance-tuning/content/ch_connectivity-admission-control.html#guidelines-hiveserver2-heaps
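The tiering above can be encoded in a small helper for quick sanity checks (illustrative only; `recommended_hs2_heap_gb` is a hypothetical name, not part of any Hive or Ambari API):

```python
def recommended_hs2_heap_gb(concurrent_queries):
    """Map a concurrent query count to the recommended HiveServer2
    heap size in GB, per the HDP tuning guideline quoted above."""
    if concurrent_queries <= 20:
        return 6
    if concurrent_queries <= 40:
        return 12
    # Beyond 40 concurrent queries, the guideline is to add another
    # HiveServer2 instance rather than grow the heap further.
    return None

print(recommended_hs2_heap_gb(15))  # 6
print(recommended_hs2_heap_gb(35))  # 12
```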
09-07-2018
11:20 AM
1 Kudo
@Ronnie 10 Ambari will by default pick up the mount points and configure them for the appropriate services. E.g., for HDFS, Ambari configures dfs.datanode.data.dir and dfs.namenode.data.dir with all the mount points. So when you start using HDFS you should see data inside your /data0, /data1, /data2 and so on. Hope this helps.
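For reference, the generated hdfs-site.xml carries those mount points as a comma-separated list, along these lines (a sketch only; the /dataN paths and the hadoop/hdfs/data subdirectory are illustrative, taken from the example above rather than from an actual cluster):

```xml
<!-- Sketch of what Ambari writes to hdfs-site.xml; paths are illustrative -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data0/hadoop/hdfs/data,/data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data</value>
</property>
```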
09-06-2018
01:09 PM
@Michael Bronson By default Spark2 has the log level set to WARN. Set it to INFO to get more context on what is going on in the driver and executors. Moreover, the log is available locally on the NodeManager while the container is still running. The easiest way is to go to the Spark UI (YARN application master UI) -> click on the Executors tab -> there you should see the stderr and stdout corresponding to the driver and executors. Regarding the WARN on heartbeat, we'd need to check what the driver is doing at that point. I think you have already asked another question with more details on the driver and executor.
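Why raising the level from WARN to INFO surfaces more detail can be shown with Python's standard logging module as a stand-in (this is not Spark's log4j machinery, just the same threshold behavior):

```python
import io
import logging

buf = io.StringIO()
logger = logging.getLogger("spark-demo")
logger.addHandler(logging.StreamHandler(buf))

logger.setLevel(logging.WARNING)   # WARN-like default: INFO messages are dropped
logger.info("executor heartbeat detail")

logger.setLevel(logging.INFO)      # raised verbosity: INFO messages now appear
logger.info("executor heartbeat detail")

print(buf.getvalue().count("executor heartbeat detail"))  # 1
```

Only the second message reaches the handler, which is why the driver and executor logs say so little at the default level.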
09-06-2018
09:11 AM
@Michael Bronson Spark will not log anything on the DataNode machines (where the executors/containers are running) under /var/log/spark2. A Spark app is like any other YARN application: while the application is running, the logs are stored in the container's working directory, and after log aggregation they are moved to HDFS (from where they can be extracted with the `yarn logs -applicationId <app_id>` command). Hope this helps.
09-06-2018
08:25 AM
@Michael Bronson, you will see the Spark Thrift Server and Spark History Server logs in /var/log/spark2. The log4j config I proposed above is for Spark applications (whose logs are not stored in /var/log/spark2; instead you should use the yarn logs command to extract them). What is it you want to enable DEBUG logging for: a Spark application, the Spark Thrift Server, or the Spark History Server?
09-05-2018
04:23 PM
@Michael Bronson Use the below:
# Set everything to be logged to the console
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.spark.metrics.MetricsConfig=DEBUG
log4j.logger.org.apache.spark.deploy.yarn.Client=DEBUG
08-30-2018
02:58 PM
@David Hoyle
The code structure has changed since this article was written.
1) Check out trunk.
2) brew install protobuf250 (protobuf is needed to build Hadoop).
3) Build using: mvn clean package -Phdds -Pdist -Dtar -DskipShade -DskipTests -Dmaven.javadoc.skip=true
edit: updated the proto version
08-27-2018
06:47 AM
This blog series gives you a complete overview of Hive + Druid: https://hortonworks.com/blog/apache-hive-druid-part-1-3/
08-21-2018
04:42 AM
Not Yet...