Member since: 10-19-2016
Posts: 52
Kudos Received: 3
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 26383 | 02-26-2017 09:33 PM
 | 3132 | 11-29-2016 04:39 PM
 | 11728 | 11-15-2016 06:55 PM
11-17-2016
07:29 PM
Hi, I'm confused about how to configure memory in a YARN cluster. So far, all of my machines have 64 GB of physical memory, so I can set a uniform yarn.nodemanager.resource.memory-mb = 60 GB. If I add new, better machines with 128 GB of physical memory, how should I set the memory configuration in YARN? If I set yarn.nodemanager.resource.memory-mb = 120 GB, will this affect the old machines? If I set yarn.nodemanager.resource.memory-mb = 60 GB, will this waste the new machines' resources? Is there a relative ratio that fits each machine adaptively, for example yarn.nodemanager.resource.memory-mb = 0.8 * physical_memory? Thanks.
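To frame the question: yarn.nodemanager.resource.memory-mb is read by each NodeManager from its own yarn-site.xml, so a rough sketch of differing per-host values could look like the snippet below. The 60 GB and 120 GB figures are just the ones from this post, not recommendations, and in Cloudera Manager the equivalent would be separate NodeManager role groups rather than hand-edited files.

```xml
<!-- Sketch only: yarn-site.xml on a 64 GB NodeManager host -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>61440</value> <!-- 60 GB, the value used in this post -->
</property>

<!-- Sketch only: yarn-site.xml on a 128 GB NodeManager host -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>122880</value> <!-- 120 GB -->
</property>
```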
Labels:
- Apache YARN
- Cloudera Manager
11-16-2016
03:12 AM
<repository>
  <id>Cloudera Repository</id>
  <url>https://repository.cloudera.com/content/repositories/releases/</url>
</repository>
<repository>
  <id>Cloudera Beta Repository</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

I'm using these links 🙂
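For what it's worth, a dependency resolved through those repositories might look like the sketch below; the groupId and version string are inferred from the jar name spark-core_2.11-2.0.0.cloudera.beta2.jar mentioned in the question, so treat the exact coordinates as an assumption rather than the published ones.

```xml
<!-- pom.xml sketch: coordinates inferred from the beta2 jar name, may need adjusting -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.0.0.cloudera.beta2</version>
  <scope>provided</scope>
</dependency>
```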
11-16-2016
02:33 AM
This link: https://github.com/OryxProject/oryx/tree/master/bin. Thanks.
11-15-2016
11:19 PM
Hi, the link is unavailable now. Could you provide a new link?
11-15-2016
06:55 PM
1 Kudo
I'm so stupid... There's a Hive Service configuration item in Spark 2.0.0 beta2. Just check that it is set to the correct Hive Service in CDH.
11-15-2016
06:48 PM
Hi, I think I've run into a Hive metastore configuration issue. Here's my first post: https://community.cloudera.com/t5/Beta-Releases-Apache-Kudu/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47397#U47397. Could anyone tell me how to configure it? Thanks.
Labels:
- Apache Spark
11-15-2016
06:42 PM
Hi, I'm using Spark 2.0.0 beta2. I can find the beta2 jars, e.g. /opt/cloudera/parcels/SPARK2-2.0.0.cloudera.beta2-1.cdh5.7.0.p0.110234/lib/spark2/jars/spark-core_2.11-2.0.0.cloudera.beta2.jar. Spark 2 beta works well from the interactive shell spark2-shell. Now I'm going to write some Java code. Is there a Maven repository for the beta artifacts, like the release repository https://repository.cloudera.com/content/repositories/releases/? I've tried https://repository.cloudera.com/content/repositories/beta but got a 404 error. Thanks.
Labels:
- Apache Spark
11-15-2016
05:28 PM
Another concern is about $SPARK_LIBRARY_PATH; something might be wrong with the Hive jar dependencies.
11-15-2016
05:13 PM
Thank you, Brian. I think the problem is that Spark cannot connect to the Hive metastore; the default database is not actually empty.

If I use spark-shell in Spark 1.6.0 (/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark):

scala> sqlContext.sql("show tables").show()
scala> sys.env("HADOOP_CONF_DIR")
res1: String = /usr/lib/spark/conf/yarn-conf:/etc/hive/conf:/etc/hive/conf

It works well and prints all the tables managed by Hive. However, in Spark 2.0.0 (SPARK2-2.0.0.cloudera.beta1-1.cdh5.7.0.p0.108015):

scala> sys.env("HADOOP_CONF_DIR")
res0: String = /opt/cloudera/parcels/SPARK2-2.0.0.cloudera.beta2-1.cdh5.7.0.p0.110234/lib/spark2/conf/yarn-conf

There is no Hive-related conf dir in $HADOOP_CONF_DIR.

By the way, in Spark 2.0.0:

scala> val df = spark.read.parquet("/user/hive/warehouse/test_db.db/test_table_pqt")
scala> df.show(5)

This works with a table that was managed by Hive in Spark 1.6.0.
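If the root cause really is that the Spark 2 conf dir carries no Hive configuration, one possible workaround would be to place a hive-site.xml (or a copy of /etc/hive/conf/hive-site.xml) into that conf dir so the shell can reach the metastore. A minimal sketch is below; the thrift URI is a placeholder, not a value from this cluster.

```xml
<!-- hive-site.xml sketch for the Spark 2 conf dir; metastore host/port are placeholders -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host.example.com:9083</value>
  </property>
</configuration>
```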