Member since: 01-16-2014
Posts: 336
Kudos Received: 43
Solutions: 31

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3403 | 12-20-2017 08:26 PM
 | 3379 | 03-09-2017 03:47 PM
 | 2844 | 11-18-2016 09:00 AM
 | 5030 | 05-18-2016 08:29 PM
 | 3862 | 02-29-2016 01:14 AM
07-20-2015
09:05 PM
YARN-2865 is fixed in CDH 5.3.3 and later, and in CDH 5.4.0 and later. You are most likely seeing something that looks like YARN-2865 but is slightly different, unless the fix for YARN-2865 is itself incorrect, which could also happen. Can you please share the logs that show the exception? Wilfred
07-19-2015
05:54 PM
In Spark a transformation works directly on the RDD. Transformations are evaluated lazily and are closely coupled to the RDDs; you cannot use them separately. What you are looking for is a tool that can generate Spark code for you based on the transformation rule. I don't think that something like that exists. Wilfred
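To illustrate the coupling and the lazy evaluation, here is a minimal Scala sketch (the object name and the data are made up for the example): transformations such as map and filter only record lineage on the RDD, and nothing executes until an action such as count is called.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyTransformDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lazy-demo").setMaster("local[*]"))

    val rdd = sc.parallelize(1 to 10)

    // map and filter are transformations: they only record lineage
    // on the RDD, nothing has run yet at this point.
    val transformed = rdd.map(_ * 2).filter(_ > 5)

    // count is an action: only now does Spark execute the pipeline.
    println(transformed.count())

    sc.stop()
  }
}
```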
07-15-2015
10:36 PM
Do these transformations not work for you? Anything that you write in Spark can be adjusted to work with different storage underneath. What else would you be looking for? Wilfred
07-15-2015
10:18 PM
A container log is not part of the YARN service logs and will not be affected by any of the YARN settings. The container log looks like a log from an AM, which means you are most likely looking at a problem of the AM web UI not being able to bind. The AM web UI binds to an ephemeral port, which cannot be limited to a set of ports. Make sure that your security groups in AWS allow access to any port on the NMs. Wilfred
07-08-2015
10:28 PM
Check this part of the documentation for YARN tuning; it explains it all. You might have a default value set which you have overlooked, causing the issue. Wilfred
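If it helps, the defaults that are most often overlooked are the container sizing properties in yarn-site.xml. A sketch is below; the values shown are placeholders to illustrate, not recommendations for your cluster:

```xml
<!-- yarn-site.xml: container sizing properties whose defaults are
     easy to overlook. The values here are placeholders only. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
```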
07-08-2015
12:43 AM
1 Kudo
You always need to provide your own dependencies for your application. Spark has no dependency on HBase, and the fact that some of the HBase jars are pulled in, because they are part of a Hive dependency that Spark has, is a coincidence. If you build an application you should always make sure that you resolve your own dependencies. It might have worked out of the box in previous versions, or in a distribution from a different provider, because that Spark version had different dependencies. BTW: you should be using the spark.[driver|executor].extraClassPath settings, as that is the current way to do this. Wilfred
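As a sketch, the settings can go into spark-defaults.conf like this. The HBase lib path is just an example location; point it at wherever the jars actually live, and remember the path must exist on every node for the executor setting:

```
# spark-defaults.conf -- paths below are examples only
spark.driver.extraClassPath    /opt/cloudera/parcels/CDH/lib/hbase/lib/*
spark.executor.extraClassPath  /opt/cloudera/parcels/CDH/lib/hbase/lib/*
```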
07-07-2015
12:27 AM
The fact that the job runs as the hive user is correct. You have impersonation turned off when you turned on Sentry, or at least that is what you should have done. The hive user is thus the user that executes the job. However, the end user should be used to determine which queue the application is submitted to (if you use the FairScheduler). This does require some configuration on your side to make this work. There is a Knowledge Base article in our support portal on how to set that up for CM and non-CM clusters; search for "Hive FairScheduler". I remember providing the steps using CM on the forum before:

1. Log in to Cloudera Manager.
2. Navigate to Cluster > Yarn > Instances > ResourceManager > Processes.
3. Click on the link fair-scheduler.xml; this will open a new tab or window.
4. Copy the contents into a new file called fair-scheduler.xml.
5. On the HiveServer2 host, create a new directory to store the xml file (for example, /etc/hive/fsxml). Note: this file should not be placed in the standard Hive configuration directory, since that directory is managed by Cloudera Manager and the file could be removed when changing other configuration settings.
6. Upload the fair-scheduler.xml file to the directory created above.
7. In Cloudera Manager, navigate to Cluster > Hive > Service-Wide > Advanced > Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml and add the following property:
   <property>
     <name>yarn.scheduler.fair.allocation.file</name>
     <value>/etc/hive/fsxml/fair-scheduler.xml</value>
   </property>
8. Save changes.
9. Restart the Hive service.

NOTE: you must have the following rule as the first rule in the placement policy (see the sketch after these steps): <rule name="specified" /> Wilfred
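For reference, a minimal sketch of what such a fair-scheduler.xml can look like with the placement policy in place (the queue names are illustrative):

```xml
<?xml version="1.0"?>
<allocations>
  <queue name="root">
    <queue name="default"/>
  </queue>
  <queuePlacementPolicy>
    <!-- "specified" must be the first rule, so the queue resolved for
         the end user is honoured instead of every job landing in the
         same queue. -->
    <rule name="specified"/>
    <rule name="default"/>
  </queuePlacementPolicy>
</allocations>
```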
07-06-2015
09:21 PM
Please make sure that you have also added the setting to the configuration on the client node. The setting should be applied to all nodes in the cluster, not just the nodes that run the service. Wilfred
06-29-2015
12:43 AM
We do not support upgrading Spark without upgrading the rest of CDH. Spark is compiled against a specific version of Hadoop, and the version of Hadoop can change between releases of CDH. You also need to take into account the dependencies of Spark (like Hive), which might change between versions. Even if you were able to upgrade the package, you might get weird failures due to the dependency breakage. Wilfred
06-29-2015
12:33 AM
No, there is nothing that you can run to check whether log aggregation has finished. It is a distributed state known only inside the NMs. The only thing you can do is retry the log retrieval (for example, rerun yarn logs -applicationId <application id>). Log aggregation is performed by the NodeManager(s) when the containers finish. There is no way to tell how long that will take, since one node could be running more than one container that finishes at almost the same time. The load on HDFS is also a factor: copying to HDFS will only be as fast as HDFS can handle at that point. Wilfred