Member since: 08-01-2017
Posts: 7
Kudos Received: 0
Solutions: 0
08-02-2017
01:36 AM
Hi @Spyros, @balance002, @Vivian, @Romainr, I have been having the same problems and have finally arrived at a solution, which I thought I should share here.

My setup: I have Hadoop/HDFS/YARN running in one Docker container and Hue running in a separate Docker container (both on the same machine). However, the following should work with any other setup as well, e.g. if you have Hadoop and Hue installed in one container or installed directly on your machine. It is important, however, that you have a running instance of Hadoop/HDFS.

Step 1: Configuration of your HDFS instance

Add the following property to hdfs-site.xml (in my installation located in /usr/local/hadoop/etc/hadoop/hdfs-site.xml):

<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>

Add the following properties to core-site.xml (in my installation located in /usr/local/hadoop/etc/hadoop/core-site.xml):

<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>

Restart HDFS and YARN: cd to the directory where the start/stop scripts are located (in my case it's /usr/local/hadoop/sbin/) and call:

./stop-yarn.sh && ./stop-dfs.sh
./start-dfs.sh && ./start-yarn.sh
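To verify that WebHDFS is now reachable, a quick optional check is to query the WebHDFS REST endpoint directly (assuming the default NameNode HTTP port 50070, the same one used in hue.ini later):

curl -i "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS"
# a working setup should answer with HTTP 200 and a JSON listing of /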
Step 2: Add a new directory and adjust access rights in HDFS

Prerequisite: You can call the hdfs command directly on your HDFS host. If not, add the directory containing the respective binary to your PATH (on my machine: /usr/local/hadoop/bin).

First, change the owner of / to hdfs:

hdfs dfs -chown hdfs:hdfs /

Then create a directory for the user hdfs (this user is used by Hue; it can be configured in the hue.ini via default_hdfs_superuser) and adjust its ownership:

hdfs dfs -mkdir /user/hdfs
hdfs dfs -chown hdfs:hdfs /user/hdfs
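As another optional sanity check (a minimal sketch, nothing more), you can list /user and confirm the new directory's ownership:

hdfs dfs -ls /user
# the /user/hdfs entry should now show hdfs as both owner and group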
Step 3: Configuration of your Hue instance

Find your hue.ini (instructions here: http://gethue.com/how-to-configure-hue-in-your-hadoop-cluster/); in my case it's /hue/desktop/conf/pseudo-distributed.ini. Find the line webhdfs_url=http://localhost:50070/webhdfs/v1 and uncomment it, in case it is commented out. Then, if your HDFS instance is not running in the same container or host, change localhost to the IP address of the machine where your HDFS instance is running.

Step 4 (only if Hue and HDFS are not on the same container/host): Add the HDFS host to /etc/hosts

This step depends on your setup: If you have Hue and HDFS running in two containers, you need to link them, i.e. when calling docker run on your Hue container, add the following parameters (I know that --link is deprecated, but it works and is easier to explain):

--link <HDFS-container-ID>:<HDFS-container-ID/hostname>

If you are not using containers, simply add the following line to the /etc/hosts file of your Hue host:

<IP of HDFS host> <hostname of HDFS host>

Here is why you need this: Hue will first address the HDFS host via its IP (as you configured it in hue.ini). However, for some later requests it will use the hostname instead of the IP, which the Hue host can only resolve if it is in the /etc/hosts file.

Then restart Hue (in my case I simply stop and start the container) and you're done 🙂
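For illustration, this is roughly what the relevant part of hue.ini looks like after Step 3 (a sketch; the address 172.17.0.2 is a made-up example IP for the HDFS container):

[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      # point Hue at the WebHDFS endpoint of the HDFS host
      webhdfs_url=http://172.17.0.2:50070/webhdfs/v1

And a matching docker run call for the Hue container (again a sketch; hdfs-node and gethue/hue are placeholders for your actual container name and image):

docker run -it -p 8888:8888 --link hdfs-node:hdfs-node gethue/hue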
07-21-2017
08:42 AM
@George Meltser Unfortunately not. I have given up and I am now building the Hadoop stack manually without Ambari.
05-15-2017
02:01 PM
Thank you @Namit Maheshwari. I did improve the memory settings as mentioned in your referenced article. However, it is not possible to restart all affected services after a parameter change; e.g. the Hive Metastore service always aborts its restart. Stopping and restarting the container does not help either. If I run the query anyway (without restarting), the error is this:

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1494585560171_0001_1_00, diagnostics=[Task failed, taskId=task_1494585560171_0001_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.IllegalArgumentException: tez.runtime.io.sort.mb 2568 should be larger than 0 and should be less than the available task memory (MB):192
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.getInitialMemoryRequirement(ExternalSorter.java:327)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.initialize(OrderedPartitionedKVOutput.java:93)
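The exception itself names the mismatch: tez.runtime.io.sort.mb (2568 MB) has to fit inside the task's available memory (here only 192 MB). If the service restarts keep failing, one way to test corrected values is to override both settings for the current session only (a sketch, not a verified fix; the numbers are illustrative and must be adapted to your setup):

-- run these in the Hive view / Beeline session before the query
SET hive.tez.container.size=2048;  -- task container memory in MB
SET tez.runtime.io.sort.mb=512;    -- must stay well below the container size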
05-15-2017
01:56 PM
Thank you all for your replies. I did improve the memory settings (according to https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html). However, it is not possible to restart all affected services after a parameter change; e.g. the Hive Metastore service always aborts its restart. Stopping and restarting the container does not help either.
03-21-2017
02:42 PM
Hi Graham, thanks for your reply. Generally, SQL queries work. However, of the two queries you posted, only the first one completed (but with an empty result). The second one (analyze table truck_mileage compute statistics for columns;) gave me the same Java heap space exception. And no, the sandbox is not using its maximum resources, not even close. I tried playing around with the container sizes, but that led to the queries not returning at all anymore (i.e. no result, no crash, they just kept running).
03-21-2017
11:31 AM
Hello, I am running the HDP 2.5 Sandbox on Docker and following the Hello World tutorial step-by-step. Everything worked up until I tried running the following Hive query from the Ambari Hive View (section 2.4.3 of the tutorial):

SELECT truckid, avg(mpg) avgmpg FROM truck_mileage GROUP BY truckid;

Problem: After several seconds the query fails with the following exception:

java.lang.Exception: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1490094767913_0001_1_00, diagnostics=[Task failed, taskId=task_1490094767913_0001_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:173)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:117)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:142)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:138)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
... 14 more

Here are my system specs:

CPU: Quad Core Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Memory: 64 GB
Linux: 4.4.0-53-generic #74~14.04.1-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux

I did not change ANY settings (in the tutorial it says to set Tez as the execution engine, which was already set by default). Can you please tell me how to fix this? Otherwise the Sandbox is not usable at all.

Kind Regards
Benjamin
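Since the Sandbox runs in Docker, one thing worth ruling out (a hedged suggestion, not a confirmed cause) is a container-level memory limit far below the host's 64 GB. docker stats shows the effective limit of a running container, and --memory sets it explicitly at startup:

# show live memory usage and the memory limit of running containers
docker stats

# example: start a container with an explicit 16 GB limit
# (sandbox-hdp stands in for whatever the actual image/container is called)
docker run --memory=16g ... sandbox-hdp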
Labels: Hortonworks Data Platform (HDP)