Member since: 12-21-2015
Posts: 43
Kudos Received: 10
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2120 | 12-08-2016 12:33 AM |
| | 3474 | 01-29-2016 08:44 PM |
| | 1642 | 01-28-2016 10:48 PM |
04-21-2017
03:27 PM
I could reproduce the same issue on the HDP 2.5 sandbox. It looks like a bug to me.
04-18-2017
01:50 AM
Spark 2.1 and Zeppelin 0.7 can be run standalone on Windows as follows:

Spark installation:
1. Go to http://spark.apache.org/downloads.html and download the latest release.
2. Unzip the file to an appropriate location.
3. Read https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html and follow the instructions.
4. After the installation, open a command window in Spark's bin directory and run spark-shell to confirm you get the Scala prompt. You can then close the command window.

Summary of step 3 (the commands are collected into a single session after the Zeppelin steps below):
- Download the winutils.exe binary from the https://github.com/steveloughran/winutils repository. (Select the Hadoop version that matches your Spark distribution.)
- Save the winutils.exe binary to a directory of your choice, e.g. c:\hadoop\bin.
- Set HADOOP_HOME to the directory that contains bin (without the bin part), e.g. set HADOOP_HOME=c:\hadoop
- Add %HADOOP_HOME%\bin to the PATH environment variable: set PATH=%HADOOP_HOME%\bin;%PATH%
- Create the c:\tmp\hive directory.
- Run winutils.exe chmod -R 777 \tmp\hive, then verify with winutils.exe ls \tmp\hive.

Zeppelin installation:
1. Go to http://zeppelin.apache.org/download.html and download the latest release.
2. Unzip the file to an appropriate location.
3. Go to https://github.com/elodina/zeppelin-notebooks/blob/master/conf/interpreter.json.
4. Copy the content of interpreter.json and save it into the conf/interpreter.json file. If the file does not exist in the conf directory, create it.
5. Learn how to start and stop Zeppelin at http://zeppelin.apache.org/docs/0.7.1/install/install.html.
6. Go to http://localhost:8080, click the anonymous user at the top right, and click Interpreter. Find the Spark section and click the edit button on the right.
7. Update the master value to local[*], save, and restart the Spark interpreter (the restart button is next to the edit button).
8. Don't use the bundled tutorial notebook; it does not work. Instead, use Spark's latest tutorial: http://spark.apache.org/docs/latest/sql-programming-guide.html
9. When you write Scala code you don't need to specify an interpreter directive such as %xyz, but use %sql when you use Spark SQL.
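For reference, here is the step 3 winutils setup from above collected into a single Windows command-prompt session. The paths (c:\hadoop, c:\tmp\hive) are just the example locations used above; adjust them for your machine.

```bat
rem Assumes winutils.exe has been downloaded from
rem https://github.com/steveloughran/winutils and saved to c:\hadoop\bin.

rem Point HADOOP_HOME at the directory that contains bin (not bin itself).
set HADOOP_HOME=c:\hadoop

rem Put winutils.exe on the PATH.
set PATH=%HADOOP_HOME%\bin;%PATH%

rem Create the Hive scratch directory and open up its permissions.
mkdir c:\tmp\hive
winutils.exe chmod -R 777 \tmp\hive

rem Verify the permissions.
winutils.exe ls \tmp\hive
```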
04-11-2017
08:42 PM
Does it mean livy.spark.yarn.queue overrides the setting I found?
04-10-2017
08:12 PM
I found a partial answer here: https://community.hortonworks.com/questions/35632/how-to-choose-the-queue-in-which-you-want-to-submi.html
04-07-2017
08:38 PM
When I execute a simple job, I receive this error message:

org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1489006841512_0500 to YARN : Application application_1489006841512_0500 submitted by user zeppelin to unknown queue: default

How can we change the user zeppelin and the queue default to something else in Ambari?
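For context, a minimal sketch of where the YARN queue is normally specified for a Spark job. This is standard Spark configuration assumed here for illustration, not an answer taken from this thread, and the queue name is a placeholder.

```bash
# Submitting to a specific capacity-scheduler queue instead of "default"
# when calling spark-submit directly ("myqueue" and MyApp are placeholders):
spark-submit \
  --master yarn \
  --queue myqueue \
  --class MyApp \
  my-app.jar

# The same thing expressed as a property, e.g. in spark-defaults.conf or
# in the Zeppelin Spark interpreter settings:
#   spark.yarn.queue  myqueue
```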
Labels:
- Apache YARN
- Apache Zeppelin
04-07-2017
07:52 PM
I added: export ZEPPELIN_MEM="-Xms1024m -Xmx2048m -XX:MaxMetaspaceSize=512m" The out-of-memory issue was fixed; however, I ran into another issue related to the YARN configuration. I will ask it as a new question. Thanks!
04-07-2017
06:25 PM
Yes, that's right. In Ambari, should we add ZEPPELIN_MEM, say -XmxXXXXm, to Custom zeppelin-env?
04-07-2017
06:16 PM
The log file says:

INFO [2017-04-07 11:53:58,682] ({pool-2-thread-4} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1491580438681 started by scheduler org.apache.zeppelin.spark.SparkInterpreter283573820
INFO [2017-04-07 11:54:02,145] ({pool-2-thread-4} Logging.scala[logInfo]:58) - Changing view acls to: zeppelin
INFO [2017-04-07 11:54:02,145] ({pool-2-thread-4} Logging.scala[logInfo]:58) - Changing modify acls to: zeppelin
INFO [2017-04-07 11:54:02,145] ({pool-2-thread-4} Logging.scala[logInfo]:58) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zeppelin); users with modify permissions: Set(zeppelin)
INFO [2017-04-07 11:54:05,322] ({pool-2-thread-4} Logging.scala[logInfo]:58) - Starting HTTP Server
INFO [2017-04-07 11:54:05,945] ({pool-2-thread-4} Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
INFO [2017-04-07 11:54:08,337] ({pool-2-thread-4} AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:22900
INFO [2017-04-07 11:54:08,338] ({pool-2-thread-4} Logging.scala[logInfo]:58) - Successfully started service 'HTTP class server' on port 22900.
ERROR [2017-04-07 11:54:10,658] ({pool-2-thread-4} Job.java[run]:189) - Job failed
java.lang.OutOfMemoryError: GC overhead limit exceeded
INFO [2017-04-07 11:54:10,658] ({pool-2-thread-4} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1491580438681 finished by scheduler org.apache.zeppelin.spark.SparkInterpreter283573820
INFO [2017-04-07 11:54:11,954] ({pool-1-thread-18} Logging.scala[logInfo]:58) - Changing view acls to: zeppelin
INFO [2017-04-07 11:54:11,954] ({pool-1-thread-18} Logging.scala[logInfo]:58) - Changing modify acls to: zeppelin
INFO [2017-04-07 11:54:11,955] ({pool-1-thread-18} Logging.scala[logInfo]:58) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zeppelin); users with modify permissions: Set(zeppelin)

The "xxx.out" file says:

Exception in thread "qtp686649452-427" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1855)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2068)
at org.spark-project.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
at org.spark-project.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
at org.spark-project.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-1-thread-4"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-1-thread-4"
Exception in thread "pool-1-thread-7" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-6" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-9" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp1834748429-966" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-8" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-13" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-12" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-11" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-3" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp184565796-1023" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp1834748429-962" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp851486649-779" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp1459091505-28" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp1613009598-14753" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-18" java.lang.OutOfMemoryError: GC overhead limit exceeded
04-07-2017
04:40 PM
We could install Zeppelin through Ambari, and it shows a green light. However, when I open a notebook and try %sql (I guess the Spark interpreter), it fails with an error. When I check the Zeppelin log file, I see an out-of-memory error together with -XX:MaxPermSize=512m. I know this parameter only applies up to JDK 7; JDK 8 uses a different parameter. I'm fairly sure we need to use ZEPPELIN_MEM to set the new parameter, but I don't know how to do it correctly through Ambari. Could anyone help with this?
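For reference, a minimal sketch of what such a setting can look like in zeppelin-env.sh (in Ambari this is assumed to correspond to the zeppelin-env section of the Zeppelin configuration). The JDK 8 replacement for -XX:MaxPermSize is -XX:MaxMetaspaceSize; the sizes shown are the example values from the follow-up post above, not tuned recommendations.

```bash
# zeppelin-env.sh (in Ambari: the zeppelin-env configuration for Zeppelin)
# Example sizes only; -XX:MaxMetaspaceSize replaces -XX:MaxPermSize on JDK 8.
export ZEPPELIN_MEM="-Xms1024m -Xmx2048m -XX:MaxMetaspaceSize=512m"
```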
Labels:
- Apache Zeppelin
03-31-2017
08:40 PM
Moving to HDP 2.5 and using the Spark action may be easier.
03-02-2017
08:35 PM
1 Kudo
Thank you for the answer; however, neither of those is for LevelDB, which is used by the NodeManager. Do you have any idea how to initialize LevelDB? I have tried to find out, but I can't find any good article.
02-11-2017
02:42 AM
Yes, I use Ambari. I have one more question to clear my head. Assume I have nodes with at least 20 GB of memory and 4 cores each, and each node has a NodeManager installed. If I add one old computer with 5 GB of memory and 2 cores, and install a NodeManager on it, should the YARN per-node allocation setting be reduced to a maximum of 5 GB? This would be because the YARN configuration applies to all NodeManagers. I am asking because in my cluster, in Ambari's YARN memory settings, especially the per-node maximum memory size, the value is the lowest among all nodes that have a NodeManager installed. In other words, if we want to utilize resources, should we add nodes that have similar memory sizes and numbers of cores? Thank you,
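For context, a sketch of the standard YARN properties behind this question (these are generic, assumed property names and example values, not settings quoted from an answer in this thread). Each NodeManager advertises the resources configured for it; with a single Ambari configuration (no config groups) the same values apply to every NodeManager, which is why the per-node values end up sized for the smallest node.

```xml
<!-- yarn-site.xml (sketch; example values for a 5 GB / 2-core node) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>5120</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>
```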
02-10-2017
11:24 PM
I understand that YARN and HDFS are configured separately. One thing that confuses me is that the YARN ResourceManager UI shows a maximum memory size and maximum vcores which, I guess, are not actual figures from the DataNode servers but come from the YARN configuration? If YARN does not have the actual information, how can we know accurate information about the current processing? I also wonder how the ResourceManager can allocate resources correctly. (Sorry about so many questions...) Thank you,
02-10-2017
10:29 PM
When we need more resources, that typically means adding more DataNode servers? This makes sense to me. What I'm not so clear on is this: without adding anything to my example 5-server setup, what does it mean to install a NodeManager on all 5 servers? Even if we install more NodeManagers, it does not mean we increase the resources, because the number of DataNode servers stays at 3. On the other hand, if we install a NodeManager on all 5 servers, might the YARN configuration give us wrong resource information? Thank you,
02-10-2017
09:08 PM
I'm not so clear on the Hadoop setup. I understand how it works, but when I look at the YARN configuration, one question comes up. Let's say we have 5 servers and 3 of the 5 are used as DataNode servers. In this case, how many NodeManagers should be installed and running? Should we have a NodeManager on all 5 servers or only on the 3 DataNode servers?
Labels:
- Apache Hadoop
- Apache YARN
12-08-2016
12:33 AM
The solution was: Spark provides a sample HBase test program, HBaseTest.scala, in /usr/hdp/current/spark-client/examples/src/main/scala/org/apache/spark/examples. If you open this file, you will see the comment:

// please ensure HBASE_CONF_DIR is on classpath of spark driver
// e.g: set it through spark.driver.extraClassPath property
// in spark-defaults.conf or through --driver-class-path
// command line option of spark-submit

So I added that parameter, and my command line became:

spark-submit --jars hive-hbase-handler.jar,hbase-client.jar,hbase-common.jar,hbase-hadoop-compact.jar,hbase-hadoop2-compact.jar,hbase-protocol.jar,hbase-server.jar,metrics-core.jar,guava.jar --driver-class-path postgresql.jar --master yarn-client --files /usr/hdp/current/hbase-client/conf/hbase-site.xml --class SparkJS --driver-class-path /etc/hbase/2.5.0.0-1245/0 spark-js-1.jar

The issue is gone and I can do what I need to do.
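As an aside, a minimal sketch of the spark-defaults.conf alternative that the quoted comment mentions (an assumption for illustration, not part of the original post; the HBase conf path is the one used in the command line above):

```bash
# Put the HBase conf directory on the driver classpath once, instead of
# passing --driver-class-path on every spark-submit invocation.
echo "spark.driver.extraClassPath /etc/hbase/2.5.0.0-1245/0" \
  >> /usr/hdp/current/spark-client/conf/spark-defaults.conf
```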
12-01-2016
05:19 AM
Thank you for the recommendation, but I would like to solve this issue first. We are using HDP 2.5. Previously we used HDP 2.3, where I could not run Spark with Phoenix. Does HDP 2.5 allow us to use Phoenix with Spark 1.6.2?
12-01-2016
12:53 AM
I have a Hive table that is integrated with an HBase table. Viewing the data works fine from the Hive command line; however, when I try to do the same in Spark Java code, where I create a DataFrame from a select statement and call its show method, I see the following messages forever:

16/11/30 19:40:31 INFO ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x15802d56675006a, negotiated timeout = 90000
16/11/30 19:40:31 INFO RegionSizeCalculator: Calculating region sizes for table "st_tbl_1".
16/11/30 19:41:19 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=48332 ms ago, cancelled=false, msg=
16/11/30 19:41:40 INFO RpcRetryingCaller: Call exception, tries=11, retries=35, started=68473 ms ago, cancelled=false, msg=
16/11/30 19:42:00 INFO RpcRetryingCaller: Call exception, tries=12, retries=35, started=88545 ms ago, cancelled=false, msg=
16/11/30 19:42:20 INFO RpcRetryingCaller: Call exception, tries=13, retries=35, started=108742 ms ago, cancelled=false, msg=
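For reference, a minimal sketch of the kind of Spark 1.6 Java code described above (class and variable names are assumptions; st_tbl_1 is the table named in the log output):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveHBaseRead {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("HiveHBaseRead");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hive = new HiveContext(jsc.sc());

        // Select from the HBase-backed Hive table.
        DataFrame df = hive.sql("SELECT * FROM st_tbl_1");

        // This is the call that hangs with the RpcRetryingCaller messages above.
        df.show();

        jsc.stop();
    }
}
```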
Labels:
02-04-2016
03:04 AM
2 Kudos
Where did you copy your JDBC driver for the Sqoop action?
02-03-2016
03:42 PM
2 Kudos
As far as I can see from the Sqoop action above, I don't see a hive-site.xml file. I guess you added it into the lib directory in the deployment directory, which will keep the Hive action from running and give you something like a hive-site.xml permission error. You should add the hive-site.xml file under "Files" in the Sqoop action.
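For illustration, a sketch of what the Hue "Files" field corresponds to in the generated workflow.xml: a <file> element inside the Sqoop action. The action name, connection string, and table below are placeholders, not values taken from the post above.

```xml
<action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- placeholders: fill in your own connection string and table -->
        <command>import --connect ${jdbcUrl} --table ${tableName} --hive-import</command>
        <!-- ship hive-site.xml with the action instead of putting it in lib/ -->
        <file>hive-site.xml#hive-site.xml</file>
    </sqoop>
    <ok to="end"/>
    <error to="kill"/>
</action>
```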
02-02-2016
02:55 PM
When I added a Hive action before the Sqoop action, I got a hive-site.xml file permission error. To avoid this problem, delete the lib/hive-site.xml file and instead add hive-site.xml inside the Sqoop action as a file.
02-02-2016
02:52 PM
When I tried to run an Oozie workflow that contains Hive and Sqoop actions, I had the same problem. In my case, I had a lib directory containing hive-site.xml for the Sqoop action. After I moved hive-site.xml to the parent directory of the lib directory (the HDFS deployment directory) and added hive-site.xml as a file to the Sqoop action, the workflow worked. By the way, this is for HDP 2.3.2.
02-02-2016
01:04 AM
1 Kudo
I have successfully run a Hive action from an Oozie workflow. My simple Hive script does:

drop table test1;
create table test1 as select * from A_BASE;

These are the steps (the shell commands from steps 1 and 2 are collected in the sketch after this list):
1. Run su - oozie from an SSH window.
2. Run hdfs dfs -put /usr/hdp/2.3.2.0-2950/atlas/hook/hive/* /user/oozie/share/lib/lib_20151027124452/hive (assuming HDP 2.3.2 is used).
3. Create a workflow that contains a Hive action.
4. Add the property oozie.action.sharelib.for.hive = hive,hcatalog,sqoop in the Oozie parameters.
5. Create a Hive script like the one above and upload it from "Script name" on the Hive action edit page.
6. Save the workflow.
7. Run it.
8. It should run.
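The shell part of steps 1 and 2 above, as a single session (the sharelib directory name lib_20151027124452 is specific to that sandbox and will differ on other installations):

```bash
# Run from an SSH session on the sandbox.
su - oozie

# Copy the Atlas Hive hook jars into the Oozie sharelib for Hive (HDP 2.3.2 paths).
hdfs dfs -put /usr/hdp/2.3.2.0-2950/atlas/hook/hive/* \
  /user/oozie/share/lib/lib_20151027124452/hive
```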
01-29-2016
10:41 PM
It looks like HDP 2.3.2 already has this patch.
01-29-2016
10:41 PM
I also tested HiveContext so that the Hive processing runs in Spark memory. It works.
01-29-2016
08:44 PM
1 Kudo
I figured it out by myself. Here are the steps (the contents of the shell file from step 9 are shown after this list):
1. Download the sandbox or use your existing sandbox (HDP 2.3.2).
2. Create a workflow in Hue's Oozie editor.
3. Click "Edit Properties" and add a property to the Oozie parameters: oozie.action.sharelib.for.spark = spark,hcatalog,hive
4. Click the Save button.
5. Add a shell action and fill in the name field. The shell command field may be required; enter any string temporarily and save the shell action. We come back to edit it later.
6. Close the workflow and open the file browser; click oozie, then workspaces. Identify the _hue_xxx directory for the workflow you are creating.
7. Create a lib directory there.
8. Copy in your jar file that contains the Spark Java program.
9. Move up one directory and copy in a shell file (e.g. script.sh) that contains: spark-submit --class JDBCTest spark-test-1.0.jar where spark-test-1.0.jar is the file you uploaded to the lib directory.
10. Go back to the workflow web page.
11. Open the shell action and set the shell command by selecting the shell file (e.g. script.sh).
12. Also populate the Files field to add the shell file (e.g. script.sh) again.
13. Click Done.
14. Save the workflow.
15. Submit the workflow.
16. It should run.

My Java program does something like this:

Statement stmt = con.createStatement();
String sql = "SELECT s07.description AS job_category, s07.salary , s08.salary , (s08.salary - s07.salary) AS salary_difference FROM sample_07 s07 JOIN sample_08 s08 ON ( s07.code = s08.code) WHERE s07.salary < s08.salary SORT BY s08.salary-s07.salary DESC LIMIT 5";
ResultSet res = stmt.executeQuery(sql);

It uses the Hive JDBC driver.
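For reference, the script.sh referred to in step 9, written out as a file (the class name JDBCTest and the jar name spark-test-1.0.jar are the ones used in the steps above):

```bash
#!/bin/bash
# script.sh - invoked by the Oozie shell action.
# spark-test-1.0.jar is shipped via the workflow's lib directory.
spark-submit --class JDBCTest spark-test-1.0.jar
```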
01-28-2016
10:54 PM
Thank you, Artem.
01-28-2016
10:48 PM
1 Kudo
This is my summary (the shell commands from steps 5 and 6 are collected in the sketch after this list):
1. Download the HDP sandbox or use your existing one.
2. Create a workflow that does list-databases and run it from Hue. We expect it to fail.
3. To see the error log, go to localhost:19888/jobhistory/?user.name=hue, click the job Id link, then click the value 1 link for Maps at the bottom of the tables. In the new web page, click the logs link. You should see: java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
4. Open an SSH shell as root. Copy the Oracle JDBC jar to /home/oozie from your local file system.
5. Run su - oozie
6. Run hdfs dfs -put ojdbc7-12.1.0.1.0.jar /user/oozie/share/lib/lib_20151027124452/sqoop
7. Check from Ambari whether the Oracle JDBC jar was copied. Use the HDFS Files viewer (the second icon from the right on the Ambari page) and navigate to /user/oozie/share/lib/lib_20151027124452/sqoop.
8. Restart the sandbox.
9. Run the workflow created at step 2.
10. It should work.
11. Create another workflow that does a Hive import and run it from Hue.
12. You see the warning message: 2016-01-28 21:10:23,398 WARN [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration. Intercepting System.exit(1)
13. Two things are needed:
(1) Add oozie.action.sharelib.for.sqoop=hive,hcatalog,sqoop. To do this from the Hue Oozie workflow web page, click the "Edit properties" link and add this property to the Oozie parameters. oozie.use.system.libpath is already set to true by default; just add the new property.
(2) Copy hive-site.xml to the lib directory. To do this from Hue, click File Browser, then click oozie, which takes you to /user/hue/oozie. Click workspaces, click the _hue_xxx directory for the current workflow, create a lib directory in the identified directory, and copy in a hive-site.xml file that contains something like the following. (You don't need to update the JDBC connection string to your own; it looks like only the hive.metastore.uris value is needed, so maybe you can delete the first property, which I have not tried yet.)

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://sandbox.hortonworks.com:9083</value>
  </property>
</configuration>

Once this file is created, go back to the Oozie workflow editor, click the Add Path button, and select hive-site.xml. Save it.
14. Run the workflow. It should run.
01-28-2016
07:51 PM
1 Kudo
Since I could see the detailed error message at http://localhost:19888/jobhistory/?user.name=hue, I could identify what was wrong. It was due to a temp directory for A_BASE; after I deleted it, it works. But since I have touched so many things, I will redo it from scratch and post my summary here. Thanks, Artem, for working through this with me.
01-28-2016
07:07 PM
Since a new directory is created every time we submit the workflow, that approach does not work. I ran the sqoop import with sudo -u yarn. It no longer returns code 1, which is good, but neither the HDFS file nor the Hive table is created. Any idea?