Member since: 03-25-2016
142 Posts · 48 Kudos Received · 7 Solutions

My Accepted Solutions
Title | Views | Posted
---|---|---
| 5812 | 06-13-2017 05:15 AM
| 1906 | 05-16-2017 05:20 AM
| 1346 | 03-06-2017 11:20 AM
| 7932 | 02-23-2017 06:59 AM
| 2228 | 02-20-2017 02:19 PM
04-28-2017
09:51 AM
1 Kudo
ENVIRONMENT and SETUP
I tested the solution below with HDP-2.6.0.3-8 and Ambari 2.5.0.3. To configure this environment, follow the steps defined for HDP 2.5 (Zeppelin 0.6) here: https://community.hortonworks.com/articles/81069/how-to-enable-user-impersonation-for-sh-interprete.html. Since the Zeppelin UI -> Interpreter window has changed, follow the updated steps below.
I. The default config looks like below:
II. Make the following changes:
- edit the sh interpreter
- change "The interpreter will be instantiated" from "Globally" to "Per User" and from "shared" to "isolated"
- you will then be able to see and select the "User Impersonate" check-box
- also remove the following properties:
  zeppelin.shell.auth.type
  zeppelin.shell.keytab.location
  zeppelin.shell.principal
III. The changed config looks like below. After saving the changes and running the following as admin:
%sh
whoami
the output now shows the logged-in user rather than the zeppelin service account.
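As an optional check (a minimal sketch, not from the original post; host, port, and credentials below are placeholders), Zeppelin's interpreter REST API can confirm the sh interpreter setting:
# Hedged sketch: confirm the sh interpreter now runs per-user, isolated, with
# impersonation enabled. Host, port (9995 is the usual HDP default) and the
# admin credentials are placeholders for this illustration.
import requests

resp = requests.get("http://zeppelin-host:9995/api/interpreter/setting",
                    auth=("admin", "admin"))
for setting in resp.json()["body"]:
    if setting["name"] == "sh":
        # expect perUser: "isolated" and isUserImpersonate: true in the option block
        print(setting["option"])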
04-26-2017
05:45 AM
PROBLEM: When running a job in a notebook through Zeppelin, I sometimes get a 500 error. The following can be seen in the livy-livy-server.log file:
...
17/04/25 14:04:30 ERROR SessionServlet$: internal error
com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 13)): has to be escaped using backslash to be included in string value
at [Source: HttpInputOverHTTP@62c493f2; line: 1, column: 76]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
at com.fasterxml.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:472)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2235)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2165)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:279)
at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:29)
at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:12)
at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:538)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:344)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1064)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:264)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:124)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3066)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2207)
at com.cloudera.livy.server.JsonServlet.bodyAs(JsonServlet.scala:102)
...
SOLUTION: This problem happens when the notebook code is copied over from an external editor such as Notepad or TextPad; the pasted text carries control characters (CTRL-CHAR code 13 is a carriage return) that reach Livy unescaped inside the JSON request body.
To fix it, type the code manually instead of pasting it.
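A minimal illustration of the root cause (assuming the pasted text carries Windows-style line endings; this snippet is not from the original post):
import json

pasted = "print(1)\r\nprint(2)"       # code copied from a Windows editor carries \r (CTRL-CHAR 13)
cleaned = pasted.replace("\r", "")    # stripping the carriage returns avoids the parse error

# A properly built JSON body escapes control characters, so this is always valid:
print(json.dumps({"code": cleaned}))  # {"code": "print(1)\nprint(2)"}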
04-25-2017
08:08 AM
PROBLEM:
I have just started with Livy through Zeppelin and have made some Livy interpreter config changes, e.g. Spark memory. When running
%livy.spark
sc.version
I am getting
Cannot start spark
SOLUTION: When checking the application log from the RM UI, I noticed the following:
...
End of LogType:launch_container.sh
LogType:stderr
Log Upload Time:Tue Apr 18 15:13:02 +0200 2017
LogLength:142
Log Contents:
Invalid maximum heap size: -Xmx0m
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
...
As you can see, the Xmx value is set to 0m. Researching further, I noticed the following in the Livy interpreter settings:
livy.spark.executor.memory 512
livy.spark.driver.memory 512
Setting the values to
livy.spark.executor.memory 512m
livy.spark.driver.memory 512m
then saving the changes and restarting the Livy interpreter fixed the problem.
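A small, hypothetical helper (not part of the original post) that flags memory values missing a size unit before they are applied:
import re

def has_unit(mem: str) -> bool:
    """Return True if a Spark memory setting carries a size unit suffix (e.g. 512m, 2g)."""
    return re.fullmatch(r"\d+[kKmMgGtT]", mem) is not None

assert has_unit("512m")       # fine: the executor/driver heap is sized as intended
assert not has_unit("512")    # the bare number here is what led to -Xmx0m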
04-24-2017
05:54 PM
Hi Vipin, this feature also exists in HDP 2.5.3. The question is whether it is actually supported.
04-24-2017
01:24 PM
Problem
This problem happened on HDP 2.5.3 when running Spark on HBase. Here is the error seen in the application log:
...
17/04/11 10:12:04 WARN RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
17/04/11 10:12:05 INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
17/04/11 10:12:05 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
...
Solution
To fix the problem, ensure that the hbase-site.xml file exists in /etc/spark/conf on each NodeManager node. Without it, the HBase client falls back to the default quorum of localhost:2181, which is exactly what the log above shows.
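A minimal check (an illustrative sketch, assuming the file has been copied into place) that the quorum Spark will pick up is the real one:
# Illustrative check: print the ZooKeeper quorum that Spark executors will see.
import xml.etree.ElementTree as ET

root = ET.parse("/etc/spark/conf/hbase-site.xml").getroot()
conf = {p.findtext("name"): p.findtext("value") for p in root.findall("property")}
print(conf.get("hbase.zookeeper.quorum"))  # should list the real ZK hosts, not localhost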
04-24-2017
11:02 AM
Is scheduler supported through Zeppelin in HDP 2.5.3 and/or HDP 2.6?
Labels: Apache Zeppelin
04-24-2017
10:53 AM
Problem
I have a problem with accessing Spark history data for killed streaming jobs. The problem looks like the one mentioned here.
When I click on History under Tracking UI for a FINISHED Spark job, I am redirected to:
http://dan3.dklocal:18080/history/application_1491987409680_0013/jobs/ - Spark History URL
However, when doing the same for the KILLED one I am redirected to:
http://dan3.dklocal:8088/cluster/app/application_1491987409680_0012 - application log on RM UI.
What I need is to be redirected to the Spark History URL for the KILLED jobs as well.
Solution
Killing the Spark job with 'yarn application -kill' is not the right way of doing it. When I ran 'yarn application -kill', the job got killed (the RM UI shows State: KILLED, FinalStatus: KILLED) and the '.inprogress' suffix got removed, but the History link still points to the RM log rather than the Spark History UI.
When I instead ran kill -SIGTERM <Spark Driver>, the job also got killed, the RM UI shows State: FINISHED, FinalStatus: SUCCEEDED, the '.inprogress' suffix got removed, and the History link now goes to the Spark History UI.
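A minimal sketch of the graceful-stop approach (the PID below is a placeholder; this is illustrative, not from the original post):
# Send SIGTERM to the Spark driver so the event log is finished cleanly and the
# application shows up in the Spark History UI instead of only the RM log.
import os
import signal

driver_pid = 12345          # placeholder: PID of the running Spark driver process
os.kill(driver_pid, signal.SIGTERM)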
04-21-2017
01:03 PM
1 Kudo
Here are the steps to get pyspark working with SHC:
a) add the following into Ambari -> Spark -> Configs -> Advanced spark-env -> spark-env template:
export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.6.0.3-8/spark/conf/
b) kinit as e.g. the hbase user
c) run:
$ pyspark --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/
d) call each line separately:
from pyspark.sql import Row
data = range(0,255)
rdd = sc.parallelize(data).map(lambda i : Row(name=i,age=i))
import json
cat = json.dumps({"table":{"namespace":"default", "name":"dk", "tableCoder":"PrimitiveType"},"rowkey":"key","columns":{"name":{"cf":"rowkey", "col":"key", "type":"string"},"age":{"cf":"cf1", "col":"age", "type":"string"}}})
print(cat)
rdd.toDF().write.option("catalog",cat).option("newtable","5").format("org.apache.spark.sql.execution.datasources.hbase").save()
NOTE: When running the last command above, the following error comes up:
17/04/18 15:39:57 INFO ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15b42da54290012, negotiated timeout = 60000
17/04/18 15:39:57 INFO ZooKeeperRegistry: ClusterId read in ZooKeeper is null
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/current/spark-client/python/pyspark/sql/readwriter.py", line 395, in save
self._jwrite.save()
File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/hdp/current/spark-client/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o63.save.
: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
...
To get around the problem, do the following:
- go to Ambari -> HBase -> Configs -> Advanced tab -> Advanced hbase-site
- change the value of zookeeper.znode.parent from /hbase-unsecure to /hbase
- save the changes
- restart all required services
- re-run pyspark, i.e. repeat points c) and d); a quick read-back check is sketched below
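As an optional verification (a hedged sketch reusing the cat catalog string from step d; not part of the original post), the table can be read back through SHC before checking it in the HBase shell:
# Read the freshly written table back through the SHC data source, using the
# same catalog JSON ("cat") that was used for the write in step d.
df = sqlContext.read.option("catalog", cat) \
    .format("org.apache.spark.sql.execution.datasources.hbase").load()
df.show(5)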
e) test from the HBase shell
[root@dan261 ~]# hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.6.0.3-8, r3307790b5a22cf93100cad0951760718dee5dec7, Sat Apr 1 21:41:47 UTC 2017
hbase(main):001:0> list 'dk'
TABLE
dk
1 row(s) in 0.3880 seconds
=> ["dk"]
hbase(main):002:0> scan 'dk'
ROW COLUMN+CELL
\x00\x00\x00\x00\x00\x00\x00\x00 column=cf1:age, timestamp=1492595613501, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x01 column=cf1:age, timestamp=1492595613501, value=\x00\x00\x00\x00\x00\x00\x00\x01
\x00\x00\x00\x00\x00\x00\x00\x02 column=cf1:age, timestamp=1492595613488, value=\x00\x00\x00\x00\x00\x00\x00\x02
\x00\x00\x00\x00\x00\x00\x00\x03 column=cf1:age, timestamp=1492595613488, value=\x00\x00\x00\x00\x00\x00\x00\x03
\x00\x00\x00\x00\x00\x00\x00\x04 column=cf1:age, timestamp=1492595613488, value=\x00\x00\x00\x00\x00\x00\x00\x04
...
\x00\x00\x00\x00\x00\x00\x00\xFA column=cf1:age, timestamp=1492577972182, value=\x00\x00\x00\x00\x00\x00\x00\xFA
\x00\x00\x00\x00\x00\x00\x00\xFB column=cf1:age, timestamp=1492577972182, value=\x00\x00\x00\x00\x00\x00\x00\xFB
\x00\x00\x00\x00\x00\x00\x00\xFC column=cf1:age, timestamp=1492577972182, value=\x00\x00\x00\x00\x00\x00\x00\xFC
\x00\x00\x00\x00\x00\x00\x00\xFD column=cf1:age, timestamp=1492577972182, value=\x00\x00\x00\x00\x00\x00\x00\xFD
\x00\x00\x00\x00\x00\x00\x00\xFE column=cf1:age, timestamp=1492577972182, value=\x00\x00\x00\x00\x00\x00\x00\xFE
255 row(s) in 0.8570 seconds
hbase(main):003:0>
04-21-2017
01:03 PM
Here are the steps to get pyspark working with SHC:
a) add the following into Ambari -> Spark -> Configs -> Advanced spark-env -> spark-env template:
export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.5.3.0-37/spark/conf/
b) kinit as e.g. the hbase user
c) run:
$ pyspark --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/
d) call each line separately:
from pyspark.sql import Row
data = range(0,255)
rdd = sc.parallelize(data).map(lambda i : Row(name=i,age=i))
import json
cat = json.dumps({"table":{"namespace":"default", "name":"dk", "tableCoder":"PrimitiveType"},"rowkey":"key","columns":{"name":{"cf":"rowkey", "col":"key", "type":"string"},"age":{"cf":"cf1", "col":"age", "type":"string"}}})
print(cat)
rdd.toDF().write.option("catalog",cat).option("newtable","5").format("org.apache.spark.sql.execution.datasources.hbase").save()
e) test from the HBase shell
[root@dan2 ~]# hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.3.0-37, rcb8c969d1089f1a34e9df11b6eeb96e69bcf878d, Tue Nov 29 18:48:22 UTC 2016
hbase(main):001:0> list 'dk'
TABLE
dk
1 row(s) in 0.4220 seconds
=> ["dk"]
hbase(main):002:0> scan 'dk'
ROW COLUMN+CELL
\x00\x00\x00\x00\x00\x00\x00\x00 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x01 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x01
\x00\x00\x00\x00\x00\x00\x00\x02 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x02
\x00\x00\x00\x00\x00\x00\x00\x03 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x03
\x00\x00\x00\x00\x00\x00\x00\x04 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x04
\x00\x00\x00\x00\x00\x00\x00\x05 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x05
...
\x00\x00\x00\x00\x00\x00\x00\xFC column=cf1:age, timestamp=1492525198941, value=\x00\x00\x00\x00\x00\x00\x00\xFC
\x00\x00\x00\x00\x00\x00\x00\xFD column=cf1:age, timestamp=1492525198941, value=\x00\x00\x00\x00\x00\x00\x00\xFD
\x00\x00\x00\x00\x00\x00\x00\xFE column=cf1:age, timestamp=1492525198941, value=\x00\x00\x00\x00\x00\x00\x00\xFE
255 row(s) in 0.5950 seconds
hbase(main):003:0>
03-23-2017
06:52 AM
There is no "audit" keyword. However, you should see some extra entries identifying users logging in and performing operations. The details in that file are not presented in the same way as in Ambari 2.4.