Member since: 03-25-2016
Posts: 142
Kudos Received: 48
Solutions: 7
My Accepted Solutions
Views | Posted
---|---
5833 | 06-13-2017 05:15 AM
1921 | 05-16-2017 05:20 AM
1348 | 03-06-2017 11:20 AM
7971 | 02-23-2017 06:59 AM
2231 | 02-20-2017 02:19 PM
04-28-2017
09:51 AM
1 Kudo
ENVIRONMENT AND SETUP
I tested the solution below with HDP-2.6.0.3-8 and Ambari 2.5.0.3. To configure this environment, follow the steps defined for HDP 2.5 (Zeppelin 0.6) here: https://community.hortonworks.com/articles/81069/how-to-enable-user-impersonation-for-sh-interprete.html. Since the Zeppelin UI -> Interpreter window has changed, follow the updated steps below.
I. The default config looks like this:
II. Make the following changes:
- Edit the sh interpreter.
- Change "The interpreter will be instantiated" from "Globally" to "Per User", and "shared" to "isolated".
- You will then be able to see and select the "User Impersonate" check-box.
- Also remove the following properties:
zeppelin.shell.auth.type
zeppelin.shell.keytab.location
zeppelin.shell.principal
III. The changed config looks like this. After saving the changes and running the following as admin:
%sh
whoami
we get the logged-in user's name as the output.
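For completeness, here is a small sketch (not part of the original article) that checks the sh interpreter option over Zeppelin's REST API. The /api/interpreter/setting endpoint exists in Zeppelin 0.7, but the host/port and the exact option field names are assumptions and may differ between versions.

# Hedged sketch: verify the sh interpreter is per-user/isolated with impersonation enabled.
# Assumptions: Zeppelin at http://localhost:9995, already-authenticated or anonymous access,
# and option field names as in Zeppelin 0.7 ("perUser", "isUserImpersonate").
import requests

resp = requests.get("http://localhost:9995/api/interpreter/setting")
resp.raise_for_status()
for setting in resp.json()["body"]:
    if setting["name"] == "sh":
        opt = setting.get("option", {})
        print("perUser:", opt.get("perUser"))
        print("isUserImpersonate:", opt.get("isUserImpersonate"))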
04-26-2017
05:45 AM
PROBLEM: When running a job in a notebook through Zeppelin, I sometimes get a 500 error. The following can be seen in the livy-livy-server.log file:
...
17/04/25 14:04:30 ERROR SessionServlet$: internal error
com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 13)): has to be escaped using backslash to be included in string value
at [Source: HttpInputOverHTTP@62c493f2; line: 1, column: 76]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
at com.fasterxml.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:472)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2235)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2165)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:279)
at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:29)
at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:12)
at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:538)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:344)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1064)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:264)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:124)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3066)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2207)
at com.cloudera.livy.server.JsonServlet.bodyAs(JsonServlet.scala:102)
...
SOLUTION: This problem happens when the notebook code is copied over from an external editor such as Notepad or TextPad, which can introduce Windows-style carriage-return characters (the CTRL-CHAR, code 13 in the error above).
To fix it, retype the code manually (or strip the carriage returns before pasting).
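As an illustration (not from the original post), the sketch below shows why a pasted carriage return breaks the Livy request body: a raw control character (code 13) is illegal inside a JSON string unless it is escaped, which is exactly what the stack trace above reports.

# Minimal illustration: a raw carriage return (\r, code 13) inside a JSON string value is
# rejected by strict parsers, while an escaped one (as json.dumps produces) parses fine.
import json

pasted_code = "val x = 1\r\nx + 1"             # what a paste from Notepad can contain

bad_body = '{"code": "' + pasted_code + '"}'   # raw \r embedded in the string value
try:
    json.loads(bad_body)
except ValueError as e:
    print("parse error:", e)                   # mirrors the Jackson error in the log

good_body = json.dumps({"code": pasted_code})  # \r is escaped, so this parses
print(json.loads(good_body)["code"] == pasted_code)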
04-25-2017
08:08 AM
PROBLEM:
I have just started using Livy through Zeppelin and have made some Livy interpreter config changes, e.g. Spark memory. When running
%livy.spark
sc.version
I am getting
Cannot start spark
SOLUTION: When checking the application log from the RM UI, I noticed the following:
...
End of LogType:launch_container.sh
LogType:stderr
Log Upload Time:Tue Apr 18 15:13:02 +0200 2017
LogLength:142
Log Contents:
Invalid maximum heap size: -Xmx0m
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
...
As you can see, the Xmx value is set to 0m. Researching further, I noticed the following in the Livy interpreter settings:
livy.spark.executor.memory 512
livy.spark.driver.memory 512
The unit suffix is missing, which is why the heap size ends up as -Xmx0m. Setting the values to
livy.spark.executor.memory 512m
livy.spark.driver.memory 512m
then saving the changes and restarting the Livy interpreter fixed the problem.
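For reference, the same unit rule applies when creating a Livy session directly over its REST API. The sketch below (not part of the original post) passes the memory sizes with an explicit unit suffix; the Livy host/port and lack of authentication are assumptions.

# Hedged sketch: create a Livy session with explicit memory units ("512m", not "512").
# Assumptions: Livy server reachable at http://localhost:8998 without authentication.
import json
import requests

payload = {"kind": "spark", "driverMemory": "512m", "executorMemory": "512m"}
resp = requests.post(
    "http://localhost:8998/sessions",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.json().get("state"))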
04-24-2017
05:54 PM
Hi Vipin, this feature also exists in HDP 2.5.3. The question is whether it is actually supported.
04-24-2017
01:24 PM
Problem
This problem happened on HDP 2.5.3 when running Spark on HBase. Here is the error seen in the application log:
...
17/04/11 10:12:04 WARN RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
17/04/11 10:12:05 INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
17/04/11 10:12:05 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
...
Solution
To fix the problem, ensure that the hbase-site.xml file exists in /etc/spark/conf on each NodeManager node. Without it, the HBase client falls back to the default quorum of localhost:2181, which is exactly what the ConnectionLoss error above shows.
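A minimal check that can be run on each NodeManager host (a sketch, not from the original post; the path and property name follow the solution above):

# Hedged sketch: confirm hbase-site.xml is present in the Spark conf dir and print the
# ZooKeeper quorum it points at, so it can be compared with the real HBase quorum.
import os
import xml.etree.ElementTree as ET

conf_path = "/etc/spark/conf/hbase-site.xml"
if not os.path.exists(conf_path):
    print("hbase-site.xml missing - Spark on HBase will fall back to localhost:2181")
else:
    root = ET.parse(conf_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == "hbase.zookeeper.quorum":
            print("hbase.zookeeper.quorum =", prop.findtext("value"))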
04-24-2017
11:02 AM
Is scheduler supported through Zeppelin in HDP 2.5.3 and/or HDP 2.6?
Labels: Apache Zeppelin
04-24-2017
10:53 AM
Problem
I have a problem accessing Spark history data for killed streaming jobs. When I click on History under Tracking UI for a FINISHED Spark job, I am redirected to the Spark History URL:
http://dan3.dklocal:18080/history/application_1491987409680_0013/jobs/
However, when doing the same for a KILLED job, I am redirected to the application page on the RM UI:
http://dan3.dklocal:8088/cluster/app/application_1491987409680_0012
What I need is for KILLED jobs to be redirected to the Spark History URL as well.
Solution
Killing the Spark job with 'yarn application -kill' is not the right way to do it. With 'yarn application -kill', the job gets killed (status in the RM UI is State: KILLED, FinalStatus: KILLED) and the '.inprogress' suffix is removed, but the History link goes to the RM log rather than the Spark History UI.
With 'kill -SIGTERM <Spark Driver>', the job also gets killed; in the RM UI the status is State: FINISHED, FinalStatus: SUCCEEDED, the '.inprogress' suffix is removed, and the History link now goes to the Spark History UI.
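As a small illustration (not from the original post), here is one way to send SIGTERM to the driver process from the host it runs on. The assumptions are yarn-client mode, a single SparkSubmit driver process on the host, and no other matching processes; adapt the pattern for your own setup.

# Hedged sketch: stop a Spark driver with SIGTERM instead of 'yarn application -kill',
# so the event log is closed cleanly and the History link points at the Spark History Server.
import os
import signal
import subprocess

pattern = "org.apache.spark.deploy.SparkSubmit"  # yarn-client drivers run inside SparkSubmit
pids = subprocess.check_output(["pgrep", "-f", pattern]).decode().split()
for pid in pids:
    os.kill(int(pid), signal.SIGTERM)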
04-21-2017
01:03 PM
1 Kudo
Here are the steps to get pyspark working with SHC:
a) Add the following to Ambari -> Spark -> Configs -> Advanced spark-env -> spark-env template:
export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.6.0.3-8/spark/conf/
b) kinit as e.g. the hbase user.
c) Run:
$ pyspark --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/
d) Call each line separately:
from pyspark.sql import Row
data = range(0,255)
rdd = sc.parallelize(data).map(lambda i : Row(name=i,age=i))
import json
# SHC catalog: HBase table 'dk', row key taken from 'name', 'age' stored in column family cf1
cat = json.dumps({"table":{"namespace":"default", "name":"dk", "tableCoder":"PrimitiveType"},"rowkey":"key","columns":{"name":{"cf":"rowkey", "col":"key", "type":"string"},"age":{"cf":"cf1", "col":"age", "type":"string"}}})
print(cat)
# the 'newtable','5' option asks SHC to create the table (5 regions) if it does not exist
rdd.toDF().write.option("catalog",cat).option("newtable","5").format("org.apache.spark.sql.execution.datasources.hbase").save()
NOTE: When running the last command above, the following error may come up:
17/04/18 15:39:57 INFO ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15b42da54290012, negotiated timeout = 60000
17/04/18 15:39:57 INFO ZooKeeperRegistry: ClusterId read in ZooKeeper is null
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/current/spark-client/python/pyspark/sql/readwriter.py", line 395, in save
self._jwrite.save()
File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/hdp/current/spark-client/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o63.save.
: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
...
To get around the problem, do the following (a verification sketch follows the HBase shell output below):
- Go to Ambari -> HBase -> Configs -> Advanced tab -> Advanced hbase-site.
- Change the value of zookeeper.znode.parent FROM /hbase-unsecure TO /hbase.
- Save the changes.
- Restart all required services.
- Re-run pyspark, i.e. repeat points c) and d).
e) test from the HBase shell
[root@dan261 ~]# hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.6.0.3-8, r3307790b5a22cf93100cad0951760718dee5dec7, Sat Apr 1 21:41:47 UTC 2017
hbase(main):001:0> list 'dk'
TABLE
dk
1 row(s) in 0.3880 seconds
=> ["dk"]
hbase(main):002:0> scan 'dk'
ROW COLUMN+CELL
\x00\x00\x00\x00\x00\x00\x00\x00 column=cf1:age, timestamp=1492595613501, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x01 column=cf1:age, timestamp=1492595613501, value=\x00\x00\x00\x00\x00\x00\x00\x01
\x00\x00\x00\x00\x00\x00\x00\x02 column=cf1:age, timestamp=1492595613488, value=\x00\x00\x00\x00\x00\x00\x00\x02
\x00\x00\x00\x00\x00\x00\x00\x03 column=cf1:age, timestamp=1492595613488, value=\x00\x00\x00\x00\x00\x00\x00\x03
\x00\x00\x00\x00\x00\x00\x00\x04 column=cf1:age, timestamp=1492595613488, value=\x00\x00\x00\x00\x00\x00\x00\x04
...
\x00\x00\x00\x00\x00\x00\x00\xFA column=cf1:age, timestamp=1492577972182, value=\x00\x00\x00\x00\x00\x00\x00\xFA
\x00\x00\x00\x00\x00\x00\x00\xFB column=cf1:age, timestamp=1492577972182, value=\x00\x00\x00\x00\x00\x00\x00\xFB
\x00\x00\x00\x00\x00\x00\x00\xFC column=cf1:age, timestamp=1492577972182, value=\x00\x00\x00\x00\x00\x00\x00\xFC
\x00\x00\x00\x00\x00\x00\x00\xFD column=cf1:age, timestamp=1492577972182, value=\x00\x00\x00\x00\x00\x00\x00\xFD
\x00\x00\x00\x00\x00\x00\x00\xFE column=cf1:age, timestamp=1492577972182, value=\x00\x00\x00\x00\x00\x00\x00\xFE
255 row(s) in 0.8570 seconds
hbase(main):003:0>
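Verification sketch referenced in the workaround above (an assumption, not part of the original post): it uses the kazoo ZooKeeper client to confirm which parent znode HBase actually uses. The "ClusterId read in ZooKeeper is null" message typically means the client looked under the wrong parent znode, so the zookeeper.znode.parent value seen by Spark must match the one HBase is using.

# Hedged sketch: list which HBase parent znodes exist in ZooKeeper.
# Assumptions: kazoo is installed and the quorum below is replaced with the real one.
from kazoo.client import KazooClient

zk = KazooClient(hosts="localhost:2181")  # hypothetical quorum
zk.start()
for znode in ("/hbase", "/hbase-unsecure"):
    print(znode, "exists" if zk.exists(znode) else "missing")
zk.stop()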
04-21-2017
01:03 PM
Here are the steps to get pyspark working with SHC:
a) Add the following to Ambari -> Spark -> Configs -> Advanced spark-env -> spark-env template:
export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.5.3.0-37/spark/conf/
b) kinit as e.g. the hbase user.
c) Run:
$ pyspark --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/
d) Call each line separately:
from pyspark.sql import Row
data = range(0,255)
rdd = sc.parallelize(data).map(lambda i : Row(name=i,age=i))
import json
cat = json.dumps({"table":{"namespace":"default", "name":"dk", "tableCoder":"PrimitiveType"},"rowkey":"key","columns":{"name":{"cf":"rowkey", "col":"key", "type":"string"},"age":{"cf":"cf1", "col":"age", "type":"string"}}})
print(cat)
rdd.toDF().write.option("catalog",cat).option("newtable","5").format("org.apache.spark.sql.execution.datasources.hbase").save()
e) Test from the HBase shell:
[root@dan2 ~]# hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.3.0-37, rcb8c969d1089f1a34e9df11b6eeb96e69bcf878d, Tue Nov 29 18:48:22 UTC 2016
hbase(main):001:0> list 'dk'
TABLE
dk
1 row(s) in 0.4220 seconds
=> ["dk"]
hbase(main):002:0> scan 'dk'
ROW COLUMN+CELL
\x00\x00\x00\x00\x00\x00\x00\x00 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x01 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x01
\x00\x00\x00\x00\x00\x00\x00\x02 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x02
\x00\x00\x00\x00\x00\x00\x00\x03 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x03
\x00\x00\x00\x00\x00\x00\x00\x04 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x04
\x00\x00\x00\x00\x00\x00\x00\x05 column=cf1:age, timestamp=1492525198948, value=\x00\x00\x00\x00\x00\x00\x00\x05
...
\x00\x00\x00\x00\x00\x00\x00\xFC column=cf1:age, timestamp=1492525198941, value=\x00\x00\x00\x00\x00\x00\x00\xFC
\x00\x00\x00\x00\x00\x00\x00\xFD column=cf1:age, timestamp=1492525198941, value=\x00\x00\x00\x00\x00\x00\x00\xFD
\x00\x00\x00\x00\x00\x00\x00\xFE column=cf1:age, timestamp=1492525198941, value=\x00\x00\x00\x00\x00\x00\x00\xFE
255 row(s) in 0.5950 seconds
hbase(main):003:0>
03-23-2017
06:52 AM
There is no "audit" keyword. However, you should see some extra entries identifying users logging in and performing operations. The details in that file are not presented in the same way as in Ambari 2.4.