Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1737 | 09-11-2019 10:19 AM |
|  | 9341 | 11-26-2018 07:04 PM |
|  | 2490 | 11-14-2018 12:10 PM |
|  | 5335 | 11-14-2018 12:09 PM |
|  | 3152 | 11-12-2018 01:19 PM |
05-25-2018
03:38 PM
@Tamil Selvan K HTTP is the more firewall-friendly protocol, and that is usually the reason you end up using it when you need to connect to Hive from remote clients. Keep in mind that each HiveServer2 instance can be configured with only one transport protocol. However, you can run multiple HiveServer2 instances on your cluster, so if necessary you could configure some of them with binary transport and others with HTTP. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
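A hedged sketch of what connecting over each transport looks like from beeline (hostnames are placeholders; 10000 and 10001 are the usual default ports, adjust to your setup):

# Binary (plain Thrift) transport
beeline -u "jdbc:hive2://hs2-binary-host:10000/default"

# HTTP transport
beeline -u "jdbc:hive2://hs2-http-host:10001/default;transportMode=http;httpPath=cliservice"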
05-25-2018
03:27 PM
@Mokkan Mok The NameNode does not write blocks to DataNodes; blocks are written only by the client to a DataNode and by DataNodes to each other (depending on the replication factor). The protocol between client and DataNode depends on the client you are using: with WebHDFS, for example, you will be using HTTP, while other clients such as the hdfs CLI use the RPC/data-transfer protocol. I believe DN-to-DN replication is always RPC. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
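A rough illustration of the client-to-DataNode path with WebHDFS (hostnames are placeholders; 50070/50075 are the legacy default HTTP ports): the NameNode only answers with a redirect, and the client then sends the data to the DataNode itself.

# Step 1: ask the NameNode where to write; it replies with a 307 redirect pointing at a DataNode
curl -i -X PUT "http://namenode-host:50070/webhdfs/v1/tmp/test.txt?op=CREATE"
# Step 2: upload the file contents to the DataNode URL returned in the Location header
curl -i -X PUT -T test.txt "http://datanode-host:50075/webhdfs/v1/tmp/test.txt?op=CREATE&..."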
05-25-2018
03:21 PM
1 Kudo
@bharat sharma Notebooks are not Python modules. If you are trying to import a notebook as if it were a Python module, AFAIK that won't work. If you are trying to import modules into your PySpark application, there are different ways to do it. One way is to copy the Python file to HDFS and use the following:

%pyspark
sc.addPyFile("/user/zeppelin/my_settings.py")
import my_settings

HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
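A minimal sketch of the copy step, assuming my_settings.py is in your current local directory and /user/zeppelin is the target HDFS directory:

# Upload the local module to HDFS so the interpreter can fetch it with sc.addPyFile
hdfs dfs -put my_settings.py /user/zeppelin/my_settings.py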
05-24-2018
02:33 PM
@Mokkan Mok Yes, the NameNode issues the delegation token. The command-line tool is: # hdfs fetchdt More on it here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#fetchdt Note: If you are satisfied with the answer, please take a moment to login and click the "accept" link on the answer.
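A hedged usage sketch (the renewer name and token file path are placeholders):

# Fetch an HDFS delegation token into a local file, then inspect it
hdfs fetchdt --renewer hdfs /tmp/mytoken.dt
hdfs fetchdt --print /tmp/mytoken.dt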
05-24-2018
02:26 PM
@Pramod GM Have you tried yarn-client mode? I would recommend testing with spark-shell using the same configuration arguments and checking whether a simple sc.textFile("hdfs://...") works or not. Try pointing directly to the active NameNode, both with and without the port. Are the NameNodes of both clusters configured in HA? HTH
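A rough sketch of that test (the nameservice and host names are placeholders for your remote cluster):

# Start a shell in yarn-client mode with the same configuration arguments as your job
spark-shell --master yarn --deploy-mode client
# Then, inside the shell, try both forms of the path:
#   sc.textFile("hdfs://remote-nameservice/tmp/sample.txt").count()
#   sc.textFile("hdfs://active-nn-host:8020/tmp/sample.txt").count()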
05-24-2018
02:20 PM
@Mokkan Mok
1. We can get a delegation token, and even if we kdestroy the tickets, we can still access using the delegation token? Yes, the following HCC link shows exactly this with an example: https://community.hortonworks.com/articles/50069/demystifying-delegation-token.html
2. Is the delegation token part of Kerberos, or does it just depend on Kerberos? The delegation token is not part of Kerberos, but in order to get a delegation token you need a valid Kerberos ticket.
3. Is it just a separate package? Each Hadoop service (HDFS, YARN, Hive, HBase) provides a way to fetch delegation tokens through its client API. Each delegation token has an expiration and a max issue date; as long as it is valid, clients can use the delegation token to authenticate with the service.
HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
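A hedged sketch of the flow in question 1 (the principal, renewer, and token file path are placeholders):

kinit user@EXAMPLE.COM                        # get a Kerberos ticket first
hdfs fetchdt --renewer hdfs /tmp/mytoken.dt   # fetch an HDFS delegation token while the ticket is valid
kdestroy                                      # throw the Kerberos ticket away
export HADOOP_TOKEN_FILE_LOCATION=/tmp/mytoken.dt
hdfs dfs -ls /tmp                             # still works: authentication now uses the delegation token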
05-23-2018
03:48 PM
@vishal dutt Try this; only replace the path to the jars and make sure sqlserver.py is in the working directory (leave the rest as is):

spark-submit --master yarn --deploy-mode cluster --jars /path/to/driver/sqljdbc42.jar --conf "spark.driver.extraClassPath=sqljdbc42.jar" --conf "spark.executor.extraClassPath=sqljdbc42.jar" sqlserver.py

HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
05-23-2018
01:21 AM
@skekatpuray I see you are using the sessions API instead of batches. Try running it with:

curl -X POST --data '{"kind":"pyspark", "conf":{ "pyFiles" : "/user/skekatpu/pw/codebase/splitter.py"} }' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/batches

HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
05-22-2018
10:06 PM
1 Kudo
@skekatpuray --py-files is for the command line only. With Livy, try using spark.submit.pyFiles instead; you should add it via the Spark configurations in the "conf" field of the REST request. Check this link for more information: https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html You should probably also put those pyFiles in HDFS and point to them there instead of on the local file system, since they won't be present locally for Livy. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
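A hedged sketch of that request (the host, port, and paths are placeholders, and main.py stands in for your actual application file; the linked article covers the details):

# Put the dependency in HDFS first so Livy can reach it
hdfs dfs -put splitter.py /user/skekatpu/pw/codebase/splitter.py
# Submit the batch, passing the dependency through spark.submit.pyFiles in "conf"
curl -X POST -H "Content-Type: application/json" -H "X-Requested-By: root" \
  --data '{"file":"hdfs:///user/skekatpu/pw/codebase/main.py", "conf":{"spark.submit.pyFiles":"hdfs:///user/skekatpu/pw/codebase/splitter.py"}}' \
  http://localhost:8999/batches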