Member since 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1671 | 09-11-2019 10:19 AM |
| | 9197 | 11-26-2018 07:04 PM |
| | 2389 | 11-14-2018 12:10 PM |
| | 5089 | 11-14-2018 12:09 PM |
| | 3051 | 11-12-2018 01:19 PM |
08-06-2018
01:43 PM
@Marcel-Jan Krijgsman Have you configured hive.server2.enable.doAs=false? If it is set to true, all access to HDFS is made as the calling user; if set to false, access to HDFS is made as the hive user. The recommended approach is to set this value to false for Ranger Hive SQL authorization. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
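For reference, the property is set in hive-site.xml (normally through Ambari). A minimal sketch of the relevant fragment only:

```xml
<property>
  <name>hive.server2.enable.doAs</name>
  <!-- false: HiveServer2 accesses HDFS as the hive service user,
       which is what Ranger Hive SQL authorization expects -->
  <value>false</value>
</property>
```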
08-06-2018
01:21 PM
@Carlton Patterson You can use this to write the whole dataframe out through a single task: myresults.coalesce(1).write.csv("/tmp/myresults.csv") Note that Spark still creates /tmp/myresults.csv as a directory containing a single part-*.csv file. HTH
07-31-2018
07:00 PM
@Bhushan Kandalkar Could you be more precise about which of the steps I provided above is giving you the permission denied error? Thanks!
07-31-2018
03:04 PM
@Bhushan Kandalkar To recover and change notebook permissions manually, do the following:
1. Log in to the Zeppelin server host and switch to the zeppelin user. If the cluster is kerberized, kinit as the zeppelin principal using the zeppelin keytab.
2. Make a backup of the file in HDFS: hdfs dfs -cp /user/zeppelin/conf/notebook-authorization.json /user/zeppelin/conf/notebook-authorization.json.orig
3. Get the file from HDFS to the local file system: hdfs dfs -get /user/zeppelin/conf/notebook-authorization.json /tmp/notebook-authorization.json
4. Edit the file and replace all occurrences of the username: sed -i -e 's/bhushan-kandalkar@test.com/bhushan-kandalkar/g' /tmp/notebook-authorization.json
5. Upload the file back to HDFS: hdfs dfs -put -f /tmp/notebook-authorization.json /user/zeppelin/conf/
6. Restart the Zeppelin server using Ambari.
Let me know if that works for you. HTH
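If you prefer a JSON-aware edit over the sed step, something like the following sketch works on the local copy of the file. It assumes the usual notebook-authorization.json layout (an "authInfo" map of note IDs to owners/readers/writers lists); adjust if your file differs:

```python
import json

def rewrite_user(path, old_user, new_user):
    """JSON-aware equivalent of the sed step: replace old_user with
    new_user in every permission list of notebook-authorization.json.
    Assumes an "authInfo" map of note IDs to role -> user-list dicts."""
    with open(path) as f:
        auth = json.load(f)
    # Walk each note's permission lists and swap the username.
    for perms in auth.get("authInfo", auth).values():
        for role, users in perms.items():
            perms[role] = [new_user if u == old_user else u for u in users]
    with open(path, "w") as f:
        json.dump(auth, f, indent=2)
```

Run it against /tmp/notebook-authorization.json between steps 3 and 5, then put the file back to HDFS as in step 5.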
07-31-2018
01:44 PM
1 Kudo
@Mark I suggest you take the NetworkWordCount example as a starting point. To transform the stream RDD into a dataframe, I recommend you look into flatMap, since you can map a single-column RDD into multiple columns after parsing the JSON content of each object. Finally, when saving to HDFS, choose a good batch size/repartition to avoid creating small files.
1. The NetworkWordCount code on GitHub is located here: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala
2. Here is an example of how to parse JSON using map and flatMap: https://github.com/holdenk/learning-spark-examples/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/BasicParseJson.scala
3. Saving a dataframe as ORC is very well documented. Just avoid writing small files, as this will hurt the NameNode and your HDFS overall.
HTH
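The flatMap parsing idea can be sketched in plain Python; the parse function below is the shape of what you would pass to flatMap on the stream RDD (the field names are hypothetical, not from your data):

```python
import json

def parse_json_line(line):
    """Parse one streamed line of JSON and return a list of zero or
    more flat records -- the zero-or-more contract is what flatMap
    expects, so malformed lines can simply be dropped."""
    try:
        obj = json.loads(line)
    except ValueError:
        return []  # skip malformed input instead of failing the batch
    # Flatten the object into a tuple of columns (hypothetical fields).
    return [(obj.get("user"), obj.get("event"), obj.get("ts"))]

# flatMap semantics: one input line -> zero or more output records
lines = ['{"user": "a", "event": "click", "ts": 1}', 'not json']
records = [rec for line in lines for rec in parse_json_line(line)]
```

In Spark you would call lines.flatMap(parse_json_line) and then toDF with your column names before writing ORC.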
07-30-2018
07:01 PM
@Bin Ye Please check whether the property "hbase.backup.enable" is set correctly in hbase-site.xml. Also please review this documentation link specific to backups with HBase: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_data-access/content/ch_hbase_bar.html If you continue to experience issues, please open a separate HCC thread for this problem. If the information already provided has helped you answer the original question, please remember to login and mark the answer as Accepted.
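For reference, the property belongs in hbase-site.xml; a minimal fragment, assuming you want the backup feature enabled:

```xml
<property>
  <name>hbase.backup.enable</name>
  <value>true</value>
</property>
```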
07-30-2018
04:53 PM
Hi @Bin Ye Does this snapshot continuously track the changes of this table? No, a snapshot is like a photo of the table data at a certain point in time. A snapshot also marks data to prevent it from being deleted. For more information please review these HCC posts: https://es.hortonworks.com/blog/coming-hdp-2-5-incremental-backup-restore-apache-hbase-apache-phoenix/ https://community.hortonworks.com/questions/102843/hbase-snapshot-or-backup.html There are differences between snapshots, backups, and incremental backups, which you can review in the links above. Hopefully this will help you decide whether a snapshot is the correct solution for you or if you should go for a backup. HTH
07-27-2018
07:11 PM
@HDave Have you checked the following library: https://github.com/crealytics/spark-excel
07-26-2018
07:53 PM
@Paul Lam Please review the instructions in this HCC link: https://community.hortonworks.com/content/supportkb/178553/more-than-one-mpack-installed-for-the-same-service.html You will find the command to remove the installed mpack: ambari-server uninstall-mpack --mpack-name=hdf-ambari-mpack --verbose HTH
07-26-2018
05:11 PM
@Nikil Katturi any luck with this one? Please keep me posted.