Member since 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1671 | 09-11-2019 10:19 AM |
| | 9197 | 11-26-2018 07:04 PM |
| | 2389 | 11-14-2018 12:10 PM |
| | 5089 | 11-14-2018 12:09 PM |
| | 3051 | 11-12-2018 01:19 PM |
08-06-2018
01:43 PM
@Marcel-Jan Krijgsman Have you configured hive.server2.enable.doAs=false? If it is set to true, all access to HDFS is made as the calling user; if set to false, access to HDFS is made as the hive user. The recommended approach is to set this value to false for Ranger Hive SQL authorization. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
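For reference, the property is set in hive-site.xml (normally through Ambari). A minimal sketch of the relevant fragment only:

```xml
<property>
  <name>hive.server2.enable.doAs</name>
  <!-- false: HiveServer2 accesses HDFS as the hive service user,
       which is what Ranger Hive SQL authorization expects -->
  <value>false</value>
</property>
```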
08-06-2018
01:21 PM
@Carlton Patterson You can use this to write the whole dataframe out through a single task: myresults.coalesce(1).write.csv("/tmp/myresults.csv") Note that Spark still creates /tmp/myresults.csv as a directory containing a single part-*.csv file. HTH
07-31-2018
07:00 PM
@Bhushan Kandalkar Could you be more precise about which of the steps I provided above is giving you the permission denied error? Thanks!
07-31-2018
03:04 PM
@Bhushan Kandalkar To recover and change notebook permissions manually, do the following:
1. Log in to the Zeppelin server host and switch to the zeppelin user. If the cluster is kerberized, kinit as the zeppelin principal using the zeppelin keytab.
2. Make a backup of the file in HDFS: hdfs dfs -cp /user/zeppelin/conf/notebook-authorization.json /user/zeppelin/conf/notebook-authorization.json.orig
3. Get the file from HDFS to the local file system: hdfs dfs -get /user/zeppelin/conf/notebook-authorization.json /tmp/notebook-authorization.json
4. Edit the file and replace all occurrences of the username: sed -i -e 's/bhushan-kandalkar@test.com/bhushan-kandalkar/g' /tmp/notebook-authorization.json
5. Upload the file back to HDFS: hdfs dfs -put -f /tmp/notebook-authorization.json /user/zeppelin/conf/
6. Restart the Zeppelin server using Ambari.
Let me know if that works for you. HTH
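If you prefer a JSON-aware edit over the sed step, something like the following sketch works on the local copy of the file. It assumes the usual notebook-authorization.json layout (an "authInfo" map of note IDs to owners/readers/writers lists); adjust if your file differs:

```python
import json

def rewrite_user(path, old_user, new_user):
    """JSON-aware equivalent of the sed step: replace old_user with
    new_user in every permission list of notebook-authorization.json.
    Assumes an "authInfo" map of note IDs to role -> user-list dicts."""
    with open(path) as f:
        auth = json.load(f)
    # Walk each note's permission lists and swap the username.
    for perms in auth.get("authInfo", auth).values():
        for role, users in perms.items():
            perms[role] = [new_user if u == old_user else u for u in users]
    with open(path, "w") as f:
        json.dump(auth, f, indent=2)
```

Run it against /tmp/notebook-authorization.json between steps 3 and 5, then put the file back to HDFS as in step 5.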
07-31-2018
01:44 PM
1 Kudo
@Mark I suggest you take the NetworkWordCount example as a starting point. To transform the stream RDD into a dataframe, I recommend you look into flatMap, since you can map a single-column RDD into multiple columns after parsing the JSON content of each object. Finally, when saving to HDFS, choose a good batch size/repartition to avoid creating small files.
1. The NetworkWordCount code on GitHub is located here: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala
2. Here is an example of how to parse JSON using map and flatMap: https://github.com/holdenk/learning-spark-examples/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/BasicParseJson.scala
3. Saving a dataframe as ORC is very well documented. Just avoid writing small files, as this will hurt the NameNode and your HDFS overall.
HTH
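The flatMap parsing idea can be sketched in plain Python; the parse function below is the shape of what you would pass to flatMap on the stream RDD (the field names are hypothetical, not from your data):

```python
import json

def parse_json_line(line):
    """Parse one streamed line of JSON and return a list of zero or
    more flat records -- the zero-or-more contract is what flatMap
    expects, so malformed lines can simply be dropped."""
    try:
        obj = json.loads(line)
    except ValueError:
        return []  # skip malformed input instead of failing the batch
    # Flatten the object into a tuple of columns (hypothetical fields).
    return [(obj.get("user"), obj.get("event"), obj.get("ts"))]

# flatMap semantics: one input line -> zero or more output records
lines = ['{"user": "a", "event": "click", "ts": 1}', 'not json']
records = [rec for line in lines for rec in parse_json_line(line)]
```

In Spark you would call lines.flatMap(parse_json_line) and then toDF with your column names before writing ORC.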
07-30-2018
07:01 PM
@Bin Ye Please check whether the property "hbase.backup.enable" is set correctly in hbase-site.xml. Also please review this documentation link specific to backups with HBase: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_data-access/content/ch_hbase_bar.html If you continue to experience issues, please open a separate HCC thread for this problem. If the information already provided has helped you answer the original question, please remember to login and mark the answer as Accepted.
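For reference, the property belongs in hbase-site.xml; a minimal fragment, assuming you want the backup feature enabled:

```xml
<property>
  <name>hbase.backup.enable</name>
  <value>true</value>
</property>
```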
07-30-2018
04:53 PM
Hi @Bin Ye Does this snapshot continuously track the changes of this table? No, a snapshot is like a photo of the table data at a certain point in time. A snapshot also marks data to prevent it from being deleted. For more information please review these HCC posts: https://es.hortonworks.com/blog/coming-hdp-2-5-incremental-backup-restore-apache-hbase-apache-phoenix/ https://community.hortonworks.com/questions/102843/hbase-snapshot-or-backup.html There are differences between snapshots, backups, and incremental backups, which you can review in the links above. Hopefully this will help you decide whether a snapshot is the correct solution for you or if you should go for a backup. HTH
07-27-2018
07:11 PM
@HDave Have you checked the following library: https://github.com/crealytics/spark-excel
07-26-2018
07:53 PM
@Paul Lam Please review the instructions in this HCC link: https://community.hortonworks.com/content/supportkb/178553/more-than-one-mpack-installed-for-the-same-service.html You will find the command to remove the installed mpack: ambari-server uninstall-mpack --mpack-name=hdf-ambari-mpack --verbose HTH
07-26-2018
05:11 PM
@Nikil Katturi any luck with this one? Please keep me posted.