Member since: 10-24-2015
Posts: 171
Kudos Received: 379
Solutions: 23

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2622 | 06-26-2018 11:35 PM
 | 4335 | 06-12-2018 09:19 PM
 | 2869 | 02-01-2018 08:55 PM
 | 1432 | 01-02-2018 09:02 PM
 | 6729 | 09-06-2017 06:29 PM
08-14-2017
07:36 PM
1 Kudo
@Lukas Müller, try the approach below to create a DataFrame from data.json:

import json
import requests

r = requests.get("http://api.luftdaten.info/static/v1/data.json")
df = sqlContext.createDataFrame([json.loads(line) for line in r.iter_lines()])

Reference: https://stackoverflow.com/questions/32418829/using-pyspark-to-read-json-file-directly-from-a-website
07-31-2017
06:38 PM
1 Kudo
@Maya Tydykov, the thread below might help you: https://community.hortonworks.com/questions/23242/caused-by-comgoogleprotobufinvalidprotocolbufferex.html
07-28-2017
07:22 PM
1 Kudo
@PeiHe Zhang, can you please check the value of the yarn.application.classpath property? It should include all of the paths below:

<property>
  <name>yarn.application.classpath</name>
  <value>/etc/hadoop/conf/,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*</value>
</property>
07-17-2017
09:24 PM
7 Kudos
@Mateusz Grabowski, you should enable Dynamic Resource Allocation (DRA) in Spark to automatically increase or decrease an application's executors based on resource availability. You can enable DRA either in Spark itself or via Zeppelin's Livy interpreter:

1) Enable DRA for Spark2 as described here: https://community.hortonworks.com/content/supportkb/49510/how-to-enable-dynamic-resource-allocation-in-spark.html

2) Enable DRA via the Livy interpreter and run all Spark notebooks through Livy: https://zeppelin.apache.org/docs/0.6.1/interpreter/livy.html
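For reference, a minimal PySpark sketch (not taken from the linked article) that sets the standard Spark 2.x dynamic-allocation properties at session creation; the app name and executor counts are placeholders, and on YARN the external shuffle service must also be running on the NodeManagers:

from pyspark.sql import SparkSession

# Placeholder app name and executor bounds -- tune maxExecutors to your queue capacity.
spark = (SparkSession.builder
         .appName("dra-example")
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.shuffle.service.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "1")
         .config("spark.dynamicAllocation.initialExecutors", "2")
         .config("spark.dynamicAllocation.maxExecutors", "10")
         .getOrCreate())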
07-13-2017
09:23 PM
5 Kudos
@Sami Ahmad, it looks like the document used an older property name. You should look for the dfs.datanode.data.dir property instead of dfs.data.dirs (the old dfs.data.dir property was renamed to dfs.datanode.data.dir in Hadoop 2).
07-11-2017
06:33 PM
1 Kudo
@Jeff Stafford, you can change the value of SPARK_LOG_DIR in /etc/spark2/conf/spark-env.sh, for example:

export SPARK_LOG_DIR=/dev/null

Restart the Spark services after making the configuration change.
07-07-2017
06:17 PM
8 Kudos
@suyash soni, Zeppelin notebooks can be exported via the UI in JSON format only; exporting a notebook in an R extension is not supported. http://fedulov.website/2015/10/16/export-apache-zeppelin-notebooks/
07-05-2017
10:13 PM
1 Kudo
@Paramesh malla, is the testWrite.txt file still present on HDFS when you run the test code the second time? If yes, please delete /hdfs_nfs/hdfs_data/sampledata/testWrite.txt and rerun. HDFS only supports append, so if you intend to add data after the file has been created, open it in append mode:

# 'a' opens the file in append mode; HDFS files can only be appended to, not modified in place.
with open(filename, 'a') as f:
    f.write(text)
06-27-2017
06:24 PM
1 Kudo
@dhieru singh, in this case, you will need to validate each service manually. Typically, smoke tests perform the checks below:

* Check whether the service is up or not
* If it has a UI, check whether the UI page is accessible (a quick reachability sketch follows below)
* Run a simple sanity use case (for example, submit a sleep job in the case of Hadoop)

Example: follow the doc below to validate the health of Hadoop services manually. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_command-line-upgrade/content/run-hadoop-tests-24.html
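As an illustration only, a hypothetical Python sketch of the UI-reachability check mentioned above; the hostnames are placeholders, and 50070 and 8088 are the default HDP NameNode and ResourceManager UI ports:

import requests

# Placeholder hosts -- replace with your cluster's endpoints.
ui_endpoints = {
    "HDFS NameNode UI": "http://namenode.example.com:50070",
    "YARN ResourceManager UI": "http://resourcemanager.example.com:8088",
}

for name, url in ui_endpoints.items():
    try:
        # A 200 response is a basic indication that the UI is being served.
        status = requests.get(url, timeout=10).status_code
        print("%s -> HTTP %s" % (name, status))
    except requests.exceptions.RequestException as err:
        print("%s -> unreachable (%s)" % (name, err))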
06-26-2017
11:58 PM
8 Kudos
@dhieru singh, if this is an Ambari-installed cluster, you can run service checks as below to validate the cluster state. https://community.hortonworks.com/articles/11852/ambari-api-run-all-service-checks-bulk.html
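For a single service, the same Ambari REST endpoint the article relies on can be called from Python roughly as below; the Ambari host, cluster name, and credentials are placeholders, and 8080 is assumed to be the default Ambari port:

import requests

# Placeholders -- replace with your Ambari host, cluster name and credentials.
ambari_url = "http://ambari.example.com:8080"
cluster = "MyCluster"
auth = ("admin", "admin")

# Trigger an HDFS service check; other services follow the same <SERVICE>_SERVICE_CHECK pattern.
payload = {
    "RequestInfo": {"context": "HDFS Service Check", "command": "HDFS_SERVICE_CHECK"},
    "Requests/resource_filters": [{"service_name": "HDFS"}],
}

# Ambari requires the X-Requested-By header on POST requests.
resp = requests.post("%s/api/v1/clusters/%s/requests" % (ambari_url, cluster),
                     json=payload, auth=auth,
                     headers={"X-Requested-By": "ambari"})
print(resp.status_code, resp.text)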