Member since: 10-24-2015
Posts: 171
Kudos Received: 379
Solutions: 23
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1617 | 06-26-2018 11:35 PM |
 | 2977 | 06-12-2018 09:19 PM |
 | 1977 | 02-01-2018 08:55 PM |
 | 856 | 01-02-2018 09:02 PM |
 | 4903 | 09-06-2017 06:29 PM |
09-28-2017
06:30 PM
1 Kudo
@Theyaa Matti, if you are running a MapReduce job, you can set mapreduce.job.queuename to queue1 in mapred-site.xml. This way, all MapReduce applications will be launched in queue1 by default.
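It is also possible to override the queue for a single job at submit time. A minimal sketch, assuming the usual HDP examples-jar path and a queue named queue1 (both placeholders):
# Submit one MapReduce job to queue1 without touching mapred-site.xml
# (jar path, queue name, and input/output dirs are assumptions; adjust for your cluster)
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
  wordcount -Dmapreduce.job.queuename=queue1 /tmp/wc_input /tmp/wc_output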
09-26-2017
07:01 PM
1 Kudo
It worked. Thanks.
09-26-2017
06:37 PM
1 Kudo
@Aditya Sirna, the /tmp/testa dir is present, and the livy user has permission to write to it. I received the output below while trying to run the WebHDFS REST API.
[root@xx user]# curl -i -X PUT "http://<namenode host>:50070/webhdfs/v1/tmp/testa/a.txt?user.name=livy&op=CREATE"
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Tue, 26 Sep 2017 17:33:17 GMT
Date: Tue, 26 Sep 2017 17:33:17 GMT
Pragma: no-cache
Expires: Tue, 26 Sep 2017 17:33:17 GMT
Date: Tue, 26 Sep 2017 17:33:17 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
Set-Cookie: hadoop.auth="u=livy&p=livy&t=simple&e=1506483197716&s=dRvADKPG0lrenLje4fmEEdgChFw="; Path=/; HttpOnly
Location: http://xxx:50075/webhdfs/v1/tmp/testa/a.txt?op=CREATE&user.name=livy&namenoderpcaddress=xxx:8020&createflag=&createparent=true&overwrite=false
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26.hwx)
[root@xxx user]# curl -i -T /tmp/a.txt "http://<namenode host>:50070/webhdfs/v1/tmp/testa/a.txt?op=CREATE&overwrite=false"
HTTP/1.1 100 Continue
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Tue, 26 Sep 2017 17:33:49 GMT
Date: Tue, 26 Sep 2017 17:33:49 GMT
Pragma: no-cache
Expires: Tue, 26 Sep 2017 17:33:49 GMT
Date: Tue, 26 Sep 2017 17:33:49 GMT
X-FRAME-OPTIONS: SAMEORIGIN
Location: http://xxx:50075/webhdfs/v1/tmp/testa/a.txt?op=CREATE&namenoderpcaddress=xx:8020&createflag=&createparent=true&overwrite=false
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26.hwx)
09-26-2017
05:50 PM
1 Kudo
I'm looking for a WebHDFS REST API example to upload a file to HDFS. I tried the calls below but could not upload a file to HDFS.
curl -i -X PUT "http://<namenode host>:50070/webhdfs/v1/tmp/testa/a.txt?user.name=livy&op=CREATE"
curl -i -T /tmp/a.txt "http://<namenode host>:50070/webhdfs/v1/tmp/testa/a.txt?op=CREATE&overwrite=false"
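For reference, the WebHDFS CREATE operation is a two-step PUT: the NameNode replies with a 307 redirect, and the file data is then sent to the DataNode URL from the Location header. A minimal sketch with placeholder hostnames:
# Step 1: ask the NameNode where to write; no file data is sent yet
curl -i -X PUT "http://<namenode host>:50070/webhdfs/v1/tmp/testa/a.txt?user.name=livy&op=CREATE&overwrite=false"
# Step 2: take the full URL from the Location header of the 307 response
# (it points at a DataNode on port 50075) and PUT the file there
curl -i -X PUT -T /tmp/a.txt "<Location URL from step 1>"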
Tags:
- Hadoop Core
- HDFS
Labels:
- Apache Hadoop
09-15-2017
06:31 PM
1 Kudo
@Palash Dutta, the articles below show how to rotate HDFS logs and also zip them.
https://community.hortonworks.com/articles/50058/using-log4j-extras-how-to-rotate-as-well-as-zip-th.html
https://community.hortonworks.com/questions/78699/how-to-rotate-and-archive-hdfs-audit-log-file.html
09-11-2017
06:39 PM
1 Kudo
@Sebastien Chausson, you can refer to the document below to set up the Spark keystore/truststore. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.5/bk_spark-component-guide/content/spark-encryption.html
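As a rough illustration of what that setup involves (paths, alias, CN, and passwords below are placeholders, not values from the guide; follow the linked doc for the exact HDP steps):
# Create a keystore with a self-signed key, then build a truststore from its certificate
keytool -genkeypair -alias spark -keyalg RSA -keysize 2048 -dname "CN=<host fqdn>" \
  -keystore /etc/spark/conf/spark-keystore.jks -storepass <keystore-pass> -keypass <keystore-pass>
keytool -exportcert -alias spark -keystore /etc/spark/conf/spark-keystore.jks \
  -storepass <keystore-pass> -file /tmp/spark.cer
keytool -importcert -alias spark -file /tmp/spark.cer \
  -keystore /etc/spark/conf/spark-truststore.jks -storepass <truststore-pass> -noprompt
# Then point Spark at them in spark-defaults.conf (or via --conf), along the lines of:
#   spark.ssl.enabled=true
#   spark.ssl.keyStore=/etc/spark/conf/spark-keystore.jks
#   spark.ssl.keyStorePassword=<keystore-pass>
#   spark.ssl.trustStore=/etc/spark/conf/spark-truststore.jks
#   spark.ssl.trustStorePassword=<truststore-pass>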
09-06-2017
06:29 PM
11 Kudos
@Sanaz Janbakhsh, check the maximum-applications and maximum-am-resource-percent properties in your cluster, and try increasing the values of the properties below to allow more applications to run at a time.
yarn.scheduler.capacity.maximum-applications / yarn.scheduler.capacity.<queue-path>.maximum-applications: the maximum number of applications in the system that can be concurrently active, both running and pending. Limits on each queue are directly proportional to their queue capacities and user limits. This is a hard limit, and any application submitted once this limit is reached will be rejected. The default is 10000. It can be set for all queues with yarn.scheduler.capacity.maximum-applications and overridden on a per-queue basis with yarn.scheduler.capacity.<queue-path>.maximum-applications. An integer value is expected.
yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent: the maximum percentage of cluster resources that can be used to run ApplicationMasters; this controls the number of concurrently active applications. Limits on each queue are directly proportional to their queue capacities and user limits. Specified as a float, e.g. 0.5 = 50%. The default is 10%. It can be set for all queues with yarn.scheduler.capacity.maximum-am-resource-percent and overridden on a per-queue basis with yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent.
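If you want to check the effective limits before and after the change, a minimal sketch (the hostname and port assume a default, non-SSL ResourceManager; the refresh command applies when you edit capacity-scheduler.xml directly rather than through Ambari):
# Inspect the scheduler's current per-queue limits via the ResourceManager REST API
curl -s "http://<resourcemanager host>:8088/ws/v1/cluster/scheduler"
# After changing capacity-scheduler.xml, reload the queue configuration without a restart
yarn rmadmin -refreshQueues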
08-30-2017
11:51 PM
1 Kudo
@suresh krish, refer to the thread below; it returns a JSON response with the version of each component. https://community.hortonworks.com/questions/26581/rest-api-to-get-individual-hdp-component-version.html
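A hedged example of the kind of call involved (Ambari host, credentials, and cluster name are placeholders, and the exact endpoint used in the linked thread may differ):
# List the stack versions Ambari has registered for the cluster
curl -u admin:<password> -H "X-Requested-By: ambari" \
  "http://<ambari-server>:8080/api/v1/clusters/<cluster-name>/stack_versions?fields=*"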
08-30-2017
11:33 PM
1 Kudo
@kalyanasish chanda, this is a duplicate question. Can you please remove this post?
08-30-2017
11:31 PM
7 Kudos
@kalyanasish chanda, after making config changes, you can directly call the API to restart stale components. Find the post on how to restart stale components below. https://community.hortonworks.com/articles/73272/how-to-find-and-fix-ambari-stale-configuration-usi.html
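For illustration, a request of the kind described in that article (host, credentials, and cluster name are placeholders; check the article for the exact payload your Ambari version expects):
# Ask Ambari to restart every host component whose configuration is stale
curl -u admin:<password> -H "X-Requested-By: ambari" -X POST \
  -d '{"RequestInfo":{"command":"RESTART","context":"Restart stale components","operation_level":"host_component"},"Requests/resource_filters":[{"hosts_predicate":"HostRoles/stale_configs=true"}]}' \
  "http://<ambari-server>:8080/api/v1/clusters/<cluster-name>/requests"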
08-30-2017
11:26 PM
6 Kudos
@parag dharmadhikari, the permissions of the /tmp dir are not correct on HDFS. Typically the /tmp dir has 777 permissions. Run the command below on your cluster; it will resolve this permission-denied error.
hdfs dfs -chmod 777 /tmp
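To check before and after (run as the hdfs superuser; the 1777 variant with the sticky bit is a common alternative, but it is my addition rather than part of the answer above):
# Show the current permissions of /tmp itself
hdfs dfs -ls -d /tmp
# Open it up as suggested above
hdfs dfs -chmod 777 /tmp
# Optionally add the sticky bit so users cannot delete each other's files
hdfs dfs -chmod 1777 /tmp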
08-29-2017
08:09 PM
2 Kudos
@Robert Hryniewicz, DSX Local installation typically takes 2.5 to 3 hours. The installation time mainly depends on network speed and available resources. Make sure your cluster meets the prerequisites listed below. https://datascience.ibm.com/docs/content/local/requirements.html?linkInPage=true
08-14-2017
07:36 PM
1 Kudo
@Lukas Müller, try the approach below to create a DataFrame from data.json.
import json
import requests

r = requests.get("http://api.luftdaten.info/static/v1/data.json")
df = sqlContext.createDataFrame([json.loads(line) for line in r.iter_lines()])
Reference: https://stackoverflow.com/questions/32418829/using-pyspark-to-read-json-file-directly-from-a-website
08-11-2017
06:23 PM
@Theyaa Matti, if you want to move the logs to HTTPS, you will need to enable wire encryption in your cluster. Please follow the document below to enable wire encryption. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/ch_wire-webhdfs-mr-yarn.html
07-31-2017
06:38 PM
1 Kudo
@Maya Tydykov, the thread below might help you. https://community.hortonworks.com/questions/23242/caused-by-comgoogleprotobufinvalidprotocolbufferex.html
07-31-2017
06:33 PM
1 Kudo
@Fahad Sarwar, the Capacity Scheduler does not have placement rules that let you configure, for example, "if userX is running a job, place it in QueueX". Capacity Scheduler ACLs only check whether the application is allowed to run in a specific queue. To run a job in a specific queue, you need to set the queue config when submitting the app (if the queue config is not set, the application is launched in the "default" queue by default).
For MapReduce jobs, set -Dmapred.job.queue.name=<queue-name> or -Dmapreduce.job.queuename=<queue-name>:
yarn jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-examples-x.x.x-alpha-gphd-x.x.x.x.jar wordcount -D mapreduce.job.queuename=<queue-name> /tmp/test_input /user/fail_user/test_output
For Spark jobs, set --queue <queue-name>:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --queue <queue-name> /usr/hdp/2.x.x.x-xxxx/spark/lib/spark-examples-x.x.x.x.x.x.x-xxxx-hadoopx.x.x.x.x.x.x-xxxx.jar 10
07-28-2017
07:22 PM
1 Kudo
@PeiHe Zhang, can you please check the value of yarn.application.classpath? It should include all of the paths below.
<property>
<name>yarn.application.classpath</name>
<value>/etc/hadoop/conf/,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*</value>
</property>
07-18-2017
05:05 PM
1 Kudo
@Ivan Majnaric, it looks like your job is trying to find the jar on the local filesystem instead of HDFS: java.io.FileNotFoundException: File file:/spark-examples_2.11-2.1.0.2.6.0.3-8.jar does not exist. Please follow the article below to set up the Spark-Oozie action workflow. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html
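In practice that usually means placing the jar in the workflow's lib/ directory on HDFS, which Oozie adds to the action's classpath. A minimal sketch; the HDFS workflow path and the local jar location are placeholders:
# Copy the Spark examples jar from the local filesystem into the Oozie workflow's lib dir on HDFS
hdfs dfs -mkdir -p /user/<user>/oozie/spark-wf/lib
hdfs dfs -put /usr/hdp/current/spark2-client/examples/jars/spark-examples_2.11-2.1.0.2.6.0.3-8.jar /user/<user>/oozie/spark-wf/lib/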
07-17-2017
09:24 PM
7 Kudos
@Mateusz Grabowski, you should enable Dynamic Resource Allocation (DRA) in Spark to automatically increase/decrease an application's executors based on resource availability. You can choose to enable DRA in either Spark or Zeppelin.
1) Enable DRA for Spark2 as below: https://community.hortonworks.com/content/supportkb/49510/how-to-enable-dynamic-resource-allocation-in-spark.html
2) Enable DRA via the Livy interpreter and run all Spark notebooks through it: https://zeppelin.apache.org/docs/0.6.1/interpreter/livy.html
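A minimal sketch of the Spark-side settings involved (the executor bounds and jar path are illustrative, and the external shuffle service must also be enabled on the NodeManagers, as described in the linked KB):
# Submit with dynamic allocation so executors scale between the min and max as load changes
spark-submit --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --class org.apache.spark.examples.SparkPi \
  /usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 100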
07-13-2017
09:23 PM
5 Kudos
@Sami Ahmad, it looks like the document used an older property name. You should look for the dfs.datanode.data.dir property instead of dfs.data.dirs (the older dfs.data.dir property was renamed to dfs.datanode.data.dir in Hadoop 2).
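To confirm what the property resolves to on a given node (assuming a standard client configuration on that host):
# Print the effective DataNode data directories from the loaded configuration
hdfs getconf -confKey dfs.datanode.data.dir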
07-11-2017
06:33 PM
1 Kudo
@Jeff Stafford, you can change the value of SPARK_LOG_DIR in /etc/spark2/conf/spark-env.sh. Restart the Spark services after making the configuration change.
export SPARK_LOG_DIR=/dev/null
07-10-2017
07:47 PM
1 Kudo
@Tom Shiels, Zeppelin currently does not have this feature. It is definitely a good feature request; feel free to create an Apache Zeppelin JIRA to have it included in Zeppelin.
07-07-2017
06:17 PM
8 Kudos
@suyash soni, Zeppelin notebooks can be exported via the UI in JSON format only. Exporting a Zeppelin notebook with an R extension is not supported. http://fedulov.website/2015/10/16/export-apache-zeppelin-notebooks/
07-05-2017
10:13 PM
1 Kudo
@Paramesh malla, is the testWrite.txt file present on HDFS when running the test code the second time? If yes, please delete /hdfs_nfs/hdfs_data/sampledata/testWrite.txt and rerun. HDFS only supports append, so if you intend to add data after file creation, open the file with the 'a' option:
with open(filename, 'a') as f:
    f.write(text)
07-05-2017
06:17 PM
1 Kudo
@John Ross Tasipit, can you please check the Timeline Server logs to see why it is going down?
06-27-2017
06:24 PM
1 Kudo
@dhieru singh, in this case you will need to validate each service manually. Typically, smoke tests perform the checks below:
* Check whether the service is up or not
* If it has a UI, check whether the UI page is accessible
* Run a simple sanity use case (like submitting a sleep job in the case of Hadoop)
Example: follow the doc below to validate the health of Hadoop services manually. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_command-line-upgrade/content/run-hadoop-tests-24.html
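For instance, a few manual HDFS/YARN sanity checks of that kind (a sketch; the examples-jar path is the usual HDP location and is an assumption here):
# Basic health: list live DataNodes and NodeManagers
hdfs dfsadmin -report
yarn node -list
# Simple end-to-end sanity job (MapReduce pi estimation)
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 2 10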
06-26-2017
11:58 PM
8 Kudos
@dhieru singh, if this is an Ambari-installed cluster, you can run service checks as below to validate the cluster state. https://community.hortonworks.com/articles/11852/ambari-api-run-all-service-checks-bulk.html
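As an illustration, a single service check triggered through the Ambari API looks roughly like this (host, credentials, and cluster name are placeholders; the linked article shows how to run all of them in bulk):
# Trigger the HDFS service check on the cluster
curl -u admin:<password> -H "X-Requested-By: ambari" -X POST \
  -d '{"RequestInfo":{"context":"HDFS Service Check","command":"HDFS_SERVICE_CHECK"},"Requests/resource_filters":[{"service_name":"HDFS"}]}' \
  "http://<ambari-server>:8080/api/v1/clusters/<cluster-name>/requests"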
06-20-2017
04:59 AM
1 Kudo
@Colton Rodgers, good to know that it helped. Thanks for accepting the answer.
06-20-2017
04:59 AM
1 Kudo
@Colton Rodgers, can you please share the Livy interpreter logs?
06-20-2017
04:59 AM
8 Kudos
@Colton Rodgers, can you please confirm whether 'livy.impersonation.enabled' is set to true? There was a known Spark issue (SPARK-13478) where the Spark application hit a GSSException while starting the HiveContext when Livy impersonation was enabled. Due to this issue, Livy paragraphs were failing with "Can not start Spark". Workaround: remove hive-site.xml from /etc/spark/conf (on the Livy server host) and set livy.impersonation.enabled to false.
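A sketch of that workaround on the Livy server host (backing the file up instead of deleting it is my addition, and the livy.conf location is an assumption; on Ambari-managed clusters the impersonation flag is usually changed through the Ambari configs instead):
# Move hive-site.xml out of the Spark conf dir so HiveContext startup no longer triggers the GSSException
mv /etc/spark/conf/hive-site.xml /etc/spark/conf/hive-site.xml.bak
# Then set livy.impersonation.enabled = false (e.g. in livy.conf) and restart Livy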