Created 11-26-2018 06:20 PM
I am trying to run the HDP tutorial (trucking example) on an HDP 2.6.5 cluster.
I was able to upload the CSV data files into HDFS. When I try to upload a new table from trucks.csv, the table preview works fine, but I get a "ServiceFormattedException" when I click the "Create" button, with the following stack trace in the Ambari server logs:
org.apache.ambari.view.utils.hdfs.HdfsApiException: HDFS020 Could not write file /user/admin/hive/jobs/hive-job-54-2018-11-25_10-49/logs
    at org.apache.ambari.view.utils.hdfs.HdfsUtil.putStringToFile(HdfsUtil.java:57)
    at org.apache.ambari.view.hive20.resources.jobs.viewJobs.JobControllerImpl.setupLogFile(JobControllerImpl.java:220)
    at org.apache.ambari.view.hive20.resources.jobs.viewJobs.JobControllerImpl.setupLogFileIfNotPresent(JobControllerImpl.java:189)
    at org.apache.ambari.view.hive20.resources.jobs.viewJobs.JobControllerImpl.afterCreation(JobControllerImpl.java:182)
    at org.apache.ambari.view.hive20.resources.jobs.viewJobs.JobResourceManager.create(JobResourceManager.java:56)
    at org.apache.ambari.view.hive20.resources.jobs.JobServiceInternal.createJob(JobServiceInternal.java:27)
    at org.apache.ambari.view.hive20.resources.browser.DDLProxy.createJob(DDLProxy.java:384)
    at org.apache.ambari.view.hive20.resources.browser.DDLProxy.createTable(DDLProxy.java:256)
    at org.apache.ambari.view.hive20.resources.browser.DDLService.createTable(DDLService.java:147)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    ...
Caused by: java.io.IOException: Unexpected HTTP response: code=504 != 201, op=CREATE, message=Gateway Timeout
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:467)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:114)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathOutputStreamRunner$1.close(WebHdfsFileSystem.java:950)
    at org.apache.ambari.view.utils.hdfs.HdfsUtil$1.run(HdfsUtil.java:51)
    at org.apache.ambari.view.utils.hdfs.HdfsUtil$1.run(HdfsUtil.java:46)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.ambari.view.utils.hdfs.HdfsApi.execute(HdfsApi.java:513)
    at org.apache.ambari.view.utils.hdfs.HdfsUtil.putStringToFile(HdfsUtil.java:46)
    ... 105 more
Caused by: java.io.IOException: Content-Type "text/html" is incompatible with "application/json" (parsed="text/html")
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.jsonParse(WebHdfsFileSystem.java:443)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:465)
    ... 114 more
Any ideas on what's causing the content type error? Why is the Ambari client not setting the content type correctly when calling the HDFS API?
Btw I have already added
hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*
so the exception is not caused by the Ambari user not being able to write to the HDFS volume.
Created 11-26-2018 10:13 PM
You also need to add the Ambari admin user as a proxy user (the same way as root), or ensure that admin itself has read/write access to that HDFS location.
The error comes down to the user that Hive executes as on HDFS not having read/write access to the files. That user could be hive (if impersonation is disabled) or the end user you are signed into Ambari as (if impersonation is enabled).
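As a rough sketch, the proxyuser entries for those users would look something like the following in HDFS > Custom core-site (the user names and wildcards here are assumptions based on this thread; tighten them for production, and restart HDFS after the change):
hadoop.proxyuser.admin.hosts=*
hadoop.proxyuser.admin.groups=*
hadoop.proxyuser.hive.hosts=*
hadoop.proxyuser.hive.groups=*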
Created 11-26-2018 11:20 PM
@rtheron actually both "hive" and "admin" (which was the end user used for signing into Ambari) were added as proxy users; the error is the same.
Created 11-26-2018 11:39 PM
Try writing to that HDFS folder as each of these users (with hdfs dfs -put, for example); whichever user Hive impersonates to access HDFS likely lacks permissions on that folder. To get around this I usually just run hdfs dfs -chmod 777 on the folder. You may not want permissions that open on the file/folder permanently, but it is a good way to confirm that the issue really is file permissions.
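A minimal sketch of that test, assuming an unsecured (non-Kerberos) cluster where HADOOP_USER_NAME switches the effective HDFS user, and using the job path from the stack trace (the test file name is just a placeholder):
# HADOOP_USER_NAME=admin hdfs dfs -put /tmp/test.txt /user/admin/hive/jobs/
# sudo -u hive hdfs dfs -put /tmp/test.txt /user/admin/hive/jobs/
# hdfs dfs -chmod -R 777 /user/admin/hive/jobs
If the first two commands succeed but the Hive View still fails, the problem is probably not HDFS permissions.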
When impersonation is disabled, the user that needs access to the folder is 'hive'; when it is enabled, it is the user logged in to Ambari. So check your impersonation settings for both the Hive View and Hive itself.
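For reference (an assumption based on standard HDP defaults, not something confirmed in this thread), the Hive-side impersonation toggle is usually the hive.server2.enable.doAs property in hive-site, which Ambari exposes in the Hive configs as "Run as end user instead of Hive user":
hive.server2.enable.doAs=true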
Created 11-27-2018 02:34 AM
@rtheron thanks much for the quick response, but are you sure it's a file permission issue? hdfs dfs commands work just fine. The strange thing is that Ambari is able to create the Hive job folder (e.g. /user/admin/hive/jobs/hive-job-75-2018-11-26_05-14) and create the correct "query.hql" DDL file in it, but is somehow not able to write the execution results to that very same folder.
Btw I can't seem to find the impersonation settings under Hive -> Configs -> Advanced. Do you know where they are? Thanks
Created 11-27-2018 03:43 AM
I saw the "could not write" error in your logs and figured it would be worth confirming; if the hive user or the end user can read/write to that location, it's probably not the problem.
There are some gateway-related errors in there as well; is this happening via the Knox gateway?
Created 11-27-2018 04:05 AM
@rtheron yes, I just realized the Knox gateway is on. I turned it off and restarted Hive and HDFS, but the error remains. Should I restart the Ambari server? I will need to do that in the office tomorrow and give it a try. Thanks again
Created 11-27-2018 06:26 AM
Hi @Eric Yuan ,
I see this error:
Caused by: java.io.IOException: Unexpected HTTP response: code=504 != 201, op=CREATE, message=Gateway Timeout
Please check whether you have any HTTP proxy / network proxy enabled at your end.
I suspect that the WebHDFS requests originating from the Hive View are actually passing through an HTTP proxy configured on your cluster. You may need to either make the requests bypass the proxy server or make the proxy work.
So please check the following:
1. Check the environment settings to find out whether any HTTP proxy is configured (look for 'proxy'):
# /var/lib/ambari-agent/ambari-sudo.sh su hdfs -l -s /bin/bash -c 'env'
2. See if you are able to make a WebHDFS call from a terminal on the Ambari server host, and check from the output whether the request is being passed via the proxy:
# curl -ivL -X GET "http://$ACTIVE_NAME_NODE:50070/webhdfs/v1/user/admin?op=GETHOMEDIRECTORY&user.name=admin"
3. You can also refer to the following doc on how to configure HTTP proxy settings for the Ambari server (and you can set an Ambari JVM property so that requests to your own cluster nodes are NOT passed via the proxy; see the sketch after this list). See: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.0/bk_ambari-administration/content/ch_setting...
-Dhttp.nonProxyHosts=<pipe|separated|list|of|hosts>
4. Or you can configure "no_proxy" globally at the "~/.bash_profile" or "/etc/profile" level to make sure that your internal cluster requests are not passed via the proxy:
no_proxy=".example.com"
export no_proxy
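A minimal sketch of the JVM-property route from item 3, assuming the default ambari-env.sh location and placeholder host names (substitute your own cluster hosts): add the option to AMBARI_JVM_ARGS in /var/lib/ambari-server/ambari-env.sh (shown here as an extra export line after the existing definition) and then restart Ambari:
export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Dhttp.nonProxyHosts=localhost|127.0.0.1|*.example.com"
# ambari-server restart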
Please see if this helps, and please accept the answer if it did.
Created 11-28-2018 05:22 PM
Thanks Akhil! I did forget to set nonProxyHosts in the ambari-env.sh file. Now everything works fine.