
Create a Hive table with HDFS RBF location

Contributor

Hello,

I'm trying to insert data into a Hive table whose location points at HDFS Router-Based Federation (RBF):

Location: hdfs://router_host:8888/router/router.db/router_test_table

The cluster is Kerberized, and all components, including Hive and RBF, work as expected except for this specific use case.

The Hive table insert job fails with a Kerberos error when RBF is used as the table location:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: DestHost:destPort router_host:8888 , LocalHost:localPort datanode_host. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:639)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:563)
... 17 more
Caused by: java.io.IOException: DestHost:destPort router_host:8888 , LocalHost:localPort datanode_host. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:913)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:888)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1616)
at org.apache.hadoop.ipc.Client.call(Client.java:1558)
at org.apache.hadoop.ipc.Client.call(Client.java:1455)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
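For reference, this is roughly how I am reproducing it (the keytab, principal, connection string, and values below are placeholders, not the real ones):

# Obtain a Kerberos ticket for the job user (keytab/principal are placeholders)
kinit -kt /etc/security/keytabs/user.keytab user@EXAMPLE.COM

# Insert into the table whose LOCATION points at the HDFS Router
beeline -u "jdbc:hive2://hiveserver2_host:10000/router;principal=hive/_HOST@EXAMPLE.COM" \
  -e "INSERT INTO router_test_table VALUES (1, 'test');"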

3 REPLIES

Expert Contributor

Hi @Hadoop16,

This stack trace usually happens when there is an inconsistency between the JDK versions in use.

Check whether HDFS and Hive are running on the same JDK version.

You can also try exporting JAVA_HOME explicitly; see the sketch below.
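For example, to compare the JDK on the HDFS and Hive hosts (the paths below are just examples):

# Print the JDK version and resolve which installation is actually on the PATH
java -version
readlink -f "$(which java)"

# Show what the services currently inherit
echo $JAVA_HOME

# If the hosts differ, export JAVA_HOME explicitly (example path) and restart the affected services
export JAVA_HOME=/usr/java/jdk1.8.0_232
export PATH=$JAVA_HOME/bin:$PATH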

 

Reference:

https://community.cloudera.com/t5/Internal/ERROR-quot-Failed-on-local-exception-java-io-IOException-...

Expert Contributor

@Hadoop16 FYI

This error occurs because of a token delegation gap between Hive and the HDFS Router.
In a Kerberized cluster, when Hive (running on a DataNode/compute node via Tez or MapReduce) attempts to write to HDFS, it needs a delegation token. When you use an HDFS Router address, Hive must be explicitly told to obtain a token for the Router's service principal, which may differ from that of the backend NameNodes.
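As a quick sanity check (keytab and principal below are placeholders), confirm that direct Kerberos authentication to the Router itself works from an edge node; this isolates the problem to the delegation tokens handed to the distributed tasks:

# With a fresh ticket, talk to the Router directly; no delegation token is involved here
kinit -kt /etc/security/keytabs/user.keytab user@EXAMPLE.COM
hdfs dfs -ls hdfs://router_host:8888/router/router.db/

If this works interactively but the Tez/MapReduce tasks still fail, the missing piece is the delegation token described below.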
 
The Root Cause
The error Client cannot authenticate via:[TOKEN, KERBEROS] at the FileSinkOperator stage indicates that the tasks running on your worker nodes do not have a valid token to "speak" to the Router at router_host:8888.
When Hive plans the job, it usually fetches tokens for the default filesystem. If your fs.defaultFS is set to a regular NameNode but your table location is an RBF address, Hive might not be fetching the secondary token required for the Router.
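You can confirm what the default filesystem resolves to on a worker or gateway node, for example:

# Print the client-side default filesystem
hdfs getconf -confKey fs.defaultFS

If this prints a backend nameservice rather than the Router address, the token for the Router URI has to be requested explicitly, as shown in the fix below.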
 
The Fix: Configure Token Requirements
You need to ensure Hive and the underlying MapReduce/Tez framework know to fetch tokens for the Router's URI.
 
1. Add the Router URI to Hive's Token List
In your Hive session (or globally in hive-site.xml), you must define the Router as a "known" filesystem that requires tokens.
 
 
SET hive.metastore.token.signature=hdfs://router_host:8888;
SET mapreduce.job.hdfs-servers=hdfs://router_host:8888,hdfs://nameservice-backend;
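The same settings can also be passed per job from the command line (the connection string is an example, and this assumes neither property is on hive.conf.restricted.list):

beeline -u "jdbc:hive2://hiveserver2_host:10000/router;principal=hive/_HOST@EXAMPLE.COM" \
  --hiveconf mapreduce.job.hdfs-servers=hdfs://router_host:8888,hdfs://nameservice-backend \
  -e "INSERT INTO router_test_table VALUES (1, 'test');"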
 
2. Configure HDFS Client to "Trust" the Router for Tokens
 
In core-site.xml or hdfs-site.xml, you need to enable the Router to act as a proxy for the backend NameNodes so it can pass the tokens correctly.
 
<property>
  <name>dfs.federation.router.delegation.token.enable</name>
  <value>true</value>
</property>
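To double-check which value the local Hadoop configuration actually resolves for this key on the Router host (getconf prints the configured value, or reports the key as missing if it is not set):

# Check the value picked up from the local core-site.xml/hdfs-site.xml
hdfs getconf -confKey dfs.federation.router.delegation.token.enable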
 
Critical Kerberos Configuration
Because the Router is an intermediary, it must be allowed to impersonate the user (Hive) when talking to the backend. Ensure your ProxyUser settings in core-site.xml include the Router's service principal.
Assuming your Router runs as the hdfs or router user:
 
<property>
  <name>hadoop.proxyuser.router.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.router.hosts</name>
  <value>*</value>
</property>
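If you change these proxyuser entries on a running cluster, the NameNodes can usually pick them up without a restart (the Router itself may still need a restart to reload its own configuration):

# Push the updated proxyuser/superuser group mappings to the active NameNode
hdfs dfsadmin -refreshSuperUserGroupsConfiguration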
 
Diagnostic Verification
To confirm whether the token is missing, run these commands from the datanode_host mentioned in your error log, as the same user that runs the Hive job:
 
# Check if you can manually get a token for the router
hdfs fetchdt --renewer hdfs hdfs://router_host:8888 router.token
 
# Check the contents of your current credentials cache
klist -f
 
If fetchdt fails, the issue is with the Router's ability to issue tokens. If it succeeds but Hive still fails, the issue is with Hive's job submission not including the Router URI in the mapreduce.job.hdfs-servers list.
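If fetchdt succeeds, it can also help to print the token and confirm that its service field actually points at the Router address rather than a backend nameservice:

# Inspect the token written to router.token
hdfs fetchdt --print router.token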