Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

[FileNotFoundException issue] Running MapReduceIndexerTool with Kerberos setup

Contributor

Hello,

I am trying to index a CSV file after enabling Kerberos.

Please note that the MapReduceIndexerTool worked fine before enabling Kerberos, and I could index files then.

My settings are as follows. Please help with this issue.

Command:

 

HADOOP_OPTS="-Djava.security.auth.login.config=/etc/solr/conf/jass.conf" \
hadoop jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  -D 'mapred.child.java.opts=-Xmx2048m' \
  --log4j /usr/share/doc/search-1.0.0+cdh5.10.0+0/examples/solr-nrt/log4j.properties \
  --morphline-file /home/cloudera/workspace/solr_home/morphlines/csv_morphline.conf \
  --output-dir hdfs://quickstart.cloudera:8020/user/hdfs/test_output \
  --verbose \
  --go-live \
  --zk-host quickstart.cloudera:2181/solr \
  --collection csv_collection \
  hdfs://quickstart.cloudera:8020/user/hdfs/test_input/books.cs

 

jass.conf:

 

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/hadoop/conf/hdfs.keytab"
  storeKey=true
  useTicketCache=false
  debug=true
  principal="hdfs/quickstart.cloudera@CLOUDERA";
};
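
(As a quick sanity check, here is a minimal sketch assuming the standard MIT Kerberos client tools are installed; the keytab path and principal are taken from the configuration above:)

# List the entries in the keytab and confirm the expected principal is present
klist -kt /etc/hadoop/conf/hdfs.keytab

# Try to obtain a ticket directly from the keytab
kinit -kt /etc/hadoop/conf/hdfs.keytab hdfs/quickstart.cloudera@CLOUDERA
klist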

However, I am getting a FileNotFoundException, as shown below.

 

2017-05-30 05:14:36,949 WARN org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService: Could not process job files
java.io.FileNotFoundException: File does not exist: /user/hdfs/.staging/job_1495787732900_0011/job_1495787732900_0011.summary
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2007)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1977)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1890)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1281)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1266)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1254)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:305)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:263)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1585)
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:322)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:783)
	at org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler.buildJobIndexInfo(KilledHistoryService.java:223)
	at org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler.access$200(KilledHistoryService.java:85)
	at org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler$1.run(KilledHistoryService.java:133)
	at org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler$1.run(KilledHistoryService.java:125)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
	at org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler.run(KilledHistoryService.java:125)
	at java.util.TimerThread.mainLoop(Timer.java:555)
	at java.util.TimerThread.run(Timer.java:505)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/hdfs/.staging/job_1495787732900_0011/job_1495787732900_0011.summary
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2007)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1977)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1890)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)

	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1409)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy17.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:256)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1279)
	... 21 more

3 REPLIES

Champion

@anis016

 

I have a question on your command.

Can we execute the "hadoop jar <jar>" command and the environment variable setup (HADOOP_OPTS) as a single command? I have never tried that before, so I am not sure whether I am missing something here.

Usually an environment variable setup command is prefixed with the keyword 'export'. I don't know whether this will fix your issue, but you can try executing the two steps separately as shown below; it may help.
export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/solr/conf/jass.conf"

 

hadoop jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  -D 'mapred.child.java.opts=-Xmx2048m' \
  --log4j /usr/share/doc/search-1.0.0+cdh5.10.0+0/examples/solr-nrt/log4j.properties \
  --morphline-file /home/cloudera/workspace/solr_home/morphlines/csv_morphline.conf \
  --output-dir hdfs://quickstart.cloudera:8020/user/hdfs/test_output \
  --verbose \
  --go-live \
  --zk-host quickstart.cloudera:2181/solr \
  --collection csv_collection \
  hdfs://quickstart.cloudera:8020/user/hdfs/test_input/books.cs
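
(For what it's worth, POSIX shells do accept a variable assignment prefixed to a single command; the variable is then set only in that command's environment, not in the calling shell. A minimal illustration with a throwaway value:)

# The assignment applies only to the one command that follows it
HADOOP_OPTS="-Dfoo=bar" sh -c 'echo "$HADOOP_OPTS"'   # prints: -Dfoo=bar
echo "$HADOOP_OPTS"                                   # prints an empty line; not set in this shell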

 

Contributor

Hello @saranvisa

 

Thanks for your reply. I was following this link: https://www.cloudera.com/documentation/enterprise/latest/topics/search_using_kerberos.html

 

 

  • The MapReduceIndexerTool
    The MapReduceIndexerTool uses SolrJ to pass the JAAS configuration file. Using the MapReduceIndexerTool in a secure environment requires the use of the HADOOP_OPTS variable to specify the JAAS configuration file. For example, you might issue a command such as the following:
    HADOOP_OPTS="-Djava.security.auth.login.config=/home/user/jaas.conf" \
    hadoop jar MapReduceIndexerTool

     

    BTW, I have some confusion about the "jaas.conf" file. My understanding is that this "jaas.conf" is for the hdfs.keytab and looks like below:

     

     

    Client {
      com.sun.security.auth.module.Krb5LoginModule required
      useKeyTab=true
      keyTab="/etc/hadoop/conf/hdfs.keytab"
      storeKey=true
      useTicketCache=false
      debug=true
      principal="hdfs/quickstart.cloudera@CLOUDERA";
    };
    Is my understanding correct?

     

    -- Thanks

 

Contributor (Accepted Solution)

I was able to solve this problem. I was getting the concept wrong. Here is what I did (see the sketch after the list):

1. I created a keytab file for my current user, i.e. "cloudera".

2. The jaas.conf should use that keytab, not the hdfs keytab.

3. I had to add this user and the hdfs user to the YARN allowed system users through CM. I also added the solr user there.

4. kinit as the cloudera user.

5. Put the required files in the /user/cloudera directory in HDFS.

6. Run the job.
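
A minimal command sketch of steps 1, 4, and 5 above, assuming MIT Kerberos with kadmin access on the QuickStart VM; the keytab path and the input file name are illustrative, not taken from the thread:

# 1. Create a principal and extract a keytab for the "cloudera" user (hypothetical paths)
sudo kadmin.local -q "addprinc -randkey cloudera@CLOUDERA"
sudo kadmin.local -q "xst -k /home/cloudera/cloudera.keytab cloudera@CLOUDERA"

# 4. Obtain a ticket as the cloudera user from that keytab
kinit -kt /home/cloudera/cloudera.keytab cloudera@CLOUDERA

# 5. Stage the input files under /user/cloudera in HDFS
hdfs dfs -mkdir -p /user/cloudera/test_input
hdfs dfs -put books.csv /user/cloudera/test_input/

For step 2, the Client section of jaas.conf would then point at the new keytab, e.g. keyTab="/home/cloudera/cloudera.keytab" and principal="cloudera@CLOUDERA", with the other options unchanged.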