Member since: 09-15-2015
Posts: 457
Kudos Received: 507
Solutions: 90
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 15659 | 11-01-2016 08:16 AM |
 | 11080 | 11-01-2016 07:45 AM |
 | 8555 | 10-25-2016 09:50 AM |
 | 1917 | 10-21-2016 03:50 AM |
 | 3821 | 10-14-2016 03:12 PM |
11-24-2015
07:03 AM
2 Kudos
Ambari automatically overwrites these configurations when the services are restarted; however, you can set the JobHistory Server heap size directly through Ambari: select the MapReduce2 service, open the "Configs" tab, then the "Advanced" tab. The group "History Server" contains the configuration "jobhistory_heapsize".
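If you want to double-check the effect on the node, the value typically ends up in mapred-env; here is a minimal sketch (the exact template line and the example value are assumptions and depend on your Ambari/HDP version):
# /etc/hadoop/conf/mapred-env.sh (rendered by Ambari; sketch only)
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=900   # jobhistory_heapsize, in MB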
11-23-2015
09:35 PM
1 Kudo
@Brad Bukacek Jr You could assign a default queue to the user that is executing the ImportTsv statement. Which scheduler are you using, Capacity or Fair Scheduler? Here is the configuration for the Capacity Scheduler: yarn.scheduler.capacity.queue-mappings This configuration specifies the mapping of a user or group to a specific queue. You can map a single user or a list of users to queues.
Syntax: [u or g]:[name]:[queue_name][,next_mapping]*. Here, u or g indicates whether the mapping is for a user or a group (u for user, g for group). name is the user name or group name; to refer to the user who submitted the application, %user can be used. queue_name is the queue to which the application should be mapped; to use a queue with the same name as the user, specify %user, and to use a queue named after the user's primary group, specify %primary_group. Example: <property>
<name>yarn.scheduler.capacity.queue-mappings</name>
<value>u:user1:queue1,g:group1:queue2,u:%user:%user,u:user2:%primary_group</value>
<description>
Here, <user1> is mapped to <queue1> and <group1> is mapped to <queue2>;
u:%user:%user maps each user to a queue with the same name as the user,
and <user2> is mapped to the queue named after the user's primary group.
The mappings will be
evaluated from left to right, and the first valid mapping will be used.
</description>
</property>
11-23-2015
08:32 PM
5 Kudos
I recently ran into a situation where I had enabled HDFS HA and later had to change the value of dfs.nameservices. During HA setup I had set dfs.nameservices to "MyHorton", but a couple of hours later I realized I should have used "MyCluster" instead. This article explains how you can change the dfs.nameservices value after HDFS HA has already been enabled. Background: What is the purpose of dfs.nameservices?
It's the logical name of your HDFS nameservice. It's important to remember that several configuration parameters have a key that includes the actual value of dfs.nameservices, e.g. dfs.namenode.rpc-address.[nameservice id].nn1 Preparation:
Put your HDFS into safemode and back up the namespace (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin; dfsadmin -safemode enter; dfsadmin -saveNamespace); stop the NameNode service (a consolidated sketch of the commands is at the end of this post)
Back up the Hive Metastore (mysqldump hive > /tmp/mydir/backup_hive.sql) Change Configuration: You have to adjust the hdfs-site configuration. Change all configurations that contain the old nameservice ID to the new nameservice ID. In my case the new nameservice ID was "mycluster". fs.defaultFS=hdfs://mycluster
dfs.nameservices=mycluster
dfs.namenode.shared.edits.dir=qjournal://horton03.cloud.hortonworks.com:8485;horton02.cloud.hortonworks.com:8485;horton01.cloud.hortonworks.com:8485/mycluster
dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.namenode.rpc-address.mycluster.nn2=horton02.cloud.hortonworks.com:8020
dfs.ha.namenodes.mycluster=nn1,nn2
dfs.namenode.rpc-address.mycluster.nn1=horton01.cloud.hortonworks.com:8020
dfs.namenode.http-address.mycluster.nn1=horton01.cloud.hortonworks.com:50070
dfs.namenode.http-address.mycluster.nn2=horton02.cloud.hortonworks.com:50070
dfs.namenode.https-address.mycluster.nn1=horton01.cloud.hortonworks.com:50470
dfs.namenode.https-address.mycluster.nn2=horton02.cloud.hortonworks.com:50470
Note: You can remove the configurations that include the old nameservice ID (e.g. dfs.namenode.http-address.[old_nameservice_id].nn1). Reinit Journalnodes:
This is necessary because the shared edits directory includes the nameservice ID. Please see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hadoop-ha/content/ha-nn-deploy-nn-cluster.html Change Hive FSRoot:
It might be necessary to change the Hive metadata after the above configuration changes. Check whether changes are necessary (as the Hive user):
hive --service metatool -listFSRoot
If you see any table that references the old nameservice ID, use the following commands to switch to the new nameservice ID. First, use the Hive metatool to do a dry run (no actual change is made in this mode) of updating the table locations:
hive --service metatool -updateLocation hdfs://mycluster hdfs://myhorton -dryRun
If you are satisfied with the changes the metatool will make, run the command without the -dryRun option:
hive --service metatool -updateLocation hdfs://mycluster hdfs://myhorton
Additional notes:
If you are using HBase you have to adjust additional configurations.
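To recap, the preparation and JournalNode steps boil down to roughly the following sketch (run as the hdfs user; the backup path is only an example, and whether -initializeSharedEdits applies exactly like this depends on your setup, so follow the linked HA guide):
hdfs dfsadmin -safemode enter                  # put HDFS into safemode
hdfs dfsadmin -saveNamespace                   # checkpoint the namespace as a backup
mysqldump hive > /tmp/mydir/backup_hive.sql    # back up the Hive Metastore (MySQL-backed)
# ...stop the NameNodes and change the configs shown above...
hdfs namenode -initializeSharedEdits           # reinitialize the JournalNodes for the new shared edits dir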
11-23-2015
07:11 PM
Make sure this file => /etc/hbase/conf/hbase_client_jaas.conf is available on your RegionServer node. It is used to authenticate the RegionServer. The content should look like this: Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=false
useTicketCache=true;
};
Do you have Kerberos enabled? @rxu
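If Kerberos is enabled, a quick hedged check on the RegionServer host could look like this (the hbase service user is an assumption):
ls -l /etc/hbase/conf/hbase_client_jaas.conf   # confirm the JAAS file is in place
su - hbase -c "klist"                          # useTicketCache=true requires a valid ticket cache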
11-23-2015
06:45 PM
1 Kudo
I have an environment configured that is similar to yours (the Hadoop cluster uses realm XYC.COM, but users can come from XYC.COM, ABC.COM, or ZET.COM). Users that have a valid Kerberos ticket can use the Storm or Oozie UI, which are secured with SPNEGO. What Kerberos version is this? MIT KDC? Can you post your OS, Java, and HDP versions? Thanks. The error you are getting is related to the secret key version (KVNO = key version number) that is used to authenticate your user with the KDC and to obtain and encrypt the Kerberos tickets. A tag associated with encrypted data identifies which key was used for encryption when a long-lived key associated with a principal changes over time. It is used during the transition to a new key so that the party decrypting a message can tell whether the data was encrypted with the old or the new key (RFC 4120). The error occurs because the key version of your ticket is different from the one on the KDC server. This happens, for example, when the user changes their password, or when a new secret key is generated for the service principals and the keytab files still contain the old KVNO. For example:
1. The user gets a ticket from the KDC with kvno=1
2. The user changes the password => the KVNO is changed to kvno=2
3. The KVNO change is picked up by the server
4. The old user ticket is still valid because the user's machine was never restarted and the ticket cache was never cleared
5. The next access request to the server fails because the key version numbers differ
Possible solutions: regenerate the keytabs, or destroy the user ticket and purge the cache (a reboot should clear the cache); see the sketch below.
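A hedged troubleshooting sketch (the keytab path, principal, and realm are placeholders, not taken from your environment):
klist -kt /etc/security/keytabs/spnego.service.keytab   # KVNO stored in the keytab
kvno HTTP/myhost.example.com@XYC.COM                     # current KVNO known to the KDC
kdestroy                                                 # destroy the stale user ticket
kinit user@ABC.COM                                       # obtain a fresh ticket with the new key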
11-20-2015
05:49 AM
You should definitely talk to @nmaillard; he is developing a File-Notification Processor that is capable of doing that. I think it gets triggered when new files show up in HDFS (I am not sure about changes to existing files), and you have access to different file attributes.
11-19-2015
06:04 AM
I suspect this is a bug; I have seen an issue similar to yours in our internal Jira => javax.persistence.RollbackException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.4.2.v): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.BatchUpdateException: A truncation error was encountered trying to shrink VARCHAR '/bin/sh -c python /usr/hdp/share/hst/parallel-sh.py -n 'hst &' to length 255.
Error Code: 20000
I'd suggest you open a ticket with our support and you might want to mention the Jira ST-602, which is somewhat similar. @Paul Codding
11-18-2015
07:13 PM
1 Kudo
The put-method of HBase's Table class supports single and multiple Put elements, so you can either do mytable.put(new Put(...)) or mytable.put(List<Put>). For example: String myFamily = "f1";
String columnA = "c1";
String valPrefix = "blub";
int numRows = 500000;
int batchSize = 1000;
List<Put> puts = new ArrayList<Put>();
for (int row = 0; row < numRows; row++) {
    String value = valPrefix + Integer.toString(row);
    // create the put (here the row number serves as the row key)
    Put put = new Put(Bytes.toBytes(row));
    put.add(Bytes.toBytes(myFamily), Bytes.toBytes(columnA), Bytes.toBytes(value));
    // add it to the batch
    puts.add(put);
    // write the batch every batchSize puts
    if (puts.size() % batchSize == 0) {
        try {
            myTable.put(puts);
            myTable.flushCommits();
        } catch (Exception e) {
            e.printStackTrace();
        }
        puts.clear();
    }
}
// write whatever is left over after the loop
if (!puts.isEmpty()) {
    try {
        myTable.put(puts);
        myTable.flushCommits();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
You can also use the batch-method. The only difference between batch and put-batch is that the batch-method accepts other actions as well, for example Gets. https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html
void put(List<Put> puts) throws IOException: Puts some data in the table, in batch. This can be used for group commit, or for submitting user defined batches. The writeBuffer will be periodically inspected while the List is processed, so depending on the List size the writeBuffer may flush not at all, or more than once.
void batch(List<? extends Row> actions, Object[] results) throws IOException, InterruptedException: Method that does a batch call on Deletes, Gets, Puts, Increments and Appends. The ordering of execution of the actions is not defined. This means that if you do a Put and a Get in the same batch(java.util.List<? extends org.apache.hadoop.hbase.client.Row>, java.lang.Object[]) call, you are not necessarily guaranteed that the Get returns what the Put had put.
Make sure you check out the section about "Writing to HBase" in the HBase book. It has some interesting information about batch writing/performance, e.g. turning off the WAL (Write Ahead Log). Regarding the number of RPC calls, have you considered the bulkloading capabilities of HBase, i.e. saving files in HDFS and afterwards using bulk import to get the data into HBase? A hedged sketch of that route is below.
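For illustration only, the bulk-load route could look roughly like this (the table and column names come from the example above; the input path, TSV format, and output directory are assumptions):
# Generate HFiles with ImportTsv instead of issuing puts
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,f1:c1 \
  -Dimporttsv.bulk.output=/tmp/hfiles \
  mytable /data/input.tsv
# Load the generated HFiles into the table
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles mytable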
11-18-2015
02:53 PM
You're right, that might be a bug and should be raised with the engineering/support team.
11-18-2015
02:45 PM
@Olivier Renault I usually do 🙂