Member since: 09-15-2015
Posts: 457
Kudos Received: 507
Solutions: 90
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 15659 | 11-01-2016 08:16 AM |
 | 11080 | 11-01-2016 07:45 AM |
 | 8555 | 10-25-2016 09:50 AM |
 | 1917 | 10-21-2016 03:50 AM |
 | 3821 | 10-14-2016 03:12 PM |
11-24-2015
07:03 AM
2 Kudos
Ambari automatically overwrites these configurations when the services are restarted; however, you can set the JobHistory Server heap size directly through Ambari: select the MapReduce2 service, open the "Configs" tab, then the "Advanced" tab. The group "History Server" contains the configuration "jobhistory_heapsize".
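If you want to double-check the effect on the node, the value typically ends up in mapred-env; here is a minimal sketch (the exact template line and the example value are assumptions and depend on your Ambari/HDP version):
# /etc/hadoop/conf/mapred-env.sh (rendered by Ambari; sketch only)
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=900   # jobhistory_heapsize, in MB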
11-23-2015
09:35 PM
1 Kudo
@Brad Bukacek Jr You could assign a default queue to the user that is executing the ImportTsv statement. Which scheduler are you using, Capacity or Fair Scheduler? Here is the configuration for the Capacity Scheduler: yarn.scheduler.capacity.queue-mappings This configuration specifies the mapping of a user or group to a specific queue. You can map a single user or a list of users to queues.
Syntax: [u or g]:[name]:[queue_name][,next_mapping]*. Here, u or g indicates whether the mapping is for a user or a group (u for user, g for group). name is the user name or group name; to refer to the user who submitted the application, %user can be used. queue_name is the queue to which the application should be mapped; to use a queue with the same name as the user, specify %user, and to use a queue named after the user's primary group, specify %primary_group. Example: <property>
<name>yarn.scheduler.capacity.queue-mappings</name>
<value>u:user1:queue1,g:group1:queue2,u:%user:%user,u:user2:%primary_group</value>
<description>
Here, <user1> is mapped to <queue1> and <group1> is mapped to <queue2>;
u:%user:%user maps each user to a queue with the same name as the user,
and <user2> is mapped to the queue named after the user's primary group.
The mappings will be
evaluated from left to right, and the first valid mapping will be used.
</description>
</property>
11-23-2015
08:32 PM
5 Kudos
I recently ran into a situation where I had enabled HDFS HA and later had to change the value of dfs.nameservices. During HA setup I had set dfs.nameservices to "MyHorton", but a couple of hours later I realized I should have used "MyCluster" instead. This article explains how you can change the dfs.nameservices value after HDFS HA has already been enabled. Background: What is the purpose of dfs.nameservices?
It's the logical name of your HDFS nameservice. It's important to remember that several configuration parameters have a key that includes the actual value of dfs.nameservices, e.g. dfs.namenode.rpc-address.[nameservice id].nn1 Preparation:
Put your HDFS into safemode and back up the namespace (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin; dfsadmin -safemode enter; dfsadmin -saveNamespace); stop the NameNode service (a consolidated sketch of the commands is at the end of this post)
Back up the Hive Metastore (mysqldump hive > /tmp/mydir/backup_hive.sql) Change Configuration: You have to adjust the hdfs-site configuration. Change all configurations that contain the old nameservice ID to the new nameservice ID. In my case the new nameservice ID was "mycluster". fs.defaultFS=hdfs://mycluster
dfs.nameservices=mycluster
dfs.namenode.shared.edits.dir=qjournal://horton03.cloud.hortonworks.com:8485;horton02.cloud.hortonworks.com:8485;horton01.cloud.hortonworks.com:8485/mycluster
dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.namenode.rpc-address.mycluster.nn2=horton02.cloud.hortonworks.com:8020
dfs.ha.namenodes.mycluster=nn1,nn2
dfs.namenode.rpc-address.mycluster.nn1=horton01.cloud.hortonworks.com:8020
dfs.namenode.http-address.mycluster.nn1=horton01.cloud.hortonworks.com:50070
dfs.namenode.http-address.mycluster.nn2=horton02.cloud.hortonworks.com:50070
dfs.namenode.https-address.mycluster.nn1=horton01.cloud.hortonworks.com:50470
dfs.namenode.https-address.mycluster.nn2=horton02.cloud.hortonworks.com:50470
Note: You can remove the configurations that include the old nameservice ID (e.g. dfs.namenode.http-address.[old_nameservice_id].nn1). Reinit Journalnodes:
This is necessary because the shared edits directory includes the nameservice ID. Please see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hadoop-ha/content/ha-nn-deploy-nn-cluster.html Change Hive FSRoot:
It might be necessary to change the Hive metadata after the above configuration changes. Check whether changes are necessary (as the Hive user):
hive --service metatool -listFSRoot
If you see any table that references the old nameservice ID, use the following commands to switch to the new nameservice ID. First, use the Hive metatool to do a dry run (no actual change is made in this mode) of updating the table locations:
hive --service metatool -updateLocation hdfs://mycluster hdfs://myhorton -dryRun
If you are satisfied with the changes the metatool will make, run the command without the -dryRun option:
hive --service metatool -updateLocation hdfs://mycluster hdfs://myhorton
Additional notes:
If you are using HBase you have to adjust additional configurations.
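To recap, the preparation and JournalNode steps boil down to roughly the following sketch (run as the hdfs user; the backup path is only an example, and whether -initializeSharedEdits applies exactly like this depends on your setup, so follow the linked HA guide):
hdfs dfsadmin -safemode enter                  # put HDFS into safemode
hdfs dfsadmin -saveNamespace                   # checkpoint the namespace as a backup
mysqldump hive > /tmp/mydir/backup_hive.sql    # back up the Hive Metastore (MySQL-backed)
# ...stop the NameNodes and change the configs shown above...
hdfs namenode -initializeSharedEdits           # reinitialize the JournalNodes for the new shared edits dir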
11-23-2015
07:11 PM
Make sure this file => /etc/hbase/conf/hbase_client_jaas.conf is available on your RegionServer node. It is used to authenticate the RegionServer. The content should look like this: Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=false
useTicketCache=true;
};
Do you have Kerberos enabled? @rxu
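If Kerberos is enabled, a quick hedged check on the RegionServer host could look like this (the hbase service user is an assumption):
ls -l /etc/hbase/conf/hbase_client_jaas.conf   # confirm the JAAS file is in place
su - hbase -c "klist"                          # useTicketCache=true requires a valid ticket cache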
11-23-2015
06:45 PM
1 Kudo
I have an environment configured that is similar to yours (the Hadoop cluster uses realm XYC.COM, but users can come from XYC.COM, ABC.COM, or ZET.COM). Users that have a valid Kerberos ticket can use the Storm or Oozie UI, which are secured with SPNEGO. What Kerberos version is this? MIT KDC? Can you post your OS, Java, and HDP versions? Thanks. The error you are getting is related to the secret key version (KVNO = key version number) that is used to authenticate your user with the KDC and to obtain and encrypt the Kerberos tickets. A tag associated with encrypted data identifies which key was used for encryption when a long-lived key associated with a principal changes over time. It is used during the transition to a new key so that the party decrypting a message can tell whether the data was encrypted with the old or the new key (RFC 4120). The error occurs because the key version of your ticket is different from the one on the KDC server. This happens, for example, when the user changes their password, or when a new secret key is generated for the service principals and the keytab files still contain the old KVNO. For example:
1. The user gets a ticket from the KDC with kvno=1
2. The user changes the password => the KVNO is changed to kvno=2
3. The KVNO change is picked up by the server
4. The old user ticket is still valid because the user's machine was never restarted and the ticket cache was never cleared
5. The next access request to the server fails because the key version numbers differ
Possible solutions: regenerate the keytabs, or destroy the user ticket and purge the cache (a reboot should clear the cache); see the sketch below.
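A hedged troubleshooting sketch (the keytab path, principal, and realm are placeholders, not taken from your environment):
klist -kt /etc/security/keytabs/spnego.service.keytab   # KVNO stored in the keytab
kvno HTTP/myhost.example.com@XYC.COM                     # current KVNO known to the KDC
kdestroy                                                 # destroy the stale user ticket
kinit user@ABC.COM                                       # obtain a fresh ticket with the new key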
11-20-2015
05:49 AM
You should definitely talk to @nmaillard; he is developing a File-Notification Processor that is capable of doing that. I think it gets triggered when new files show up in HDFS (I am not sure about changes to existing files), and you have access to different file attributes.
11-19-2015
06:04 AM
I suspect this is a bug; I have seen an issue similar to yours in our internal Jira => javax.persistence.RollbackException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.4.2.v): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.BatchUpdateException: A truncation error was encountered trying to shrink VARCHAR '/bin/sh -c python /usr/hdp/share/hst/parallel-sh.py -n 'hst &' to length 255.
Error Code: 20000
I'd suggest you open a ticket with our support and you might want to mention the Jira ST-602, which is somewhat similar. @Paul Codding
11-18-2015
07:13 PM
1 Kudo
The put-method of HBase's Table class supports single and multiple Put elements, so you can either do mytable.put(new Put(...)) or mytable.put(List<Put>). For example: String myFamily = "f1";
String columnA = "c1";
String valPrefix = "blub";
int numRows = 500000;
int batchSize = 1000;
List<Put> puts = new ArrayList<Put>();
for (int row = 0; row < numRows; row++) {
    String value = valPrefix + Integer.toString(row);
    // create the put (here the row number serves as the row key)
    Put put = new Put(Bytes.toBytes(row));
    put.add(Bytes.toBytes(myFamily), Bytes.toBytes(columnA), Bytes.toBytes(value));
    // add it to the batch
    puts.add(put);
    // write the batch every batchSize puts
    if (puts.size() % batchSize == 0) {
        try {
            myTable.put(puts);
            myTable.flushCommits();
        } catch (Exception e) {
            e.printStackTrace();
        }
        puts.clear();
    }
}
// write whatever is left over after the loop
if (!puts.isEmpty()) {
    try {
        myTable.put(puts);
        myTable.flushCommits();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
You can also use the batch-method. The only difference between batch and put-batch is that the batch-method accepts other actions as well, for example Gets. https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html
void put(List<Put> puts) throws IOException: Puts some data in the table, in batch. This can be used for group commit, or for submitting user defined batches. The writeBuffer will be periodically inspected while the List is processed, so depending on the List size the writeBuffer may flush not at all, or more than once.
void batch(List<? extends Row> actions, Object[] results) throws IOException, InterruptedException: Method that does a batch call on Deletes, Gets, Puts, Increments and Appends. The ordering of execution of the actions is not defined. This means that if you do a Put and a Get in the same batch(java.util.List<? extends org.apache.hadoop.hbase.client.Row>, java.lang.Object[]) call, you are not necessarily guaranteed that the Get returns what the Put had put.
Make sure you check out the section about "Writing to HBase" in the HBase book. It has some interesting information about batch writing/performance, e.g. turning off the WAL (Write Ahead Log). Regarding the number of RPC calls, have you considered the bulkloading capabilities of HBase, i.e. saving files in HDFS and afterwards using bulk import to get the data into HBase? A hedged sketch of that route is below.
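For illustration only, the bulk-load route could look roughly like this (the table and column names come from the example above; the input path, TSV format, and output directory are assumptions):
# Generate HFiles with ImportTsv instead of issuing puts
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,f1:c1 \
  -Dimporttsv.bulk.output=/tmp/hfiles \
  mytable /data/input.tsv
# Load the generated HFiles into the table
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles mytable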
11-18-2015
02:53 PM
You're right, that might be a bug and should be raised with the engineering/support team.
11-18-2015
02:45 PM
@Olivier Renault I usually do 🙂