Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1543 | 07-09-2019 12:53 AM |
| | 9292 | 06-23-2019 08:37 PM |
| | 8050 | 06-18-2019 11:28 PM |
| | 8676 | 05-23-2019 08:46 PM |
| | 3473 | 05-20-2019 01:14 AM |
07-31-2019
04:26 AM
The 'done' keyword is missing in your script. A shell loop body must be closed with 'done' at the end.
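For example, a minimal loop with the closing keyword in place (the loop body here is only a placeholder):

for f in /tmp/input/*.csv; do
  echo "processing $f"
done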
07-09-2019
12:53 AM
2 Kudos
Yes, that is correct, and the motivations and usage steps are also covered here: https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_s3guard.html

Note: on your point of 'load data from S3 into HDFS', it is better stated as simply 'read data from S3', with HDFS used as transient storage where/when required. There does not need to be a 'download X GiB of data from S3 to HDFS first, only then begin jobs' step, as distributed jobs can read from S3 via s3a:// URLs in the same way they read from HDFS via hdfs:// URLs, as illustrated below.
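For illustration, a minimal sketch assuming the S3A connector is already configured with credentials, and with 'my-bucket' as a placeholder bucket name: the same CLI tools and jobs that accept hdfs:// paths can be pointed at s3a:// paths directly.

hadoop fs -ls s3a://my-bucket/input/
# Optional: copy into HDFS only when a transient local copy is actually needed
hadoop distcp s3a://my-bucket/input/ hdfs:///tmp/staged-input/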
06-23-2019
08:37 PM
1 Kudo
This looks like a case of edit logs getting reordered. As @bgooley noted, it is similar to HDFS-12369, where an OP_CLOSE appears after an OP_DELETE, causing the file to be absent when the edits are replayed.

The simplest fix, depending on whether this is the only file affected by the reordering in your edit logs, would be to run the NameNode manually in edits-recovery mode and "skip" this edit when it catches the error. The rest of the edits should apply normally and let you start up your NameNode. The recovery mode of the NameNode is detailed at https://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/

If you're using CM, you'll need to use the NameNode's most recently generated configuration directory under /var/run/cloudera-scm-agent/process/ on the NameNode host as the HADOOP_CONF_DIR, while logged in as the 'hdfs' user, before invoking the manual NameNode startup command, as sketched below. Once you've followed the prompts and the NameNode appears to start up, quit/kill it and restart it from Cloudera Manager normally.

If you have a Support subscription, I'd recommend filing a case for this, as the process could get more involved depending on how widespread the issue is.
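For reference, a sketch of the manual recovery invocation on a CM-managed cluster (the process directory below is a placeholder; pick the most recent NAMENODE directory on the NameNode host, run as the 'hdfs' user, and follow the interactive prompts to skip the offending edit):

export HADOOP_CONF_DIR=/var/run/cloudera-scm-agent/process/<most-recent-NAMENODE-dir>
hdfs namenode -recover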
06-06-2019
11:09 PM
I had added the above values and that was causing HTTPS to shut down. After deleting those values, it started and is working fine now. Thanks @Harsh J for your reply.
06-05-2019
03:01 AM
curl -X PUT -L --anyauth -u : -b cookie.jar \
  "http://httpfs_ip:14000/webhdfs/v1/user/file.csv?op=CREATE&data=true&user.name=hdfs" \
  --header "Content-Type:application/octet-stream" \
  --header "Transfer-Encoding:chunked" \
  -T "file.csv"

Just replace httpfs_ip and file.csv with your HttpFS host and the file you want to upload.
06-03-2019
07:33 PM
Please follow the entire discussion above - the parameter is an advanced one and has no dedicated configuration field. You'll need to apply it via the safety valve, using the property name directly.

P.S. It is better etiquette to open a new topic than to bump ancient ones.
06-03-2019
12:27 AM
1 Kudo
Hello Harsh, Thank you for the help on this. I was able to identify some information that helped here. Will come back in case I need further help. Will accept your reply as the Solution. 🙂 Thanks, snm1523
05-27-2019
08:47 AM
In a mapper job, we can do the following as well. Basically it follows the same pattern as the CF renaming. (Imports, and the setup() override that initializes the fields below from the job configuration, are omitted for brevity.)

public class RowKeyRenameImporter extends TableMapper<ImmutableBytesWritable, Mutation> {
  private static final Log LOG = LogFactory.getLog(RowKeyRenameImporter.class);
  public final static String WAL_DURABILITY = "import.wal.durability";
  public final static String ROWKEY_RENAME_IMPL = "row.key.rename";
  // Expected to be initialized in setup() from the job Configuration (not shown here)
  private List<UUID> clusterIds;
  private Durability durability;
  private RowKeyRename rowkeyRenameImpl;

  /**
   * @param row The current table row key.
   * @param value The columns.
   * @param context The current context.
   * @throws IOException When something is broken with the data.
   */
  @Override
  public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException {
    try {
      writeResult(row, value, context);
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
  }

  private void writeResult(ImmutableBytesWritable key, Result result, Context context)
      throws IOException, InterruptedException {
    Put put = null;
    if (LOG.isTraceEnabled()) {
      LOG.trace("Considering the row." + Bytes.toString(key.get(), key.getOffset(), key.getLength()));
    }
    processKV(key, result, context, put);
  }

  protected void processKV(ImmutableBytesWritable key, Result result, Context context, Put put)
      throws IOException, InterruptedException {
    LOG.info("Renaming the row " + key.toString());
    ImmutableBytesWritable renameRowKey = rowkeyRenameImpl.rowKeyRename(key);
    for (Cell kv : result.rawCells()) {
      if (put == null) {
        put = new Put(renameRowKey.get());
      }
      Cell renamedKV = convertKv(kv, renameRowKey);
      addPutToKv(put, renamedKV);
    }
    // Write the assembled Put once, keyed by the renamed row key
    if (put != null) {
      if (durability != null) {
        put.setDurability(durability);
      }
      put.setClusterIds(clusterIds);
      context.write(renameRowKey, put);
    }
  }

  // helper: create a new KeyValue based on renaming of the row key
  private static Cell convertKv(Cell kv, ImmutableBytesWritable renameRowKey) {
    byte[] newCfName = CellUtil.cloneFamily(kv);
    kv = new KeyValue(renameRowKey.get(), // row buffer
        renameRowKey.getOffset(), // row offset
        renameRowKey.getLength(), // row length
        newCfName, // CF buffer
        0, // CF offset
        kv.getFamilyLength(), // CF length
        kv.getQualifierArray(), // qualifier buffer
        kv.getQualifierOffset(), // qualifier offset
        kv.getQualifierLength(), // qualifier length
        kv.getTimestamp(), // timestamp
        KeyValue.Type.codeToType(kv.getTypeByte()), // KV Type
        kv.getValueArray(), // value buffer
        kv.getValueOffset(), // value offset
        kv.getValueLength()); // value length
    return kv;
  }

  protected void addPutToKv(Put put, Cell kv) throws IOException {
    put.add(kv);
  }
}
05-23-2019
08:46 PM
1 Kudo
For HBase MOBs, this can serve as a good starting point, as most of the changes are administrative and the writer API remains the same as for regular cells: https://www.cloudera.com/documentation/enterprise/latest/topics/admin_hbase_mob.html

For SequenceFiles, a good short snippet can be found here: https://github.com/sakserv/sequencefile-examples/blob/master/test/main/java/com/github/sakserv/sequencefile/SequenceFileTest.java#L65-L70 and for Parquet: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/example/ExampleParquetWriter.java

More general reading on the file formats: https://blog.cloudera.com/blog/2011/01/hadoop-io-sequence-map-set-array-bloommap-files/ and https://parquet.apache.org/documentation/latest/