Member since: 02-04-2016
Posts: 189
Kudos Received: 70
Solutions: 9

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3746 | 07-12-2018 01:58 PM |
|  | 7849 | 03-08-2018 10:44 AM |
|  | 3757 | 06-24-2017 11:18 AM |
|  | 23285 | 02-10-2017 04:54 PM |
|  | 2287 | 01-19-2017 01:41 PM |
08-29-2016
04:43 PM
Thanks Josh. I'm trying that now...
08-29-2016
04:36 PM
Our cluster recently had some issues related to network outages. When all the dust settled, HBase eventually "healed" itself, and almost everything is back to working well, with a couple of exceptions. In particular, we have one table where almost every query times out, which was never the case before. It's very small compared to most of our other tables, at around 400 million rows. (Clarification: we query via JDBC through Phoenix.)

When I look at the GUI tools (like http://<my server>:16010/master-status#storeStats), that table shows '1' under "offline regions" (it has 33 regions in total), while almost all the other tables show '0'. Can anyone help me troubleshoot this? I know there is a CLI tool for fixing HBase issues, and I'm wondering whether that offline region is the cause of these timeouts. If not, how can I figure out what is? Thanks!
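Is something along these lines the right direction? (A rough sketch, assuming the HBase 1.x hbck tool; my_table is a placeholder for the actual table name.)

```bash
# Rough sketch (assumes HBase 1.x hbck; my_table is a placeholder).
# Report inconsistencies for just this table, without changing anything:
hbase hbck -details my_table

# If it confirms an unassigned/offline region, I assume a targeted repair would be:
# hbase hbck -fixAssignments my_table
```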
Labels:
- Apache HBase
- Apache Phoenix
06-20-2016
07:33 PM
Chris, I added the fs.s3.buffer.dir property under "Custom hdfs-site" in the HDFS configs in Ambari, the same place where I added my AWS credentials (which are working), but it doesn't appear to be sticking. I pointed the property at "/home/s3_temp", which I created on the edge node where I'm testing the distcp tool, but I never see data in there, and my uploads continue to fail with the same errors as before. Any ideas? cc @Chris Nauroth
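Also, since the DistCp mappers run on the worker nodes rather than the edge node, I assume the buffer directory has to exist and be writable there too. This is roughly what I'd run to set that up (host names are placeholders, and it assumes passwordless ssh and sudo access):

```bash
# Rough sketch: create the buffer directory on every node that can run a mapper.
# worker1..worker3 are placeholders; assumes passwordless ssh and sudo access.
for host in worker1 worker2 worker3; do
  ssh "$host" 'sudo mkdir -p /home/s3_temp && sudo chmod 1777 /home/s3_temp && df -h /home/s3_temp'
done
```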
06-20-2016
06:32 PM
In your example, do you know how I would filter out everything but the count? So instead of printing "83/83 (100%) Done 'COUNT(1)' '87'", I would print only "87".
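My first instinct is a pipe along these lines; does that seem reasonable, or is there a cleaner way? (It assumes the count is the last field on the last line of output, and your_count_command stands in for the command from your example.)

```bash
# Rough sketch: print only the count (e.g. 87), assuming it is the last field
# on the last line of output and is wrapped in single quotes.
your_count_command 2>/dev/null | tail -n 1 | awk '{print $NF}' | tr -d "'"
```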
06-20-2016
06:10 PM
Perfect! Thanks!
06-20-2016
05:20 PM
I know how to accomplish this with Java, but I'm wondering if there is a simpler way using some sort of CLI client and piping. I just want to run a set query, and write the results to a text file. And I want to be able to execute this from a shell script. Any suggestions?
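To make it concrete, this is roughly what I'm hoping for (a sketch that assumes Phoenix's psql.py is on the PATH and prints SELECT results to stdout; the ZooKeeper host, query, and file paths are placeholders):

```bash
# Rough sketch: run a fixed query from a shell script and capture the results in a text file.
# zk-host, the query, and the file paths are placeholders.
echo "SELECT id, name FROM my_table LIMIT 10;" > /tmp/my_query.sql
psql.py zk-host /tmp/my_query.sql > /tmp/query_results.txt
```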
Labels:
- Apache Phoenix
06-16-2016
10:28 PM
1 Kudo
Fantastic. Thanks!
06-16-2016
06:52 PM
1 Kudo
Thanks @Chris Nauroth. After some experimentation, DistCp looks promising, but I'm seeing a very high failure rate on my mappers. Everything eventually succeeds, but usually only after several failed attempts, even for relatively small batches of data. The error stack is below: "No space available in any of the local directories." This is confusing, because the edge node (where I'm running the distcp command) and all the data nodes have plenty of disk space. I'm guessing it's a permissions-related issue with some temporary storage? Any ideas?
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:366)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:416)
at org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite(LocalDirAllocator.java:198)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.newBackupFile(NativeS3FileSystem.java:263)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>(NativeS3FileSystem.java:245)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:412)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:986)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:174)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:123)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
2016-06-16 14:48:29,841 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File copy failed: hdfs://surus/apps/hive/warehouse/fma2_v12_featuredata.db/eea_iperl/000000_0 --> s3n://sensus-device-analytics/HDFS_To_S3_Testing/distcp1/eea_iperl/000000_0
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:285)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:253)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://surus/apps/hive/warehouse/fma2_v12_featuredata.db/eea_iperl/000000_0 to s3n://sensus-device-analytics/HDFS_To_S3_Testing/distcp1/eea_iperl/000000_0
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:281)
... 10 more
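In case it helps narrow things down, this is the check I was planning to run next. I'm assuming the s3n client buffers each upload under the local directory named by fs.s3.buffer.dir before sending it to S3, so the error may mean that directory is missing, unwritable, or full on the node doing the copy:

```bash
# Rough sketch: find out which local directory the s3n client buffers into,
# then check its free space and permissions (the value may be a comma-separated list).
hdfs getconf -confKey fs.s3.buffer.dir
df -h /path/printed/by/previous/command   # placeholder: repeat for each listed path
ls -ld /path/printed/by/previous/command
```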
06-15-2016
09:24 PM
Thanks @Chris Nauroth. Does DistCp support any kind of configuration, for example, to limit the amount of bandwidth used?
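For example, something along these lines, if DistCp's -bandwidth (MB per second, per map) and -m (maximum simultaneous maps) options work the way I think they do (paths and numbers are placeholders):

```bash
# Rough sketch: cap aggregate throughput by limiting maps and per-map bandwidth.
# The source/target paths and the numbers are placeholders.
hadoop distcp -m 10 -bandwidth 20 \
  hdfs:///apps/hive/warehouse/db_name.db/table_name \
  s3n://my-bucket/backups/table_name
```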
06-15-2016
06:57 PM
Thanks @Chris Nauroth. I'll play with DistCp. One clarification (I'm brand new when it comes to S3, so this might be dumb): suppose I have Hive table X that is stored as compressed ORC files. To use DistCp, I suppose I would point at the raw data: /apps/hive/warehouse/db_name.db/table_name. But this will copy the compressed, ORC-formatted data, correct? Suppose someone wanted to use that data as a Hive table in EMR: could they also use DistCp to pull it to a cluster, then create a table on top of it with the same metadata, and just use the data? And is there a straightforward way to copy, say, CSV data from table X to S3?
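To make the two cases concrete, this is roughly what I'm imagining; the bucket, database, table, and connection details are placeholders, and the CSV route assumes a Hive version that accepts ROW FORMAT in INSERT OVERWRITE DIRECTORY:

```bash
# Rough sketch, case 1: copy the table's raw ORC files byte-for-byte to S3.
# Whoever pulls them down would recreate a table with the same ORC DDL over that location.
hadoop distcp /apps/hive/warehouse/db_name.db/table_name \
  s3n://my-bucket/exports/table_name_orc

# Rough sketch, case 2: have Hive rewrite the data as delimited text first,
# then copy the text files to S3. The HiveServer2 URL and paths are placeholders.
beeline -u "jdbc:hive2://hiveserver:10000" -e "
  INSERT OVERWRITE DIRECTORY '/tmp/table_name_csv'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  SELECT * FROM db_name.table_name;"
hadoop distcp /tmp/table_name_csv s3n://my-bucket/exports/table_name_csv
```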