Member since: 02-04-2016
Posts: 189
Kudos Received: 70
Solutions: 9

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3746 | 07-12-2018 01:58 PM |
|  | 7849 | 03-08-2018 10:44 AM |
|  | 3757 | 06-24-2017 11:18 AM |
|  | 23285 | 02-10-2017 04:54 PM |
|  | 2287 | 01-19-2017 01:41 PM |
08-29-2016
04:43 PM
Thanks Josh. I'm trying that now...
08-29-2016
04:36 PM
Our cluster recently had some issues related to network outages. When all the dust settled, HBase eventually "healed" itself, and almost everything is back to working well, with a couple of exceptions. In particular, we have one table where almost every query times out, which was never the case before. It's very small compared to most of our other tables, at around 400 million rows. (Clarification: we query via JDBC through Phoenix.)

When I look at the GUI tools (like http://<my server>:16010/master-status#storeStats), that table shows '1' under "offline regions" (it has 33 regions in total), while almost all the other tables show '0'. Can anyone help me troubleshoot this? I know there is a CLI tool for fixing HBase issues, and I'm wondering whether that offline region is the cause of these timeouts. If not, how can I figure out what is? Thanks!
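Is something along these lines the right direction? (A rough sketch, assuming the HBase 1.x hbck tool; my_table is a placeholder for the actual table name.)

```bash
# Rough sketch (assumes HBase 1.x hbck; my_table is a placeholder).
# Report inconsistencies for just this table, without changing anything:
hbase hbck -details my_table

# If it confirms an unassigned/offline region, I assume a targeted repair would be:
# hbase hbck -fixAssignments my_table
```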
Labels:
- Apache HBase
- Apache Phoenix
06-20-2016
07:33 PM
Chris, I added the fs.s3.buffer.dir property under "Custom hdfs-site" in the HDFS configs in Ambari, the same place where I added my AWS credentials (which are working), but it doesn't appear to be sticking. I pointed the property at "/home/s3_temp", which I created on the edge node where I'm testing the distcp tool, but I never see data in there, and my uploads continue to fail with the same errors as before. Any ideas? cc @Chris Nauroth
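Also, since the DistCp mappers run on the worker nodes rather than the edge node, I assume the buffer directory has to exist and be writable there too. This is roughly what I'd run to set that up (host names are placeholders, and it assumes passwordless ssh and sudo access):

```bash
# Rough sketch: create the buffer directory on every node that can run a mapper.
# worker1..worker3 are placeholders; assumes passwordless ssh and sudo access.
for host in worker1 worker2 worker3; do
  ssh "$host" 'sudo mkdir -p /home/s3_temp && sudo chmod 1777 /home/s3_temp && df -h /home/s3_temp'
done
```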
06-20-2016
06:32 PM
In your example, do you know how I would filter out everything but the count? So instead of printing "83/83 (100%) Done 'COUNT(1)' '87'", I would print only "87".
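My first instinct is a pipe along these lines; does that seem reasonable, or is there a cleaner way? (It assumes the count is the last field on the last line of output, and your_count_command stands in for the command from your example.)

```bash
# Rough sketch: print only the count (e.g. 87), assuming it is the last field
# on the last line of output and is wrapped in single quotes.
your_count_command 2>/dev/null | tail -n 1 | awk '{print $NF}' | tr -d "'"
```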
06-20-2016
06:10 PM
Perfect! Thanks!
06-20-2016
05:20 PM
I know how to accomplish this with Java, but I'm wondering if there is a simpler way using some sort of CLI client and piping. I just want to run a set query, and write the results to a text file. And I want to be able to execute this from a shell script. Any suggestions?
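To make it concrete, this is roughly what I'm hoping for (a sketch that assumes Phoenix's psql.py is on the PATH and prints SELECT results to stdout; the ZooKeeper host, query, and file paths are placeholders):

```bash
# Rough sketch: run a fixed query from a shell script and capture the results in a text file.
# zk-host, the query, and the file paths are placeholders.
echo "SELECT id, name FROM my_table LIMIT 10;" > /tmp/my_query.sql
psql.py zk-host /tmp/my_query.sql > /tmp/query_results.txt
```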
Labels:
- Apache Phoenix
06-16-2016
10:28 PM
1 Kudo
Fantastic. Thanks!
06-16-2016
06:52 PM
1 Kudo
Thanks @Chris Nauroth. After some experimentation, DistCp looks promising, but I'm seeing a very high failure rate on my mappers. Everything eventually succeeds, but usually only after several failed attempts, even for relatively small batches of data. The error stack is below: "No space available in any of the local directories." This is confusing, because the edge node (where I'm running the distcp command) and all the data nodes have plenty of disk space. I'm guessing it's a permissions-related issue with some temporary storage? Any ideas?
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:366)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:416)
at org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite(LocalDirAllocator.java:198)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.newBackupFile(NativeS3FileSystem.java:263)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>(NativeS3FileSystem.java:245)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:412)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:986)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:174)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:123)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
2016-06-16 14:48:29,841 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File copy failed: hdfs://surus/apps/hive/warehouse/fma2_v12_featuredata.db/eea_iperl/000000_0 --> s3n://sensus-device-analytics/HDFS_To_S3_Testing/distcp1/eea_iperl/000000_0
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:285)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:253)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://surus/apps/hive/warehouse/fma2_v12_featuredata.db/eea_iperl/000000_0 to s3n://sensus-device-analytics/HDFS_To_S3_Testing/distcp1/eea_iperl/000000_0
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:281)
... 10 more
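In case it helps narrow things down, this is the check I was planning to run next. I'm assuming the s3n client buffers each upload under the local directory named by fs.s3.buffer.dir before sending it to S3, so the error may mean that directory is missing, unwritable, or full on the node doing the copy:

```bash
# Rough sketch: find out which local directory the s3n client buffers into,
# then check its free space and permissions (the value may be a comma-separated list).
hdfs getconf -confKey fs.s3.buffer.dir
df -h /path/printed/by/previous/command   # placeholder: repeat for each listed path
ls -ld /path/printed/by/previous/command
```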
06-15-2016
09:24 PM
Thanks @Chris Nauroth. Does DistCp support any kind of configuration, for example, to limit the amount of bandwidth used?
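For example, something along these lines, if DistCp's -bandwidth (MB per second, per map) and -m (maximum simultaneous maps) options work the way I think they do (paths and numbers are placeholders):

```bash
# Rough sketch: cap aggregate throughput by limiting maps and per-map bandwidth.
# The source/target paths and the numbers are placeholders.
hadoop distcp -m 10 -bandwidth 20 \
  hdfs:///apps/hive/warehouse/db_name.db/table_name \
  s3n://my-bucket/backups/table_name
```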
06-15-2016
06:57 PM
Thanks @Chris Nauroth. I'll play with DistCp. One clarification (I'm brand new when it comes to S3, so this might be dumb): suppose I have Hive table X that is stored as compressed ORC files. To use DistCp, I suppose I would point at the raw data: /apps/hive/warehouse/db_name.db/table_name. But this will copy the compressed, ORC-formatted data, correct? Suppose someone wanted to use that data as a Hive table in EMR: could they also use DistCp to pull it to a cluster, then create a table on top of it with the same metadata, and just use the data? And is there a straightforward way to copy, say, CSV data from table X to S3?
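To make the two cases concrete, this is roughly what I'm imagining; the bucket, database, table, and connection details are placeholders, and the CSV route assumes a Hive version that accepts ROW FORMAT in INSERT OVERWRITE DIRECTORY:

```bash
# Rough sketch, case 1: copy the table's raw ORC files byte-for-byte to S3.
# Whoever pulls them down would recreate a table with the same ORC DDL over that location.
hadoop distcp /apps/hive/warehouse/db_name.db/table_name \
  s3n://my-bucket/exports/table_name_orc

# Rough sketch, case 2: have Hive rewrite the data as delimited text first,
# then copy the text files to S3. The HiveServer2 URL and paths are placeholders.
beeline -u "jdbc:hive2://hiveserver:10000" -e "
  INSERT OVERWRITE DIRECTORY '/tmp/table_name_csv'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  SELECT * FROM db_name.table_name;"
hadoop distcp /tmp/table_name_csv s3n://my-bucket/exports/table_name_csv
```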