Member since: 02-04-2016
Posts: 189
Kudos Received: 70
Solutions: 9

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3747 | 07-12-2018 01:58 PM |
| | 7850 | 03-08-2018 10:44 AM |
| | 3759 | 06-24-2017 11:18 AM |
| | 23285 | 02-10-2017 04:54 PM |
| | 2288 | 01-19-2017 01:41 PM |
09-15-2016
12:53 PM
(I posted another NiFi question here, in case anyone reading this has an answer: https://community.hortonworks.com/questions/56616/options-for-exporting-large-data-sets-from-hive-to.html)
09-15-2016
12:52 PM
The key here for me is a shift in thinking. The SplitJson processor "splits" my flow into X flows, based on the results of my query, and then I can run Y of them at a time. It's not quite a loop (unless Y == 1), but it makes sense now.
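As I understand it, a typical arrangement for this kind of flow (not the only one) is ExecuteSQL -> ConvertAvroToJSON -> SplitJson -> EvaluateJsonPath -> RouteOnAttribute: SplitJson emits one flowfile per customer row returned by the query, EvaluateJsonPath promotes the DoA/DoB/DoC fields to flowfile attributes, and RouteOnAttribute sends each flowfile down the branch of processors that handles it.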
09-12-2016
09:28 AM
Thanks Bryan, I'll give this a try!
09-10-2016
01:44 PM
I'm a total dataflow/NiFi rookie. I'm trying to accomplish something like the following. Given a database table like this:

Customer_ID (varchar), DoA (boolean), DoB (boolean), DoC (boolean)

I want to:

1) Query the table (select *)
2) For each customer:
3a) if DoA, execute some steps (move some files around, etc.)
3b) if DoB, execute some steps
3c) if DoC, execute some steps
4) Update some log files, etc.

I've been playing with some of the example templates here: https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates but I haven't found anything that shows me how to accomplish step 2 above. Is it possible to work through a loop like this? In the NiFi training class, the instructor said that this is a common use case, but I can't seem to find a template that looks like this. Can someone point me at an example to get me going?
Labels:
- Apache NiFi
09-08-2016
11:51 AM
1 Kudo
I recently realized that more than half of all our HDFS usage is under /tmp. I wrote a script to go find all the data, and it looks like the vast majority of it is under /tmp/hive/***, for example:

/tmp/hive/root
/tmp/hive/hdfs
/tmp/hive/my_user

These have tens of TB in each of them, and quite a lot of it is very old. Is it safe to delete this data? Say, anything older than 30 days? Would 14 days be safe? Any best practices here? It seems odd that there is nothing built in to maintain this space...
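In case it helps anyone else, below is a minimal sketch of how one might list (and, only after verifying the list, remove) files under /tmp/hive older than a cutoff. The 30-day cutoff and the assumption that these Hive scratch files are safe to delete are exactly the things to confirm first, so the actual delete is left commented out.

    #!/usr/bin/env bash
    # Sketch only: list HDFS files under /tmp/hive older than a cutoff date.
    # Assumes GNU date; in 'hdfs dfs -ls -R' output, column 6 is the
    # modification date (YYYY-MM-DD) and column 8 is the path.
    CUTOFF=$(date -d "30 days ago" +%Y-%m-%d)

    hdfs dfs -ls -R /tmp/hive 2>/dev/null \
      | awk -v cutoff="$CUTOFF" '$1 !~ /^d/ && $6 < cutoff {print $8}' \
      | while read -r path; do
          echo "candidate for deletion: $path"
          # hdfs dfs -rm -skipTrash "$path"
        done

For what it's worth, /tmp/hive is the default value of hive.exec.scratchdir, which is why the scratch data accumulates there.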
Labels:
- Apache Hadoop
- Apache Hive
08-31-2016
04:44 PM
Note: this works well, but I'd still like to know how to utilize the method mentioned above.

    hive -e 'set hive.cli.print.header=true; select * from my_table' | sed 's/[\t]/,/g' > /my/path/output_folder/output_file.csv
08-31-2016
04:27 PM
I have a hive table that I want to export to a local .csv file.
I tried this approach:
    insert overwrite local directory '/my/local/file/path'
    row format delimited fields terminated by ','
    select * from my_table;
This puts a series of .deflate files in /my/local/file/path, but I want plain ol' .csv files. How do I accomplish this? I tried making a copy of the source table that is not compressed and is 'stored as textfile', but the output is still the same. Thanks.
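One thing that may be worth trying (assuming the .deflate files come from output compression being enabled for the session) is to turn hive.exec.compress.output off before running the export, for example:

    # Sketch: disable output compression for this invocation, then rerun
    # the export. Paths and table name are the ones from the question above.
    hive -e "
    set hive.exec.compress.output=false;
    insert overwrite local directory '/my/local/file/path'
    row format delimited fields terminated by ','
    select * from my_table;
    "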
Labels:
- Apache Hive
08-29-2016
05:21 PM
HDP 2.4.2: HBase 1.1.2.2.4.2.0-258
08-29-2016
04:59 PM
This is what I get in the logs of one of the region servers mentioned in the stack trace from the master:

...MY_BROKEN_TABLE/8a444fa1979524e97eb002ce8aa2d7aa/0/4f9a5c26ddb0413aa4eb64a869ab4a2c
at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1407)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1677)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1504)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:441)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.seekTo(HFileReaderV2.java:1249)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:267)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:169)
at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:281)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:243)
at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:342)
at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:88)
at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1216)
at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1890)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:525)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:562)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 05:36:42,959 INFO [regionserver/XXX009.network/<ip address>:16020-shortCompactions-1472143410679] hdfs.DFSClient: Access token was invalid when connecting to /<ip address>:50010 : org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to /apps/hbase/data/data/default/<DB NAme>.MY_BROKEN_TABLE/8a444fa1979524e97eb002ce8aa2d7aa/0/4f9a5c26ddb0413aa4eb64a869ab4a2c
08-29-2016
04:53 PM
I ran the tool and it moved the '1' from 'offline regions' to 'failed regions'.
The output of hbck:

Exception in thread "main" java.io.IOException: 2 region(s) could not be checked or repaired.

The interesting piece of the HBase master log looks like this after a failed query:
2016-08-29 12:44:35,810 WARN [AM.ZK.Worker-pool2-t121] master.RegionStates: Failed to open/close a97029c18889b3b3168d11f910ef04ae on XXX009.network,16020,1472143382923, set to FAILED_OPEN
2016-08-29 12:44:35,900 WARN [AM.ZK.Worker-pool2-t106] master.RegionStates: Failed to open/close fad4e0e460099b5a0345b9ec354d0117 on XXX003.network,16020,1472143374416, set to FAILED_OPEN
2016-08-29 12:44:36,143 WARN [AM.ZK.Worker-pool2-t115] master.RegionStates: Failed to open/close 5ace750e16bcddf3ab29814da9a4f641 on XXX002.network,16020,1472143382124, set to FAILED_OPEN
2016-08-29 12:44:36,889 WARN [AM.ZK.Worker-pool2-t113] master.RegionStates: Failed to open/close a10e94e0a64a9b69a540603d6c9aee75 on XXX012.network,16020,1472143381417, set to FAILED_OPEN