Member since: 02-04-2016
Posts: 189
Kudos Received: 70
Solutions: 9

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3747 | 07-12-2018 01:58 PM |
| | 7850 | 03-08-2018 10:44 AM |
| | 3759 | 06-24-2017 11:18 AM |
| | 23285 | 02-10-2017 04:54 PM |
| | 2288 | 01-19-2017 01:41 PM |
09-15-2016
12:53 PM
(I posted another NiFi question here, in case anyone reading this has an answer: https://community.hortonworks.com/questions/56616/options-for-exporting-large-data-sets-from-hive-to.html)
09-15-2016
12:52 PM
The key here for me is a shift in thinking. The SplitJson processor "splits" my flow into X flows, based on the results of my query, and then I can run Y of them at a time. It's not quite a loop (unless Y == 1), but it makes sense now.
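As I understand it, a typical arrangement for this kind of flow (not the only one) is ExecuteSQL -> ConvertAvroToJSON -> SplitJson -> EvaluateJsonPath -> RouteOnAttribute: SplitJson emits one flowfile per customer row returned by the query, EvaluateJsonPath promotes the DoA/DoB/DoC fields to flowfile attributes, and RouteOnAttribute sends each flowfile down the branch of processors that handles it.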
09-12-2016
09:28 AM
Thanks Bryan, I'll give this a try!
09-10-2016
01:44 PM
I'm a total dataflow/NiFi rookie. I'm trying to accomplish something like the following. Given a database table like this:

Customer_ID (varchar), DoA (boolean), DoB (boolean), DoC (boolean)

I want to:

1) Query the table (select *)
2) For each customer:
3a) if DoA, execute some steps (move some files around, etc.)
3b) if DoB, execute some steps
3c) if DoC, execute some steps
4) Update some log files, etc.

I've been playing with some of the example templates here: https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates but I haven't found anything that shows me how to accomplish step 2 above. Is it possible to work through a loop like this? In the NiFi training class, the instructor said that this is a common use case, but I can't seem to find a template that looks like this. Can someone point me at an example to get me going?
Labels:
- Apache NiFi
09-08-2016
11:51 AM
1 Kudo
I recently realized that more than half of all our HDFS usage is under /tmp. I wrote a script to go find all the data, and it looks like the vast majority of it is under /tmp/hive/***, for example:

/tmp/hive/root
/tmp/hive/hdfs
/tmp/hive/my_user

These have tens of TB in each of them, and quite a lot of it is very old. Is it safe to delete this data? Say, anything older than 30 days? Would 14 days be safe? Any best practices here? It seems odd that there is nothing built in to maintain this space...
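In case it helps anyone else, below is a minimal sketch of how one might list (and, only after verifying the list, remove) files under /tmp/hive older than a cutoff. The 30-day cutoff and the assumption that these Hive scratch files are safe to delete are exactly the things to confirm first, so the actual delete is left commented out.

    #!/usr/bin/env bash
    # Sketch only: list HDFS files under /tmp/hive older than a cutoff date.
    # Assumes GNU date; in 'hdfs dfs -ls -R' output, column 6 is the
    # modification date (YYYY-MM-DD) and column 8 is the path.
    CUTOFF=$(date -d "30 days ago" +%Y-%m-%d)

    hdfs dfs -ls -R /tmp/hive 2>/dev/null \
      | awk -v cutoff="$CUTOFF" '$1 !~ /^d/ && $6 < cutoff {print $8}' \
      | while read -r path; do
          echo "candidate for deletion: $path"
          # hdfs dfs -rm -skipTrash "$path"
        done

For what it's worth, /tmp/hive is the default value of hive.exec.scratchdir, which is why the scratch data accumulates there.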
Labels:
- Apache Hadoop
- Apache Hive
08-31-2016
04:44 PM
Note: this works well, but I'd still like to know how to utilize the method mentioned above.

    hive -e 'set hive.cli.print.header=true; select * from my_table' | sed 's/[\t]/,/g' > /my/path/output_folder/output_file.csv
08-31-2016
04:27 PM
I have a hive table that I want to export to a local .csv file.
I tried this approach:
    insert overwrite local directory '/my/local/file/path'
    row format delimited fields terminated by ','
    select * from my_table;
This puts a series of .deflate files in /my/local/file/path, but I want plain ol' .csv files. How do I accomplish this? I tried making a copy of the source table that is not compressed and is 'stored as textfile', but the output is still the same. Thanks.
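One thing that may be worth trying (assuming the .deflate files come from output compression being enabled for the session) is to turn hive.exec.compress.output off before running the export, for example:

    # Sketch: disable output compression for this invocation, then rerun
    # the export. Paths and table name are the ones from the question above.
    hive -e "
    set hive.exec.compress.output=false;
    insert overwrite local directory '/my/local/file/path'
    row format delimited fields terminated by ','
    select * from my_table;
    "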
Labels:
- Apache Hive
08-29-2016
05:21 PM
HDP 2.4.2: HBase 1.1.2.2.4.2.0-258
08-29-2016
04:59 PM
This is what I get in the logs of one of the region servers mentioned in the stack trace from the master:

...MY_BROKEN_TABLE/8a444fa1979524e97eb002ce8aa2d7aa/0/4f9a5c26ddb0413aa4eb64a869ab4a2c
at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1407)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1677)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1504)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:441)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.seekTo(HFileReaderV2.java:1249)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:267)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:169)
at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:281)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:243)
at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:342)
at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:88)
at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1216)
at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1890)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:525)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:562)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 05:36:42,959 INFO [regionserver/XXX009.network/<ip address>:16020-shortCompactions-1472143410679] hdfs.DFSClient: Access token was invalid when connecting to /<ip address>:50010 : org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to /apps/hbase/data/data/default/<DB NAme>.MY_BROKEN_TABLE/8a444fa1979524e97eb002ce8aa2d7aa/0/4f9a5c26ddb0413aa4eb64a869ab4a2c
08-29-2016
04:53 PM
I ran the tool and it moved the '1' from 'offline regions' to 'failed regions'.
The output of hbck:

Exception in thread "main" java.io.IOException: 2 region(s) could not be checked or repaired.

The interesting piece of the HBase master log looks like this after a failed query:
2016-08-29 12:44:35,810 WARN [AM.ZK.Worker-pool2-t121] master.RegionStates: Failed to open/close a97029c18889b3b3168d11f910ef04ae on XXX009.network,16020,1472143382923, set to FAILED_OPEN
2016-08-29 12:44:35,900 WARN [AM.ZK.Worker-pool2-t106] master.RegionStates: Failed to open/close fad4e0e460099b5a0345b9ec354d0117 on XXX003.network,16020,1472143374416, set to FAILED_OPEN
2016-08-29 12:44:36,143 WARN [AM.ZK.Worker-pool2-t115] master.RegionStates: Failed to open/close 5ace750e16bcddf3ab29814da9a4f641 on XXX002.network,16020,1472143382124, set to FAILED_OPEN
2016-08-29 12:44:36,889 WARN [AM.ZK.Worker-pool2-t113] master.RegionStates: Failed to open/close a10e94e0a64a9b69a540603d6c9aee75 on XXX012.network,16020,1472143381417, set to FAILED_OPEN