Member since
05-01-2017
11
Posts
0
Kudos Received
0
Solutions
06-03-2019
01:45 AM
Because I don't think it works the way my supervisor thinks it works. We're taking in a series of about 8 csv files from an FTP, and these files are rather small (under 1 MB each). He's (rightfully) concerned that HDFS block space is going to be wasted on them, so he wants to use the MergeContent processor to resolve this. He seems to believe that MergeContent will 'collate' files with the same name into a single bigger file.

To clarify, the way he wants it to work: if today's "sales_report.csv" comes in and there's already a "sales_report.csv" in the directory, he wants the new data from today's "sales_report.csv" appended as new rows to the existing file. I hope that makes sense.

Instead, I'm getting very different results. I have the flow set up so that it picks the files up from the FTP, creates a directory on HDFS based on the folder, and then a subfolder based on the year. When I leave the MergeContent processor out, this all works perfectly. When I put it in, I get three files: one with its original name and two with a long string of random characters for names. We're using the default settings for the MergeContent processor.

Sorry, I've written a bit of an essay, but based on what I've described above, does it sound like MergeContent is what he's looking for?
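For what it's worth, the append behaviour described above is easy to express outside NiFi. Here's a minimal Python sketch of "add today's rows to the existing file of the same name" (the function name and the header-skipping assumption are mine, not anything MergeContent does):

```python
import os

def append_csv(existing_path, incoming_path):
    """Append the data rows of incoming_path to existing_path.

    If existing_path doesn't exist yet, the incoming file (header
    included) simply becomes the new file. Otherwise the incoming
    file's first line (its header) is dropped so only new data
    rows are added.
    """
    if not os.path.exists(existing_path):
        os.replace(incoming_path, existing_path)
        return
    with open(incoming_path) as src, open(existing_path, "a") as dst:
        next(src, None)       # drop the incoming file's header row
        dst.writelines(src)   # copy the remaining rows verbatim
```

MergeContent, as I understand it, instead bins whatever flowfiles are queued and concatenates them into new flowfiles with generated names (hence the randomly named outputs), so this per-filename row-append behaviour is not what it does out of the box.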
02-13-2019
01:25 AM
Just an additional note: I tried an expression like ${now():plus(1000)}, but it's expecting a boolean expression. Is there perhaps some way to put this into a boolean format?
02-08-2019
07:48 PM
In NiFi I have a processor that needs to place some parquet files into an HDFS directory. The idea is that if it fails, it should wait one second, then try again. If that fails, it should wait another second and try again. Finally, if there is one more failure, it should email an admin. I configured it exactly as suggested in this article: https://kisstechdocs.wordpress.com/2015/01/15/creating-a-limited-failure-loop-in-nifi/ However, this only retries the process - it doesn't wait before the next attempt. Is there a way to configure this so that steps 2 and 3 wait one second before proceeding?
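In plain-code terms, the retry policy being asked for (N attempts, a fixed one-second wait between them, then alert an admin) looks like the sketch below. The function names and the notify step are illustrative, not NiFi API:

```python
import time

def put_with_retries(put, notify_admin, attempts=3, wait_seconds=1.0):
    """Try `put()` up to `attempts` times, sleeping between tries.

    On the final failure, call `notify_admin(exc)` instead of
    retrying again (mirroring the email-an-admin step), then
    re-raise so the caller sees the failure too.
    """
    for attempt in range(1, attempts + 1):
        try:
            return put()
        except Exception as exc:
            if attempt == attempts:
                notify_admin(exc)
                raise
            time.sleep(wait_seconds)  # wait before the next attempt
```

In NiFi the equivalent wait is usually approximated with penalization or Yield Duration on the looped-back relationship rather than an explicit sleep, but the sketch shows the intended control flow.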
01-31-2019
08:39 PM
I have NiFi taking a number of json files from Kafka, with a "ConsumeKafka" processor starting the flow. Now I need to read each json file's "creationDate" field, which is an ISO 8601 timestamp (e.g., 2019-01-19T04:34:28.527722+00:00). The flow then needs to take part of that date and format it to match a corresponding HDFS directory, so the files go into a directory that matches the date - i.e., the directories are named "01-19-2019", "01-20-2019", "01-21-2019", etc.

I was thinking of setting up an "EvaluateJsonPath" processor with a property of "creationDate" and a value of "$.creationDate". Then there would be an "UpdateAttribute" processor with a property of "creationDate" and a value of "${creationDate('yyyy/MM/dd HH:mm:ss:SSS'Z'):format('MM-dd-yyyy')}". Finally there would be a "PutHDFS" processor with a directory of "/${creationDate}".

I'm not sure about the expression (or whether this is going to work the way I think it will), especially because the directory format doesn't quite match the timestamp format. Maybe the directory names can change, but I'm going to assume that they can't. Any thoughts on how I can make this work?
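Setting the Expression Language aside for a moment, the underlying transformation is just "parse an ISO 8601 timestamp, reformat it as MM-dd-yyyy". A Python sketch of that step, with the leading slash matching the PutHDFS directory idea above (the function name is mine):

```python
from datetime import datetime

def creation_date_to_directory(creation_date: str) -> str:
    """Map an ISO 8601 creationDate to the dated HDFS directory name.

    e.g. '2019-01-19T04:34:28.527722+00:00' -> '/01-19-2019'
    """
    dt = datetime.fromisoformat(creation_date)  # handles the +00:00 offset
    return "/" + dt.strftime("%m-%d-%Y")
```

The NiFi equivalent would chain a parse step and a format step the same way (parse with the timestamp's actual pattern, then format with 'MM-dd-yyyy'), rather than doing both in one call.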
05-16-2018
10:07 PM
That seems to have taken care of it - I can now create a cluster. The /staging directory needs permissions of 777, whereas I think the parent folder needs to be 755. Thanks, all, for your assistance!
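For reference, `hdfs dfs -chmod` takes the same octal modes as POSIX chmod, so the layout described above can be illustrated on a local filesystem (the directory names here are just placeholders):

```python
import os
import stat
import tempfile

# Local-filesystem analogue of the permission layout described above:
# parent directory at 755, staging subdirectory at 777.
parent = tempfile.mkdtemp()
staging = os.path.join(parent, "staging")
os.mkdir(staging)

os.chmod(parent, 0o755)   # rwxr-xr-x: owner writes, others read/traverse
os.chmod(staging, 0o777)  # rwxrwxrwx: any user may write into staging

print(oct(stat.S_IMODE(os.stat(staging).st_mode)))  # prints 0o777
```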
05-16-2018
08:27 PM
Thank you, @Samrat Kompella. The error I get now is the following: https://imgur.com/5j9Der4 I'm not sure whether it wants it set back to 777 or not.
05-16-2018
07:41 PM
Thank you for your reply, @Geoffrey Shelton Okot. Unfortunately, the same error still occurs when trying to create a cluster. I also restarted the Falcon service, just to be safe. On the summary page, it says the following:

Access Control List
Owner: root
Group: users
Permission: 0x755

I'm not sure if that helps. Also, the readout for -ls is as follows:

[hdfs@sandbox-hdp root]$ hdfs dfs -ls /apps/falcon/SourceCluster
Found 2 items
drwxrwxrwx   - falcon hdfs          0 2018-05-10 03:41 /apps/falcon/SourceCluster/staging
drwxrwxrwx   - root   hdfs          0 2018-05-10 03:41 /apps/falcon/SourceCluster/working

Any further suggestions would be appreciated.
05-16-2018
06:33 PM
Hello @Samrat Kompella, I tried this and still encounter the same problem:

[root@sandbox-hdp ~]# hdfs dfs -chmod -R 777 /apps/falcon/SourceCluster/staging
[root@sandbox-hdp ~]# hdfs dfs -ls -d /apps/falcon/SourceCluster/staging
drwxrwxrwx   - root hdfs          0 2018-05-10 03:41 /apps/falcon/SourceCluster/staging
05-16-2018
02:25 AM
I'm trying to create a cluster in Falcon, but I get an error saying that the current user "Falcon" is not the owner of the path. I am logged in as root, but it never asked me for a password. Here's the error: https://imgur.com/t4XorQv Any suggestions?
11-25-2017
04:20 PM
Whoops, I had completely overlooked that. Thanks!
11-20-2017
07:02 PM
Hello all, Hive is complaining when I try to import a csv into a table I created called "stocks." The table is set up as follows:

hive> describe stocks;
OK
exchng             string
symbol             string
ymd                string
price_open         float
price_high         float
price_low          float
price_close        float
volume             int
price_adj_close    float

Then I try to load data from a csv as follows:

hive> load data inpath '/user/data/stocks/stocks.csv'
    > overwrite into table human_resources.stocks;

I then get the following error:

Loading data to table human_resources.stocks
Failed with exception Unable to move source hdfs://quickstart.cloudera:8020/user/data/stocks/stocks.csv to destination hdfs://quickstart.cloudera:8020/user/data/stocks/stocks.csv
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

hive> describe table stocks;
FAILED: SemanticException [Error 10001]: Table not found table

I don't think the file is corrupted. You can see it at the link below; it's just a normal csv file - in fact, it was provided by the author of the Hive book I'm working through. http://www.vaughn-s.net/hadoop/stocks.csv The VM image I'm using is cloudera-quickstart-vm-5.10.0-0-vmware; I'm not sure whether I need to update or not.
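As a side check, the rows in the csv can be validated against the declared column types before loading. A small Python sketch, with the column order taken from the describe output above and a made-up sample row (not from the real stocks.csv):

```python
import csv
from io import StringIO

# Column types from the `describe stocks` output above, in order:
# exchng, symbol, ymd, price_open, price_high, price_low,
# price_close, volume, price_adj_close.
CASTS = [str, str, str, float, float, float, float, int, float]

def parse_stock_row(fields):
    """Cast one csv row to the types the Hive table declares;
    raises ValueError if a field doesn't fit its declared type."""
    return [cast(value) for cast, value in zip(CASTS, fields)]

# A made-up sample row in the same shape as stocks.csv:
sample = "NASDAQ,AAPL,2009-12-31,213.13,213.35,210.56,210.73,12571000,210.73"
row = parse_stock_row(next(csv.reader(StringIO(sample))))
```

Note this wouldn't explain the error above, though: that failure comes from the MoveTask (the source and destination paths in the message are identical), not from the file's contents.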