Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118
My Accepted Solutions
Views | Posted |
---|---|
3231 | 08-25-2017 03:09 PM |
2075 | 08-22-2017 06:52 PM |
3590 | 08-09-2017 01:10 PM |
8298 | 08-04-2017 02:34 PM |
8341 | 08-01-2017 11:35 AM |
10-29-2016
07:25 PM
1 Kudo
Pig runs map-reduce under the covers, and this list of files is the output of a map-reduce job. You should also notice a 0-byte (empty) file named _SUCCESS at the top of the list. That is just a flag saying the job succeeded.
The bottom line is that when you point your job or table to the parent directory holding these files, it simply sees the union of all the files. So you can think of the parent directory logically as the "file" holding the data. Thus, there is never a need to concatenate the files on Hadoop: just point to the parent directory and treat it as the file. If you make a Hive table, point it to the parent directory. If you load the data in a Pig script, point it to the parent directory. And so on.
If you want to pull the data to an edge node, use the command hdfs dfs -getmerge <hdfsParentDir> <localPathAndName> and it will combine all of the m-001, m-002 ... into a single file.
If you want to pull it to your local machine, use Ambari File Views: open the parent directory, click "+ Select All" and then click "concatenate". That will concatenate all of the files into one and download it from your browser.
If this is what you are looking for, let me know by accepting the answer; else, let me know of any gaps.
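To make the "parent directory is the file" idea concrete, here is a minimal sketch of what merging the part files amounts to, run against a local directory rather than HDFS. It is only an illustration, not how hdfs dfs -getmerge is implemented. The function name merge_part_files and the file names part-m-00000 / part-m-00001 are hypothetical, and skipping names that start with "_" or "." is an assumption modeled on how Hadoop input formats ignore such marker files.

```python
import shutil
import tempfile
from pathlib import Path


def merge_part_files(parent_dir, merged_path):
    """Concatenate every data file in parent_dir into merged_path.

    Hypothetical local helper: files whose names start with "_" or "."
    (for example the empty _SUCCESS flag) are skipped, and the rest are
    appended in sorted name order.
    """
    parent = Path(parent_dir)
    parts = sorted(
        p for p in parent.iterdir()
        if p.is_file() and not p.name.startswith(("_", "."))
    )
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream bytes, no full read
    return [p.name for p in parts]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a stand-in for a job output directory (made-up contents).
        out_dir = Path(tmp) / "output"
        out_dir.mkdir()
        (out_dir / "_SUCCESS").write_bytes(b"")
        (out_dir / "part-m-00000").write_text("a,1\n")
        (out_dir / "part-m-00001").write_text("b,2\n")

        merged = Path(tmp) / "merged.csv"
        print(merge_part_files(out_dir, merged))
        print(merged.read_text(), end="")
```

On a real cluster you would not need any of this; the getmerge command above, or simply pointing Hive or Pig at the parent directory, does the job.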
10-28-2016
12:35 PM
@Magesh Kumar I believe this question is identical to another one you asked: https://community.hortonworks.com/questions/63947/incremental-flat-file-data-loading-into-hadoop.html#answer-63978
If there are differences, please elaborate.
10-29-2016
03:50 AM
1 Kudo
@Greg Keys, unfortunately I think you are correct that ExecuteScript is the best way to achieve this right now. As far as I know, the PutFile processor cannot append to an existing file. It only lets you handle conflicting files with "replace", "ignore", or "fail" as the resolution strategy. You should submit an Apache Jira to add this functionality. I could see difficulties with file locks and flushing the buffer given the streaming nature of NiFi, and I think further investigation is needed.
10-21-2016
02:06 PM
1 Kudo
@Sundar Lakshmanan Glad we found the problem. If you are satisfied, please accept the original answer. (That's how HCC works 🙂)
08-12-2019
03:23 AM
Nifi_AutoDeploymentScript/ is really helpful for workflow deployment. However, I am looking for more details on: 1. controller services; 2. reading the variables of the source process group and deploying only those variables per environment; 3. reading JSON attributes.
10-10-2017
05:45 PM
Thanks for the wonderful article! I have one question. When we export a flow/component as a template, all the sensitive values in the processors get cleared out, even when they are set as EL and not actual values. This makes it hard to auto-deploy NiFi pipelines by pulling the templates from a Git repo and deploying them to NiFi without manual intervention. Are there any suggestions for a NiFi SDLC that will handle processors with sensitive properties? So far I have been able to get it almost working by using external custom properties and the NiFi REST API for deploying and instantiating templates. But because the sensitive values get cleared out in the template, the sensitive properties would still require manual population.
10-11-2016
03:15 PM
Hi @Simran Kaur. Edge/client nodes are only for user access to the cluster. Having said that, they are not mandatory for a Hadoop cluster, since users can access it through other means (e.g. Ambari views, Zeppelin, WebHDFS, HDFS mounts and others). So edge/client nodes are a bit of a distraction here. The main architecture of Hadoop is the master-slave architecture of its services. At the highest level, services typically have a master that manages a job and slaves that do the work distributed across the cluster. These never run on an edge node (the edge node only lets the user communicate with the master service).
10-06-2016
02:00 AM
Thank you, Greg, for your answer. It is really helpful.
10-04-2016
09:01 PM
This looks really promising Greg - thank you - I will check this out.
07-11-2018
05:34 AM
@Greg, I am trying to execute MySQL queries and got the error "prefix not found". When I checked the mysql note, I observed that it shows mysql instead of %mysql. But I am unable to add % before mysql because it is not editable. Can you please help me with how to edit it?