About gkeys

gkeys · ‎10-29-2016

Pig runs MapReduce under the covers, and this list of files is the output of a MapReduce job. You should also notice a 0-byte (empty) file named _SUCCESS at the top of the list. That is just a flag saying the job succeeded.

The bottom line is that when you point your job or table to the parent directory holding these files, it simply sees the union of all the files. So you can think of the parent directory logically as the "file" holding the data. Thus, there is never a need to concatenate the files on Hadoop: just point to the parent directory and treat it as the file. If you make a Hive table, point to the parent directory. If you load the data into a Pig script, point to the parent directory. And so on.

If you want to pull the data to an edge node, use the command hdfs dfs -getmerge <hdfsParentDir> <localPathAndName> and it will combine all of the m-001, m-002 ... files into a single file. If you want to pull it to your local machine, use Ambari File Views, open the parent directory, click "+ Select All" and then click "concatenate". That will concatenate all of the files into one and download it through your browser.

If this is what you are looking for, let me know by accepting the answer; otherwise, let me know of any gaps.
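To make the "parent directory as the file" idea concrete, here is a minimal local sketch in Python of the kind of merge described above: it concatenates the part files of a directory, in name order, into one file. It works on a local copy of the output directory and does not talk to HDFS. The function name merge_parts and the part-file names in the usage note are hypothetical. Skipping names that start with "_" or "." is a choice made in this sketch, following the Hadoop convention of treating such files as hidden; since _SUCCESS is empty, the merged contents are the same either way.

```python
from pathlib import Path
import shutil


def merge_parts(parent_dir, merged_path):
    """Concatenate the data files in parent_dir into merged_path, in name order.

    Hypothetical helper for illustration only. Files whose names start with
    "_" or "." (for example the empty _SUCCESS flag) are skipped.
    merged_path should live outside parent_dir so it is not picked up as input.
    """
    parts = sorted(
        p for p in Path(parent_dir).iterdir()
        if p.is_file() and not p.name.startswith(("_", "."))
    )
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part into the output without loading it fully.
                shutil.copyfileobj(src, out)
    # Return the names that were merged, in the order they were written.
    return [p.name for p in parts]
```

For example, merge_parts("job_output", "merged.csv") on a directory containing _SUCCESS, part-m-00000 and part-m-00001 would write the two part files back to back into merged.csv. On a real cluster, the hdfs dfs -getmerge command does this for you.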

gkeys · ‎10-28-2016

@Magesh Kumar I believe this question is identical to another one you asked: https://community.hortonworks.com/questions/63947/incremental-flat-file-data-loading-into-hadoop.html#answer-63978 If there are differences, please elaborate.

alopresto · ‎10-29-2016

@Greg Keys Unfortunately, I think you are correct that ExecuteScript is the best way to achieve this right now. As far as I know, the PutFile processor cannot append to an existing file. You are given the option to deal with conflicting files using "replace", "ignore", or "fail" as a resolution strategy. You should submit an Apache Jira to add this functionality. I could see difficulties with file locks and flushing the buffer given the streaming nature of NiFi, and I think further investigation is needed.

gkeys · ‎10-21-2016

@Sundar Lakshmanan Glad we found the problem. If you are satisfied, please accept the original answer. (That's how HCC works 🙂)

mail2msiva · ‎08-12-2019

Nifi_AutoDeploymentScript/ is really helpful for workflow deployment. However, I am looking for more details on: 1. controller services; 2. reading the variables of the source process group and deploying only those variables per environment; 3. reading JSON attributes.

gopal_unnikrish · ‎10-10-2017

Thanks for the wonderful article! I have one question. When we export a flow/component as a template, all the sensitive values in the processors get cleared out, even when they are set as EL and not as actual values. This makes it hard to auto-deploy NiFi pipelines by pulling the templates from a Git repo and deploying them to NiFi without manual intervention. Are there any suggestions for a NiFi SDLC that will handle processors with sensitive properties? So far I have been able to get it almost working by using external custom properties and the NiFi REST API for deploying and instantiating templates. But because the sensitive values get cleared out in the template, the sensitive properties would still require manual population.

gkeys · ‎10-11-2016

Hi @Simran Kaur. Edge/client nodes are only for user access to the cluster. Having said that, they are not mandatory for a Hadoop cluster, since users can access it through other means (e.g. Ambari views, Zeppelin, WebHDFS, HDFS mounts and others). So edge/client nodes are a bit of a distraction. The main architecture of Hadoop is the master-slave architecture of its services. At the highest level, services typically have a master that manages a job and slaves that do the work, distributed across the cluster. These are never on an edge node (the edge node lets the user communicate with the master service).

rammohanciber · ‎10-06-2016

Thank you, Greg, for your answer. It is really helpful.

cloppg · ‎10-04-2016

This looks really promising Greg - thank you - I will check this out.

harinarayana_hd · ‎07-11-2018

@Greg, I am trying to execute MySQL queries and got the error "prefix not found". When I checked the mysql note, I observed that it shows mysql instead of %mysql. But I am unable to add % before mysql; it is not editable. Can you please help me with how I can edit it?

Last Visited	‎06-11-2019 01:24 AM

Member Since	‎06-20-2016 01:29 PM
Posts	488
Kudos received	430

Cloudera Community

Re: DR for hadoop

Re: API + how to know by API command all machines ...

Re: Does data get copied in edge node from externa...

Re: is it possible to set the hadoop.tmp.dir value...

Re: How to handle nulls when exporting from Hive?

Re: Impala -Pig Files - Parquet file?

Re: Incrementally Loading flat files into hadoop (...

Re: In NiFi how do I append RouteText lines to an ...

Re: Hive CAST functions return NULL values:

Re: Nifi workflow version control & deployment

Re: Enterprise NiFi: Implementing Reusable Compone...

Re: where to install hive pig oozie and ranger on ...

Re: How can we see the output in single file if 3 ...

Re: best tools to import data from a myriad of sou...

Re: %jdbc(hive) prefix not found in Zeppelin