Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11126 | 04-15-2020 05:01 PM |
| | 7028 | 10-15-2019 08:12 PM |
| | 3073 | 10-12-2019 08:29 PM |
| | 11258 | 09-21-2019 10:04 AM |
| | 4190 | 09-19-2019 07:11 AM |
06-06-2018
08:09 AM
Thanks for your help. That was the problem.
06-04-2018
10:12 AM
1 Kudo
@Vivek Singh The PutHiveQL processor is used to execute Hive DDL/DML commands, and it expects the incoming flowfile content to be a HiveQL command. You can put your CREATE TABLE statement into the flowfile content using a GenerateFlowFile (or ReplaceText, etc.) processor and feed the success relationship to PutHiveQL; the processor then executes the content of the flowfile and creates the table (a sample of the flowfile content is sketched below).

Flow: GenerateFlowFile -> PutHiveQL

In PutHiveQL, configure and enable the Hive connection pool. If the flowfile content holds more than one Hive DDL/DML command, use ; as the delimiter and the processor will execute each of those commands.

In NiFi, the ConvertAvroToORC processor adds a hive.ddl attribute based on the flowfile content; we can make use of that attribute and then use a ReplaceText processor to create new flowfile content and execute the Hive DDL statement with PutHiveQL. Please refer to this link for more details on generating/executing hive.ddl statements using NiFi.
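For reference, a minimal sketch of what the GenerateFlowFile content could look like; the table name and columns are made up for illustration, and the two statements are separated by the ; delimiter mentioned above.

```sql
-- Hypothetical flowfile content for PutHiveQL: two statements separated by ;
CREATE TABLE IF NOT EXISTS default.demo_tbl (
  id   INT,
  name STRING
)
STORED AS ORC;

INSERT INTO default.demo_tbl VALUES (1, 'first row');
```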
08-21-2018
04:35 PM
Issue resolved for me. In HDP 3.0, please use PutHive3Streaming, PutHive3QL, and SelectHive3QL. Cheers.
06-06-2018
05:56 AM
@Shu I have implemented your template, but the merged result is not in the order I need. In my simulation setting I have five files: 00000_0 contains 'aaa', 00001_0 contains 'bbb', 00002_0 contains 'ccc', 00003_0 contains 'eee', 00004_0 contains 'hhhh'. I want the output to be: aaa; bbb; ccc; eee; hhhh, but the actual output is: eee; bbb; hhhh; ccc; aaa. It seems that the List/Fetch processors can't keep the order the way the Get processor does? result.jpg
05-30-2018
06:11 AM
@Shu Thank you so much. Your command works for me. As per my observation of the `sqoop-import` command, we cannot use the --hive-import and --target-dir/--warehouse-dir arguments at once if we have already created an external Hive table at the target directory.

Note: If we want to import the data of an RDBMS table into Hadoop into a specific directory in HDFS, use only the --target-dir argument. If we want to import an RDBMS table into a Hive table in a specific HDFS directory, then first of all create the external Hive table and use only the --hive-import argument. When we want to use the --query argument, we can use both arguments at once, i.e. --hive-import and --target-dir/--warehouse-dir. Example commands for these cases are sketched below.

Regards, Jay.
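A rough sketch of the cases described above, assuming a MySQL source; the connection string, credentials, table, and directory names are placeholders, not values from this thread.

```bash
# Case 1: plain HDFS import into a specific directory -- use --target-dir only
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table customers \
  --target-dir /data/raw/customers

# Case 2: import into an already-created external Hive table -- use --hive-import only
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table customers \
  --hive-import \
  --hive-table default.customers

# Case 3: with --query, a --target-dir is required, so it can be combined with --hive-import
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --query 'SELECT * FROM customers WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /data/staging/customers \
  --hive-import \
  --hive-table default.customers
```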
05-26-2018
07:14 PM
@aman mittal
Yes, it's possible. Take a look at the sample flow below.

Flow overview:
1. SelectHiveQL // lists the tables of a specific database in Avro format; HiveQL Select Query: show tables from default // lists all tables in the default database
2. ConvertAvroToJSON // converts the list of tables from Avro format to JSON
3. SplitJson // splits each table name into an individual flowfile
4. EvaluateJsonPath // extracts the tab_name value and keeps it as a flowfile attribute
5. Remote Process Group (RPG) // since you are doing this for 3k tables, it is better to use an RPG to distribute the work; if you don't want an RPG, skip processors 5 and 6 and feed the success relationship from 4 to 7
6. InputPort // receives the RPG flowfiles
7. SelectHiveQL // pulls the data from the Hive tables
8. EncryptContent
9. RouteOnAttribute // SelectHiveQL writes the query.input.tables attribute, so based on that attribute and NiFi Expression Language add two properties to this processor, for example:
   azure: ${query.input.tables:startsWith("a")} // only table names starting with a
   gcloud: ${query.input.tables:startsWith("e"):or(${query.input.tables:startsWith("a")})} // routes table names starting with e or a to gcloud

Feed the gcloud relationship to a PutGCSObject processor and the azure relationship to a PutAzureBlobStorage processor. Refer to this link for NiFi Expression Language and build an expression that routes only the required tables to Azure/GCS (a small configuration sketch follows this post).

In addition, I have used only a single database to list all the tables, but if your 3k tables come from different databases, then use a GenerateFlowFile processor with the list of databases, extract each database name as an attribute, and feed the success relationship to the first SelectHiveQL processor. Refer to this link for dynamically passing the database attribute to the first SelectHiveQL processor.

Reference flow.xml: load-hivetables-to-azure-gcs195751.xml

If the answer helped to resolve your issue, click the Accept button below to accept the answer; that would be a great help to community users looking for a solution to these kinds of issues.
06-12-2018
11:34 AM
@neeraj sharma If the answer addressed your question, take a moment to log in and click the Accept button below to accept the answer. That would be a great help to community users looking for a solution to these kinds of issues, and it would close this thread.
06-04-2018
10:12 PM
@Winnie Philip If you are persisting small datasets, tens of MBs, you should be good. If you are trying to cache a bigger dataset (a couple of hundred MBs), then you need to increase the Max cache entry size property in the PutDistributedMapCache processor. In the DistributedMapCacheServer service, configure the Persistence Directory property value: if a value is specified, the cache will be persisted in the given directory; if not specified, the cache will be in-memory only. By specifying a directory we avoid using memory to cache the dataset. You also need to increase the Maximum Cache Entries property in DistributedMapCacheServer according to the number of cache entries you are trying to keep in the cache. A rough sketch of these settings is shown below.
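A hedged sketch of the properties mentioned above; the values are examples only and should be tuned to your dataset size and entry count.

```
DistributedMapCacheServer (controller service)
  Persistence Directory : /var/lib/nifi/map-cache   # persist to disk instead of keeping the cache in memory only
  Maximum Cache Entries : 10000                      # raise to cover the number of keys you cache

PutDistributedMapCache (processor)
  Max cache entry size  : 100 MB                     # raise for datasets of a couple of hundred MB
```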
05-24-2018
04:37 PM
@Shu I tried almost all the things mentioned above, but still no luck.
06-12-2018
11:35 AM
@Shailesh Bhaskar If the answer addressed your question, take a moment to log in and click the Accept button below to accept the answer. That would be a great help to community users looking for a solution to these kinds of issues, and it would close this thread.