Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11258 | 04-15-2020 05:01 PM |
| | 7161 | 10-15-2019 08:12 PM |
| | 3140 | 10-12-2019 08:29 PM |
| | 11592 | 09-21-2019 10:04 AM |
| | 4372 | 09-19-2019 07:11 AM |
06-07-2018
10:22 AM
@Satya Nittala Use the Hive substr function and specify the start/end positions you need for each field:

hive> select substr('hdfs://servername/sds/sds/erg/rownum=123/columnnumber=456',0,30) Location,
             substr('hdfs://servername/sds/sds/erg/rownum=123/columnnumber=456',31) PartitionFields;
+---------------------------------+------------------------------+--+
|            location             |       partitionfields        |
+---------------------------------+------------------------------+--+
| hdfs://servername/sds/sds/erg/  | rownum=123/columnnumber=456  |
+---------------------------------+------------------------------+--+

Please refer to this link for more details on the Hive string functions documentation. If the answer helped to resolve your issue, click the Accept button below to accept the answer; that helps community users find solutions for these kinds of issues quickly.
06-07-2018
10:08 AM
2 Kudos
@Vladislav Shcherbakov
You need to use the PutDatabaseRecord processor to push data to a SQL database from NiFi. Configure/enable a Record Reader controller service (with an Avro schema that matches the incoming data) that can read the incoming data, plus a database connection pool, then fill in all the details about your target table in SQL Server. Refer to this link for more details regarding the configs/usage of the PutDatabaseRecord processor, and to this link for how to configure the Record Reader controller service.
06-06-2018
04:02 PM
@RAUI
It depends on how your .bashrc file is configured: if the PATH variable in your .bashrc includes the directory that your script is in, then you don't need to specify bash in the Command Path argument.
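For reference, a minimal sketch of that kind of .bashrc entry, assuming the script lives in /home/user/scripts (the path is only an example):

# ~/.bashrc -- append the directory that holds your script to PATH (example path)
export PATH="$PATH:/home/user/scripts"
# after reloading the shell (source ~/.bashrc), the script can be invoked by name alone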
06-06-2018
11:19 AM
1 Kudo
@aman mittal
If you are using any of the split processors other than SplitRecord to split the flowfile into smaller chunks, then each flowfile will have a fragment.index attribute associated with it. Since you already have the table name as an attribute on the flowfile, make use of these attributes (table_name and fragment.index) and combine them into one to create the new attribute you need. Assuming tab_name is the table-name attribute, add a new property in the UpdateAttribute processor. In addition, if you want to keep this attribute unique, you can append the timestamp value at the end, like: new_attribute ${tab_name}_${fragment.index}_${now():toNumber()}. Based on the fragment.index and tab_name attribute values, the new attribute value is created dynamically.
06-06-2018
10:14 AM
@RAUI If you need to trigger the Spark application only after the files have been ingested into HDFS, then using the ExecuteStreamCommand processor is the correct approach, as this processor accepts an incoming connection and can trigger Spark applications. If you are storing more than one file into HDFS, it would be better to use a MergeContent processor after the PutHDFS processor. Configure the MergeContent processor to wait for a minimum number of entries (or use the Max Bin Age property, etc.), because if you connect the success relationship from PutHDFS directly to ExecuteStreamCommand, the application will be triggered from NiFi as soon as the first file is written to HDFS, without waiting for all the files to be stored in the HDFS directory. Triggering the shell script using the ExecuteStreamCommand processor configs:
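As a rough sketch of what such a trigger script could look like (the script name, application jar, class, and HDFS path are assumptions):

#!/bin/bash
# run_spark_job.sh -- example wrapper that ExecuteStreamCommand invokes
# once MergeContent has confirmed that all files have landed in HDFS
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.IngestJob \
  /opt/jobs/ingest-job.jar hdfs:///data/landing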
06-05-2018
11:23 PM
1 Kudo
@Raghu VN Since you are using Windows OS, the specified path (with forward slashes) works fine for a single file in Single file tailing mode. For Multiple files mode, however, the processor is not able to resolve the regex you are specifying in File(s) to Tail. I have recreated your scenario, and with the configs below I am able to tail multiple files. Change the configs in your TailFile processor to keep two backslashes (\\) in the File(s) to Tail and Base directory property values:

Tailing mode: Multiple files
File(s) to Tail: test[1-3]\\flowlog.csv
Rolling Filename Pattern: flowlog.csv
Base directory: C:\\cw-data
Initial Start Position: Beginning of File
State Location: Local
Recursive lookup: true
Lookup frequency: 10 minutes
Maximum age: 24 hours

If you are using Linux, directory paths use forward slashes (/) and work without any issues, but on Windows directory paths use backslashes, so we need to keep two backslashes (\\) to get the correct directory path.
06-05-2018
10:18 PM
@Raja M The issue is probably with the Region configuration you have used in the PutSNS processor configs. Check which region the Topic/Target ARN points to, then change the Region property value accordingly.
06-05-2018
10:14 AM
@Yu-An Chen The issue is that the relationships feeding from the FetchHDFS to the MergeContent processor are comms.failure and failure; please use the success relationship to feed the MergeContent processor. Flow: please change the feeding relationships as per the above screenshot. In addition, save and upload this template to your instance for more reference and configure the MergeContent/PutSFTP processors: order-files-merge-content-194166.xml
06-05-2018
02:44 AM
@Yu-An Chen Keep the FetchHDFS processor configs as below. Give the core-site.xml and hdfs-site.xml paths in the Hadoop Configuration Resources property, and keep the HDFS Filename property value as ${path}/${filename}. ${path} and ${filename} are the attributes that the FetchHDFS processor needs to fetch the file from the HDFS directory; these attributes are added to each flowfile by the ListHDFS processor.
06-05-2018
01:22 AM
@JAy PaTel Let me explain how to handle your case using a shell script.

1. Using a while loop: Create an input file with all the required tables, i.e.

bash$ vi required_tables.txt
mssql_table1
mssql_table2

Then the shell script reads the above file line by line and executes a sqoop import for each table:

bash$ vi sqoop_import_while.sh
while read line;
do
  tableName=`echo $line | cut -d'.' -f2`
  sqoop import --connect "jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name>" --username xxxxx --password xxxx --table $tableName --hive-import --hive-database <hive_database_name> --fields-terminated-by ',' -m 1
done < /home/required_tables.txt    # give your required_tables.txt file path

Now the script reads each line from the required_tables.txt file and assigns the value to the tableName variable, which we then use in the sqoop import statement. Once the import is finished for the first table, the script reads the next line and performs the import again for the second table.

2. Using a for loop:

bash$ vi sqoop_import_for.sh
declare -a req_tables=("mssql_table1" "mssql_table2")
for t in "${req_tables[@]}"
do
  sqoop import --connect "jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name>" --username xxxxx --password xxxx --table $t --hive-import --hive-database <hive_database_name> --fields-terminated-by ',' -m 1
done

With this script we are not reading the table names from a file; instead we define an array variable, iterate through all of its elements, and perform a sqoop import for each one. We define the req_tables array with all the required tables, loop over each table in it, and pass the name to the --table argument of sqoop import. Give execute permission to the shell script file, then execute the script using ./sqoop_import_for.sh (or) ./sqoop_import_while.sh (a quick usage sketch is shown below). You can choose either of the above ways to import only the required tables using sqoop. Let us know if you are facing issues ..!!
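A quick usage sketch, assuming both scripts are saved in the current directory:

chmod +x sqoop_import_while.sh sqoop_import_for.sh   # give execute permission
./sqoop_import_while.sh    # reads table names from required_tables.txt
# or
./sqoop_import_for.sh      # uses the hard-coded req_tables array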