Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11258 | 04-15-2020 05:01 PM |
| | 7161 | 10-15-2019 08:12 PM |
| | 3140 | 10-12-2019 08:29 PM |
| | 11592 | 09-21-2019 10:04 AM |
| | 4372 | 09-19-2019 07:11 AM |
06-07-2018
10:22 AM
@Satya Nittala Use the Hive substr function and specify the start/end positions you need for each field:

hive> select substr('hdfs://servername/sds/sds/erg/rownum=123/columnnumber=456',0,30) Location,
             substr('hdfs://servername/sds/sds/erg/rownum=123/columnnumber=456',31) PartitionFields;
+---------------------------------+------------------------------+--+
|            location             |       partitionfields        |
+---------------------------------+------------------------------+--+
| hdfs://servername/sds/sds/erg/  | rownum=123/columnnumber=456  |
+---------------------------------+------------------------------+--+

Please refer to this link for more details on the Hive string functions documentation. If the answer helped to resolve your issue, click the Accept button below to accept the answer; that helps community users find solutions for these kinds of issues quickly.
06-07-2018
10:08 AM
2 Kudos
@Vladislav Shcherbakov
You need to use the PutDatabaseRecord processor to push data to a SQL database from NiFi. Configure/enable a Record Reader controller service (with an Avro schema that matches the incoming data) that can read the incoming data, plus a database connection pool, then fill in all the details about your target table in SQL Server. Refer to this link for more details regarding the configs/usage of the PutDatabaseRecord processor, and to this link for how to configure the Record Reader controller service.
06-06-2018
04:02 PM
@RAUI
It depends on how your .bashrc file is configured: if the PATH variable in your .bashrc includes the directory that your script is in, then you don't need to specify bash in the Command Path argument.
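For reference, a minimal sketch of that kind of .bashrc entry, assuming the script lives in /home/user/scripts (the path is only an example):

# ~/.bashrc -- append the directory that holds your script to PATH (example path)
export PATH="$PATH:/home/user/scripts"
# after reloading the shell (source ~/.bashrc), the script can be invoked by name alone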
06-06-2018
11:19 AM
1 Kudo
@aman mittal
If you are using any of the split processors other than SplitRecord to split the flowfile into smaller chunks, then each flowfile will have a fragment.index attribute associated with it. Since you already have the table name as an attribute on the flowfile, make use of these attributes (table_name and fragment.index) and combine them into one to create the new attribute you need. Assuming tab_name is the table-name attribute, add a new property in the UpdateAttribute processor. In addition, if you want to keep this attribute unique, you can append the timestamp value at the end, like: new_attribute ${tab_name}_${fragment.index}_${now():toNumber()}. Based on the fragment.index and tab_name attribute values, the new attribute value is created dynamically.
06-06-2018
10:14 AM
@RAUI If you need to trigger the Spark application only after the files have been ingested into HDFS, then using the ExecuteStreamCommand processor is the correct approach, as this processor accepts an incoming connection and can trigger Spark applications. If you are storing more than one file into HDFS, it would be better to use a MergeContent processor after the PutHDFS processor. Configure the MergeContent processor to wait for a minimum number of entries (or use the Max Bin Age property, etc.), because if you connect the success relationship from PutHDFS directly to ExecuteStreamCommand, the application will be triggered from NiFi as soon as the first file is written to HDFS, without waiting for all the files to be stored in the HDFS directory. Triggering the shell script using the ExecuteStreamCommand processor configs:
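As a rough sketch of what such a trigger script could look like (the script name, application jar, class, and HDFS path are assumptions):

#!/bin/bash
# run_spark_job.sh -- example wrapper that ExecuteStreamCommand invokes
# once MergeContent has confirmed that all files have landed in HDFS
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.IngestJob \
  /opt/jobs/ingest-job.jar hdfs:///data/landing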
06-05-2018
11:23 PM
1 Kudo
@Raghu VN Since you are using Windows OS, the specified path (with forward slashes) works fine for a single file in Single file tailing mode. For Multiple files mode, however, the processor is not able to resolve the regex you are specifying in File(s) to Tail. I have recreated your scenario, and with the configs below I am able to tail multiple files. Change the configs in your TailFile processor to keep two backslashes (\\) in the File(s) to Tail and Base directory property values:

Tailing mode: Multiple files
File(s) to Tail: test[1-3]\\flowlog.csv
Rolling Filename Pattern: flowlog.csv
Base directory: C:\\cw-data
Initial Start Position: Beginning of File
State Location: Local
Recursive lookup: true
Lookup frequency: 10 minutes
Maximum age: 24 hours

If you are using Linux, directory paths use forward slashes (/) and work without any issues, but on Windows directory paths use backslashes, so we need to keep two backslashes (\\) to get the correct directory path.
06-05-2018
10:18 PM
@Raja M The issue is probably with the Region configuration you have used in the PutSNS processor configs. Check which region the Topic/Target ARN points to, then change the Region property value accordingly.
06-05-2018
10:14 AM
@Yu-An Chen The issue is that the relationships feeding from the FetchHDFS to the MergeContent processor are comms.failure and failure; please use the success relationship to feed the MergeContent processor. Flow: please change the feeding relationships as per the above screenshot. In addition, save and upload this template to your instance for more reference and configure the MergeContent/PutSFTP processors: order-files-merge-content-194166.xml
06-05-2018
02:44 AM
@Yu-An Chen Keep the FetchHDFS processor configs as below. Give the core-site.xml and hdfs-site.xml paths in the Hadoop Configuration Resources property, and keep the HDFS Filename property value as ${path}/${filename}. ${path} and ${filename} are the attributes that the FetchHDFS processor needs to fetch the file from the HDFS directory; these attributes are added to each flowfile by the ListHDFS processor.
06-05-2018
01:22 AM
@JAy PaTel Let me explain how to handle your case using a shell script.

1. Using a while loop: Create an input file with all the required tables, i.e.

bash$ vi required_tables.txt
mssql_table1
mssql_table2

Then the shell script reads the above file line by line and executes a sqoop import for each table:

bash$ vi sqoop_import_while.sh
while read line;
do
  tableName=`echo $line | cut -d'.' -f2`
  sqoop import --connect "jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name>" --username xxxxx --password xxxx --table $tableName --hive-import --hive-database <hive_database_name> --fields-terminated-by ',' -m 1
done < /home/required_tables.txt    # give your required_tables.txt file path

Now the script reads each line from the required_tables.txt file and assigns the value to the tableName variable, which we then use in the sqoop import statement. Once the import is finished for the first table, the script reads the next line and performs the import again for the second table.

2. Using a for loop:

bash$ vi sqoop_import_for.sh
declare -a req_tables=("mssql_table1" "mssql_table2")
for t in "${req_tables[@]}"
do
  sqoop import --connect "jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name>" --username xxxxx --password xxxx --table $t --hive-import --hive-database <hive_database_name> --fields-terminated-by ',' -m 1
done

With this script we are not reading the table names from a file; instead we define an array variable, iterate through all of its elements, and perform a sqoop import for each one. We define the req_tables array with all the required tables, loop over each table in it, and pass the name to the --table argument of sqoop import. Give execute permission to the shell script file, then execute the script using ./sqoop_import_for.sh (or) ./sqoop_import_while.sh (a quick usage sketch is shown below). You can choose either of the above ways to import only the required tables using sqoop. Let us know if you are facing issues ..!!
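A quick usage sketch, assuming both scripts are saved in the current directory:

chmod +x sqoop_import_while.sh sqoop_import_for.sh   # give execute permission
./sqoop_import_while.sh    # reads table names from required_tables.txt
# or
./sqoop_import_for.sh      # uses the hard-coded req_tables array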