Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 11160 | 04-15-2020 05:01 PM |
|  | 7065 | 10-15-2019 08:12 PM |
|  | 3080 | 10-12-2019 08:29 PM |
|  | 11328 | 09-21-2019 10:04 AM |
|  | 4233 | 09-19-2019 07:11 AM |
12-04-2017
04:10 PM
@Julià Delos, The issue is that the 2 output files have the same filename. To resolve it, change the filename in an UpdateAttribute processor by adding a property named filename with the value
${UUID()}
Configs:- We are changing the flowfile's filename to a UUID; since UUIDs are unique, using one as the filename means you won't hit any conflicts in the PutFile processor.
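A minimal sketch of the UpdateAttribute property (the UUID shown is just an example value; each flowfile gets its own):
filename = ${UUID()}
//e.g. the flowfile's filename becomes something like 0d4f2c1a-7e3b-4f7e-9a1c-2b5d8e6f3a90, so PutFile never sees two files with the same name.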
12-04-2017
01:42 AM
@Julià Delos You need to split the JSON array first by using a SplitJson processor with the JsonPath Expression property set to $.* Configs:- Input:-
[
{
"id": "6935895746",
"type": "PushEvent",
"actor": {
"id": 32568916,
"login": "bigajwiktoria",
"display_login": "bigajwiktoria",
"gravatar_id": "",
"url": "https://api.github.com/users/bigajwiktoria",
"avatar_url": "https://avatars.githubusercontent.com/u/32568916?"
}
},
{
"id": "6935895745",
"type": "PushEvent",
"actor": {
"id": 463230,
"login": "taylorotwell",
"display_login": "taylorotwell",
"gravatar_id": "",
"url": "https://api.github.com/users/taylorotwell",
"avatar_url": "https://avatars.githubusercontent.com/u/463230?"
}
}
]
Output:- Since the above array contains 2 messages, the SplitJson processor produces 2 flowfiles.
ff1:-
{
"id": "6935895746",
"type": "PushEvent",
"actor": {
"id": 32568916,
"login": "bigajwiktoria",
"display_login": "bigajwiktoria",
"gravatar_id": "",
"url": "https://api.github.com/users/bigajwiktoria",
"avatar_url": "https://avatars.githubusercontent.com/u/32568916?"
}
}
ff2:-
{
"id": "6935895745",
"type": "PushEvent",
"actor": {
"id": 463230,
"login": "taylorotwell",
"display_login": "taylorotwell",
"gravatar_id": "",
"url": "https://api.github.com/users/taylorotwell",
"avatar_url": "https://avatars.githubusercontent.com/u/463230?"
}
}
Then use an EvaluateJsonPath processor:- this processor extracts the JSON message values into flowfile attributes. Change the Destination property to flowfile-attribute and add these properties:
actor-login: $.actor.login
id: $.id
type: $.type
Configs:-
ReplaceText processor:- now we replace the entire JSON message with the extracted attributes in a ReplaceText processor.
Search Value: (?s)(^.*$)
Replacement Value: "${id}","${type}","${actor-login}"
Character Set: UTF-8
Replacement Strategy: Always Replace
Evaluation Mode: Entire text
Configs:-
Now your flowfile will have the required output as its content.
Flow:- SplitJson (split relation) //splits the JSON array into individual messages --> EvaluateJsonPath (matched relation) //extracts the required values from each JSON message and adds them as flowfile attributes --> ReplaceText (success relation) //replaces the flowfile content with the required values
If this answer addressed your question, click the Accept button below; that would be a great help to community users facing similar issues.
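For example, after ReplaceText the content of ff1 above becomes a single CSV line built from the extracted attributes, and ff2 likewise (shown only to illustrate the end result):
"6935895746","PushEvent","bigajwiktoria"
"6935895745","PushEvent","taylorotwell"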
12-03-2017
03:52 PM
1 Kudo
@VINAYAK DORNALA If you don't want to pass the last value on every run, create a sqoop job: the sqoop job stores the last value in its metastore and passes it automatically when the job runs again.
--incremental lastmodified:-
Creating the sqoop job:
sqoop job --create test -- import --connect 'jdbc:mysql://quickstart:3306/retail_db' --username retail_dba --password cloudera --table orders --split-by order_id --target-dir /user/sqoop/orders --check-column order_date --merge-key order_id --incremental lastmodified --as-textfile
sqoop job --list //list the created sqoop jobs
sqoop job --exec test //execute the sqoop job
sqoop job --delete test //delete the sqoop job
lastmodified mode works with a merge key: it compares the old data already in the directory with the new data, and if the same merge key appears in the old data, sqoop merges the old record with the new one in the reducer phase and writes the merged result to the target directory. When you create a sqoop job, the import runs incrementally: the first run imports all the data and stores the last value in the metastore, and subsequent runs use that stored last value to import only the newly added data. lastmodified with the --merge-key argument works with an existing target directory; if you don't specify --merge-key, sqoop throws an error when the target directory already exists.
--incremental append:-
sqoop job --create test -- import --connect 'jdbc:mysql://quickstart:3306/retail_db' --username retail_dba --password cloudera --table orders --split-by order_id --target-dir /user/sqoop/orders --check-column order_date --incremental append --as-textfile
Append mode works with an existing directory (if the directory does not exist, it is created) and stores the last-value state in the metastore. On the second run the job imports only the data added after that last value and creates a new file in the target directory. In this mode only mappers run, because no merge key is specified.
You can decide which mode best fits your use case.
Note:- there is a "--" and a space before "import" when creating a sqoop job. Use an options file to store the credentials (user name, password, and connect string) and pass it as a parameter to --options-file. Creating an options file in sqoop is sketched below.
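A minimal sketch of an options file (the path /home/cloudera/sqoop_opts.txt is just a placeholder I picked; put the file wherever suits you and lock down its permissions):
bash# cat /home/cloudera/sqoop_opts.txt
--connect
jdbc:mysql://quickstart:3306/retail_db
--username
retail_dba
--password
cloudera
Then reference it instead of typing the credentials inline, for example:
sqoop import --options-file /home/cloudera/sqoop_opts.txt --table orders --split-by order_id --target-dir /user/sqoop/orders --check-column order_date --incremental append --as-textfile
Each option and each value goes on its own line in the options file, and lines starting with # are treated as comments.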
12-03-2017
02:13 AM
2 Kudos
@VINAYAK DORNALA In your sqoop import you are using --merge-key order_id. Whenever you use the merge-key argument, the sqoop import runs a MapReduce job that takes two directories as input: a newer dataset and an older one. The output of the MapReduce job is placed in the HDFS directory specified by --target-dir. Sqoop compares the new data with the old existing data (part-m-000003); if a record with the same order_id is present, sqoop merges it with the newly imported data and creates a new part file in the target directory. You can see that a part-r-00000 file was created after the sqoop import with the merge key: "part-r" means a reducer created the file. When we do a sqoop import without the merge-key argument, the file names in the target dir look like part-m, meaning a mapper created them. This is expected behavior from sqoop: when we specify the merge-key argument, it compares the existing data with the new data, merges records with the same order_id in the reducer phase, and writes new part-r files, removing all the existing files (sqoop checks whether any order_id from the new data is present in the existing part-m files and replaces them with the merged part-r output).
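To picture it, the target directory listing would change roughly like this (the /user/sqoop/orders path and file names are illustrative, not taken from your cluster):
Before (plain import, mapper output only):
bash# hdfs dfs -ls /user/sqoop/orders
/user/sqoop/orders/part-m-00000
/user/sqoop/orders/part-m-00001
After an import with --merge-key order_id (old part-m files removed, merged reducer output written):
bash# hdfs dfs -ls /user/sqoop/orders
/user/sqoop/orders/part-r-00000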
12-01-2017
06:47 PM
@Pallavi Ab, I think the issue is with hdfs-site.xml and core-site.xml. Use the XMLs from /usr/hdp/2.4.2.0-258/hadoop/conf instead of the /usr/hdp/2.4.2.0-258/etc/hadoop/conf.empty directory:
/usr/hdp/2.4.2.0-258/hadoop/conf/hdfs-site.xml
/usr/hdp/2.4.2.0-258/hadoop/conf/core-site.xml
Copy them to another directory and point the Hadoop Configuration Resources property of the GetHDFS processor at those copies.
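A minimal sketch of that copy step (the /tmp/nifi-conf destination is just a placeholder I chose; any directory the NiFi user can read will do):
bash# mkdir -p /tmp/nifi-conf
bash# cp /usr/hdp/2.4.2.0-258/hadoop/conf/hdfs-site.xml /usr/hdp/2.4.2.0-258/hadoop/conf/core-site.xml /tmp/nifi-conf/
Then set the Hadoop Configuration Resources property of GetHDFS to:
/tmp/nifi-conf/core-site.xml,/tmp/nifi-conf/hdfs-site.xml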
12-01-2017
06:08 PM
@Pallavi Ab As per your logs:
Caused by: java.io.IOException: PropertyDescriptor PropertyDescriptor[Directory] has invalid value /user/cmor/kinetica/files/sample_b.csv. The directory does not exist.
Can you check whether that directory and file exist in HDFS by using the commands below?
bash# hdfs dfs -test -d /user/cmor/kinetica/files
bash# echo $?
bash# hdfs dfs -test -e /user/cmor/kinetica/files/sample_b.csv
bash# echo $?
//if echo returns 0, the file or directory exists
//if echo returns 1, the file or directory does not exist
Make sure the path in the Directory property is correct and run the processor again.
Usage of the hdfs test command:
bash# hdfs dfs -test -[defsz] <hdfs-path>
Options:
-d: if the path is a directory, return 0.
-e: if the path exists, return 0.
-f: if the path is a file, return 0.
-s: if the path is not empty, return 0.
-z: if the file is zero length, return 0.
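For example, if the parent directory exists but sample_b.csv does not (an assumed state, shown only to illustrate reading the exit codes), the checks would look like:
bash# hdfs dfs -test -d /user/cmor/kinetica/files
bash# echo $?
0
bash# hdfs dfs -test -e /user/cmor/kinetica/files/sample_b.csv
bash# echo $?
1
Also note that in the quoted error the Directory property points at the file itself (/user/cmor/kinetica/files/sample_b.csv) rather than its parent folder, which may itself be why GetHDFS reports that the directory does not exist.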
12-01-2017
05:37 PM
@Pallavi Ab First, make sure your file is in the directory and that NiFi has permissions on the directory. I am not sure about your GetHDFS configuration; take a look at the configs below and configure your processor the same way as shown in the screenshot. Configs:- The important property is Keep Source File; configure it as per your needs.
Keep Source File
Default Value: false
Allowable Values: true, false
Description: Determines whether to delete the file from HDFS after it has been successfully transferred. If true, the file will be fetched repeatedly. This is intended for testing only.
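A minimal sketch of the GetHDFS properties involved (the values below are assumptions based on the paths discussed earlier in this thread; adjust them to your environment):
Hadoop Configuration Resources : /tmp/nifi-conf/core-site.xml,/tmp/nifi-conf/hdfs-site.xml
Directory : /user/cmor/kinetica/files
Recurse Subdirectories : false
Keep Source File : true //true while testing, so the source file is not deleted from HDFS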
11-29-2017
04:45 PM
3 Kudos
@balalaika Use a SplitJson processor with the following configs:- Input JSON:-
{
"objectIdFieldName": "ID",
"objectIds": [
64916,
67266,
67237,
64511]
}
Output:- one flowfile per objectId: 64916, 67266, 67237, 64511.
Use an ExtractText processor:- extract the content of the flowfile by adding a new property named id with the value (.*). The flowfile will then have an id attribute, which you can use in the InvokeHTTP processor.
Flow:- GetHTTP (gets the response JSON) --> SplitJson --> ExtractText --> InvokeHTTP (new query per ID)
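A minimal sketch of how the pieces line up for this input (the SplitJson expression and the InvokeHTTP URL below are my assumptions for illustration, since the actual values are in the screenshot):
SplitJson: JsonPath Expression = $.objectIds
ExtractText: id = (.*)
InvokeHTTP: Remote URL = http://your-service/query?objectId=${id}
For the first split flowfile the content is 64916, so ExtractText sets the id attribute to 64916 and InvokeHTTP calls .../query?objectId=64916.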
11-24-2017
02:22 PM
2 Kudos
@Mohammad Shamim
Do a desc formatted on the table name; this command displays whether the table is External or Managed, and the location of the table.
hive# desc formatted <db-name>.<db-table-name>;
Check the size of the database:-
bash# hdfs dfs -count -h -v <hdfs-location>
Example:- In the above screenshot you can see I ran desc formatted on the devrabbit table. The table type is Managed and the location of the table is /user/hdfs/hive. If you want to find the size of that location, run
bash# hdfs dfs -count -h -v /user/hdfs/hive
It will display the size of the directory.
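Illustrative output of the count command (the numbers are made up; only the column layout is what to expect, since -v prints the header and -h prints human-readable sizes):
bash# hdfs dfs -count -h -v /user/hdfs/hive
   DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
          12          340              2.4 G /user/hdfs/hive
CONTENT_SIZE is the total size of the data under the directory.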
11-23-2017
12:58 PM
@Mohamed Hossam
I think you are missing a space in the Search Value property. Use one of the regexes below in the Search Value property:
^(.*?) (.*?) IP (.*?) > (.*?) .*$
(or)
([^\s]+)\s([^\s]+)\sIP\s(.*)\s>\s([^\s]+).*
Either regex will work. Config:-
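To show what the capture groups pick up, here is the first regex applied to an assumed tcpdump-style line (the sample line is made up for illustration; your real input may differ):
Input: 2017-11-23 14:21:07.306670 IP 172.16.0.5.55404 > 172.16.0.9.443: Flags [P.], length 120
$1 = 2017-11-23
$2 = 14:21:07.306670
$3 = 172.16.0.5.55404
$4 = 172.16.0.9.443: (the trailing colon is included because the groups split on spaces)
These back references ($1, $2, ...) can then be used in the Replacement Value property.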