Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 11983 | 04-15-2020 05:01 PM |
|  | 7950 | 10-15-2019 08:12 PM |
|  | 3592 | 10-12-2019 08:29 PM |
|  | 12972 | 09-21-2019 10:04 AM |
|  | 4838 | 09-19-2019 07:11 AM |
07-04-2019
03:09 AM
@Matt Field From the NiFi ExtractText docs: "The first capture group, if any found, will be placed into that attribute name. But all capture groups, including the matching string sequence itself, will also be provided at that attribute name with an index value provided." This is expected behaviour from NiFi: because your regular expression contains a capture group, the ExtractText processor also adds attribute names with an index value. For consistency, reference the attribute value as ${myattribute} without the index value (an illustrative example follows at the end of this answer). If the answer helps resolve the issue, log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂
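As a hypothetical illustration (the property name, regular expression, and content below are examples, not taken from this thread), an ExtractText dynamic property

myattribute = order=(\d+)

applied to flowfile content order=42 produces the attributes

myattribute   = 42        (first capture group)
myattribute.0 = order=42  (the entire match)
myattribute.1 = 42        (capture group 1)

so referencing ${myattribute} downstream gives you the captured value without depending on an index suffix.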
06-26-2019
07:29 PM
1 Kudo
Try the below Jolt spec:

[{
"operation": "shift",
"spec": {
"id": "ID",
"nummer": "Nummer",
"table": {
"*": {
"zn": "ArtikelPreise_Pos.[#2].ZeileNr",
"stfflbisart": "ArtikelPreise_Pos.[#2].StaffelBis"
}
}
}
}, {
"operation": "default",
"spec": {
"Default_Kopf": "${VAR_KD}",
"ArtikelPreise_Pos[]": {
"*": {
"Default_Kopf": "${DFT_POS}"
}
}
}
}
]

Output:

{
"ID" : "177",
"Nummer" : "22",
"ArtikelPreise_Pos" : [ {
"ZeileNr" : 1,
"StaffelBis" : 10,
"Default_Kopf" : "${DFT_POS}"
}, {
"ZeileNr" : 2,
"StaffelBis" : 50,
"Default_Kopf" : "${DFT_POS}"
} ],
"Default_Kopf" : "${VAR_KD}"
}

I hope this matches your expected output (a reconstructed sample input is shown at the end of this answer). If the answer helps resolve the issue, log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂
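For reference, an input document that would produce the output above, reconstructed from the spec and output (the original input was not shown in this excerpt; the "table" entries could equally be keyed objects rather than an array):

{
  "id": "177",
  "nummer": "22",
  "table": [
    { "zn": 1, "stfflbisart": 10 },
    { "zn": 2, "stfflbisart": 50 }
  ]
}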
06-25-2019
05:51 PM
1 Kudo
@Rupak Dum

1. Do I need to upload the .sh script to a folder within HDFS?
No need to upload the .sh script to HDFS. If you do upload the script to HDFS, then follow this link to execute a shell script from HDFS.

2. How do I set up the permissions for the script so that it runs successfully?
You are running the Sqoop import as the root user; in this case you need to change the permissions of the /user directory in HDFS. Refer to this and this link for similar threads.

3. How do I execute the .sh file from within HDFS so that I do not get the permission-denied error?
Change the permissions of the /user HDFS directory to 700 (or 777); then you won't get any permission issues (a command sketch follows below).
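A minimal sketch of the permission change for question 3, assuming the HDFS superuser is named hdfs and that opening /user up this far is acceptable in your environment (adjust the mode and path to your setup):

sudo -u hdfs hdfs dfs -chmod 777 /user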
06-20-2019
08:37 PM
@Amrutha K This is a known issue in Spark, reported in the Jira SPARK-24260 and not yet resolved. One way of handling it is to execute one query at a time, i.e. after reading the .hql file we can access the array of statements by index (0), (1) (here spark is the SparkSession and sc its SparkContext):

val df1 = spark.sql(sc.textFile("/user/temp/hive.hql").collect().mkString.split(";")(0))
val df2 = spark.sql(sc.textFile("/user/temp/hive.hql").collect().mkString.split(";")(1))

(or) If you just want to execute the queries and see the results on the console, then try this approach:

sc.textFile("/user/temp/hive.hql").collect().mkString.split(";").map(x => spark.sql(x).show())

Now we are executing all queries in the .hql script and displaying the results on the console.
06-19-2019
01:52 AM
@Jayashree S Use a RouteOnAttribute processor after the ListS3 processor, filter only the required file, and pass it to FetchS3Object (a sample routing property is sketched below).

Flow: ListS3 -> RouteOnAttribute -> FetchS3Object

(or) If you want to pull the same file from S3 every time, then you can use this flow:

GenerateFlowFile //schedule this processor as per your requirements
FetchS3Object //configure the full S3 object path
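As a hypothetical example (the property name and filename are assumptions, not from this thread), RouteOnAttribute can be given a dynamic routing property such as

required_file : ${filename:equals('data.csv')}

FlowFiles whose filename attribute (written by ListS3) matches are routed to the required_file relationship, which you connect to FetchS3Object; the unmatched relationship can be auto-terminated.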
06-19-2019
01:21 AM
@Bill Miller Try a series of SplitRecord processors to create smaller chunks of files; a rough configuration sketch is shown below. Follow the approach described in this thread and see whether you get better performance with it.
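A rough configuration sketch of the idea, assuming record-oriented data with Record Reader/Writer controller services already set up (the split sizes are arbitrary examples, not recommendations):

SplitRecord #1 - Records Per Split: 100000 (first pass: large chunks)
SplitRecord #2 - Records Per Split: 10000 (second pass: smaller chunks)

Splitting in stages avoids producing a huge number of tiny FlowFiles from a single large file in one step.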
06-19-2019
01:00 AM
@Sampath Kumar The Hive Timestamp type accepts the format yyyy-MM-dd HH:mm:ss[.SSS]:

hive> select timestamp("2019-06-15 15:43:12");
2019-06-15 15:43:12
hive> select timestamp("2019-06-15 15:43:12.988");
2019-06-15 15:43:12.988
hive> select timestamp("2019-06-15T15:43:12");
NULL

If you want the timestamp type rather than a text-format table, use the from_unixtime and unix_timestamp functions to remove the "T" from the data; then you can have the timestamp type for all formats (a conversion sketch follows at the end of this answer). If the answer helps resolve the issue, log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂
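A minimal sketch of that conversion, assuming the raw value containing the "T" sits in a string column named ts_str (the column and table names are hypothetical):

hive> select from_unixtime(unix_timestamp(ts_str, "yyyy-MM-dd'T'HH:mm:ss")) from my_table;

For example, '2019-06-15T15:43:12' becomes '2019-06-15 15:43:12', which Hive can then store or cast as a timestamp.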
06-18-2019
12:58 AM
1 Kudo
@Carlos Try the below spec:

[{
"operation": "shift",
"spec": {
"*": "data.&",
"ID": ["ID", "data.ID"]
}
}, {
"operation": "default",
"spec": {
"dataset": "${dataset:toLower()}",
"date": "${date}"
}
}]

Output:

{
"ID" : "123",
"data" : {
"ID" : "123",
"Text1" : "aaa",
"Text2" : "aaa",
"Text3" : "aaa"
},
"date" : "${date}",
"dataset" : "${dataset:toLower()}"
}
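For reference, the input that presumably produced this output, reconstructed from the spec and output above (the actual input was not shown in this excerpt):

{
  "ID": "123",
  "Text1": "aaa",
  "Text2": "aaa",
  "Text3": "aaa"
}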
06-16-2019
11:02 PM
1 Kudo
@Sampath Kumar

ALTER TABLE table SET SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss");

works only for text-format and CSV-format tables. If the table uses another format, such as ORC, then setting the serde properties is not going to work.

Tested by creating a text-format table.

Data:
1,2019-06-15T15:43:12
2,2019-06-15T15:43:19

create table i(id int, ts timestamp) row format delimited fields terminated by ',' stored as textfile;
ALTER TABLE i SET SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss");
select * from i;
1   2019-06-15 15:43:12
2   2019-06-15 15:43:19

If the file is ORC with the 2019-06-15T15:43:12 format, altering the serde properties still results in NULL for the timestamp field.
06-15-2019
02:36 PM
1 Kudo
@Jayashree S ListS3 is a stateful processor: once the processor runs, it stores state and from then on runs incrementally, so if no new files have been added to the S3 directory, the processor won't list any files.

How to clear the state: Stop the ListS3 processor, right-click on it, select View State, and clear the state that is saved in the processor. Then start the ListS3 processor again; now it will list all the files in the S3 directory.