Member since
04-24-2017
106
Posts
13
Kudos Received
7
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1417 | 11-25-2019 12:49 AM |
| 2493 | 11-14-2018 10:45 AM |
| 2246 | 10-15-2018 03:44 PM |
| 2120 | 09-25-2018 01:54 PM |
| 1942 | 08-03-2018 09:47 AM |
09-28-2024
07:03 PM
1 Kudo
Thanks to the second OP for identifying the root cause in the NiFi Jira. For anyone researching this today: the cause was the implicit/default namespace declared on the root element (the 'xmlns' attribute without a prefix). In the second poster's case, the XML started with: <data xmlns="http://www.media-saturn.com/msx" xmlns: ... The `/data/item//uniqueID` he was searching for therefore belongs to the "http://www.media-saturn.com/msx" namespace, meaning he needed to specify that namespace as part of his XPath expression. The reason the pathless "//@uniqueType" worked is that such a search matches the expression across all namespaces.

I'm using NiFi 2.0.0-M4 today and am pleased to report that it appears to support the XPath 3.0/3.1 notation in which the namespace can be specified inline with the query. It's not particularly elegant, but it works: you prefix each step with the capital letter 'Q' and wrap the namespace URI in curly brackets, namely: Q{http://www.media-saturn.com/msx}<single-level selector>

To implement his expression, "/data/item//uniqueID[UniqueType='ProdID']/text()", which currently returns an empty string set for key 'ProdID4', you would use:

/Q{http://www.media-saturn.com/msx}data/Q{http://www.media-saturn.com/msx}item//uniqueID[UniqueType='ProdID']/text()

I suspect the second namespace reference (to 'item' in this case) is not required, since once you've selected and are navigating down the 'data' path of the correct namespace, you're unlikely to jump to another namespace. My research indicates that attributes do not seem to accept namespace referencing, but again, once you've successfully selected your path I suspect that becomes a moot point.

Aside: [1] it would be nice if the NiFi documentation specified the version of XPath implemented within the processor. [2] Even better would be a drop-down within the processor that let a developer select the desired XPath version.
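For anyone stuck on an older NiFi, or wanting to test the expression outside of it, the same namespace-aware lookup can be done with the JDK's built-in XPath 1.0 engine by binding a prefix to the default namespace. This is only a sketch under an assumed XML layout: the <value> child, the ABC-123 value, and the class/method names are invented for illustration, not taken from the poster's actual feed. Note that in this XPath 1.0 form every element step needs the prefix, including UniqueType inside the predicate:

```java
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import org.w3c.dom.Document;

public class NamespaceXPathDemo {

    // Hypothetical minimal document mirroring the poster's structure;
    // the <value> child is a guess at where the ID text lives.
    static final String XML =
        "<data xmlns=\"http://www.media-saturn.com/msx\">"
      + "<item><uniqueID><UniqueType>ProdID</UniqueType>"
      + "<value>ABC-123</value></uniqueID></item></data>";

    static String lookup() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);  // default-namespace matching fails without this
        Document doc = dbf.newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));

        XPath xp = XPathFactory.newInstance().newXPath();
        // The JDK's XPath is 1.0, so Q{...} is unavailable; bind a prefix instead.
        xp.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "m".equals(prefix)
                    ? "http://www.media-saturn.com/msx"
                    : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        // Every element step carries the prefix, UniqueType in the predicate too.
        return xp.evaluate(
            "/m:data/m:item//m:uniqueID[m:UniqueType='ProdID']/m:value/text()", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookup());  // ABC-123
    }
}
```

The prefix itself ('m' here) is arbitrary; only the URI it maps to has to match the document's default namespace exactly.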
04-10-2020
05:14 AM
Is this issue resolved? I am also facing the same issue. Please suggest a solution.
02-11-2020
04:07 AM
Hello @mburgess, suppose you have an input string but don't know what the value might be, e.g. it could be "8" or "8.4", and you want to convert it into a number. Is there a way to convert the "8" to an int 8 and the "8.4" to a float 8.4? Currently I am only able to convert both to ints (8 and 8) or both to floats (8.0 and 8.4). For context, I am using ValidateRecord to validate that the number is an int, so I would not like float values to pass validation. This means that when an input string is converted to a number, I need to know whether it is a decimal or an integer. Are you able to assist, please? Many thanks!
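Outside of NiFi's record machinery, the distinction being asked for is the usual try-integer-first pattern. Sketched here in plain Java with hypothetical names; this is not a NiFi record-path solution, just an illustration of the logic:

```java
public class NumberKind {

    // Hypothetical helper: returns Integer for whole-number strings and
    // Double for decimals, so an int-only schema could reject "8.4".
    static Object parseNumber(String s) {
        try {
            return Integer.valueOf(s);   // "8"   -> Integer 8
        } catch (NumberFormatException e) {
            return Double.valueOf(s);    // "8.4" -> Double 8.4
        }
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("8").getClass().getSimpleName());   // Integer
        System.out.println(parseNumber("8.4").getClass().getSimpleName()); // Double
    }
}
```

A non-numeric input would still throw NumberFormatException from the Double branch, which is the behavior you'd want ahead of a validation step.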
11-25-2019
12:49 AM
1 Kudo
To answer my own question: since I'm using multiple partitions for the Kafka topic, Spark uses more executors to process the data. Likewise, Hive/Tez creates as many worker containers as the topic has partitions.
11-14-2018
10:45 AM
I found the following Java-based solution: using the Dataset.filter method with a FilterFunction: https://spark.apache.org/docs/2.3.0/api/java/index.html?org/apache/spark/sql/Dataset.html So my code now looks like this (sdf is a java.text.SimpleDateFormat, Timestamp is java.sql.Timestamp):

Dataset<Row> dsResult = sqlC.read()
    .format("org.apache.phoenix.spark")
    .option("table", tableName)
    .option("zkUrl", hbaseUrl).load()
    .where("OTHER_COLUMN = " + inputId)
    .filter(row -> {
        // Keep only rows whose TABLE_TS_COL falls inside [dateFrom, dateTo]
        long readTime = row.getTimestamp(row.fieldIndex("TABLE_TS_COL")).getTime();
        long tsFrom = new Timestamp(sdf.parse(dateFrom).getTime()).getTime();
        long tsTo = new Timestamp(sdf.parse(dateTo).getTime()).getTime();
        return readTime >= tsFrom && readTime <= tsTo;
    });
10-15-2018
03:44 PM
Solved it: Phoenix arrays are 1-based, so the following query works: SELECT REGEXP_SPLIT(ROWKEY, ':')[1] as test, count(1) FROM "my_view" GROUP BY REGEXP_SPLIT(ROWKEY, ':')[1]
09-25-2018
01:54 PM
1 Kudo
The problem was solved after changing the MySQL Database URL from jdbc:mysql://xxxx.yyyy/hive?createDatabaseIfNotExist=true to jdbc:mysql://xxxx.yyyy/hive?createDatabaseIfNotExist=true&serverTimezone=Europe/Berlin I found the relevant information here: https://community.hortonworks.com/questions/218023/error-setting-up-hive-on-hdp-265timezone-on-mysql.html
09-14-2018
10:27 AM
@Felix Albani Thank you for your help! Without the LIMIT clause, the Job works perfectly (and in parallel).
09-03-2018
07:15 AM
Thank you! Can you give me some details about this or do you have some helpful links?
08-09-2018
07:19 PM
1. COUNT results in a full table scan, hence the query is slow.
2. A WHERE on the primary key is fast, as it performs a point lookup rather than a scan.
3. A WHERE on any column other than the primary key results in an HBase full table scan.
4. Analysing the table once speeds up count queries, but it does not help a WHERE on a non-primary-key column.