Member since
04-24-2017
106
Posts
13
Kudos Received
7
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1417 | 11-25-2019 12:49 AM |
| 2493 | 11-14-2018 10:45 AM |
| 2246 | 10-15-2018 03:44 PM |
| 2120 | 09-25-2018 01:54 PM |
| 1942 | 08-03-2018 09:47 AM |
09-28-2024
07:03 PM
1 Kudo
Thanks to the second OP for identifying the root cause in the NiFi Jira. For anyone researching this today: the cause was the implicit/default namespace declared on the root element (the 'xmlns' attribute without a prefix). In the second poster's case, the XML started with: <data xmlns="http://www.media-saturn.com/msx" xmlns: ... The `/data/item//uniqueID` he was searching for therefore belongs to the "http://www.media-saturn.com/msx" namespace, meaning he needed to specify that namespace as part of his XPath expression. The reason the pathless "//@uniqueType" worked is that such a search matches the expression across all namespaces.

I'm using NiFi 2.0.0-M4 today and am pleased to report that it appears to support the XPath 3.0/3.1 notation in which the namespace can be specified inline with the query. It's not particularly elegant, but it works: you prefix each step with the capital letter 'Q' and wrap the namespace URI in curly brackets, namely: Q{http://www.media-saturn.com/msx}<single-level selector>

To implement his expression, "/data/item//uniqueID[UniqueType='ProdID']/text()", which currently returns an empty string set for key 'ProdID4', you would use:

/Q{http://www.media-saturn.com/msx}data/Q{http://www.media-saturn.com/msx}item//uniqueID[UniqueType='ProdID']/text()

I suspect the second namespace reference (to 'item' in this case) is not required, since once you've selected and are navigating down the 'data' path of the correct namespace, you're unlikely to jump to another namespace. My research indicates that attributes do not seem to accept namespace referencing, but again, once you've successfully selected your path I suspect that becomes a moot point.

Aside: [1] it would be nice if the NiFi documentation specified the version of XPath implemented within the processor. [2] Even better would be a drop-down within the processor that let a developer select the desired XPath version.
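For anyone stuck on an older NiFi, or wanting to test the expression outside of it, the same namespace-aware lookup can be done with the JDK's built-in XPath 1.0 engine by binding a prefix to the default namespace. This is only a sketch under an assumed XML layout: the <value> child, the ABC-123 value, and the class/method names are invented for illustration, not taken from the poster's actual feed. Note that in this XPath 1.0 form every element step needs the prefix, including UniqueType inside the predicate:

```java
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import org.w3c.dom.Document;

public class NamespaceXPathDemo {

    // Hypothetical minimal document mirroring the poster's structure;
    // the <value> child is a guess at where the ID text lives.
    static final String XML =
        "<data xmlns=\"http://www.media-saturn.com/msx\">"
      + "<item><uniqueID><UniqueType>ProdID</UniqueType>"
      + "<value>ABC-123</value></uniqueID></item></data>";

    static String lookup() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);  // default-namespace matching fails without this
        Document doc = dbf.newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));

        XPath xp = XPathFactory.newInstance().newXPath();
        // The JDK's XPath is 1.0, so Q{...} is unavailable; bind a prefix instead.
        xp.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "m".equals(prefix)
                    ? "http://www.media-saturn.com/msx"
                    : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        // Every element step carries the prefix, UniqueType in the predicate too.
        return xp.evaluate(
            "/m:data/m:item//m:uniqueID[m:UniqueType='ProdID']/m:value/text()", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookup());  // ABC-123
    }
}
```

The prefix itself ('m' here) is arbitrary; only the URI it maps to has to match the document's default namespace exactly.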
04-10-2020
05:14 AM
Is this issue resolved? I am also facing the same issue. Please suggest a solution.
02-11-2020
04:07 AM
Hello @mburgess, suppose you have an input string but don't know what the value might be, e.g. it could be "8" or "8.4", and you want to convert it into a number. Is there a way to convert the "8" to an int 8 and the "8.4" to a float 8.4? Currently I am only able to convert both to ints (8 and 8) or both to floats (8.0 and 8.4). For context, I am using ValidateRecord to validate that the number is an int, so I would not like float values to pass validation. This means that when an input string is converted to a number, I need to know whether it is a decimal or an integer. Are you able to assist, please? Many thanks!
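Outside of NiFi's record machinery, the distinction being asked for is the usual try-integer-first pattern. Sketched here in plain Java with hypothetical names; this is not a NiFi record-path solution, just an illustration of the logic:

```java
public class NumberKind {

    // Hypothetical helper: returns Integer for whole-number strings and
    // Double for decimals, so an int-only schema could reject "8.4".
    static Object parseNumber(String s) {
        try {
            return Integer.valueOf(s);   // "8"   -> Integer 8
        } catch (NumberFormatException e) {
            return Double.valueOf(s);    // "8.4" -> Double 8.4
        }
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("8").getClass().getSimpleName());   // Integer
        System.out.println(parseNumber("8.4").getClass().getSimpleName()); // Double
    }
}
```

A non-numeric input would still throw NumberFormatException from the Double branch, which is the behavior you'd want ahead of a validation step.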
11-25-2019
12:49 AM
1 Kudo
To answer my own question: since I'm using multiple partitions for the Kafka topic, Spark uses more executors to process the data. Likewise, Hive/Tez creates as many worker containers as the topic has partitions.
11-14-2018
10:45 AM
I found the following Java-based solution: using the Dataset.filter method with a FilterFunction: https://spark.apache.org/docs/2.3.0/api/java/index.html?org/apache/spark/sql/Dataset.html So my code now looks like this (sdf is a java.text.SimpleDateFormat, Timestamp is java.sql.Timestamp):

Dataset<Row> dsResult = sqlC.read()
    .format("org.apache.phoenix.spark")
    .option("table", tableName)
    .option("zkUrl", hbaseUrl).load()
    .where("OTHER_COLUMN = " + inputId)
    .filter(row -> {
        // Keep only rows whose TABLE_TS_COL falls inside [dateFrom, dateTo]
        long readTime = row.getTimestamp(row.fieldIndex("TABLE_TS_COL")).getTime();
        long tsFrom = new Timestamp(sdf.parse(dateFrom).getTime()).getTime();
        long tsTo = new Timestamp(sdf.parse(dateTo).getTime()).getTime();
        return readTime >= tsFrom && readTime <= tsTo;
    });
10-15-2018
03:44 PM
Solved it: Phoenix arrays are 1-based, so the following query works: SELECT REGEXP_SPLIT(ROWKEY, ':')[1] as test, count(1) FROM "my_view" GROUP BY REGEXP_SPLIT(ROWKEY, ':')[1]
09-25-2018
01:54 PM
1 Kudo
The problem was solved after changing the MySQL Database URL from jdbc:mysql://xxxx.yyyy/hive?createDatabaseIfNotExist=true to jdbc:mysql://xxxx.yyyy/hive?createDatabaseIfNotExist=true&serverTimezone=Europe/Berlin I found the relevant information here: https://community.hortonworks.com/questions/218023/error-setting-up-hive-on-hdp-265timezone-on-mysql.html
09-14-2018
10:27 AM
@Felix Albani Thank you for your help! Without the LIMIT clause, the Job works perfectly (and in parallel).
09-03-2018
07:15 AM
Thank you! Can you give me some details about this or do you have some helpful links?
08-09-2018
07:19 PM
1. COUNT results in a full table scan, hence the query is slow.
2. A WHERE on the primary key is fast, as it performs a point lookup rather than a scan.
3. A WHERE on any column other than the primary key results in an HBase full table scan.
4. Analysing the table once speeds up count queries, but it does not help a WHERE on a non-primary-key column.