Member since: 04-24-2017
Posts: 106
Kudos Received: 13
Solutions: 7
My Accepted Solutions
Views | Posted
---|---
1136 | 11-25-2019 12:49 AM
2156 | 11-14-2018 10:45 AM
1960 | 10-15-2018 03:44 PM
1742 | 09-25-2018 01:54 PM
1423 | 08-03-2018 09:47 AM
04-10-2020 05:14 AM
Has this issue been resolved? I am also facing the same issue. Please suggest a solution.
02-11-2020 04:07 AM
Hello @mburgess, suppose you have an input string whose value is not known in advance, e.g. it could be "8" or "8.4", and you want to convert it into a number. Is there a way to convert "8" to the int 8 and "8.4" to the float 8.4? Currently I can only convert both to ints (8 and 8) or both to floats (8.0 and 8.4). For context, I am using ValidateRecord to validate that the number is an int, so float values should not pass validation. This means that when an input string is converted into a number, I need to know whether it is a decimal or an integer. Could you please assist? Many thanks!
11-25-2019 12:49 AM
1 Kudo
To answer my own question: Since I'm using multiple partitions for the Kafka topic, Spark uses more executors to process the data. Also Hive/Tez creates as many worker containers as the topic contains partitions.
11-14-2018 10:45 AM
I found a Java-based solution that works for me: using the Dataset.filter method with a FilterFunction: https://spark.apache.org/docs/2.3.0/api/java/index.html?org/apache/spark/sql/Dataset.html

So, my code now looks like this:

    Dataset<Row> dsResult = sqlC.read()
        .format("org.apache.phoenix.spark")
        .option("table", tableName)
        .option("zkUrl", hbaseUrl).load()
        .where("OTHER_COLUMN = " + inputId)
        // Keep only rows whose timestamp lies within [dateFrom, dateTo], inclusive.
        .filter(row -> {
            long readTime = row.getTimestamp(row.fieldIndex("TABLE_TS_COL")).getTime();
            // sdf is a date format (e.g. SimpleDateFormat) defined outside this snippet.
            long tsFrom = sdf.parse(dateFrom).getTime();
            long tsTo = sdf.parse(dateTo).getTime();
            return readTime >= tsFrom && readTime <= tsTo;
        });
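
Since the two bounds do not depend on the row, one option is to parse them once up front so that the filter only compares long values. Below is a minimal, Spark-free sketch of that comparison. The class name `TimestampRange`, its method names, the date pattern `yyyy-MM-dd HH:mm:ss`, and the UTC time zone are all assumptions for illustration; the post does not show how `sdf` is configured.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical helper, not part of Spark or Phoenix.
final class TimestampRange {
    private final long fromMillis;
    private final long toMillis;

    TimestampRange(long fromMillis, long toMillis) {
        this.fromMillis = fromMillis;
        this.toMillis = toMillis;
    }

    // Parses both bounds once. The pattern and the UTC time zone are assumptions.
    static TimestampRange parse(String dateFrom, String dateTo) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return new TimestampRange(sdf.parse(dateFrom).getTime(),
                                      sdf.parse(dateTo).getTime());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date bound", e);
        }
    }

    // Inclusive on both ends, like the comparison in the filter above.
    boolean contains(long readTimeMillis) {
        return readTimeMillis >= fromMillis && readTimeMillis <= toMillis;
    }
}
```

Inside a Spark lambda it is probably simplest to capture the two parsed long values directly, since anything the lambda captures has to be serializable.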
10-15-2018 03:44 PM
Solved it: Phoenix arrays are 1-based, so the following query works:

    SELECT REGEXP_SPLIT(ROWKEY, ':')[1] as test, count(1) FROM "my_view" GROUP BY REGEXP_SPLIT(ROWKEY, ':')[1]
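
For anyone coming from Java, where arrays are 0-based, the off-by-one can be pictured with a small sketch. This is only an illustration of the index mapping, not Phoenix code; the class name `OneBasedIndex`, the sample row key `abc:123:xyz`, and the choice to return null for an out-of-range position are assumptions made for the example.

```java
// Hypothetical illustration of 1-based versus 0-based indexing; not Phoenix code.
final class OneBasedIndex {
    // Returns the element at a 1-based position, or null when out of range
    // (returning null is a choice made for this sketch).
    static String elementAt(String[] parts, int oneBasedIndex) {
        int i = oneBasedIndex - 1;
        return (i >= 0 && i < parts.length) ? parts[i] : null;
    }

    public static void main(String[] args) {
        String[] parts = "abc:123:xyz".split(":");
        System.out.println(elementAt(parts, 1)); // first segment, i.e. parts[0]
        System.out.println(elementAt(parts, 0)); // no element at position 0
    }
}
```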
09-25-2018 01:54 PM
1 Kudo
The problem was solved after changing the MySQL database URL from
jdbc:mysql://xxxx.yyyy/hive?createDatabaseIfNotExist=true
to
jdbc:mysql://xxxx.yyyy/hive?createDatabaseIfNotExist=true&serverTimezone=Europe/Berlin
I found the relevant information here: https://community.hortonworks.com/questions/218023/error-setting-up-hive-on-hdp-265timezone-on-mysql.html
09-14-2018 10:27 AM
@Felix Albani Thank you for your help! Without the LIMIT clause, the Job works perfectly (and in parallel).
09-03-2018 07:15 AM
Thank you! Can you give me some details about this or do you have some helpful links?
08-09-2018 07:19 PM
1. COUNT results in a full table scan, which is why the query is slow.
2. A WHERE on the primary key is fast because it performs a lookup, not a scan.
3. A WHERE on any column other than the primary key results in a full HBase table scan.
4. Analyze the table once to speed up COUNT queries. This will not affect a WHERE on non-primary-key columns.
08-07-2018 03:31 PM
1 Kudo
Have a look here: https://community.hortonworks.com/questions/88526/how-to-salt-row-key-in-hbase-table.html
Basically, it says that the prefix should be defined in such a way that you can also calculate it at query time. In your (perhaps simplified) example, this could be prefix 000 for even numbers and prefix 001 for odd numbers.
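
A minimal sketch of that idea, assuming a numeric id and a row key layout of `<prefix>:<id>`. The class name `SaltedRowKey`, the method names, and the `:` separator are hypothetical and only serve the illustration. The point is that the prefix is derived from the key itself, so a reader can rebuild the full row key without scanning every prefix.

```java
// Hypothetical sketch of a deterministic salt prefix; names and key layout are assumptions.
final class SaltedRowKey {
    // Derives the prefix from the id so it can be recomputed at query time:
    // even ids get "000", odd ids get "001".
    static String prefixFor(long id) {
        return Math.floorMod(id, 2L) == 0 ? "000" : "001";
    }

    // Builds the full row key; the "<prefix>:<id>" layout is an assumption.
    static String rowKeyFor(long id) {
        return prefixFor(id) + ":" + id;
    }

    public static void main(String[] args) {
        // The same function is used when writing and when looking a row up.
        System.out.println(rowKeyFor(42L));
        System.out.println(rowKeyFor(7L));
    }
}
```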