Member since: 02-27-2020
Posts: 173
Kudos Received: 42
Solutions: 48
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1025 | 11-29-2023 01:16 PM
 | 1125 | 10-27-2023 04:29 PM
 | 1060 | 07-07-2023 10:20 AM
 | 2437 | 03-21-2023 08:35 AM
 | 871 | 01-25-2023 08:50 PM
05-26-2020
02:16 PM
The issue is that the DROP TABLE statement doesn't seem to remove the data from HDFS. This is usually because the table is an external table, which Hive doesn't manage, so dropping it removes only the metadata and leaves the files in HDFS. Another thing you can try is what's suggested in this thread: before you drop the table, change its property to EXTERNAL=FALSE. Does that work for you?
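For example, a minimal HiveQL sketch of that workaround (the table name here is just a placeholder):

ALTER TABLE my_table SET TBLPROPERTIES ('EXTERNAL' = 'FALSE');  -- make Hive treat it as a managed table
DROP TABLE my_table PURGE;  -- PURGE bypasses the trash so the HDFS files are removed too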
05-26-2020
11:59 AM
OK, so to be able to purge the table created by sqoop (because it is external), you'll need to add this to your sqoop command: --hcatalog-storage-stanza 'stored as parquet TBLPROPERTIES("external.table.purge"="true")' Then when you load the data for the first time, purging will be enabled on that table. Executing the purge command you have will then remove both the metadata and the data in the external table. Let me know if that works and if the solution is acceptable.
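A rough sketch of how that fits into the import command (the connection string, credentials, and table names below are placeholders, not your actual values):

sqoop import \
  --connect 'jdbc:sqlserver://dbhost:1433;database=mydb' \
  --username myuser --password-file /user/myuser/.pw \
  --table MyTable \
  --hcatalog-database default \
  --hcatalog-table my_table \
  --hcatalog-storage-stanza 'stored as parquet TBLPROPERTIES("external.table.purge"="true")'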
05-22-2020
12:51 PM
1 Kudo
Sqoop can only insert into a single Hive partition at a time. To accomplish what you are trying to do, you can run two separate sqoop commands (see the sketch below):
1. sqoop with --query ... where year(EventTime)=2019 (remove year(EventTime)=2020) and set --hive-partition-value 2019 (not 2020)
2. sqoop with --query ... where year(EventTime)=2020 (remove year(EventTime)=2019) and set --hive-partition-value 2020 (not 2019)
This way each sqoop run writes into the one partition you want. Since this is a one-time import, the solution should work just fine. Let me know if this works and accept the answer if it makes sense.
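A sketch of the first of the two commands (connection details, table, and column names are placeholders; the second command is identical with 2020 in place of 2019):

sqoop import \
  --connect 'jdbc:sqlserver://dbhost:1433;database=mydb' \
  --username myuser --password-file /user/myuser/.pw \
  --query 'SELECT * FROM Events WHERE year(EventTime) = 2019 AND $CONDITIONS' \
  --target-dir /tmp/sqoop_events_2019 \
  --hive-import \
  --hive-table events \
  --hive-partition-key event_year \
  --hive-partition-value 2019 \
  -m 1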
05-22-2020
12:09 PM
Hi Heri, after you execute drop table test purge; can you check that the data is actually deleted? Run a query on the table first, but also check with hdfs dfs to see whether the underlying files have been deleted from HDFS (they should be). Let me know what you see. You may be right that for an EXTERNAL table only the metadata gets deleted and the data is left in place; that's why I'm asking you to check for the files with hdfs dfs. To drop an EXTERNAL table completely (both metadata and data) you'd need to follow the steps here: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/using-hiveql/content/hive_drop_external_table_data.html Hope that helps.
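For example, to check for the files (the path below is only the typical HDP 3 default location for external tables; substitute the actual location of your table):

hdfs dfs -ls /warehouse/tablespace/external/hive/test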
05-22-2020
11:28 AM
Some additional information you could provide to help the community answer the question:
- Are there any errors returned when querying HBase from Java, or does it just silently return no rows?
- Is the same user executing both tasks (through the shell and through Java)?
- Can any other rows be retrieved from Java?
05-11-2020
02:58 PM
Thanks for clarifying the question, but I'm afraid I still don't know what you are trying to achieve. Based on your example I understand you have 10K records/documents with phone number P1 and 20K records/documents with phone number P2. Are you retrieving all 10K documents in a single query, and do you want the performance of a 10K-row P1 query to be the same as that of a 10K-row P2 query? Is that right? Solr was never built to retrieve large numbers of objects at once; it's meant for faceted search that returns a humanly consumable number of records in the result set (see pagination). Are you doing this for UI display or for integration purposes? There is some useful documentation here on getting a large number of records from Solr. It would be helpful if you shared your query, your data structure, and your use case, so the community can better understand the problem and suggest a potential solution.
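For reference, pulling a large result set out of Solr is usually done with cursor-based deep paging; a sketch (the collection name, field names, and rows value are assumptions, and the sort must include your uniqueKey field):

curl 'http://localhost:8983/solr/mycollection/select?q=phone:P1&rows=1000&sort=id+asc&cursorMark=*'

Each response includes a nextCursorMark value that you pass back as cursorMark to fetch the next page.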
05-08-2020
01:10 PM
A Solr query can have a filter clause that will ensure only the documents from the last 7 days are fetched. What you are looking for is the filter query (fq) parameter. For example, you could add this to your query: &fq=createdate:[NOW-7DAY/DAY TO NOW] You can read more about filtering in the documentation here. If this is helpful, please don't forget to give kudos and/or accept the solution.
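Put together, a full request might look like this (the core name and main query are placeholders, and createdate is assumed to be a date field in your schema; -g stops curl from interpreting the brackets):

curl -g 'http://localhost:8983/solr/mycore/select?q=*:*&fq=createdate:[NOW-7DAY/DAY%20TO%20NOW]'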
04-29-2020
01:54 PM
I just ran the following query through Hive and it worked as expected:

select
  col1,
  col2,
  case when col1 = "Female" and col2 = "Yes" then "Data Received" end
from table_name
limit 100;

Can you provide some steps to reproduce?
04-17-2020
07:58 AM
1 Kudo
Glad things are moving forward for you, Heri. Examining your sqoop command, I notice the following:
- --check-column EventTime tells sqoop to use this column as the timestamp column for its select logic.
- --incremental lastmodified tells sqoop that your source SQL table can have records both added to it AND updated in it. Sqoop assumes that when a record is added or updated, its EventTime is set to the current timestamp.
When you run this job for the first time, sqoop will pick up ALL available records (the initial load). It will then print out a --last-value timestampX. This timestamp is the cutoff point for the next run of the job (i.e. the next time you run the job with --exec incjob, it will set --last-value to timestampX). So, to answer your question, it looks like sqoop is treating your job as an incremental load on the first run: [EventTime] < '2020-04-17 08:51:00.54'. When this job is kicked off again, it should pick up records from where it left off automatically. If you want, you can provide a manual --last-value timestamp for the initial load, but make sure you don't use it on subsequent incremental loads. For more details, please review sections 7.2.7 and 11.4 of the Sqoop documentation. If this is helpful, don't forget to give kudos and accept the solution. Thank you!
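For instance, if you did want to pin the starting point of the initial load yourself, a sketch of the relevant part of the command (connection details, table name, and the timestamp are placeholders):

sqoop import \
  --connect 'jdbc:sqlserver://dbhost:1433;database=mydb' \
  --table Events \
  --check-column EventTime \
  --incremental lastmodified \
  --last-value '2020-01-01 00:00:00' \
  --target-dir /data/events

On later runs you would leave --last-value out (or use a saved sqoop job, as you are doing with --exec incjob) so sqoop can track the cutoff itself.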
04-15-2020
12:55 PM
2 Kudos
You will need both the year and the month of the original date (because of leap years). In terms of how to accomplish this in NiFi, one way is to use the ExecuteScript processor and Python's calendar library: the function calendar.monthrange(year, month) returns the last day of the month as the second element of its result.
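A minimal Python sketch of that logic on its own, outside of NiFi (the example date is arbitrary):

import calendar
from datetime import date

original = date(2020, 2, 15)                      # example input date
_, last_day = calendar.monthrange(original.year, original.month)
end_of_month = original.replace(day=last_day)     # date(2020, 2, 29); leap year handled
print(end_of_month)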