Member since: 02-27-2020
Posts: 173
Kudos Received: 41
Solutions: 48
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 444 | 11-29-2023 01:16 PM
 | 551 | 10-27-2023 04:29 PM
 | 597 | 07-07-2023 10:20 AM
 | 1455 | 03-21-2023 08:35 AM
 | 593 | 01-25-2023 08:50 PM
06-27-2020
10:48 PM
1 Kudo
Hi Guy, please try adjusting your command to the following:

ozone sh volume create --quota=1TB --user=hdfs o3://ozone1/tests

Note that the documentation states the last parameter is a URI in the format <prefix>://<Service ID>/<path>. The Service ID is the value you found in ozone-site.xml.
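If the volume is created successfully, you should be able to confirm it with the volume info subcommand (a hedged follow-up sketch; it reuses the Service ID ozone1 and volume name tests from the command above):

```bash
# Inspect the newly created volume; the output should include owner and quota.
ozone sh volume info o3://ozone1/tests
```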
06-12-2020
09:48 AM
Hi @Maria_pl, generally speaking the approach is as follows:
1. Generate a dummy flow file that triggers the flow (GenerateFlowFile processor).
2. Use an UpdateAttribute processor to set the start date and end date as attributes on the flow file.
3. Use an ExecuteScript processor next. This can be a Python script, or whichever language you prefer, that uses the start and end attributes to list out all the dates in between (see the sketch below).
4. If your script produces a single output file of dates, you can then use a SplitText processor to cut each row into its own flow file; from there, each flow file will carry its own unique date in your range.
Hope that makes sense.
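For step 3, here is a minimal ExecuteScript sketch in Jython (NiFi's bundled Python engine). The attribute names date.start and date.end and the yyyy-MM-dd format are my assumptions, so adjust them to whatever your UpdateAttribute processor actually sets:

```python
# Minimal ExecuteScript (Jython) sketch for step 3. Assumes UpdateAttribute
# set 'date.start' and 'date.end' attributes in yyyy-MM-dd format
# (attribute names and format are illustrative, not from this thread).
import datetime
from org.apache.nifi.processor.io import OutputStreamCallback

class WriteDates(OutputStreamCallback):
    def __init__(self, text):
        self.text = text
    def process(self, outputStream):
        outputStream.write(bytearray(self.text.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    fmt = '%Y-%m-%d'
    start = datetime.datetime.strptime(flowFile.getAttribute('date.start'), fmt).date()
    end = datetime.datetime.strptime(flowFile.getAttribute('date.end'), fmt).date()
    # One line per date, inclusive of both endpoints; SplitText (step 4)
    # then turns each line into its own flow file.
    dates = [(start + datetime.timedelta(days=d)).isoformat()
             for d in range((end - start).days + 1)]
    flowFile = session.write(flowFile, WriteDates('\n'.join(dates) + '\n'))
    session.transfer(flowFile, REL_SUCCESS)
```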
06-06-2020
09:15 PM
Glad to hear that you have finally found the root cause of this issue. Thanks for sharing, @Heri!
05-27-2020
05:40 PM
The rowId used in our HBase tables is not composed purely of hex strings. It is a mix, as pointed out earlier in this thread:

\x00\x0A@E\xFFn[\x18\x9F\xD4-1447846881#524196832

The solution I found was to use the toBytesBinary function provided by the HBase Bytes utility class. This method recognizes the \xNN hex escape sequences in a string and treats each one as a single byte instead of breaking it up into individual characters. Hope this helps!
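To illustrate the idea, here is a pure-Python sketch of the kind of parsing toBytesBinary performs. This is a conceptual illustration only, not the actual HBase Java implementation:

```python
# Conceptual sketch of \xNN escape parsing, mirroring what
# Bytes.toBytesBinary does (not the real HBase implementation).
def to_bytes_binary(s):
    out = bytearray()
    i = 0
    while i < len(s):
        # A literal backslash-x-HH sequence encodes one raw byte.
        if s[i] == '\\' and i + 3 < len(s) and s[i + 1] in 'xX':
            out.append(int(s[i + 2:i + 4], 16))
            i += 4
        else:
            out.append(ord(s[i]))
            i += 1
    return bytes(out)

# The mixed binary/ASCII row key from this thread:
row_key = to_bytes_binary(r"\x00\x0A@E\xFFn[\x18\x9F\xD4-1447846881#524196832")
print(len(row_key), row_key)
```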
05-22-2020
12:51 PM
1 Kudo
Sqoop can only insert into a single Hive partition at a time. To accomplish what you are trying to do, you can run two separate sqoop commands:
1. sqoop with --query ... where year(EventTime)=2019 (remove the year(EventTime)=2020 condition) and set --hive-partition-value 2019 (not 2020).
2. sqoop with --query ... where year(EventTime)=2020 (remove the year(EventTime)=2019 condition) and set --hive-partition-value 2020 (not 2019).
This way each sqoop run writes into exactly the one partition you want (see the sketch below). Since this is a one-time import, this solution should work just fine. Let me know if this works, and accept the answer if it makes sense.
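For illustration, a hedged sketch of what the two runs might look like; the connection string, credentials, table, and column names are placeholders, not values from this thread:

```bash
# Import 1: 2019 rows into the year=2019 partition.
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;database=mydb" \
  --username myuser -P \
  --query "SELECT * FROM Events WHERE year(EventTime) = 2019 AND \$CONDITIONS" \
  --hive-import --hive-table events \
  --hive-partition-key year --hive-partition-value 2019 \
  --target-dir /tmp/sqoop_events_2019 -m 1

# Import 2: identical, but filtered and partitioned on 2020.
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;database=mydb" \
  --username myuser -P \
  --query "SELECT * FROM Events WHERE year(EventTime) = 2020 AND \$CONDITIONS" \
  --hive-import --hive-table events \
  --hive-partition-key year --hive-partition-value 2020 \
  --target-dir /tmp/sqoop_events_2020 -m 1
```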
05-11-2020
02:58 PM
Thanks for clarifying the question, but I'm afraid I still don't know what you are trying to achieve. Based on your example, I understand you have 10K records/documents with phone number P1 and 20K records/documents with phone number P2. Are you retrieving all 10K documents in a single query? And you want the performance of the 10K-row P1 query to match that of a 10K-row P2 query. Is that right? Solr was never built to retrieve large numbers of objects at one time; it is meant for faceted search that returns a humanly consumable number of records per result set (see pagination). Are you doing this for UI display or for integration purposes? There is some useful documentation here on retrieving large numbers of records from Solr. It would help if you shared your query, your data structure, and your use case, so the community can better understand the problem and suggest a solution.
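For reference, if you truly need to pull a large result set, deep paging with cursorMark is the usual approach. A minimal Python sketch, assuming a collection URL, a field named phone, and a uniqueKey of id (all placeholders):

```python
# Hedged sketch of Solr deep paging with cursorMark; the collection URL,
# field name 'phone', and uniqueKey 'id' are placeholders for your setup.
import requests

url = "http://localhost:8983/solr/mycollection/select"
params = {
    "q": "phone:P1",
    "sort": "id asc",   # cursorMark requires a sort that includes the uniqueKey
    "rows": 1000,
    "cursorMark": "*",
    "wt": "json",
}
while True:
    body = requests.get(url, params=params).json()
    for doc in body["response"]["docs"]:
        pass  # process each document here
    if body["nextCursorMark"] == params["cursorMark"]:
        break  # cursor did not advance: all results consumed
    params["cursorMark"] = body["nextCursorMark"]
```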
04-29-2020
01:54 PM
I just ran the following query through Hive and it worked as expected:

select
  col1,
  col2,
  case when col1 = "Female" and col2 = "Yes" then "Data Received" end
from table_name
limit 100;

Can you provide some steps to reproduce?
04-17-2020
07:58 AM
1 Kudo
Glad things are moving forward for you, Heri. Examining your sqoop command, I notice the following:
- --check-column EventTime tells sqoop to use this column as the timestamp column for its selection logic.
- --incremental lastmodified tells sqoop that your source SQL table can have records both added to it AND updated in it. Sqoop assumes that when a record is added or updated, its EventTime is set to the current timestamp.
When you run this job for the first time, sqoop will pick up ALL available records (the initial load). It will then print out a --last-value timestampX; that timestamp is the cutoff point for the next run of the job (i.e., the next time you run the job with --exec incjob, it will set --last-value to timestampX). So, to answer your question, it looks like sqoop is treating your job as an incremental load from the first run: [EventTime] < '2020-04-17 08:51:00.54'. When the job is kicked off again, it should pick up records from where it left off automatically. If you want, you can provide a manual --last-value timestamp for the initial load, but make sure you don't use it on subsequent incremental loads; a sketch of such a job is below. For more details, please review sections 7.2.7 and 11.4 of the Sqoop documentation. If this is helpful, don't forget to give kudos and accept the solution. Thank you!
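For illustration, a hedged sketch of how such a saved job might be defined and run; the connection string, credentials, table, and merge key are placeholders, and --merge-key is my assumption for handling updated rows in lastmodified mode:

```bash
# Hedged sketch of a saved incremental job like the one discussed; the
# connection string, table, and key column are placeholders.
sqoop job --create incjob -- import \
  --connect "jdbc:sqlserver://dbhost:1433;database=mydb" \
  --username myuser -P \
  --table Events \
  --check-column EventTime \
  --incremental lastmodified \
  --merge-key EventId \
  --target-dir /user/hdfs/events

# The first execution performs the initial load and stores the new
# --last-value; subsequent executions resume from that cutoff.
sqoop job --exec incjob
```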
04-15-2020
12:55 PM
2 Kudos
You will need both the year and the month of the original date (because of leap-year considerations). In terms of how to accomplish this in NiFi, one way is to use an ExecuteScript processor with Python's calendar library: calendar.monthrange(year, month) returns the number of days in the month (i.e., the last day) as the second element of its result.
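A minimal sketch of that approach in plain Python; the date format is illustrative, and inside ExecuteScript you would read the date from a flow file attribute instead of a literal:

```python
# Minimal sketch: move a date to the last day of its month using
# calendar.monthrange, which handles leap years automatically.
import calendar
import datetime

def last_day_of_month(date_str):
    d = datetime.datetime.strptime(date_str, "%Y-%m-%d").date()
    # monthrange returns (weekday_of_first_day, number_of_days_in_month)
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days_in_month)

print(last_day_of_month("2020-02-15"))  # 2020-02-29 (leap year handled)
```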
04-15-2020
08:16 AM
1 Kudo
You can try the ReplaceText NiFi processor with the approach described here. That would be a clean way of doing what you want without much scripting.