Member since: 02-27-2020
Posts: 173
Kudos Received: 41
Solutions: 48
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 444 | 11-29-2023 01:16 PM
 | 551 | 10-27-2023 04:29 PM
 | 597 | 07-07-2023 10:20 AM
 | 1455 | 03-21-2023 08:35 AM
 | 593 | 01-25-2023 08:50 PM
06-27-2020
10:48 PM
1 Kudo
Hi Guy, please try adjusting your command to the following:

ozone sh volume create --quota=1TB --user=hdfs o3://ozone1/tests

Note that the documentation states the last parameter is a URI in the format <prefix>://<Service ID>/<path>. The Service ID is the value you found in ozone-site.xml.
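If the volume is created successfully, you should be able to confirm it with the volume info subcommand (a hedged follow-up sketch; it reuses the Service ID ozone1 and volume name tests from the command above):

```bash
# Inspect the newly created volume; the output should include owner and quota.
ozone sh volume info o3://ozone1/tests
```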
06-12-2020
09:48 AM
Hi @Maria_pl, generally speaking the approach is as follows:
1. Generate a dummy flow file that triggers the flow (GenerateFlowFile processor).
2. Use an UpdateAttribute processor to set the start date and end date as attributes on the flow file.
3. Use an ExecuteScript processor next. This can be a Python script, or whichever language you prefer, that uses the start and end attributes to list out all the dates in between (see the sketch below).
4. If your script produces a single output file of dates, you can then use a SplitText processor to cut each row into its own flow file; from there, each flow file will carry its own unique date in your range.
Hope that makes sense.
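For step 3, here is a minimal ExecuteScript sketch in Jython (NiFi's bundled Python engine). The attribute names date.start and date.end and the yyyy-MM-dd format are my assumptions, so adjust them to whatever your UpdateAttribute processor actually sets:

```python
# Minimal ExecuteScript (Jython) sketch for step 3. Assumes UpdateAttribute
# set 'date.start' and 'date.end' attributes in yyyy-MM-dd format
# (attribute names and format are illustrative, not from this thread).
import datetime
from org.apache.nifi.processor.io import OutputStreamCallback

class WriteDates(OutputStreamCallback):
    def __init__(self, text):
        self.text = text
    def process(self, outputStream):
        outputStream.write(bytearray(self.text.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    fmt = '%Y-%m-%d'
    start = datetime.datetime.strptime(flowFile.getAttribute('date.start'), fmt).date()
    end = datetime.datetime.strptime(flowFile.getAttribute('date.end'), fmt).date()
    # One line per date, inclusive of both endpoints; SplitText (step 4)
    # then turns each line into its own flow file.
    dates = [(start + datetime.timedelta(days=d)).isoformat()
             for d in range((end - start).days + 1)]
    flowFile = session.write(flowFile, WriteDates('\n'.join(dates) + '\n'))
    session.transfer(flowFile, REL_SUCCESS)
```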
06-06-2020
09:15 PM
Glad to hear that you have finally found the root cause of this issue. Thanks for sharing, @Heri!
05-27-2020
05:40 PM
The rowId used in our HBase tables is not composed purely of hex strings. It is a mix, as pointed out earlier in this thread:

\x00\x0A@E\xFFn[\x18\x9F\xD4-1447846881#524196832

The solution I found was to use the toBytesBinary function provided by the HBase Bytes utility class. This method recognizes the \xNN hex escape sequences in a string and treats each one as a single byte instead of breaking it up into individual characters. Hope this helps!
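To illustrate the idea, here is a pure-Python sketch of the kind of parsing toBytesBinary performs. This is a conceptual illustration only, not the actual HBase Java implementation:

```python
# Conceptual sketch of \xNN escape parsing, mirroring what
# Bytes.toBytesBinary does (not the real HBase implementation).
def to_bytes_binary(s):
    out = bytearray()
    i = 0
    while i < len(s):
        # A literal backslash-x-HH sequence encodes one raw byte.
        if s[i] == '\\' and i + 3 < len(s) and s[i + 1] in 'xX':
            out.append(int(s[i + 2:i + 4], 16))
            i += 4
        else:
            out.append(ord(s[i]))
            i += 1
    return bytes(out)

# The mixed binary/ASCII row key from this thread:
row_key = to_bytes_binary(r"\x00\x0A@E\xFFn[\x18\x9F\xD4-1447846881#524196832")
print(len(row_key), row_key)
```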
05-22-2020
12:51 PM
1 Kudo
Sqoop can only insert into a single Hive partition at a time. To accomplish what you are trying to do, you can run two separate sqoop commands:
1. sqoop with --query ... where year(EventTime)=2019 (remove the year(EventTime)=2020 condition) and set --hive-partition-value 2019 (not 2020).
2. sqoop with --query ... where year(EventTime)=2020 (remove the year(EventTime)=2019 condition) and set --hive-partition-value 2020 (not 2019).
This way each sqoop run writes into exactly the one partition you want (see the sketch below). Since this is a one-time import, this solution should work just fine. Let me know if this works, and accept the answer if it makes sense.
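For illustration, a hedged sketch of what the two runs might look like; the connection string, credentials, table, and column names are placeholders, not values from this thread:

```bash
# Import 1: 2019 rows into the year=2019 partition.
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;database=mydb" \
  --username myuser -P \
  --query "SELECT * FROM Events WHERE year(EventTime) = 2019 AND \$CONDITIONS" \
  --hive-import --hive-table events \
  --hive-partition-key year --hive-partition-value 2019 \
  --target-dir /tmp/sqoop_events_2019 -m 1

# Import 2: identical, but filtered and partitioned on 2020.
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;database=mydb" \
  --username myuser -P \
  --query "SELECT * FROM Events WHERE year(EventTime) = 2020 AND \$CONDITIONS" \
  --hive-import --hive-table events \
  --hive-partition-key year --hive-partition-value 2020 \
  --target-dir /tmp/sqoop_events_2020 -m 1
```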
05-11-2020
02:58 PM
Thanks for clarifying the question, but I'm afraid I still don't know what you are trying to achieve. Based on your example, I understand you have 10K records/documents with phone number P1 and 20K records/documents with phone number P2. Are you retrieving all 10K documents in a single query? And you want the performance of the 10K-row P1 query to match that of a 10K-row P2 query. Is that right? Solr was never built to retrieve large numbers of objects at one time; it is meant for faceted search that returns a humanly consumable number of records per result set (see pagination). Are you doing this for UI display or for integration purposes? There is some useful documentation here on retrieving large numbers of records from Solr. It would help if you shared your query, your data structure, and your use case, so the community can better understand the problem and suggest a solution.
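For reference, if you truly need to pull a large result set, deep paging with cursorMark is the usual approach. A minimal Python sketch, assuming a collection URL, a field named phone, and a uniqueKey of id (all placeholders):

```python
# Hedged sketch of Solr deep paging with cursorMark; the collection URL,
# field name 'phone', and uniqueKey 'id' are placeholders for your setup.
import requests

url = "http://localhost:8983/solr/mycollection/select"
params = {
    "q": "phone:P1",
    "sort": "id asc",   # cursorMark requires a sort that includes the uniqueKey
    "rows": 1000,
    "cursorMark": "*",
    "wt": "json",
}
while True:
    body = requests.get(url, params=params).json()
    for doc in body["response"]["docs"]:
        pass  # process each document here
    if body["nextCursorMark"] == params["cursorMark"]:
        break  # cursor did not advance: all results consumed
    params["cursorMark"] = body["nextCursorMark"]
```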
04-29-2020
01:54 PM
I just ran the following query through Hive and it worked as expected:

select
  col1,
  col2,
  case when col1 = "Female" and col2 = "Yes" then "Data Received" end
from table_name
limit 100;

Can you provide some steps to reproduce?
04-17-2020
07:58 AM
1 Kudo
Glad things are moving forward for you, Heri. Examining your sqoop command, I notice the following:
- --check-column EventTime tells sqoop to use this column as the timestamp column for its selection logic.
- --incremental lastmodified tells sqoop that your source SQL table can have records both added to it AND updated in it. Sqoop assumes that when a record is added or updated, its EventTime is set to the current timestamp.
When you run this job for the first time, sqoop will pick up ALL available records (the initial load). It will then print out a --last-value timestampX; that timestamp is the cutoff point for the next run of the job (i.e., the next time you run the job with --exec incjob, it will set --last-value to timestampX). So, to answer your question, it looks like sqoop is treating your job as an incremental load from the first run: [EventTime] < '2020-04-17 08:51:00.54'. When the job is kicked off again, it should pick up records from where it left off automatically. If you want, you can provide a manual --last-value timestamp for the initial load, but make sure you don't use it on subsequent incremental loads; a sketch of such a job is below. For more details, please review sections 7.2.7 and 11.4 of the Sqoop documentation. If this is helpful, don't forget to give kudos and accept the solution. Thank you!
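For illustration, a hedged sketch of how such a saved job might be defined and run; the connection string, credentials, table, and merge key are placeholders, and --merge-key is my assumption for handling updated rows in lastmodified mode:

```bash
# Hedged sketch of a saved incremental job like the one discussed; the
# connection string, table, and key column are placeholders.
sqoop job --create incjob -- import \
  --connect "jdbc:sqlserver://dbhost:1433;database=mydb" \
  --username myuser -P \
  --table Events \
  --check-column EventTime \
  --incremental lastmodified \
  --merge-key EventId \
  --target-dir /user/hdfs/events

# The first execution performs the initial load and stores the new
# --last-value; subsequent executions resume from that cutoff.
sqoop job --exec incjob
```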
04-15-2020
12:55 PM
2 Kudos
You will need both the year and the month of the original date (because of leap-year considerations). In terms of how to accomplish this in NiFi, one way is to use an ExecuteScript processor with Python's calendar library: calendar.monthrange(year, month) returns the number of days in the month (i.e., the last day) as the second element of its result.
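A minimal sketch of that approach in plain Python; the date format is illustrative, and inside ExecuteScript you would read the date from a flow file attribute instead of a literal:

```python
# Minimal sketch: move a date to the last day of its month using
# calendar.monthrange, which handles leap years automatically.
import calendar
import datetime

def last_day_of_month(date_str):
    d = datetime.datetime.strptime(date_str, "%Y-%m-%d").date()
    # monthrange returns (weekday_of_first_day, number_of_days_in_month)
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days_in_month)

print(last_day_of_month("2020-02-15"))  # 2020-02-29 (leap year handled)
```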
04-15-2020
08:16 AM
1 Kudo
You can try the ReplaceText NiFi processor with the approach described here. That would be a clean way of doing what you want without much scripting.