Member since: 02-27-2020
Posts: 173
Kudos Received: 42
Solutions: 48

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2399 | 11-29-2023 01:16 PM |
|  | 2941 | 10-27-2023 04:29 PM |
|  | 2415 | 07-07-2023 10:20 AM |
|  | 4910 | 03-21-2023 08:35 AM |
|  | 1670 | 01-25-2023 08:50 PM |
05-26-2020
11:59 AM
Ok, so to be able to purge the table created by sqoop, because it is external, you'll need to add this to your stanza: --hcatalog-storage-stanza 'stored as parquet TBLPROPERTIES("external.table.purge"="true")' Then, when you load the data for the first time, purging will be enabled on that table. Executing the purge command you have will then remove both the metadata and the data in the external table. Let me know if that works and if the solution is acceptable.
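For reference, a minimal sketch of how the stanza fits into a full import. The connection string and table names are placeholders, not from this thread:

```
# hypothetical connection and table names, shown only to place the stanza
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --table events \
  --hcatalog-database default \
  --hcatalog-table events \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet TBLPROPERTIES("external.table.purge"="true")'
```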
05-22-2020
12:51 PM
1 Kudo
Sqoop can only insert into a single Hive partition at a time. To accomplish what you are trying to do, you can run two separate sqoop commands:

- sqoop with --query ... where year(EventTime)=2019 (remove year(EventTime)=2020) and set --hive-partition-value 2019 (not 2020)
- sqoop with --query ... where year(EventTime)=2020 (remove year(EventTime)=2019) and set --hive-partition-value 2020 (not 2019)

This way sqoop will write into the one partition you want; a sketch of the two commands follows below. Since this is a one-time import, the solution should work just fine. Let me know if this works and accept the answer if it makes sense.
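A minimal sketch of the two imports; the connection string, table name, and partition key name are assumptions for illustration:

```
# hypothetical connection and names; $CONDITIONS is required in sqoop free-form queries
sqoop import \
  --connect 'jdbc:sqlserver://db.example.com;database=events' \
  --query 'SELECT * FROM Events WHERE year(EventTime)=2019 AND $CONDITIONS' \
  --hive-import --hive-table events \
  --hive-partition-key event_year --hive-partition-value 2019 \
  --target-dir /tmp/sqoop_events_2019 -m 1

# second run: change both the WHERE clause and the partition value to 2020
```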
05-22-2020
12:09 PM
Hi Heri, After you execute drop table test purge; can you check that the data is actually deleted? Query the table first, but also check with hdfs dfs to see whether the underlying files have been deleted from HDFS (they should be). Let me know what you see. You may be right that EXTERNAL table data does not get deleted and only the metadata is removed; that's why I'm asking you to check for data with hdfs dfs. Now, to be able to drop the EXTERNAL table (both metadata and data) you'd need to follow the steps here: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/using-hiveql/content/hive_drop_external_table_data.html Hope that helps.
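A quick way to check, assuming the default HDP 3.x external warehouse location (adjust the path to wherever your table's data actually lives):

```
hdfs dfs -ls /warehouse/tablespace/external/hive/test
```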
05-22-2020
11:28 AM
Some additional information you could provide to help the community answer the question:

- Are there any errors returned by Java when querying HBase, or does it just silently return no rows?
- Is the same user executing both tasks (through the shell and through Java)?
- Can any other rows be retrieved from Java?
04-17-2020
07:58 AM
1 Kudo
Glad things are moving forward for you, Heri. Examining your sqoop command, I notice the following:

- --check-column EventTime tells sqoop to use this column as the timestamp column for its select logic.
- --incremental lastmodified tells sqoop that your source SQL table can have records both added to it AND updated in it. Sqoop assumes that when a record is updated or added, its EventTime is set to the current timestamp.

When you run this job for the first time, sqoop will pick up ALL available records (the initial load). It will then print out a --last-value timestampX. This timestamp is the cutoff point for the next run of the job (i.e. the next time you run the job with --exec incjob, it will set --last-value timestampX). So, to answer your question, it looks like sqoop is treating your job as an incremental load on the first run: [EventTime] < '2020-04-17 08:51:00.54'. When this job is kicked off again, it should pick up records from where it left off automatically. If you want, you can provide a manual --last-value timestamp for the initial load (see the sketch below), but make sure you don't use it on subsequent incremental loads. For more details, please review sections 7.2.7 and 11.4 of the Sqoop documentation. If this is helpful, don't forget to give kudos and accept the solution. Thank you!
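A minimal sketch of a saved incremental job; the connection string and table name are placeholders, incjob matches the job name in your --exec call, and the --last-value shown is the optional manual cutoff for the initial load:

```
sqoop job --create incjob -- import \
  --connect 'jdbc:sqlserver://db.example.com;database=events' \
  --table Events \
  --check-column EventTime \
  --incremental lastmodified \
  --last-value '2020-01-01 00:00:00' \
  --target-dir /data/events -m 1

# later runs reuse the saved cutoff automatically:
sqoop job --exec incjob
```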
04-15-2020
12:55 PM
2 Kudos
You will need to get both the year and the month of the original date (because of leap-year considerations). In terms of how to accomplish this in NiFi, one way is to use the ExecuteScript processor and Python's calendar library: calendar.monthrange(year, month) returns the number of days in the month (i.e. the last day of the month) as the second element of its result.
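A minimal sketch of the lookup itself (the surrounding ExecuteScript boilerplate is omitted; the year and month values are illustrative):

```python
import calendar

year, month = 2020, 2
_, last_day = calendar.monthrange(year, month)  # returns (weekday of the 1st, days in month)
print(last_day)  # 29 -- 2020 is a leap year
```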
04-15-2020
08:16 AM
1 Kudo
You can try the ReplaceText NiFi processor with the approach described here. That will be a clean way of doing what you want without much scripting.
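Purely for illustration (the linked approach is not reproduced here), a ReplaceText configuration generally takes this shape; the regex and replacement below are hypothetical:

```
Replacement Strategy : Regex Replace
Search Value         : (\d{4}),(\d{1,3}),(\d{1,2})
Replacement Value    : $1/$2/$3
```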
04-15-2020
08:10 AM
1 Kudo
This is the offending code:

```python
new = year+month+day+hour
# Considering date is in mm/dd/yyyy format
# converting the appended list to strings instead of ints
b = [str(x) for x in new]
# joining all the data without adding
b = '/'.join(b)
# convert to unix
dt_object2 = datetime.strptime(b, "%Y/%m/%d/%H")
```

It looks like at some point the values of year, month, day, hour are set to the strings "Year", "month", "DOY", "Hour". Then, when new = year+month+day+hour is called, the strings get concatenated into "YearmonthDOYHour". You then split and join that string, which is why you see a '/' character between each character in the Python error message. I'll leave it to you to debug this, as I've lost track of all the code changes at this point. Also note that the incoming data may be providing you with day of year (DOY) instead of day of month, which is what %d expects. You may need to use %j to parse that out, with zero padding (see the documentation; a sketch follows below). If this is helpful, don't forget to give kudos or accept the solution.
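A minimal sketch of parsing year / day-of-year / hour with %j; the values are illustrative:

```python
from datetime import datetime

year, doy, hour = 2020, 106, 7                # DOY 106 is April 15 in 2020
b = "%04d/%03d/%02d" % (year, doy, hour)      # zero-pad so %j can match day of year
dt_object2 = datetime.strptime(b, "%Y/%j/%H")
print(dt_object2)                             # 2020-04-15 07:00:00
```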
04-15-2020
07:37 AM
1 Kudo
Once you call IOUtils.toString, you get the text variable containing your message(s). Then it is appropriate to call json.loads on that text variable, as that is the function that converts a JSON text structure into a Python object (a dict). It should be something like this:

```python
text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
json_data = json.loads(text)
```

After this you should be able to access the elements of the JSON with json_data['Year']. Let me know if that works.
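For context, a minimal sketch of how this fits into an ExecuteScript (Jython) read callback; session, flowFile, and REL_SUCCESS are provided by the processor's scripting context, and the 'Year' key is assumed from your data:

```python
import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

class ReadJson(InputStreamCallback):
    def __init__(self):
        self.year = None
    def process(self, inputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        json_data = json.loads(text)   # JSON text -> Python dict
        self.year = json_data['Year']

flowFile = session.get()
if flowFile is not None:
    callback = ReadJson()
    session.read(flowFile, callback)
    # callback.year now holds the parsed value
    session.transfer(flowFile, REL_SUCCESS)
```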
04-14-2020
04:32 PM
1 Kudo
Python is complaining about this line, most likely:

```python
#json_data = json.dumps(text)
json_data = json.loads(json_data)
```

Why is the initial assignment commented out? Without it you have a circular assignment for json_data (it is used on the right-hand side before it has ever been defined), and Python doesn't know what to do.
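A hedged fix, assuming text still holds the JSON string read from the flow file upstream: parse it directly rather than re-dumping it:

```python
json_data = json.loads(text)   # 'text' is the JSON string read earlier
```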