Member since
01-07-2019
220
Posts
23
Kudos Received
30
Solutions
My Accepted Solutions
Views | Posted |
---|---|
5004 | 08-19-2021 05:45 AM |
1808 | 08-04-2021 05:59 AM |
872 | 07-22-2021 08:09 AM |
3670 | 07-22-2021 08:01 AM |
3380 | 07-22-2021 07:32 AM |
05-06-2019
04:47 AM
This seems relevant: In Python 2, printing a unicode object implicitly encodes it, and when the output encoding is ASCII this only works if every character can be converted to ASCII. If the text can't be encoded in ASCII, you'll get that error. You probably want to explicitly encode it and then print the resulting str: print post.text.encode('utf-8')
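To make the difference concrete, here is a minimal sketch. The string below is a hypothetical stand-in for post.text, since I don't know what your actual text contains. It shows that encoding to ASCII fails for a non-ASCII character, while encoding to UTF-8 succeeds:

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for post.text: a unicode string with one non-ASCII character.
text = u"caf\u00e9"

# Encoding to ASCII fails, which is the same kind of failure behind the error.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to UTF-8 works and yields bytes that are safe to write out.
utf8_bytes = text.encode("utf-8")

print(ascii_ok)
print(repr(utf8_bytes))
```

The explicit encode is the important part: you decide on the encoding yourself, instead of leaving it to whatever the output stream happens to use.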
05-06-2019
04:42 AM
Since the question was asked, the situation has changed. When Hortonworks and Cloudera merged, NiFi became supported by Cloudera. Shortly afterwards the integrations with CDH were also completed, so NiFi is now a fully supported and integrated component. Hence the question already contains the answer: please look into NiFi for solving this use case.
04-10-2019
04:15 AM
1 Kudo
Assuming you want to access the data via Spark, the main question is how it should be stored. Drill is not supported for this, but Hive tables and Kudu are supported by Cloudera. So it boils down to whether you want to store the data in Hive or in Kudu, as Spark can work with both of these. If you want to insert your data record by record, or want to do interactive queries in Impala, then Kudu is likely the best choice. If you want to insert and process your data in bulk, then Hive tables are usually a good fit.
04-10-2019
02:34 AM
Though I don't know exactly how it works under the hood, I can confirm that it will work on the source DB side. (It will definitely NOT simply pull everything from the DB and then chop it up before writing to Hadoop.) If you are looking for the optimum, you are likely going to need some trial and error. However, as a starting point I understand that the default value is 1000, and that you may want to try 10000 as a first step towards better performance.
04-09-2019
07:41 AM
1 Kudo
There is something very unusual happening here. Based on your outputs, values are not only ending up in the wrong columns, but you are even getting different values! In the 'correct' record you have 5686.76, and in the 'wrong' record you have -5686.76. My first guess was that there is a mistake in how you send data to the appropriate columns, but I don't see how that can explain a minus sign changing position. To troubleshoot something like this, it is really important to dig into the details. I would therefore recommend that you bring your question down to a 'minimal reproducible example', eliminating any complexity that is not causing the unexpected results. For example: you show a load command to get data into Spark; consider replacing it with an actual string (and make sure to check whether the string allows you to reproduce the problem). You also show 2 writes, but if we have the exact input and code to reproduce the problem, the correct one is probably not relevant. Also, you use some code to list columns; consider hardcoding them first. As mentioned, really try to take out all complexity until we land on the minimal amount that still reproduces the problem. Hopefully you will already see the answer once you have eliminated all the distractions, and if not, you will have a fully trimmed-down version which you can use to update your question here!
04-09-2019
06:53 AM
1 Kudo
This question is a bit broad, and at the same time quite dependent on your exact situation. I therefore recommend that you reach out to your Cloudera contact person for a more in-depth answer. However, what I can say is the following: Regarding your second question there is a nice answer here: https://community.cloudera.com/t5/Data-Ingestion-Integration/Flume-without-agents-on-web-server/m-p/40776/highlight/true#M1550 In short, you will want 'something' to push the data off the webserver (for instance a Flume or MiNiFi agent), assuming your webserver does not already publish the messages to a bus like Kafka. In general, the solution that you use for moving data from the webserver to the cluster should also work in the opposite direction.
01-17-2019
04:52 PM
If you want to be sure whether one of the two components is updating the password, consider checking the md5sum before and after you change it (and make sure to change it to a different value).
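As a rough sketch of that check, assuming the password lives in a file on disk: the file name and contents below are hypothetical placeholders, so point the helper at whatever file your components actually write to. The idea is just to hash the file, make the change, hash it again, and compare.

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; "example.conf" and its contents are made up.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "example.conf")

    with open(config_path, "w") as handle:
        handle.write("password=old-value\n")
    before = md5_of_file(config_path)

    # Simulate the password change (to a different value, as noted above).
    with open(config_path, "w") as handle:
        handle.write("password=new-value\n")
    after = md5_of_file(config_path)

changed = before != after
print("file changed:", changed)
```

If the two digests are identical after you changed the password, that file was not updated by the change.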
01-15-2019
02:30 PM
ListFile has the option to define a File Filter and a Path Filter. From your explanation I would expect that you need to define the Path Filter. In the Path Filter you can place a regular expression (in Java syntax). There are a lot of resources available on regex; I think something like this should do the trick: (?!(B|C)).*
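If you want to try the expression before putting it into the processor, here is a small sketch. It uses Python's re module as a stand-in for Java regex (this particular pattern uses only syntax that both share), and it assumes the filter has to match the whole relative path. The folder names are hypothetical examples.

```python
import re

# The suggested Path Filter: a negative lookahead that rejects paths starting with B or C.
PATH_FILTER = re.compile(r"(?!(B|C)).*")


def is_scanned(relative_path):
    """True if the whole relative path matches the filter."""
    return PATH_FILTER.fullmatch(relative_path) is not None


# Hypothetical folder names to try the filter on.
candidates = ["A", "B", "C/sub", "Backup", "A/B"]
results = {path: is_scanned(path) for path in candidates}

for path, scanned in results.items():
    print(path, scanned)
```

Two things to be aware of: the lookahead only checks the start of the path, so a nested folder such as A/B still matches, and any folder whose name merely starts with B or C (such as Backup) is excluded as well. If that is too broad for your situation, the expression would need to be tightened.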
01-15-2019
11:24 AM
First of all, double-check all configurations (incl. password), just to avoid moving in the wrong direction. Secondly, confirm that you do not need TLS enabled. If these don't help, the following might help with troubleshooting:
1. Become the nifi user on the node where NiFi is running
2. Send the message via Python
3. Share the Python command here
Note: Please explicitly specify everything that you configure in NiFi when executing Python (even if it is not needed, for instance because of good defaults).
01-15-2019
10:44 AM
@Louis Allen Just to confirm: Is the server where SQL runs kerberized, and have you set up a relation of trust?