Member since: 02-11-2019
Posts: 81
Kudos Received: 3
Solutions: 0
10-17-2019
01:26 AM
Hi @ChineduLB, UDFs let you code your own application logic for processing column values during an Impala query. Adding a REFRESH or INVALIDATE METADATA call to a UDF could cause unexpected behavior during value processing. The general recommendation is to run INVALIDATE METADATA/REFRESH after the ingestion has finished, so that Impala users do not have to worry about stale metadata. There is a blog post on how to handle "Fast Data" and make it available to Impala in batches: https://blog.cloudera.com/how-to-ingest-and-query-fast-data-with-impala-without-kudu/ Additionally, INVALIDATE METADATA/REFRESH can be executed from beeline as well; you just need to connect from beeline to Impala. This blog post has the details: https://www.ericlin.me/2017/04/how-to-use-beeline-to-connect-to-impala/
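To illustrate the "run it after ingestion" recommendation, here is a minimal Python sketch. It assumes a DB-API style cursor connected to Impala; the helper name `refresh_after_ingest`, the `RecordingCursor` stand-in, and the table name `sales.events` are all hypothetical and only there to make the example self-contained.

```python
class RecordingCursor:
    """Stand-in for a real Impala cursor (hypothetical); it only records SQL."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def refresh_after_ingest(cursor, table, new_table=False):
    """Issue the metadata statement once a batch has finished loading."""
    # REFRESH picks up new data files for a table Impala already knows about.
    # INVALIDATE METADATA is the heavier option, e.g. for tables created
    # outside Impala.
    if new_table:
        statement = f"INVALIDATE METADATA {table}"
    else:
        statement = f"REFRESH {table}"
    cursor.execute(statement)
    return statement


cursor = RecordingCursor()
# ... the ingestion job for the batch would run and finish here ...
refresh_after_ingest(cursor, "sales.events")
print(cursor.statements)
```

With a real connection, the same helper would be called by the ingestion job as its last step, rather than from inside a UDF.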
09-16-2019
03:14 PM
@ChineduLB No, you can't. You can only save data into temp tables, or simply use a sub-query instead. Cheers Eric
08-28-2019
12:44 AM
@ChineduLB If you go to CM > Sentry > Configuration and search for "database", you should be able to see those database options. The one you need is "Sentry Server Database Password". You also need to make sure that the username and password you use here can connect to the Sentry database. Cheers Eric
07-15-2019
03:12 AM
1 Kudo
Hi, I do not think there is any difference. Spark executes statements lazily, so your second version with two jobs will behave the same way as the first, single-job version, in my opinion. Cheers Eric
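As a rough analogy only (this is plain Python generators, not Spark code), the sketch below shows why splitting a lazy pipeline into two statements does not change what runs. The function names `read_rows` and `double` are made up for illustration.

```python
log = []


def read_rows():
    # Lazy source: nothing is read until a consumer asks for values.
    for n in range(3):
        log.append(f"read {n}")
        yield n


def double(rows):
    # Lazy transformation layered on top of another lazy stage.
    for n in rows:
        log.append(f"double {n}")
        yield n * 2


# Version 1: the pipeline defined in a single expression.
single = double(read_rows())

# Version 2: the same pipeline defined in two separate statements.
step1 = read_rows()
two_step = double(step1)

nothing_ran_yet = (log == [])  # defining the pipelines does no work

result_single = list(single)   # work happens only when results are requested
log_single = list(log)
log.clear()

result_two = list(two_step)
log_two = list(log)

print(result_single == result_two, log_single == log_two)
```

In both versions the work is deferred until the results are consumed, and the same steps run in the same order.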
05-15-2019
01:24 AM
This confirms that the package is installed correctly and that the JDK is installed under /usr/lib/jvm/java-8-oracle-cloudera/, which you may want to use as JAVA_HOME when configuring CM and the cluster to use this JDK.
04-12-2019
07:32 PM
1 Kudo
Hi, I would suggest using INT rather than STRING. Firstly, searching on an INT column is faster, and secondly, as you said, you can do numeric comparisons, which behave differently from STRING comparisons.
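A small Python sketch of how numeric and string comparison diverge. Python's own ordering rules are used here as a stand-in for the SQL engine, and the sample values are made up; the point is only that strings compare character by character.

```python
ids_as_int = [9, 10, 100]
ids_as_str = ["9", "10", "100"]

# Numeric ordering follows the values themselves.
sorted_int = sorted(ids_as_int)  # [9, 10, 100]

# String ordering is lexicographic, so "10" and "100" sort before "9".
sorted_str = sorted(ids_as_str)  # ['10', '100', '9']

# The same "greater than 9" filter gives different answers per type.
int_filter = [v for v in ids_as_int if v > 9]    # [10, 100]
str_filter = [v for v in ids_as_str if v > "9"]  # []

print(sorted_int, sorted_str, int_filter, str_filter)
```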
03-06-2019
11:42 PM
1 Kudo
MapReduce jobs can be submitted with ease, as mostly all they require is the correct config on the classpath (such as under src/main/resources for Maven projects). Spark/PySpark relies heavily on its script tooling to submit to a remote cluster, so it is a little more involved to achieve this. IntelliJ IDEA has a remote execution option in its run targets that can be configured to copy over the built jar and invoke any arbitrary command on an edge host. This can perhaps be combined with remote debugging to get an experience comparable to MR. Another option is to use a web-based editor such as CDSW.