Member since: 02-22-2016
Posts: 19
Kudos Received: 0
Solutions: 0
06-04-2020
08:28 AM
Probably worth pointing out that the behaviour of insertInto & saveAsTable can differ under certain conditions: https://towardsdatascience.com/understanding-the-spark-insertinto-function-1870175c3ee9 https://stackoverflow.com/questions/47844808/what-are-the-differences-between-saveastable-and-insertinto-in-different-savemod
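One documented difference is column matching: insertInto resolves columns by position and ignores the DataFrame's column names, while saveAsTable in append mode matches columns by name. The sketch below is a plain-Python model of those two rules, not Spark code. The table schema, column names, and helper functions are hypothetical and exist only for illustration.

```python
# Plain-Python model of the two matching rules; no Spark API is used here.
TABLE_COLUMNS = ["id", "name"]  # hypothetical target table schema


def append_by_position(table_columns, df_columns, row):
    """Mimics insertInto: DataFrame column names are ignored."""
    return dict(zip(table_columns, row))


def append_by_name(table_columns, df_columns, row):
    """Mimics saveAsTable in append mode: values are matched by column name."""
    by_name = dict(zip(df_columns, row))
    return {col: by_name[col] for col in table_columns}


df_columns = ["name", "id"]  # same columns as the table, different order
row = ("alice", 1)

by_position = append_by_position(TABLE_COLUMNS, df_columns, row)
by_name = append_by_name(TABLE_COLUMNS, df_columns, row)

print(by_position)  # {'id': 'alice', 'name': 1}
print(by_name)      # {'id': 1, 'name': 'alice'}
```

In the positional case the values land in the wrong columns. In real Spark the outcome would also depend on type casting and the save mode, which this model leaves out.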
11-23-2016
02:07 PM
Any chance we might one day have the ability to override DEFAULT_SELECTIVITY via a query hint?
11-23-2016
02:05 PM
Nice, thanks Tim. Appreciate the reply and, even more so, the link to the source.
11-23-2016
07:53 AM
I've got a query running that scans a table with 17.7 billion rows. I have some (non-partition-pruning) filters on that table. If I look at the query plan, the cardinality estimate of the scan is 1.77 billion rows, exactly one tenth of the total number of rows in the table.

I'm simply intrigued as to why Impala estimates the cardinality of the scan to be exactly one tenth. What heuristics does Impala use (if any) to determine this? Or is this simply an arbitrary rule of "if there are filters on the table then assume that one tenth of the data is returned"? Just interested to know, that's all.

Regards
Jamie

P.S. Here is the section of the explain plan pertinent to the scan:

00:SCAN HDFS [tuk_sseft.cu0pr0cafw_cu0pr1cafw_cu0cafw_pr0cafw_current, RANDOM]
partitions=1/1 files=5184 size=1.21TB
predicates: tuk_sseft.cu0pr0cafw_cu0pr1cafw_cu0cafw_pr0cafw_current.cu0cafw_bsk_custperpurch_52w_cnt > 0, (tuk_sseft.cu0pr0cafw_cu0pr1cafw_cu0cafw_pr0cafw_current.cu0pr0cafw_bsk_custprodperpurch_56w_cnt > 0 OR tuk_sseft.cu0pr0cafw_cu0pr1cafw_cu0cafw_pr0cafw_current.cu0pr0cafw_bsk_custprodperpurch_101w107w_cnt > 0 OR tuk_sseft.cu0pr0cafw_cu0pr1cafw_cu0cafw_pr0cafw_current.cu0pr0cafw_bsk_custprodperpurch_153w159w_cnt > 0 OR tuk_sseft.cu0pr0cafw_cu0pr1cafw_cu0cafw_pr0cafw_current.cu0pr0cafw_bsk_custprodperpurch_205w211w_cnt > 0)
runtime filters: RF000 -> Cu0Pr0Cafw_product
table stats: 17738533540 rows total
column stats: all
hosts=18 per-host-mem=5.24GB
tuple-ids=0 row-size=612B cardinality=1773853354
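The one-tenth relationship can be checked directly against the two numbers in the plan. The sketch below assumes a fixed fallback selectivity of 0.1 applied when the planner cannot estimate a filter. The DEFAULT_SELECTIVITY constant mentioned elsewhere in this thread suggests such a default exists, but the 0.1 value and the helper function are assumptions for illustration, not Impala's actual code.

```python
from fractions import Fraction

TABLE_ROWS = 17_738_533_540       # "table stats" line in the plan
PLAN_CARDINALITY = 1_773_853_354  # "cardinality" value in the plan

# Assumption: a fixed fallback selectivity of one tenth for predicates
# whose selectivity cannot be derived from column stats.
ASSUMED_DEFAULT_SELECTIVITY = Fraction(1, 10)


def estimate_cardinality(rows, selectivity):
    """Scale the table row count by a selectivity and truncate to whole rows."""
    return int(rows * selectivity)


estimated = estimate_cardinality(TABLE_ROWS, ASSUMED_DEFAULT_SELECTIVITY)
print(estimated)                      # 1773853354
print(estimated == PLAN_CARDINALITY)  # True
```

Fraction is used instead of a float so the comparison with the plan's value is exact.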
Labels: Apache Impala
06-20-2016
06:37 AM
@jbapple wrote: "You can follow the bug I linked to above for any future updates. Right now I have no news for you."

DOH! Didn't even see that link. Must pay more attention, my apologies.
06-20-2016
05:49 AM
Hi jbapple, I'm a colleague of Mike, who originally submitted this. We're still hitting the issue (Impala v2.3.0-cdh5.5.1). Any update on whether or not this might be fixed? Regards, Jamie
03-15-2016
01:22 AM
cool, thanks Robert
03-13-2016
04:00 AM
Hi,

We have an existing solution that uses a 3rd party workflow manager, which we are now looking to replace with Oozie. Our solution comprises lots of Python scripts that ultimately use impyla (i.e. Cloudera's Python client for Impala) to issue SQL statements to Impala. We have followed (what we believe to be) the recommended Cloudera architecture of using an edge (aka gateway) node, and all of our code (Python scripts etc.) is deployed to that edge node.

Here's the problem that I think we have. According to http://blog.cloudera.com/blog/2013/03/how-to-use-oozie-shell-and-java-actions/ Oozie can only execute scripts that are stored on HDFS. However, as I've just explained, our scripts are deployed to the file system of the edge node. Can anyone recommend a course of action that would enable us to use Oozie to execute our scripts?

TIA
JT
Labels: Apache Oozie
02-22-2016
06:36 AM
Hi folks,

I've installed an instance of CDH on Azure using the provided Azure Resource Manager template (https://raw.githubusercontent.com/azure/azure-quickstart-templates/master/cloudera-on-centos/azuredeploy.json). Everything seems to be working OK so far. The first problem I've hit is that I can't log in to Hue. The comment above says "By default the first user that logs into Hue becomes the first admin user", but how do I log in? What are the default credentials? I have no idea. Does anyone know (or know how I can find out)?

Thanks in advance
Jamie