Member since: 12-07-2015
Posts: 83
Kudos Received: 23
Solutions: 10
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2915 | 07-11-2018 02:42 PM |
| | 7859 | 12-10-2017 08:26 PM |
| | 2200 | 11-14-2017 12:17 PM |
| | 16148 | 03-29-2017 06:42 AM |
| | 2141 | 02-22-2017 01:43 PM |
07-24-2019
11:26 AM
5 Kudos
This is a known issue and has been fixed in CM 6.2. Here is the relevant item in the release notes. Cheers, Lars
07-11-2018
02:42 PM
Yes, creating two clusters is what you could try. I'm no expert in setting this up and unfortunately I also don't have good advice on which tooling to use. distcp certainly could be worth a try. Within a country your experience will depend on where your machines are, and you'll likely also be affected by reduced bandwidth between data centers. I'm not sure about other services' behavior when running across racks. Impala is not (yet) rack-aware in its scheduling and exchanges. However, even once we get to adding support for rack-awareness, we might assume that the racks are within a single data-center.
07-11-2018
08:52 AM
This sounds like a result of the drastically increased link latency between your two "racks". While within a single rack you should see latencies less than a millisecond, US-EU latencies will be around 150ms, depending on where in the US and EU your machines are located. Bandwidth between your locations is likely also much lower than between the racks. Impala currently does not do any rack-aware scheduling of I/O and data exchanges. In addition it is not optimized for high variance in link latencies and throughput. HDFS itself to my knowledge also makes no optimizations for such a case. Frankly, I don't think you will see good performance in such a scenario. If you want to increase data availability, you could explore replicating the data between your locations while running queries in only one at a time. If you want to increase service availability, you can look into using a load balancer and switching from one cluster to the other in case of failure.
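To put the latency numbers above in perspective, here is a rough back-of-the-envelope sketch. The count of 1000 sequential round trips and the 0.5 ms in-rack latency are hypothetical values chosen for illustration, not measurements of what Impala or HDFS actually do; only the 150 ms US-EU figure comes from the estimate above.

```python
def round_trip_overhead_s(round_trips, latency_ms):
    """Seconds spent purely waiting on the network for sequential round trips."""
    return round_trips * latency_ms / 1000.0


# Hypothetical number of sequential round trips; not an Impala measurement.
ROUND_TRIPS = 1000

# Assumed in-rack latency of 0.5 ms ("less than a millisecond").
in_rack_s = round_trip_overhead_s(ROUND_TRIPS, 0.5)

# Approximate US-EU latency of 150 ms.
cross_atlantic_s = round_trip_overhead_s(ROUND_TRIPS, 150)

print("in-rack wait:        %.1f s" % in_rack_s)
print("cross-Atlantic wait: %.1f s" % cross_atlantic_s)
print("slowdown factor:     %.0fx" % (cross_atlantic_s / in_rack_s))
```

Real workloads overlap many of these exchanges, so treat this as an upper-bound intuition for latency-bound steps rather than a prediction of query runtime.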
04-27-2018
11:34 AM
Thank you Chris for providing more information. It looks like it crashed in the code that writes Parquet files (HdfsParquetTableWriter::ColumnWriter<impala::StringValue>::ProcessValue). However, your query should not write any data:

    SELECT a.topLevelField, b.priceFromNestedField FROM db.table a LEFT JOIN a.nestedField b

I also noticed that the stack looks like it has been overwritten by something. I don't recall any recent issues in that method and will have a look at the code to see if I can spot anything obvious. In the meantime, can you double check that this query caused the crash and no other query was running? Thanks, Lars
04-26-2018
03:41 PM
Let's see what the hs_err_pid file contains next. Additionally, would you be willing to share the Minidump or a core dump with us in private? Please be aware that Minidumps contain process memory of each thread's stack, and core dumps contain all of the process's memory. Let me know if you'd like to do that and I'll share a private upload link with you. Alternatively you can follow these instructions to resolve the minidump yourself and share the contained stack traces: https://cwiki.apache.org/confluence/display/IMPALA/Debugging+Impala+Minidumps
04-26-2018
10:44 AM
Hi, Can you post the ends of the INFO and ERROR logs? Can you also post the content of the hs_err_pid<pid>.log file? Thanks, Lars
03-12-2018
12:18 PM
1 Kudo
For Python it makes a difference whether output gets printed to the terminal (which in this case likely supports unicode) or output is redirected to a file (which means it needs to be encoded in ASCII). This post on StackOverflow seems to describe the issue well. I linked the post in the JIRA for future reference. Cheers, Lars
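A minimal sketch of the effect described above, assuming Python 2 style behavior where a redirected stdout reports no encoding and ASCII is used. The helper names `pick_encoding` and `encode_for` are made up for this illustration and are not part of impala-shell; on Python 3 a redirected stdout usually takes its encoding from the locale instead.

```python
import sys


def pick_encoding(stream, fallback="utf-8"):
    """Return the stream's declared encoding, or a fallback if it has none.

    A terminal usually reports something like 'UTF-8'. On Python 2, a
    stream redirected to a file or pipe reports None.
    """
    return getattr(stream, "encoding", None) or fallback


def encode_for(stream, text, fallback="utf-8"):
    """Encode text for the stream, replacing characters it cannot represent."""
    return text.encode(pick_encoding(stream, fallback), "replace")


if __name__ == "__main__":
    # Compare the output of running this directly and with "> out.txt".
    sys.stderr.write("stdout is a tty: %s\n" % sys.stdout.isatty())
    sys.stderr.write("stdout encoding: %s\n" % sys.stdout.encoding)
```

Falling back to UTF-8 is a design choice made for this sketch; whether it is the right default depends on what consumes the redirected output.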
03-09-2018
02:40 PM
Hi GeKas, I'm not sure I understood your question. In general, writing to stdout should respect the local language settings of your shell:

    $ echo $LANG
    en_US.UTF-8

Writing to a file, however, does not need to respect these, so its behavior may be different.
03-08-2018
10:48 AM
This looks like IMPALA-2717 to me. The Jira has a patch attached to it, but no-one ever seems to have pushed a code review for this. Unfortunately there's no targeted release for this issue. Contributions are always welcome, let me know if you want to give it a shot. Cheers, Lars
12-10-2017
08:26 PM
Hi Davood, Impala needs a column type for column3, and NULL does not allow the planner to infer the type. Using a cast to specify the type will work:

    create table v as select i, cast(null as int) as j from t;

Cheers, Lars