Member since
12-07-2015
83
Posts
23
Kudos Received
10
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1941 | 07-11-2018 02:42 PM |
|  | 5640 | 12-10-2017 08:26 PM |
|  | 1578 | 11-14-2017 12:17 PM |
|  | 12208 | 03-29-2017 06:42 AM |
|  | 1448 | 02-22-2017 01:43 PM |
07-24-2019
11:26 AM
5 Kudos
This is a known issue and has been fixed in CM 6.2. Here is the relevant item in the release notes. Cheers, Lars
07-11-2018
02:42 PM
Yes, creating two clusters is what you could try. I'm no expert in setting this up and unfortunately I also don't have good advice on which tooling to use. distcp certainly could be worth a try. Within a country your experience will depend on where your machines are, and you'll likely also be affected by reduced bandwidth between data centers. I'm not sure about other services' behavior when running across racks. Impala is not (yet) rack-aware in its scheduling and exchanges. However, even once we get to adding support for rack-awareness, we might assume that the racks are within a single data-center.
07-11-2018
08:52 AM
This sounds like a result of the drastically increased link latency between your two "racks". While within a single rack you should see latencies less than a millisecond, US-EU latencies will be around 150ms, depending on where in the US and EU your machines are located. Bandwidth between your locations is likely also much lower than between the racks. Impala currently does not do any rack-aware scheduling of I/O and data exchanges. In addition it is not optimized for high variance in link latencies and throughput. HDFS itself to my knowledge also makes no optimizations for such a case. Frankly, I don't think you will see good performance in such a scenario. If you want to increase data availability, you could explore replicating the data between your locations while running queries in only one at a time. If you want to increase service availability, you can look into using a load balancer and switching from one cluster to the other in case of failure.
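To make the latency point concrete, here is a rough back-of-the-envelope sketch; the number of synchronous round trips is an illustrative assumption, not a measured value:

```python
# Illustrative only: effect of link latency on a query that performs
# many small synchronous exchanges between nodes.
round_trips = 200          # assumed number of synchronous exchanges
intra_rack_rtt_ms = 1      # ~1 ms within a rack
us_eu_rtt_ms = 150         # ~150 ms US-EU round trip

print(round_trips * intra_rack_rtt_ms)  # 200 ms of pure latency overhead
print(round_trips * us_eu_rtt_ms)       # 30000 ms, i.e. 30 s of overhead
```

The same query shape that finishes with negligible latency overhead within a rack can spend tens of seconds just waiting on the transatlantic link, before bandwidth differences are even considered.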
04-27-2018
11:34 AM
Thank you Chris for providing more information. It looks like it crashed in the code that writes Parquet files (HdfsParquetTableWriter::ColumnWriter<impala::StringValue>::ProcessValue). However, your query should not write any data: " SELECT a.topLevelField, b.priceFromNestedField FROM db.table a LEFT JOIN a.nestedField b" I also noticed that the stack looks like it has been overwritten by something. I don't recall any recent issues in that method and will have a look at the code to see if I can spot anything obvious. In the meantime, can you double check that this query caused the crash and no other query was running? Thanks, Lars
04-26-2018
03:41 PM
Let's see what the hs_err_pid file contains next. Additionally, would you be willing to share the Minidump or a core dump with us in private? Please be aware that Minidumps contain process memory of each thread's stack, and core dumps contain all of the process's memory. Let me know if you'd like to do that and I'll share a private upload link with you. Alternatively you can follow these instructions to resolve the minidump yourself and share the contained stack traces: https://cwiki.apache.org/confluence/display/IMPALA/Debugging+Impala+Minidumps
04-26-2018
10:44 AM
Hi, Can you post the ends of the INFO and ERROR logs? Can you also post the content of the hs_err_pid<pid>.log file? Thanks, Lars
03-12-2018
12:18 PM
1 Kudo
For Python it makes a difference whether output gets printed to the terminal (which in this case likely supports unicode) or output is redirected to a file (which means it needs to be encoded in ASCII). This post on StackOverflow seems to describe the issue well. I linked the post in the JIRA for future reference. Cheers, Lars
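A minimal Python sketch of the failure mode described in the linked post; the sample string is made up, and the point is simply that the ASCII codec (which Python 2 falls back to for redirected output) cannot represent non-ASCII characters:

```python
# Encoding non-ASCII text with the ASCII codec raises UnicodeEncodeError,
# which is what happens implicitly in Python 2 when stdout is redirected.
text = u"r\u00e9sum\u00e9"  # contains 'é', not representable in ASCII
try:
    text.encode("ascii")
    ascii_ok = False  # would only get here if encoding succeeded
except UnicodeEncodeError:
    ascii_ok = True   # ASCII cannot encode 'é'
print(ascii_ok)

# UTF-8, by contrast, round-trips the string without loss.
print(text.encode("utf-8").decode("utf-8") == text)
```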
03-09-2018
02:40 PM
Hi GeKas, I'm not sure I understood your question. In general, writing to stdout should respect the locale settings of your shell:

$ echo $LANG
en_US.UTF-8

Writing to a file, however, does not need to respect these, so its behavior may be different.
03-08-2018
10:48 AM
This looks like IMPALA-2717 to me. The JIRA has a patch attached to it, but no one ever seems to have pushed a code review for it. Unfortunately there's no targeted release for this issue. Contributions are always welcome; let me know if you want to give it a shot. Cheers, Lars
12-10-2017
08:26 PM
Hi Davood, Impala needs a column type for column3 and NULL does not allow the planner to infer the type. Using a cast to specify the type will work: create table v as select i, cast(null as int) as j from t; Cheers, Lars
11-14-2017
12:17 PM
Hi mauricio, Impala currently does not support graceful node decommissioning. We're tracking work on this feature in IMPALA-1760, but we currently are not targeting it for a particular release. Unfortunately that only leaves the option of killing the daemon. Cheers, Lars
11-02-2017
10:17 AM
Can you share a query profile? That could give insights into where Impala is spending the time.
11-01-2017
02:29 PM
Hi hrishi1dypim, Have you restarted all Impala roles including statestored and catalogd after the upgrade? Cheers, Lars
11-01-2017
02:25 PM
Hi yehudaks, How long does your processing step take? I.e., what do you mean by "takes a lot of time"? Can you share a query profile? Cheers, Lars
08-14-2017
01:39 PM
1 Kudo
I just had a look, but I couldn't spot an obvious problem. The HDFS scanner fragments read around 15 MB/s, which seems reasonable to me given how computationally intensive Parquet decoding is. There also doesn't seem to be any considerable skew. Each of your 5 nodes reads ~100 GB of data in 134 s, so the per-node throughput is around 764 MB/s. I suggest having a look at the perf improvements around Parquet files in CDH 5.12 that I mentioned in an earlier reply.
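For reference, the throughput arithmetic above works out as follows:

```python
# Back-of-the-envelope check: ~100 GB read per node in 134 s
# corresponds to roughly 764 MB/s per node.
bytes_per_node = 100 * 1024**3   # ~100 GB
seconds = 134
mb_per_s = bytes_per_node / seconds / 1024**2
print(round(mb_per_s))  # 764
```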
08-11-2017
10:50 AM
That number of files shouldn't be too many. Impala also processes files in parallel locally, so you should see higher utilization on each node. Can you post a profile of one of the slow queries?
08-11-2017
10:08 AM
I'd try to reduce the file size to 256 MB and make sure that the block size is at least that large, too. That way you should end up with 32 GB / 256 MB = 128 files per partition, which should allow you to exploit parallelism across all your nodes. You can also try 512 MB per file and see if that improves things, but I suspect it won't. Btw, we're currently working on improving ETL performance. You may want to look at the "SORT BY" clause included in Impala 2.9 and how it allows you to write data in a way that lets Impala skip row groups much more effectively. You can find more information in the umbrella JIRA: https://issues.apache.org/jira/browse/IMPALA-2522
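The file-count arithmetic can be checked quickly:

```python
# A 32 GB partition written as 256 MB files yields 128 files per partition.
partition_bytes = 32 * 1024**3
target_file_bytes = 256 * 1024**2
print(partition_bytes // target_file_bytes)  # 128
```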
08-11-2017
09:41 AM
Hi Shannon, Impala does not split up Parquet files over several readers when reading them. Instead, only one daemon will be assigned for each file and will read the whole file. Therefore it is recommended to have only one block per file. Otherwise some of the blocks can be on remote nodes and remote reads will slow down your queries. See this page for more information: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_perf_cookbook.html Cheers, Lars
07-12-2017
03:19 PM
@adi91 - How did you set --mem_limit? What value did you pass to it? What did http://hostname:25000/memz?detailed=true say after applying --mem_limit to the command line options? Did your value show up there?
07-08-2017
12:24 PM
1 Kudo
After more investigation I found that this is already documented as a Known Issue in CM: "Known Issues and Workarounds in Cloudera Manager 5" (Impala section). I opened IMPALA-5631 to explain the problem and possible solutions in the docs.
07-08-2017
11:57 AM
@mbigelow - Thank you for keeping the JIRA updated - I'm glad you found the solution through support. It looks like you are hitting a bug in CM and we are working on fixing it. I will reach out to our documentation team to point out this issue in the docs and the release notes of 5.11.1. I'm sorry for the troubles this has caused you.
06-20-2017
11:43 AM
Steps to generate SSL certificates are generally independent of Impala and well documented. One such place where you can find more information is here: https://www.digitalocean.com/community/tutorials/openssl-essentials-working-with-ssl-certificates-private-keys-and-csrs
06-14-2017
10:48 AM
1 Kudo
Hi, You would do something like 'select * from A a left outer join B b...'. You can find more documentation on the topic here: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_joins.html Cheers, Lars
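Since the join syntax here is standard SQL, the semantics of a LEFT OUTER JOIN can be illustrated with a self-contained sqlite3 sketch (table names and data are made up for illustration; Impala's syntax for this query is the same):

```python
import sqlite3

# LEFT OUTER JOIN keeps every row of the left table; right-side columns
# become NULL (None in Python) where no match exists.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, name TEXT);
    CREATE TABLE B (id INTEGER, score INTEGER);
    INSERT INTO A VALUES (1, 'x'), (2, 'y');
    INSERT INTO B VALUES (1, 10);
""")
rows = conn.execute(
    "SELECT a.id, a.name, b.score "
    "FROM A a LEFT OUTER JOIN B b ON a.id = b.id ORDER BY a.id"
).fetchall()
print(rows)  # [(1, 'x', 10), (2, 'y', None)]
```

Row (2, 'y') has no match in B, so it still appears, with NULL for b.score.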
05-31-2017
09:16 AM
num_nodes=1 forces Impala to execute the query on a single node (machine), which will then write only a single Parquet file per partition.
04-11-2017
06:05 AM
Hi imad87, Your question looks related to Solr, so I think it may fit better into the "Search" community: http://community.cloudera.com/t5/Cloudera-Search-Apache-SolrCloud/bd-p/Search Cheers, Lars
04-04-2017
04:32 AM
1 Kudo
Hi Alon, Have you tried the inc_stats_size_limit_bytes command line flag as suggested by Tim? It is supported in CDH 5.10.0. Here's the full help text from impalad:

-inc_stats_size_limit_bytes (Maximum size of incremental stats the catalog is allowed to serialize per table. This limit is set as a safety check, to prevent the JVM from hitting a maximum array limit of 1GB (or OOM) while building the thrift objects to send to impalads. By default, it's set to 200MB) type: int64 default: 209715200

This should allow you to increase the limit you are hitting. Cheers, Lars
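As a quick sanity check, the flag's default of 209715200 bytes is indeed the 200 MB mentioned in the help text:

```python
# The documented default for -inc_stats_size_limit_bytes.
default_bytes = 209715200
print(default_bytes == 200 * 1024 * 1024)  # True
```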
04-04-2017
03:37 AM
Thank you for catching this Tim! The "SORTBY()" hint was added in IMPALA-4163, which was not included in Impala 2.8.0. It is currently being reworked into a SQL clause (IMPALA-4166), so I cannot make promises as to which release will contain this feature. My apologies for the confusion. I will make sure the documentation gets updated.
04-01-2017
05:30 AM
2 Kudos
Hi imad87, What is the purpose of specifying " ROW FORMAT DELIMITED" without a delimiter character? On first glance it looks like your data file contains the substring "\N" (the \ character followed by the N character) to delimit lines, instead of the "\n" character (ASCII 0xA). Can you double check the file in a hex editor? Cheers, Lars
03-30-2017
07:58 AM
Unfortunately I don't know how to speed up your particular query. You may want to have a look at Impala's query hints, especially the section about "Hints for join queries" on this page: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_hints.html If that doesn't get you anywhere, maybe someone else here has an idea. Cheers, Lars
03-29-2017
06:42 AM
Hi Amit, Your first question has already been discussed in this thread:

There's a bit of a story there. When we started preparing the CDH 5.10 release, the Apache Impala 2.8 release was not ready, so we had to call it "Impala 2.7" in the version number. Impala 2.8 was officially released after we finished putting together the CDH 5.10 release, too late to bump the version in all places. CDH 5.10 Impala is almost exactly the same as 2.8, plus or minus a few patches, so in most of the announcements we've just called it 2.8. You can find a full list of commits in CDH 5.10.0 here: https://github.com/cloudera/Impala/commits/cdh5-2.7.0_5.10.0 The full list of commits in Impala 2.8 is here: https://github.com/apache/incubator-impala/commits/branch-2.8.0

To your second question: Impala indeed does not support the DELETE command for non-Kudu tables. You can use the TRUNCATE command to completely delete all data in a table. Cheers, Lars