Created on 03-05-2018 08:26 AM - edited 09-16-2022 05:56 AM
I have seen this error reported both in the context of the Cloudera ecosystem and in general Python contexts. But the fact that it pops up even when I am not doing any string comparison seems a little strange; I thought that basic output of Unicode was supported.
This error is coming from a basic select query:
Query: select my_column from my_table limit 1000000
Unknown Exception : 'ascii' codec can't encode character u'\xe8' in position 20483: ordinal not in range(128)
Has this been addressed in later versions? I happen to be on a rather old version:
$ impala-shell -v
Impala Shell v2.1.0-cdh4 (11a45f8) built on Thu Dec 18 07:45:47 PST 2014
Also, please note that this does not appear related to my terminal, as I can print that character otherwise:
$ python -c 'print(u"\xe8");'
è
Created 03-06-2018 02:56 AM
This is supposed to have been fixed since Impala 1.4:
https://issues.apache.org/jira/browse/IMPALA-489
You can also check https://issues.cloudera.org/browse/IMPALA-607; it may help you deal with it.
Are you really using cdh4?
Created 03-06-2018 09:01 AM
The fact that this was fixed in v1.4 is what confuses me, especially because I appear to be on version 2.1.
Also, I am able to run the command that was failing in that forum thread, which is a bit confusing as well.
impala-shell -i 10.0.0.1 -d myDB -q "select 'Ѳ'"
Starting Impala Shell without Kerberos authentication
Connected to 10.0.0.1:21000
Server version: impalad version 2.1.0-cdh4 RELEASE (build 11a45f84eb1f0d441ebad72cf9d65262f6cc2391)
Query: use `myDB`
Query: select 'Ѳ'
Query submitted at: 2018-03-06 16:52:51 (Coordinator: None)
Query progress can be monitored at: None/query_plan?query_id=5040fadc08d0c83e:64b2514f9d270fb5
+-----+
| 'ѳ' |
+-----+
| Ѳ   |
+-----+
Yes, I am really using CDH4. Do you think that I wouldn't be facing this issue on CDH5?
Thanks a bunch for your help!
Created 03-07-2018 02:41 AM
CDH4 is four years old (the last update was in 2014, I think).
So even if Impala supports it, HDFS and Hive are too old and you may run into various issues.
If you have the option, upgrade to CDH5.14.
Created 03-07-2018 09:10 AM
Thank you for your reply and your advice, @GeKas.
The issue has developed a bit more since my last reply, and has become even more puzzling. Maybe this is just an old bug that I'm tripping on.
It now appears that the issue only occurs when I write the results to a file AND do not include the -B parameter. I did not detect this pattern until now but it appears consistent with my prior accounts.
Here is everything working when output to the console:
$ impala-shell --print_header -i 10.0.0.1 -d myDB -q 'select phenolist from myTable where startpos = 225609789'
Starting Impala Shell without Kerberos authentication
Connected to 10.0.0.1:21000
Server version: impalad version 2.1.0-cdh4 RELEASE (build 11a45f84eb1f0d441ebad72cf9d65262f6cc2391)
Query: use `myDB`
Query: select phenolist from myTable where startpos = 225609789
+---------------------+
| phenolist           |
+---------------------+
| Pelger-Huët anomaly |
+---------------------+
Fetched 1 row(s) in 0.35s
If I direct results to a file, I get the error:
$ impala-shell .... > tmp
Unknown Exception : 'ascii' codec can't encode character u'\xeb' in position 83: ordinal not in range(128)
And finally, if I add the -B parameter, things work again:
$ impala-shell -B ... > tmp
Fetched 1 row(s) in 0.35s
Created 03-08-2018 12:26 AM
Just quickly tested your query.
The same error appears on impalad version 2.10.0-cdh5.13.1 RELEASE (build 1e4b23c4eb52dac95c5be6316f49685c41783c51)
The same error appears if you use:
$ impala-shell .... -o tmp
which is the same as redirecting output to the file.
You can always use -B, or if you want headers as well, you can use:
$ impala-shell ..... --print_header -B -o tmp
You can also specify the delimiter with "--output_delimiter".
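If the delimited file is then consumed by a script, Python's csv module can read it. This is only a minimal sketch: it assumes the delimiter is a tab (which I believe is the default) and that the first line is the header written by --print_header. The sample text and the function name parse_delimited are made up for illustration.

```python
import csv
import io


def parse_delimited(text, delimiter="\t"):
    """Parse impala-shell -B style output into a list of dicts.

    Assumes the first line is the header row (--print_header) and that
    fields are separated by `delimiter` (assumed here to be a tab).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical sample of what `--print_header -B -o tmp` might write.
sample = "phenolist\nPelger-Hu\u00ebt anomaly\n"
rows = parse_delimited(sample)
print(rows)
```

For a real file, the text would come from something like open("tmp", encoding="utf-8", newline="").read() instead of the sample string.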
Created 03-08-2018 10:48 AM
This looks like IMPALA-2717 to me. The Jira has a patch attached to it, but no-one ever seems to have pushed a code review for this. Unfortunately there's no targeted release for this issue. Contributions are always welcome, let me know if you want to give it a shot.
Cheers, Lars
Created on 03-09-2018 07:14 AM - edited 03-09-2018 07:18 AM
Hi Lars,
this is exactly the issue, at least for CDH5.13.2.
I have applied the patch to /opt/cloudera/parcels/CDH/lib/impala-shell/lib/shell_output.py
       map(self.prettytable.add_row, rows)
-      return self.prettytable.get_string()
+      return self.prettytable.get_string().encode('utf-8')
     except Exception, e:
and now the command runs as expected.
One thing is not clear to me. I understand why this patch affects the command when the output file is specified with "-o file", but I do not understand why it matters when the output is redirected with "> file". It even fails when you pipe the output to another command, e.g. "| grep ....".
Why is impala-shell affected when the output goes somewhere other than the terminal?
To be honest, for me there is normally almost no reason to parse (or store) the results without "-B". On the other hand, this is @epowell's issue.
In my opinion, you should definitely add this patch, as everything should work with utf-8 encoding.
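To illustrate why the one-line change helps, here is a minimal sketch of the mechanism, not the actual impala-shell code. It assumes the formatted table is a unicode string and that, without the patch, something downstream encodes it with the ASCII codec. The function name to_output_bytes is made up, and the sketch is written for Python 3 even though the shell itself is Python 2.

```python
def to_output_bytes(table_text, encoding="utf-8"):
    """Encode a rendered table explicitly instead of relying on a default codec."""
    return table_text.encode(encoding)


# A row similar to the one in the failing query.
table = u"| Pelger-Hu\xebt anomaly |"

# Encoding as ASCII reproduces the kind of error seen in the thread.
try:
    to_output_bytes(table, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding as UTF-8, as the patch does, succeeds.
utf8_bytes = to_output_bytes(table)

print(ascii_error)
print(utf8_bytes)
```

The point is that the encoding is chosen explicitly at the place where the string is produced, so it no longer depends on where the output ends up.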
Created 03-09-2018 02:40 PM
Hi GeKas,
I'm not sure I understood your question. In general, writing to stdout should respect the local language settings of your shell:
$ echo $LANG
en_US.UTF-8
Writing to a file, however, does not need to respect these, so its behavior may be different.
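A rough way to see this in plain Python: in Python 2, sys.stdout.encoding is taken from the terminal when stdout is a TTY, but it is None when stdout is a file or a pipe, and unicode written there falls back to ASCII. The sketch below is written for Python 3 and only simulates the two cases with in-memory streams; the helper name write_text is made up, and none of this is impala-shell code.

```python
import io
import sys


def write_text(text, encoding):
    """Write text through a text stream that uses `encoding`; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# What the current process sees; compare a terminal run with `> tmp` or `| cat`.
print("isatty:", sys.stdout.isatty(),
      "encoding:", getattr(sys.stdout, "encoding", None))

# A UTF-8 stream (like a UTF-8 terminal) accepts the character.
utf8_result = write_text(u"\u0472", "utf-8")

# An ASCII stream (like Python 2's fallback for files and pipes) rejects it.
try:
    write_text(u"\u0472", "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(utf8_result, ascii_failed)
```

Running the script once on the terminal and once with the output redirected shows how the first printed line changes with the destination.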
Created 03-12-2018 02:25 AM
Hi Lars,
I totally agree with you on that. I was not clear, so let me rephrase it.
Without the fix, the command below runs successfully:
$ impala-shell -i localhost:21000 -q "select 'Ѳ'"
while the command below fails:
$ impala-shell -i localhost:21000 -q "select 'Ѳ'" -o tmp
Starting Impala Shell without Kerberos authentication
Connected to localhost:21000
Server version: impalad version 2.10.0-cdh5.13.2 RELEASE (build dc867db57915f55b697ef8cd9e00404a9385231a)
Query: select 'Ѳ'
Query submitted at: 2018-03-12 10:20:07 (Coordinator: http://localhost:25000)
Query progress can be monitored at: http://localhost:25000/query_plan?query_id=7c476a1defbdad29:de8ff88700000000
Unknown Exception : 'ascii' codec can't encode character u'\u0473' in position 11: ordinal not in range(128)
Could not execute command: select 'Ѳ'
But when I run the first command with a redirect, it also fails, which I did not expect:
$ impala-shell -i localhost:21000 -q "select 'Ѳ'" > tmp
Starting Impala Shell without Kerberos authentication
Connected to localhost:21000
Server version: impalad version 2.10.0-cdh5.13.2 RELEASE (build dc867db57915f55b697ef8cd9e00404a9385231a)
Query: select 'Ѳ'
Query submitted at: 2018-03-12 10:22:12 (Coordinator: http://localhost:25000)
Query progress can be monitored at: http://localhost:25000/query_plan?query_id=f74a6fde13e4760a:a323131d00000000
Unknown Exception : 'ascii' codec can't encode character u'\u0473' in position 11: ordinal not in range(128)
Could not execute command: select 'Ѳ'
But, as I mentioned before, this is @epowell's post. I just expressed my concern.