Oozie launching "jobs" with ASCII defaultCharSet..not UTF8?

We are running Oozie 4.0.0 (via CDH 5.3.2 with YARN), and we are seeing
something odd: when we run workflows, the default character set appears
to change, and we are not sure why.  We ran a simple Java app containing
the line below:
System.out.println(Charset.defaultCharset());
Our test code then logged:
2015-03-05 19:01:05,623 INFO [main] com.test.encoding.Test: US-ASCII
Running a shell script whose only command is "locale" also returns
POSIX:

Oozie Launcher, capturing output data:
  =======================
  LANG=
  LC_CTYPE="POSIX"
  LC_NUMERIC="POSIX"
  LC_TIME="POSIX"
  LC_COLLATE="POSIX"
  LC_MONETARY="POSIX"
  LC_MESSAGES="POSIX"
  LC_PAPER="POSIX"
  LC_NAME="POSIX"
  LC_ADDRESS="POSIX"
  LC_TELEPHONE="POSIX"
  LC_MEASUREMENT="POSIX"
  LC_IDENTIFICATION="POSIX"
  LC_ALL=

even though running locale in a bash shell on the nodes shows
UTF-8:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=


But when we look at the various settings on the box (JVM, locale,
etc.), they all point to UTF-8.  In oozie-env.sh we set
LC_ALL=en_US.UTF-8, LANG=en_US.UTF-8, and LANGUAGE=en_US.UTF-8
just to make sure things get set up, but with no success. Basically, we
can't figure out how to make Oozie use UTF-8 rather than ASCII/POSIX.  We
are backed by a MySQL DB whose default character set is UTF-8 as well.
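
For anyone who wants to reproduce the check, here is a minimal sketch that extends the one-line test above so it also prints the locale variables the process actually sees. The class name EncodingReport and the collect helper are made up for illustration; only standard JDK calls are used.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {
    // Collects the JVM charset settings plus the locale variables found in env.
    // Variables missing from env are reported as "<unset>".
    static Map<String, String> collect(Map<String, String> env) {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("defaultCharset", Charset.defaultCharset().name());
        report.put("file.encoding", String.valueOf(System.getProperty("file.encoding")));
        for (String name : new String[] {"LANG", "LC_ALL", "LC_CTYPE"}) {
            String value = env.get(name);
            report.put(name, value == null ? "<unset>" : value);
        }
        return report;
    }

    public static void main(String[] args) {
        // Print one key=value line per entry, using the real process environment.
        collect(System.getenv()).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Running the same class once from a login shell and once as an Oozie Java action should make it easy to compare what each environment passes to the JVM.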

Any thoughts, suggestions, or places to read/look?
Thanks in advance!
Cheers,
Aaron
1 ACCEPTED SOLUTION


Hi,

I was able to reproduce this on my cluster, although I tested on CDH 6.

Shell output:

[root@host-10-17-102-176 hive]# locale
LANG=en_US.UTF-8
LC_CTYPE=UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

 

Oozie Launcher, capturing output data:
=======================
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

 

To fix this, please make the configuration change below.

In Cloudera Manager, go to the YARN configuration, find Containers Environment Variable (yarn.nodemanager.admin-env), and append "LC_ALL=en_US.UTF-8,LANG=en_US.UTF-8" to it. Then restart the affected services so the change takes effect.

After that, rerun the Oozie job and check the output. On my cluster it looks like this after the change:

Oozie Launcher, capturing output data:
=======================
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
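
As a side note on why the POSIX locale matters for Java code: with a US-ASCII default charset, any text that is encoded without naming a charset loses its non-ASCII characters. The sketch below is only an illustration (the class CharsetRoundTrip and the roundTrip helper are hypothetical names, not part of Oozie or YARN). It encodes a string and decodes it again with an explicit charset. With US_ASCII the accented character comes back as "?", while with UTF_8 the string survives unchanged.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Encodes text to bytes and decodes it back using the same charset.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "h", e-acute, "llo"; written as an escape so source encoding does not matter.
        String sample = "h\u00e9llo";
        boolean asciiLossy = !sample.equals(roundTrip(sample, StandardCharsets.US_ASCII));
        boolean utf8Lossy = !sample.equals(roundTrip(sample, StandardCharsets.UTF_8));
        System.out.println("US-ASCII round trip lossy: " + asciiLossy);
        System.out.println("UTF-8 round trip lossy: " + utf8Lossy);
    }
}
```

Passing StandardCharsets.UTF_8 explicitly wherever the code converts between bytes and strings keeps a job independent of the container's locale, though it does not replace the YARN setting above for shell actions or third-party code.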

 

Nitish


4 REPLIES


Have you found a solution to this?


Hi,

 

Which CDH version are you currently using where you see this issue?

Can you share the workflow.xml and the script that you are running?

Please also share the Oozie launcher logs.

 

Regards

Nitish


Issue resolved with your solution, thanks.

 

CDH 6.3.3