Need Suggestion regarding Mahout

Explorer

Hi,

 

I am trying to cluster data from a few .csv files, each around 50 MB in size. First I tried to create a sequence file using the following command:

 

mahout seqdirectory -c UTF-8 -i hdfs://-------:8020//AD/ -o hdfs://------:8020//AD/seq

 

The error log is as follows:

 

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at java.lang.StringBuilder.toString(StringBuilder.java:430)
at org.apache.mahout.text.PrefixAdditionFilter.process(PrefixAdditionFilter.java:67)
at org.apache.mahout.text.SequenceFilesFromDirectoryFilter.accept(SequenceFilesFromDirectoryFilter.java:90)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1468)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1502)
at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:98)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:196)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

 

Correct me if I am wrong, but I think the error is related to MAHOUT_HEAPSIZE. If so, please let me know how to set it. I tried to set it in the /bin/mahout file, but echo $MAHOUT_HEAPSIZE prints nothing.

Here's the file:

 

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx3g

# check envvars which might override default args
if [ "$MAHOUT_HEAPSIZE" != "" ]; then
#echo "run with heapsize $MAHOUT_HEAPSIZE"
JAVA_HEAP_MAX="-Xmx""$MAHOUT_HEAPSIZE""m"
#echo $JAVA_HEAP_MAX
fi

 

Please let me know if any changes are required in the above file. Any suggestions regarding this issue would be really helpful for us.

 

Thanks

2 REPLIES

Re: Need Suggestion regarding Mahout

Master Collaborator

This is likely what you need to increase, yes.

 

You can set any environment variable in a bash-like shell with:

 

export FOO="value"

 

so you can try

 

export MAHOUT_HEAPSIZE="4096"

... then run the command again.
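
As a rough sketch of how that export feeds into the heap check quoted from bin/mahout in the question: the function below copies that check into a helper (resolve_heap_flag is a made-up name, not part of Mahout) so the resulting -Xmx flag can be inspected without running a job. It assumes the installed script contains the same logic as the excerpt.

```shell
# Copy of the heap-size check from the bin/mahout excerpt in the question,
# wrapped in a helper function (resolve_heap_flag is a hypothetical name).
resolve_heap_flag() {
  JAVA_HEAP_MAX=-Xmx3g
  # An exported MAHOUT_HEAPSIZE (in MB) overrides the 3g default.
  if [ "$MAHOUT_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$MAHOUT_HEAPSIZE""m"
  fi
  echo "$JAVA_HEAP_MAX"
}

# Without the variable, the script default is used.
unset MAHOUT_HEAPSIZE
default_flag=$(resolve_heap_flag)

# With the variable exported, the flag is built from its value.
export MAHOUT_HEAPSIZE="4096"
override_flag=$(resolve_heap_flag)

echo "default:  $default_flag"
echo "override: $override_flag"
```

Note that the value is a plain number of megabytes; the script appends the "m" suffix itself.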

Re: Need Suggestion regarding Mahout

Explorer

Even after setting it, I get the same error:

 

JAVA_HEAP_MAX=-Xmx3g

 

MAHOUT_HEAPSIZE="4096"

 

Logs:

 

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
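
It is not clear from the lines above whether MAHOUT_HEAPSIZE was exported or only assigned. A plain assignment stays local to the current shell and is not passed to child processes such as the bin/mahout script, which may be one reason the setting has no effect. A quick way to check, as a minimal sketch (the sh -c child process stands in for the launcher script):

```shell
# Start from a clean state so an earlier export does not skew the check.
unset MAHOUT_HEAPSIZE

# Plain assignment: visible in this shell only, not in child processes.
MAHOUT_HEAPSIZE="4096"
plain=$(sh -c 'echo "${MAHOUT_HEAPSIZE:-unset}"')

# After export, child processes inherit the value.
export MAHOUT_HEAPSIZE
exported=$(sh -c 'echo "${MAHOUT_HEAPSIZE:-unset}"')

echo "plain assignment: $plain"
echo "after export:     $exported"
```

Separately, the stack trace ends in org.apache.hadoop.util.RunJar, which suggests the job was started through the hadoop launcher. In that mode the client JVM heap may be controlled by Hadoop's own settings (for example HADOOP_CLIENT_OPTS) rather than by JAVA_HEAP_MAX. This is an assumption about the installed script version and is worth confirming in bin/mahout.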