Explorer
Posts: 9
Registered: 09-10-2013

Need Suggestion regarding Mahout

Hi,

I am trying to cluster data from a few .csv files, each around 50 MB. First I tried to create a sequence file using the following command:

mahout seqdirectory -c UTF-8 -i hdfs://-------:8020//AD/ -o hdfs://------:8020//AD/seq

Here is the error log:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at java.lang.StringBuilder.toString(StringBuilder.java:430)
at org.apache.mahout.text.PrefixAdditionFilter.process(PrefixAdditionFilter.java:67)
at org.apache.mahout.text.SequenceFilesFromDirectoryFilter.accept(SequenceFilesFromDirectoryFilter.java:90)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1468)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1502)
at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:98)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:196)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Correct me if I am wrong, but I think the error is related to MAHOUT_HEAPSIZE. If it is, please let me know how to set it. I tried to set it in the /bin/mahout file, but echo $MAHOUT_HEAPSIZE prints nothing.

Here is the relevant part of the file:

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx3g

# check envvars which might override default args
if [ "$MAHOUT_HEAPSIZE" != "" ]; then
  #echo "run with heapsize $MAHOUT_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$MAHOUT_HEAPSIZE""m"
  #echo $JAVA_HEAP_MAX
fi
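For reference, the snippet above keeps a default of -Xmx3g and, only when MAHOUT_HEAPSIZE is non-empty, replaces it with -Xmx followed by that value and an "m" suffix, so the value is read as megabytes. The sketch below reproduces just that logic as a standalone function so you can check which flag a given value would produce. The function name mahout_heap_flag is hypothetical and is not part of bin/mahout.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mirrors the heap-size logic quoted from bin/mahout.
# Argument: optional heap size in megabytes. Prints the resulting -Xmx flag.
mahout_heap_flag() {
  local heap_max="-Xmx3g"          # default used by the quoted script
  if [ "$1" != "" ]; then
    heap_max="-Xmx""$1""m"         # value is interpreted as megabytes
  fi
  echo "$heap_max"
}

mahout_heap_flag          # prints -Xmx3g
mahout_heap_flag 4096     # prints -Xmx4096m
```

Note that 4096 therefore means a 4 GB heap, only 1 GB above the 3 GB default.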

 

Please let me know if any changes are required in the file above. Any suggestions on this issue would be really helpful.

Thanks

Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: Need Suggestion regarding Mahout

This is likely what you need to increase, yes.

You can set any environment variable in a bash-like shell with:

export FOO="value"

So you can try:

export MAHOUT_HEAPSIZE="4096"

... then run the command again.
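The difference between a plain assignment and export is what matters here. A plain assignment creates a shell variable that child processes do not see; export places it in the environment that child processes (such as the mahout script) inherit. Likewise, assuming the value was assigned inside the bin/mahout script, it only exists in that script's own process, which would explain why echo $MAHOUT_HEAPSIZE in the terminal stays blank. The sketch below demonstrates this, using bash -c as a stand-in for the mahout script.

```shell
#!/usr/bin/env bash
unset MAHOUT_HEAPSIZE   # start from a clean state for the demo

# Plain assignment: a shell variable only, not passed to child processes.
MAHOUT_HEAPSIZE="4096"
bash -c 'echo "child sees: [$MAHOUT_HEAPSIZE]"'   # prints: child sees: []

# After export, child processes inherit the value.
export MAHOUT_HEAPSIZE
bash -c 'echo "child sees: [$MAHOUT_HEAPSIZE]"'   # prints: child sees: [4096]
```

The one-line form export MAHOUT_HEAPSIZE="4096" does both steps at once; it has to be run in the same terminal session that then launches mahout.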

Explorer
Posts: 9
Registered: 09-10-2013

Re: Need Suggestion regarding Mahout

Even after setting it, I get the same error:

JAVA_HEAP_MAX=-Xmx3g

MAHOUT_HEAPSIZE="4096"

Logs:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space