RxSpark Execution on Cloudera

Contributor

Hi, I am trying to execute the code below with the "Revo64-9.0 -f testrxspark.R" command:

    list.files(system.file("SampleData", package = "RevoScaleR"))
    # Use Spark on the cluster as the compute context
    myHadoopCluster <- RxSpark(namenode = "zzz.westeurope.cloudapp.azure.com", port = 8020, consoleOutput = TRUE)
    rxSetComputeContext(myHadoopCluster)
    file.exists(system.file("SampleData/AirlineDemoSmall.csv", package = "RevoScaleR"))
    bigDataDirRoot <- "/user/RevoShare" # HDFS location of the example data
    rxHadoopListFiles(bigDataDirRoot) # There will be no files at this point.
    # Copy the sample CSV from the local RevoScaleR install into HDFS
    source <- system.file("SampleData/AirlineDemoSmall.csv", package = "RevoScaleR")
    inputDir <- file.path(bigDataDirRoot, "AirlineDemoSmall")
    rxHadoopMakeDir(inputDir)
    rxHadoopListFiles(bigDataDirRoot)
    rxHadoopCopyFromLocal(source, inputDir)
    rxHadoopListFiles(inputDir)
    # Describe the HDFS data source and summarise ArrDelay by day of week
    hdfsFS <- RxHdfsFileSystem(hostName = "zzz.westeurope.cloudapp.azure.com", port = 8020)
    colInfo <- list(DayOfWeek = list(type = "factor", levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")))
    airDS <- RxTextData(file = inputDir, missingValueString = "M", colInfo = colInfo, fileSystem = hdfsFS)
    rxSummary(~ArrDelay:DayOfWeek, data = airDS)

I got the following error:

    17/03/19 00:00:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where appli$
    Warning: libjvm.so not found in /log/cloudera/parcels/MRS-9.0.1/hadoop, searching system-wide
    Internal Error: Cannot reset hdfs internal params while connected to an hdfs file system.
    Error in try({ : Internal Error: Cannot reset hdfs internal params while connected to an hdfs file system.
    Error: Error in try({ : Internal Error: Cannot reset hdfs internal params while connected to an hdfs file system.

The ScaleR library scaleR-hadoop-0.1-SNAPSHOT.jar has been copied to the Hadoop lib folders of the Cloudera parcel on all nodes.

Please help me fix this as soon as possible.

Accepted Solutions

Re: RxSpark Execution on Cloudera

Contributor

MRS uses a directory called /var/RevoShare during execution. When a job runs, MRS creates a folder inside this directory named after the user who ran it and stores that job's data there. For some reason, older files in this folder were not being deleted. I deleted them manually, and after that the job started working.
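The manual cleanup described above can be sketched in base R as follows. This is a minimal sketch, not an MRS utility: the function name clearRevoShare, its dryRun flag, and the <shareRoot>/<user> layout are assumptions based on the description in this reply, and it relies only on base R functions (list.files, unlink). It should only be run while no jobs are active for that user.

```r
# Hypothetical helper: clears stale files from a per-user MRS share folder.
# Assumption: the layout is <shareRoot>/<user>, as described in the reply above,
# with "/var/RevoShare" as the default root. Not part of RevoScaleR.
clearRevoShare <- function(shareRoot = "/var/RevoShare",
                           user = Sys.info()[["user"]],
                           dryRun = TRUE) {
  userDir <- file.path(shareRoot, user)
  if (!dir.exists(userDir)) {
    message("Nothing to clean: ", userDir, " does not exist")
    return(invisible(character(0)))
  }
  # Everything directly inside the user's folder, including hidden entries
  stale <- list.files(userDir, full.names = TRUE, all.files = TRUE, no.. = TRUE)
  if (dryRun) {
    message("Would delete ", length(stale), " item(s) from ", userDir)
  } else {
    # unlink() returns 0 on success and 1 on failure; the user folder itself is kept
    status <- unlink(stale, recursive = TRUE, force = TRUE)
    if (status != 0) warning("Some items in ", userDir, " could not be deleted")
  }
  invisible(stale)
}
```

With dryRun = TRUE (the default) it only reports how many items it would remove; calling clearRevoShare(dryRun = FALSE) actually deletes them.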
