Contributor
Posts: 30
Registered: ‎03-02-2017
Accepted Solution

RxSpark Execution on Cloudera

Hi, I am trying to run the code below with the command "Revo64-9.0 -f testrxspark.R":

 

  • list.files(system.file("SampleData", package = "RevoScaleR"))
  • myHadoopCluster <- RxSpark(namenode="zzz.westeurope.cloudapp.azure.com", port=8020,consoleOutput=TRUE)
  • rxSetComputeContext(myHadoopCluster)
  • file.exists(system.file("SampleData/AirlineDemoSmall.csv", package="RevoScaleR"))
  • bigDataDirRoot <- "/user/RevoShare" # HDFS location of the example data
  • rxHadoopListFiles(bigDataDirRoot) # There will be no files at this point.
  • source <- system.file("SampleData/AirlineDemoSmall.csv", package="RevoScaleR")
  • inputDir <- file.path(bigDataDirRoot,"AirlineDemoSmall")
  • rxHadoopMakeDir(inputDir)
  • rxHadoopListFiles(bigDataDirRoot)
  • rxHadoopCopyFromLocal(source, inputDir)
  • rxHadoopListFiles(inputDir)
  • hdfsFS <- RxHdfsFileSystem(hostName="zzz.westeurope.cloudapp.azure.com", port=8020)
  • colInfo <- list(DayOfWeek = list(type = "factor", levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")))
  • airDS <- RxTextData(file = inputDir, missingValueString = "M", colInfo = colInfo, fileSystem = hdfsFS)
  • rxSummary(~ArrDelay:DayOfWeek, data = airDS)

I got the following error:

17/03/19 00:00:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where appli$

 

Warning: libjvm.so not found in /log/cloudera/parcels/MRS-9.0.1/hadoop, searching system-wide

 

Internal Error: Cannot reset hdfs internal params while connected to an hdfs file system.

Error in try({ : Internal Error: Cannot reset hdfs internal params while connected to an hdfs file system.

Error: Error in try({ : Internal Error: Cannot reset hdfs internal params while connected to an hdfs file system.

 

The ScaleR library scaleR-hadoop-0.1-SNAPSHOT.jar has been copied to the Hadoop lib folders of the Cloudera parcel on all nodes.

 

Please help me fix this as soon as possible.

Contributor
Posts: 30
Registered: ‎03-02-2017

Re: RxSpark Execution on Cloudera

MRS uses the directory /var/RevoShare during execution. When a job runs, it creates a folder there named after the user who ran it and stores its data inside that folder. For some reason the older files were not being deleted. I deleted them manually, and after that it started working.
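
In case it helps anyone else, here is a rough sketch of how that manual cleanup could be scripted. It is not an official MRS tool. The /var/RevoShare/<user> layout comes from the description above; the function name clean_revoshare, its arguments, and the dry-run default are my own assumptions. It also assumes GNU find (for -mindepth and -delete) and that no MRS job is running for that user while it runs.

```shell
# Hypothetical helper: list (and optionally delete) leftover entries under a
# user's MRS share folder. Only the /var/RevoShare/<user> layout comes from
# the post; the function name and arguments are illustrative assumptions.
clean_revoshare() {
    share_root="${1:-/var/RevoShare}"   # share directory named in the post
    user_name="${2:-$(id -un)}"         # user who ran the job
    mode="${3:-dry-run}"                # pass "delete" to actually remove files
    target="$share_root/$user_name"

    if [ ! -d "$target" ]; then
        echo "nothing to clean: $target does not exist"
        return 0
    fi

    if [ "$mode" = "delete" ]; then
        # Remove everything inside the user's folder, keeping the folder itself.
        find "$target" -mindepth 1 -delete
        echo "cleaned $target"
    else
        # Dry run: only print what would be removed.
        find "$target" -mindepth 1 -print
    fi
}
```

Calling clean_revoshare with no arguments only lists what is under /var/RevoShare for the current user. To actually delete, pass all three arguments, for example clean_revoshare /var/RevoShare "$(id -un)" delete. I would check the dry-run listing first, since anything a job still needs would be removed as well.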
