Member since: 03-19-2017
Posts: 4
Kudos Received: 0
Solutions: 0
04-25-2018
09:21 PM
@sanket patel Hello Sanket, It's good to know that you were able to successfully upgrade Phoenix in HDP, and I'm glad to hear that Hortonworks has fixed at least some bugs in Phoenix 4.7. However, we are still encountering errors with the current distribution, as I described, and so we want to try a manual upgrade. I found this bash snippet on GitHub at phaneesh/upgrade-phoenix.sh, under the title "Upgrade to Apache Phoenix 4.11.0 on HDP 2.5.3.0-37", at https://gist.github.com/phaneesh/16f77539d1f1249aa9b0e5a847ce48a0
#!/usr/bin/env bash
HDP_INSTALL_PATH=/usr/hdp/2.5.5.0-157
cd /tmp
wget http://redrockdigimark.com/apachemirror/phoenix/apache-phoenix-4.12.0-HBase-1.1/bin/apache-phoenix-4.12.0-HBase-1.1-bin.tar.gz
tar -xvf apache-phoenix-4.12.0-HBase-1.1-bin.tar.gz
cp -r apache-phoenix-4.12.0-HBase-1.1-bin ${HDP_INSTALL_PATH}/apache-phoenix-4.12.0
PHOENIX_INSTALL_DIR=${HDP_INSTALL_PATH}/apache-phoenix-4.12.0
chmod -R 755 ${PHOENIX_INSTALL_DIR}
rm -rf apache-phoenix-4.12.0-HBase-1.1-bin.tar.gz
rm -rf apache-phoenix-4.12.0-HBase-1.1-bin
all_files=(${HDP_INSTALL_PATH}/phoenix/phoenix-client.jar ${HDP_INSTALL_PATH}/phoenix/phoenix-hive.jar ${HDP_INSTALL_PATH}/phoenix/phoenix-server.jar ${HDP_INSTALL_PATH}/phoenix/phoenix-thin-client.jar ${HDP_INSTALL_PATH}/hbase/lib/phoenix-server.jar)
for upd_file in "${all_files[@]}"
do
rm -rf ${upd_file}
done
cd ${HDP_INSTALL_PATH}/phoenix
ln -s ${PHOENIX_INSTALL_DIR}/phoenix-4.12.0-HBase-1.1-client.jar phoenix-client.jar
ln -s ${PHOENIX_INSTALL_DIR}/phoenix-4.12.0-HBase-1.1-hive.jar phoenix-hive.jar
ln -s ${PHOENIX_INSTALL_DIR}/phoenix-4.12.0-HBase-1.1-server.jar phoenix-server.jar
ln -s ${PHOENIX_INSTALL_DIR}/phoenix-4.12.0-HBase-1.1-thin-client.jar phoenix-thin-client.jar
chmod -R 755 ${HDP_INSTALL_PATH}/phoenix
cd ${HDP_INSTALL_PATH}/hbase/lib
ln -s ${PHOENIX_INSTALL_DIR}/phoenix-4.12.0-HBase-1.1-server.jar phoenix-server.jar
echo "Done" Do you think we can use that as a template? Is is missing anything that needs to be done, based on your experience?
04-24-2018
04:44 PM
Hi, We have been trying to use HBase and Phoenix (4.7) in the current Hortonworks distribution for a genomics database, moving an existing set of Postgres tables into that framework. We are using a small commodity cluster, nothing unusual. However, we are having critical problems that I was not expecting. Even on a simple table (fewer than 20 fields), we cannot get global indexes to work when retrieving fields outside the index; forcing index use via a "Hint" in the query has not been working. No joins are involved, just querying a single table. We then tried using local indexes to get around that, but we are finding that when we try to use more than one local index, none of the indexes get used. Alas, we need secondary indexes, even with denormalization.
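For concreteness, the failing pattern looks roughly like the following sketch (placeholder table, index, and column names, not our actual schema):
-- Global index on a single column (hypothetical names).
CREATE INDEX my_idx ON my_table (indexed_col);
-- Works for us (index picked up automatically): only the indexed column is selected.
SELECT indexed_col FROM my_table WHERE indexed_col = 42;
-- Does not use the index for us, even with the hint, because other_col is not covered by the index:
SELECT /*+ INDEX(my_table my_idx) */ indexed_col, other_col
FROM my_table WHERE indexed_col = 42;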
We have noted that the version of Phoenix (4.7) in Hortonworks is now over two years old (4.7.0 came out in March 2016). And while I do see that Hortonworks has added some patches to 4.7, the Apache release is now up to 4.13.2, with 4.14 coming out soon. I've been told that there are probably on the order of a thousand bug fixes and many new features between 4.7 and 4.13. A senior Biomed Informatics group lead here just tried to upgrade our 4.7 version manually on our cluster, doing file-by-file replacement following some blog post instructions that he found. That did not work; we reverted to the original config. He and another senior IT group lead are currently spinning up a test environment in the Amazon cloud using the latest release there (4.13), to see if these problems go away.
I have to tell you, we were not expecting to have these very basic problems with a well-established top-level Apache project, in a distribution from one of the top three Hadoop vendors. The Phoenix Query Server is working fine for us, but resolving these indexing issues is critical. We need to make a decision quickly now as to where to migrate, because we are expecting a huge increase in the data we will need to store and query in the coming months. Right now, I have a couple of general questions for the group:
1) Is anyone else experiencing basic problems with the Hortonworks version of Phoenix? Or have you been able to use it productively?
2) What is the future of Phoenix (and HBase) in the Hortonworks distribution? We recently had a web presentation by Hortonworks folks on the new software they are developing for data warehouses in the cloud that does not involve HBase and Phoenix (the specific use case presented used genomic data). It sounded like a lot of effort is going in that direction. At this point, I am worried about the support Hortonworks will be giving Phoenix and HBase in the future, even if we can finally resolve our current indexing problem.
3) And, of course, if anyone has encountered the same specific type of Phoenix indexing problems and managed to fix them, we would very much like to learn how that was done. Of particular note: have other groups managed to get global indexes used by inserting "Hints", and managed to employ multiple indexes in the same query?
Any sharing of your experiences and/or guidance would be appreciated.
- Ron Taylor
Labels:
- Apache HBase
- Apache Phoenix
04-19-2018
08:50 PM
Hello Hortonworks Community,
I am a novice Phoenix user and this is my first post to the Hortonworks Community group. I have what I hope is a simple question: my global index is being ignored,
even after I add a Hint, and I want to know
why.
We are using Phoenix 4.7 in the Hortonworks distribution. I have created a fairly simple Phoenix (HBase) table of about 8 million rows and about 15 columns, with several fields having global indexes. I created the main table (variantjoin_rtsalted24) and its indexes, and then used a bulk loader to populate them from a tab-delimited file. That appeared to work fine.
chrom_int is one field on which there is a global index, named vj2_chrom_int. You can see the index automatically being used below, where it is the only field being returned. Time required is 0.124 sec.
0: jdbc:phoenix:> SELECT VJ.chrom_int FROM VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
+------------+
| CHROM_INT |
+------------+
| 18 |
| 18 |
| 18 |
| 18 |
| 18 |
+------------+
5 rows selected (0.124 seconds)
0: jdbc:phoenix:>
You can see that the vj2_chrom_int index is automatically being used, as
I understand things, by the "RANGE SCAN" wording and "[0,1" in the
explain plan:
0: jdbc:phoenix:> explain SELECT VJ.chrom_int FROM VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
+------------------------------------------------------------------------------+
| PLAN |
+------------------------------------------------------------------------------+
| CLIENT 24-CHUNK SERIAL 24-WAY ROUND ROBIN RANGE SCAN OVER VJ2_CHROM_INT [0,1 |
| SERVER FILTER BY FIRST KEY ONLY |
| SERVER 5 ROW LIMIT |
| CLIENT 5 ROW LIMIT |
+------------------------------------------------------------------------------+
4 rows selected (0.043 seconds)
0: jdbc:phoenix:>
I can use a Hint to tell Phoenix to NOT use this index, as seen below.
And that increases the time needed to 1.97 sec, over an order of
magnitude more time than the 0.124 sec required with index use.
0: jdbc:phoenix:> SELECT /*+ NO_INDEX */ VJ.chrom_int FROM VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
+------------+
| CHROM_INT |
+------------+
| 18 |
| 18 |
| 18 |
| 18 |
| 18 |
+------------+
5 rows selected (1.977 seconds)
0: jdbc:phoenix:>
And here is the explain plan for that:
0: jdbc:phoenix:> explain SELECT /*+ NO_INDEX */ VJ.chrom_int FROM
VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
+------------------------------------------------------------------------------+
| PLAN |
+------------------------------------------------------------------------------+
| CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND ROBIN |
| SERVER FILTER BY CHROM_INT = 18 |
| SERVER 5 ROW LIMIT |
| CLIENT 5 ROW LIMIT |
+------------------------------------------------------------------------------+
4 rows selected (0.009 seconds)
Now, I want to add other fields for retrieval. For example,
"genomic_range". The Phoenix documentation says in such a case I must
add a Hint to force Phoenix to make use of the index (since it is a
simple global index, not a covered index wherein genomic_range
has been added). So I tried that. See below. Alas, the response time is
about the same as what I get with NO_INDEX. It appears that, even with
the Hint, the index is not being used.
0: jdbc:phoenix:> SELECT/*+ INDEX(VJ jv2_chrom_int) */ VJ.chrom_int,
genomic_range FROM VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 )
limit 5;
+------------+----------------------+
| CHROM_INT | GENOMIC_RANGE |
+------------+----------------------+
| 18 | [49546,49547) |
| 18 | [20003625,20003626) |
| 18 | [19618749,19618752) |
| 18 | [47561,47583) |
| 18 | [20024261,20024272) |
+------------+----------------------+
5 rows selected (1.799 seconds)
0: jdbc:phoenix:>
And below is the explain plan for the query with the index failure. No indication of index use, as far as I can tell.
0: jdbc:phoenix:> explain SELECT/*+ INDEX(VJ jv2_chrom_int) */
VJ.chrom_int, genomic_range FROM VARIANTJOIN_RTSALTED24 as VJ WHERE
(chrom_int =18 ) limit 5;
+------------------------------------------------------------------------------+
| PLAN |
+------------------------------------------------------------------------------+
| CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND ROBIN |
| SERVER FILTER BY CHROM_INT = 18 |
| SERVER 5 ROW LIMIT |
| CLIENT 5 ROW LIMIT |
+------------------------------------------------------------------------------+
4 rows selected (0.042 seconds)
0: jdbc:phoenix:>
So I am puzzled and frustrated. How do I get my index Hint to work? The
difference in timing between automatic use (when the index is the only
field being retrieved) and when NO_INDEX is used tells me that the index
table is there and can indeed be used. But
something is going wrong when I try to force its use via a Hint.
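In the meantime, the workaround I am considering is a covered index, so a query returning genomic_range could be answered entirely from the index table without any hint. Just a sketch, using the column names above (the covered index name is hypothetical):
-- Sketch: a covered global index that also stores genomic_range.
-- Note that an INDEX hint must reference the index by its exact name as created.
CREATE INDEX vj2_chrom_int_cov ON VARIANTJOIN_RTSALTED24 (chrom_int) INCLUDE (genomic_range);
-- With the covered index in place, this should use the index automatically, no hint needed:
SELECT chrom_int, genomic_range FROM VARIANTJOIN_RTSALTED24 WHERE chrom_int = 18 LIMIT 5;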
Guidance would be very much appreciated on this basic point.
- Ron Taylor
Labels:
- Apache HBase
- Apache Phoenix
03-19-2017
10:05 PM
Hi folks,
I am trying to get the sparklyr R package to work with Spark on a local Linux cluster. It works fine with Spark on my laptop, but now I want to parallelize R code on an actual cluster. I posted the message below as a new issue on the sparklyr page at GitHub over a week ago, and I got one reply, which, alas, did not succeed in solving my problem. So I am posting again here, in the hope that somebody reading this can provide guidance. I need to get R working in Spark on our cluster, and am eager to get going if I can get out of the gate, but I have to get past this spark_connect() problem. Please see below.
- Ron Taylor
Pacific Northwest National Laboratory
%%%%%%%%%%%%%
This is my earlier msg posted at the sparklyr page at github:
Hello folks,
I am trying to use sparklyr for the first time on a Hadoop cluster. I have used R with sparklyr on a "local" copy of Spark on my Mac laptop, but this is the first time that I am trying to run it as a "yarn-client" on a true cluster, to actually get some parallelization out of sparklyr use.
We have a small Linux cluster at our lab running Cloudera CDH 5.10.0. When I try to do the spark_connect() from an R session started on a command line on the Hadoop cluster's name (master) node, I get the same msg as in an earlier closed issue on the sparklyr github site.
That is, my error msg is:
"Failed while connecting to sparklyr to port (8880) for sessionid (2423): Gateway in port (8880) did not respond."
I am thus reopening that issue here, since I still need help even after reading that older github issue (#394).
At bottom is the record of my R session on the Hadoop cluster's name node, with all the details that I can think of printed out to the screen.
I note that the version of Spark used by CDH is 1.6.0, which is different from what is in spark_home_dir (1.6.2). I cannot seem to change spark_home_dir by setting SPARK_HOME to the Spark location used by the CDH distribution; spark_home_dir does not get altered by my setting of SPARK_HOME (as you can see below). So one question (perhaps the critical question?) is: how do I force sparklyr to connect to the Spark version being used by the CDH distribution?
As you can see at the Cloudera web page at https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_510.html
the 5.10.0 distribution has Spark 1.6.0, not 1.6.2.
So I am trying to tell sparklyr code to use the Spark 1.6.0 distribution that is located here:
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-submit
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-shell
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark
and so I was trying to set SPARK_HOME as follows:
Sys.setenv(SPARK_HOME = "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41")
Sys.getenv("SPARK_HOME") [1] "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41"
However! I note that part of the error msg (see bottom) says that the correct path was used for spark-submit:
"Path: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-submit"
So maybe sparklyr is indeed accessing the Spark 1.6.0 distribution as it should in the cluster, and the problem lies elsewhere??
One other note: there was an earlier version of sparklyr installed by support here on the Hadoop name node. I have bypassed that, installed the latest version of sparklyr (0.5.1) into
/people/rtaylor/Rpackages
as you can see below.
Would very much appreciate some guidance to get me over this initial hurdle.
Ron Taylor Pacific Northwest National Laboratory
%%%%%%%%%%%%%%%%%%%%%%%
Here is the captured screen output from the R session with my failed spark_connect() call (with the R session being done at the Linux command prompt on the Hadoop cluster namenode):
[rtaylor@bigdatann Rwork]$ R
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch" Copyright (C) 2016 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit)
'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.
ls() character(0)
(.packages()) [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" [7] "base"
install.packages(" sparklyr ", lib="/people/rtaylor/Rpackages/") --- Please select a CRAN mirror for use in this session --- trying URL 'https://cran.cnr.berkeley.edu/src/contrib/sparklyr_0.5.2.tar.gz' Content type 'application/x-gzip' length 732806 bytes (715 KB) ================================================== downloaded 715 KB
installing source package ‘ sparklyr ’ ... ** package ‘ sparklyr ’ successfully unpacked and MD5 sums checked ** R ** inst ** preparing package for lazy loading ** help *** installing help indices ** building package indices ** testing if installed package can be loaded
DONE ( sparklyr )
The downloaded source packages are in ‘/tmp/RtmpQUB4IE/downloaded_packages’
library( sparklyr , lib.loc="/people/rtaylor/Rpackages/")
(.packages()) [1] " sparklyr " "stats" "graphics" "grDevices" "utils" "datasets" [7] "methods" "base"
sessionInfo() R version 3.3.2 (2016-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Workstation release 6.4 (Santiago)
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] sparklyr_0.5.1
loaded via a namespace (and not attached): [1] Rcpp_0.12.9 withr_1.0.2 digest_0.6.12 dplyr_0.5.0 [5] rprojroot_1.2 assertthat_0.1 rappdirs_0.3.1 R6_2.2.0 [9] jsonlite_1.2 DBI_0.5-1 backports_1.0.5 magrittr_1.5 [13] httr_1.2.1 config_0.2 tools_3.3.2 parallel_3.3.2 [17] yaml_2.1.14 base64enc_0.1-3 tcltk_3.3.2 tibble_1.2
Sys.getenv("JAVA_HOME") [1] "/usr/java/latest"
spark_installed_versions()
  spark hadoop                       dir
1 1.6.2    2.6 spark-1.6.2-bin-hadoop2.6
spark_home_dir()
[1] "/people/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6"
R.home(component = "home") [1] "/share/apps/R/3.3.2/lib64/R"
path.expand("~") [1] "/people/rtaylor"
Sys.setenv(SPARK_HOME = "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41")
Sys.getenv("SPARK_HOME") [1] "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41"
spark_home_dir()
[1] "/people/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6"
ls() character(0)
config <- spark_config()
ls() [1] "config"
And, finally, here is the actual spark_connect() command where I see the failure:
sc <- spark_connect(master = "yarn-client", config = config, version = "1.6.0")
Error in force(code) : Failed while connecting to sparklyr to port (8880) for sessionid (2423): Gateway in port (8880) did not respond.
Path: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-submit
Parameters: --class, sparklyr.Backend, --jars, '/share/apps/R/3.3.2/lib64/R/library/sparklyr/java/spark-csv_2.11-1.3.0.jar','/share/apps/R/3.3.2/lib64/R/library/sparklyr/java/commons-csv-1.1.jar','/share/apps/R/3.3.2/lib64/R/library/sparklyr/java/univocity-parsers-1.5.1.jar', '/share/apps/R/3.3.2/lib64/R/library/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 2423
---- Output Log ----
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: No such file or directory
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: exec: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: cannot execute: No such file or directory
---- Error Log ----
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
And below you can see the results of a similar R session, which I just tried today after hearing back from Mr. Ariga via GitHub on my posted issue and trying what he suggested. Again, it failed. Still need help.
---------- Forwarded message ----------
From: Ronald Taylor
Date: Sun, Mar 19, 2017 at 5:23 PM
Subject: Re: [rstudio/sparklyr] problem with spark_connect() using sparklyr on a Cloudera CDH 5.10.0 Hadoop cluster (#534)
To: rstudio/sparklyr
Hi Aki,
Thanks for the guidance, but I still cannot get spark_connect() to work. Very disappointing. You can see the screen output for my connect attempts below. Also, I checked out the Cloudera web page that you listed, but I don't see anything there that usefully supplements your email to me. And so I am still stuck. Can you (or anybody else on the list) think of anything else I can try? Spark 1.6.0 is running fine on the Cloudera cluster that I am trying to use, according to the Cloudera Manager. So the spark_connect() *should* work, but it is not.
- Ron
Screen output:
> config <- spark_config()
> spark_home <- "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41"
> spark_home
[1] "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41"
> spark_version <- "1.6.0"
> spark_version
[1] "1.6.0"
> sc <- spark_connect(master = "yarn-client", config = config, version = spark_version, spark_home=spark_home)
Error in force(code) : Failed while connecting to sparklyr to port (8880) for sessionid (8451): Gateway in port (8880) did not respond.
Path: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-submit
Parameters: --class, sparklyr.Backend, --jars, '/people/rtaylor/Rpackages/sparklyr/java/spark-csv_2.11-1.3.0.jar','/people/rtaylor/Rpackages/sparklyr/java/commons-csv-1.1.jar','/people/rtaylor/Rpackages/sparklyr/java/univocity-parsers-1.5.1.jar', '/people/rtaylor/Rpackages/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 8451
---- Output Log ----
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: No such file or directory
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: exec: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: cannot execute: No such file or directory
---- Error Log ----
Another attempt, after I set some config values, as suggested:
> config$spark.driver.cores <- 4
> config$spark.executor.cores <- 4
> config$executor.memory <- "4G"
> config
$sparklyr.cores.local
[1] 16
$spark.sql.shuffle.partitions.local
[1] 16
$spark.env.SPARK_LOCAL_IP.local
[1] "127.0.0.1"
$sparklyr.csv.embedded
[1] "^1.*"
$`sparklyr.shell.driver-class-path`
[1] ""
$spark.driver.cores
[1] 4
$spark.executor.cores
[1] 4
$executor.memory
[1] "4G"
attr(,"config")
[1] "default"
attr(,"file")
[1] "/people/rtaylor/Rpackages/sparklyr/conf/config-template.yml"
> spark_version
[1] "1.6.0"
> spark_home
[1] "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41"
> spark_connect(master = "yarn-client", config = config, version = spark_version, spark_home=spark_home)
Error in force(code) : Failed while connecting to sparklyr to port (8880) for sessionid (533): Gateway in port (8880) did not respond.
Path: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-submit
Parameters: --class, sparklyr.Backend, --jars, '/people/rtaylor/Rpackages/sparklyr/java/spark-csv_2.11-1.3.0.jar','/people/rtaylor/Rpackages/sparklyr/java/commons-csv-1.1.jar','/people/rtaylor/Rpackages/sparklyr/java/univocity-parsers-1.5.1.jar', '/people/rtaylor/Rpackages/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 533
---- Output Log ----
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: No such file or directory
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: exec: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: cannot execute: No such file or directory
---- Error Log ----
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Guidance that I got from Mr. Ariga in response to my github posting:
On Fri, Mar 17, 2017 at 6:34 AM, Aki Ariga wrote:
I ran Spark 1.6.2 on CDH 5.10 as follows:
config <- spark_config()
config$spark.driver.cores <- 4
config$spark.executor.cores <- 4
config$spark.executor.memory <- "4G"
spark_home <- "/opt/cloudera/parcels/CDH/lib/spark"
spark_version <- "1.6.2"
#spark_home <- "/opt/cloudera/parcels/SPARK2/lib/spark2"
#spark_version <- "2.0.0"
sc <- spark_connect(master = "yarn-client", version = spark_version, config = config, spark_home = spark_home)
See also http://blog.cloudera.com/blog/2017/02/analyzing-us-flight-data-on-amazon-s3-with-sparklyr-and-apache-spark-2-0/
%%%%%%%%
Labels:
- Spark