Member since: 06-07-2023
Posts: 2 | Kudos Received: 0 | Solutions: 0
06-20-2023
05:48 AM
Good morning @RangaReddy, thanks for your help.

The exact issue is with the PySpark library inside the virtual environment: it cannot connect to tables in the data lake, such as Parquet tables, to run queries. The error message is "Path does not exist". When I use the default PySpark library outside the virtual environment, in the default Cloudera installation with Spark, I have no problems and the queries work.

I have changed this configuration, but it reverts to the default Spark configuration, where the binary files are located:

export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark

The steps that I follow are:

1. Create the environment with the desired version of Python: Python36 -m venv <environment_name>
2. Activate the created environment: source <environment_name>/bin/activate
3. Install PySpark: pip install pyspark

I referred to this documentation, but I could not get my local installation of PySpark to activate: A Case for Isolated Virtual Environments with PySpark - inovex GmbH

Thanks!
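One direction that may be worth trying, offered only as a sketch: keep the cluster's Spark (the SPARK_HOME quoted above) and point it at the virtual environment's interpreter through PYSPARK_PYTHON, instead of relying on the pip-installed copy. A pip-installed PySpark may not pick up the cluster's Hadoop configuration, which is one possible reason for "Path does not exist". The helper name `cluster_spark_env`, the path `/path/to/venv`, and the `/etc/hadoop/conf` default are assumptions for illustration and are not taken from the post.

```python
from pathlib import PurePosixPath


def cluster_spark_env(venv_dir,
                      spark_home="/opt/cloudera/parcels/CDH/lib/spark",
                      hadoop_conf_dir="/etc/hadoop/conf"):
    """Build environment variables so the cluster's Spark runs the venv's Python.

    hadoop_conf_dir is an assumed default; adjust it to the cluster's real
    client configuration directory.
    """
    venv_python = str(PurePosixPath(venv_dir) / "bin" / "python")
    return {
        "SPARK_HOME": spark_home,              # cluster Spark, as in the post
        "HADOOP_CONF_DIR": hadoop_conf_dir,    # assumption: client configs live here
        "PYSPARK_PYTHON": venv_python,         # interpreter used by executors
        "PYSPARK_DRIVER_PYTHON": venv_python,  # interpreter used by the driver
    }


# Hypothetical venv location; replace with <environment_name>'s real path.
env = cluster_spark_env("/path/to/venv")

# Print shell export lines that could be pasted before launching Spark.
for name, value in sorted(env.items()):
    print(f"export {name}={value}")
```

With these variables exported, the job would be started with the cluster's own `pyspark` or `spark-submit`, so the libraries come from the virtual environment while Spark itself stays the Cloudera build.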
06-07-2023
07:46 AM
Hello,

Inside a default Cloudera installation with Spark, I create and activate a Python virtual environment with all the libraries that I need. The only problem I have is with the PySpark library inside the virtual environment: it cannot connect to tables in the data lake, such as Parquet tables, to run queries. When I use the default PySpark library outside the virtual environment, in the default Cloudera installation with Spark, I have no problems and the queries work.

Can you please help me with a solution to use PySpark inside the Python virtual environment and run queries against tables in the data lake?

Thanks!
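To narrow down a difference like this, it can help to compare what each interpreter actually sees. The following is a minimal diagnostic sketch, not a fix: it reports which Python is running, whether it is inside a virtual environment, where `pyspark` would be imported from, and whether SPARK_HOME and HADOOP_CONF_DIR are set. The function name `describe_pyspark_setup` is made up for this example.

```python
import importlib.util
import os
import sys


def describe_pyspark_setup():
    """Collect facts about the current interpreter and its view of PySpark."""
    spec = importlib.util.find_spec("pyspark")  # None if pyspark is not importable
    return {
        "python": sys.executable,
        # True when running inside an environment created with `python -m venv`
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "pyspark_location": spec.origin if spec else None,
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
        "HADOOP_CONF_DIR": os.environ.get("HADOOP_CONF_DIR"),
    }


report = describe_pyspark_setup()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running it once inside the activated environment and once outside shows whether the two sessions import PySpark from different locations or see different environment variables.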
Labels:
- Apache Spark