Created 06-07-2023 07:46 AM
Hello
On a default Cloudera installation with Spark, I create and activate a Python virtual environment with all the libraries that I need.
The only problem I have is with the PySpark library inside the virtual environment: it does not connect to the tables in the data lake (for example, Parquet tables) to run queries.
When I use the default PySpark library OUTSIDE the virtual environment, in the default Cloudera installation with Spark, I have no problems running the queries; it works.
Can you please help me with a solution to use PySpark inside the Python virtual environment and query the tables in the data lake?
Thanks!!
Created 06-08-2023 03:50 AM
@hightek2699 Welcome to our community! To help you get the best possible answer, I have tagged our Spark expert @RangaReddy, who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur
Created 06-12-2023 02:12 AM
Hi @hightek2699
Could you please share the exact issue you are seeing when running inside the virtual environment, and also provide the steps you have followed?
Created 06-20-2023 05:48 AM
Good morning
@RangaReddy thanks for your help. The exact issue is that the PySpark library inside the virtual environment does not connect to the tables in the data lake (for example, Parquet tables) to run queries; the error message is "Path does not exist".
When I use the default PySpark library OUTSIDE the virtual environment, in the default Cloudera installation with Spark, I have no problems running the queries; it works.
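For reference, this is roughly the kind of query involved (the Parquet path below is only an illustration, not the real one):
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("datalake-query").getOrCreate()
# Outside the virtual environment this works; inside it, the same read fails with "Path does not exist"
df = spark.read.parquet("hdfs:///data/warehouse/my_table")  # illustrative path
df.show(5)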
I have changed this configuration, but it returns to the default Spark configuration where the binary files are located:
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
The steps that I follow are:
1. Create the environment with the desired version of Python
python36 -m venv <environment_name>
2. Activate the created environment
source <environment_name>/bin/activate
3. pip install pyspark (see the check after this list)
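A quick way to confirm which PySpark the environment picks up after step 3 (a small sketch, run inside the activated environment):
import pyspark
print(pyspark.__version__)  # version of the pip-installed package
print(pyspark.__file__)     # typically resolves to <environment_name>/lib/.../site-packages/pyspark/__init__.py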
I followed this documentation, but I cannot get my local installation of PySpark activated:
A Case for Isolated Virtual Environments with PySpark - inovex GmbH
Thanks!
Created 06-29-2023 08:41 PM
Hi @hightek2699
Don't install PySpark manually with pip install; use the Cloudera-provided PySpark instead.
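As a rough sketch of one way to do that from inside the virtual environment (the paths below assume a default CDH parcel layout and the usual /etc/hadoop/conf location, so adjust them to your cluster, and uninstall the pip pyspark first):
import glob, os, sys
# Point at the cluster's Spark instead of the pip-installed copy (paths are assumptions about your layout)
SPARK_HOME = "/opt/cloudera/parcels/CDH/lib/spark"
os.environ["SPARK_HOME"] = SPARK_HOME
os.environ.setdefault("HADOOP_CONF_DIR", "/etc/hadoop/conf")  # so data lake (HDFS) paths resolve
# Add the cluster's PySpark and its bundled py4j to the import path
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
sys.path.insert(0, glob.glob(os.path.join(SPARK_HOME, "python", "lib", "py4j-*-src.zip"))[0])
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("venv-check").getOrCreate()
spark.read.parquet("hdfs:///data/warehouse/my_table").show(5)  # illustrative path
Another common option is to keep launching the cluster's bin/pyspark or spark-submit as usual and just set PYSPARK_PYTHON to the virtual environment's python, so the environment's libraries are used while Spark itself stays the Cloudera build.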