04-12-2023 07:22 AM - 1 Kudo
You can use Keycloak to authenticate and obtain a token, which you can then use to make the GET request to the API. Here's how you can do this:

1. Set up a Keycloak client for your API. You can follow the Keycloak documentation to register a client for your API.

2. Use the Keycloak API to obtain an access token via the /auth/realms/{realm_name}/protocol/openid-connect/token endpoint. You will need to provide your Keycloak username and password, as well as the client ID and client secret of the client you registered. Here's an example of how you can use the requests library in Python to obtain an access token:
import requests

# Keycloak token endpoint for the realm
url = 'http://<KEYCLOAK_HOST>/auth/realms/<REALM_NAME>/protocol/openid-connect/token'
# Resource Owner Password Credentials grant: exchange the username/password for a token
payload = {'client_id': '<CLIENT_ID>', 'client_secret': '<CLIENT_SECRET>',
           'username': '<USERNAME>', 'password': '<PASSWORD>', 'grant_type': 'password'}
response = requests.post(url, data=payload)
access_token = response.json()['access_token']
This code will make a POST request to the Keycloak token endpoint and obtain an access token.
3. Use the access token to make the GET request to the API, passing it as a Bearer token in the Authorization header. Here's an example of how you can use the requests library in Python to make the GET request:
import requests

url = 'http://<API_HOST>/api/<API_ENDPOINT>'
# Send the access token as a Bearer token in the Authorization header
headers = {'Authorization': 'Bearer ' + access_token}
response = requests.get(url, headers=headers)
data = response.json()
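Putting the two steps together, here is a minimal end-to-end sketch. The fetch_api_data helper is just an illustrative name, and the <KEYCLOAK_HOST>, <REALM_NAME>, client credentials and <API_ENDPOINT> placeholders are values you would substitute for your environment:

import requests

KEYCLOAK_TOKEN_URL = 'http://<KEYCLOAK_HOST>/auth/realms/<REALM_NAME>/protocol/openid-connect/token'
API_URL = 'http://<API_HOST>/api/<API_ENDPOINT>'

def fetch_api_data():
    # Step 1: exchange the Keycloak credentials for an access token
    token_response = requests.post(KEYCLOAK_TOKEN_URL, data={
        'client_id': '<CLIENT_ID>',
        'client_secret': '<CLIENT_SECRET>',
        'username': '<USERNAME>',
        'password': '<PASSWORD>',
        'grant_type': 'password'
    })
    token_response.raise_for_status()
    access_token = token_response.json()['access_token']

    # Step 2: call the API with the token in the Authorization header
    api_response = requests.get(API_URL, headers={'Authorization': 'Bearer ' + access_token})
    api_response.raise_for_status()
    return api_response.json()

if __name__ == '__main__':
    print(fetch_api_data())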
Either way, the GET request to the API is made with the access token included in the Authorization header.

In NiFi, you can use the ExecuteScript or ExecuteStreamCommand processor to run a Python script like the one above and bring the data from the API into your flow. Alternatively, you can keep everything inside NiFi: use one InvokeHTTP processor to POST to the Keycloak token endpoint, extract the access token from the JSON response (for example with EvaluateJsonPath or ExtractText), and pass it to a second InvokeHTTP processor that makes the GET request to the API with the token in the Authorization header.

Sincerely, Hannah
04-12-2023 05:56 AM - 1 Kudo
Here are a few things you can try to improve the performance:

1. Increase the number of partitions. A plain spark.read.jdbc call pulls the table through a single connection into one partition, so the data may not be spread across the cluster. You can use the repartition method to increase the number of partitions after the read, for example:

df = spark.read.jdbc(url=jdbc_url, table='sgms.'+tablelist[i], properties=connection_details).repartition(4)

This splits the data into 4 partitions and distributes it across the cluster (see the sketch after this list for a way to parallelize the read itself).

2. Increase the executor memory. By default each executor gets 1 GB of memory; if your data is large, you can increase the allocation with the --executor-memory flag, for example:

spark-submit --executor-memory 4g oracle-example.py

This allocates 4 GB of memory to each executor.

3. Use foreachPartition instead of write. If the DataFrame has only a few partitions, the write runs with little parallelism. You can use foreachPartition to write each partition yourself, for example:

df.foreachPartition(lambda x: write_to_hdfs(x))

Here, write_to_hdfs is a function you provide that writes the rows of a partition to HDFS.

4. Increase the number of executors. Running with too few executors limits how many tasks can run in parallel. You can set the number with the --num-executors flag, for example:

spark-submit --num-executors 4 oracle-example.py

This runs the job with 4 executors.
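Note that repartition only splits the data after it has already been read through a single JDBC connection. If the table has a numeric key column, you can also parallelize the read itself with the partitioning options of spark.read.jdbc. This is a minimal sketch that reuses jdbc_url, tablelist and connection_details from your snippet; the column name 'ID', the bounds and the output path are assumptions you would replace with values from your own table:

# Split the JDBC read itself across 4 parallel connections using a numeric key column
df = spark.read.jdbc(
    url=jdbc_url,
    table='sgms.' + tablelist[i],
    column='ID',           # numeric column to split on (placeholder)
    lowerBound=1,          # smallest value of that column (placeholder)
    upperBound=1000000,    # largest value of that column (placeholder)
    numPartitions=4,       # number of parallel reads / resulting partitions
    properties=connection_details
)

# Example write of the partitioned DataFrame to HDFS as Parquet
df.write.mode('overwrite').parquet('/path/to/hdfs/output')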