Member since: 02-27-2020
Posts: 173
Kudos Received: 42
Solutions: 48
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1757 | 11-29-2023 01:16 PM |
| 2246 | 10-27-2023 04:29 PM |
| 1780 | 07-07-2023 10:20 AM |
| 3646 | 03-21-2023 08:35 AM |
| 1273 | 01-25-2023 08:50 PM |
10-20-2020
03:17 PM
Hi Priyanshu, This may not be the answer you are looking for, but this may be a bug in Apache Oozie. Looking at the source code for CallbackService.java, the string that the callback tries to return is CALL_BACK_QUERY_STRING = "{0}?" + ID_PARAM + "{1}" + "&" + STATUS_PARAM + "{2}" Note that there is an & character. If it is not properly escaped, the XML/HTTP response comes back with exactly the error you mention (see here); a quick illustration of the escaping issue follows below. Your best bet may be to use the alternate solution you proposed. Regards, Alex
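The following is only a sketch (not Oozie code; the host, port, and job id are made up) of why the bare & matters once the callback string ends up inside an XML document:

# Illustration only: a bare '&' in the rendered callback URL is invalid when
# embedded unescaped in XML and must appear as '&amp;'.
RAW='http://oozie-host:11000/oozie/callback?id=0000001-oozie-W@shell-node&status=OK'
XML_SAFE="${RAW//&/&amp;}"
echo "raw:      $RAW"
echo "xml-safe: $XML_SAFE"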
10-06-2020
10:05 AM
The error seems to indicate that the source JSON is malformed. Check where the data is stored and look at the JSON structure. Each row should be one self-contained JSON object. Please post a screenshot here. Also, did you add the necessary jar to Hive (hive-serdes-1.0-SNAPSHOT.jar)? I assume you are following this example: https://github.com/cloudera/cdh-twitter-example Finally, you can try a different SerDe as shown in this topic: https://community.cloudera.com/t5/Support-Questions/hive-table-error/td-p/127271 Or try this solution on Stack Overflow: https://stackoverflow.com/questions/32416555/twitter-sentiment-analysis
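For reference, here is a minimal sketch of the kind of table definition that example expects; the jar path, column list, and HDFS location are placeholders, so adjust them to your environment:

# Hypothetical sketch: register the SerDe jar and define an external table over the raw tweets.
# The SerDe class name is the one used in the cdh-twitter-example project.
hive -e "
ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;
CREATE EXTERNAL TABLE IF NOT EXISTS tweets_raw (
  id BIGINT,
  created_at STRING,
  text STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/flume/tweets';
"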
10-01-2020
02:19 PM
Hi @aakulov Thanks very much for your help. We used --hs2-url and Sqoop now works fine. Best regards, Eduardo
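For anyone landing here with the same problem, a rough sketch of a Sqoop invocation that goes through HiveServer2; the JDBC URLs, user, and table name are placeholders, not the exact command used in this thread:

# Hypothetical example: import a table into Hive through HiveServer2 via --hs2-url.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username sqoop_user -P \
  --table orders \
  --hive-import \
  --hs2-url "jdbc:hive2://hs2-host:10000/default"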
09-29-2020
02:38 PM
3 Kudos
By default, Cloudera Machine Learning (CML) ships a Jupyter kernel as part of the base engine images. Data Scientists often prefer to use a specialized custom kernel in Jupyter that makes their work more efficient. In this community post, we will walk through how to customize a Docker container image with a sparkmagic Jupyter kernel and how to deploy it to a CML workspace.
Prerequisites:
Admin privileges in a CML workspace
Local Docker client with access to Docker Hub or internal Docker registry
Step 1. Choose a custom Jupyter kernel.
Jupyter kernels are purpose-built add-ons to the basic Python notebook editor. For this tutorial, I chose sparkmagic, a kernel that provides convenient features for working with Spark, like keeping SQL syntax clean in a cell. Sparkmagic relies on Livy to communicate with the Spark cluster. As of this writing, Livy is not supported in CML when running Spark on Kubernetes. However, a classic Spark cluster (for example, on Data Hub) will work with Livy and therefore with sparkmagic. For now, you simply need to know that installing sparkmagic is done with the following sequence of commands:
pip3 install sparkmagic
jupyter nbextension enable --py --sys-prefix widgetsnbextension
jupyter-kernelspec install sparkmagic/kernels/pysparkkernel
Note: The third command is executed once you cd into the directory created by the install. This location is platform dependent and can be found by running pip3 show sparkmagic after the install. We'll have to take care of this in the Docker image definition.
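Run interactively, the sequence would look roughly like this (a sketch only; the exact path printed by pip3 show sparkmagic differs by platform):

# Locate the sparkmagic install directory, then register the PySpark kernel from it.
SPARKMAGIC_DIR=$(pip3 show sparkmagic | grep Location | cut -d" " -f2)
cd "$SPARKMAGIC_DIR"
jupyter-kernelspec install sparkmagic/kernels/pysparkkernel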
Step 2. Customize your Docker Image
To create a custom Docker image, we first create a text file (I called it magic-dockr) that specifies the base image (the CML base engine on Ubuntu) along with the additional libraries we want to install. I will use CML to do the majority of the work.
First, create the Docker file below in your CML project.
# Dockerfile
# Specify a Cloudera Machine Learning base image
FROM docker.repository.cloudera.com/cdsw/engine:9-cml1.1

# Update packages on the base image and install sparkmagic
RUN apt-get update
RUN pip3 install sparkmagic
RUN jupyter nbextension enable --py --sys-prefix widgetsnbextension
RUN jupyter-kernelspec install --user $(pip3 show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/pysparkkernel
Now we use this image definition to build a deployable Docker container. Run the following commands in an environment where docker.io binaries are installed.
docker build -t <your-repository>/cml-sparkmagic:v1.0 . -f magic-dockr
docker push <your-repository>/cml-sparkmagic:v1.0
This will build and distribute your Docker image to a repository of your choosing.
Step 3. Adding a custom image to CML
There are two steps to make the custom kernel available in your project: one to add the image to the CML workspace, and another to enable the image for the project you are working on.
The first step requires Admin privileges. From the blade menu on the left, select Admin then click on the Engines tab. In the Engine Images section, enter the name of your custom image (e.g. Sparkmagic Kernel) and the repository tag you used in Step 2. Click Add.
Once the engine is added, we’ll need to tell CML how to launch a Jupyter notebook when this image is used to run a session. Click the Edit button next to the Sparkmagic Kernel you’ve added. Click + New Editor in the window that opens.
Enter the editor name as Jupyter Notebook and for the command use the following:
/usr/local/bin/jupyter-notebook --no-browser --ip=127.0.0.1 --port=8090 --NotebookApp.token= --NotebookApp.allow_remote_access=True --log-level=ERROR
Note that port 8090 is the default port, unless your administrator changed it.
Then click Save, and Save again. At this point, CML knows where to find your custom kernel and which editor to launch when a session starts.
Now we are ready to enable this custom engine inside a project.
Step 4. Enable custom engine in your project.
Open a project where you would like to use your custom kernel. For me, it's a project called Custom Kernel Project (yes, I'm not very creative when it comes to names). In the left panel, click Project Settings, then go to the Engine tab. In the Engine Image section, select your custom engine image from the drop-down.
To test the engine, go to Sessions, and create a new session. You’ll see that Engine Image is the custom Docker image you’ve created in Step 2. Name your session and select Jupyter Notebook as your Editor.
When the session launches, in the Jupyter notebook interface you’ll be able to select PySpark when creating a new notebook.
You can start with the %%help magic and follow along with the sparkmagic documentation. Specifically, you'll want to configure a connection to a Spark cluster using the provided JSON template; a minimal sketch of such a configuration follows below.
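As a rough illustration (the Livy URL, auth mode, and file contents are assumptions; start from the example_config.json in the sparkmagic repository for the full template):

# Point sparkmagic at a Livy endpoint; the values below are placeholders.
mkdir -p ~/.sparkmagic
cat > ~/.sparkmagic/config.json <<'EOF'
{
  "kernel_python_credentials": {
    "username": "",
    "password": "",
    "url": "http://livy-host:8998",
    "auth": "None"
  }
}
EOF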
That’s it!
CML brings you the flexibility to run any third-party editor on the platform, making development more efficient for Data Scientists and Data Engineers. Note that while this article covered the sparkmagic custom kernel, the same procedure can be applied to any kernel you wish to run with Jupyter Notebook or JupyterLab.
Reference:
CML Docs: Creating a Customized Engine Image
Sparkmagic Docs
09-17-2020
01:17 AM
1 Kudo
Hi Alex, I've resolved the issue. I had been using wf:X() in the wrong place. Initially, I had added an argument for the shell action, like ${user}, and then set ${wf:user()} as the value of that argument when submitting the workflow. I replaced ${user} with the Expression Language function directly, and the value was resolved automatically; when submitting the workflow, I wasn't asked to provide a value for the argument 🙂
09-16-2020
12:20 PM
Hi James, Thanks for clarifying your question. It's true that there is no native functionality for this, but it is possible to change the action name in a slightly hacky way:
1. In the edit mode of your Oozie workflow, click on the name of the node and note its ID.
2. Save and export your workflow. This gives you access to a JSON file that you can edit.
3. In that JSON file, search for -[NODE ID]\" and replace it with your desired name for the node (a sketch of this replacement follows below). All references to the old node ID are replaced, so the workflow stays consistent. Save the file.
4. Import the JSON back into Hue. This updates your existing workflow, and the generated Oozie XML will now use the node name you want.
Hope this helps. Regards, Alex
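As a concrete (hypothetical) sketch of step 3, assuming the exported file is named workflow.json; the node ID and the new action name are made up:

# Replace every reference to the old node id with the desired action name before re-importing into Hue.
# The pattern in the post is -[NODE ID]\", so adjust the quoting/escaping to match how the id appears in your exported file.
sed -i 's/-5b3a4c2d\\"/-my-shell-action\\"/g' workflow.json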
07-29-2020
03:20 AM
I was afraid of that. Yes, I am using distcp for the migration. Thanks very much nevertheless for your reply. The bandwidth option might be a very last resort, but that will probably have to do.
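For reference, throttling the copy looks roughly like this (a sketch only; the per-mapper bandwidth, map count, and cluster paths are placeholders):

# Limit each map task to ~20 MB/s and cap the number of simultaneous maps.
hadoop distcp -bandwidth 20 -m 10 \
  hdfs://source-nn:8020/data/warehouse \
  hdfs://target-nn:8020/data/warehouse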
07-25-2020
08:31 PM
I just could not see it!! :-( Thank you so much. 😊
07-23-2020
02:38 AM
Can we delete Kafka consumer group data? Not the consumer group itself; we need to delete only the group's data.
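If the intent is to remove a group's stored offsets without deleting the group itself, a sketch along these lines may help (assuming Kafka 2.4 or later; the broker, group, and topic names are placeholders):

# Remove only the stored offsets for one topic.
kafka-consumer-groups.sh --bootstrap-server broker-1:9092 \
  --group my-consumer-group --topic clickstream --delete-offsets

# By contrast, this removes the consumer group itself.
kafka-consumer-groups.sh --bootstrap-server broker-1:9092 \
  --group my-consumer-group --delete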
07-23-2020
12:45 AM
Yes, they are both in the same folder. Just to let you know, the script works if the script being imported into the other script does not contain Spark commands. So it looks like I am missing something to include.