Member since: 02-27-2020
Posts: 173
Kudos Received: 42
Solutions: 48
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1757 | 11-29-2023 01:16 PM |
| 2246 | 10-27-2023 04:29 PM |
| 1780 | 07-07-2023 10:20 AM |
| 3646 | 03-21-2023 08:35 AM |
| 1273 | 01-25-2023 08:50 PM |
10-20-2020
03:17 PM
Hi Priyanshu, This may not be the answer you are looking for, but this may be a bug in Apache Oozie. Looking at the source code for CallbackService.java, the string that the callback tries to return is CALL_BACK_QUERY_STRING = "{0}?" + ID_PARAM + "{1}" + "&" + STATUS_PARAM + "{2}" Note that there is an & character. If it is not properly escaped, the XML/HTTP response comes back with exactly the error you mention (see here); a quick illustration of the escaping issue follows below. Your best bet may be to use the alternate solution you proposed. Regards, Alex
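The following is only a sketch (not Oozie code; the host, port, and job id are made up) of why the bare & matters once the callback string ends up inside an XML document:

# Illustration only: a bare '&' in the rendered callback URL is invalid when
# embedded unescaped in XML and must appear as '&amp;'.
RAW='http://oozie-host:11000/oozie/callback?id=0000001-oozie-W@shell-node&status=OK'
XML_SAFE="${RAW//&/&amp;}"
echo "raw:      $RAW"
echo "xml-safe: $XML_SAFE"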
10-06-2020
10:05 AM
The error seems to indicate that the source JSON is malformed. Check where the data is stored and look at the JSON structure. Each row should be one self-contained JSON object. Please post a screenshot here. Also, did you add the necessary jar to Hive (hive-serdes-1.0-SNAPSHOT.jar)? I assume you are following this example: https://github.com/cloudera/cdh-twitter-example Finally, you can try a different SerDe as shown in this topic: https://community.cloudera.com/t5/Support-Questions/hive-table-error/td-p/127271 Or try this solution on Stack Overflow: https://stackoverflow.com/questions/32416555/twitter-sentiment-analysis
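For reference, here is a minimal sketch of the kind of table definition that example expects; the jar path, column list, and HDFS location are placeholders, so adjust them to your environment:

# Hypothetical sketch: register the SerDe jar and define an external table over the raw tweets.
# The SerDe class name is the one used in the cdh-twitter-example project.
hive -e "
ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;
CREATE EXTERNAL TABLE IF NOT EXISTS tweets_raw (
  id BIGINT,
  created_at STRING,
  text STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/flume/tweets';
"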
10-01-2020
02:19 PM
Hi @aakulov Thanks very much for your help. We used --hs2-url and Sqoop now works fine. Best regards, Eduardo
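For anyone landing here with the same problem, a rough sketch of a Sqoop invocation that goes through HiveServer2; the JDBC URLs, user, and table name are placeholders, not the exact command used in this thread:

# Hypothetical example: import a table into Hive through HiveServer2 via --hs2-url.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username sqoop_user -P \
  --table orders \
  --hive-import \
  --hs2-url "jdbc:hive2://hs2-host:10000/default"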
09-29-2020
02:38 PM
3 Kudos
By default, Cloudera Machine Learning (CML) ships a Jupyter kernel as part of the base engine images. Data Scientists often prefer to use a specialized custom kernel in Jupyter that makes their work more efficient. In this community post, we will walk through how to customize a Docker container image with a sparkmagic Jupyter kernel and how to deploy it to a CML workspace.
Prerequisites:
Admin privileges in a CML workspace
Local Docker client with access to Docker Hub or internal Docker registry
Step 1. Choose a custom Jupyter kernel.
Jupyter kernels are purpose-built add-ons to the basic Python notebook editor. For this tutorial, I chose sparkmagic, a kernel that provides convenient features for working with Spark, like keeping SQL syntax clean in a cell. Sparkmagic relies on Livy to communicate with the Spark cluster. As of this writing, Livy is not supported in CML when running Spark on Kubernetes. However, a classic Spark cluster (for example, on Data Hub) will work with Livy and therefore with sparkmagic. For now, you simply need to know that installing sparkmagic is done with the following sequence of commands:
pip3 install sparkmagic
jupyter nbextension enable --py --sys-prefix widgetsnbextension
jupyter-kernelspec install sparkmagic/kernels/pysparkkernel
Note: The third command is executed once you cd into the directory created by the install. This location is platform dependent and can be found by running pip3 show sparkmagic after the install. We'll have to take care of this in the Docker image definition.
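Run interactively, the sequence would look roughly like this (a sketch only; the exact path printed by pip3 show sparkmagic differs by platform):

# Locate the sparkmagic install directory, then register the PySpark kernel from it.
SPARKMAGIC_DIR=$(pip3 show sparkmagic | grep Location | cut -d" " -f2)
cd "$SPARKMAGIC_DIR"
jupyter-kernelspec install sparkmagic/kernels/pysparkkernel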
Step 2. Customize your Docker Image
To create a custom Docker image, we first create a text file (I called it magic-dockr) that specifies the base image (the CML base engine on Ubuntu) along with the additional libraries we want to install. I will use CML to do the majority of the work.
First, create the Docker file below in your CML project.
# Dockerfile
# Specify a Cloudera Machine Learning base image
FROM docker.repository.cloudera.com/cdsw/engine:9-cml1.1

# Update packages on the base image and install sparkmagic
RUN apt-get update
RUN pip3 install sparkmagic
RUN jupyter nbextension enable --py --sys-prefix widgetsnbextension
RUN jupyter-kernelspec install --user $(pip3 show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/pysparkkernel
Now we use this image definition to build a deployable Docker container. Run the following commands in an environment where docker.io binaries are installed.
docker build -t <your-repository>/cml-sparkmagic:v1.0 . -f magic-dockr
docker push <your-repository>/cml-sparkmagic:v1.0
This will build and distribute your Docker image to a repository of your choosing.
Step 3. Adding a custom image to CML
There are two steps to make the custom kernel available in your project: one to add the image to the CML workspace, and another to enable the image for the project you are working on.
The first step requires Admin privileges. From the blade menu on the left, select Admin then click on the Engines tab. In the Engine Images section, enter the name of your custom image (e.g. Sparkmagic Kernel) and the repository tag you used in Step 2. Click Add.
Once the engine is added, we’ll need to tell CML how to launch a Jupyter notebook when this image is used to run a session. Click the Edit button next to the Sparkmagic Kernel you’ve added. Click + New Editor in the window that opens.
Enter the editor name as Jupyter Notebook and for the command use the following:
/usr/local/bin/jupyter-notebook --no-browser --ip=127.0.0.1 --port=8090 --NotebookApp.token= --NotebookApp.allow_remote_access=True --log-level=ERROR
Note that port 8090 is the default port, unless your administrator changed it.
Then click Save, and Save again. At this point, CML knows where to find your custom kernel and which editor to launch when a session starts.
Now we are ready to enable this custom engine inside a project.
Step 4. Enable custom engine in your project.
Open a project where you would like to use your custom kernel. For me, it's a project called Custom Kernel Project (yes, I'm not very creative when it comes to names). In the left panel, click Project Settings, then go to the Engine tab. In the Engine Image section, select your custom engine image from the drop-down.
To test the engine, go to Sessions, and create a new session. You’ll see that Engine Image is the custom Docker image you’ve created in Step 2. Name your session and select Jupyter Notebook as your Editor.
When the session launches, in the Jupyter notebook interface you’ll be able to select PySpark when creating a new notebook.
You can start with the %%help magic and follow along with the sparkmagic documentation. Specifically, you'll want to configure a connection to a Spark cluster using the provided JSON template; a minimal sketch of such a configuration follows below.
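As a rough illustration (the Livy URL, auth mode, and file contents are assumptions; start from the example_config.json in the sparkmagic repository for the full template):

# Point sparkmagic at a Livy endpoint; the values below are placeholders.
mkdir -p ~/.sparkmagic
cat > ~/.sparkmagic/config.json <<'EOF'
{
  "kernel_python_credentials": {
    "username": "",
    "password": "",
    "url": "http://livy-host:8998",
    "auth": "None"
  }
}
EOF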
That’s it!
CML brings you the flexibility to run any third-party editor on the platform, making development more efficient for Data Scientists and Data Engineers. Note that while this article covered the sparkmagic custom kernel, the same procedure can be applied to any kernel you wish to run with Jupyter Notebook or JupyterLab.
Reference:
CML Docs: Creating a Customized Engine Image
Sparkmagic Docs
09-17-2020
01:17 AM
1 Kudo
Hi Alex, I've resolved the issue. I had been using wf:X() in the wrong place. Initially, I had added an argument for the shell action, like ${user}, and then set ${wf:user()} as the value of that argument when submitting the workflow. I replaced ${user} with the Expression Language function directly, and the value was resolved automatically; when submitting the workflow, I wasn't asked to provide a value for the argument 🙂
09-16-2020
12:20 PM
Hi James, Thanks for clarifying your question. It's true that there is no native functionality for this, but it is possible to change the action name in a slightly hacky way:
1. In the edit mode of your Oozie workflow, click on the name of the node and note its ID.
2. Save and export your workflow. This gives you access to a JSON file that you can edit.
3. In that JSON file, search for -[NODE ID]\" and replace it with your desired name for the node (a sketch of this replacement follows below). All references to the old node ID are replaced, so the workflow stays consistent. Save the file.
4. Import the JSON back into Hue. This updates your existing workflow, and the generated Oozie XML will now use the node name you want.
Hope this helps. Regards, Alex
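As a concrete (hypothetical) sketch of step 3, assuming the exported file is named workflow.json; the node ID and the new action name are made up:

# Replace every reference to the old node id with the desired action name before re-importing into Hue.
# The pattern in the post is -[NODE ID]\", so adjust the quoting/escaping to match how the id appears in your exported file.
sed -i 's/-5b3a4c2d\\"/-my-shell-action\\"/g' workflow.json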
07-29-2020
03:20 AM
I was afraid of that. Yes, I am using distcp for the migration. Thanks very much nevertheless for your reply. The bandwidth option might be a very last resort, but that will probably have to do.
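For reference, throttling the copy looks roughly like this (a sketch only; the per-mapper bandwidth, map count, and cluster paths are placeholders):

# Limit each map task to ~20 MB/s and cap the number of simultaneous maps.
hadoop distcp -bandwidth 20 -m 10 \
  hdfs://source-nn:8020/data/warehouse \
  hdfs://target-nn:8020/data/warehouse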
07-25-2020
08:31 PM
I just could not see it!! :-( Thank you so much. 😊
07-23-2020
02:38 AM
Can we delete Kafka consumer group data? Not the consumer group itself; we need to delete only the group's data.
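If the intent is to remove a group's stored offsets without deleting the group itself, a sketch along these lines may help (assuming Kafka 2.4 or later; the broker, group, and topic names are placeholders):

# Remove only the stored offsets for one topic.
kafka-consumer-groups.sh --bootstrap-server broker-1:9092 \
  --group my-consumer-group --topic clickstream --delete-offsets

# By contrast, this removes the consumer group itself.
kafka-consumer-groups.sh --bootstrap-server broker-1:9092 \
  --group my-consumer-group --delete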
07-23-2020
12:45 AM
Yes, they are both in the same folder. Just to let you know, the script works if the script being imported into the other script does not contain Spark commands. So it looks like I am missing something to include.