Member since: 02-27-2020
Posts: 173
Kudos Received: 42
Solutions: 48
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 470 | 11-29-2023 01:16 PM
 | 567 | 10-27-2023 04:29 PM
 | 610 | 07-07-2023 10:20 AM
 | 1522 | 03-21-2023 08:35 AM
 | 603 | 01-25-2023 08:50 PM
07-22-2020
09:24 PM
The Ambari Files View (the Hue File Browser has the same problem) is not the right tool if you want to upload (very) big files. It runs in a JVM, and uploading big files consumes a lot of memory: you will hit the maximum available memory very quickly and cause performance issues for other users while the upload is running. BTW, it is possible to add more Ambari view instances to spread the load (they can be dedicated to specific teams/projects). For very big files, prefer CLI tools: scp to an edge node with a large filesystem followed by hdfs dfs -put, or distcp, or use an object store accessible from your Hadoop cluster with good network bandwidth.
07-20-2020
11:27 PM
1 Kudo
You can use NiFi to save your Kafka messages into HDFS (for instance). Something like this:
- ConsumeKafka: the flowfile content is the Kafka message itself, and you have access to some attributes: topic name, partition, offset, key... (but not the timestamp!). When I need it, I store the timestamp in the key.
- ReplaceText: build your backup line from the flowfile content and attributes.
- MergeContent: build one big file containing multiple Kafka messages.
- ExtractText: set the attribute to be used as the filename.
- PutHDFS: save the created file into HDFS.
And you can do the reverse if you need to push it back to your Kafka cluster.
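NiFi does all of this declaratively, but the merge-and-name logic of the flow can be sketched in plain Python for illustration (the message dict layout and the tab-separated backup line are assumptions, not a NiFi API):

```python
def merge_messages(messages, topic):
    """Sketch of the ReplaceText + MergeContent steps: format each Kafka
    message as one backup line, concatenate them into a single payload,
    and derive a filename from the topic and the first/last offsets."""
    # ReplaceText step: one line per message, built from content + attributes
    lines = [f"{m['key']}\t{m['value']}" for m in messages]
    # Filename attribute: topic plus the offset range covered by this file
    first, last = messages[0]["offset"], messages[-1]["offset"]
    filename = f"{topic}_{first}_{last}.txt"
    # MergeContent step: one big payload from many messages
    return filename, "\n".join(lines)
```

PutHDFS would then write the returned payload under the returned filename.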
07-20-2020
08:48 PM
In the PutHBaseRecord processor, set the Row Identifier Field Name to ${MYCOL} using the NiFi Expression Language. Hope this helps!
07-19-2020
09:38 PM
NiFi is one option to accomplish what you need. You can find an example here for a generic SQL database moving data "in real-time" to Hive. If you could describe your use case in more detail, the community could assist you better.
07-17-2020
02:15 PM
MLOps will eventually be available as part of Cloudera Data Science Workbench (CDSW) product. Keep an eye out for new releases coming soon.
07-17-2020
02:11 PM
CDW doesn't give you access to that safety valve setting in the Hue configuration, so you won't be able to disable the download button. You do have access to that configuration if you set up a Data Hub cluster and have your users access the data through the Hue interface there.
07-16-2020
04:51 PM
To get output like the Hive page you linked to, you just need this: describe formatted <TABLE_NAME> <COLUMN_NAME>; That works in Hue. Can you further clarify what output you are looking for in an ideal scenario?
07-08-2020
10:05 AM
Glad you are making progress. The command you are looking for is actually LOAD DATA LOCAL INPATH ... Note that you missed the LOCAL keyword. Without LOCAL, Hive looks for the file in HDFS, which is why you see the error "No files matching path hdfs://quickstart.cloudera:8020/users/melissava...": Hive is searching HDFS instead of your local machine.
06-30-2020
08:01 AM
You can do this: !pip3 install sklearn This will install the needed package (note that the package's current name on PyPI is scikit-learn; sklearn is a legacy alias). The ! prefix is a shell escape: it executes the command not in your Python session but in the underlying OS environment.
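The same install can also be scripted from plain Python, which avoids relying on the ! shell escape; a minimal sketch (the dry_run flag is purely illustrative, and scikit-learn is used as the current PyPI package name):

```python
import subprocess
import sys

def pip_install(package, dry_run=False):
    # Use the current interpreter (sys.executable) so the package lands
    # in the same environment the session is running in.
    cmd = [sys.executable, "-m", "pip", "install", package]
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd

# pip_install("scikit-learn") would run:
#   /path/to/python -m pip install scikit-learn
```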