Member since: 02-27-2020
Posts: 173
Kudos Received: 42
Solutions: 48
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 470 | 11-29-2023 01:16 PM
 | 567 | 10-27-2023 04:29 PM
 | 610 | 07-07-2023 10:20 AM
 | 1522 | 03-21-2023 08:35 AM
 | 603 | 01-25-2023 08:50 PM
07-22-2020
09:24 PM
The Ambari Files View (the Hue File Browser has the same problem) is not the right tool if you want to upload (very) big files. It runs in a JVM, and uploading big files consumes a lot of memory: you will hit the maximum available memory very quickly and cause performance issues for other users while the upload is running. BTW, it is possible to add more Ambari view instances to spread the load (they can be dedicated to specific teams/projects). For very big files, prefer CLI tools: scp to an edge node with a large filesystem followed by hdfs dfs -put, or distcp, or use an object store accessible from your Hadoop cluster with good network bandwidth.
07-20-2020
11:27 PM
1 Kudo
You can use NiFi to save your Kafka messages into HDFS (for instance). Something like this:
- ConsumeKafka: the flowfile content is the Kafka message itself, and you have access to some attributes: topic name, partition, offset, key... (but not the timestamp!). When I need it, I store the timestamp in the key.
- ReplaceText: build your backup line from the flowfile content and attributes.
- MergeContent: build one big file containing multiple Kafka messages.
- ExtractText: set the attribute to be used as the filename.
- PutHDFS: save the created file into HDFS.
And you can do the reverse if you need to push it back to your Kafka cluster.
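NiFi does all of this declaratively, but the merge-and-name logic of the flow can be sketched in plain Python for illustration (the message dict layout and the tab-separated backup line are assumptions, not a NiFi API):

```python
def merge_messages(messages, topic):
    """Sketch of the ReplaceText + MergeContent steps: format each Kafka
    message as one backup line, concatenate them into a single payload,
    and derive a filename from the topic and the first/last offsets."""
    # ReplaceText step: one line per message, built from content + attributes
    lines = [f"{m['key']}\t{m['value']}" for m in messages]
    # Filename attribute: topic plus the offset range covered by this file
    first, last = messages[0]["offset"], messages[-1]["offset"]
    filename = f"{topic}_{first}_{last}.txt"
    # MergeContent step: one big payload from many messages
    return filename, "\n".join(lines)
```

PutHDFS would then write the returned payload under the returned filename.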
07-20-2020
08:48 PM
In the PutHBaseRecord processor, set the Row Identifier Field Name to ${MYCOL} using the NiFi Expression Language. Hope this helps!
07-19-2020
09:38 PM
NiFi is one option to accomplish what you need. You can find an example here for a generic SQL database moving data "in real-time" to Hive. If you could describe your use case in more detail, the community could assist you better.
07-17-2020
02:15 PM
MLOps will eventually be available as part of Cloudera Data Science Workbench (CDSW) product. Keep an eye out for new releases coming soon.
07-17-2020
02:11 PM
CDW doesn't give you access to that safety valve setting in the Hue configuration, so you won't be able to disable the download button. You do have access to that configuration if you set up a Data Hub cluster and have your users access the data through the Hue interface there.
07-16-2020
04:51 PM
To get output like the Hive page you linked to, you just need this: describe formatted <TABLE_NAME> <COLUMN_NAME>; That works in Hue. Can you further clarify what output you are looking for in an ideal scenario?
07-08-2020
10:05 AM
Glad you are making progress. The command you are looking for is actually LOAD DATA LOCAL INPATH ... Note that you missed the LOCAL keyword. Without LOCAL, Hive looks for the file in HDFS, which is why you see the error "No files matching path hdfs://quickstart.cloudera:8020/users/melissava...": Hive is searching HDFS instead of your local machine.
06-30-2020
08:01 AM
You can do this: !pip3 install sklearn This will install the needed package (note that the package's current name on PyPI is scikit-learn; sklearn is a legacy alias). The ! prefix is a shell escape: it executes the command not in your Python session but in the underlying OS environment.
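The same install can also be scripted from plain Python, which avoids relying on the ! shell escape; a minimal sketch (the dry_run flag is purely illustrative, and scikit-learn is used as the current PyPI package name):

```python
import subprocess
import sys

def pip_install(package, dry_run=False):
    # Use the current interpreter (sys.executable) so the package lands
    # in the same environment the session is running in.
    cmd = [sys.executable, "-m", "pip", "install", package]
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd

# pip_install("scikit-learn") would run:
#   /path/to/python -m pip install scikit-learn
```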