About Seaport

Seaport · ‎09-03-2021

Vidya, thanks for your reply. Could you help me clarify the issue further? Does Spark (or another MapReduce tool) create the container using the local host as its template, at least to some degree?

Seaport · ‎08-26-2021

I will use Spark2 in CDP and need to install Python3. Do I need to install Python3 on every node in the CDP cluster, or only on one particular node? A Spark2 job is executed in JVM containers that could be created on any worker node. I wonder whether the container is created from a template. If so, how is the template created and where is it stored? Thanks.

Seaport · ‎11-04-2020

I resolved the error by following advice from this post. https://community.cloudera.com/t5/Support-Questions/Sharing-how-to-solve-HUE-and-HBase-connect-problem-on-CDH-6/td-p/82030

Seaport · ‎11-04-2020

I got the same error with HappyBase. My code has been working fine for a few weeks. Somehow Thrift API stopped. I restarted the API and then I got this error.

Seaport · ‎07-30-2020

The unpack command will not work without that extra dash. https://stackoverflow.com/questions/34573279/how-to-unzip-gz-files-in-a-new-directory-in-hadoop/43704452

I had another try with a file name as the destination.

    hdfs dfs -cat /user/testuser/stage1.tar.gz | gzip -d | hdfs dfs -put - /user/testuser/test3/stage1

The file stage1 appeared in the test3 directory. There is something interesting. The stage1.tar.gz contains three empty txt files. "hdfs dfs -cat /user/testuser/test3/-" outputs nothing and the file size is 0.1k. "hdfs dfs -cat /user/testuser/test3/stage1" outputs some text, including the original file names inside, and the file size is 10k.
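One likely explanation for the file names appearing in the output: `gzip -d` only removes the gzip layer, so what lands in HDFS is still a tar archive rather than the individual files. A minimal Python sketch, run locally and entirely in memory, illustrates this. The member names `a.txt`, `b.txt`, and `c.txt` are made up for the example and are not taken from the original archive.

```python
import gzip
import io
import tarfile


def make_tar_gz(names):
    """Build an in-memory .tar.gz holding one empty file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = 0  # empty file, as described in the post
            tar.addfile(info)
    return buf.getvalue()


def gunzip_only(data):
    """Do what `gzip -d` does: strip compression, keep the tar layer."""
    return gzip.decompress(data)


def list_members(tar_bytes):
    """Read the member names out of an uncompressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        return tar.getnames()


archive = make_tar_gz(["a.txt", "b.txt", "c.txt"])  # hypothetical names
tar_bytes = gunzip_only(archive)

# The decompressed stream is one tar file; the names sit in its headers.
print(b"a.txt" in tar_bytes)
print(list_members(tar_bytes))
```

Under this reading, the single file written to HDFS is the tar archive itself. Getting the individual files into HDFS would need an extra untar step, for example extracting locally and uploading the results; that is a suggestion, not something confirmed in this thread.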

Seaport · ‎07-30-2020

@Shelton Thanks for the quick response. Here is my code to create the gz file.

    tar cvzf ~/stage1.tar.gz ./*

I tried the following commands to upload it and unzip it into the HDFS directory /user/testuser/test3.

    hdfs dfs -copyFromLocal stage1.tar.gz /user/testuser
    hdfs dfs -cat /user/testuser/stage1.tar.gz | gzip -d | hdfs dfs -put - /user/testuser/test3

However, what I got in /user/testuser/test3 is a file with the name "-", not the multiple files in the stage1.tar.gz. Does your solution mean to concatenate all files together? Please advise. Thanks.

Seaport · ‎07-30-2020

I am copying a large number of small files (HL7 message files) from Linux local storage to HDFS. I wonder whether there is a performance difference between copying files one by one (through a script) and using a single statement like "hadoop fs -put ./* /hadoop_path". Additional background info: some files have spaces in their file names. If I use the command "hadoop fs -put ./* /hadoop_path", I get the error "put: unexpected URISyntaxException" for those files. If there is no performance difference, I would just copy the files one at a time and have my script replace each space with "%20". Otherwise, I have to rename all files, replacing spaces with underscores, and then use the batch copy.
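The rename-then-batch-copy option could be sketched roughly as follows. This is only an illustration of the renaming step: the helper name `rename_spaces` and the sample file names are hypothetical, and the sketch skips any file whose underscore name already exists rather than overwriting it.

```python
import os
import tempfile


def rename_spaces(directory):
    """Rename files in `directory`, replacing spaces with underscores.

    Returns a list of (old_name, new_name) pairs. A file is skipped when
    the target name already exists, so nothing is overwritten.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        if " " not in name:
            continue
        new_name = name.replace(" ", "_")
        src = os.path.join(directory, name)
        dst = os.path.join(directory, new_name)
        if not os.path.isfile(src) or os.path.exists(dst):
            continue
        os.rename(src, dst)
        renamed.append((name, new_name))
    return renamed


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["msg 001.hl7", "msg_002.hl7"]:
        open(os.path.join(tmp, name), "w").close()
    result = rename_spaces(tmp)
    remaining = sorted(os.listdir(tmp))

print(result)
print(remaining)
```

After a pass like this, the single "hadoop fs -put ./* /hadoop_path" statement from the post should no longer encounter file names with spaces.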

Seaport · ‎02-25-2020

I got the following responses from Cloudera Certification. Regarding Question #1, the FAQ page has the most up-to-date information, so for now I had better hold off on purchasing the exam until DE575 is relaunched. Regarding Question #2, the "Spark and Hadoop Developer" training course is the one I should take to prepare for DE575. Regarding Question #3, the environment for the exam is fixed and only available on CDH. Candidates do not have the option to take the exam in an HDP environment. The skills tested are applicable to HDP development as well; the exam is in the developer track, so it should have nothing to do with the environment it runs in. It is primarily concerned with transforming data that sits on the cluster.

Seaport · ‎02-19-2020

Finally, I figured out what is going on. The root cause is that I only set up testuser on the edge nodes, not on the NameNode. I looked into this page, https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/GroupsMapping.html, which says that "For HDFS, the mapping of users to groups is performed on the NameNode. Thus, the host system configuration of the NameNode determines the group mappings for the users." After I created the user on the NameNode and ran the command hdfs dfsadmin -refreshUserToGroupsMappings, the copy succeeded with no permission-denied error.

Seaport · ‎02-10-2020

@GangWar Here it is.

    $ id -Gn testuser
    hadoop wheel hdfs

Last Visited	‎12-18-2024 04:34 PM

Member Since	‎04-03-2019 03:26 PM
Posts	91
Kudos received	6

Cloudera Community

Re: Zeppelin admin account does not have any permi...

Re: Preparation for CCP Data Engineer Exam (DE575)

Re: Owner Group Write Permision to an HDFS path

Re: "Ask a Question" button disappeared

Re: Yarn error - Skipping AM assignment as cluster...

Re: Do I need to install Python3 on every CDP node...

Do I need to install Python3 on every CDP node?

Re: happybase not working

Re: happybase not working

Re: Copy Files from Linux to HDFS - individually v...

Re: Copy Files from Linux to HDFS - individually v...

Copy Files from Linux to HDFS - individually vs in...

Re: Preparation for CCP Data Engineer Exam (DE575)

Re: Owner Group Write Permision to an HDFS path

Re: Owner Group Write Permision to an HDFS path