Member since: 12-21-2017
Posts: 67
Kudos Received: 3
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 532 | 10-15-2018 10:01 AM
 | 2849 | 03-26-2018 08:23 AM
04-26-2019
06:47 AM
I have met a similar problem. I moved the block pool to another place, but the problem still occurs. How can I fix it? Running su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-datanode/../hadoop/sbin/hadoop-daemon.sh start datanode" outputs the following message: Java HotSpot 64-Bit Server VM warning: cannot open file /var/log/hadoop/hdfs/gc.log-2019... due to No such file or directory
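A minimal sketch of a likely fix, assuming the warning simply means the GC log directory is missing or unwritable (the path is taken from the message above; the ownership is an assumption for a typical HDP install):

```bash
# Recreate the log directory the DataNode JVM wants to write gc.log into,
# then try starting the DataNode again. Adjust ownership to your environment.
sudo mkdir -p /var/log/hadoop/hdfs
sudo chown -R hdfs:hadoop /var/log/hadoop/hdfs
su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-datanode/../hadoop/sbin/hadoop-daemon.sh start datanode"
```

Note that this warning alone usually does not stop the DataNode; if startup still fails, the real cause is likely further down in the DataNode log.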
03-12-2019
01:08 AM
Thanks @Jordan Moore. Since 'Kafka is very memory and disk sensitive', would you recommend installing Kafka brokers on a virtual machine, given that I cannot get more dedicated machines for Kafka?
03-11-2019
03:32 AM
I have met a significant performance problem recently. I have about 30 Spark Streaming applications that read data from Kafka and write it to HDFS. Recently, the write progress on some Spark executors has become very slow. The amount of data per Spark task is similar, but the task durations differ greatly; the slowest is about 4 times the fastest. I checked disk usage, and disk busy time on some hosts is about 80% to 90%. So I suspect it is caused by slow HDFS write speed, because my Kafka brokers, HDFS DataNodes, and YARN NodeManagers are located on the same hosts. Could this actually affect performance?
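A quick way to confirm the disk-contention theory (not from the thread, just a standard check with the sysstat tools):

```bash
# Sample extended per-device I/O statistics every 5 seconds on the suspect hosts.
# Sustained %util near 100% on the disks shared by Kafka and the DataNode
# indicates the services are competing for the same spindles.
iostat -dxm 5
```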
Tags:
- Data Processing
- Kafka
Labels:
- Apache Kafka
12-07-2018
01:44 AM
1 Kudo
Hi @Jagadeesan A S, my current save mode is append. My Spark Streaming apps run every 5 minutes, so it is not convenient to delete manually. So I think the better solution is to customize the temp location. Or can I set an offset for the scheduled running time? For example, my current 2 apps run every 5 minutes, i.e. at 0, 5, 10, 15, 20. Can I set a schedule so that one still runs at 0, 5, 10, 15, and the other runs at 2.5, 7.5, 12.5?
12-06-2018
04:32 AM
Thanks @Jagadeesan A S. _temporary is a temp directory under the path given to df.write.parquet(path) on HDFS. However, the default value of spark.local.dir is /tmp, and the documentation describes it as 'Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system.' So it should be a directory on the local file system. I am not sure spark.local.dir refers to the temp directory Spark writes to ...
12-06-2018
02:16 AM
I have two Spark applications writing data to the same directory on HDFS, which causes the app that completes first to delete the _temporary working directory while it still contains temp files belonging to the other app. So can I specify a separate _temporary directory for each Spark application?
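A workaround sketch, not the accepted answer: the _temporary directory is created by the FileOutputCommitter under the output path itself, so one option is to give each application its own staging directory and move the finished files afterwards. All paths below are placeholders.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("staged-writer").getOrCreate()
val df = spark.read.parquet("/data/input") // placeholder source

// Each application writes into its own staging subdirectory, keyed by the
// application id, so the two jobs never share a _temporary directory.
val staging = s"/data/output/_staging/${spark.sparkContext.applicationId}"
df.write.mode("append").parquet(staging)

// Move only the committed part files into the shared output directory.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.listStatus(new Path(staging))
  .filter(_.getPath.getName.startsWith("part-"))
  .foreach(f => fs.rename(f.getPath, new Path("/data/output", f.getPath.getName)))
```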
Labels:
- Apache Spark
- Apache YARN
11-30-2018
02:36 AM
I use Kafka with the default listener value (PLAINTEXT://localhost:9092); however, it broadcasts its hostname instead of its IP address. As a result, if the producer runs on a host that cannot resolve the Kafka host's name, it cannot send messages to Kafka. How can I resolve this?
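A sketch of the usual fix, assuming the broker's routable IP is 10.0.0.12 (a placeholder): set advertised.listeners in the broker's server.properties so clients are handed an address they can actually reach, then restart the broker.

```properties
# server.properties (placeholder address; use the broker's routable IP or a
# DNS name that every producer host can resolve)
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://10.0.0.12:9092
```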
Labels:
- Apache Kafka
11-20-2018
03:11 AM
OK @Jay Kumar SenSharma. So can I upgrade the Ambari Metrics service individually, rather than upgrading the whole of Ambari?
11-20-2018
02:46 AM
For netstat -anlp | grep :6188 | grep CLOSE_WAIT | wc -l I get 0. Maybe restarting the service solved this problem temporarily. Do you mean the CLOSE_WAIT problem is related to Failed to invoke 'unbuffer' method in class class org.apache.hadoop.fs.FSDataInputStream?
11-20-2018
02:27 AM
I am using Ambari 2.6.1.0. The Ambari Metrics Collector log is stored in the production environment and I cannot export it, but I can give you the error log as a hand-typed copy. In embedded mode, the Ambari Metrics log shows: MetaDataProtos$MetaDataService for row \x00\x00METRIC_RECORD .... Caused by java.lang.InterruptedException. The AMS HBase log shows: FSDataInputStreamWrapper: Failed to invoke 'unbuffer' method in class class org.apache.hadoop.fs.FSDataInputStream. So there may be a TCP socket connection left open in CLOSE_WAIT state .... caused by java.lang.UnsupportedOperationException: this stream does not support unbuffering. All AMS settings are at their defaults. There are 6 nodes in total in the cluster. The host running AMS has 64 cores and 256GB of memory; currently it has 53GB of free memory and 233GB of memory in cache.
11-19-2018
06:46 AM
My Ambari Metrics Collector keeps failing. In the log, I found: Failed to get result with timeout, timeout = 300000ms row 'METRIC_AGGREGATE' on table 'SYSTEM.CATALOG' at region=SYSTEM.CATALOG, ***** host name=[hostname], [port] *** seNum=6 .... caused by .... IOException: Failed to get result with timeout, timeout=300000ms, and then it gets stuck at org.apache.hadoop.hbase.client.AsyncProcess: #1 waiting for 28379 actions to finish. It seems related to HBase, but my HBase runs well without any errors. If I restart the Ambari Metrics Collector, it recovers immediately, then becomes unavailable again after several hours. How can I fix it? Thanks.
Labels:
- Apache Ambari
11-13-2018
08:49 AM
Thanks @KB. And another question: when my Spark application writes a massive amount of data to HDFS, it always throws error messages like the following: No lease on /user/xx/sample_2016/_temporary/0/_temporary/attempt_201604141035_0058_m_019029_0/part-r-19029-1b93e1fa-9284-4f2c-821a-c83795ad27c1.gz.parquet: File does not exist. Holder DFSClient_NONMAPREDUCE_1239207978_115 does not have any open files.
How can I solve this problem? I searched online and others said it is related to dfs.datanode.max.xcievers.
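For reference, a hedged configuration sketch (the value is illustrative, not a recommendation from this thread): dfs.datanode.max.xcievers is the legacy spelling of dfs.datanode.max.transfer.threads, which caps the number of concurrent transfer threads per DataNode; it is raised in hdfs-site.xml and requires a DataNode restart.

```xml
<!-- hdfs-site.xml: illustrative value; tune to the workload and restart the DataNodes -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>
```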
11-13-2018
08:33 AM
I increased the IPC max length according to this: https://community.hortonworks.com/questions/101841/issue-requested-data-length-146629817-is-longer-th.html The HDFS service seems to be back to working.
11-13-2018
07:46 AM
Thanks @KB. I have reset dfs.datanode.max.xcievers to 32768; is that still too high? I increased it to avoid the "No lease on file (inode 5425306)" error. So what is a proper value for this property? And if I set it to a proper value, will the missing blocks be recovered automatically?
11-13-2018
02:21 AM
Yesterday I added three more DataNodes to my HDFS cluster running HDP 2.6.4. A few hours later, because of a Spark writing error (No lease on...), I increased dfs.datanode.max.xcievers to 65536, increased the heap size of the NameNode and DataNodes from 5G to 12G, and then restarted HDFS. However, the HDFS restart paused at the NameNode stage. It showed the NameNode was still in safe mode, and this continued for 10 minutes. I forced it to leave safe mode manually, and then HDFS reported that a lot of blocks were missing (more than 90%). I checked the NameNode and DataNode logs, and there are two kinds of errors: 1. In the NameNode: Requested data length ** is longer than maximum configured RPC length ** 2. In the DataNode: End of file exception between local host is "***", destination host is "**:8020" So how can I recover my missing files? And what is the actual cause of this problem?
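For reference, a sketch of the setting behind error 1 (the value is illustrative; the linked HCC answer in the 08:33 AM reply above is what was actually applied): raising ipc.maximum.data.length in core-site.xml on the NameNode lets it accept the larger block reports now sent by the DataNodes.

```xml
<!-- core-site.xml on the NameNode: illustrative value (128 MB); the default is 64 MB -->
<property>
  <name>ipc.maximum.data.length</name>
  <value>134217728</value>
</property>
```

Afterwards, `hdfs dfsadmin -safemode get` and `hdfs fsck / -list-corruptfileblocks` can confirm whether the "missing" blocks reappear once every DataNode has re-registered; blocks are often reported missing only because the block reports never reached the NameNode.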
Labels:
- Apache Hadoop
10-15-2018
10:01 AM
1 Kudo
Solved by using HttpFS. It provides a gateway, so there is no need to access the DataNodes directly.
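For anyone hitting the same redirect issue, a minimal usage sketch (the host, path, and user below are placeholders), assuming HttpFS runs on its default port 14000: HttpFS exposes the same WebHDFS REST API but streams file content itself, so the client never needs a direct connection to a DataNode.

```bash
# Read a file through the HttpFS gateway instead of WebHDFS on the NameNode.
curl -L "http://httpfs-host:14000/webhdfs/v1/data/sample.txt?op=OPEN&user.name=hdfs"
```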
10-11-2018
06:46 AM
I need to read files on an HDFS cluster that sits on another network, which I can only reach via a proxy. I can currently access directory and file status via WebHDFS on a specific port successfully. However, when I try to read a file's content, it redirects me to a DataNode URL, while I only have access to the NameNode. So how can I read the file in a convenient way?
Tags:
- Hadoop Core
- HDFS
Labels:
- Apache Hadoop
10-10-2018
05:57 AM
@Aditya Sirna So by default, up to 1000 lines of results are stored on HDFS for each query? If I increase the limit, will it have negative effects, such as slow HTTP transfers or failures in receiving the results?
10-10-2018
04:49 AM
@Aditya Sirna Thanks Aditya. So what about paging? Since the whole result is saved on HDFS in JSON format, if I need to load only part of it, do I just load the whole JSON file and cut out a slice in memory given a page size and page number? In practice, will Zeppelin run into out-of-memory problems if the result is very large?
10-10-2018
03:52 AM
I am designing an HDFS query system based on Spark that includes a paging function, and Zeppelin seems to be a good example for me. Now I have a question. I see that Spark or Spark SQL query results still exist even after I refresh or reopen the notebook, so the results must be saved somewhere. I am wondering where this result data is stored. If it is saved in a database, what happens when the result data is so huge that it causes database performance problems?
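Separately from how Zeppelin stores its results, here is a sketch of one way to implement paging in a Spark-based query layer (all paths and column names are placeholders): assign a stable row number and select only the requested page, so only one page is ever collected to the driver.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

val spark = SparkSession.builder().appName("paged-query").getOrCreate()
val result = spark.read.parquet("/data/query_result") // placeholder result set

val pageSize = 100
val pageNumber = 3

// Number the rows by a deterministic ordering (assumes an 'id' column),
// then keep only the rows belonging to the requested page.
val w = Window.orderBy(col("id"))
val page = result
  .withColumn("rn", row_number().over(w))
  .where(col("rn") > (pageNumber - 1) * pageSize && col("rn") <= pageNumber * pageSize)
  .drop("rn")

page.show()
```

A global window without partitioning funnels everything through one task, so for very large results a precomputed row index (or filtering on a sortable key range) scales better.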
Labels:
- Apache Spark
- Apache Zeppelin
08-16-2018
07:15 AM
Hi @Jonathan Sneep. Fine, thanks. I have added the user and group info on my NameNode. So the typical way to add a new user or group is to create the user and group on the NameNode and wait for usersync to sync the user info to Ranger? And if I don't care about group policies, does creating an internal user in Ranger and specifying it in the allow conditions also work? At least it seems to work in practice.
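A minimal sketch of that first step, reusing the placeholder names from the original question (test01 / test_group01): create them as OS accounts on the NameNode host, which is what the HDFS group mapping (and, depending on setup, Ranger usersync) actually reads.

```bash
# On the NameNode host: create the group and the user as OS accounts.
sudo groupadd test_group01
sudo useradd -G test_group01 test01

# Refresh HDFS's cached user-to-group mapping so the new membership is visible
# to the Ranger HDFS plugin without waiting for the cache to expire.
sudo -u hdfs hdfs dfsadmin -refreshUserToGroupsMappings
```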
08-16-2018
01:31 AM
Hi @Jonathan Sneep. Not yet. So I need to add the user and the related group on my NameNode host manually?
08-15-2018
09:39 AM
Hi @Jonathan Sneep, thanks for your response. Actually, both the user and the group were created in Ranger; they are internal to Ranger.
08-15-2018
07:55 AM
I have met a problem with Ranger authorization. Here are my steps to reproduce it: 1. I create one account in Ranger with the username test01. 2. I set it to belong to a group test_group01. 3. In the Ranger HDFS policy, I give test_group01 access to the directory /data/. If this worked correctly, the test01 user should have access to /data/ through the privilege inherited from the group test_group01. But in practice, it cannot access the directory /data. However, if I specify test01 directly under 'Select User', it works well. So it seems that specifying the group in the policy does not work, while specifying the permitted user is fine. How can I solve this? Thanks!
Labels:
- Apache Ranger
07-12-2018
08:13 AM
Thanks Jay. I checked the curl and libcurl versions by running "yum list | grep curl"; the versions are: curl.x86_64 7.19.7-46.el6, libcurl.x86_64 7.19.7-46.el6, python-pycurl.x86_64 7.19.0-8.el6, libcurl.i686 7.19.7-46.el6, libcurl-devel.i686 7.19.7-46.el6, libcurl-devel.x86_64 7.19.7-46.el6. curl -V prints the following info: curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7... Protocols: ... Features: GSS-Negotiate ... If I run the alert_spark2_livy_port.py script independently, it runs well. What confuses me is that all three of my hosts have exactly the same curl version, but only one has the above problem.
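One extra check that might explain the single odd host (not from the thread, just a standard troubleshooting step): see whether another curl binary, for example one bundled with Anaconda, shadows the system curl on that host's PATH, and whether the build that actually runs supports GSS-Negotiate.

```bash
# List every curl on the PATH in order of precedence, then check the features
# of the one that actually runs.
which -a curl
curl -V | grep -i negotiate
```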
07-12-2018
02:45 AM
The Spark Livy alert always reports: Connection failed on host ***:8999. In detail, it prints ExecutionFailed: Execution of 'curl -s -o /dev/null -w'%{http_code}' --negotiate -u: -k http://host:8999/session | grep 200' returned 1, curl: option --negotiate: the installed libcurl version doesn't support this, curl: try curl --help... I have 3 hosts in this cluster, but only one host reports this alert. I have checked the curl and libcurl versions on each host, and they are all the same. It may be caused by installing Anaconda and the Python version changing, but I am not sure, as the default Python version is 2.6. How can I fix it? Thanks!
Labels:
- Apache Spark
04-03-2018
03:32 AM
I am trying to read data from Kafka and write it in Parquet format via Spark Streaming. The problem is that the data from Kafka have a variable structure. For example, app one has columns A, B, C and app two has columns B, C, D, so the DataFrame I read from Kafka has all columns A, B, C, D. When I write the DataFrame to Parquet files partitioned by app name, the Parquet file of app one also contains column D, which is empty and actually holds no data. So how can I filter out the empty columns when writing the DataFrame to Parquet? Thanks!
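A sketch of one possible approach (the 'app' column name and all paths are placeholders, and this is not an answer from the thread): split the DataFrame by app, drop the columns that are entirely null for that app, and write each slice separately instead of relying on partitionBy over the merged frame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count}

val spark = SparkSession.builder().appName("drop-empty-columns").getOrCreate()
val merged = spark.read.parquet("/data/kafka_merged") // placeholder source

// Keep only the columns that contain at least one non-null value.
def dropEmptyColumns(df: DataFrame): DataFrame = {
  val counts = df.select(df.columns.map(c => count(col(c)).alias(c)): _*).first()
  val nonEmpty = df.columns.filter(c => counts.getAs[Long](c) > 0)
  df.select(nonEmpty.map(c => col(c)): _*)
}

// Write each app's slice on its own so its files never contain empty columns.
val apps = merged.select("app").distinct().collect().map(_.getString(0))
apps.foreach { app =>
  dropEmptyColumns(merged.where(col("app") === app))
    .write.mode("overwrite").parquet(s"/data/output/app=$app")
}
```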
Labels:
- Apache Kafka
- Apache Spark
04-01-2018
08:38 PM
I have submitted a Spark Java program via "Spark Submit Jar" and it appears to be running well.
However, when I click the logs link for the application in the Job tab in Hue, it shows "cannot access: /jobbrowser/jobs/application_****/single_logs".
So how can I find the logs of the running Spark application?
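A common fallback while the Hue link is broken (the application id below is a placeholder): pull the aggregated logs straight from YARN once the application has finished, or follow the executor log links in the ResourceManager UI while it is still running.

```bash
# Replace the id with the real one shown in the ResourceManager UI or the Hue job list.
yarn logs -applicationId application_1234567890123_0042
```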
Labels:
- Apache Spark
- Cloudera Hue
03-26-2018
08:23 AM
Fixed it by restoring the Spark home setting.
03-23-2018
02:58 AM
I am testing Spark within Zeppelin, but when running the tutorial paragraph %spark2.spark
spark.version
it throws the following error: java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:391)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:380)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:828)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Then I disabled the Hive context according to https://stackoverflow.com/questions/43289067/getting-nullpointerexception-when-running-spark-code-in-zeppelin-0-7-1 , but the same exception is still thrown. How can I solve it?
=========================================================
Update 1: I have checked the Spark interpreter log and found the following error: requirement failed: /python/lib/pyspark.zip not found; cannot run pyspark application in YARN mode. How can I locate this file or configure the path?
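Consistent with the fix noted in the reply above (restoring the Spark home setting), a sketch of where that usually lives (the path is a placeholder for a typical HDP install): the interpreter looks for python/lib/pyspark.zip relative to SPARK_HOME, so pointing SPARK_HOME at a real Spark installation in zeppelin-env.sh removes the "requirement failed" message.

```bash
# conf/zeppelin-env.sh (placeholder path; point it at the actual Spark 2 install)
export SPARK_HOME=/usr/hdp/current/spark2-client
```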
Labels:
- Apache Spark
- Apache Zeppelin