Member since 12-21-2017
Posts: 67
Kudos Received: 3
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1443 | 10-15-2018 10:01 AM |
| | 4540 | 03-26-2018 08:23 AM |
12-07-2018
01:44 AM
1 Kudo
Hi @Jagadeesan A S, my current save mode is append. My Spark Streaming apps run every 5 minutes, so deleting the directory manually is not convenient. I think the better solution is to customize the temp location. Alternatively, can I offset the scheduled run times? For example, my two apps currently both run every 5 minutes, at 0, 5, 10, 15, 20. Can I set a schedule so that one still runs at 0, 5, 10, 15 and the other runs at 2.5, 7.5, 12.5?
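To make the offset idea concrete, here is a minimal sketch of the arithmetic, assuming the apps are launched by some external scheduler or wrapper script that can sleep until a computed start time. The function name `next_run` and the choice of seconds as the unit are assumptions for illustration and are not part of Spark.

```python
import math


def next_run(now: float, period: float, offset: float = 0.0) -> float:
    """Return the earliest time >= now of the form offset + k * period."""
    k = math.ceil((now - offset) / period)
    return offset + k * period


# Two apps with a 300 s period: one on the 5-minute marks, one shifted by 150 s.
app_a = [next_run(t, 300, 0) for t in (0, 1, 301)]
app_b = [next_run(t, 300, 150) for t in (0, 151, 451)]
```

A launcher could call `next_run(time.time(), 300, 150)` and sleep for the difference. This only staggers start times; it does not stop the two runs from overlapping if one of them takes longer than 2.5 minutes.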
12-06-2018
04:32 AM
Thanks @Jagadeesan A S. _temporary is a temp directory under the path passed to df.write.parquet(path) on HDFS. However, the default value of spark.local.dir is /tmp, and the documentation describes it as: Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. So it should be a directory on the local file system. I am not sure spark.local.dir refers to the temp directory of spark writing ...
12-06-2018
02:16 AM
I have two Spark applications writing data to the same directory on HDFS. The application that finishes first deletes the working directory _temporary, which still contains temp files belonging to the other application. Can I specify a separate _temporary directory for each Spark application?
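One possible workaround, sketched here under stated assumptions, is to give each application its own staging directory and move finished files into the shared directory afterwards, so neither application's cleanup touches the other's working files. The sketch uses the local file system as a stand-in for HDFS; the names `staging_dir` and `publish` and the `staging/<app_id>` layout are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def staging_dir(root: Path, app_id: str) -> Path:
    """Per-application staging directory: <root>/staging/<app_id>."""
    return root / "staging" / app_id


def publish(staging: Path, final: Path) -> list:
    """Move finished data files from staging into the shared directory.

    Names starting with "_" (markers such as _SUCCESS, or a leftover
    _temporary directory) are skipped.
    """
    final.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(staging.iterdir()):
        if entry.is_file() and not entry.name.startswith("_"):
            shutil.move(str(entry), str(final / entry.name))
            moved.append(entry.name)
    return moved


# Local demonstration with one application's output.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    staging = staging_dir(root, "app_a")
    staging.mkdir(parents=True)
    (staging / "part-0000.parquet").write_text("data")
    (staging / "_SUCCESS").write_text("")
    moved = publish(staging, root / "output")
    published = sorted(p.name for p in (root / "output").iterdir())
```

On a real cluster the move would be an HDFS rename rather than `shutil.move`, and the file names written by each application would need to be unique so that moves cannot collide.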
Labels:
- Apache Spark
- Apache YARN
11-30-2018
02:36 AM
I use Kafka with the default listener value (PLAINTEXT://localhost:9092). However, it advertises its hostname instead of its IP address. As a result, if the producer runs on a host that has no entry for the Kafka host, it cannot send messages to Kafka. How can I resolve this?
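A quick way to confirm this diagnosis from the producer host is to check whether the broker's hostname resolves there. The sketch below is only a diagnostic, not a fix; the function name `can_resolve` and the injectable `resolver` parameter are assumptions added for illustration. The broker setting that controls which address is handed to clients is `advertised.listeners` in `server.properties`, so the usual remedies are either to set it to an address the clients can reach or to add a hosts entry on the producer side.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if this machine can resolve the given hostname."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        return False
```

Running `can_resolve("<broker hostname>")` on the producer host and getting `False` would be consistent with the symptom described above.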
Labels:
- Apache Kafka
11-13-2018
08:49 AM
Thanks @KB. Another question: when my Spark application writes a massive amount of data to HDFS, it always throws an error like the following: No lease on /user/xx/sample_2016/_temporary/0/_temporary/attempt_201604141035_0058_m_019029_0/part-r-19029-1b93e1fa-9284-4f2c-821a-c83795ad27c1.gz.parquet: File does not exist. Holder DFSClient_NONMAPREDUCE_1239207978_115 does not have any open files.
How can I solve this problem? I searched online, and others said it is related to dfs.datanode.max.xcievers.
11-13-2018
08:33 AM
I increased the IPC max length according to this post: https://community.hortonworks.com/questions/101841/issue-requested-data-length-146629817-is-longer-th.html The HDFS service seems to be working again.
11-13-2018
07:46 AM
Thanks @KB. I have reset dfs.datanode.max.xcievers to 32768. Is that still too high? I increased it to avoid the "No lease on file (inode 5425306)" error. What is the proper value for this property? If I set it to a proper value, will the missing blocks be recovered automatically?
11-13-2018
02:21 AM
Yesterday I added three more data nodes to my HDFS cluster running HDP 2.6.4. A few hours later, because of a Spark write error (No lease on...), I increased dfs.datanode.max.xcievers to 65536 and increased the heap size of the name node and data nodes from 5G to 12G, and then restarted HDFS. However, the restart stalled at the name node stage: the name node stayed in safe mode, and this continued for 10 minutes. I forced it to leave safe mode manually, and then HDFS reported that a lot of blocks were missing (more than 90%). I checked the data node and name node logs and found two kinds of errors:
1. In the name node: Requested data length ** is longer than maximum configured RPC length **
2. In the data node: End of file exception between local host is "***", destination host is "**:8020"
How can I recover my missing files, and what is the actual cause of this problem?
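For the name node error, it can help to pull the two numbers out of the log line and see how far the request exceeded the limit before raising it. This is a minimal sketch: the function name `rpc_length_gap` is hypothetical, and the sample values are illustrative only (146629817 appears in the URL of the linked community thread, and 67108864, i.e. 64 MB, is assumed here as the configured limit). In Hadoop the limit is set by `ipc.maximum.data.length` in `core-site.xml`.

```python
import re

# Matches the name node message quoted in the post above.
PATTERN = re.compile(
    r"Requested data length (\d+) is longer than "
    r"maximum configured RPC length (\d+)"
)


def rpc_length_gap(log_line):
    """Return (requested, configured, excess) in bytes, or None if no match."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    requested, configured = int(match.group(1)), int(match.group(2))
    return requested, configured, requested - configured


# Illustrative line; the real numbers are elided (**) in the post.
sample = (
    "Requested data length 146629817 is longer than "
    "maximum configured RPC length 67108864"
)
result = rpc_length_gap(sample)
```

The new limit would need to be larger than the requested length reported in the actual log, with some headroom.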
Labels:
- Apache Hadoop
10-15-2018
10:01 AM
1 Kudo
Solved by using HttpFS. It sets up a gateway, so there is no need to access the data nodes directly.
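For reference, reading a file through the gateway can be sketched as below. HttpFS exposes the same `/webhdfs/v1` REST paths as WebHDFS but streams the file content itself instead of redirecting to a data node. The gateway host, the HDFS path, and the user name are placeholders, and the function names `open_url` and `read_file` are assumptions for illustration; 14000 is the default HttpFS port.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def open_url(gateway, hdfs_path, user):
    """Build the REST URL for reading a file (op=OPEN) through the gateway."""
    query = urlencode({"op": "OPEN", "user.name": user})
    return f"{gateway.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{query}"


def read_file(gateway, hdfs_path, user):
    """Fetch the file content as bytes; requires network access to the gateway."""
    with urlopen(open_url(gateway, hdfs_path, user)) as response:
        return response.read()


# Placeholder values, not taken from the original post.
url = open_url("http://httpfs.example.com:14000/", "/user/xx/data.txt", "xx")
```

`urlopen` picks up proxy settings from environment variables such as `http_proxy`, which may be enough for the proxy setup described in the question. The `user.name` parameter assumes simple (pseudo) authentication; a Kerberos-secured cluster would need a different client.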
10-11-2018
06:46 AM
I need to read files on an HDFS cluster that sits on another network, which I can reach only via a proxy. I can successfully access directories and file status via WebHDFS on a specific port. However, when I try to read file content, it redirects me to a data node URL, while I only have access to the name node. How can I read the files in a convenient way?
Labels:
- Apache Hadoop