Member since: 02-15-2019
Posts: 9
Kudos Received: 1
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2387 | 02-07-2019 04:17 PM |
02-02-2021
02:35 AM
You have to call hsync with the SyncFlag.UPDATE_LENGTH argument; plain hflush/hsync persists the data but does not update the file length recorded on the NameNode.
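A minimal sketch of what that looks like, assuming the stream was opened on an actual HDFS (DistributedFileSystem) path so the cast to HdfsDataOutputStream is valid; the path and payload are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

public class SyncLengthExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path("/tmp/sync-example"))) {
            out.write("some bytes".getBytes(StandardCharsets.UTF_8));

            // hflush()/hsync() push the data to the DataNodes, but the file
            // length recorded on the NameNode is normally only updated at
            // block boundaries or on close() -- which is why getPos() (the
            // client's view) can exceed the size HDFS reports. Passing
            // UPDATE_LENGTH makes the NameNode update the visible length too.
            ((HdfsDataOutputStream) out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));

            System.out.println("bytes written so far: " + out.getPos());
        }
    }
}
```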
02-01-2021
11:38 AM
I have an application that uses FSDataOutputStream to write data to HDFS. After writing, I call FSDataOutputStream's hflush function to flush the data, and I use its getPos function to obtain the number of bytes written. For some reason, after hflush has been called, getPos returns the wrong file size most of the time (sometimes it is correct). My understanding is that once I call hflush and then getPos, the file size in HDFS should equal (in bytes) what getPos returns, but getPos always returns something greater, as though half of the file were still stuck in some buffer and had not reached a physical disk.

I read about the hsync function of FSDataOutputStream and started using it instead of hflush, because it guarantees that the data will not be buffered and will be written to disk. The problem is much rarer now, but it still persists: about 10% of the time, when I call hsync and then getPos, the file size in HDFS is less than what getPos returns. Why is this happening, and how can I synchronize getPos with hsync?
Labels:
- HDFS
02-15-2019
08:03 PM
@Harsh J you are a genius! Thanks a lot!
02-15-2019
05:39 PM
1 Kudo
Hey guys, I have already asked this on multiple forums but never got a reply, so I thought I might get one here. I have a dataset of about 1 GB with a "cityid" column that has 324 unique values, so after partitioning I should get 324 folders in HDFS. But whenever I partition, the write fails; you can see the exception messages here: https://community.hortonworks.com/questions/238893/notenoughreplicasexception-when-writing-into-a-par.html It's definitely an HDFS issue, because everything worked on MapR. What could possibly be the problem? By the way, I tried this on fresh installs of both Hortonworks and Cloudera with default settings, so nothing was altered on my end. If you need any more details, please ask. Could this be a setup issue, like needing to increase memory somewhere in HDFS?
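A minimal sketch of the kind of partitioned write described here, assuming Spark's Java DataFrame API (the post does not say which engine performs the write, and the paths are illustrative):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Hypothetical reconstruction of the failing job: partitioning a ~1 GB
// dataset by "cityid" should create one HDFS subdirectory per distinct
// value -- 324 directories in this case.
public class PartitionByCity {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("PartitionByCity")
                .getOrCreate();

        Dataset<Row> df = spark.read().parquet("hdfs:///data/input");

        df.write()
          .partitionBy("cityid")               // 324 distinct values => 324 folders
          .mode(SaveMode.Overwrite)
          .parquet("hdfs:///data/partitioned");

        spark.stop();
    }
}
```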
Labels:
- Apache Hive
- HDFS
02-08-2019
11:08 AM
Hey, thanks so much!
02-07-2019
08:33 PM
In the picture attached to this post you can see my current log level value. This does not work: in /var/log/hadoop/hdfs/hadoop-hdfs-namenode-sandbox-hdp.hortonworks.com.log I can only see INFO and WARN messages.
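For reference, the setting I am trying to change presumably has to reach the NameNode's log4j configuration, and a running daemon may not pick it up until restart. A sketch of the two usual ways to raise the HDFS log level, using the stock Hadoop log4j property names; the host and port are illustrative (50070 being the default NameNode HTTP port on HDP 2.x):

```
# In log4j.properties (takes effect after a NameNode restart):
log4j.logger.org.apache.hadoop.hdfs=DEBUG

# Or at runtime, without a restart:
hadoop daemonlog -setlevel sandbox-hdp.hortonworks.com:50070 \
    org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG
```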
Labels:
- Apache Hadoop
02-07-2019
04:17 PM
Are you familiar with user-defined functions (UDFs)?
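In case it helps, a minimal Hive UDF in Java looks roughly like this (the class name and behavior are purely illustrative; your actual logic will differ):

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF: upper-cases a string column.
// Register it in Hive with something like:
//   CREATE TEMPORARY FUNCTION my_upper AS 'UpperUdf';
public final class UpperUdf extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}
```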