Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13382 | 02-20-2018 12:33 PM
 | 1514 | 02-19-2018 05:12 AM
 | 1865 | 12-28-2017 06:13 AM
 | 7150 | 09-28-2017 09:25 AM
 | 12192 | 09-25-2017 11:19 AM
04-09-2017
06:33 PM
Hi @Anand Pawar Of course you should not consider the header! Remove the header during analysis; only then will you get the proper output. As you mentioned, keeping it leads to misinterpretation and sometimes to errors. This is easy to handle in whichever Hadoop tool you choose. If you are storing the data in a Hive table, use tblproperties("skip.header.line.count"="1") to skip the header. If you are processing it in Pig, you can skip the first line there. In short, do not consider the header when you analyze the data, but you can still store the file with its header. Hope this answers your question.
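For example, a minimal Hive DDL might look like this (the table name, columns and path are placeholders, not from the original question):

CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (   -- hypothetical table name
  id INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales_raw'                        -- hypothetical HDFS path
TBLPROPERTIES ("skip.header.line.count"="1");     -- first line of each file is ignored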
04-07-2017
12:58 PM
@Prabhu Muthaiyan Glad that it helped you. Happy Hadooping!!
04-07-2017
11:29 AM
@Jonathan Samelson Non-DFS space will be flushed out if more space is needed for processing jobs. There is also a clean-up interval set in the configuration; depending on that, the space will be cleared and reused by Hadoop. In your case, processing 50 MB costs far less space than processing 2.2 GB; space is allocated and de-allocated based on the size of the file being processed. I hope that answers your question. The link below may give some insight into how this space is allocated: https://books.google.co.in/books?id=H3mvcxPeUfwC&pg=PA114&lpg=PA114&dq=dfs.datanode.du.reserved+example&source=bl&ots=pYyIud-Ix9&sig=qAkLTAkAtWCdITL1DiNqwEXrBqU&hl=en&sa=X&ved=0ahUKEwjgkvXZnpLTAhVHqI8KHVSRAgYQ6AEIPTAF#v=onepage&q=dfs.datanode.du.reserved%20example&f=false
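The reserved amount itself is controlled by the dfs.datanode.du.reserved property the link discusses; a sketch of the hdfs-site.xml entry, where the 10 GB figure is only an example value:

<property>
  <!-- bytes per volume reserved for non-DFS use; 10737418240 = 10 GB (example value) -->
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>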
04-07-2017
09:33 AM
@Jonathan Samelson Exactly. Non-DFS is reserved space allocated for Hadoop services. It is not used for HDFS data storage; instead, Hadoop uses it as an intermediate layer for the various processes and jobs it triggers, similar to a staging layer in typical ETL processing.
04-06-2017
07:21 PM
Hi @Jonathan Samelson Please refer to this link: https://community.hortonworks.com/questions/92839/hdfs-dropping-hive-table-is-not-freeing-up-memory.html#answer-92872
04-06-2017
07:18 PM
@Saikrishna Tarapareddy Could you share more detail about your query? Is it throwing an error when you run it in Hive, or is the code not returning the data you expected?
04-06-2017
07:12 PM
@Prabhu Muthaiyan Filter the data from the prod Hive table and load it into a file, then, as @Namit Maheshwari mentioned, use distcp to transfer it between the environments. If you want to limit the data without applying any filters, copy only a subset of the files under the HDFS folder.
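A rough sketch of those two steps (the table name, filter and cluster addresses are placeholders):

# 1. Write the filtered rows out to an HDFS directory
hive -e "INSERT OVERWRITE DIRECTORY '/tmp/export/mytable' SELECT * FROM mytable WHERE dt = '2017-04-01'"

# 2. Copy that directory to the target cluster
hadoop distcp hdfs://prod-nn:8020/tmp/export/mytable hdfs://target-nn:8020/data/mytable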
04-05-2017
05:47 PM
@Christopher Daumer Create an external Hive table stored as ORC and point it to your ORC file location:
CREATE EXTERNAL TABLE IF NOT EXISTS mytable (
  col1 BIGINT,
  col2 BIGINT
)
STORED AS ORC
LOCATION '<ORC file location>';
I don't think you can convert an ORC file into CSV directly, but you can write a UDF to do it.
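One alternative worth noting (a sketch, assuming the external table above exists) is to copy the rows into a comma-delimited text table and treat its underlying HDFS files as CSV:

-- hypothetical helper table; its files on HDFS will be plain comma-separated text
CREATE TABLE mytable_csv (col1 BIGINT, col2 BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

INSERT OVERWRITE TABLE mytable_csv
SELECT col1, col2 FROM mytable;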
04-04-2017
07:57 PM
Hi @Maeve Ryan It seems Disk Usage (Non DFS Used) is higher than DFS Used. DFS Used is your actual data on the DataNode, whereas Non DFS Used is not data as such but things like log files held by YARN or other Hadoop services. Check the disk for huge files and remove them; that will free up space. Verify the size of DFS Used before and after deleting the data file from HDFS; the difference tells you how much space the deletion freed. The calculation for non-DFS space is:
Non DFS Used = Configured Capacity - Remaining Space - DFS Used
du -hsx * | sort -rh | head -10 helps to find the top 10 largest files.
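To see those capacity figures per DataNode, the standard admin report is enough (no assumptions beyond a running cluster):

# prints Configured Capacity, DFS Used, Non DFS Used and DFS Remaining
# for the cluster as a whole and for each DataNode
hdfs dfsadmin -report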
04-03-2017
05:37 PM
Thanks @mqureshi