Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 13349 | 02-20-2018 12:33 PM |
|  | 1506 | 02-19-2018 05:12 AM |
|  | 1862 | 12-28-2017 06:13 AM |
|  | 7140 | 09-28-2017 09:25 AM |
|  | 12180 | 09-25-2017 11:19 AM |
04-09-2017
06:33 PM
Hi @Anand Pawar Of course you should not consider the header! Remove the header before analyzing; only then will you get proper output. As you mentioned, keeping it leads to misinterpretation and sometimes errors. This is easy to handle in whichever Hadoop tool you choose. If you are storing the data in a Hive table, use TBLPROPERTIES ("skip.header.line.count"="1") to skip the header. If it is in Pig, you can skip the first line while processing. So don't include the header when you analyze the data, but you can still store the file with its header. Hope this answers your question.
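For the Hive case, a minimal sketch (the table name, columns, and path are made up for illustration):

```sql
-- The first line of each file is skipped when the table is read.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
  order_id BIGINT,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales_raw'
TBLPROPERTIES ("skip.header.line.count"="1");
```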
04-07-2017
12:58 PM
@Prabhu Muthaiyan Glad that it helped you. Happy Hadooping!!
04-07-2017
11:29 AM
@Jonathan Samelson Non-DFS space is flushed out when more space is needed to process jobs. There is also a cleanup interval set in the configuration; based on it, the space is cleared and reused by Hadoop. In your case, processing 50 MB takes far less space than processing 2.2 GB; space is allocated and de-allocated according to the size of the file being processed. I hope that answers your question. The link below may give some insight into how this space is allocated: https://books.google.co.in/books?id=H3mvcxPeUfwC&pg=PA114&lpg=PA114&dq=dfs.datanode.du.reserved+example&source=bl&ots=pYyIud-Ix9&sig=qAkLTAkAtWCdITL1DiNqwEXrBqU&hl=en&sa=X&ved=0ahUKEwjgkvXZnpLTAhVHqI8KHVSRAgYQ6AEIPTAF#v=onepage&q=dfs.datanode.du.reserved%20example&f=false
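As an aside, dfs.datanode.du.reserved (the property the linked page discusses) is what reserves per-volume disk space for non-DFS use; a minimal hdfs-site.xml sketch, where the 10 GB value is only an example:

```xml
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- Bytes reserved per volume for non-DFS use; 10 GB here, tune to your cluster. -->
  <value>10737418240</value>
</property>
```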
04-07-2017
09:33 AM
@Jonathan Samelson Exactly. Non-DFS is reserved space allocated for Hadoop services. It is not used for data storage; instead, Hadoop uses it as an intermediate layer for the various processes/jobs it triggers, similar to a staging layer in typical ETL processing.
04-06-2017
07:21 PM
Hi @Jonathan Samelson Please refer to this link: https://community.hortonworks.com/questions/92839/hdfs-dropping-hive-table-is-not-freeing-up-memory.html#answer-92872
04-06-2017
07:18 PM
@Saikrishna Tarapareddy Could you share more detail on your query? Is it throwing an error when you run it in Hive, or is the data you expected not being produced by the code?
04-06-2017
07:12 PM
@Prabhu Muthaiyan Filter the data in Hive on prod and load it into a file, then, as mentioned by @Namit Maheshwari, use distcp to transfer it between environments. If you want to limit the data without applying any filters, distcp only a subset of the files under the HDFS folder. A rough sketch of the flow is below.
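A sketch of that flow, assuming Hive 0.11+ and with a hypothetical table, filter, and paths:

```sql
-- Write the filtered rows out as comma-delimited text on HDFS.
INSERT OVERWRITE DIRECTORY '/tmp/export/orders_2017'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM orders WHERE year = 2017;
```

Then from the shell, copy the exported directory across clusters (the NameNode hosts are placeholders): `hadoop distcp hdfs://prod-nn:8020/tmp/export/orders_2017 hdfs://dev-nn:8020/tmp/import/orders_2017`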
04-05-2017
05:47 PM
@Christopher Daumer Create an external Hive table stored as ORC and point it at your ORC file location:

```sql
-- ROW FORMAT DELIMITED does not apply to ORC (a binary format), so it is omitted.
CREATE EXTERNAL TABLE IF NOT EXISTS mytable (
  col1 BIGINT,
  col2 BIGINT
)
STORED AS ORC
LOCATION '<ORC file location>';
```

I don't think you can convert an ORC file into CSV directly, but you can write a UDF to do it.
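A simpler alternative sketch, if you just need the rows as CSV on HDFS: stage them through a text-format table (mytable_csv is a hypothetical name):

```sql
-- Hypothetical staging table; Hive writes its files as comma-delimited text.
CREATE TABLE mytable_csv (col1 BIGINT, col2 BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy the ORC-backed rows; the files under mytable_csv's
-- warehouse directory are then plain CSV.
INSERT OVERWRITE TABLE mytable_csv
SELECT col1, col2 FROM mytable;
```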
04-04-2017
07:57 PM
Hi @Maeve Ryan It seems Disk Usage (Non DFS Used) is higher than DFS Used. DFS Used is the actual data on your DataNodes, whereas Non DFS Used is not data as such but things like log files held by YARN or other Hadoop services. Check the disk for huge files and remove them; that will free up space. Compare the DFS Used figure before and after deleting a data file from HDFS to see how much space the deletion returned. The calculation for non-DFS space is: Non DFS Used = Configured Capacity - Remaining Space - DFS Used. Running `du -hsx * | sort -rh | head -10` helps find the 10 largest files/directories.
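To check those numbers before and after cleanup, `hdfs dfsadmin -report` prints Configured Capacity, DFS Used, Non DFS Used, and DFS Remaining for the cluster and per DataNode; a quick sketch:

```sh
# Cluster-wide and per-DataNode capacity breakdown (run as the HDFS admin user).
hdfs dfsadmin -report
```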
04-03-2017
05:37 PM
Thanks @mqureshi