Member since: 11-24-2015
Posts: 56
Kudos Received: 58
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3106 | 05-21-2016 02:32 PM |
| | 5247 | 04-26-2016 05:22 AM |
| | 12834 | 01-15-2016 06:23 PM |
| | 13674 | 12-24-2015 04:52 PM |
04-27-2016
03:26 AM
Actually, I tried restarting them before the reboot. Restarted everything. Still had the errors. Then did the reboot and they cleared. Oh well, it is what it is!
04-26-2016
05:22 AM
Well, I would still love to understand how this worked, but a reboot of the 4 DNs made this error go away. Never would have thunk it! That's pretty strange... Anyway, maybe this will help some other poor soul who hits this same condition. 🙂
04-26-2016
03:53 AM
So, I was installing a new cluster for our development QA testing and failed to adjust the default configuration that Ambari gave me for dfs.datanode.data.dir. It ended up putting in every partition it found. I really only wanted one - /grid/1, which is a dedicated disk partition for HDFS block storage. I discovered this blunder after the installation completed. I did not want to just redo the whole install, opting instead to try to fix this manually as a good deep-dive learning experience. I got it all worked out, and it was a good learning exercise, but I have one lingering issue that I cannot solve.

I am getting 4 Ambari errors (one for each DN) that state: "Detected data dir(s) that became unmounted and are now writing to the root partition: /grid/1/hadoop/hdfs/data". I figured out that the Ambari agents monitor (and remember) which HDFS directories were previously mounted, check whether a mounted disk has gone away for some reason, and display an error if so. That's all fine - I get that. However, my setup seems to be correct, yet Ambari is still complaining.

As part of correcting the configuration issue I laid upon myself, I edited the following file (on each DN) to remove all the previously cached mount points that I did not want and left just the one I did want. I ended up stopping HDFS, removing all the /opt/hadoop/hdfs, /tmp/hadoop/hdfs, etc. directories, removing the NameNode metadata directories, reformatting the NameNode, and starting up HDFS. The file system is up and working. But can anyone tell me why I cannot get rid of this Ambari error?

Here's the contents of one of the dfs_data_dir_mount.hist files; all 4 are exactly the same. Below that is the mount where I have a disk for HDFS data storage. It all looks good, so I must be missing something obvious. I did restart everything - nothing clears this error. Thanks in advance...

[root@vmwqsrqadn01 ~]# cat /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist
# This file keeps track of the last known mount-point for each DFS data dir.
# It is safe to delete, since it will get regenerated the next time that the DataNode starts.
# However, it is not advised to delete this file since Ambari may
# re-create a DFS data dir that used to be mounted on a drive but is now mounted on the root.
# Comments begin with a hash (#) symbol
# data_dir,mount_point
/grid/1/hadoop/hdfs/data,/grid/1
[root@vmwqsrqadn01 ~]# mount -l | grep grid
/dev/sdb1 on /grid/1 type ext4 (rw,noatime,nodiratime,seclabel,data=ordered)
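For anyone else comparing the agent's cached history against the live mounts across all the DNs, here's a quick sketch of the same comparison in one place (the data dir path is the one from my install; adjust for yours):

# Compare the Ambari agent's cached mount point with the actual mount for the data dir.
HIST=/var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist
grep -v '^#' "$HIST"                                       # cached entry: data_dir,mount_point
df -P /grid/1/hadoop/hdfs/data | awk 'NR==2 {print $6}'    # actual mount point of the data dir

If the two mount points match on every DN and the alert still shows, the cached state on the agent side is probably the next thing to look at.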
Labels:
- Apache Ambari
- Apache Hadoop
04-18-2016
08:35 PM
@Ancil McBarnett, Could you please explain why you propose /dev as a separate 32G partition?
03-19-2016
07:52 PM
4 Kudos
@Ryan Chapin Looking for some suggestions. We have a query that was hanging indefinitely in Hive on Tez. We are running HDP 2.4.0. After some debugging, we narrowed it down to a single ORC file in a Hive partition that contained that file plus about 10 others. If we move this one file out of the partition and test the query, it now completes. If we include ONLY that one file in the partition, the query hangs. Even a simple "select * from table" hangs. The query never gets beyond the first map task.

I then discovered the ORC file dump feature of Hive and ran the following on this file:

hive --orcfiledump --skip-dump --recover -d hdfs://vmwhaddev01:8020/tmp/000002_0 > orc.dump

This command never returns to the command line and hangs, similar to what Hive does. I tested this on a known good file and the dump completes successfully and returns control to the command line as expected. So even this dump test is hanging. If I tail orc.dump during the hang, the last line of the file looks complete. So I am wondering if this line is the end of a stripe and the next stripe is corrupt? The ORC reader seems to get into some infinite loop at this point. Once the dumped output file stops growing, at about 382 MB, the top command shows the Hive dump process continuously using about 99.9% CPU until I CTRL+C the command. Here's the last line, which is complete:

{"_col0":"DSN001000325021","_col1":10784199,"_col2":1457431200,"_col3":1457434800,"_col4":20209,"_col5":0,"_col6":60,"_col7":10,"_col8":1456222331,"_col9":120,"_col10":117,"_col11":114,"_col12":0,"_col13":0,"_col14":0,"_col15":0,"_col16":0,"_col17":0,"_col18":120,"_col19":121,"_col20":123,"_col21":124,"_col22":125,"_col23":0,"_col24":0,"_col25":15296815,"_col26":2,"_col27":0,"_col28":163528,"_col29":88,"_col30":1498,"_col31":29082,"_col32":874908,"_col33":51565,"_col34":104138,"_col35":149,"_col36":6,"_col37":0,"_col38":0,"_col39":1508,"_col40":0,"_col41":2248,"_col42":46961,"_col43":624,"_col44":1732,"_col45":0,"_col46":0,"_col47":0,"_col48":0,"_col49":41,"_col50":159,"_col51":12,"_col52":30,"_col53":0,"_col54":0,"_col55":0,"_col56":0,"_col57":0,"_col58":0,"_col59":0,"_col60":0,"_col61":1398668,"_col62":5916915,"_col63":5916855,"_col64":5986115,"_col65":249,"_col66":66,"_col67":547,"_col68":76,"_col69":132618,"_col70":17398,"_col71":140325,"_col72":19012,"_col73":0,"_col74":0,"_col75":0,"_col76":0,"_col77":"TUC04HNSIGW63B002Adv","_col78":1456805959,"_col79":0,"_col80":158,"_col81":136,"_col82":0,"_col83":0,"_col84":1,"_col85":0,"_col86":12,"_col87":12,"_col88":0,"_col89":0,"_col90":0,"_col91":0,"_col92":12,"_col93":12,"_col94":0,"_col95":0,"_col96":0,"_col97":0,"_col98":18,"_col99":14,"_col100":0,"_col101":0,"_col102":0,"_col103":0,"_col104":12,"_col105":12,"_col106":0,"_col107":0,"_col108":0,"_col109":0,"_col110":51565,"_col111":37383,"_col112":0,"_col113":402,"_col114":0,"_col115":449,"_col116":3126,"_col117":46256,"_col118":28682,"_col119":4,"_col120":0,"_col121":0,"_col122":0,"_col123":0,"_col124":0,"_col125":0,"_col126":0,"_col127":0,"_col128":0,"_col129":0,"_col130":0,"_col131":0,"_col132":0,"_col133":0,"_col134":0,"_col135":0,"_col136":0,"_col137":0,"_col138":0,"_col139":0,"_col140":0,"_col141":0,"_col142":0,"_col143":"20.2.1-3569","_col144":0,"_col145":0,"_col146":0,"_col147":0,"_col148":0,"_col149":0,"_col150":0,"_col151":0,"_col152":0,"_col153":0,"_col154":0,"_col155":0,"_col156":0,"_col157":0,"_col158":0,"_col159":0,"_col160":0,"_col161":0,"_col162":0,"_col163":0,"_col164":0,"_col165":0,"_col166":0,"_col167":0,"_col168":0,"_col169":0,"_col170":0,"_col171":0,"_col172":0,"_col173":4,"_col174":"12-AUG-15","_col175":12,"_col176":"HT1100","_col177":"1","_col178":"B4WB16S2","_col179":"CORE_DSN_PROD_HT1100_50K","_col180":"SW_DSN_PROD_HT1100_50K","_col181":"3.2.0.24","_col182":"1000"}

I am trying to determine if I have uncovered a bug or if the data I am inserting into the ORC file somehow resulted in this condition. Either way, it seems like a bug if you can insert data that causes an ORC file to become corrupted.

The ingest pipeline for this data is as follows. I convert raw CSV files into Avro and land them in an HDFS directory. There could be multiple Avro schemas in play here, as there are multiple versions of these CSV files in flight. The Avro schemas are designed such that I can include all versions in the same Hive table. Typically, newer versions of these stats files add more columns of stats. Once a day, I move all the Avro files that have accumulated to a temp directory and create an external table over the files in that directory. I then run a query that selects * from the external table and inserts all the results into another Hive managed table that is in ORC format, effectively using Hive to perform the Avro-to-ORC conversion. This query also performs a join with some data from one other table to enrich the data landing in the ORC table. This table is partitioned by year/month/day. Because the resulting ORC files are relatively small for HDFS, I perform one final step after the ORC insert query completes: I run a Hive query against the newly created partition to effectively compact the ORC files. Typically, the reduce part generates around 70 ORC files. I run a query like the following for the appropriate year, month, and day of the partition just created, which typically compacts all 70 ORC files into about 5 much larger ones that are each about 2-3 HDFS blocks (128 MB) in size:

alter table table_name partition (year=2016, month=3, day=16) concatenate;

This is the first such issue we've seen in over two months of ingesting files in this manner. Does anyone have any ideas of where to look further to possibly understand the root cause of this problem? Maybe the concatenate operation happened to cause the file corruption in this case? Anyone heard of such a thing? Should I file a bug report and provide this corrupt ORC file for some forensic analysis? I don't really want to start trying to hex dump and decode ORC to figure out what happened.
Labels:
- Apache Hive
- Apache Tez
03-07-2016
11:11 PM
1 Kudo
Thank you @vpoornalingam. That helps demystify this step. I tried the same command as @Rich Raposa and it did not list any of my FSRoots. Then I realized the --config path in the example is wrong for my install of Ambari 2.2 and HDP 2.3.0. It should be "/etc/hive/conf/conf.server". Using that, I do get the expected listing:

[hive@vmwhaddev01 ~]$ hive --config /etc/hive/conf/conf.server --service metatool -listFSRoot
Initializing HiveMetaTool..
<info traces>
Listing FS Roots..
hdfs://vmwhaddev01:8020/apps/hive/warehouse
hdfs://vmwhaddev01:8020/apps/hive/warehouse/jupstats.db
hdfs://vmwhaddev01:8020/apps/hive/warehouse/aggregates.db
hdfs://vmwhaddev01:8020/apps/hive/warehouse/imports.db
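For anyone following the same procedure, the next step after listing the roots would presumably be repointing them. Here's a hedged sketch using the same metatool; the HA nameservice name "mycluster" is purely illustrative, and it's worth a dry run first:

# Preview, then apply, rewriting the metastore FS roots to a new URI.
hive --config /etc/hive/conf/conf.server --service metatool \
  -updateLocation hdfs://mycluster hdfs://vmwhaddev01:8020 -dryRun
# Drop -dryRun to actually update the metastore locations.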
03-05-2016
06:16 PM
1 Kudo
@Ancil McBarnett Hey, man, thanks for this write-up. Very helpful in gaining insight into the big picture. I am working on the requirements for my prod cluster. Question for you... Based on my somewhat novice knowledge, it seems like overkill to use RAID-10 and four 1 TB drives for the HA NNs when you are running QJM (which I assume is the case, as the diagram also shows ZKs and JNs). All the edits go to the JN JBOD disks, so that just leaves a couple of fsimage files for the RAID-10 arrangement against 1 TB - isn't that a bit of overkill on storage? Another confusion point for me: the NN disk layout diagram above shows fsimage and edits going to the RAID-10 disks. Edits are written to the JNs against the one JBOD disk, right? So, is the diagram misleading or am I missing some intended message there? Here's a question I just asked related to this comment.
03-05-2016
06:14 PM
3 Kudos
I am trying to determine and plan the best disk layout for my active/standby NNs for a new production rollout that is going to run with QJM NN HA. I plan to have three servers, each running an instance of ZK and JN. The plan, per recommendations against another question I asked on this forum, is to have a dedicated RAID-1 disk on each server for the JNs to use for edit logs. I expect that array to use 256-512 GB disks. Each server will also have a dedicated disk for the OS, logs, tmp, etc., also RAID-1, using two 1.2 TB drives. Each ZK instance will also have dedicated disks (spindles), per recommendations here.

I am having a hard time answering this question: where to store the fsimage files? Could I, for example, store them on the same RAID-1 disk that the JNs are using? I do plan to colocate two of the three JNs on the same two servers running the NNs and the third JN on a third server, so the NN would have access to the same drives used by the JNs. This colocation seems to be a commonly recommended arrangement. Or should the fsimage files be pointed to a separate RAID-1 disk array just for that purpose? Another option would be to point the fsimage files to a separately sized partition on the OS RAID-1 disk array.

These questions do NOT come from a sizing perspective, but more from a workload perspective. Fitting the files somewhere is easy to figure out. The real question is about the performance impact of mixing, for example, the background checkpointing operations done by the standby NameNode with the work being done by the JNs to save edits, and putting all of that onto the same spindle. I see clearly that ZK should be kept on its own spindle because it uses a write-ahead log and its performance is very sensitive to latency. I just don't have a good feel for mixing the two workloads of checkpointing and edit log updates. Can someone please make a recommendation here?
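For reference, these are the two directories I'm deciding where to place; a quick sketch to print where they currently point on a node (standard HDFS keys, nothing cluster-specific):

# Show where fsimage and JournalNode edits are currently configured to live. In QJM HA
# the standby's checkpointed fsimage also lands under dfs.namenode.name.dir.
hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.journalnode.edits.dir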
Labels:
- Apache Hadoop
03-05-2016
05:16 PM
2 Kudos
I am currently planning ahead for deploying a production cluster that will run using QJM NN HA. Reading this, all looks reasonable except the last step (15), which seems to instruct you to change configuration directly on the hosts. Won't that be overwritten by Ambari in subsequent pushes of other config updates? Shouldn't these changes be made through Ambari instead?
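For what it's worth, here's a hedged sketch of how I'd expect such a change to be made so that Ambari owns it (this assumes the configs.sh helper that ships with Ambari 2.x at its usual path; the host, cluster name, property, and value below are placeholders, not the actual step-15 settings):

# Change a property through the Ambari API helper instead of editing files on the hosts,
# so later config pushes from Ambari don't overwrite it. All values below are placeholders.
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
  set ambari.example.com MyCluster hdfs-site "some.property" "some.value"
# Then restart the affected services from Ambari so the new config gets pushed out.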
Labels:
- Apache Ambari
- Apache Hadoop
02-13-2016
06:53 PM
1 Kudo
Thanks for the clarifications. All makes sense now.