**Very Technical Discussion** - Disappearing DNS Records

Discussion in 'The Pub' at netrider.net.au started by jirf88, Jan 4, 2010.

  1. Folks;

    It seems to me that a certain number of you around here are in the same field of work as I am. And seeing as I've been scratching my head over this for a fair while now, I figure it's time to start canvassing other people's experiences. Never hurt anyone, and you never know, we might all just learn something :)

    Over the past few months I've been losing DNS records from my forward lookup zone. At this stage it seems to be restricted to host (A) records only. Touch wood. I've noticed it about three times a month. So far nothing major has disappeared, just printers, a few switches and PCs, so this could be happening more frequently than I know about. I have not noticed any repeat offenders, nor have I noticed a correlation between patch application and issues.

    The network is fairly large, 8 nameservers in all, all of which are DCs. As of now, there are 1545 records :)o) These machines are running a funbag mix of 2k3 and 2k (Windows Server 2003 and 2000).

    Scavenging is off, and dynamic updates are set to secure only.



    Go forth, and scour your wealth of knowledge for the answer!

    Cheers guys!
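
Since the opening post notes the losses may be happening more often than anyone notices, one way to measure that is to compare periodic exports of the zone. The following is only a minimal sketch: it assumes each snapshot has already been reduced to a hostname-to-address mapping (how the zone is exported is not covered here), and the hostnames and addresses are made up for illustration.

```python
def missing_a_records(before, after):
    """Hostnames that had an A record in `before` but have none in `after`."""
    return sorted(set(before) - set(after))


def changed_a_records(before, after):
    """Hostnames present in both snapshots whose address changed."""
    common = before.keys() & after.keys()
    return {h: (before[h], after[h]) for h in common if before[h] != after[h]}


# Hypothetical snapshots (hostname -> IPv4 address), e.g. taken a day apart.
monday = {"printer-01": "10.0.0.21", "switch-03": "10.0.0.3", "pc-114": "10.0.1.14"}
tuesday = {"switch-03": "10.0.0.3", "pc-114": "10.0.1.77"}

print(missing_a_records(monday, tuesday))  # ['printer-01']
print(changed_a_records(monday, tuesday))  # {'pc-114': ('10.0.1.14', '10.0.1.77')}
```

Run daily, a diff like this would give an actual loss rate and a time window for each disappearance, which can then be lined up against event logs and DHCP lease times.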
     
  2. I assume you've probably checked this already, but I'll ask anyway.
    Have you found anything odd in the Event Viewer (eventvwr)? I have heard that the records can be lost after restarts. Are any of the servers having reboot issues?

    I take it the lookup zones should be replicating - is this happening properly?

    http://support.microsoft.com/kb/887597
     
  3. No and yes. The logs contain a few errors relating to not being able to contact the directory services, but I know what caused that. Adding a record will see it replicate properly. At this point I'm thinking that the issue is stemming from one of the name servers, but which one and why still need to be answered.

    Asking simple questions is often the best place to start. Occam's razor is true more often than you'd think.
     
  4. Yeah, very true.
    I am thinking it is just one of the servers dropping the record.
    Are you able to turn off the service on a server to see if that fixes the issue?
    I.e. turn it off on one server for, say, 3 days. If it still occurs, turn it back on and try the next one.
    If you could, I would start with the 2k servers.
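
A quick back-of-the-envelope check on the 3-day test window, using the rate of roughly three losses a month from the first post. This sketch assumes the losses arrive independently at a steady rate (a Poisson process), which is an assumption and not something the thread establishes; the function names are illustrative only.

```python
import math


def prob_no_loss(rate_per_month, days, days_per_month=30.0):
    """Chance of seeing zero losses in `days` even if nothing was fixed."""
    return math.exp(-rate_per_month * days / days_per_month)


def days_for_confidence(rate_per_month, confidence=0.95, days_per_month=30.0):
    """Quiet days needed before a loss-free run is unlikely to be luck."""
    return -math.log(1 - confidence) * days_per_month / rate_per_month


print(round(prob_no_loss(3, 3), 2))    # 0.74
print(round(days_for_confidence(3)))   # 30
```

Under that assumption, three quiet days would happen by chance about three times in four, so each server would need to stay off for closer to a month before a quiet spell means much.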
     
  5. Hi Jirf88, have you considered conflicting/overlapping IP addresses with your DHCP server? Or a rogue DHCP server on your network causing DNS updates when it allocates an IP?

    You have probably done this already but check that your Primary DNS servers do not allow updates except from known trusted servers by either Key or IP address restrictions.

    That's all I can think of for now. :-k
     
  6. More obviously, what are your scavenging settings?

    Typical settings, if enabled, are 7 days, but any sysadmin can change these. Scavenging stale records is a good way to remove dynamically assigned records where DHCP is not set to remove their DNS records. Stale records are normally defined as records whose timestamp has not been updated or confirmed within the configured intervals. There are two values, the no-refresh interval and the refresh interval, and it's important that they work together. Again, this is normally restricted to dynamically assigned records.

    Aside from all of that, there is no other automated method (AFAIK) by which DNS records should be removed from a DNS zone, whether it's AD integrated or a standard zone.
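
The staleness rule described above can be sketched roughly as follows. This is an illustration, not the DNS server's actual code: it assumes a record becomes eligible for scavenging once its timestamp plus the no-refresh interval plus the refresh interval has passed, that static records carry no timestamp (modelled here as `None`), and that both intervals sit at the typical 7 days. The function names are made up.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)


def scavenge_eligible_at(timestamp, no_refresh=WEEK, refresh=WEEK):
    """Earliest moment a record may be scavenged; None for static records."""
    if timestamp is None:  # static record: no timestamp, never aged out
        return None
    return timestamp + no_refresh + refresh


def is_stale(timestamp, now, no_refresh=WEEK, refresh=WEEK):
    """True once `now` is past the end of the refresh interval."""
    eligible = scavenge_eligible_at(timestamp, no_refresh, refresh)
    return eligible is not None and now > eligible


stamped = datetime(2010, 1, 1)
print(scavenge_eligible_at(stamped))           # 2010-01-15 00:00:00
print(is_stale(stamped, datetime(2010, 1, 10)))  # False
print(is_stale(stamped, datetime(2010, 1, 16)))  # True
print(is_stale(None, datetime(2010, 1, 16)))     # False
```

Even a stale record is only removed if scavenging is actually switched on, which is why the on/off question matters more than the intervals themselves.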
     
  7. Reading this thread gave me a headache.
     
  8. Cj: scavenging is off ;) It sounded good in theory, but when we had it on a while back it made the DNS binge and purge records more often than the Olsen twins.

    Speed demon: I warned yo'ass :p
     
  9. Ah, ok, well if scavenging is off then there's no automated way for the records to be removed. Just to check: scavenging can be set at both the server and the zone. Have you checked both? (Yeah, I know, grandma and sucking eggs, but you've got to ask!)

    That brings us back to security, rights and privileges, or a misconfiguration somewhere.

    Are the affected records always different, or are they the same devices over and over again?

    Assuming all DNS servers are AD integrated, it's hard to audit all 8 servers for object access and change. Is membership of the required groups (DNSAdmins) tightly controlled? Any chance of a help desk person making changes accidentally?

    Are the records all manually added or are they dynamic? (Printers and switches could be either, I suppose, depending on how you assign addresses.)

    Remote chance... is there an old DC that someone might be turning on and off (perhaps as part of an authoritative restore in a lab)?

    After that, all I can think of is to review/remove membership of groups and the security of the actual zones. Good luck, problems like these are fun!
     
  10. Thanks guys, these are all excellent suggestions. Like I said, it's been my experience that difficult problems can often be solved simply by getting someone else to state the obvious. So keep the ideas coming.

    As for you cj:

    -All of them are AD integrated.
    -Records are assigned dynamically, unless we have a need to do it otherwise.
    -The records that are breaking are not the same.

    So, ya reckon it's a security issue? Deleting miscellaneous hostnames is a bit of a tedious way to cause havoc, don't you think? I'll check it out anyway, you never know.

    IT is fun!
     
  11. Were the DHCP leases for the devices that dropped out up for renewal at the time this happened?
     
  12. Errm, perhaps. It's possible for the instance that happened yesterday - that was a PC that gets its address from the DHCP scope. The switches and printer I mentioned, though, are assigned their addresses by reservations.

    Actually, if I had to place a bet, I would say that this happens more frequently with devices that have a reserved address.
     
  13. ..nerds!!.....

    :bolt:
     
  14. O.K., when you stated that scavenging was turned off I was left scratching my head, but the fact that reserved addresses are showing a greater number of issues really breaks all logic (especially with no scavenging).
    Are you absolutely sure there is no scavenging on any server, and that no one has a DNS server on the domain that you aren't in control of?
    I suppose this leads to asking: is this a computer industry site (i.e. you may have devs with just enough knowledge to be dangerous), or is it supporting a different industry type?
     
  15. I don't think it is a security issue, but if you are adamant that no servers or zones have scavenging enabled, the alternatives are limited.

    I have had some horrors with scavenging when the intervals didn't line up with the DHCP lease times. If you notice the issue more with what are effectively static devices, then that makes more sense to me. A switch or printer rarely checks in before its lease time is up, whereas a PC that is switched on and off routinely checks in before the 1/2 interval time. If the lease times are long and the scavenging intervals are short, it is possible for the entry to be deleted with no user interaction. But you say it isn't set anywhere. Mmmmm
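
The interaction between lease times and scavenging intervals can be sketched with a simplified model. The assumptions here are loud ones: the device's record is refreshed only at lease renewal (taken as half the lease, per the "1/2 interval" above), refreshes arriving during the no-refresh interval are ignored, and the record is removed if no refresh lands before the refresh interval ends. Real devices may also re-register on reboot, and the names below are invented for illustration.

```python
import math
from datetime import timedelta


def first_accepted_refresh(lease, no_refresh):
    """Time of the first renewal that falls at or after the no-refresh interval."""
    renew_every = lease / 2  # assumed renewal at half the lease time
    renewals_needed = math.ceil(no_refresh / renew_every)
    return renewals_needed * renew_every


def at_risk(lease, no_refresh=timedelta(days=7), refresh=timedelta(days=7)):
    """True if the record would go stale before a refresh can be accepted."""
    return first_accepted_refresh(lease, no_refresh) > no_refresh + refresh


print(at_risk(timedelta(days=8)))    # False: renewal on day 8 lands inside days 7-14
print(at_risk(timedelta(days=60)))   # True: first renewal on day 30, stale after day 14
print(at_risk(timedelta(days=8), timedelta(days=1), timedelta(days=1)))  # True
```

In this model a record is always safe when the renewal period is no longer than the refresh interval, which matches the point that long leases combined with short scavenging intervals are the dangerous pairing.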
     
  16. It's been a few years, but cejay's comment about lease times rings a bell. I remember coming across a similar problem in my IT management days. I got in touch with my then network admin to see if he remembered it. He's not sure, but he mentioned the issue of static equipment and scavenging as a possibility.

    He's going to look back through the records (he's pretty anal about record keeping) and see what he can find.
     
  17. Oh yes, I forgot to ask.

    What does DNSDiag say?
     
  18. Our setup has this problem too. I'd be interested to see what fixes it, as the Windows admins at work seem to be stumped (admittedly that seems to be their default position).
     
  19. Stigger, is it the same symptoms?
    Is this issue new?
    Could it have coincided with a M$ patch? (I have been out of the M$ admin space for the last 3 months, managing some WebSphere monkeys.)
    In short, is there a setting change that has been rolled out without you being aware?
     
  20. Sounds very, very similar. It's been noticeable for about 3 months or more. No idea what the settings are: the Windows admins manage DNS (well, internal anyway), and we just manage BIND on the external DNS. So it only becomes a problem for me when both environments need to talk to each other...
     