Sep 22, 2008

Blackhole routers and how they affect your AD environment

Recently I've been working on a Kerberos authentication issue with servers that connect to DCs over a VPN. The servers can join the domain fine, users can log on to the domain from these servers, and browsing domain resources seems to work without problems. However, one particular application always fails to get the AS ticket from the DC.

Ideally, to troubleshoot such an issue, I'd start with a DS version of MPSreport. Unfortunately, these servers are in a very secure environment. Getting the report off the servers, or installing MPSreport on them, is almost impossible unless I am willing to go through a lot of procedures, approvals, and sign-offs.

Based on past experience, and after a few other basic tests on the DCs, I ran the following ping tests from the DCs, which I do have access to.

C:\>ping -f -l 1473 clientIP

Pinging clientIP with 1473 bytes of data:

Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

Ping statistics for clientIP:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),


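The result above is expected: 1473 bytes of data plus 28 bytes of IP and ICMP headers is 1501 bytes, which does not fit a 1500-byte MTU. The result below is wrong. A 1472-byte ping makes exactly 1500 bytes and should get through, so the timeouts mean the packet was silently dropped somewhere along the path.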

C:\>ping -f -l 1472 clientIP

Pinging clientIP with 1472 bytes of data:

Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for clientIP:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

Smaller packets are getting through:

C:\Documents and Settings\191742717admin>ping -f -l 1361 clientIP

Pinging clientIP with 1361 bytes of data:

Reply from clientIP: bytes=1361 time=27ms TTL=122
Reply from clientIP: bytes=1361 time=27ms TTL=122
Reply from clientIP: bytes=1361 time=25ms TTL=122
Reply from clientIP: bytes=1361 time=26ms TTL=122

Ping statistics for clientIP:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 25ms, Maximum = 27ms, Average = 26ms
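
To make the sizes in these tests easier to read, here is a small Python sketch of the arithmetic. It assumes IPv4 with no IP options (20-byte header), the 8-byte ICMP echo header, and a 1500-byte local MTU on the DC (standard Ethernet, which matches the DF error at 1473). The function names are mine, for illustration only.

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
ICMP_HEADER = 8    # bytes, ICMP echo request header
LOCAL_MTU = 1500   # assumption: standard Ethernet MTU on the DC's interface

def ip_packet_size(ping_payload):
    """Size of the IP packet produced by `ping -l <ping_payload>`."""
    return ping_payload + IPV4_HEADER + ICMP_HEADER

def fits_local_mtu(ping_payload, mtu=LOCAL_MTU):
    """True if the packet can leave the local interface without fragmentation."""
    return ip_packet_size(ping_payload) <= mtu

# The three payload sizes used in the ping tests above.
for payload in (1473, 1472, 1361):
    print(payload, ip_packet_size(payload), fits_local_mtu(payload))
```

So 1473 becomes 1501 bytes and is refused locally, 1472 becomes exactly 1500 bytes and leaves the DC but never comes back, and 1361 becomes 1389 bytes and makes the round trip. A packet that fits the local MTU yet disappears without any ICMP error coming back is what points at a black hole on the path.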

In addition, portqry from the DC to the client returned the following:

C:\>portqry -n clientIP -e 135

Querying target system called:
clientIP
Attempting to resolve IP address to a name...
IP address resolved to client.domain.com
TCP port 135 (epmap service): FILTERED

Ideally, the result above should be LISTENING. FILTERED may be the result of blackhole router(s).

This is a typical issue caused by blackhole routers. A number of MS KB articles are dedicated to it. We most often see it with servers that communicate across a WAN.

The solution is to upgrade or replace the firmware or hardware so the routers respond correctly to oversized packets. But this is not always possible when the traffic crosses the Internet. One workaround is to cap the MTU on both DCs and clients so they only send packets no larger than the smallest MTU among the blackholing routers on the path, so that nothing needs to be fragmented.
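
To pick a value for the MTU cap, one approach is to binary-search for the largest ping payload that still gets a reply with DF set. The Python sketch below is only an illustration: `largest_passing_payload` and `windows_ping_probe` are hypothetical helper names, the "TTL=" check is a heuristic for spotting a reply in Windows ping output, and the search assumes that if one size passes, every smaller size passes too. The 1361-byte threshold in the demo is simulated, not a measurement of the real path limit.

```python
import subprocess

def windows_ping_probe(host):
    """Return a probe(size) function that sends one DF ping on Windows.

    Heuristic: a real echo reply contains "TTL=" in the ping output.
    """
    def probe(size):
        result = subprocess.run(
            ["ping", "-n", "1", "-f", "-l", str(size), host],
            capture_output=True, text=True,
        )
        return "TTL=" in result.stdout
    return probe

def largest_passing_payload(probe, lo=0, hi=1473):
    """Binary search for the largest payload size for which probe() is True.

    Assumes probe(lo) is True, probe(hi) is False, and results are monotonic.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid   # mid got through, look higher
        else:
            hi = mid   # mid was dropped, look lower
    return lo

# Demo with a simulated path that passes payloads up to 1361 bytes.
simulated_probe = lambda size: size <= 1361
best = largest_passing_payload(simulated_probe)
print(best, best + 28)   # payload, and payload plus 28 bytes of IP + ICMP headers
```

Against a real client you would pass `windows_ping_probe("clientIP")` instead of the simulated probe. The payload it finds plus 28 bytes of headers is an upper bound for the MTU cap. It may be worth repeating the search a few times, since a single lost ping would skew the result.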

BTW, our network department took a couple of network traces but was unable to reveal the packet loss (or why the packets got lost). What a surprise.

2010/05/20: Edit: Even when there is no black hole router, Kerberos can still fail, because the largest UDP packet can be larger than the smallest MTU along the path. Fragmented UDP packets will be dropped at the destination if they do not arrive in the right order. Across a WAN, besides avoiding black hole routers, it's also important to force Kerberos to use TCP only. See KB 244474.
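
As a rough illustration of why large Kerberos messages over UDP are fragile, here is a Python sketch. It counts how many IPv4 packets one UDP datagram turns into on a given path MTU, and models the UDP-versus-TCP choice as a simple size threshold. This is a simplified model of the MaxPacketSize behavior described in KB 244474, not the actual Windows implementation. The 1400-byte message, the 1389-byte path MTU, and the threshold of 2000 are made-up example values.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes

def ip_fragments(udp_data_len, path_mtu):
    """IPv4 packets needed for one UDP datagram when the sender cuts the
    payload into the largest 8-byte-aligned chunks that fit the MTU."""
    ip_payload = udp_data_len + UDP_HEADER
    room = path_mtu - IPV4_HEADER
    if ip_payload <= room:
        return 1  # fits in a single packet, no fragmentation
    per_fragment = room // 8 * 8  # fragment offsets are in units of 8 bytes
    return math.ceil(ip_payload / per_fragment)

def kerberos_transport(message_len, max_packet_size):
    """Simplified model: UDP up to the threshold, TCP above it."""
    return "UDP" if message_len <= max_packet_size else "TCP"

print(ip_fragments(1400, 1500))        # single packet on a plain Ethernet path
print(ip_fragments(1400, 1389))        # split into fragments on a smaller path MTU
print(kerberos_transport(1400, 2000))  # example threshold: stays on UDP
print(kerberos_transport(1400, 1))     # threshold of 1: effectively always TCP
```

The change KB 244474 describes is setting the MaxPacketSize value (REG_DWORD) to 1 under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Lsa\Kerberos\Parameters. That is the case modeled by the last line, where every real message exceeds the threshold and goes over TCP.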