Jan 17, 2006

A time-consuming case with a Kerberos error

Situation:
==========
A domain with two DCs.
The PDC failed due to hard disk corruption, and Exchange stopped. After that the customer tried to seize all FSMO roles but failed. (Side note: this was because there was no DNS server on the BDC.)
He then restored the BDC from a cloned image and set the clock back to the date when the image was made.

The Exchange services still failed to start, so he built a brand-new forest with Exchange, and that ran properly.

Goal
====
Now he wanted to get the old mail back.

Troubleshooting
===============
1. Created DNS and verified that all records were registered correctly
2. Exchange services still couldn't start
3. Netdiag reported:
DC list test . . . . . . . . . . . : Failed
[WARNING] Cannot call DsBind to dc.xxx.com (192.168.231.104). [SEC_E_WRONG_PRINCIPAL]

4. The System log reported:

Event ID : 4
Raw Event ID : 4
Category : None
Source : Kerberos
Type : Error
Generated : 2005-12-6 0:43:43
Written : 2005-12-6 0:43:43
Machine : MAIL
Message : The kerberos client received a KRB_AP_ERR_MODIFIED error from the
server host/dc.xxx.com. The target name used was ldap/dc.xxx.com/xxx.com@xxx.com. This indicates that the password used to encrypt the kerberos service ticket
is different than that on the target server. Commonly, this is due to identically named machine accounts in the target realm (xxx.COM), and the client realm.
Please contact your system administrator.

5. Checked AD with an ldifde dump: there were no duplicate machine accounts or service principal names.
6. There were no duplicate A records in either the forward lookup zone or the reverse lookup zone
7. All reports on the DC were perfectly OK
8. Resetting the secure channel on the Exchange server succeeded, but the problem persisted
9. Found that we could not open ADUC on the Exchange server, nor could we connect remotely to the DC with Event Viewer
10. Captured a network trace. The error was "krb5krb_ap_err_modified", which still pointed to duplicate machine accounts or SPNs. But we did not have any duplicates
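
The duplicate check in step 5 can be automated rather than done by eye. The following is a minimal sketch, assuming the ldifde dump is plain LDIF text with unfolded `dn:` and `servicePrincipalName:` lines. The function name `find_duplicate_spns` and the sample entries are made up for illustration; they are not taken from the customer's directory, where no duplicates were found.

```python
from collections import defaultdict


def find_duplicate_spns(ldif_text):
    """Return {spn: sorted list of DNs} for SPNs registered on more than one object."""
    owners = defaultdict(set)
    current_dn = None
    for line in ldif_text.splitlines():
        if not line.strip():
            current_dn = None  # a blank line ends the current LDIF entry
            continue
        if line.startswith((" ", "#")):
            continue  # folded continuation lines and comments are ignored in this sketch
        name, sep, value = line.partition(": ")
        if not sep:
            continue
        name = name.lower()  # base64 lines ("attr:: ...") leave a trailing ":" and never match
        if name == "dn":
            current_dn = value.strip()
        elif name == "serviceprincipalname" and current_dn:
            # SPNs are compared case-insensitively
            owners[value.strip().lower()].add(current_dn)
    return {spn: sorted(dns) for spn, dns in owners.items() if len(dns) > 1}


# Hypothetical dump fragment with one SPN registered on two objects.
SAMPLE = """\
dn: CN=DC,OU=Domain Controllers,DC=xxx,DC=com
servicePrincipalName: ldap/dc.xxx.com
servicePrincipalName: HOST/dc.xxx.com

dn: CN=OLDDC,CN=Computers,DC=xxx,DC=com
servicePrincipalName: HOST/dc.xxx.com
"""

if __name__ == "__main__":
    for spn, dns in find_duplicate_spns(SAMPLE).items():
        print(spn, "->", dns)
```

An empty result from a check like this only rules out duplicate SPNs. It says nothing about whether the password stored for an account still matches the one the machine itself holds, which is what turned out to matter in this case.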

*** What else could be wrong that none of the above information reports correctly? ***

11. Disjoined the Exchange server from the domain (in order to get a fresh computer account in the domain)
12. Got the error "target principal name incorrect" when we rejoined it
13. Took another network trace. In it we found:
KERBEROS: Error code (error-code[6]) = Pre-authentication information was invalid
14. We tried everything we could think of, such as stopping the antivirus and checking Stored User Names and Passwords. No luck.
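
To relate the two trace errors to each other, it can help to map them back to their protocol error codes. The sketch below is a small lookup table with a handful of codes and descriptions as defined in RFC 4120; the helper names `code_for_text` and `suggests_key_mismatch` are made up for illustration, and the "key mismatch" grouping is my own reading, not something the trace tool reports. Note that the `[6]` in the trace line in step 13 appears to be the ASN.1 tag of the error-code field in the KRB-ERROR message, not the numeric code itself.

```python
# A few Kerberos error codes from RFC 4120 (not the full list).
KRB_ERRORS = {
    7: ("KDC_ERR_S_PRINCIPAL_UNKNOWN", "Server not found in Kerberos database"),
    24: ("KDC_ERR_PREAUTH_FAILED", "Pre-authentication information was invalid"),
    25: ("KDC_ERR_PREAUTH_REQUIRED", "Additional pre-authentication required"),
    37: ("KRB_AP_ERR_SKEW", "Clock skew too great"),
    41: ("KRB_AP_ERR_MODIFIED", "Message stream modified"),
}

# Assumption for this sketch: both of these typically mean that one side
# encrypted with a key (password) the other side does not hold.
KEY_MISMATCH_CODES = {24, 41}


def code_for_text(text):
    """Look up (code, symbolic name) from the description a trace tool prints."""
    wanted = text.strip().lower()
    for code, (name, description) in KRB_ERRORS.items():
        if description.lower() == wanted:
            return code, name
    return None


def suggests_key_mismatch(code):
    """True if the code is one this sketch treats as a password/key mismatch."""
    return code in KEY_MISMATCH_CODES


if __name__ == "__main__":
    found = code_for_text("Pre-authentication information was invalid")
    print(found, suggests_key_mismatch(found[0]))
```

Read this way, the errors in steps 10 and 13 are both consistent with an account password that is out of sync between the machine and the directory, even when no duplicate accounts or SPNs exist.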

Resolution:
===========
15. Reset the secure channel for the DC itself. (Strangely, the DC had reported nothing wrong, even though it had a secure channel problem!)

In a single-DC environment, you can reset the DC's machine account password as follows:
nltest /sc_change_pwd:domain
or
netdom resetpwd /server:IPofDC ......(do NOT stop KDC)