Described herein is a method and system for training nonlinear adaptive filters (or neural networks) which have embedded memory. Such memory can arise in a multi-layer finite impulse response (FIR) architecture, or an infinite impulse response (IIR) architecture. We focus on filter architectures with separate linear dynamic components and static nonlinear components. Such filters can be structured so as to restrict their degrees of computational freedom based on a priori knowledge about the dynamic operation to be emulated. The method is detailed for an FIR architecture which consists of linear FIR filters together with nonlinear generalized single layer subnets. For the IIR case, we extend the methodology to a general nonlinear architecture which uses feedback. For these dynamic architectures, we describe how one can apply optimization techniques which make updates closer to the Newton direction than those of a steepest descent method, such as backpropagation. We detail a novel adaptive modified Gauss-Newton optimization technique, which uses an adaptive learning rate to determine both the magnitude and direction of update steps. For a wide range of adaptive filtering applications, the new training algorithm converges faster and to a smaller value of cost than both steepest-descent methods such as backpropagation-through-time, and standard quasi-Newton methods. We apply the algorithm to modeling the inverse of a nonlinear dynamic tracking system 5, as well as a nonlinear amplifier 6.