The stochastic dynamic programming equation is clearly,
V(n,t) =(1-p)V(n,t-1) + p max_{1 leq k leq x} (1- alpha^k) V(n-k,t-1)
where V(0,x)=1.Let k(n,t) be the optimal number of missiles to fire when the state is (n,t) and an enemy fighter is encountered. There are three `obvious' conjectures:
Property (C) is easy to prove. Property (A) has been proved (although the proof is not easy. See [1] and [2]. Property (B) has never been proved. But it has been tested extensively by numerical solution of the dynamic programming equation. This resolution of (B) remains an intriguing open problem that has been outstanding for 50 years.
I have reported in [3], however, that (B) is false for a very similar problem, in which there is no n, but the bomber pilot simply wishes to maximize the expected number of epochs he survives before failing to win an encounter with an enemy fighter. The DP equation is very similar.
Intuitively, the non-montonity of k(n) comes about because with n missles the bomber pilot can expect to survive about 3 encounters and should fire about n/3 missles for each, whereas with n+1 missles he can expect to survive about 4 encounters and should fire about (n+1)/4 for each. It is this phenomenon that makes the problem different than that originally posed.
More recently, I have found that (B) is false if (1- alpha^k) is replaced by some other concave increasing function of k. See
1. Klinger, A. and Brown, T.A. (1968) Allocating unreliable units to random demands. Stochastic Optimization and Control. Editor Karreman, Publ. 20, Math Research Center, U.S. Army and Univ. Wisconsin, Wiley, NY.
2. Samuel, E. (1970) On some problems in operations research. J. Appl. Prob. 7, 157-164.
3. Weber, R.R. (1985) A problem of ammunition rationing. In Stochastic Dynamic Optimization and Applications in Scheduling and Related Fields, Ed. F.J. radermacher, G. Ritter and S.M. Ross, University of Passau.
4. Weber, R.R. (2011) ABCs of the bomber problem and its relatives, Annals of Operations Research, available here.