This paper examines the behavior of adaptive agents in a stochastic dynamic version of Hotelling's location model. We conduct an agent-based numerical simulation in Hotelling's setting with two agents that adapt through Nash Q-learning.
This lets us explore how adaptive learning changes the outcome relative to the original analytic solution of the famous static game-theoretic model, which imposes strong assumptions on the players. We find that under Nash Q-learning and a quadratic consumer cost function, agents that place a sufficiently high value on future profits learn behavior resembling an aggressive market strategy.
Both agents offer similar products and wage a price war to drive their opponent out of the market. This behavior closely resembles the Principle of Minimum Differentiation from Hotelling's original paper with linear consumer costs.
In the original static model, however, a quadratic consumer cost function leads to maximum product differentiation. Thus, the Principle of Minimum Differentiation can be justified by repeated interactions among agents and long-run optimization.
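
To make the role of the quadratic consumer cost concrete, the following minimal sketch computes market shares in a one-shot Hotelling stage game. It is an illustration under stated assumptions, not the simulation used in the paper: consumers are assumed to be uniformly distributed on the unit interval, every consumer is assumed to buy exactly one unit, production costs are assumed to be zero, and the names `market_shares`, `profits`, and the cost parameter `t` are hypothetical.

```python
def market_shares(x1, x2, p1, p2, t=1.0):
    """Market shares of two firms located at x1, x2 with prices p1, p2.

    Assumptions (illustrative only): consumers are uniform on [0, 1],
    each buys one unit, and a consumer at x pays p_i + t * (x - x_i)**2.
    """
    if x1 == x2:
        # Identical products: the cheaper firm takes the whole market.
        if p1 == p2:
            return 0.5, 0.5
        return (1.0, 0.0) if p1 < p2 else (0.0, 1.0)

    # Order the firms so that `a` is the left one and `b` the right one.
    firm1_is_left = x1 < x2
    a, b = (x1, x2) if firm1_is_left else (x2, x1)
    pa, pb = (p1, p2) if firm1_is_left else (p2, p1)

    # Indifferent consumer: pa + t*(x - a)**2 == pb + t*(x - b)**2.
    x_star = (a + b) / 2 + (pb - pa) / (2 * t * (b - a))
    x_star = min(1.0, max(0.0, x_star))  # keep it inside the market

    share_a, share_b = x_star, 1.0 - x_star
    return (share_a, share_b) if firm1_is_left else (share_b, share_a)


def profits(x1, x2, p1, p2, t=1.0):
    """Stage-game profits with zero production cost (an assumption)."""
    s1, s2 = market_shares(x1, x2, p1, p2, t)
    return p1 * s1, p2 * s2
```

When both firms choose the same location, the sketch reduces to pure price competition, which is the situation in which a price war of the kind described above can arise.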