Abstract
This study develops a behaviorally informed deep reinforcement learning (DRL) framework for algorithmic portfolio optimization. The model integrates two well-established behavioral biases, loss aversion and overconfidence, into an actor-critic architecture. Unlike conventional DRL systems that assume fully rational agents, the proposed framework incorporates investor heterogeneity through regime-dependent bias thresholds that adjust position sizing, while the underlying RL policy determines trading direction. To switch adaptively among three behavioral models (loss-averse, overconfident, and neutral), the framework employs TimesNet to generate one-step-ahead market regime forecasts. All decisions follow a strict walk-forward evaluation protocol that precludes access to future information and ensures realistic out-of-sample performance measurement. The framework is evaluated across two major financial domains: the cryptocurrency market (2018–2024) and the Dow Jones Industrial Average (2008–2024). The integrated BBAPT architecture, which combines TimesNet with behavioral DRL, consistently outperforms benchmark strategies, including neutral RL agents, classical Markowitz portfolios, and equally weighted allocations. In cryptocurrency markets, BBAPT achieves the highest risk-adjusted performance, while in equity markets it delivers improved risk-return outcomes even after accounting for time-varying index constituents. Overall, the empirical evidence demonstrates that embedding behavioral finance principles in reinforcement learning enhances robustness, adaptability, and risk-adjusted returns in non-stationary environments. These findings position behaviorally informed DRL as a promising foundation for next-generation algorithmic trading systems.