This study proposes a novel reward function for deep deterministic policy gradient (DDPG)-based walking control of a bipedal robot. The reward combines a target-reaching term with new terms that promote stability and natural gaits through the body orientation angles. This design encourages the desired behaviors while remaining adaptable to diverse robot morphologies. Additionally, action-space noise (an Ornstein-Uhlenbeck process) and parameter-space noise (Gaussian noise on contact stiffness, damping, and friction) are introduced to improve DDPG’s exploration efficiency and policy learning. The combined noise strategy facilitates exploration across diverse terrains and promotes adaptive behavior. The reward function is analyzed for its impact on gait patterns and leg loading, examining how closely the learned gait mimics human walking and how load is distributed between the legs. Simulations demonstrate the robot’s ability to learn a coordinated gait, maintain balance, and complete episodes successfully. Torque is analyzed across the leg joints and movement axes. The proposed approach, combining the modified reward with action- and parameter-space noise, offers a promising way to mitigate local-minimum issues in DDPG. The controller is implemented with the MATLAB®/Simulink Reinforcement Learning Toolbox.
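
The abstract does not give the reward in closed form. A minimal sketch of such a reward, assuming a forward-progress term, quadratic penalties on the body orientation angles, and a fall penalty with placeholder weights w1, w2, w3, might look as follows (all names and values are illustrative, not the paper’s definition):

```matlab
function r = walkingRewardSketch(vForward, roll, pitch, yaw, isFallen)
% Illustrative reward for DDPG-based bipedal walking control.
% vForward : forward velocity toward the target (m/s)
% roll, pitch, yaw : body orientation angles (rad)
% isFallen : 1 if the torso has fallen, 0 otherwise
% All weights and term forms are assumptions for illustration only.
w1 = 1.0;    % weight on progress toward the target
w2 = 0.5;    % weight on the orientation (stability / natural gait) penalty
w3 = 10.0;   % weight on the fall penalty

rTarget = w1 * vForward;                      % reward progress to target
rOrient = -w2 * (roll^2 + pitch^2 + yaw^2);   % keep the torso upright
rFall   = -w3 * isFallen;                     % discourage falls

r = rTarget + rOrient + rFall;
end
```

Shaping the reward on the body orientation angles, rather than on morphology-specific quantities such as individual joint positions, is what allows the same design to carry over to different robot morphologies.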
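
The Ornstein-Uhlenbeck action noise is a temporally correlated exploration process. A minimal Euler-Maruyama discretization, with placeholder parameters theta, sigma, and dt rather than the paper’s tuned values, is sketched below:

```matlab
% Illustrative Ornstein-Uhlenbeck action noise (Euler-Maruyama scheme).
% theta, sigma, dt, and the dimensions are placeholder values.
nActions = 6;  nSteps = 400;  dt = 0.025;
theta = 0.15;  sigma = 0.3;   mu = zeros(nActions, 1);
x = zeros(nActions, nSteps);                  % one noise channel per joint
for k = 2:nSteps
    x(:,k) = x(:,k-1) + theta*(mu - x(:,k-1))*dt ...
             + sigma*sqrt(dt)*randn(nActions, 1);
end
plot(x');  xlabel('Time step');  ylabel('Noise value');
```

In recent releases of the Reinforcement Learning Toolbox, the corresponding settings are exposed through the NoiseOptions property of rlDDPGAgentOptions (MeanAttractionConstant for theta, Variance for sigma squared, and VarianceDecayRate to anneal exploration over training).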
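
The parameter-space noise amounts to perturbing the contact model at the start of each episode. One way to realize this in a Simulink environment, assuming hypothetical model variable names, placeholder nominal values, and a 10% relative spread, is through the environment reset function:

```matlab
function in = randomizeContact(in)
% Illustrative parameter-space noise: Gaussian perturbation of contact
% stiffness, damping, and friction around nominal values at episode
% start. Variable names, nominal values, and the 10% spread are
% assumptions, not the paper's settings.
relSigma = 0.10;                          % relative standard deviation
g = @(v) v*(1 + relSigma*randn);          % multiplicative Gaussian noise
in = setVariable(in, 'contactStiffness', g(1e5));  % N/m (placeholder)
in = setVariable(in, 'contactDamping',   g(1e3));  % N*s/m (placeholder)
in = setVariable(in, 'muFriction',       g(0.7));  % friction coefficient
end
```

Attaching such a function as the ResetFcn of an rlSimulinkEnv environment resamples the stiffness, damping, and friction values every episode, which is what exposes the policy to varied contact conditions and drives the adaptive behavior described above.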