This study proposes a machine learning framework for predicting perceived urban quality of life (QoL) by integrating visual features from street-level imagery with personal attributes, including demographic, socioeconomic, and travel behavior data. Two supervised models, a support vector machine (SVM) and a multilayer perceptron (MLP), were trained and evaluated separately for Bangkok and London to compare performance across different urban contexts. Model performance was assessed using mean squared error (MSE). Results show that combining visual and personal features improves prediction accuracy compared with using visual features alone, highlighting the importance of incorporating both environmental and individual-level data. Statistical feature selection further identified income, education, housing stability, and travel patterns as consistently important predictors of QoL, although their relative influence varied between the two cities. These findings suggest that while certain socioeconomic variables are relevant in both cities, local conditions shape how these factors interact with the built environment in influencing perceived QoL. Overall, this study demonstrates the potential of machine learning to complement traditional survey-based assessments and provide scalable, cost-effective tools for urban planning and mobility research.
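The comparison described above (visual features alone versus visual plus personal features, each scored by MSE for an SVM and an MLP) can be illustrated with a minimal sketch. It is not the study's pipeline: it assumes scikit-learn, treats QoL as a continuous target (so the SVM is taken in its regression form, `SVR`), and runs on randomly generated placeholder data. The helper names `make_synthetic_city` and `compare_feature_sets`, the feature counts, and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_city(n=300, n_visual=8, n_personal=4, seed=0):
    """Generate placeholder data (hypothetical; NOT the study's data)."""
    rng = np.random.default_rng(seed)
    visual = rng.normal(size=(n, n_visual))      # stand-in for image-derived features
    personal = rng.normal(size=(n, n_personal))  # stand-in for survey attributes
    # Assumed toy relationship so that both feature groups carry some signal.
    qol = visual[:, 0] + 0.5 * personal[:, 0] + rng.normal(scale=0.1, size=n)
    return visual, personal, qol


def compare_feature_sets(visual, personal, qol, seed=0):
    """Return test-set MSE for each (model, feature set) combination."""
    feature_sets = {
        "visual_only": visual,
        "visual_plus_personal": np.hstack([visual, personal]),
    }
    # Hyperparameters are illustrative assumptions, not values from the study.
    model_factories = {
        "svm": lambda: SVR(kernel="rbf", C=1.0),
        "mlp": lambda: MLPRegressor(
            hidden_layer_sizes=(32,), max_iter=2000, random_state=seed
        ),
    }
    # Use one shared split so every combination is scored on the same rows.
    train_idx, test_idx = train_test_split(
        np.arange(len(qol)), test_size=0.25, random_state=seed
    )
    results = {}
    for fs_name, X in feature_sets.items():
        for model_name, make_model in model_factories.items():
            # Standardize features before fitting; both models are scale-sensitive.
            pipeline = make_pipeline(StandardScaler(), make_model())
            pipeline.fit(X[train_idx], qol[train_idx])
            predictions = pipeline.predict(X[test_idx])
            results[(model_name, fs_name)] = float(
                mean_squared_error(qol[test_idx], predictions)
            )
    return results


if __name__ == "__main__":
    # Models are fitted separately per city; the city names are placeholders.
    for city, seed in [("city_a", 0), ("city_b", 1)]:
        visual, personal, qol = make_synthetic_city(seed=seed)
        for key, mse in compare_feature_sets(visual, personal, qol, seed=seed).items():
            print(city, key, round(mse, 4))
```

Scoring every model and feature set on the same held-out rows keeps the MSE values directly comparable within a city. The numbers this sketch prints reflect only the synthetic data and say nothing about the study's reported results.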