Abstract
Accurately diagnosing skin cancer (melanoma vs. non-melanoma) remains challenging, particularly when differentiating subtypes in clinical settings. Current Deep Learning (DL) models are limited by unimodal data (often dermatoscopic images alone), poor generalizability, and a lack of transparency regarding inherent biases [1]. This paper proposes an Explainable Multimodal Deep Learning (EM-DL) framework for skin cancer subtype prediction. We fuse non-invasive images (e.g., dermatoscopic and clinical images) with tabular clinical data (demographics, lesion history) using a Transformer-based fusion network [2]. Training is conducted on a centralized, augmented multi-center dataset to enhance cross-domain robustness [3]. Finally, we integrate explainable AI (XAI) techniques, SHAP and Grad-CAM [4], to audit model fairness across protected subgroups (e.g., Fitzpatrick skin type, ethnicity, gender) and to provide interpretable feature attributions, establishing a new standard for ethical and globally scalable AI diagnostics.
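A minimal sketch of the Transformer-based fusion described above, written in PyTorch under stated assumptions: the image backbone (ResNet-18), embedding dimension, layer counts, and all class and variable names are illustrative and are not taken from the paper. It encodes the image and the tabular clinical features as two modality tokens and fuses them with a small Transformer encoder via a learnable classification token.

    # Hypothetical fusion of a dermatoscopic-image embedding with tabular clinical features.
    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    class FusionClassifier(nn.Module):
        def __init__(self, num_tabular_features: int, num_classes: int, d_model: int = 256):
            super().__init__()
            # Image branch: CNN backbone reduced to a single visual feature vector.
            backbone = resnet18(weights=None)
            backbone.fc = nn.Identity()
            self.image_encoder = backbone
            self.image_proj = nn.Linear(512, d_model)
            # Tabular branch: MLP over demographics / lesion-history features.
            self.tabular_encoder = nn.Sequential(
                nn.Linear(num_tabular_features, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
            )
            # Transformer fusion over the two modality tokens plus a learnable [CLS] token.
            self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
            encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
            self.fusion = nn.TransformerEncoder(encoder_layer, num_layers=2)
            self.head = nn.Linear(d_model, num_classes)

        def forward(self, image: torch.Tensor, tabular: torch.Tensor) -> torch.Tensor:
            img_tok = self.image_proj(self.image_encoder(image)).unsqueeze(1)  # (B, 1, d)
            tab_tok = self.tabular_encoder(tabular).unsqueeze(1)               # (B, 1, d)
            cls = self.cls_token.expand(image.size(0), -1, -1)                 # (B, 1, d)
            fused = self.fusion(torch.cat([cls, img_tok, tab_tok], dim=1))     # (B, 3, d)
            return self.head(fused[:, 0])                                      # subtype logits

    # Example forward pass with 12 hypothetical clinical features and 2 classes.
    model = FusionClassifier(num_tabular_features=12, num_classes=2)
    logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 12))

Because the classifier exposes ordinary tensor inputs and outputs, post-hoc attribution tools such as Grad-CAM (on the image branch) and SHAP (on the tabular branch) can be applied to a trained instance without architectural changes.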