We present FinMMR, a novel bilingual multimodal benchmark tailored to evaluate the reasoning capabilities of multimodal large language models (MLLMs) in financial numerical reasoning tasks. Compared to existing benchmarks, our work introduces three significant advancements.
(1) Multimodality: We meticulously transform existing financial reasoning datasets and construct novel questions from the latest Chinese financial research reports. The dataset comprises 4.3K questions and 8.7K images spanning 14 categories, including tables, bar charts, and ownership structure charts.
(2) Comprehensiveness: FinMMR encompasses 14 financial subdomains, including corporate finance, banking, and industry analysis, significantly exceeding existing benchmarks in financial domain knowledge breadth.
(3) Challenge: Models must perform multi-step, precise numerical reasoning by integrating financial knowledge with an understanding of complex financial images and text. The best-performing MLLM achieves only 51.4% accuracy on Hard problems.
We believe FinMMR will drive advances in the reasoning capabilities of MLLMs in real-world scenarios.