FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities of Large Language Models

📄 Read the paper here

🤗 Check out our eval on Hugging Face here

FinanceQA is a comprehensive testing suite designed to evaluate LLMs' performance on complex financial analysis tasks that mirror real-world investment work. The dataset aims to be substantially more challenging and practical than existing financial benchmarks, focusing on tasks that require precise calculations and professional judgment.

Results

Abstract: FinanceQA is a testing suite that evaluates LLMs’ performance on complex numerical financial analysis tasks that mirror real-world investment work. Despite recent advances, current LLMs fail to meet the strict accuracy requirements of financial institutions, with models failing approximately 60% of realistic tasks that mimic on-the-job analyses at hedge funds, private equity firms, investment banks, and other financial institutions. The primary challenges include hand-spreading metrics, adhering to standard accounting and corporate valuation conventions, and performing analysis under incomplete information - particularly in multi-step tasks requiring assumption generation. This performance gap highlights the disconnect between existing LLM capabilities and the demands of professional financial analysis that are inadequately tested by current testing architectures. Results show that higher-quality training data is needed to support such tasks, which we experiment with using OpenAI’s fine-tuning API. FinanceQA is publicly released at https://huggingface.co/datasets/AfterQuery/FinanceQA.
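
The abstract mentions an experiment with OpenAI's fine-tuning API. As an illustration only (not the paper's exact configuration), the sketch below converts FinanceQA records into the chat-format JSONL that the fine-tuning API accepts. The `train` split name, the system prompt wording, and the use of `chain_of_thought` in the target completion are all assumptions.

```python
# A minimal sketch (not the paper's exact setup) of preparing FinanceQA records
# for OpenAI's fine-tuning API, which accepts chat-format JSONL.
# Assumptions: the "train" split name, the system prompt wording, and the use
# of chain_of_thought as part of the target completion.
import json

from datasets import load_dataset

dataset = load_dataset("AfterQuery/FinanceQA", split="train")

with open("financeqa_finetune.jsonl", "w") as f:
    for row in dataset:
        context = row["context"] or ""  # context may be empty for conceptual questions
        record = {
            "messages": [
                {"role": "system", "content": "You are a financial analyst. Answer precisely."},
                {"role": "user", "content": f"{context}\n\nQuestion: {row['question']}"},
                # Target completion: reasoning steps followed by the final answer.
                {"role": "assistant", "content": f"{row['chain_of_thought']}\n\nAnswer: {row['answer']}"},
            ]
        }
        f.write(json.dumps(record) + "\n")
```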

Description

The dataset contains two main categories of questions:

  1. Tactical Questions: Questions based on financial documents that test calculation accuracy, accounting standards, assumption-making, and real-world practices.

    • Basic questions
    • Assumption-based questions (requiring inference with incomplete information)
  2. Conceptual Questions: Questions testing understanding of financial relationships, logical derivations, industry estimations, and accounting principles.

Fields

The dataset contains the following components:

  • context: Relevant sections from primary financial documents (e.g., 10-K sections)
  • question: The specific financial analysis task or query
  • answer: The correct calculation or response
  • chain_of_thought: The reasoning logic to arrive at the correct answer
  • question_type: Categorization as either "basic", "assumption", or "conceptual"
  • company: The company in question
  • file_link: The link to the source of the context field
  • file_name: The file name of the source of the context field

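The following is a minimal sketch of loading the dataset with the Hugging Face `datasets` library and selecting questions by `question_type`. The split name (`test`) is an assumption; check the dataset card for the actual splits.

```python
# A minimal sketch of loading FinanceQA and filtering by question type.
# The split name ("test") is an assumption; verify against the dataset card.
from datasets import load_dataset

dataset = load_dataset("AfterQuery/FinanceQA", split="test")

# Fields follow the list above: context, question, answer, chain_of_thought,
# question_type, company, file_link, file_name.
example = dataset[0]
print(example["question_type"], example["company"])
print(example["question"])
print(example["answer"])

# Keep only assumption-based tactical questions.
assumption_qs = dataset.filter(lambda row: row["question_type"] == "assumption")
print(f"{len(assumption_qs)} assumption-based questions")
```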
Hugging Face

https://huggingface.co/datasets/AfterQuery/FinanceQA
