@DinhLongHuynh
Contributor

Empty DataFrame Handling Issue in ArticlesProcessor

Problem Overview

When no papers are found on a specific day, the search returns an empty list, which gets converted to an empty DataFrame in the ArticlesProcessor class. This causes errors in all processing methods because they assume the DataFrame has the expected structure and columns.

Error Flow

  1. Empty Search Results → articles = []
  2. Empty DataFrame Creation → pd.DataFrame.from_dict([]) creates a DataFrame with no columns
  3. Processing Methods Fail → All methods assume specific columns exist
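
The flow above can be reproduced in isolation. This is a minimal sketch, not the project's code: the variable names `articles` and `expected` are illustrative, and the column list is taken from the KeyError message reported in this PR.

```python
import pandas as pd

# What the search returns on a day with no papers.
articles = []

# An empty list produces a DataFrame with no rows and no columns.
df = pd.DataFrame.from_dict(articles)
print(df.empty)          # True
print(list(df.columns))  # []

# Column names copied from the KeyError reported in this PR.
expected = ["databases", "publication_date", "title", "keywords", "url"]

# Selecting columns that do not exist raises KeyError.
raised = False
try:
    df[expected]
except KeyError as err:
    raised = True
    print(f"KeyError: {err}")
```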

Specific Errors

  • KeyError: "None of [Index(['databases', 'publication_date', 'title', 'keywords', 'url'], dtype='object')] are in the [columns]"
  • AttributeError: 'Series' object has no attribute 'apply'
  • IndexError: When accessing non-existent columns

Root Cause

  • No validation layer: the processing methods never check their input before using it
  • Assumption of data existence: the code assumes papers will always be found

Solution

Add a validation layer to each ArticlesProcessor method to:

  1. Check if DataFrame is empty before processing
  2. Initialize expected structure if empty
  3. Use defensive programming to handle edge cases
  4. Maintain consistent output format regardless of input

This ensures robust and reliable operation even when no papers are found.
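
One way such a validation step might look is sketched below. It is an illustration under assumptions, not the code merged in this PR: `ensure_expected_columns` and `select_columns` are hypothetical helpers, and `EXPECTED_COLUMNS` is taken from the KeyError message above rather than from the ArticlesProcessor source.

```python
import pandas as pd

# Assumed column set, copied from the KeyError reported in this PR.
EXPECTED_COLUMNS = ["databases", "publication_date", "title", "keywords", "url"]


def ensure_expected_columns(df, columns=EXPECTED_COLUMNS):
    """Hypothetical guard: return an empty frame with the expected columns
    when the input has no rows, otherwise return the input unchanged."""
    if df.empty:
        return pd.DataFrame(columns=columns)
    return df


def select_columns(articles):
    """Hypothetical processing step that relies on the guard above."""
    df = ensure_expected_columns(pd.DataFrame.from_dict(articles))
    # Safe for both cases: the columns exist even when there are no rows.
    return df[EXPECTED_COLUMNS]


# No papers found: the result is empty but keeps the expected structure.
empty_result = select_columns([])
print(list(empty_result.columns))

# One paper found: the same code path returns a single row.
one_result = select_columns([{
    "databases": ["arxiv"],
    "publication_date": "2025-07-28",
    "title": "Example paper",
    "keywords": ["example"],
    "url": "https://example.org/paper",
}])
print(len(one_result))
```

Returning an empty frame that still carries the expected columns keeps the output format the same whether or not any papers were found, so downstream steps need no special case.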

@VladimirShitov self-requested a review July 28, 2025 23:00
@VladimirShitov
Collaborator

Hey @DinhLongHuynh, thank you so much for the contribution! I will test it and happily merge ASAP if there are no problems.

@VladimirShitov left a comment

Thank you, looks great! I simplified the code a bit, but it should yield the same result. I've also added a test case with an empty paper list in telegram

@VladimirShitov merged commit 3645b00 into theislab:main Aug 6, 2025