Large dataframes containing strings generate invalid xlsx files

For large dataframes filled with strings (around 500k rows and 60+ columns), PyExcelerate generates invalid .xlsx files that cannot be opened in Excel. Excel reports: "We found a problem with some content in yourfile.xlsx. Do you want us to try to recover as much as we can?". Answering yes removes all invalid content, which yields an empty worksheet.

```
from string import ascii_lowercase
from numpy import random  # random.choice(..., size=...) below is NumPy's, not the stdlib's
from pandas import DataFrame
from pyexcelerate import Workbook
from uuid import uuid4

num_rows = 501000
num_cols = 64

# generate dataframe with column names
column_names = [ f"Col{idx}" for idx in range(0, num_cols)]
df = DataFrame(random.choice(list(ascii_lowercase), size=(num_rows, num_cols)), columns=column_names)

# write to file
output_filename = f"/tmp/{str(uuid4())}.xlsx"

wb = Workbook()
ws = wb.new_sheet("sheet name", data=df.values.tolist())
wb.save(output_filename)
```

The xlsx file is successfully written to disk and is 138 MB in size, but it triggers the above-mentioned error when opened in Excel. Opening the file in Python reveals that the Zip archive itself is invalid.
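
For anyone who wants to reproduce that check, here is a minimal sketch using only the standard library `zipfile` module. The helper name `check_xlsx_archive` is hypothetical and not part of PyExcelerate; it only inspects the Zip container (an .xlsx file is a Zip archive), not the worksheet XML inside it.

```python
import zipfile


def check_xlsx_archive(path):
    """Return None if the Zip container at `path` is readable, else a short description."""
    # An .xlsx file is a Zip archive, so first check that the file looks like one.
    if not zipfile.is_zipfile(path):
        return "not a zip archive"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the name of the first
            # one with a bad CRC or header, or None if all members are fine.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as exc:
        return f"bad zip file: {exc}"
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return None
```

Calling `check_xlsx_archive(output_filename)` after `wb.save(output_filename)` should return `None` for a healthy file and a description otherwise.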

Interestingly, the problem only occurs for large dataframes filled with strings. The problem does *not* occur:
- For smaller dataframes
- For dataframes of the same size filled only with integers
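
Since smaller dataframes work, the row count at which files start to break could be narrowed down by bisection. The sketch below is only an illustration: `smallest_failing_rows` and the `is_valid` callback are hypothetical names, and it assumes that validity flips exactly once between the two bounds, which has not been verified for this bug.

```python
def smallest_failing_rows(is_valid, low, high):
    """Return the smallest row count in (low, high] for which is_valid(rows) is False.

    Assumes is_valid(low) is True, is_valid(high) is False, and that the
    result changes only once between the two bounds.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if is_valid(mid):
            low = mid   # mid still produces a valid file, so search above it
        else:
            high = mid  # mid already fails, so search at or below it
    return high
```

In practice `is_valid(rows)` would write a string dataframe with that many rows and test the resulting file. Each probe writes a large file, so a range of 500k rows needs roughly 19 writes.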

Versions:
- OS: macOS Catalina 10.15.7; Darwin Kernel Version 19.6.0
- PyExcelerate==0.10.0
- pandas==1.3.5


Large dataframes containing strings generate invalid xlsx files #167
