ad3002 commented Sep 30, 2025

Problem

TRF crashed when processing T2T (telomere-to-telomere) chromosome assemblies longer than 1-2 GB. The errors varied with chromosome size and at first looked like a genome-size problem, but they actually stem from a limit on the length of an individual chromosome. Example: chromosome 4_1 of the Lissotriton helveticus genome assembly.

The root cause was the use of 32-bit integer types (int, unsigned int) for sequence indices and lengths, which capped sequences at about 2 GB (a signed int can index at most 2^31 - 1 bytes).
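To make the limit concrete, here is a minimal sketch of the check that a 32-bit signed index implies. The helper name `length_fits_int32` is hypothetical and is not part of TRF; it only illustrates where the ~2 GB boundary falls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not TRF code): returns 1 if a sequence of `length`
   bytes can be described with a signed 32-bit integer, 0 otherwise. */
int length_fits_int32(uint64_t length)
{
    /* INT32_MAX is 2^31 - 1 = 2147483647; anything larger cannot be
       stored in a 32-bit signed int without overflow. */
    return length <= (uint64_t)INT32_MAX;
}
```

A chromosome of 2,147,483,648 bases or more is therefore already past what a signed 32-bit length or index can represent.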

Solution

Replaced 32-bit integer types with size_t (64 bits wide on 64-bit platforms) throughout the codebase for all sequence-related indices, positions, and lengths. TRF can now process sequences up to the limit of system memory (the theoretical ceiling is 2^64 bytes).

Changes by file

tr30dat.h - Core data type replacements:

  • int Length → size_t Length (global sequence length variable)
  • unsigned int maxwraplength → size_t maxwraplength
  • Updated structures: FASTASEQUENCE, pairalign, TRFPARAMSET, bestperiodlistelement, distanceentry, distancelist, distanceseenarrayelement, distancelistelement

trfrun.h - Index and position variables:

  • Updated index_list structure (repeat positions)
  • Local variables: charcount, flankstart, flankend
  • Functions: LoadSequenceFromFileBenson, LoadSequenceFromFileEugene

trf.c - Command-line parsing:

  • Added ParseSize() function using strtoull() for 64-bit values
  • Updated -l parameter parsing for maxwraplength
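For readers unfamiliar with the pattern, the following is a sketch of how a size parser built on strtoull() might look. It is an assumption about the general approach, not the PR's actual ParseSize() implementation; the name `parse_size_sketch`, its signature, and its error convention are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch (not the PR's ParseSize()): parses a non-negative
   decimal string into *out. Returns 0 on success, -1 on any error. */
int parse_size_sketch(const char *text, size_t *out)
{
    assert(text != NULL && out != NULL);

    /* strtoull() accepts a leading '-' and silently wraps the value, and it
       also skips leading whitespace, so require a digit as the first char. */
    if (!isdigit((unsigned char)text[0]))
        return -1;

    errno = 0;
    char *end = NULL;
    unsigned long long value = strtoull(text, &end, 10);

    if (errno == ERANGE)   /* value exceeds ULLONG_MAX */
        return -1;
    if (*end != '\0')      /* trailing characters such as "12abc" */
        return -1;
    if (value > SIZE_MAX)  /* does not fit size_t on this platform */
        return -1;

    *out = (size_t)value;
    return 0;
}
```

The explicit digit check matters because strtoull() by itself would accept input such as "-5" and return a very large wrapped value rather than reporting an error.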

tr30dat.c - Memory allocation:

  • Updated new1Darrayfunc macro for correct memory allocation
  • Updated format specifiers: %d → %zu for all output
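The two points above can be illustrated with a short sketch: an array allocation that guards against size_t overflow, and output of a size_t value with %zu. Both functions are hypothetical stand-ins written for this illustration; they do not reproduce the new1Darrayfunc macro or any TRF output routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an array-allocation macro: allocates `count`
   zero-initialized elements of `elem_size` bytes each. Returns NULL if
   count * elem_size would overflow size_t or if allocation fails. */
void *alloc_array_checked(size_t count, size_t elem_size)
{
    assert(elem_size != 0);
    if (count > SIZE_MAX / elem_size)
        return NULL;
    return calloc(count, elem_size);
}

/* Formats a size_t with %zu, the matching specifier for size_t.
   Returns the number of characters written (excluding the terminator),
   or a negative value on error, as snprintf() does. */
int format_length(char *buf, size_t buf_size, size_t length)
{
    assert(buf != NULL && buf_size > 0);
    return snprintf(buf, buf_size, "%zu", length);
}
```

Passing a size_t to %d is undefined behavior on platforms where the two types differ in width, which is why the specifier has to change together with the type.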

Testing

Successfully processes amphibian chromosomes and T2T assemblies exceeding 1-2 GB without crashes.

…nd consistency for amphibian chromosomes larger than 1-2Gb
…uild and Apple Silicon support

- Add build.sh: a dependency-free build script (uses CC env var, detects platform/arch, compiles src/trf.c)
- Add Makefile.simple: minimal Makefile for building, versioned binary creation, install and test targets
- Update README.md: document the simple build (build.sh), Makefile.simple usage, one-command gcc build, and native ARM Mac (Apple Silicon) support