
Commit 7c59691

Merge branch 'master' into wrap-up
2 parents 5ffc90b + 90d39e5 commit 7c59691

File tree

8 files changed

+220
-5
lines changed

.github/actions/spelling/allow/terms.txt

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,9 @@ Ohridski
3232
OMP
3333
OpenMP
3434
PTX
35+
QNAME
3536
RAII
37+
RNAME
3638
Resugaring
3739
SBO
3840
Slib
@@ -58,6 +60,7 @@ gitlab
5860
gpu
5961
gridlay
6062
gsoc
63+
hpc
6164
jit
6265
jitlink
6366
jthread
@@ -72,8 +75,11 @@ oop
7275
pushforward
7376
pythonized
7477
ramview
78+
ramntupleview
7579
reoptimize
80+
rntuple
7681
samtools
82+
samtoramntuple
7783
sbo
7884
sitemap
7985
softsusy
@@ -158,4 +164,5 @@ cartopiax
158164
Oncoprotein
159165
oncoprotein
160166
organoids
161-
paraview
167+
paraview
168+

_data/crconlist2025.yml

Lines changed: 3 additions & 4 deletions
@@ -41,7 +41,7 @@
4141
that detects possible data-race conditions that would enable reducing atomic
4242
operations in the Clad-produced code.
4343
44-
# slides: /assets/presentations/...
44+
slides: /assets/presentations/Maksym_Andriichuk_final_gsoc_atomic.pdf
4545

4646
- title: "Enable automatic differentiation of OpenMP programs with Clad"
4747
speaker:
@@ -59,7 +59,6 @@
5959
AST parsing and designing corresponding differentiation strategies. Additional
6060
contributions include example applications and comprehensive tests.
6161
62-
6362
# slides: /assets/presentations/...
6463

6564
- title: "Using ROOT in the field of Genome Sequencing"
@@ -82,8 +81,7 @@
8281
FASTQ compression from 14.2GB to 6.8GB. We also developed chromosome based
8382
file-splitting for larger genome files so that chromosome-based data can be extracted.
8483
85-
86-
# slides: /assets/presentations/...
84+
slides: /assets/presentations/Aditya_Pandey_GSoC2025_final.pdf
8785
8886
- name: "CompilerResearchCon 2025 (day 1)"
8987
date: 2025-10-30 15:00:00 +0200
@@ -193,3 +191,4 @@
193191
comprehensive unit tests.
194192
195193
slides: /assets/presentations/Abdelrhman_final_presentation_support_usage_of_Thrust_API_in_clad.pdf
194+

_data/standing_meetings.yml

Lines changed: 8 additions & 0 deletions
@@ -7,6 +7,10 @@
77
date: 2025-11-13 15:00:00 +0200
88
speaker: "Abhinav Kumar"
99
link: "[Slides](/assets/presentations/Abhinav_Kumar_GSoC25_final.pdf)"
10+
- title: "Summary: Activity analysis for reverse-mode differentiation of (CUDA) GPU kernels"
11+
date: 2025-11-13 15:30:00 +0200
12+
speaker: "Maksym Andriichuk"
13+
link: "[Slides](/assets/presentations/Maksym_Andriichuk_final_gsoc_atomic.pdf)"
1014
- title: "Wrap-Up: Implement and improve an efficient, layered tape with prefetching capabilities"
1115
date: 2025-10-30 15:40:00 +0200
1216
speaker: "Aditi Milind Joshi"
@@ -445,4 +449,8 @@
445449
date: 2025-10-30 15:20:00 +0200
446450
speaker: "Rohan Timmaraju"
447451
link: "[Slides](/assets/presentations/Rohan_Timmaraju_GSoC25_final.pdf)"
452+
- title: "Final Presentation: Using ROOT in the field of Genome Sequencing"
453+
date: 2025-11-13 16:20:00 +0200
454+
speaker: "Aditya Pandey"
455+
link: "[Slides](/assets/presentations/Aditya_Pandey_GSoC2025_final.pdf)"
448456

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
1+
---
2+
title: "RAMTools: Extending ROOT for Genomic Data Processing"
3+
layout: post
4+
excerpt: "A GSoC 2025 project extending CERN's ROOT framework with the RNTuple format to efficiently process, store, and query large-scale genomic data."
5+
sitemap: true
6+
author: Aditya Pandey
7+
permalink: blogs/gsoc25_aditya_pandey_final_blog/
8+
banner_image: /images/blog/gsoc-banner.png
9+
date: 2025-11-15
10+
tags: gsoc c++ genomics bioinformatics cern root rntuple hpc
11+
---
12+
13+
## Introduction
14+
15+
Hello! I'm Aditya Pandey, and this summer I had the privilege of participating in Google Summer of Code (GSoC) 2025 with CERN-HSF as part of the Compiler Research Group. It has been an incredible experience working with my mentors, Vassil Vassilev and Martin Vassilev, on a project that bridges the gap between high-energy physics (HEP) and genomics.
16+
17+
## Project Overview
18+
19+
RAMTools is a project that extends ROOT—CERN's data processing framework—to efficiently handle genomic sequencing data. While ROOT was designed for petabyte-scale physics data, its cutting-edge features are perfectly suited for the challenges of modern genomics.
20+
21+
The core problem with traditional genomic formats like SAM/BAM is that they are row-oriented, making analytical queries on massive datasets slow and inefficient. My project introduces **RAM (ROOT Alignment/Map)**, a new system that leverages ROOT's latest columnar format, RNTuple. This provides:
22+
23+
- **Columnar Storage**: Optimal for fast analytical queries and high compression ratios.
24+
- **Parallel I/O**: Built-in support for concurrent read/write operations.
25+
- **Modern Compression**: Support for multiple algorithms (LZ4, LZMA, ZLIB, ZSTD).
26+
27+
By converting SAM data to the RNTuple format, we can achieve significant performance gains in both storage and query speed.
28+
29+
## Technical Implementation
30+
31+
The project was implemented in C++17 and built using CMake, relying on the ROOT framework (version 6.26+) for its RNTuple I/O subsystem.
32+
33+
### Architecture Components
34+
35+
1. **SAM Parser**: A custom, high-performance C++17 parser optimized for streaming and processing extremely large SAM files.
36+
37+
2. **RNTuple Writer**: An efficient data model that maps the fields of a SAM record (QNAME, FLAG, RNAME, POS, etc.) to a columnar RNTuple structure.
38+
39+
3. **Chromosome Splitter**: A key feature that allows for partitioning the output into separate files by chromosome, enabling trivial parallel processing of downstream analysis.
40+
41+
4. **Region Query Engine**: A fast query tool that leverages RNTuple's selective column reading to extract genomic regions (e.g., chr1:10150-10300) without reading the entire file.
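As a rough illustration of the field-to-column mapping described for the RNTuple Writer, the sketch below splits one tab-separated SAM line into its first four mandatory fields and appends each to its own vector. The names `SamColumns` and `appendSamLine` are hypothetical and are not taken from the RAMTools sources; the actual writer goes through ROOT's RNTuple API, which is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical column store: one vector per SAM field (struct of arrays).
struct SamColumns {
  std::vector<std::string> qname;
  std::vector<std::uint16_t> flag;
  std::vector<std::string> rname;
  std::vector<std::int32_t> pos;
};

// Parse the first four tab-separated fields of a SAM alignment line and
// append them to the columns. Returns false for header lines ('@') and for
// lines with fewer than four fields. Non-numeric FLAG or POS values make
// std::stoul / std::stol throw; a real parser would handle that explicitly.
bool appendSamLine(const std::string &line, SamColumns &cols) {
  if (line.empty() || line[0] == '@') return false;
  std::istringstream in(line);
  std::string qname, flag, rname, pos;
  if (!std::getline(in, qname, '\t') || !std::getline(in, flag, '\t') ||
      !std::getline(in, rname, '\t') || !std::getline(in, pos, '\t'))
    return false;
  cols.qname.push_back(qname);
  cols.flag.push_back(static_cast<std::uint16_t>(std::stoul(flag)));
  cols.rname.push_back(rname);
  cols.pos.push_back(static_cast<std::int32_t>(std::stol(pos)));
  // Invariant: every column holds exactly one entry per record.
  assert(cols.qname.size() == cols.pos.size());
  return true;
}
```

Keeping each field in its own contiguous vector is what lets a query touch only the columns it needs, for example RNAME and POS for a region lookup.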
42+
43+
### Command-Line Tools
44+
45+
The primary interaction with RAMTools is through two command-line executables:
46+
47+
#### SAM to RAM Conversion (`samtoramntuple`)
48+
49+
Converts a standard SAM file into the optimized RNTuple-based RAM format.
50+
51+
```bash
52+
# Basic conversion
53+
./tools/samtoramntuple input.sam output.root
54+
55+
# Split by chromosome for parallel processing
56+
# (Creates output-chr1.root, output-chr2.root, etc.)
57+
./tools/samtoramntuple input.sam output -split
58+
```
59+
60+
#### Region Querying (`ramntupleview`)
61+
62+
Queries a specific genomic region from a RAM file, similar to `samtools view`.
63+
64+
```bash
65+
# Usage: ./tools/ramntupleview [input.root] "[chromosome]:[start]-[end]"
66+
./tools/ramntupleview output.root "chr1:10150-10300"
67+
```
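For readers curious how a region argument such as `"chr1:10150-10300"` could be interpreted, here is a minimal sketch. The `Region` struct and the `parseRegion` and `contains` helpers are hypothetical names used only for illustration; they are not the parser that `ramntupleview` ships with, and the inclusive interval semantics are an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical representation of a query region (inclusive coordinates).
struct Region {
  std::string chrom;
  std::int64_t start;
  std::int64_t end;
};

// True when the string is non-empty and consists only of decimal digits.
bool isDigits(const std::string &s) {
  return !s.empty() && std::all_of(s.begin(), s.end(), [](unsigned char c) {
    return std::isdigit(c) != 0;
  });
}

// Parse "chrom:start-end". Returns std::nullopt on malformed input or when
// start > end. Extremely long digit strings would make std::stoll throw.
std::optional<Region> parseRegion(const std::string &text) {
  const auto colon = text.rfind(':');  // last colon separates the name
  if (colon == std::string::npos || colon == 0) return std::nullopt;
  const auto dash = text.find('-', colon + 1);
  if (dash == std::string::npos) return std::nullopt;
  const std::string startStr = text.substr(colon + 1, dash - colon - 1);
  const std::string endStr = text.substr(dash + 1);
  if (!isDigits(startStr) || !isDigits(endStr)) return std::nullopt;
  Region r{text.substr(0, colon), std::stoll(startStr), std::stoll(endStr)};
  if (r.start > r.end) return std::nullopt;
  return r;
}

// True when a position falls inside the region.
bool contains(const Region &r, std::int64_t pos) {
  assert(r.start <= r.end);
  return pos >= r.start && pos <= r.end;
}
```

A query engine built along these lines would compare `contains` against the POS column only for records whose RNAME matches `chrom`, leaving the other columns unread.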
68+
69+
## Performance Achievements
70+
71+
We benchmarked RAMTools using the HG00154 sample from the 1000 Genomes Project, which consists of 196 million reads in a 72.1 GB uncompressed SAM file.
72+
73+
### Query Performance Comparison
74+
75+
RNTuple's columnar architecture shows significant speedups, especially for large region queries, compared with the older ROOT TTree format and CRAM (an industry-standard compressed format).
76+
77+
![Region Query Performance](/images/blog/genome_query_time.png)
78+
79+
The benchmarks demonstrate performance across three query sizes:
80+
81+
| Query Region | Size Category | Region Coordinates | RNTuple Time (s) | TTree Time (s) | CRAM Time (s) |
82+
|--------------|--------------|-------------------|------------------|----------------|---------------|
83+
| Small | 50M | chr1:1-50M | 6.69 | 1.29 | 0.34 |
84+
| Medium | 48M | chr21:1-48M | 6.84 | 35.70 | 7.81 |
85+
| Large | 100M | chr2:1-100M | 8.92 | 87.80 | 21.71 |
86+
87+
For the small region (chr1:1-50M), CRAM performs best due to its reference-based compression optimizations for sequential access. However, as query size increases:
88+
89+
- **Medium queries (chr21:1-48M)**: RNTuple is **5.2x faster** than TTree and competitive with CRAM
90+
- **Large queries (chr2:1-100M)**: RNTuple is **9.8x faster** than TTree and **2.4x faster** than CRAM
91+
92+
The performance advantage of RNTuple becomes more pronounced with larger analytical queries, making it ideal for whole-chromosome or multi-gene region analyses common in genomics research.
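The speedup factors quoted above are plain ratios of the timings in the table. A small sketch of that arithmetic follows; the `speedup` helper is a hypothetical name introduced here, not part of RAMTools.

```cpp
#include <cassert>

// Speedup of a candidate over a baseline: how many times faster the
// candidate finished. Values above 1 mean the candidate is faster.
double speedup(double baselineSeconds, double candidateSeconds) {
  assert(baselineSeconds > 0.0 && candidateSeconds > 0.0);
  return baselineSeconds / candidateSeconds;
}
```

With the table values, `speedup(35.70, 6.84)` is about 5.2 (TTree vs. RNTuple, medium query), `speedup(87.80, 8.92)` is about 9.8, and `speedup(21.71, 8.92)` is about 2.4 (CRAM vs. RNTuple, large query).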
93+
94+
### Storage and Compression
95+
96+
RNTuple also provides excellent compression. The 72.1 GB SAM file was compressed down to 11.4 GB using ZSTD, a 6.3x compression ratio.
97+
98+
| Format | Compression Algo | File Size (GB) | Additional Requirements | Total Storage (GB) |
99+
|--------|-----------------|----------------|------------------------|-------------------|
100+
| SAM | Uncompressed | 72.1 | - | 72.1 |
101+
| CRAM | Reference-based | 7.8 | 3.2 GB reference file | 11.0 |
102+
| RAM-RNTuple | ZSTD | 11.4 | Self-contained | 11.4 |
103+
| RAM-TTree | LZMA | 12.5 | - | 12.5 |
104+
| RAM-TTree | ZLIB | 16.7 | - | 16.7 |
105+
| RAM-TTree | LZ4 | 31.2 | - | 31.2 |
106+
107+
The most significant achievement here is that the 11.4 GB RNTuple file is **completely self-contained**. This is a key advantage over formats like CRAM, which achieves a similar total storage size (11.0 GB) but is dependent on an external 3.2 GB reference genome. This self-contained nature simplifies data archival, distribution, and use in cloud environments immensely.
108+
109+
## Repository & Documentation
110+
111+
- **GitHub**: [RAMTools Repository](https://github.com/compiler-research/ramtools)
112+
113+
## Future Work
114+
115+
While GSoC has concluded, there is a clear path forward for RAMTools:
116+
117+
1. **Broader Format Support**: Add support for more genomic file formats to encourage wider adoption.
118+
119+
2. **Further Query Optimization**: Explore multi-threading in the query engine to parallelize data retrieval.
120+
121+
3. **Integration with Analysis Frameworks**: Investigate integration with popular bioinformatics frameworks or visualization tools.
122+
123+
## Conclusion
124+
125+
GSoC 2025 has been a phenomenal experience. I've had the opportunity to dive deep into high-performance C++ and solve real-world problems in genomics.
126+
127+
I am immensely grateful to my mentors, Vassil Vassilev and Martin Vassilev, for their invaluable guidance, insightful code reviews, and constant support. I also want to extend my thanks to the entire ROOT team, CERN-HSF, and Google for making this project possible. I look forward to continuing my contributions to this exciting intersection of science and technology.
128+
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
---
2+
title: "Activity analysis for reverse-mode differentiation of (CUDA) GPU kernels"
3+
layout: post
4+
excerpt: "A summary of my GSoC 2025 project focusing on activity analysis for reverse-mode differentiation of (CUDA) GPU kernels."
5+
sitemap: true
6+
author: Maksym Andriichuk
7+
permalink: blogs/gsoc25_andriichuk_final_blog/
8+
banner_image: /images/blog/gsoc-clad-banner.png
9+
date: 2025-11-14
10+
tags: gsoc clad cuda clang c++
11+
---
12+
13+
**Mentors:** Vassil Vassilev, David Lange
14+
15+
## A Brief Introduction
16+
17+
### Main idea
18+
19+
Over a year ago, we added support for differentiating CUDA kernels using Clad. Read more on that [here](https://compiler-research.org/blogs/gsoc24_christina_koutsou_project_final_blog/). We introduced atomic operations in Clad to prevent race conditions that frequently appear because of how Clad handles statements like ```x=y``` in the reverse mode. Since atomic operations are inefficient, we aim to remove them whenever we are sure no race condition occurs.
20+
21+
Another part of my GSoC project was to unify Varied and TBR analyses in how they store information during the analysis run. This would make the implementation of future analyses easier and remove even more adjoints, since Varied Analysis does not account for variable reassignments.
22+
23+
## Project Implementation
24+
25+
### 1. Removing atomic operations
26+
27+
Consider the code below:
28+
29+
```cpp
30+
__global__ void kernel_call(double *out, double *in) {
31+
int index = threadIdx.x + blockIdx.x * blockDim.x;
32+
out[index] = in[index];
33+
}
34+
35+
void fn(double *out, double *in) {
36+
kernel_call<<<1, 16>>>(out, in);
37+
}
38+
```
39+
40+
The adjoint that corresponds to ```out[index] = in[index]``` is:
41+
42+
```cpp
43+
{
44+
out[index0] = _t2;
45+
double _r_d0 = _d_out[index0];
46+
_d_out[index0] = 0.;
47+
atomicAdd(&_d_in[index], _r_d0);
48+
}
49+
```
50+
51+
Notice that in this case ```index``` is injective, meaning that no two threads, whether in the same block or in different blocks, compute the same value of ```index```. As a result, no two threads can write to the same element of ```_d_in``` at the same time, so the atomic operation is not needed.
52+
53+
The implementation involves two static analyzers: one checks whether an index matches a particular form, and the other checks that the index is not modified afterwards. The hardest part is accounting for all possible term permutations of, say, ```threadIdx.x + blockIdx.x * blockDim.x``` and for expressions that depend on the index linearly, e.g., ```2*index+1```.
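To make the notion of an injective index concrete, the following host-side sketch enumerates every (block, thread) pair of a 1-D launch and checks that an index expression never produces the same value twice. This brute-force check only illustrates the property; it is not how the analyzers in Clad work, since those inspect the form of the expression statically. All names (`isInjective`, `globalIndex`, `linearIndex`, `threadOnly`) are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <unordered_set>

// Returns true when indexOf yields a distinct value for every
// (thread, block) pair in a 1-D launch of gridDim blocks of blockDim threads.
bool isInjective(int gridDim, int blockDim,
                 const std::function<long(int, int, int)> &indexOf) {
  assert(gridDim > 0 && blockDim > 0);
  std::unordered_set<long> seen;
  for (int block = 0; block < gridDim; ++block)
    for (int thread = 0; thread < blockDim; ++thread)
      if (!seen.insert(indexOf(thread, block, blockDim)).second)
        return false;  // two threads map to the same element
  return true;
}

// threadIdx.x + blockIdx.x * blockDim.x: unique per thread.
long globalIndex(int thread, int block, int blockDim) {
  return thread + static_cast<long>(block) * blockDim;
}

// 2*index+1: a linear function of an injective index stays injective.
long linearIndex(int thread, int block, int blockDim) {
  return 2 * globalIndex(thread, block, blockDim) + 1;
}

// threadIdx.x alone: threads in different blocks collide.
long threadOnly(int thread, int /*block*/, int /*blockDim*/) {
  return thread;
}
```

For a launch of 4 blocks with 16 threads each, `globalIndex` and `linearIndex` pass the check, while `threadOnly` fails because every block reuses the values 0 to 15; only in the last case would an atomic update be required.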
54+
55+
### 2. Varied Analysis
56+
57+
The implementation looked straightforward at first but turned out to be harder than expected. Since the new infrastructure is more detailed, the analyses had to be improved. The tricky parts were supporting variable reassignments and handling loops. Support for pointers and OOP was added, and the analysis was enabled on all gradient tests numerically, which makes it almost default. However, more remains to be done to produce even less code.
58+
59+
### 3. Benchmarks
60+
61+
To measure how much difference the analysis makes, we used the LULESH benchmark. The difference in execution time was about 5% across all problem sizes, which is a good result for an analysis this small.
62+
63+
In trivial cases like the ```kernel_call``` function above, we got up to 5x speedup with a given number of blocks/threads.
64+
65+
## Future Work
66+
67+
- Adding more capabilities to the Varied Analysis
68+
- Adding more indices to consider injective
69+
70+
## Related Links
71+
72+
- [Clad Repository](https://github.com/vgvassilev/clad)
73+
- [My GitHub Profile](https://github.com/ovdiiuv)
125 KB
Binary file not shown.
265 KB
Binary file not shown.

images/blog/genome_query_time.png

96.8 KB
