compiler-research
diff --git a/‎.github/actions/spelling/allow/terms.txt‎
Lines changed: 8 additions & 1 deletion b/‎.github/actions/spelling/allow/terms.txt‎
diff --git a/‎_data/crconlist2025.yml‎
Lines changed: 3 additions & 4 deletions b/‎_data/crconlist2025.yml‎
diff --git a/‎_data/standing_meetings.yml‎
Lines changed: 8 additions & 0 deletions b/‎_data/standing_meetings.yml‎
diff --git a/‎_posts/2025-11-14-using-root-in-the-field-of-genome-sequencing.md‎
Lines changed: 128 additions & 0 deletions b/‎_posts/2025-11-14-using-root-in-the-field-of-genome-sequencing.md‎
diff --git a/‎_posts/2025-14-11-activity-for-cuda.md‎
Lines changed: 73 additions & 0 deletions b/‎_posts/2025-14-11-activity-for-cuda.md‎
diff --git a/‎assets/presentations/Aditya_Pandey_GSoC2025_final.pdf‎
125 KB b/‎assets/presentations/Aditya_Pandey_GSoC2025_final.pdf‎
diff --git a/‎assets/presentations/Maksym_Andriichuk_final_gsoc_atomic.pdf‎
265 KB b/‎assets/presentations/Maksym_Andriichuk_final_gsoc_atomic.pdf‎
diff --git a/‎images/blog/genome_query_time.png‎
96.8 KB b/‎images/blog/genome_query_time.png‎
@@ -32,7 +32,9 @@ Ohridski
 OMP
 OpenMP
 PTX
+QNAME
 RAII
+RNAME
 Resugaring
 SBO
 Slib
@@ -58,6 +60,7 @@ gitlab
 gpu
 gridlay
 gsoc
+hpc
 jit
 jitlink
 jthread
@@ -72,8 +75,11 @@ oop
 pushforward
 pythonized
 ramview
+ramntupleview
 reoptimize
+rntuple
 samtools
+samtoramntuple
 sbo
 sitemap
 softsusy
@@ -158,4 +164,5 @@ cartopiax
 Oncoprotein
 oncoprotein
 organoids
-paraview
+paraview
+
@@ -41,7 +41,7 @@
         that detects possible data-race conditions that would enable reducing atomic 
         operations in the Clad-produced code.
         
-      # slides: /assets/presentations/...
+      slides: /assets/presentations/Maksym_Andriichuk_final_gsoc_atomic.pdf
 
     - title: "Enable automatic differentiation of OpenMP programs with Clad"
       speaker:
@@ -59,7 +59,6 @@
         AST parsing and designing corresponding differentiation strategies. Additional 
         contributions include example applications and comprehensive tests.
 
-        
       # slides: /assets/presentations/...
 
     - title: "Using ROOT in the field of Genome Sequencing"
@@ -82,8 +81,7 @@
         FASTQ compression from 14.2GB to 6.8GB. We also developed chromosome based 
         file-splitting for larger genome file so that chromosome based data can be extracted. 
 
-        
-      # slides: /assets/presentations/...
+      slides: /assets/presentations/Aditya_Pandey_GSoC2025_final.pdf
 
 - name: "CompilerResearchCon 2025 (day 1)"
   date: 2025-10-30 15:00:00 +0200
@@ -193,3 +191,4 @@
         comprehensive unit tests.
         
       slides: /assets/presentations/Abdelrhman_final_presentation_support_usage_of_Thrust_API_in_clad.pdf
+
@@ -7,6 +7,10 @@
       date: 2025-11-13 15:00:00 +0200
       speaker: "Abhinav Kumar"
       link: "[Slides](/assets/presentations/Abhinav_Kumar_GSoC25_final.pdf)"
+    - title: "Summary: Activity analysis for reverse-mode differentiation of (CUDA) GPU kernels"
+      date: 2025-11-13 15:30:00 +0200
+      speaker: "Maksym Andriichuk"
+      link: "[Slides](/assets/presentations/Maksym_Andriichuk_final_gsoc_atomic.pdf)"
     - title: "Wrap-Up: Implement and improve an efficient, layered tape with prefetching capabilities"
       date: 2025-10-30 15:40:00 +0200
       speaker: "Aditi Milind Joshi"
@@ -445,4 +449,8 @@
       date: 2025-10-30 15:20:00 +0200
       speaker: "Rohan Timmaraju"
       link: "[Slides](/assets/presentations/Rohan_Timmaraju_GSoC25_final.pdf)"
+    - title: "Final Presentation: Using ROOT in the field of Genome Sequencing"
+      date: 2025-11-13 16:20:00 +0200
+      speaker: "Aditya Pandey"
+      link: "[Slides](/assets/presentations/Aditya_Pandey_GSoC2025_final.pdf)"
 
@@ -0,0 +1,128 @@
+---
+title: "RAMTools: Extending ROOT for Genomic Data Processing"
+layout: post
+excerpt: "A GSoC 2025 project extending CERN's ROOT framework with the RNTuple format to efficiently process, store, and query large-scale genomic data."
+sitemap: true
+author: Aditya Pandey
+permalink: blogs/gsoc25_aditya_pandey_final_blog/
+banner_image: /images/blog/gsoc-banner.png
+date: 2025-11-15
+tags: gsoc c++ genomics bioinformatics cern root rntuple hpc
+---
+
+## Introduction
+
+Hello! I'm Aditya Pandey, and this summer I had the privilege of participating in Google Summer of Code (GSoC) 2025 with CERN-HSF as part of the Compiler Research Group. It has been an incredible experience working with my mentors, Vassil Vassilev and Martin Vassilev, on a project that bridges the gap between high-energy physics (HEP) and genomics.
+
+## Project Overview
+
+RAMTools is a project that extends ROOT, CERN's data processing framework, to efficiently handle genomic sequencing data. While ROOT was designed for petabyte-scale physics data, its columnar storage, parallel I/O, and compression features are well suited to the challenges of modern genomics.
+
+The core problem with traditional genomic formats like SAM/BAM is that they are row-oriented, making analytical queries on massive datasets slow and inefficient. My project introduces **RAM (ROOT Alignment/Map)**, a new system that leverages ROOT's latest columnar format, RNTuple. This provides:
+
+- **Columnar Storage**: Optimal for fast analytical queries and high compression ratios.
+- **Parallel I/O**: Built-in support for concurrent read/write operations.
+- **Modern Compression**: Support for multiple algorithms (LZ4, LZMA, ZLIB, ZSTD).
+
+By converting SAM data to the RNTuple format, we can achieve significant performance gains in both storage and query speed.
+
+## Technical Implementation
+
+The project was implemented in C++17 and built using CMake, relying on the ROOT framework (version 6.26+) for its RNTuple I/O subsystem.
+
+### Architecture Components
+
+1. **SAM Parser**: A custom, high-performance C++17 parser optimized for streaming and processing extremely large SAM files.
+
+2. **RNTuple Writer**: An efficient data model that maps the fields of a SAM record (QNAME, FLAG, RNAME, POS, etc.) to a columnar RNTuple structure.
+
+3. **Chromosome Splitter**: A key feature that partitions the output into separate files by chromosome, so that downstream analysis can be run on each chromosome in parallel.
+
+4. **Region Query Engine**: A fast query tool that leverages RNTuple's selective column reading to extract genomic regions (e.g., chr1:10150-10300) without reading the entire file.
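
To illustrate why a columnar layout helps region queries, here is a minimal C++17 sketch that keeps each SAM field in its own array. It deliberately does not use the RNTuple API: `AlignmentColumns`, `append_sam_line`, and `count_in_region` are hypothetical names invented for this example, not taken from the RAMTools sources. The only assumption about the input is the standard order of the first four SAM fields (QNAME, FLAG, RNAME, POS).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical struct-of-arrays layout: one vector ("column") per SAM field.
struct AlignmentColumns {
    std::vector<std::string> qname;
    std::vector<int> flag;
    std::vector<std::string> rname;
    std::vector<std::int64_t> pos;
};

// Appends the first four mandatory fields of one SAM alignment line.
// Returns false for header lines (starting with '@') and malformed lines.
bool append_sam_line(const std::string& line, AlignmentColumns& cols) {
    if (line.empty() || line[0] == '@') return false;
    std::istringstream in(line);
    std::string qname, flag, rname, pos;
    if (!std::getline(in, qname, '\t') || !std::getline(in, flag, '\t') ||
        !std::getline(in, rname, '\t') || !std::getline(in, pos, '\t')) {
        return false;
    }
    try {
        // Convert both numbers first so a failure leaves the columns untouched.
        const int flag_value = std::stoi(flag);
        const std::int64_t pos_value = std::stoll(pos);
        cols.qname.push_back(qname);
        cols.flag.push_back(flag_value);
        cols.rname.push_back(rname);
        cols.pos.push_back(pos_value);
    } catch (const std::exception&) {
        return false;
    }
    return true;
}

// Counts records whose start position lies in [start, end] on `chrom`.
// Only the RNAME and POS columns are touched; QNAME and FLAG are never read.
std::size_t count_in_region(const AlignmentColumns& cols, const std::string& chrom,
                            std::int64_t start, std::int64_t end) {
    assert(cols.rname.size() == cols.pos.size());
    std::size_t hits = 0;
    for (std::size_t i = 0; i < cols.pos.size(); ++i) {
        if (cols.rname[i] == chrom && cols.pos[i] >= start && cols.pos[i] <= end) ++hits;
    }
    return hits;
}
```

The filter in `count_in_region` is simplified to the read start position; a real overlap test would also need the read length. The point of the sketch is only that a query touches two columns instead of every field of every record.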
+
+### Command-Line Tools
+
+The primary interaction with RAMTools is through two command-line executables:
+
+#### SAM to RAM Conversion (`samtoramntuple`)
+
+Converts a standard SAM file into the optimized RNTuple-based RAM format.
+
+```bash
+# Basic conversion
+./tools/samtoramntuple input.sam output.root
+
+# Split by chromosome for parallel processing
+# (Creates output-chr1.root, output-chr2.root, etc.)
+./tools/samtoramntuple input.sam output -split
+```
+
+#### Region Querying (`ramntupleview`)
+
+Queries a specific genomic region from a RAM file, similar to `samtools view`.
+
+```bash
+# Usage: ./tools/ramntupleview [input.root] "[chromosome]:[start]-[end]"
+./tools/ramntupleview output.root "chr1:10150-10300"
+```
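
The region argument follows the familiar `chromosome:start-end` shape. As a rough sketch of how such a string could be split into its parts, the C++17 snippet below parses it with `std::from_chars`. The `Region` struct and the `parse_region` and `overlaps` functions are hypothetical names for this example; the actual parsing code in `ramntupleview` may differ, and the 1-based inclusive coordinate convention is an assumption.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical type, not taken from the RAMTools sources.
struct Region {
    std::string chrom;
    long start = 0;  // assumed 1-based, inclusive
    long end = 0;    // assumed 1-based, inclusive
};

// Parses "chr1:10150-10300" into a Region; returns std::nullopt on bad input.
std::optional<Region> parse_region(std::string_view text) {
    const auto colon = text.rfind(':');
    if (colon == std::string_view::npos || colon == 0) return std::nullopt;
    const auto dash = text.find('-', colon + 1);
    if (dash == std::string_view::npos) return std::nullopt;

    // Converts a whole string_view to long; rejects empty or partial matches.
    auto to_long = [](std::string_view s, long& out) {
        if (s.empty()) return false;
        auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), out);
        return ec == std::errc{} && ptr == s.data() + s.size();
    };

    Region r;
    r.chrom = std::string(text.substr(0, colon));
    if (!to_long(text.substr(colon + 1, dash - colon - 1), r.start)) return std::nullopt;
    if (!to_long(text.substr(dash + 1), r.end)) return std::nullopt;
    if (r.start < 1 || r.end < r.start) return std::nullopt;
    return r;
}

// True if the closed interval [read_start, read_end] intersects the region.
bool overlaps(const Region& r, long read_start, long read_end) {
    assert(read_start <= read_end);
    return read_start <= r.end && read_end >= r.start;
}
```

Returning `std::optional` keeps malformed input (a missing colon, a reversed range) from being silently turned into an empty query.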
+
+## Performance Achievements
+
+We benchmarked RAMTools using the HG00154 sample from the 1000 Genomes Project, which consists of 196 million reads in a 72.1 GB uncompressed SAM file.
+
+### Query Performance Comparison
+
+RNTuple's columnar architecture shows significant speedups, especially for large region queries, when compared to the older ROOT TTree format and to CRAM (an industry-standard compressed format).
+
+![Region Query Performance](/images/blog/genome_query_time.png)
+
+The benchmarks demonstrate performance across three query sizes:
+
+| Query Region | Size Category | Region Coordinates | RNTuple Time (s) | TTree Time (s) | CRAM Time (s) |
+|--------------|--------------|-------------------|------------------|----------------|---------------|
+| Small | 50M | chr1:1-50M | 6.69 | 1.29 | 0.34 |
+| Medium | 48M | chr21:1-48M | 6.84 | 35.70 | 7.81 |
+| Large | 100M | chr2:1-100M | 8.92 | 87.80 | 21.71 |
+
+For the small region (chr1:1-50M), CRAM performs best due to its reference-based compression optimizations for sequential access. However, as query size increases:
+
+- **Medium queries (chr21:1-48M)**: RNTuple is **5.2x faster** than TTree and competitive with CRAM
+- **Large queries (chr2:1-100M)**: RNTuple is **9.8x faster** than TTree and **2.4x faster** than CRAM
+
+The performance advantage of RNTuple becomes more pronounced with larger analytical queries, making it ideal for whole-chromosome or multi-gene region analyses common in genomics research.
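
The speedup factors quoted above follow directly from the timings in the table. As a small worked check, the helper below divides the baseline time by the RNTuple time; `speedup` and `round_to_tenth` are names made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Speedup of a candidate over a baseline: values above 1 mean the candidate is faster.
double speedup(double baseline_seconds, double candidate_seconds) {
    assert(candidate_seconds > 0.0);
    return baseline_seconds / candidate_seconds;
}

// Rounds to one decimal place, matching the precision used in the post.
double round_to_tenth(double x) {
    return std::round(x * 10.0) / 10.0;
}

// Timings from the table (seconds):
//   medium, TTree vs RNTuple: speedup(35.70, 6.84)  is about 5.2
//   medium, CRAM  vs RNTuple: speedup(7.81, 6.84)   is about 1.1
//   large,  TTree vs RNTuple: speedup(87.80, 8.92)  is about 9.8
//   large,  CRAM  vs RNTuple: speedup(21.71, 8.92)  is about 2.4
//   small,  CRAM  vs RNTuple: speedup(0.34, 6.69)   is below 1, so CRAM is faster there
```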
+
+### Storage and Compression
+
+RNTuple also provides excellent compression. The 72.1 GB SAM file was compressed down to 11.4 GB using ZSTD, a 6.3x compression ratio.
+
+| Format | Compression Algo | File Size (GB) | Additional Requirements | Total Storage (GB) |
+|--------|-----------------|----------------|------------------------|-------------------|
+| SAM | Uncompressed | 72.1 | - | 72.1 |
+| CRAM | Reference-based | 7.8 | 3.2 GB reference file | 11.0 |
+| RAM-RNTuple | ZSTD | 11.4 | Self-contained | 11.4 |
+| RAM-TTree | LZMA | 12.5 | - | 12.5 |
+| RAM-TTree | ZLIB | 16.7 | - | 16.7 |
+| RAM-TTree | LZ4 | 31.2 | - | 31.2 |
+
+The most significant achievement here is that the 11.4 GB RNTuple file is **completely self-contained**. This is a key advantage over formats like CRAM, which achieves a similar total storage size (11.0 GB) but is dependent on an external 3.2 GB reference genome. This self-contained nature simplifies data archival, distribution, and use in cloud environments immensely.
+
+## Repository & Documentation
+
+- **GitHub**: [RAMTools Repository](https://github.com/compiler-research/ramtools)
+
+## Future Work
+
+While GSoC has concluded, there is a clear path forward for RAMTools:
+
+1. **Broader Format Support**: Support additional genomic file formats to encourage wider adoption.
+
+2. **Further Query Optimization**: Explore multi-threading in the query engine to parallelize data retrieval.
+
+3. **Integration with Analysis Frameworks**: Investigate integration with popular bioinformatics frameworks or visualization tools.
+
+## Conclusion
+
+GSoC 2025 has been a phenomenal experience. I've had the opportunity to dive deep into high-performance C++ and solve real-world problems in genomics.
+
+I am immensely grateful to my mentors, Vassil Vassilev and Martin Vassilev, for their invaluable guidance, insightful code reviews, and constant support. I also want to extend my thanks to the entire ROOT team, CERN-HSF, and Google for making this project possible. I look forward to continuing my contributions to this exciting intersection of science and technology.
+
@@ -0,0 +1,73 @@
+---
+title: "Activity analysis for reverse-mode differentiation of (CUDA) GPU kernels"
+layout: post
+excerpt: "A summary of my GSoC 2025 project focusing on activity analysis for reverse-mode differentiation of (CUDA) GPU kernels."
+sitemap: true
+author: Maksym Andriichuk
+permalink: blogs/gsoc25_andriichuk_final_blog/
+banner_image: /images/blog/gsoc-clad-banner.png
+date: 2025-11-14
+tags: gsoc clad cuda clang c++
+---
+
+**Mentors:** Vassil Vassilev, David Lange
+
+## A Brief Introduction
+
+### Main idea
+
+Over a year ago, we added support for differentiating CUDA kernels using Clad. Read more on that [here](https://compiler-research.org/blogs/gsoc24_christina_koutsou_project_final_blog/). We introduced atomic operations in Clad to prevent race conditions that frequently appear because of how Clad handles statements like ```x=y``` in reverse mode. Since atomic operations are inefficient, we aim to remove them whenever we can prove that no race condition occurs.
+
+Another part of my GSoC project was to unify Varied and TBR analyses in how they store information during the analysis run. This would make the implementation of future analyses easier and remove even more adjoints, since Varied Analysis does not account for variable reassignments.
+
+## Project Implementation
+
+### 1. Removing atomic operations 
+
+Consider the code below:
+
+```cpp
+__global__ void kernel_call(double *out, double *in) {
+    int index = threadIdx.x + blockIdx.x * blockDim.x;
+    out[index] = in[index];
+}
+
+void fn(double *out, double *in) {
+    kernel_call<<<1, 16>>>(out, in);
+}
+```
+
+The adjoint that corresponds to ```out[index] = in[index]``` is:
+
+```cpp
+{
+    out[index0] = _t2;
+    double _r_d0 = _d_out[index0];
+    _d_out[index0] = 0.;
+    atomicAdd(&_d_in[index], _r_d0);
+}
+```
+
+Notice that in this case ```index``` is injective: no two threads, even from different blocks, compute the same value of ```index```. This means that when writing to ```_d_in[index]```, no two threads can write to the same memory location at the same time, so the atomic operation is not needed.
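
The injectivity property can be checked by brute force on the host for a small one-dimensional launch. The sketch below is plain C++ rather than CUDA and is not part of Clad: `is_injective`, `global_index`, and `thread_only_index` are hypothetical helpers that simply enumerate every (block, thread) pair and look for a repeated index.

```cpp
#include <cassert>
#include <unordered_set>

// Returns true if no two (block, thread) pairs of a 1-D launch map to the same index.
template <typename IndexFn>
bool is_injective(int grid_dim, int block_dim, IndexFn index_of) {
    assert(grid_dim > 0 && block_dim > 0);
    std::unordered_set<long> seen;
    for (int block = 0; block < grid_dim; ++block) {
        for (int thread = 0; thread < block_dim; ++thread) {
            // insert(...).second is false when the value was already present.
            if (!seen.insert(index_of(thread, block, block_dim)).second) return false;
        }
    }
    return true;
}

// Host-side model of threadIdx.x + blockIdx.x * blockDim.x.
long global_index(int thread_idx, int block_idx, int block_dim) {
    return thread_idx + static_cast<long>(block_idx) * block_dim;
}

// Host-side model of an index that ignores the block: threadIdx.x only.
long thread_only_index(int thread_idx, int /*block_idx*/, int /*block_dim*/) {
    return thread_idx;
}
```

With `global_index`, every launch configuration yields distinct indices. With `thread_only_index`, the same index appears once per block as soon as there is more than one block, which is the situation where the adjoint update would still need an atomic.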
+
+The implementation involves two static analyzers: one checks whether an index expression matches a particular form, and the other checks that the index is not modified afterwards. The hardest part is accounting for all possible term permutations of, say, ```threadIdx.x + blockIdx.x * blockDim.x``` and for expressions that depend on index linearly, e.g., ```2*index+1```.
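
One way to think about the term-permutation problem is to bring every candidate expression into a canonical form before comparing it with the known pattern. The sketch below does this on plain strings for sums of products without parentheses. It is only an illustration of the idea: Clad's analyzers work on the Clang AST, not on text, and `canonical_form` and `matches_global_index` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits `s` on `sep`, after the caller has removed spaces.
static std::vector<std::string> split(const std::string& s, char sep) {
    std::vector<std::string> parts;
    std::istringstream in(s);
    std::string part;
    while (std::getline(in, part, sep)) parts.push_back(part);
    return parts;
}

// Joins `parts` with `sep` between consecutive elements.
static std::string join(const std::vector<std::string>& parts, char sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += sep;
        out += parts[i];
    }
    return out;
}

// Canonical form of a sum of products: factors are sorted inside each
// product, then the products are sorted inside the sum.
std::string canonical_form(const std::string& expr) {
    assert(expr.find('(') == std::string::npos);  // parentheses are not supported
    std::string compact;
    for (char c : expr) {
        if (c != ' ') compact += c;
    }
    std::vector<std::string> terms;
    for (const std::string& term : split(compact, '+')) {
        std::vector<std::string> factors = split(term, '*');
        std::sort(factors.begin(), factors.end());
        terms.push_back(join(factors, '*'));
    }
    std::sort(terms.begin(), terms.end());
    return join(terms, '+');
}

// True if `expr` is some permutation of threadIdx.x + blockIdx.x * blockDim.x.
bool matches_global_index(const std::string& expr) {
    return canonical_form(expr) == canonical_form("threadIdx.x + blockIdx.x * blockDim.x");
}
```

Sorting makes `blockDim.x * blockIdx.x + threadIdx.x` and `threadIdx.x + blockIdx.x * blockDim.x` compare equal without listing every ordering by hand. Linear forms such as `2*index+1` would need an extra step that this sketch does not attempt.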
+
+### 2. Varied Analysis
+
+The implementation looked straightforward at first but turned out to be harder than expected. Since the new infrastructure is more detailed, the analyses had to be improved. The tricky parts were supporting variable reassignments and loop handling. Support for pointers and OOP was added, and the analysis was enabled on all gradient tests numerically, which makes it almost default. However, more remains to be done to produce even less code.
+
+### 3. Benchmarks
+
+To measure how much difference the analysis makes, we used the LULESH benchmark. The difference in execution time was about 5% across all problem sizes, which is pretty good for an analysis this small.
+
+In trivial cases like the ```kernel_call``` function above, we got up to 5x speedup with a given number of blocks/threads.
+
+## Future Work
+
+- Adding more capabilities to the Varied Analysis
+- Recognizing more index expressions as injective
+
+## Related Links
+
+- [Clad Repository](https://github.com/vgvassilev/clad)
+- [My GitHub Profile](https://github.com/ovdiiuv)