README suggestion #2

@frycast

Hi @adamzammit,

Thanks for your great work on queXML. We’re exploring it at The Social Research Centre!

I asked GitHub Copilot to generate a high-level summary of the codebase (note: no data was used to train models). I thought you might like to take a look and let me know if it's accurate.

In particular, do you see anything missing or incorrect in the file structure and navigation section? If you think it's useful, you’re welcome to adapt it for the README.

Thanks for your time!


Core Architecture

queXML is fundamentally an XML transformation engine for questionnaire processing. The system operates on a multi-layered architecture:

  1. XML Schema Foundation: The core is built around queXML's own XML schema, which defines questionnaire structure (sections, questions, responses, skip logic, etc.)

  2. XSLT Transformation Engine: Multiple XSLT stylesheets handle transformations to different output formats:

    • to_form.xslt - Converts to XSL-FO for PDF generation
    • to_phpsurveyor.xslt - Exports to LimeSurvey CSV format
    • to_q.xslt - Generates Q language for CASES software
    • to_ddi.xslt - Creates DDI (Data Documentation Initiative) metadata
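As a rough illustration of this one-source, many-targets design, the sketch below maps each output format to the stylesheet listed above and builds an `xsltproc` command line for it. The format keys, the `build_command` helper, and the choice of `xsltproc` as the XSLT processor are assumptions made for illustration; the repository may invoke the stylesheets differently.

```python
# Hypothetical dispatch table: output format -> stylesheet named in the list above.
STYLESHEETS = {
    "pdf": "to_form.xslt",  # produces XSL-FO, which is rendered to PDF afterwards
    "limesurvey": "to_phpsurveyor.xslt",
    "cases": "to_q.xslt",
    "ddi": "to_ddi.xslt",
}


def build_command(output_format, quexml_path, output_path):
    """Return an xsltproc argument list for the requested output format."""
    try:
        stylesheet = STYLESHEETS[output_format]
    except KeyError:
        raise ValueError(f"unknown output format: {output_format!r}") from None
    # xsltproc usage: xsltproc -o OUTPUT STYLESHEET INPUT
    return ["xsltproc", "-o", output_path, stylesheet, quexml_path]
```

The returned list can be passed to `subprocess.run` on a machine where `xsltproc` is installed.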

Key Technical Components

PDF Generation Engine (quexmlpdf.php):

  • Extends TCPDF library to create scannable paper forms
  • Manages layout by tracking columns, page breaks, and background color zones
  • Generates barcodes and corner detection marks for automated scanning (queXF integration)
  • Handles complex questionnaire elements: skip logic, response grids, and multi-column layouts

XML Processing Pipeline:
The system parses queXML documents and extracts structured data into arrays containing:

  • Questionnaire metadata (ID, title, sections)
  • Question hierarchies with response types and validation rules
  • Skip logic and conditional branching instructions
  • Layout specifications for different output modes
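To make the idea of extracting structured data concrete, here is a minimal sketch that parses a small questionnaire document into nested dictionaries and lists. The sample document is deliberately simplified and its element and attribute names are assumptions, not a faithful copy of the queXML schema; `parse_questionnaire` is a hypothetical helper and is not taken from quexmlpdf.php.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical document: real queXML has more elements and attributes.
SAMPLE = """
<questionnaire id="1">
  <title>Demo survey</title>
  <section id="A">
    <question>
      <text>Do you own a car?</text>
      <response varName="car">
        <category><label>Yes</label><value>1</value></category>
        <category><label>No</label><value>2</value><skipTo>Q3</skipTo></category>
      </response>
    </question>
  </section>
</questionnaire>
"""


def parse_questionnaire(xml_text):
    """Flatten the document into nested dicts and lists."""
    root = ET.fromstring(xml_text)
    sections = []
    for section in root.findall("section"):
        questions = []
        for question in section.findall("question"):
            response = question.find("response")
            categories = [
                {
                    "label": cat.findtext("label"),
                    "value": cat.findtext("value"),
                    "skip_to": cat.findtext("skipTo"),  # None when absent
                }
                for cat in question.findall("response/category")
            ]
            questions.append({
                "text": question.findtext("text"),
                "var_name": response.get("varName") if response is not None else None,
                "categories": categories,
            })
        sections.append({"id": section.get("id"), "questions": questions})
    return {"id": root.get("id"), "title": root.findtext("title"), "sections": sections}
```

Calling `parse_questionnaire(SAMPLE)` yields one section with one question whose second category carries a skip target.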

Multi-Modal Output Generation:

  • Paper forms: Creates PDF with precise positioning for optical mark recognition
  • Web surveys: Generates database insertion scripts for LimeSurvey
  • CATI systems: Produces Q language scripts with interviewer prompts and response validation
  • Data documentation: Exports DDI-compliant metadata for statistical analysis

Advanced Features

Layout Management: The PDF engine tracks the coordinates of sections, box groups, columns, and individual response boxes so that form elements are positioned precisely.
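The nesting described here (sections containing box groups, which contain boxes with coordinates) could be modelled roughly as follows. This is a sketch under stated assumptions: the class names, field names, millimetre units, and the `row_of_boxes` helper are all hypothetical and are not the data structures used in quexmlpdf.php.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    # Top-left and bottom-right corners; millimetres are an assumed unit.
    tlx: float
    tly: float
    brx: float
    bry: float
    value: str = ""


@dataclass
class BoxGroup:
    var_name: str
    boxes: list = field(default_factory=list)


@dataclass
class Section:
    label: str
    box_groups: list = field(default_factory=list)


def row_of_boxes(x, y, count, size=5.0, gap=2.0):
    """Lay out `count` square boxes left to right starting at (x, y)."""
    boxes = []
    for i in range(count):
        left = x + i * (size + gap)
        boxes.append(Box(left, y, left + size, y + size, value=str(i + 1)))
    return boxes
```

Recording every box's corners in one structure is what would let a scanning tool locate each response area on the printed page.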

Skip Logic Processing: The system maintains registries (skipToRegistry, skipToIndex) to handle complex conditional branching and question dependencies across multiple output formats.
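One plausible way to resolve skip targets is a two-pass approach: first index every question by its label, then look up each skip instruction against that index. The sketch below shows that idea only; the function name and the dict layout are hypothetical, and it does not claim to reproduce how `skipToRegistry` or `skipToIndex` actually behave in the codebase.

```python
def build_skip_index(questions):
    """Index questions by label, then resolve every skip instruction.

    `questions` is a list of dicts with a "label" and an optional "skip_to".
    Returns (index, resolved): `index` maps each label to its position, and
    `resolved` maps each skipping question's label to its target's position.
    """
    # Pass 1: record where every question sits, so forward references work.
    index = {q["label"]: pos for pos, q in enumerate(questions)}

    # Pass 2: resolve each skip against the completed index.
    resolved = {}
    for q in questions:
        target = q.get("skip_to")
        if target is None:
            continue
        if target not in index:
            raise KeyError(f"skip target {target!r} is not defined")
        resolved[q["label"]] = index[target]
    return index, resolved
```

Building the full index before resolving anything means a question can skip forward to a label that appears later in the document.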

Response Type Handling: Supports diverse question types (single/multiple choice, text input, numeric, arrays) with format-specific rendering logic.

Template System: Uses property sets and layout templates to maintain consistent styling across different questionnaire types and output modes.

The codebase essentially functions as a questionnaire compiler: it takes a single XML specification and generates survey instruments for several data collection environments while keeping the data structure consistent across all modes.

I'll analyze the codebase structure to provide a comprehensive file breakdown for navigation purposes.

Based on my analysis of the queXML codebase, here's a file breakdown for navigation that you can add to the README:

File Structure and Navigation

Core Processing Engine

  • quexmlpdf.php - Main PDF generation engine extending TCPDF. Handles layout management, barcode generation, and corner detection marks for scanning, and converts queXML to scannable paper forms with precise coordinate tracking.

  • index.php - Web interface for PDF generation. Provides form controls for page format, orientation, styling options, and file upload handling for queXML documents and style files.

  • quexmlpdf_example.php - Example implementation showing how to use the queXMLPDF class programmatically, including ZIP archive generation with banding XML.

XSLT Transformation Stylesheets

Primary Output Formats

  • to_form.xslt - Converts queXML to XSL-FO format for PDF generation. Handles page layout, form design principles, and paper-based questionnaire formatting with corner guides and barcodes.

  • to_phpsurveyor.xslt - Transforms queXML to LimeSurvey-compatible CSV format with database insertion scripts for web-based survey deployment.

  • to_q.xslt - Generates Q language scripts for CASES (Computer Assisted Survey Execution System) software, handling CATI questionnaire logic and interviewer prompts.

  • to_ddi.xslt - Creates DDI (Data Documentation Initiative) compliant metadata descriptions for statistical analysis and data documentation.

Supporting Stylesheets

  • to_form_page_layout.xslt - Layout templates and page structure definitions imported by the main form stylesheet.

  • to_form_property_sets.xslt - Style property sets and formatting rules imported by the main form stylesheet.

  • to_sda_varlist.xslt - Generates variable lists for SDA (Survey Documentation and Analysis) archive creation.

Data Processing Utilities

  • ddi_to_spss.xslt - Converts DDI metadata to SPSS syntax files for statistical software integration (also GNU PSPP compatible).

  • ddi_to.xslt - Reverse transformation from DDI back to basic queXML structure for document reconstruction.

Schema Definition

  • quexfbanding.xsd - XML Schema defining the structure for queXF banding documents that describe physical page elements, box coordinates, and form layout specifications for automated scanning.
