Synkro

Turn policies, handbooks, and documentation into high-quality training data for fine-tuning LLMs.

Features

  • Quality Evaluation - Each response is graded and automatically refined if it fails
  • Multiple Formats - Conversation (multi-turn), Instruction (single-turn), Evaluation (Q&A), and Tool Calling
  • Eval Platform Support - Export to LangSmith, Langfuse, or generic Q&A format
  • Tool Call Training - Generate OpenAI function calling format for teaching models to use custom tools
  • Top LLM Providers - OpenAI, Anthropic, Google, and local models (Ollama, vLLM)
  • File Support - PDF, DOCX, TXT, Markdown, URLs
  • CLI Included - Generate datasets from the command line
  • Cost Tracking - See total cost and LLM call breakdown after each generation

Installation

pip install synkro

Quick Start

from synkro.pipelines import create_pipeline
from synkro.models.google import Google
from synkro.types import DatasetType

pipeline = create_pipeline(
    model=Google.GEMINI_25_FLASH,          # Fast generation
    grading_model=Google.GEMINI_25_PRO,    # Quality grading
    dataset_type=DatasetType.CONVERSATION,
)

dataset = pipeline.generate(
    "All expenses over $50 require manager approval.",
    traces=50,
)
dataset.save("training.jsonl")

From Files

from synkro.pipelines import create_pipeline
from synkro.core.policy import Policy

policy = Policy.from_file("handbook.pdf")  # PDF, DOCX, TXT, MD
pipeline = create_pipeline()
dataset = pipeline.generate(policy, traces=100)
dataset.save()

From URLs

from synkro.pipelines import create_pipeline
from synkro.core.policy import Policy

policy = Policy.from_url("https://example.com/terms")
pipeline = create_pipeline()
dataset = pipeline.generate(policy)

Dataset Types

Type          Turns  Output Formats           Best For
CONVERSATION  Multi  messages, chatml         Fine-tuning chat models
INSTRUCTION   1      messages, chatml         Instruction-following models
EVALUATION    1      qa, langsmith, langfuse  LLM evaluation & benchmarks
TOOL_CALL     Multi  tool_call, chatml        Teaching tool use

Conversation (Default)

from synkro.types import DatasetType

pipeline = create_pipeline(dataset_type=DatasetType.CONVERSATION)
dataset = pipeline.generate(policy)

Output (multi-turn):

{"messages": [
  {"role": "user", "content": "What's the approval process for $350?"},
  {"role": "assistant", "content": "For a $350 expense, you need manager approval..."},
  {"role": "user", "content": "What if my manager is unavailable?"},
  {"role": "assistant", "content": "You can request approval from..."}
]}

Instruction

pipeline = create_pipeline(dataset_type=DatasetType.INSTRUCTION)
dataset = pipeline.generate(policy)

Output (single-turn):

{"messages": [
  {"role": "user", "content": "What's the approval process for $350?"},
  {"role": "assistant", "content": "For a $350 expense, you need manager approval. Submit the expense report with receipt..."}
]}

Evaluation

Generate Q&A datasets for LLM evaluation with ground truth:

pipeline = create_pipeline(dataset_type=DatasetType.EVALUATION)
dataset = pipeline.generate(policy, traces=50)

# Save in different formats (distinct paths so each save is kept)
dataset.save("eval_qa.jsonl", format="qa")                # Generic Q&A
dataset.save("eval_langsmith.jsonl", format="langsmith")  # LangSmith format
dataset.save("eval_langfuse.jsonl", format="langfuse")    # Langfuse format

Output (format="qa"):

{
  "question": "Can I submit a $200 expense without a receipt?",
  "answer": "All expenses require receipts per policy...",
  "expected_outcome": "Deny - missing receipt violates R003",
  "ground_truth_rules": ["R003", "R005"],
  "difficulty": "negative",
  "category": "Receipt Requirements"
}

Output (format="langsmith"):

{
  "inputs": {"question": "...", "context": "..."},
  "outputs": {"answer": "..."},
  "metadata": {"expected_outcome": "...", "ground_truth_rules": [...]}
}

Output (format="langfuse"):

{
  "input": {"question": "...", "context": "..."},
  "expectedOutput": {"answer": "...", "expected_outcome": "..."},
  "metadata": {"ground_truth_rules": [...], "difficulty": "..."}
}

Tool Calling

Generate training data for teaching models when and how to use your custom tools:

from synkro import create_pipeline, ToolDefinition, DatasetType

# Define your tools
web_search = ToolDefinition(
    name="web_search",
    description="Search the web for current information",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"}
        },
        "required": ["query"]
    },
    mock_responses=["NYC: 72°F, sunny", "BTC: $67,234"]
)

# Create pipeline with tools
pipeline = create_pipeline(
    dataset_type=DatasetType.TOOL_CALL,
    tools=[web_search],
)

# Generate from tool usage guidelines
dataset = pipeline.generate("""
Use web_search for real-time data like weather, prices.
Answer general questions directly without tools.
""", traces=20)

dataset.save("tool_training.jsonl", format="tool_call")  # OpenAI format
dataset.save("tool_training.jsonl", format="chatml")     # ChatML with XML tags

Output Formats:

OpenAI function calling (format="tool_call"):

{"messages": [
  {"role": "user", "content": "What's the weather in NYC?"},
  {"role": "assistant", "content": null, "tool_calls": [
    {"id": "call_abc", "type": "function", "function": {"name": "web_search", "arguments": "{\"query\": \"weather NYC\"}"}}
  ]},
  {"role": "tool", "tool_call_id": "call_abc", "content": "NYC: 72°F, sunny"},
  {"role": "assistant", "content": "The weather in NYC is 72°F and sunny."}
]}

ChatML with XML tags (format="chatml"):

{"messages": [
  {"role": "user", "content": "What's the weather in NYC?"},
  {"role": "assistant", "content": "<tool_call>\n{\"name\": \"web_search\", \"arguments\": {\"query\": \"weather NYC\"}}\n</tool_call>"},
  {"role": "tool", "content": "<tool_response>\nNYC: 72°F, sunny\n</tool_response>"},
  {"role": "assistant", "content": "The weather in NYC is 72°F and sunny."}
]}

Evaluation & Grading

Every response is graded on policy compliance, citations, and reasoning. Responses that fail are automatically refined, up to the configured max_iterations attempts.

from synkro.pipelines import create_pipeline
from synkro.models.openai import OpenAI

pipeline = create_pipeline(
    model=OpenAI.GPT_4O_MINI,       # Fast generation
    grading_model=OpenAI.GPT_4O,    # Quality grading
    max_iterations=3,               # Refinement attempts
)

dataset = pipeline.generate(policy, traces=100)

# Check quality
print(f"Pass rate: {dataset.passing_rate:.1%}")

# Filter to only passing traces
high_quality = dataset.filter(passed=True)
high_quality.save("training.jsonl")

Eval API

Generate test scenarios and grade your own model's responses for policy compliance.

import synkro

# Generate scenarios with ground truth (no synthetic responses)
result = synkro.generate_scenarios(
    policy="Expenses over $50 require manager approval...",
    count=100,
)

# Each scenario has ground truth labels
for scenario in result.scenarios:
    print(scenario.user_message)       # "Can I expense a $200 dinner?"
    print(scenario.expected_outcome)   # "Requires manager approval per R001"
    print(scenario.target_rule_ids)    # ["R001", "R003"]
    print(scenario.scenario_type)      # "positive" | "negative" | "edge_case" | "irrelevant"

# Grade YOUR model's responses
for scenario in result.scenarios:
    response = my_model(scenario.user_message)  # Your model
    grade = synkro.grade(response, scenario, policy)

    if not grade.passed:
        print(f"Failed: {grade.feedback}")

When to Use

Use Case                 API
Generate training data   synkro.generate()
Generate eval scenarios  synkro.generate_scenarios()
Grade external model     synkro.grade()

Scenario Types

Scenarios are generated with balanced coverage:

Type        Share  Description
positive    35%    Happy path - user meets all criteria
negative    30%    Violations - user fails one criterion
edge_case   25%    Boundary conditions at exact limits
irrelevant  10%    Outside policy scope
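
To check that a generated batch actually matches this distribution, a quick sketch using only the scenario_type field (and the result object from the Eval API example above):

# Sketch: count scenarios per type to verify coverage balance
from collections import Counter

counts = Counter(s.scenario_type for s in result.scenarios)
for scenario_type, n in sorted(counts.items()):
    print(f"{scenario_type}: {n}")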

EvalScenario Fields

scenario.user_message      # The test input
scenario.expected_outcome  # Ground truth behavior
scenario.target_rule_ids   # Rules being tested
scenario.scenario_type     # positive/negative/edge_case/irrelevant
scenario.category          # Policy category
scenario.context           # Additional context
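
These fields map directly to records an external eval harness can consume. A minimal sketch that serializes them to JSONL (field names taken from the list above; the output path is illustrative):

import json

# Sketch: persist scenarios as JSONL for a separate eval harness
with open("scenarios.jsonl", "w") as f:
    for s in result.scenarios:
        record = {
            "user_message": s.user_message,
            "expected_outcome": s.expected_outcome,
            "target_rule_ids": s.target_rule_ids,
            "scenario_type": s.scenario_type,
            "category": s.category,
        }
        f.write(json.dumps(record) + "\n")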

Temperature

Use temperature to control output diversity:

# High temp for diverse scenario coverage
result = synkro.generate_scenarios(policy, temperature=0.8)

# Low temp for deterministic training data
dataset = synkro.generate(policy, temperature=0.2)

Cost & Performance

Approximate costs using Gemini 2.5 Flash (multi-turn conversations):

Traces  LLM Calls  Time      Cost
100     ~335       ~13 min   ~$3
500     ~1,675     ~1 hour   ~$14
1000    ~3,350     ~2 hours  ~$28

Based on ~3.3 LLM calls per trace (generation + grading) with max_iterations=3. Actual costs vary by policy complexity and turn count.
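
As a back-of-envelope formula derived from the table above (the per-trace figures of ~3.35 calls and ~$0.028 are assumptions read off those rows, not fixed constants):

# Sketch: rough cost estimate; per-trace figures are assumptions
def estimate(traces: int) -> tuple[int, float]:
    return round(traces * 3.35), traces * 0.028  # (LLM calls, USD)

calls, cost = estimate(500)
print(f"~{calls} LLM calls, ~${cost:.0f}")  # ~1675 LLM calls, ~$14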

Local LLMs

Run with Ollama, vLLM, or any OpenAI-compatible endpoint:

from synkro import create_pipeline
from synkro.models import Local

# Ollama
pipeline = create_pipeline(model=Local.OLLAMA("llama3.2"))

# vLLM
pipeline = create_pipeline(model=Local.VLLM("mistral-7b"))

# Custom endpoint
pipeline = create_pipeline(model=Local.CUSTOM("my-model", endpoint="http://localhost:8080"))

CLI:

synkro generate policy.pdf --provider ollama --model llama3.2
synkro generate policy.pdf --provider vllm --endpoint http://localhost:8000

CLI

# From file
synkro generate policy.pdf --traces 50

# From text
synkro generate "All expenses over $50 need approval" -n 20

# From URL
synkro generate https://example.com/policy -o training.jsonl

# Skip interactive mode
synkro generate policy.pdf --no-interactive

# Quick demo with built-in policy
synkro demo

Options:

  • --traces, -n - Number of traces (default: 20)
  • --output, -o - Output file path
  • --model, -m - Model for generation
  • --format, -f - Output format: messages, qa, langsmith, langfuse, tool_call, chatml
  • --provider, -p - LLM provider for local models (ollama, vllm)
  • --endpoint, -e - Custom API endpoint URL
  • --interactive/-i, --no-interactive/-I - Review/edit extracted rules before generation (default: on)
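
Flags can be combined, for example:

synkro generate handbook.pdf -n 100 -f chatml -o handbook_training.jsonl --no-interactive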

Interactive Mode

By default, synkro extracts policy rules into a Logic Map and lets you review/edit them before generation. The interactive session also shows the recommended conversation turns based on policy complexity:

╭─────────────────────────── Conversation Settings ────────────────────────────╮
│  Complexity:  Conditional                                                    │
│  Turns:       3                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯

╭────────────────────────── 📜 Logic Map (3 rules) ────────────────────────────╮
│ ├── R001: Expenses over $50 require manager approval                         │
│ ├── R002: Client meals limited to $75/person                                 │
│ └── R003: Receipts required for all expenses                                 │
╰──────────────────────────────────────────────────────────────────────────────╯

Enter feedback: shorter conversations
✓ Set to 2 turns (User requested shorter/simpler conversations)

Enter feedback: add a rule for travel expenses
✓ Added R004: Travel expenses over $500 require VP approval

Enter feedback: done
✅ Session complete - 1 rule change(s), 2 turns

You can adjust both conversation turns and rules using natural language:

Input                    Action
"shorter conversations"  Reduce turns (1-2)
"I want 5 turns"         Set a specific turn count
"more thorough"          Increase turns (5-6)
"remove R002"            Delete a rule
"add a rule for..."      Add a new rule

Commands: done, undo, reset, show R001, help

Advanced Features

Checkpointing

Resume interrupted generations:

pipeline = create_pipeline(checkpoint_dir="./checkpoints")
dataset = pipeline.generate(policy, traces=100)  # Resumes from checkpoint

Dataset Operations

# Filter by quality
high_quality = dataset.filter(passed=True)

# Remove duplicates
unique = dataset.dedupe(threshold=0.85)

# Check pass rate
print(f"Pass rate: {dataset.passing_rate:.1%}")

Folder Loading

Generate from multiple documents at once:

from synkro.core.policy import Policy

policy = Policy.from_file("policies/")  # Loads all PDF, DOCX, TXT, MD files
dataset = pipeline.generate(policy, traces=100)

Thinking Mode

Generate training data with explicit reasoning in <think> tags, compatible with Qwen3 and DeepSeek-R1:

pipeline = create_pipeline(thinking=True)
dataset = pipeline.generate(policy, traces=50)

Output:

{"messages": [
  {"role": "user", "content": "Can I expense a $350 team dinner?"},
  {"role": "assistant", "content": "<think>\nLet me check the expense policy...\n- Rule: Expenses over $50 require manager approval\n- $350 exceeds the $50 threshold\n- Manager approval is required\n</think>\n\nFor a $350 team dinner, you'll need manager approval since it exceeds the $50 threshold. Please submit your expense report with the receipt and request approval from your manager."}
]}

Thinking mode works with the CONVERSATION, INSTRUCTION, and TOOL_CALL dataset types.

Logic Map Inspection

Access the extracted rules programmatically:

result = pipeline.generate(policy, traces=50, return_logic_map=True)

# Inspect extracted rules
for rule in result.logic_map.rules:
    print(f"{rule.rule_id}: {rule.text}")

# Get the dataset
dataset = result.dataset