Skip to content

Conversation

@smian1
Copy link
Collaborator

@smian1 smian1 commented Dec 29, 2025

docs: Update identity and memory extraction guidelines in prompts.py

  • Added critical identity rules for name recognition and memory extraction
  • Included banned language and consolidation checks to improve memory accuracy
  • Clarified rules for extracting user-related information and avoiding duplicates

@smian1 smian1 requested review from Copilot and mdmohsin7 and removed request for Copilot December 29, 2025 08:17
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant improvements to the memory extraction prompts in prompts.py. The added rules for identity, banned language, deduplication, and bad examples are well-defined and should greatly enhance the quality and accuracy of the generated memories. However, I've identified one critical issue: a contradiction between a newly added rule and an existing one, which could lead to incorrect behavior by the language model. My review includes a comment and a suggestion to resolve this ambiguity, ensuring that valid user information updates are not accidentally discarded.

Comment on lines 279 to 283
• Family consistency: Don't create children that contradict existing family structure
• Location consistency: Don't claim multiple contradictory home locations
• Career consistency: Don't claim conflicting job titles or employers simultaneously
If a fact seems mathematically impossible or contradicts existing memories, DO NOT extract.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There is a critical contradiction in the prompt's instructions. The new LOGIC CHECK section on line 283 states, 'If a fact... contradicts existing memories, DO NOT extract.' However, an existing rule in the CRITICAL DEDUPLICATION & UPDATES RULES section (line 185) explicitly says, 'If a new memory CONTRADICTS or UPDATES an existing one, YOU MUST ADD IT.'

This conflict could confuse the model, causing it to discard valid updates to a user's information (e.g., a change of job or address), treating them as logical errors. The instructions should be clarified to distinguish between illogical contradictions (which should be ignored) and valid updates over time (which should be captured).

Suggested change
Family consistency: Don't create children that contradict existing family structure
Location consistency: Don't claim multiple contradictory home locations
Career consistency: Don't claim conflicting job titles or employers simultaneously
If a fact seems mathematically impossible or contradicts existing memories, DO NOT extract.
Family consistency: Avoid creating contradictory family structures simultaneously (e.g., claiming someone has children and no children at the same time).
Location consistency: Avoid claiming someone lives in multiple locations simultaneously.
Career consistency: Avoid claiming conflicting primary job titles or employers simultaneously.
If a fact seems mathematically impossible or creates a nonsensical contradiction (and is not a valid update over time), DO NOT extract.

Copilot AI review requested due to automatic review settings December 29, 2025 09:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances memory extraction guidelines in prompts.py and adds speaker name functionality to webhook and developer API responses. The changes aim to improve memory accuracy by providing clearer identity rules, consolidation checks, and examples of what should and shouldn't be extracted. Additionally, the PR enriches transcript segments with human-readable speaker names based on person_id mappings.

Key Changes

  • Added comprehensive identity rules and validation checks to prevent hallucinations and duplicate person creation in memory extraction
  • Introduced speaker_name enrichment functionality for transcript segments in webhook payloads and developer API responses
  • Expanded documentation with banned language patterns, additional bad examples, and logic consistency checks

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
backend/utils/prompts.py Added identity rules, banned language patterns, consolidation checks, and additional bad examples to improve memory extraction quality
backend/utils/webhooks.py Added _add_speaker_names_to_payload function and integrated it into conversation webhook to include speaker names in transcript segments
backend/routers/developer.py Added _add_speaker_names_to_segments function and integrated it into conversation GET endpoints to enrich transcript segments with speaker names when requested

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 35 to 54
def _add_speaker_names_to_payload(uid, payload: dict):
"""Add speaker_name to transcript segments in webhook payload."""
segments = payload.get('transcript_segments', [])
if not segments:
return

person_ids = [seg.get('person_id') for seg in segments if seg.get('person_id')]
people_map = {}
if person_ids:
people_data = users_db.get_people_by_ids(uid, list(set(person_ids)))
people_map = {p['id']: p['name'] for p in people_data}

for seg in segments:
if seg.get('is_user'):
seg['speaker_name'] = 'User'
elif seg.get('person_id') and seg['person_id'] in people_map:
seg['speaker_name'] = people_map[seg['person_id']]
else:
seg['speaker_name'] = f"Speaker {seg.get('speaker_id', 0)}"

Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function _add_speaker_names_to_payload and _add_speaker_names_to_segments in developer.py (lines 683-704) contain nearly identical logic. Consider extracting this shared logic into a common utility function to avoid code duplication and improve maintainability. The only difference is the parameter types (dict vs list), which could be unified.

Copilot uses AI. Check for mistakes.
Comment on lines 156 to 157
2. **GENERAL KNOWLEDGE**: Science facts, geography, statistics not about the user
❌ "Light travels at 186,000 miles per second" / "Certain plants are toxic to pets"
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a contradiction in the guidelines. Line 156-157 states "GENERAL KNOWLEDGE: Science facts, geography, statistics not about the user" should NEVER be extracted, but earlier examples in lines 54, 60-61, 66-67, and 233 show that interesting/shareable knowledge facts ARE valid memories if they are framed as things the user learned (e.g., "{user_name} learned that honey never spoils"). Consider clarifying that general knowledge should be extracted when framed as user learnings, but not when simply stating facts without connection to the user.

Suggested change
2. **GENERAL KNOWLEDGE**: Science facts, geography, statistics not about the user
"Light travels at 186,000 miles per second" / "Certain plants are toxic to pets"
2. **STANDALONE GENERAL KNOWLEDGE**: Science facts, geography, statistics stated without connection to {user_name}'s learning or life
"Light travels at 186,000 miles per second" / "Certain plants are toxic to pets"
Exception: Allowed ONLY when explicitly framed as something {user_name} learned and it clearly passes the interesting memory criteria above

Copilot uses AI. Check for mistakes.
ADDITIONAL BAD EXAMPLES:
**Transient/Temporary (will be outdated):**
❌ "{user_name} is working on a new app"
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a contradiction in the guidelines. Line 94 shows "{user_name} and Jamie are working on the Q4 budget presentation" as a valid example, but line 245 in the ADDITIONAL BAD EXAMPLES section states "{user_name} is working on a new app" should not be extracted because it's transient. These appear to contradict each other - both describe temporary work activities. Consider clarifying when work-in-progress should or should not be extracted, or ensuring consistency between these two sections.

Copilot uses AI. Check for mistakes.
return obj


def _add_speaker_names_to_payload(uid, payload: dict):
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function signature lacks type hints for the parameters. Consider adding type hints for better code clarity and type checking. For example: def _add_speaker_names_to_payload(uid: str, payload: dict) -> None:

Suggested change
def _add_speaker_names_to_payload(uid, payload: dict):
def _add_speaker_names_to_payload(uid: str, payload: dict) -> None:

Copilot uses AI. Check for mistakes.
people_map = {}
if person_ids:
people_data = users_db.get_people_by_ids(uid, list(set(person_ids)))
people_map = {p['id']: p['name'] for p in people_data}
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dictionary comprehension people_map = {p['id']: p['name'] for p in people_data} assumes all person dictionaries have 'id' and 'name' keys. If any person document is missing these fields, it will raise a KeyError. Consider adding defensive checks or using .get() method: people_map = {p['id']: p['name'] for p in people_data if 'id' in p and 'name' in p} to handle potentially malformed data gracefully.

Suggested change
people_map = {p['id']: p['name'] for p in people_data}
people_map = {
p['id']: p['name']
for p in people_data
if isinstance(p, dict) and 'id' in p and 'name' in p
}

Copilot uses AI. Check for mistakes.
people_map = {}
if all_person_ids:
people_data = users_db.get_people_by_ids(uid, list(all_person_ids))
people_map = {p['id']: p['name'] for p in people_data}
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dictionary comprehension people_map = {p['id']: p['name'] for p in people_data} assumes all person dictionaries have 'id' and 'name' keys. If any person document is missing these fields, it will raise a KeyError. Consider adding defensive checks or using .get() method: people_map = {p['id']: p['name'] for p in people_data if 'id' in p and 'name' in p} to handle potentially malformed data gracefully.

Suggested change
people_map = {p['id']: p['name'] for p in people_data}
people_map = {p['id']: p['name'] for p in people_data if 'id' in p and 'name' in p}

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Contributor

Hey @smian1 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

  • Project standards — Ensuring consistency across the codebase
  • User needs — Making sure changes align with what our users need
  • Code best practices — Maintaining code quality and maintainability
  • Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant