
@joel13samuel (Contributor) commented Dec 3, 2025

Summary by CodeRabbit

  • New Features

    • Introduced TensorLake Document AI Agent supporting document parsing and structured data extraction.
    • Added support for multiple document schema types (Real Estate, Invoices, Contracts).
    • Integrated text analysis capabilities for token-based document insights.
  • Chores

    • Added project configuration and environment setup files.
    • Included comprehensive documentation for installation, configuration, and usage.


coderabbitai bot (Contributor) commented Dec 3, 2025

Walkthrough

This pull request introduces a new TensorLake Document AI agent module within the Agentuity framework. It includes configuration files, comprehensive documentation, project setup files, and a complete agent implementation supporting document parsing, status polling, schema introspection, and text analysis with structured data extraction.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Configuration & Tooling | `agents/tensorlake/tensorlake-agent/.editorconfig`, `.gitignore`, `.python-version`, `pyproject.toml`, `agentuity.yaml` | Added editor configuration for consistent formatting (2-space indents, LF line endings, UTF-8), comprehensive Python project ignore patterns, a Python 3.11 version specification, project dependencies (agentuity, tensorlake, pydantic), and Agentuity framework configuration with development/deployment commands and agent metadata. |
| Documentation | `README.md`, `AGENTS.md` | Added a user-facing README documenting TensorLake Agent setup, usage, deployment, and troubleshooting; added a developer guide for Python agent development on the Agentuity platform, including core interfaces, storage APIs, and best practices. |
| Package Structure | `agentuity_agents/__init__.py`, `agentuity_agents/tensorlake_agent/__init__.py` | Added Python package initializer files establishing the module namespace. |
| Core Agent Implementation | `agentuity_agents/tensorlake_agent/agent.py` | Introduced a comprehensive agent with seven Pydantic data models (Buyer, Seller, RealEstateSchema, InvoiceLineItem, InvoiceSchema, ContractParty, ContractSchema), a schema registry (`SCHEMAS`), a distributed text analysis function (`topk_words`), an `analyze_text` endpoint, and an agent handler (`welcome` and `run`) supporting four actions: parse (DocumentAI integration with optional signature detection and structured extraction), status (job polling), schemas (enumeration), and analyze (local text analysis). |
| Application Entry Points | `main.py`, `server.py` | Added a simple main entry point that prints a greeting; added a server entry point with environment validation (`AGENTUITY_API_KEY`/`SDK_KEY` requirement, `AGENTUITY_TRANSPORT_URL` warning), logging configuration, and Agentuity autostart invocation. |

Sequence Diagram

sequenceDiagram
    actor Client
    participant Agent as TensorLake Agent
    participant DocumentAI as DocumentAI API
    participant Storage as Agent Storage

    Client->>Agent: POST /run (action: "parse")
    activate Agent
    alt Demo Mode
        Agent->>Agent: Generate sample data
        Agent-->>Client: Return demo results
    else Production
        Agent->>DocumentAI: Submit document parse job
        DocumentAI-->>Agent: parse_id
        Agent->>DocumentAI: Poll for completion
        DocumentAI-->>Agent: Status (processing/complete)
        alt Parse Complete
            Agent->>DocumentAI: Fetch structured extraction results
            DocumentAI-->>Agent: Extracted data + chunks
            Agent->>Storage: Store results (optional)
            Agent-->>Client: Return structured data + chunks
        else Error/Timeout
            Agent-->>Client: Return error details
        end
    end
    deactivate Agent
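The production path in the diagram (submit, poll, then fetch results or report a timeout) can be sketched as a small polling loop. This is a minimal, hypothetical helper, not code from agent.py: `poll_until_done`, `ParseTimeout`, and the status strings "complete"/"error" are assumptions, and the status source is passed in as a callable so the loop can be exercised without the DocumentAI API.

```python
import time
from typing import Callable


class ParseTimeout(Exception):
    """Raised when a job does not reach a terminal status before the deadline."""


def poll_until_done(
    get_status: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    terminal: tuple[str, ...] = ("complete", "error"),  # hypothetical status names
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() until it returns a terminal status; raise ParseTimeout otherwise."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise ParseTimeout(f"status still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps tests fast; inside the agent, `get_status` would wrap whichever status call the TensorLake SDK or REST API actually provides.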

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Areas requiring extra attention:

  • agent.py — Dense implementation with multiple Pydantic models, DocumentAI integration logic, error handling paths (quota guidance, timeouts, demo mode), and distributed analysis function (topk_words); verify DocumentAI API calls, structured extraction schema mapping, and async/polling patterns
  • agentuity.yaml — Validate configuration accuracy (server command args, environment file paths, resource constraints, bundler settings)
  • server.py — Review environment validation logic and early-exit handling for missing API credentials
  • Data model field validation and default values across schemas
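For reviewing the server.py validation, here is a minimal sketch of the behaviour the walkthrough describes: exit early when no API/SDK key is set, and warn when `AGENTUITY_TRANSPORT_URL` is missing. It assumes the SDK key variable is named `AGENTUITY_SDK_KEY`, and `check_credentials` is a hypothetical name; it is written as a pure function over a mapping so it is easy to test and is not the PR's actual server.py.

```python
import sys
from typing import Mapping


def check_credentials(env: Mapping[str, str]) -> list[str]:
    """Exit with status 1 if no key is set; otherwise return any warnings."""
    if not (env.get("AGENTUITY_API_KEY") or env.get("AGENTUITY_SDK_KEY")):
        print(
            "\033[31m[ERROR] AGENTUITY_API_KEY or AGENTUITY_SDK_KEY is not set.\033[0m",
            file=sys.stderr,
        )
        raise SystemExit(1)
    warnings: list[str] = []
    if not env.get("AGENTUITY_TRANSPORT_URL"):
        # Not an error: the agent can still run, but Agentuity features are disabled.
        warnings.append("running outside the Agentuity environment")
    return warnings
```

At startup this would be called as `check_credentials(os.environ)` before logging is configured and autostart is invoked.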

Poem

🐰 A TensorLake glistens in the code,
Parsing documents down the road!
Schemas, chunks, and structured delight,
DocumentAI shining oh so bright—
This fuzzy agent hops with glee! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'TensorLake agent example' directly and clearly summarizes the main change: introducing a complete TensorLake agent implementation as an example, covering configuration, documentation, and core agent code. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00%, which meets the required threshold of 80.00%. |

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
agents/tensorlake/tensorlake-agent/pyproject.toml (1)

1-10: Project metadata and dependencies look reasonable; confirm supported versions

The manifest is minimal but sufficient for this example agent, and the dependency set matches what you’d expect. Please confirm that the agent runs and its tests pass on every Python version in the >=3.10, <3.13 range with the declared minimum versions of agentuity, tensorlake, and pydantic; otherwise, narrow the constraints to the versions you actually intend to support.

agents/tensorlake/tensorlake-agent/main.py (1)

1-6: Simple CLI entry point is fine for now

A minimal main() that prints a greeting is acceptable as a placeholder; you can later evolve this to invoke the actual agent/server startup if you want a richer CLI experience.

agents/tensorlake/tensorlake-agent/server.py (1)

20-27: Consider using yellow for warning messages instead of red.

The warning messages on lines 22-26 use red ANSI codes (\033[31m) but are labeled as [WARN]. Red is typically reserved for errors, while yellow (\033[33m) is more conventional for warnings, improving visual distinction.

     # Check if AGENTUITY_TRANSPORT_URL is set
     if not os.environ.get("AGENTUITY_TRANSPORT_URL"):
         print(
-            "\033[31m[WARN] You are running this agent outside of the Agentuity environment. Any automatic Agentuity features will be disabled.\033[0m"
+            "\033[33m[WARN] You are running this agent outside of the Agentuity environment. Any automatic Agentuity features will be disabled.\033[0m"
         )
         print(
-            "\033[31m[WARN] Recommend running `agentuity dev` to run your project locally instead of `python script`.\033[0m"
+            "\033[33m[WARN] Recommend running `agentuity dev` to run your project locally instead of `python script`.\033[0m"
         )
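A small helper keeps the color choice in one place, so every [WARN] line is yellow and only errors are red. The names `colorize` and `warn` are hypothetical; like the original code, this prints to stdout.

```python
YELLOW = "\033[33m"
RED = "\033[31m"
RESET = "\033[0m"


def colorize(level: str, message: str) -> str:
    """Wrap a tagged message in ANSI color: yellow for WARN, red for anything else."""
    color = YELLOW if level == "WARN" else RED
    return f"{color}[{level}] {message}{RESET}"


def warn(message: str) -> None:
    """Print a yellow [WARN] line."""
    print(colorize("WARN", message))
```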
agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/agent.py (2)

10-13: Remove unused import.

The json module is imported but not used. The JSON parsing is handled by request.data.json().

 import os
 import re
-import json
 from typing import List, Tuple, Optional

458-463: Use logging.exception to capture traceback and consider narrowing the exception type.

Ruff flags catching the blind Exception (BLE001). That is acceptable for a top-level handler, but using context.logger.exception() instead of context.logger.error() automatically includes the traceback, which aids debugging (TRY400).

     except Exception as e:
-        context.logger.error("Error in TensorLake agent: %s", str(e))
+        context.logger.exception("Error in TensorLake agent: %s", str(e))
         return response.json({
             "error": str(e),
             "type": type(e).__name__
         })
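To see what the suggestion changes, the standard library's `Logger.exception` logs at ERROR level and appends the active traceback. This sketch assumes `context.logger` behaves like a standard `logging.Logger`; the logger name and the `KeyError` are illustrative only. Because the traceback already ends with the exception message, passing `str(e)` as well is optional.

```python
import io
import logging

# Capture log output in memory so it can be inspected.
stream = io.StringIO()
logger = logging.getLogger("tensorlake_demo")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)
logger.propagate = False

try:
    {}["missing"]
except KeyError:
    # exception() must be called from an except block; it adds the traceback.
    logger.exception("Error in TensorLake agent")

output = stream.getvalue()
print("Traceback (most recent call last)" in output)  # True
```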
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abb2d97 and 3d2095f.

⛔ Files ignored due to path filters (1)
  • agents/tensorlake/tensorlake-agent/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • agents/tensorlake/tensorlake-agent/.editorconfig (1 hunks)
  • agents/tensorlake/tensorlake-agent/.gitignore (1 hunks)
  • agents/tensorlake/tensorlake-agent/.python-version (1 hunks)
  • agents/tensorlake/tensorlake-agent/AGENTS.md (1 hunks)
  • agents/tensorlake/tensorlake-agent/README.md (1 hunks)
  • agents/tensorlake/tensorlake-agent/agentuity.yaml (1 hunks)
  • agents/tensorlake/tensorlake-agent/agentuity_agents/__init__.py (1 hunks)
  • agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/__init__.py (1 hunks)
  • agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/agent.py (1 hunks)
  • agents/tensorlake/tensorlake-agent/main.py (1 hunks)
  • agents/tensorlake/tensorlake-agent/pyproject.toml (1 hunks)
  • agents/tensorlake/tensorlake-agent/server.py (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
agents/tensorlake/tensorlake-agent/AGENTS.md

8-8: Images should have alternate text (alt text)

(MD045, no-alt-text)


40-40: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


48-48: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.7)
agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/agent.py

458-458: Do not catch blind exception: Exception

(BLE001)


459-459: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (13)
agents/tensorlake/tensorlake-agent/agentuity_agents/__init__.py (1)

1-1: Package initializer is fine as a no-op stub

Keeping this __init__ intentionally empty is a reasonable way to declare the namespace without side effects.

agents/tensorlake/tensorlake-agent/.python-version (1)

1-1: Python version pin is consistent with pyproject range

Pinning to 3.11 here is compatible with the >=3.10, <3.13 constraint in pyproject.toml and gives a clear default for local dev.

agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/__init__.py (1)

1-1: Minimal package initializer is acceptable

An empty __init__ (aside from the comment) is fine for now; you can add explicit re-exports later if you want a curated public API.

agents/tensorlake/tensorlake-agent/.editorconfig (1)

1-12: EditorConfig settings look coherent for this subproject

The root flag and basic formatting options are consistent and should give predictable editor behavior across contributors.

agents/tensorlake/tensorlake-agent/.gitignore (1)

1-180: Comprehensive ignore rules with useful Agentuity additions

The Python/IDE patterns are thorough, and the Agentuity-specific entries at the end ensure local agent state and crash reports stay out of git. Keeping .python-version unignored matches the committed version file.

agents/tensorlake/tensorlake-agent/AGENTS.md (1)

1-110: Agentuity Python guide is clear and well-structured

The doc gives a concise but complete overview of handler signatures, request/response/context APIs, storage, and logging, which should be enough for someone to get started with Python agents in this repo.

agents/tensorlake/tensorlake-agent/server.py (1)

29-36: LGTM!

Logging configuration and autostart invocation are properly placed after environment validation.

agents/tensorlake/tensorlake-agent/agentuity.yaml (1)

1-70: LGTM!

The configuration is well-documented with clear comments explaining each section. The watch patterns, bundler settings, and agent definitions are properly structured.

agents/tensorlake/tensorlake-agent/README.md (1)

1-238: Well-structured documentation.

The README provides comprehensive coverage of installation, configuration, usage examples, and troubleshooting. The action/schema tables and JSON examples are helpful for users.

agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/agent.py (4)

26-84: LGTM!

The Pydantic schema definitions are well-structured with clear docstrings and appropriate typing. The schema registry pattern provides a clean lookup mechanism.
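The schema-registry pattern amounts to a dict from schema name to model class plus a lookup that fails with a helpful message. To stay dependency-free, this sketch swaps Pydantic `BaseModel` for standard-library dataclasses; the registry keys and field names are hypothetical, not the ones in agent.py.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class InvoiceLineItem:
    """One billed line (hypothetical fields)."""
    description: str
    amount: float


@dataclass
class InvoiceSchema:
    """Invoice fields to extract (hypothetical fields)."""
    invoice_number: Optional[str] = None
    total: Optional[float] = None
    line_items: list[InvoiceLineItem] = field(default_factory=list)


@dataclass
class ContractSchema:
    """Contract fields to extract (hypothetical fields)."""
    title: Optional[str] = None
    effective_date: Optional[str] = None


# Registry: request-facing schema name -> model class.
SCHEMAS: dict[str, type] = {
    "invoice": InvoiceSchema,
    "contract": ContractSchema,
}


def get_schema(name: str) -> type:
    """Look up a schema case-insensitively, listing valid names on failure."""
    try:
        return SCHEMAS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown schema {name!r}; choose one of {sorted(SCHEMAS)}") from None


def describe(name: str) -> list[str]:
    """Return the field names of a registered schema, in declaration order."""
    return [f.name for f in fields(get_schema(name))]
```

With Pydantic, the registry would simply hold `BaseModel` subclasses instead; the lookup logic stays the same.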


98-106: LGTM!

The topk_words function is a clean implementation with proper stopword filtering and efficient counting.
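For reference, a stopword-filtered top-k word count in the spirit of `topk_words` can be written with `collections.Counter`. The signature, tokenizer regex, and stopword list are assumptions; the PR's implementation may differ.

```python
import re
from collections import Counter

# Illustrative stopword list; the agent's actual list may differ.
STOPWORDS = frozenset(
    {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}
)


def topk_words(text: str, k: int = 5) -> list[tuple[str, int]]:
    """Return the k most frequent non-stopword tokens with their counts."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)
```

For example, `topk_words("The buyer and the seller sign the contract. The buyer pays.", k=2)` returns `[('buyer', 2), ('seller', 1)]`; tied words keep first-seen order because `Counter.most_common` orders equal counts by first encounter.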


152-181: LGTM!

The handler provides a helpful default response when no data is provided, documenting available actions with an example.
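The default-response behaviour amounts to action dispatch with a help fallback. This minimal sketch assumes a plain dict payload with an "action" key; the handler bodies, the `HELP` text, and field names such as `file_url`, `parse_id`, and `text` are hypothetical placeholders rather than the agent's real request format.

```python
from typing import Any, Callable, Optional

Handler = Callable[[dict], dict]

# Hypothetical stand-ins for the four actions named in the walkthrough.
ACTIONS: dict[str, Handler] = {
    "parse": lambda p: {"action": "parse", "file_url": p.get("file_url")},
    "status": lambda p: {"action": "status", "parse_id": p.get("parse_id")},
    "schemas": lambda p: {"action": "schemas", "available": ["real_estate", "invoice", "contract"]},
    "analyze": lambda p: {"action": "analyze", "chars": len(p.get("text", ""))},
}

HELP: dict[str, Any] = {
    "message": "Send JSON with an 'action' field.",
    "actions": sorted(ACTIONS),
    "example": {"action": "analyze", "text": "Invoice total due on receipt."},
}


def dispatch(payload: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Route a payload to its action; return usage help when no payload is sent."""
    if not payload:
        return HELP
    handler = ACTIONS.get(payload.get("action", ""))
    if handler is None:
        return {"error": f"unknown action {payload.get('action')!r}", **HELP}
    return handler(payload)
```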


388-410: The "status" action calls wait_for_completion, so it blocks until parsing finishes instead of returning the current status immediately.

The TensorLake Python SDK does not expose a non-blocking status check method; wait_for_completion() is the documented way to retrieve results. However, the underlying REST API supports polling via GET /documents/v2/parse/{parse_id}. For a status endpoint, consider either:

  • Making direct HTTP calls to poll the REST API without blocking, or
  • Documenting that this action blocks until parsing completes and clarifying the expected use cases.
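The first option, a single non-blocking status check against GET /documents/v2/parse/{parse_id}, could look roughly like this with the standard library. The path comes from the comment above; the `base_url` parameter, the Bearer authorization header, and the helper names are assumptions to verify against TensorLake's API reference. The JSON body is returned as-is because its field names are not shown here.

```python
import json
import urllib.parse
import urllib.request


def build_status_request(base_url: str, parse_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for one parse job (Bearer auth is an assumption)."""
    url = f"{base_url.rstrip('/')}/documents/v2/parse/{urllib.parse.quote(parse_id, safe='')}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )


def check_status_once(base_url: str, parse_id: str, api_key: str, timeout_s: float = 10.0) -> dict:
    """Perform one GET with no waiting loop and return the decoded JSON body."""
    req = build_status_request(base_url, parse_id, api_key)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

The status action would then return whatever state the API reports and leave repeated polling to the client.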

Comment on lines +42 to +48 in agents/tensorlake/tensorlake-agent/agentuity.yaml
resources:
  # The memory requirements
  memory: 250Mi
  # The CPU requirements
  cpu: 500M
  # The disk size requirements
  disk: 300Mi

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Agentuity CPU resource units format millicores

💡 Result:

Agentuity uses standard Kubernetes CPU quantity format: millicores (m). Key points:

  • Syntax: integer CPU (e.g., "1") or millicores with "m" suffix (e.g., "250m" = 0.25 CPU). [1]
  • Conversion: 1000m = 1 CPU (1 vCPU/core); 1m = 0.001 CPU (minimum precision). [1][2]
  • Usage examples: cpu: "500m" → 0.5 CPU; cpu: "100m" → 0.1 CPU. [1][3]

Sources: Kubernetes docs on resource units and community guides. [1][2][3]

References
[1] Kubernetes — Resource Management for Pods and Containers.
[2] VSHN / Kubernetes resource units explanation.
[3] Kubernetes resources examples (config snippets).


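Fix the CPU resource unit to use lowercase 'm'.

The CPU value 500M is incorrect. According to the query above, Agentuity follows the standard Kubernetes quantity format, in which lowercase m means millicores (500m = 0.5 CPU) while uppercase M is the decimal mega suffix (10^6). As written, 500M would request 500 million CPUs rather than half a core. Change cpu: 500M to cpu: 500m on line 46.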

🤖 Prompt for AI Agents
In agents/tensorlake/tensorlake-agent/agentuity.yaml around lines 42 to 48, the
CPU resource uses an incorrect unit `500M`; change it to the Kubernetes-standard
lowercase millicore unit `500m` so the CPU entry reads `cpu: 500m`. Ensure no
other CPU entries use uppercase units.
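To see why the case of the suffix matters, here is a small parser for Kubernetes-style quantities. `parse_quantity` is a hypothetical helper that covers only the m, k, M, G, Ki, Mi, and Gi suffixes, not the full quantity grammar.

```python
from decimal import Decimal

# Decimal (SI) and binary suffixes from the Kubernetes quantity format.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity string such as '500m' or '250Mi' to a plain number."""
    q = q.strip()
    # Check two-character binary suffixes first so 'Mi' is not read as 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)
```

`print(parse_quantity("500m"))` prints `0.500`, while `parse_quantity("500M")` evaluates to 500000000, a factor of 10^9 apart.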

Comment on lines +25 to +29 in agents/tensorlake/tensorlake-agent/README.md
1. Clone the repo:
```bash
git clone https://github.com/agentuity/tensorlake-agent.git
cd tensorlake-agent
```

⚠️ Potential issue | 🟡 Minor

Clone URL may be incorrect.

The clone URL references https://github.com/agentuity/tensorlake-agent.git, but this agent appears to be part of the agentuity/examples repository based on the PR context. Consider updating to reflect the correct repository path or providing instructions for navigating to the correct subdirectory.

 1. Clone the repo:
    ```bash
-   git clone https://github.com/agentuity/tensorlake-agent.git
-   cd tensorlake-agent
+   git clone https://github.com/agentuity/examples.git
+   cd examples/agents/tensorlake/tensorlake-agent
    ```
🤖 Prompt for AI Agents
In agents/tensorlake/tensorlake-agent/README.md around lines 25 to 29, the
repository clone instructions point to
https://github.com/agentuity/tensorlake-agent.git which is incorrect for this
PR; update the instructions to clone the main examples repo and change the cd
step to the agent subdirectory (e.g., git clone
https://github.com/agentuity/examples.git and cd
examples/agents/tensorlake/tensorlake-agent) so users land in the correct
project folder.

Comment on lines +36 to +43 in agents/tensorlake/tensorlake-agent/README.md
## Configuration


1. Open `.env` and set:
```
TENSORLAKE_API_KEY=your_tensorlake_api_key
```


⚠️ Potential issue | 🟡 Minor

Missing step to create .env file.

Step 1 under Configuration instructs users to "Open .env", but there's no prior instruction to create this file. Consider adding a step to copy from a template or create the file.

 ## Configuration


-1. Open `.env` and set:
+1. Create a `.env` file and set:
    ```
    TENSORLAKE_API_KEY=your_tensorlake_api_key
    ```
🤖 Prompt for AI Agents
In agents/tensorlake/tensorlake-agent/README.md around lines 36 to 43, the
Configuration section tells users to "Open `.env`" but omits how to create it;
add a preceding step instructing users to create the file (for example by
copying a provided template like `.env.example` or creating a new `.env`), and
show the exact command to run (e.g., copy or touch) and mention where to place
it before setting TENSORLAKE_API_KEY so the instructions are complete.
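One way to make the missing step concrete in Python is a helper that creates `.env` only when it does not exist, seeding it from a template if one is present. The `.env.example` template name is hypothetical (the repository may not ship one), and `read_env` is a deliberately simple KEY=VALUE parser without quoting or `export` support; a library such as python-dotenv handles those cases.

```python
from pathlib import Path
from typing import Optional


def ensure_env_file(path: Path, template: Optional[Path] = None) -> bool:
    """Create `path` if missing, copying `template` when it exists. Return True if created."""
    if path.exists():
        return False
    if template is not None and template.exists():
        path.write_text(template.read_text())
    else:
        path.write_text("TENSORLAKE_API_KEY=your_tensorlake_api_key\n")
    return True


def read_env(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (no quoting rules)."""
    env: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```

Typical use from the project root: `ensure_env_file(Path(".env"), Path(".env.example"))`, then replace the placeholder value with a real key.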
