TensorLake agent example #91

joel13samuel · 2025-12-03T04:00:11Z

Summary by CodeRabbit

New Features
- Introduced TensorLake Document AI Agent supporting document parsing and structured data extraction.
- Added support for multiple document schema types (Real Estate, Invoices, Contracts).
- Integrated text analysis capabilities for token-based document insights.
Chores
- Added project configuration and environment setup files.
- Included comprehensive documentation for installation, configuration, and usage.

coderabbitai · 2025-12-03T04:00:20Z

Walkthrough

This pull request introduces a new TensorLake Document AI agent module within the Agentuity framework. It includes configuration files, comprehensive documentation, project setup files, and a complete agent implementation supporting document parsing, status polling, schema introspection, and text analysis with structured data extraction.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration & Tooling: `agents/tensorlake/tensorlake-agent/.editorconfig`, `.gitignore`, `.python-version`, `pyproject.toml`, `agentuity.yaml` | Added editor configuration for consistent formatting (2-space indents, LF line endings, UTF-8), comprehensive Python project ignore patterns, Python 3.11 version specification, project dependencies (agentuity, tensorlake, pydantic), and Agentuity framework configuration with development/deployment commands and agent metadata. |
| Documentation: `README.md`, `AGENTS.md` | Added user-facing README documenting TensorLake Agent setup, usage, deployment, and troubleshooting; added developer guide for Python Agent development with Agentuity platform, including core interfaces, storage APIs, and best practices. |
| Package Structure: `agentuity_agents/__init__.py`, `agentuity_agents/tensorlake_agent/__init__.py` | Added Python package initializer files establishing the module namespace. |
| Core Agent Implementation: `agentuity_agents/tensorlake_agent/agent.py` | Introduced comprehensive agent with seven Pydantic data models (Buyer, Seller, RealEstateSchema, InvoiceLineItem, InvoiceSchema, ContractParty, ContractSchema), schema registry (`SCHEMAS`), distributed text analysis function (`topk_words`), `analyze_text` endpoint, and agent handler (welcome and run) supporting four actions: parse (DocumentAI integration with optional signature detection and structured extraction), status (job polling), schemas (enumeration), and analyze (local text analysis). |
| Application Entry Points: `main.py`, `server.py` | Added simple main entry point with greeting; added server entry point with environment validation (`AGENTUITY_API_KEY`/`SDK_KEY` requirement, `AGENTUITY_TRANSPORT_URL` warning), logging configuration, and Agentuity autostart invocation. |

Sequence Diagram

sequenceDiagram
    actor Client
    participant Agent as TensorLake Agent
    participant DocumentAI as DocumentAI API
    participant Storage as Agent Storage

    Client->>Agent: POST /run (action: "parse")
    activate Agent
    alt Demo Mode
        Agent->>Agent: Generate sample data
        Agent-->>Client: Return demo results
    else Production
        Agent->>DocumentAI: Submit document parse job
        DocumentAI-->>Agent: parse_id
        Agent->>DocumentAI: Poll for completion
        DocumentAI-->>Agent: Status (processing/complete)
        alt Parse Complete
            Agent->>DocumentAI: Fetch structured extraction results
            DocumentAI-->>Agent: Extracted data + chunks
            Agent->>Storage: Store results (optional)
            Agent-->>Client: Return structured data + chunks
        else Error/Timeout
            Agent-->>Client: Return error details
        end
    end
    deactivate Agent

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Areas requiring extra attention:

- agent.py: Dense implementation with multiple Pydantic models, DocumentAI integration logic, error handling paths (quota guidance, timeouts, demo mode), and distributed analysis function (`topk_words`); verify DocumentAI API calls, structured extraction schema mapping, and async/polling patterns
- agentuity.yaml: Validate configuration accuracy (server command args, environment file paths, resource constraints, bundler settings)
- server.py: Review environment validation logic and early-exit handling for missing API credentials
- Data model field validation and default values across schemas

Poem

🐰 A TensorLake glistens in the code,
Parsing documents down the road!
Schemas, chunks, and structured delight,
DocumentAI shining oh so bright—
This fuzzy agent hops with glee! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'TensorLake agent example' directly and clearly summarizes the main change: introducing a complete TensorLake agent implementation as an example, covering configuration, documentation, and core agent code. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch tensorlake

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (5)

agents/tensorlake/tensorlake-agent/pyproject.toml (1)

1-10: Project metadata and dependencies look reasonable; confirm supported versions

The manifest is minimal but sufficient for this example agent, and the dependency set matches what you’d expect. Please double‑check that the agent runs and its tests pass on every Python version in the >=3.10, <3.13 range and with the chosen minimum versions of agentuity, tensorlake, and pydantic; if you only intend to support a subset, narrow the constraints accordingly.

agents/tensorlake/tensorlake-agent/main.py (1)

1-6: Simple CLI entry point is fine for now

A minimal main() that prints a greeting is acceptable as a placeholder; you can later evolve this to invoke the actual agent/server startup if you want a richer CLI experience.

agents/tensorlake/tensorlake-agent/server.py (1)

20-27: Consider using yellow for warning messages instead of red.

The warning messages on lines 22-26 use red ANSI codes (\033[31m) but are labeled as [WARN]. Red is typically reserved for errors, while yellow (\033[33m) is more conventional for warnings, improving visual distinction.
     # Check if AGENTUITY_TRANSPORT_URL is set
     if not os.environ.get("AGENTUITY_TRANSPORT_URL"):
         print(
-            "\033[31m[WARN] You are running this agent outside of the Agentuity environment. Any automatic Agentuity features will be disabled.\033[0m"
+            "\033[33m[WARN] You are running this agent outside of the Agentuity environment. Any automatic Agentuity features will be disabled.\033[0m"
         )
         print(
-            "\033[31m[WARN] Recommend running `agentuity dev` to run your project locally instead of `python script`.\033[0m"
+            "\033[33m[WARN] Recommend running `agentuity dev` to run your project locally instead of `python script`.\033[0m"
         )
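
The colour choice above can be centralised so that severity and colour cannot drift apart. The following is a minimal sketch, not code from the PR: the `colorize` helper and its level names are hypothetical, and only the ANSI codes (`31` red, `33` yellow, `0` reset) are taken from the comment above.

```python
# ANSI SGR escape sequences: 31 = red, 33 = yellow, 0 = reset.
RED = "\033[31m"
YELLOW = "\033[33m"
RESET = "\033[0m"

# Hypothetical mapping from severity tag to colour (not part of server.py).
LEVEL_COLORS = {"WARN": YELLOW, "ERROR": RED}


def colorize(level: str, message: str) -> str:
    """Prefix the message with its level tag and wrap it in the level's colour."""
    color = LEVEL_COLORS.get(level)
    if color is None:
        # Unknown levels are printed without colour codes.
        return f"[{level}] {message}"
    return f"{color}[{level}] {message}{RESET}"


if __name__ == "__main__":
    print(colorize("WARN", "You are running this agent outside of the Agentuity environment."))
    print(colorize("ERROR", "No API key found."))
```

With a helper like this, the two `print` calls in the diff would pass `"WARN"` once instead of repeating the escape codes inline.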

agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/agent.py (2)

10-13: Remove unused import.

The json module is imported but not used. The JSON parsing is handled by request.data.json().
 import os
 import re
-import json
 from typing import List, Tuple, Optional

458-463: Use logging.exception to capture the traceback, and consider narrowing the exception type.

Static analysis flags the `except Exception` clause as overly broad (Ruff BLE001). That is acceptable for a top-level handler, but calling context.logger.exception() instead of context.logger.error() automatically includes the traceback, which aids debugging.
     except Exception as e:
-        context.logger.error("Error in TensorLake agent: %s", str(e))
+        context.logger.exception("Error in TensorLake agent: %s", str(e))
         return response.json({
             "error": str(e),
             "type": type(e).__name__
         })
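
To illustrate the difference the suggestion makes, here is a small sketch using the standard library `logging` module. It assumes `context.logger` behaves like a `logging.Logger`, which the review does not confirm; `handle_safely` is a hypothetical stand-in for the agent's top-level handler.

```python
import io
import logging
from typing import Any, Callable, Dict


def handle_safely(logger: logging.Logger, action: Callable[[], Any]) -> Dict[str, Any]:
    """Run an action and turn any failure into an error payload."""
    try:
        return {"result": action()}
    except Exception as e:  # broad on purpose: this is the top-level handler
        # Logger.exception logs at ERROR level and appends the active traceback.
        logger.exception("Error in TensorLake agent: %s", e)
        return {"error": str(e), "type": type(e).__name__}


if __name__ == "__main__":
    stream = io.StringIO()
    demo_logger = logging.getLogger("tensorlake-demo")
    demo_logger.addHandler(logging.StreamHandler(stream))
    demo_logger.propagate = False

    def failing_action() -> None:
        raise ValueError("bad input")

    print(handle_safely(demo_logger, failing_action))
    print(stream.getvalue())  # includes the message followed by a traceback
```

`Logger.exception` must be called from inside an `except` block, as it is here; called elsewhere there is no active exception to attach.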

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abb2d97 and 3d2095f.

⛔ Files ignored due to path filters (1)

agents/tensorlake/tensorlake-agent/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (12)

agents/tensorlake/tensorlake-agent/.editorconfig (1 hunks)
agents/tensorlake/tensorlake-agent/.gitignore (1 hunks)
agents/tensorlake/tensorlake-agent/.python-version (1 hunks)
agents/tensorlake/tensorlake-agent/AGENTS.md (1 hunks)
agents/tensorlake/tensorlake-agent/README.md (1 hunks)
agents/tensorlake/tensorlake-agent/agentuity.yaml (1 hunks)
agents/tensorlake/tensorlake-agent/agentuity_agents/__init__.py (1 hunks)
agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/__init__.py (1 hunks)
agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/agent.py (1 hunks)
agents/tensorlake/tensorlake-agent/main.py (1 hunks)
agents/tensorlake/tensorlake-agent/pyproject.toml (1 hunks)
agents/tensorlake/tensorlake-agent/server.py (1 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.18.1)

agents/tensorlake/tensorlake-agent/AGENTS.md

8-8: Images should have alternate text (alt text)

(MD045, no-alt-text)

40-40: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

48-48: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.7)

agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/agent.py

458-458: Do not catch blind exception: Exception

(BLE001)

459-459: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (13)

agents/tensorlake/tensorlake-agent/agentuity_agents/__init__.py (1)

1-1: Package initializer is fine as a no-op stub

Keeping this __init__ intentionally empty is a reasonable way to declare the namespace without side effects.

agents/tensorlake/tensorlake-agent/.python-version (1)

1-1: Python version pin is consistent with pyproject range

Pinning to 3.11 here is compatible with the >=3.10, <3.13 constraint in pyproject.toml and gives a clear default for local dev.

agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/__init__.py (1)

1-1: Minimal package initializer is acceptable

An empty __init__ (aside from the comment) is fine for now; you can add explicit re-exports later if you want a curated public API.

agents/tensorlake/tensorlake-agent/.editorconfig (1)

1-12: EditorConfig settings look coherent for this subproject

The root flag and basic formatting options are consistent and should give predictable editor behavior across contributors.

agents/tensorlake/tensorlake-agent/.gitignore (1)

1-180: Comprehensive ignore rules with useful Agentuity additions

The Python/IDE patterns are thorough, and the Agentuity-specific entries at the end ensure local agent state and crash reports stay out of git. Keeping .python-version unignored matches the committed version file.

agents/tensorlake/tensorlake-agent/AGENTS.md (1)

1-110: Agentuity Python guide is clear and well-structured

The doc gives a concise but complete overview of handler signatures, request/response/context APIs, storage, and logging, which should be enough for someone to get started with Python agents in this repo.

agents/tensorlake/tensorlake-agent/server.py (1)

29-36: LGTM!

Logging configuration and autostart invocation are properly placed after environment validation.
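
For readers without the file open, the validation-before-startup ordering can be sketched as a pure function. This is an illustration under assumptions, not the code in server.py: the function name and messages are hypothetical, and treating `AGENTUITY_API_KEY` and `AGENTUITY_SDK_KEY` as alternatives is inferred from the walkthrough's "AGENTUITY_API_KEY/SDK_KEY requirement".

```python
from typing import List, Mapping, Tuple


def validate_environment(env: Mapping[str, str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the given environment mapping."""
    errors: List[str] = []
    warnings: List[str] = []
    # Assumption: either key satisfies the credential requirement.
    if not (env.get("AGENTUITY_API_KEY") or env.get("AGENTUITY_SDK_KEY")):
        errors.append("AGENTUITY_API_KEY or AGENTUITY_SDK_KEY must be set")
    # A missing transport URL only degrades features, so it is a warning.
    if not env.get("AGENTUITY_TRANSPORT_URL"):
        warnings.append("running outside of the Agentuity environment")
    return errors, warnings


if __name__ == "__main__":
    import os
    import sys

    found_errors, found_warnings = validate_environment(os.environ)
    for text in found_warnings:
        print(f"\033[33m[WARN] {text}\033[0m")
    if found_errors:
        for text in found_errors:
            print(f"\033[31m[ERROR] {text}\033[0m")
        sys.exit(1)  # exit early, before logging setup and autostart
```

Keeping the check free of side effects makes the early-exit path easy to unit test without touching the real environment.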

agents/tensorlake/tensorlake-agent/agentuity.yaml (1)

1-70: LGTM!

The configuration is well-documented with clear comments explaining each section. The watch patterns, bundler settings, and agent definitions are properly structured.

agents/tensorlake/tensorlake-agent/README.md (1)

1-238: Well-structured documentation.

The README provides comprehensive coverage of installation, configuration, usage examples, and troubleshooting. The action/schema tables and JSON examples are helpful for users.

agents/tensorlake/tensorlake-agent/agentuity_agents/tensorlake_agent/agent.py (4)

26-84: LGTM!

The Pydantic schema definitions are well-structured with clear docstrings and appropriate typing. The schema registry pattern provides a clean lookup mechanism.

98-106: LGTM!

The topk_words function is a clean implementation with proper stopword filtering and efficient counting.
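
As a point of reference, a function of this shape can be sketched as follows. This is not the PR's implementation: the signature, the tokenising regex, and the stopword list are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

# Illustrative stopword list; the list used in the PR is not shown in this review.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it"})


def topk_words(text: str, k: int = 5) -> List[Tuple[str, int]]:
    """Return the k most frequent non-stopword tokens with their counts."""
    # Lowercase first so "Invoice" and "invoice" are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(k)


if __name__ == "__main__":
    sample = "The invoice and the contract: invoice invoice contract buyer"
    print(topk_words(sample, 2))
```

`Counter.most_common` does the ranking in one call, which is likely what "efficient counting" refers to here.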

152-181: LGTM!

The handler provides a helpful default response when no data is provided, documenting available actions with an example.

388-410: The "status" action blocks by calling wait_for_completion, which defeats the purpose of non-blocking status retrieval.

The TensorLake Python SDK does not expose a non-blocking status check method; wait_for_completion() is the documented way to retrieve results. However, the underlying REST API supports polling via GET /documents/v2/parse/{parse_id}. For a status endpoint, consider either:

- Making direct HTTP calls to poll the REST API without blocking, or
- Documenting that this action blocks until parsing completes and clarifying the expected use cases.
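
The first option could look roughly like the sketch below, which issues a single GET and returns immediately. Only the path `/documents/v2/parse/{parse_id}` comes from this comment; the base URL, the bearer-token header, and the `status` field in the response are assumptions that should be checked against the TensorLake API reference before use.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

# Assumption: illustrative base URL, not confirmed by the review.
BASE_URL = "https://api.tensorlake.ai"


def build_status_request(parse_id: str, api_key: str) -> urllib.request.Request:
    """Build a single GET request for the parse job; nothing is sent yet."""
    url = f"{BASE_URL}/documents/v2/parse/{parse_id}"
    # Assumption: bearer-token authentication.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def check_status(
    parse_id: str,
    api_key: str,
    fetch: Optional[Callable[[urllib.request.Request], bytes]] = None,
) -> Dict[str, Any]:
    """Make one request and return the current status without waiting."""
    request = build_status_request(parse_id, api_key)
    if fetch is None:
        with urllib.request.urlopen(request, timeout=10) as response:
            body = response.read()
    else:
        body = fetch(request)  # injectable transport, useful for tests
    payload = json.loads(body)
    # Assumption: the response JSON carries a top-level "status" field.
    return {"parse_id": parse_id, "status": payload.get("status", "unknown")}
```

The caller decides whether and when to poll again, so the "status" action itself never blocks on job completion.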

coderabbitai · 2025-12-03T04:05:12Z

agents/tensorlake/tensorlake-agent/agentuity.yaml

+  resources:
+    # The memory requirements
+    memory: 250Mi
+    # The CPU requirements
+    cpu: 500M
+    # The disk size requirements
+    disk: 300Mi


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Agentuity CPU resource units format millicores

💡 Result:

Agentuity uses standard Kubernetes CPU quantity format: millicores (m). Key points:

Syntax: integer CPU (e.g., "1") or millicores with "m" suffix (e.g., "250m" = 0.25 CPU). [1]

Conversion: 1000m = 1 CPU (1 vCPU/core); 1m = 0.001 CPU (minimum precision). [1][2]

Usage examples: cpu: "500m" → 0.5 CPU; cpu: "100m" → 0.1 CPU. [1][3]

Sources: Kubernetes docs on resource units and community guides. [1][2][3]

References
[1] Kubernetes — Resource Management for Pods and Containers.
[2] VSHN / Kubernetes resource units explanation.
[3] Kubernetes resources examples (config snippets).

Fix the CPU resource unit to use lowercase 'm'.

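The CPU value 500M is incorrect. Agentuity follows standard Kubernetes resource format where CPU must use lowercase m for millicores (e.g., 500m = 0.5 CPU cores). In Kubernetes quantity notation an uppercase M is the mega suffix (10^6), so 500M would not mean half a core. Change cpu: 500M to cpu: 500m on line 46.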

🤖 Prompt for AI Agents

In agents/tensorlake/tensorlake-agent/agentuity.yaml around lines 42 to 48, the CPU resource uses an incorrect unit `500M`; change it to the Kubernetes-standard lowercase millicore unit `500m` so the CPU entry reads `cpu: 500m`. Ensure no other CPU entries use uppercase units.
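
A small check like the following can catch this class of mistake before deployment. It is a sketch that assumes, as this comment does, that Agentuity accepts Kubernetes-style CPU quantities. The function name is hypothetical, and the parser is deliberately stricter than Kubernetes itself: it accepts only plain or decimal core counts and the lowercase `m` suffix.

```python
import re


def cpu_to_millicores(value: str) -> int:
    """Convert '500m', '1', or '0.5' to millicores; reject anything else."""
    # Millicore form: an integer followed by a lowercase 'm'.
    match = re.fullmatch(r"(\d+)m", value)
    if match:
        return int(match.group(1))
    # Whole or fractional cores, e.g. "1" or "0.5".
    if re.fullmatch(r"\d+(\.\d+)?", value):
        return round(float(value) * 1000)
    raise ValueError(f"unsupported CPU quantity: {value!r} (did you mean a lowercase 'm'?)")


if __name__ == "__main__":
    for candidate in ("500m", "1", "0.5", "500M"):
        try:
            print(candidate, "->", cpu_to_millicores(candidate), "millicores")
        except ValueError as error:
            print(candidate, "->", error)
```

Running the check over every `cpu:` entry in agentuity.yaml would also cover the prompt's request to ensure no other entries use uppercase units.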

coderabbitai · 2025-12-03T04:05:12Z

agents/tensorlake/tensorlake-agent/README.md

+1. Clone the repo:
+   ```bash
+   git clone https://github.com/agentuity/tensorlake-agent.git
+   cd tensorlake-agent
+   ```


⚠️ Potential issue | 🟡 Minor

Clone URL may be incorrect.

The clone URL references https://github.com/agentuity/tensorlake-agent.git, but this agent appears to be part of the agentuity/examples repository based on the PR context. Consider updating to reflect the correct repository path or providing instructions for navigating to the correct subdirectory.

 1. Clone the repo:
    ```bash
-   git clone https://github.com/agentuity/tensorlake-agent.git
-   cd tensorlake-agent
+   git clone https://github.com/agentuity/examples.git
+   cd examples/agents/tensorlake/tensorlake-agent
    ```

🤖 Prompt for AI Agents

In agents/tensorlake/tensorlake-agent/README.md around lines 25 to 29, the repository clone instructions point to https://github.com/agentuity/tensorlake-agent.git which is incorrect for this PR; update the instructions to clone the main examples repo and change the cd step to the agent subdirectory (e.g., git clone https://github.com/agentuity/examples.git and cd examples/agents/tensorlake/tensorlake-agent) so users land in the correct project folder.

coderabbitai · 2025-12-03T04:05:12Z

agents/tensorlake/tensorlake-agent/README.md

+## Configuration
+
+
+1. Open `.env` and set:
+   ```
+   TENSORLAKE_API_KEY=your_tensorlake_api_key
+   ```
+


⚠️ Potential issue | 🟡 Minor

Missing step to create .env file.

Step 1 under Configuration instructs users to "Open .env", but there's no prior instruction to create this file. Consider adding a step to copy from a template or create the file.

 ## Configuration

-1. Open `.env` and set:
+1. Create a `.env` file and set:
    ```
    TENSORLAKE_API_KEY=your_tensorlake_api_key
    ```

🤖 Prompt for AI Agents

In agents/tensorlake/tensorlake-agent/README.md around lines 36 to 43, the Configuration section tells users to "Open `.env`" but omits how to create it; add a preceding step instructing users to create the file (for example by copying a provided template like `.env.example` or creating a new `.env`), and show the exact command to run (e.g., copy or touch) and mention where to place it before setting TENSORLAKE_API_KEY so the instructions are complete.
