Supporting blog content - Local rag with lightweight Elasticsearch (#488)
1 parent ab47c4a commit 5a008db

12 files changed, +754 -0 lines changed
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
MEETING TRANSCRIPT - QA TEAM
Date: Wednesday, September 18, 2025
Time: 10:00 AM - 11:30 AM
Participants: Maria (QA Lead), Tom (Senior QA Engineer), Lisa (QA Automation Engineer), Roberto (Manual Testing Specialist)

[10:02] Maria: Let's review CRM migration testing progress. Tom, report on data import tests?

[10:03] Tom: Found critical issues. Import failures with special characters in addresses and names.

[10:06] Tom: UTF-8 parsing problems with accents, currency symbols, and Asian characters.

[10:08] Tom: 12% of records affected - about 15,000 out of 125,000 total records.

[10:09] Roberto: Confirmed. Also, failed imports corrupt entire batches.

[10:12] Lisa: No atomic transactions for batches?

[10:13] Tom: Correct. Each record processed independently without rollback.

[10:15] Roberto: Found referential integrity issues - orphaned references between contacts and companies.

[10:19] Maria: Need three validation types: pre-import, during import, and post-import.

[10:25] Tom: Recommend smaller migration batches to reduce risk?

[10:26] Maria: Excellent. Batches of 5,000 records with validation between each.

[10:30] Maria: Four recommendations: UTF-8 parser fix, atomic transactions, handle orphaned references, small batch migration.

[10:33] Roberto: Also need concurrency testing during migration.

[10:40] Maria: Complete additional testing in one week. Feasible?

[10:42] Tom: Will share test cases today.

[10:44] Maria: Friday 2 PM meeting before management review.

[10:45] Lisa: Will prepare testing metrics dashboard.
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
MEETING TRANSCRIPT - DEVELOPMENT TEAM
Date: Monday, September 16, 2025
Time: 09:00 AM - 10:15 AM
Participants: Alice (Tech Lead), John (Senior Developer), Sarah (Backend Developer), Mike (DevOps Engineer)

[09:02] Alice: Let's review the search API deployed last week. Any issues?

[09:03] Sarah: API works but performance degrades with 1,000+ queries per minute. Response times jump from 200ms to 3 seconds.

[09:05] John: Database queries and no caching layer?

[09:06] Sarah: Exactly. Complex queries are slow, and we need a caching layer.

[09:07] Mike: Also hitting CPU limits during spikes. Need auto-scaling.

[09:08] Alice: Three priorities: query optimization, cache implementation, and infrastructure scaling.

[09:11] Sarah: Propose 15-minute TTL cache with event-based invalidation.

[09:13] John: I'll optimize bool queries and add calculated index fields.

[09:17] Mike: Can set up auto-scaling by tomorrow - scale to 6 instances at 70% CPU.

[09:18] Sarah: Starting cache implementation today, basic version by Wednesday.

[09:19] John: New indexes and query optimization ready for testing Wednesday.

[09:24] Alice: Clear plan. Mike handles scaling, Sarah implements cache, John optimizes queries.

[09:26] Alice: I'll coordinate with product team on deployment impacts and QA for load testing.

[09:30] Alice: Meeting Wednesday 3 PM to review progress. Thanks team!
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
MEETING TRANSCRIPT - MANAGEMENT SYNC
Date: Friday, September 20, 2025
Time: 02:00 PM - 03:00 PM
Participants: David (Project Manager), Alice (Tech Lead), Maria (QA Lead), Emma (Product Manager), Carlos (DevOps Manager)

[14:03] Emma: Good progress. Users report 40% search speed improvement, but support tickets show peak hour performance issues.

[14:05] Alice: We've identified bottlenecks. Working on cache layer implementation and search engine query optimization.

[14:06] David: Can we resolve issues without impacting October migration date?

[14:09] Alice: Recommend two-week extension for complete migration due to performance issues.

[14:10] Maria: QA agrees. Found data import blockers with special characters and integrity issues.

[14:12] Maria: Need one week to fix issues, another for complete re-testing.

[14:14] Carlos: Infrastructure supports extension for proper rollback and disaster recovery testing.

[14:15] Emma: Could we do partial migration on original date?

[14:17] Alice: Yes. Contact management module first, reports and analytics in phase two.

[14:21] Maria: Phased migration ideal for QA - validate each module independently.

[14:22] David: Proposal: Phase 1 - Contact management October 15th. Phase 2 - Complete migration October 30th.

[14:23] Alice: Reasonable timeline for performance fixes.

[14:24] Emma: Works from product perspective. Will update stakeholder communications.

[14:25] Maria: QA commits to these timelines.

[14:26] Carlos: Will prepare deployment strategies for both phases.

[14:32] David: Carlos, send deployment calendar by Monday. Thanks team!
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
WEEKLY REPORT - QA TEAM
Week of September 16-20, 2025
Prepared by: Maria Gonzalez, QA Lead

=== EXECUTIVE SUMMARY ===
QA team identified critical issues in CRM migration testing. Significant problems in legacy data import and referential integrity require immediate attention.

=== TESTING COMPLETED ===
- Functional: Contact management (100%), Authentication (100%), Search (75%), Analytics (60%)
- Data import: 125,000 legacy records tested, 12 critical issues found
- Performance: Core modules complete, identified issues with 500+ concurrent users

=== CRITICAL ISSUES ===
**QA-2025-001 - Data Import Failures**
- UTF-8 parsing problems with special characters
- 15,000 records affected (12% of total)
- Escalated to development

**QA-2025-002 - Transaction Integrity**
- Failed imports leave batches in inconsistent state
- No atomic transactions for batches
- Requires architecture redesign

**QA-2025-003 - Orphaned References**
- 2,300 records with invalid company/contact references
- Pending business logic decision

=== METRICS ===
- Test cases executed: 847 of 1,200 (70.6%)
- Pass rate: 79.3%, Automation coverage: 36%
- Bugs: 28 total (4 critical, 8 high, 12 medium, 4 low)
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
WEEKLY REPORT - DEVELOPMENT TEAM
Week of September 16-20, 2025
Prepared by: Alice Thompson, Tech Lead

=== EXECUTIVE SUMMARY ===
Development team completed critical infrastructure components but identified performance bottlenecks requiring attention before production deployment.

=== KEY ACCOMPLISHMENTS ===
- Database schema and indexes completed for CRM
- 12 of 18 API endpoints integrated with authentication
- Contact management: 95% complete, Search: 80%, Analytics: 70%

=== TECHNICAL CHALLENGES ===
- Critical: Search API degrades at 1,000+ queries/minute (200ms to 3+ seconds)
- Root cause: Complex database queries without caching layer
- Multi-filter searches average 1.2 seconds execution time

=== ACTION PLAN NEXT WEEK ===
1. Cache system implementation (Sarah) - Basic by Wednesday, complete by Friday
2. Search engine query optimization (John) - Testing ready Wednesday
3. Auto-scaling setup (Mike) - Scale to 6 instances at 70% CPU

=== METRICS ===
- Story points: 43 of 50 completed (86%)
- Bugs: 7 reported, 12 resolved
- Code coverage: 78% (target: 80%)

=== TIMELINE ===
- October 15 Contact Management: 85% confidence, 2 sprints remaining
- October 30 Complete Migration: 90% confidence, 4 sprints remaining
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# Build a Local lightweight RAG System with Elasticsearch

A simple RAG (Retrieval-Augmented Generation) system that uses Elasticsearch for semantic search and Local AI as the model provider. This application is supporting content for the blog post [Build a Local lightweight RAG System with Elasticsearch](https://www.elastic.co/search-labs/blog/local-rag-with-lightweight-elasticsearch).

## Prerequisites

- Docker
- Python 3.11+

## Quick Start

### 1. Create and Activate a Virtual Environment

```bash
python -m venv venv
source venv/bin/activate
```

### 2. Install Python Dependencies

```bash
pip install -r requirements.txt
```

### 3. Configure Environment Variables

Create a `.env` file and add your settings:

```bash
# Elasticsearch Configuration
ES_URL=http://localhost:9200
ES_API_KEY="your_elasticsearch_api_key_here"
INDEX_NAME=team-data

# Local AI Configuration
LOCAL_AI_URL=http://localhost:8080/v1

# Dataset Configuration
DATASET_FOLDER=./Dataset
```

### 4. Run the Script

```bash
python script.py
```
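
For orientation, the retrieve-then-generate flow that a script like this typically follows can be sketched as below. Everything here is an assumption rather than the contents of `script.py`: the helper names, the `semantic_field` field name, the result size of 3, and the `file_name`/`content` keys are all hypothetical. The `semantic` query type does exist in recent Elasticsearch versions for `semantic_text` fields, but whether this project uses it is not confirmed here. The sketch makes no network calls; it only builds the request body and the chat messages.

```python
def build_search_body(question: str, field: str = "semantic_field", size: int = 3) -> dict:
    """Request body for an Elasticsearch `semantic` query (field name is hypothetical)."""
    return {"size": size, "query": {"semantic": {"field": field, "query": question}}}


def build_messages(question: str, hits: list[dict]) -> list[dict]:
    """Turn retrieved documents into chat messages with numbered context."""
    context_blocks = [
        f"[{i}] {hit['file_name']}\n{hit['content']}"
        for i, hit in enumerate(hits, start=1)
    ]
    system_prompt = (
        "Answer using only the numbered documents below.\n\n"
        + "\n\n".join(context_blocks)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def format_citations(hits: list[dict]) -> str:
    """Render the numbered source list shown under the answer."""
    return "\n".join(f"[{i}] {hit['file_name']}" for i, hit in enumerate(hits, start=1))


# Hypothetical hits; the file names are taken from the citations in the sample outputs.
hits = [
    {"file_name": "report_development-team.txt",
     "content": "Critical: Search API degrades at 1,000+ queries/minute"},
    {"file_name": "meeting_development-team_monday.txt",
     "content": "Propose 15-minute TTL cache with event-based invalidation."},
]
question = "Can you summarize the performance issues in the API?"
body = build_search_body(question)
messages = build_messages(question, hits)
print(format_citations(hits))
```

Numbering the context blocks the same way as the citation list lets the model's references be traced back to source files.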
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
🔍 Search: 'Can you summarize the performance issues in the API?'

🤖 Asking to model: llama-smoltalk-3.2-1b-instruct

## 💡 Question:
Can you summarize the performance issues in the API?

## 📝 Answer:
Based on the documents, it appears that the main performance issue with the API is related to the search query optimization. The API degrades to around 1,000+ queries per minute (QP/min) when there are 12 of 18 API endpoints integrated with authentication. This issue is caused by complex queries without a caching layer, leading to performance degrades and slow response times.

However, there is also a smaller issue with the "Search" API, where it degrades to around 3+ seconds after 1.2 seconds execution time. This is likely due to multi-filter searches and the need for a caching layer to improve performance.

To address these issues, the team is working on implementing a caching layer (Sarah) and optimizing bool queries and adding calculated index fields (John) to improve query efficiency. They are also working on setting up auto-scaling for the database (Mike) to ensure that it can handle increased traffic.

A meeting was held to discuss these issues and a plan for improvement was agreed upon. The team will work together to implement a caching layer and optimize the queries, and the team will work with product team to ensure that the migration is completed on time and does not impact the October migration date.

📚 Citations:
[1] report_development-team.txt
[2] meeting_development-team_monday.txt
[3] meeting_management-sync_friday.txt


## Stats
🔍 Search Latency: 12ms

🤖 AI Latency: 21019ms | 5.8 tokens/s
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@

🔍 Search: 'Can you summarize the performance issues in the API?'

🤖 Asking to model: dolphin3.0-qwen2.5-0.5b

## 💡 Question:
Can you summarize the performance issues in the API?
## 📝 Answer:

The performance issues in the API can be summarized as follows:

1. **Search API degrades with 1,000+ queries per minute**: The search API has degraded, with performance degrades to a time of 3 seconds compared to the target of 200ms.

2. **Complex queries are slow and no caching layer**: Complex queries take longer to execute as expected.

3. **CPU limits during spikes**: The CPU usage is at 100% at the peak hour and limits to 70% at the peak hour.

📚 Citations:
[1] report_development-team.txt
[2] meeting_development-team_monday.txt
[3] meeting_management-sync_friday.txt

## Stats

🔍 Search Latency: 14ms

🤖 AI Latency: 16044ms | 9.5 tokens/s
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
🔍 Search: 'Can you summarize the performance issues in the API?'

🤖 Asking to model: smollm2-1.7b-instruct

## 💡 Question:

Can you summarize the performance issues in the API?
## 📝 Answer:

The development team identified two key technical challenges for the API:

1. The search API degrades at 1,000+ queries per minute, causing average execution times to jump from 200ms to 3 seconds.
2. The root cause is complex database queries without a caching layer, leading to poor query performance.

📚 Citations:
[1] report_development-team.txt
[2] meeting_development-team_monday.txt
[3] meeting_management-sync_friday.txt

## Stats

🔍 Search Latency: 16ms

🤖 AI Latency: 47561ms | 4.8 tokens/s
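
The three runs above can be lined up side by side. The sketch below copies the reported latencies and throughput from the Stats sections and derives a rough completion-token estimate, assuming tokens/s was computed as generated tokens divided by AI latency. That assumption is not confirmed by the outputs, so the token counts are estimates only.

```python
# Figures copied from the Stats sections of the three sample outputs.
runs = [
    {"model": "llama-smoltalk-3.2-1b-instruct", "search_ms": 12, "ai_ms": 21019, "tokens_per_s": 5.8},
    {"model": "dolphin3.0-qwen2.5-0.5b", "search_ms": 14, "ai_ms": 16044, "tokens_per_s": 9.5},
    {"model": "smollm2-1.7b-instruct", "search_ms": 16, "ai_ms": 47561, "tokens_per_s": 4.8},
]


def estimated_tokens(run: dict) -> int:
    """Rough estimate: throughput times generation time (an assumption)."""
    return round(run["tokens_per_s"] * run["ai_ms"] / 1000)


def search_share(run: dict) -> float:
    """Fraction of total request time spent in the Elasticsearch search step."""
    return run["search_ms"] / (run["search_ms"] + run["ai_ms"])


# Fastest generation first.
for run in sorted(runs, key=lambda r: r["ai_ms"]):
    print(
        f"{run['model']:<32} {run['ai_ms'] / 1000:6.1f}s "
        f"~{estimated_tokens(run)} tokens, search share {search_share(run):.2%}"
    )
```

In all three runs the search step accounts for well under 1% of the total time, so end-to-end latency is dominated by local generation.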
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
>>> Why is the sky blue?

## Raw Response

```json
{"created":1763994585,"object":"chat.completion","id":"d96f7e7a-003f-4167-bb07-501aa41ef8e7","model":"dolphin3.0-qwen2.5-0.5b","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"The sky being blue is primarily due to the scattering of sunlight by the Earth's atmosphere. Here are a few key factors contributing to the blue color of the sky:\n\n1. **Scattering of Light**: Blue light is scattered more than other colors by the Earth's atmosphere. This scattering occurs because blue light has a shorter wavelength, which is more likely to travel through the Earth's atmosphere than other wavelengths. The scattering occurs as the light bounces off particles in the atmosphere, such as dust and water vapor, which have a higher concentration in the upper atmosphere.\n\n2. **Atmospheric Composition**: The Earth's atmosphere is composed of gases like nitrogen, oxygen, and water vapor. These gases have different properties, and their interactions with sunlight can affect the colors they transmit. Nitrogen and oxygen gases have a scattering effect that is more intense at shorter wavelengths than water vapor, which results in the blue color observed in the sky.\n\n3. **Atmospheric Reflection**: The blue color of the sky also depends on the amount of light reflecting off the Earth's surface. Clouds, for instance, can reflect a lot of sunlight, particularly blue and green light, which creates a blue tint to the sky. Additionally, the reflection of sunlight from the Earth's surface can contribute to the blue color observed in the sky.\n\n4. **Clouds and Precipitation**: The presence of clouds and precipitation can also affect the color of the sky. Clouds can scatter blue light, but they can also trap it, making the sky appear a bit less blue. Precipitation, particularly heavy rain or snow, can also scatter light, making the sky appear a bit darker and more gray.\n\nIn summary, the blue color of the sky is a result of the scattering of sunlight by the Earth's atmosphere, with the primary contributor being the scattering of blue light and the absorption of other wavelengths by particles in the atmosphere."}}],"usage":{"prompt_tokens":14,"completion_tokens":384,"total_tokens":398}}
```
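
Pulling the answer text and token usage out of a response shaped like the one above can be sketched as follows. The payload here is a shortened stand-in with the same field names; the message content is truncated for brevity, and the helper name `extract_answer` is hypothetical.

```python
import json

# Shortened stand-in for the raw response above: same field names,
# but the message content is truncated for brevity.
raw = """
{"object": "chat.completion",
 "model": "dolphin3.0-qwen2.5-0.5b",
 "choices": [{"index": 0, "finish_reason": "stop",
              "message": {"role": "assistant",
                          "content": "The sky being blue is primarily due to the scattering of sunlight by the Earth's atmosphere."}}],
 "usage": {"prompt_tokens": 14, "completion_tokens": 384, "total_tokens": 398}}
"""


def extract_answer(payload: str) -> tuple[str, dict]:
    """Return the first choice's text and the token usage block."""
    data = json.loads(payload)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"], data.get("usage", {})


answer, usage = extract_answer(raw)
print(answer)
print(f"{usage['completion_tokens']} completion tokens of {usage['total_tokens']} total")
```

The `usage` block is also a convenient source for throughput figures, since `completion_tokens` divided by the measured generation time gives tokens per second.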

## Answer

The sky being blue is primarily due to the scattering of sunlight by the Earth's atmosphere. Here are a few key factors contributing to the blue color of the sky:

1. **Scattering of Light**: Blue light is scattered more than other colors by the Earth's atmosphere. This scattering occurs because blue light has a shorter wavelength, which is more likely to travel through the Earth's atmosphere than other wavelengths. The scattering occurs as the light bounces off particles in the atmosphere, such as dust and water vapor, which have a higher concentration in the upper atmosphere.
2. **Atmospheric Composition**: The Earth's atmosphere is composed of gases like nitrogen, oxygen, and water vapor. These gases have different properties, and their interactions with sunlight can affect the colors they transmit. Nitrogen and oxygen gases have a scattering effect that is more intense at shorter wavelengths than water vapor, which results in the blue color observed in the sky.
3. **Atmospheric Reflection**: The blue color of the sky also depends on the amount of light reflecting off the Earth's surface. Clouds, for instance, can reflect a lot of sunlight, particularly blue and green light, which creates a blue tint to the sky. Additionally, the reflection of sunlight from the Earth's surface can contribute to the blue color observed in the sky.
4. **Clouds and Precipitation**: The presence of clouds and precipitation can also affect the color of the sky. Clouds can scatter blue light, but they can also trap it, making the sky appear a bit less blue. Precipitation, particularly heavy rain or snow, can also scatter light, making the sky appear a bit darker and more gray.

In summary, the blue color of the sky is a result of the scattering of sunlight by the Earth's atmosphere, with the primary contributor being the scattering of blue light and the absorption of other wavelengths by particles in the atmosphere.