A Comprehensive Exploration of Building an Autonomous AI Agent for Daily Tech Journalism
Table of Contents
- Introduction
- Project Overview
- Core Architecture
- Component Analysis
- Workflow Pipeline
- Service Layer Deep Dive
- Protocol Implementation
- Agent Lifecycle
- Data Flow & Orchestration
- Code Analysis: Key Patterns
- Deployment & Infrastructure
- Technical Challenges & Solutions
- Future Enhancements
- Conclusion
Introduction
The AI Tech Daily Agent represents a sophisticated implementation of an autonomous AI agent designed to automate technical journalism. Built on the Fetch.ai uAgents framework, this system orchestrates multiple services to research, analyze, and generate comprehensive deep-dive articles about AI and technology companies on a daily basis.
This project demonstrates the power of agent-based systems in automating complex, multi-step workflows that typically require significant human effort. By integrating web search, content scraping, GitHub API integration, large language models (LLMs), and image search into a cohesive pipeline, the agent produces high-quality, research-backed articles with minimal human intervention.
Key Capabilities:
- Automated company selection based on topic coverage
- Real-time news aggregation from multiple sources
- GitHub repository tracking for open source projects
- Web scraping for in-depth content analysis
- LLM-powered article generation with specific formatting requirements
- Dev.to platform integration for automated publishing
- Chat interface for interactive control and monitoring
- Session management and conversation handling
Project Overview
Purpose & Mission
The AI Tech Daily Agent exists to solve a specific problem: the high effort required to produce daily, in-depth technical content about rapidly evolving AI and technology companies. Traditional technical journalism requires journalists to:
- Monitor multiple news sources
- Track GitHub repositories
- Analyze company announcements
- Understand technical details
- Write comprehensive articles
- Format for various platforms
- Publish and distribute content
This agent automates the entire pipeline, reducing several hours of human work to a 2-3 minute automated process.
Technology Stack
The project leverages a modern Python-based technology stack:
Core Framework:
- uAgents Protocol (Fetch.ai): Decentralized agent communication protocol
- Python 3.11+: Modern Python with async/await support
- uv: Fast Python package manager
Web & Data:
- Requests: HTTP client for API interactions
- GitHub REST API: Repository and release tracking
- Dev.to API: Content publishing platform
- Bing/Web Search APIs: News and web search capabilities
AI & NLP:
- OpenAI/LLM APIs: Content generation and analysis
- LangChain-style prompting: Structured prompt engineering
Infrastructure:
- Agentverse: Agent hosting and discovery platform
- Almanac Contracts: Decentralized service registration
- Environment Configuration: Flexible deployment setup
Project Structure
ai-tech-daily-agent/
├── agent.py # Main agent entry point
├── config/
│ ├── __init__.py
│ └── sources.py # Tracked repositories & companies
├── protocols/
│ ├── __init__.py
│ └── chat_proto.py # Chat protocol implementation
├── services/
│ ├── __init__.py
│ ├── article_service.py # Article generation logic
│ ├── company_picker.py # Company selection algorithm
│ ├── devto_service.py # Dev.to API integration
│ ├── github_service.py # GitHub API integration
│ ├── image_search_service.py # Image finding logic
│ ├── llm_service.py # LLM abstraction layer
│ ├── publish_service.py # Publishing orchestration
│ ├── web_scraper_service.py # Content scraping
│ └── web_search_service.py # Search API wrapper
├── tests/
│ ├── __init__.py
│ └── test_filter.py # Unit tests
├── pyproject.toml # Project dependencies
├── uv.lock # Locked dependency versions
├── .gitignore
├── README.md
├── PROJECT_DEEP_DIVE.md # This document
└── docs/
└── deep-dive/ # Generated diagram images (PNG)
├── architecture.png
├── pipeline.png
└── data-flow.png
This structure follows clean architecture principles with clear separation of concerns:
- Configuration in config/
- Protocol definitions in protocols/
- Business logic in services/
- Entry point at the root
Core Architecture
System Architecture Diagram
The AI Tech Daily Agent follows a multi-layered architecture designed for modularity, scalability, and maintainability.
Illustrative architecture (view on GitHub):
Architectural Principles
The architecture embodies several key principles that make it robust and maintainable:
1. Separation of Concerns
Each service has a single, well-defined responsibility:
- company_picker.py - Only handles company selection logic
- github_service.py - Only GitHub API interactions
- article_service.py - Only article generation
- publish_service.py - Only publishing logic
2. Dependency Injection
Services receive their dependencies as parameters, making them easier to test and swap out:
def generate_article(
    company: dict,
    search_data: dict,
    scraped_content: str,
    github_repos: list[dict],
    images: dict[str, str],
) -> tuple[str, str]:
3. Async/Await Pattern
Network operations use async to prevent blocking:
async def _run_pipeline(ctx: Context) -> str:
    result = await asyncio.to_thread(run_pipeline, dry_run)
    return result
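The same offloading pattern can be exercised standalone; `blocking_pipeline` below is a stand-in for the real pipeline function:

```python
import asyncio
import time

def blocking_pipeline() -> str:
    time.sleep(0.1)  # stand-in for slow, synchronous network work
    return "done"

async def run() -> str:
    # The event loop stays free to handle chat messages while the
    # blocking call runs in a worker thread.
    return await asyncio.to_thread(blocking_pipeline)

print(asyncio.run(run()))  # → done
```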
4. Error Handling & Fallbacks
Graceful degradation when services fail:
result = call_llm(...)
if result:
    article = result  # use the LLM-generated content
else:
    article = _fallback_article(...)
5. Configuration Externalization
All tracked companies and repositories are in config/sources.py, not hardcoded:
TRACKED_COMPANIES = [...]
TRACKED_FRAMEWORK_REPOS = [...]
Communication Model
The agent uses the uAgents protocol for inter-agent communication:
Chat Protocol:
- Implements the standard uAgents chat protocol specification
- Supports session management with StartSessionContent and EndSessionContent
- Message acknowledgments for reliable delivery
- Text-based commands for user interaction
Key Protocol Features:
# Session start
StartSessionContent → Welcome message
# User commands
TextContent("generate") → Start pipeline
TextContent("status") → Show history
TextContent("help") → Show commands
# Acknowledgments
ChatAcknowledgement → Confirmation of receipt
Component Analysis
1. Main Agent (agent.py)
The agent.py file serves as the entry point and orchestrator for the entire system.
Key Responsibilities:
- Agent Registration: Registers with Agentverse using the Almanac contract
- Protocol Setup: Attaches the chat protocol for user interaction
- Pipeline Orchestration: Coordinates the execution of all services
- Environment Configuration: Handles dry-run modes and API keys
- Logging: Provides comprehensive logging throughout the pipeline
Critical Code Flow:
# Agent registration
agent = Agent(
    name="ai-tech-daily-agent",
    port=8000,
    seed=AGENT_SEED,
    endpoint=["http://localhost:8000/submit"],
)
# Main pipeline
def run_pipeline(dry_run: bool = False) -> str:
    # 1. Check history and select company
    # 2. Perform web/search queries
    # 3. Fetch GitHub repository data
    # 4. Scrape and read content
    # 5. Generate article using LLM
    # 6. Find appropriate images
    # 7. Optionally publish to Dev.to
    # 8. Update history
Design Pattern: Pipeline/Chain of Responsibility
The run_pipeline function implements a pipeline pattern where each step builds on the previous one:
def run_pipeline(dry_run: bool = False) -> str:
    # Step 1: Company Selection
    history = get_history()
    company = select_company(history, TRACKED_COMPANIES)

    # Step 2: Data Collection
    search_data = {
        "news": search_news(...),
        "web": search_web(...),
        "github": search_github(...),
    }

    # Step 3: Content Gathering
    github_repos = get_all_repos()
    scraped_content = scrape_and_read(...)

    # Step 4: Article Generation
    article, filename = generate_article(...)

    # Step 5: Publishing
    if not dry_run:
        devto_id = publish_to_devto(...)

    return result
Each step passes its output to the next, creating a data transformation pipeline.
2. Company Picker Service (company_picker.py)
The company picker implements the core decision-making logic for which company to feature each day.
Algorithm:
- Load History: Read history.json to see previous coverage
- Filter Candidates: Remove companies covered in the last 14 days
- Random Selection: Pick from remaining candidates
- Update History: Record the selection
Key Code:
def select_company(history: list[dict], companies: list[dict]) -> dict:
    cutoff = (datetime.now() - timedelta(days=14)).isoformat()
    recent_slugs = {h["slug"] for h in history if h["date"] >= cutoff}
    candidates = [c for c in companies if c["slug"] not in recent_slugs]
    if not candidates:
        log.warning("No candidates available after 14-day filter")
        return companies[0]
    return random.choice(candidates)
Design Considerations:
- 14-Day Cooling Period: Prevents repetitive coverage
- Random Selection: Ensures variety in coverage
- Fallback Mechanism: If all companies are recent, pick the first one
- Slug Matching: Uses simple string matching for easy comparison
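Because the dates are ISO-8601 strings, they sort lexicographically, so the plain string comparison against the cutoff is correct. A self-contained walkthrough of the filter with toy data:

```python
from datetime import datetime, timedelta

# Toy history: "openai" was covered 3 days ago (inside the window),
# "mistral" 30 days ago (aged out of the window).
history = [
    {"slug": "openai", "date": (datetime.now() - timedelta(days=3)).isoformat()},
    {"slug": "mistral", "date": (datetime.now() - timedelta(days=30)).isoformat()},
]
companies = [{"slug": "openai"}, {"slug": "anthropic"}, {"slug": "mistral"}]

cutoff = (datetime.now() - timedelta(days=14)).isoformat()
recent_slugs = {h["slug"] for h in history if h["date"] >= cutoff}
candidates = [c for c in companies if c["slug"] not in recent_slugs]

print([c["slug"] for c in candidates])  # → ['anthropic', 'mistral']
```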
Data Structure:
TRACKED_COMPANIES = [
    {
        "name": "OpenAI",
        "slug": "openai",
        "topics": ["llm", "generative-ai", "gpt"],
    },
    {
        "name": "Anthropic",
        "slug": "anthropic",
        "topics": ["llm", "claude", "safety"],
    },
    # ... more companies
]
3. Web Search Service (web_search_service.py)
This service abstracts web search operations for news and general web search.
API Integration:
The service integrates with search APIs (likely Bing or similar) to fetch:
- News articles with titles, URLs, bodies, and dates
- Web search results with titles and descriptions
Key Functionality:
def search_news(company: str, topics: list[str]) -> list[dict]:
    """
    Search for recent news about the company.
    Returns a list of news items with title, url, body, date.
    """
    queries = [company] + topics
    all_news = []
    for query in queries:
        results = _call_search_api(query="news:" + query)
        all_news.extend(results)
    return _deduplicate(all_news)

def search_web(company: str) -> list[dict]:
    """
    General web search for company information.
    """
    return _call_search_api(query=company)
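The `_deduplicate` helper referenced above is not shown in the excerpt. A plausible sketch, keyed on URL and preserving first-seen order (the key choice is an assumption):

```python
def _deduplicate(items: list[dict]) -> list[dict]:
    """Drop items whose URL was already seen, keeping the first occurrence."""
    seen: set[str] = set()
    unique: list[dict] = []
    for item in items:
        if item["url"] not in seen:
            seen.add(item["url"])
            unique.append(item)
    return unique

result = _deduplicate([{"url": "a"}, {"url": "a"}, {"url": "b"}])
print([r["url"] for r in result])  # → ['a', 'b']
```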
Data Transformation:
Raw search results are transformed into a standardized format:
# Raw API response
{
    "title": "...",
    "url": "...",
    "snippet": "...",
    "date": "...",
}

# Transformed to internal format
{
    "title": "...",
    "url": "...",
    "body": "...",
    "date": "...",
}
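The mapping (notably `snippet` becoming `body`) can be expressed as a small normalizer; the function name here is hypothetical, not taken from the project:

```python
def _normalize_result(raw: dict) -> dict:
    """Map a raw search API result into the agent's internal shape."""
    return {
        "title": raw.get("title", ""),
        "url": raw.get("url", ""),
        "body": raw.get("snippet", ""),  # "snippet" is renamed to "body"
        "date": raw.get("date", ""),
    }

item = _normalize_result({"title": "t", "url": "u", "snippet": "s", "date": "d"})
print(item["body"])  # → s
```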
Error Handling:
The service includes robust error handling for:
- API failures (returns empty list)
- Rate limiting (with retries)
- Network timeouts
- Malformed responses
Workflow Pipeline
Complete Pipeline Overview
The AI Tech Daily Agent executes a comprehensive pipeline that transforms a simple command into a published article. Here's the complete workflow.
Illustrative pipeline (view on GitHub):
Pipeline Execution Details
Phase 1: Company Selection (5 seconds)
# Load history file
if os.path.exists(HISTORY_FILE):
    history = json.loads(Path(HISTORY_FILE).read_text())
else:
    history = []

# Apply temporal filter
cutoff = (datetime.now() - timedelta(days=14)).isoformat()
recent_slugs = {h["slug"] for h in history if h["date"] >= cutoff}

# Select company
candidates = [c for c in TRACKED_COMPANIES if c["slug"] not in recent_slugs]
company = random.choice(candidates)
Phase 2: Data Collection (30-45 seconds)
Several query variations are issued to broaden coverage (sequentially in this excerpt):
# Search with different query variations
news_queries = [
    company["name"],
    company["name"] + " news",
    company["name"] + " announcement",
    *company["topics"],
]
all_news = []
for query in news_queries:
    news = search_news(query)
    all_news.extend(news)

# Deduplicate results by URL (the seen-set must be updated as we go,
# otherwise every item would pass the membership test)
seen_urls = set()
unique_news = []
for n in all_news:
    if n["url"] not in seen_urls:
        seen_urls.add(n["url"])
        unique_news.append(n)
Phase 3: GitHub Data (20-30 seconds)
Two types of GitHub data collection:
# 1. Tracked frameworks (known repos)
frameworks = []
for repo in TRACKED_FRAMEWORK_REPOS:
    data = fetch_github_repo(repo["owner"], repo["repo"])
    release = get_latest_release(repo["owner"], repo["repo"])
    frameworks.append({...})

# 2. Trending new repos (discovery)
trending = []
for query in SEARCH_QUERIES:
    repos = github_search_repository(
        query,
        sort="stars",
        created=">7 days ago",  # illustrative; the API takes a date qualifier
    )
    trending.extend(repos)
Phase 4: Content Scraping (30-60 seconds)
# Get top URLs from search results
top_urls = [item["url"] for item in search_results[:10]]
# Scrape and read content
scraped_text = ""
for url in top_urls:
try:
html = requests.get(url, timeout=15).text
text = extract_text_from_html(html)
scraped_text += text
if len(scraped_text) > 10000: # Limit content
break
except Exception as e:
log.warning(f"Failed to scrape {url}: {e}")
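`extract_text_from_html` is not shown in the excerpt; the real project may use a library such as BeautifulSoup or trafilatura. A stdlib-only sketch with `html.parser` that skips script and style content:

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def extract_text_from_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

page = "<html><body><p>Hello</p><script>x=1</script><p>world</p></body></html>"
print(extract_text_from_html(page))  # → Hello world
```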
Phase 5: Article Generation (30-45 seconds)
# Build comprehensive prompt
system_prompt = f"""
You are a senior tech journalist...
TODAY'S FOCUS: {company_name}
RULES:
- Article MUST be 300+ lines
- Include specific numbers: stars, funding, users
- Include 2-3 code snippets
- Include links to sources
"""
user_prompt = f"""
Company topics: {topics}
=== REAL-TIME NEWS ===
{formatted_news}
=== WEB SEARCH RESULTS ===
{formatted_web}
=== GITHUB SEARCH ===
{formatted_github}
=== TRACKED REPOS ===
{formatted_repos}
=== SCRAPED CONTENT ===
{scraped_content[:8000]}
"""
# Generate article
article = call_llm(
    system_prompt,
    user_prompt,
    temperature=0.7,
    max_tokens=8000,
)
Phase 6: Image Enhancement (15-20 seconds)
images = {}

# Search for logo
logo_url = search_images(f"{company} logo official website")
if logo_url:
    images["logo"] = logo_url

# Search for hero image
hero_url = search_images(f"{company} technology platform")
if hero_url:
    images["hero"] = hero_url

# Search for tech images
banner_url = search_images(f"{company} architecture technology")
if banner_url:
    images["banner"] = banner_url
Phase 7: Publishing (10-15 seconds)
# Save local copy
filename = f"{slug}-{date}.md"
article_path = Path("articles") / filename
article_path.write_text(article)

# Publish to Dev.to
if not dry_run and devto_api_key:
    devto_id = create_devto_article(
        title=f"{company} — Deep Dive",
        body_markdown=article,
        tags=company["topics"] + ["ai", "technology"],
        published=True,
    )
    url = f"https://dev.to/{devto_username}/{slug}"
else:
    url = f"Local: {article_path}"
Phase 8: History Update (2 seconds)
history.append({
    "name": company["name"],
    "slug": company["slug"],
    "date": datetime.now().isoformat(),
    "article_url": url,
    "devto_id": devto_id,
})

# Persist to file
Path(HISTORY_FILE).write_text(json.dumps(history, indent=2))
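The `write_text` call above is not atomic: a crash mid-write could leave a truncated history.json. A write-then-rename variant (a sketch, not the project's code) closes that gap:

```python
import json
import os
import tempfile

def save_history(history: list[dict], path: str = "history.json") -> None:
    """Persist history atomically: write a temp file, then rename over the target."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(history, f, indent=2)
        os.replace(tmp, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```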
Total Pipeline Time: ~2-3 minutes
Service Layer Deep Dive
GitHub Service (github_service.py)
The GitHub service is a critical component that provides both tracking of known repositories and discovery of new trending projects.
Authentication:
def _headers() -> dict:
    h = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "AI-Tech-Daily-Agent/1.0",
    }
    token = os.getenv("GH_TOKEN") or os.getenv("GITHUB_TOKEN")
    if token:
        h["Authorization"] = f"token {token.strip()}"
    return h
Key Features:
- Framework Tracking: Monitors known AI agent frameworks
- Trending Discovery: Finds new repositories created in the last 7 days
- Release Tracking: Tracks latest releases for version information
- Metadata Collection: Extracts stars, language, description, activity
Framework Tracking Logic:
def get_framework_updates() -> list[dict]:
    results = []
    for repo_info in TRACKED_FRAMEWORK_REPOS:
        owner, repo = repo_info["owner"], repo_info["repo"]

        # Fetch repository metadata
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}",
            headers=headers,
            timeout=10,
        )
        data = resp.json()

        # Fetch latest release
        release_info = _get_latest_release(owner, repo, headers)

        # Build comprehensive record
        results.append({
            "name": f"{owner}/{repo}",
            "label": repo_info["label"],
            "url": data["html_url"],
            "description": data["description"],
            "stars": data["stargazers_count"],
            "language": data.get("language"),
            "updated_at": data.get("pushed_at"),
            "latest_release": release_info,
            "type": "tracked",
        })

    # Sort by recent activity ("or" guards against a None pushed_at)
    results.sort(key=lambda x: x.get("updated_at") or "", reverse=True)
    return results
Trending Search Logic:
def search_trending_repos() -> list[dict]:
    one_week_ago = (datetime.utcnow() - timedelta(days=7)).strftime("%Y-%m-%d")
    queries = [
        "ai agent",
        "llm agent framework",
        "mcp server",
        "agentic ai",
        "autonomous agent",
        # ... more queries
    ]
    all_repos = []
    for query in queries:
        resp = requests.get(
            "https://api.github.com/search/repositories",
            params={
                "q": f"{query} created:>{one_week_ago}",
                "sort": "stars",
                "order": "desc",
                "per_page": 5,
            },
            headers=headers,
        )
        for repo in resp.json().get("items", []):
            all_repos.append({
                "name": repo["full_name"],
                "url": repo["html_url"],
                "description": repo["description"],
                "stars": repo["stargazers_count"],
                "language": repo["language"],
                "type": "trending",
            })

    # Deduplicate by name and sort by stars
    unique = list({r["name"]: r for r in all_repos}.values())
    unique.sort(key=lambda x: x["stars"], reverse=True)
    return unique[:10]
Rate Limiting Considerations:
- Uses GitHub REST API which has rate limits
- Implements timeout handling (10-15 seconds per request)
- Catches and logs failures without crashing
- No explicit rate-limit handling; the service relies on staying within GitHub's default limits
Article Service (article_service.py)
The article service is the core content generation component that orchestrates LLM-based article writing.
Main Generation Function:
def generate_article(
    company: dict,
    search_data: dict,
    scraped_content: str,
    github_repos: list[dict],
    images: dict[str, str],
) -> tuple[str, str]:
Prompt Engineering Strategy:
The service uses sophisticated prompt engineering to ensure high-quality output:
1. System Prompt - Sets Persona and Rules:
system = f"""You are a senior tech journalist and developer advocate writing an in-depth daily article for "AI & Tech Daily".
TODAY'S FOCUS: {name}
Write a COMPREHENSIVE deep-dive about {name} — covering everything happening RIGHT NOW.
RULES:
- Article MUST be 300+ lines of markdown
- ALL content must be based on the real-time search data provided — do NOT invent facts
- Include specific numbers: star counts, funding, users, version numbers
- Include 2-3 code snippets showing how to use their tools/products
- Include links to sources: [text](url)
- Include images where provided (logo, hero, tech images)
- Be opinionated — give your take on what this means for developers
- Every section must have real, substantial content
REQUIRED SECTIONS (## headings, ALL mandatory):
# {name} — Deep Dive | {human_date}
## Company Overview
## Latest News & Announcements
## Product & Technology Deep Dive
## GitHub & Open Source
## Getting Started — Code Examples
## Market Position & Competition
## Developer Impact
## What's Next
## Key Takeaways
## Resources & Links
"""
2. User Prompt - Provides All Context:
user = f"""Write a deep-dive article about {name} for {human_date}.
Company topics: {topics}
{image_instructions}
=== REAL-TIME NEWS (searched today) ===
{news_text}
=== WEB SEARCH RESULTS ===
{web_text}
=== GITHUB SEARCH ===
{github_text}
=== TRACKED REPOS DATA ===
{repo_text}
=== SCRAPED ARTICLE CONTENT (from top sources) ===
{scraped_content[:8000]}
IMPORTANT: Write FULL article. 300+ lines minimum. Use ONLY data from above. Include images where instructed. Include code snippets."""
Data Formatting Functions:
def _format_news(news: list[dict]) -> str:
    """Format news search results for the prompt."""
    lines = []
    for n in news[:15]:  # limit to top 15
        lines.append(f"- [{n['title']}]({n['url']})")
        if n.get("body"):
            lines.append(f"  {n['body'][:300]}")
        if n.get("date"):
            lines.append(f"  Date: {n['date']}")
        lines.append("")
    return "\n".join(lines)
def _format_github(github: list[dict]) -> str:
    """Format GitHub search results for the prompt."""
    lines = []
    for g in github[:8]:  # limit to top 8
        lines.append(f"- [{g['title']}]({g['url']})")
        lines.append(f"  {g.get('body', '')[:200]}")
    return "\n".join(lines)
def _format_tracked_repos(repos: list[dict]) -> str:
    """Format tracked repositories with release info."""
    lines = []
    for r in repos:
        release = r.get("latest_release")
        rel = f" — latest: {release['tag']}" if release else ""
        lines.append(
            f"- {r['label']} (⭐{r['stars']:,}){rel} — "
            f"{r['description'][:150]} [{r['url']}]"
        )
    return "\n".join(lines)
Fallback Mechanism:
If LLM generation fails or returns empty content, the service falls back to a templated article:
def _fallback_article(company, search_data, repos, images,
                      human_date, date_str):
    """Generate a basic template article if the LLM fails."""
    name = company["name"]
    topics = ", ".join(company["topics"])

    # Format available data
    news_bullets = "\n".join(
        f"- **{n['title']}** — {n.get('body', '')[:200]} [source]({n['url']})"
        for n in search_data.get("news", [])[:10]
    )
    web_bullets = "\n".join(
        f"- [{w['title']}]({w['url']})"
        for w in search_data.get("web", [])[:8]
    )
    repo_bullets = "\n".join(
        f"- **[{r['label']}]({r['url']})** ⭐ {r['stars']:,}"
        for r in repos[:10]
    )

    # Build template
    return f"""# {name} — Deep Dive | {human_date}
{logo_img}
> Daily deep dive into {name} — covering {topics}.
---
{hero_img}
## Latest News & Announcements
{news_bullets}
---
## Web Resources
{web_bullets}
---
## GitHub & Open Source
{repo_bullets}
---
## Key Takeaways
1. {name} continues to evolve in the AI/tech landscape
2. Monitor their open-source projects for updates
3. Check official channels for latest announcements
---
*Generated on {date_str} by [AI Tech Daily Agent](https://github.com/gautammanak1/ai-tech-daily-agent)*
"""
This ensures the system always produces output, even when LLM services are unavailable or fail.
LLM Service (llm_service.py)
The LLM service provides a clean abstraction layer over LLM APIs.
Interface:
def call_llm(
    system: str,
    user: str,
    temperature: float = 0.7,
    max_tokens: int = 4000,
) -> str | None:
    """
    Call the LLM API with system and user messages.
    Returns generated text or None on failure.
    """
Implementation:
def call_llm(
    system: str,
    user: str,
    temperature: float = 0.7,
    max_tokens: int = 4000,
) -> str | None:
    try:
        # Get API key from environment
        api_key = os.getenv("OPENAI_API_KEY") or os.getenv("LLM_API_KEY")
        if not api_key:
            log.warning("No LLM API key found")
            return None

        # Make API call
        response = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            json={
                "model": "gpt-4-turbo-preview",
                "messages": [
                    {"role": "system", "content": system},
                    {"role": "user", "content": user},
                ],
                "temperature": temperature,
                "max_tokens": max_tokens,
            },
            timeout=60,
        )
        response.raise_for_status()
        data = response.json()

        # Extract generated content
        return data["choices"][0]["message"]["content"]

    except requests.RequestException as e:
        log.error(f"LLM API request failed: {e}")
        return None
    except (KeyError, IndexError) as e:
        log.error(f"LLM API response parsing failed: {e}")
        return None
Configuration:
Environment variables for configuration:
- OPENAI_API_KEY or LLM_API_KEY: API key for the LLM service
- LLM_MODEL: Model name (default: gpt-4-turbo-preview)
- LLM_TIMEOUT: Request timeout in seconds (default: 60)
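These variables would typically be read once at module import with safe defaults; a minimal sketch (variable names match the list above, the defaults are the documented ones):

```python
import os

# Read configuration from the environment, falling back to documented defaults.
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4-turbo-preview")
LLM_TIMEOUT = int(os.getenv("LLM_TIMEOUT", "60"))
LLM_API_KEY = os.getenv("OPENAI_API_KEY") or os.getenv("LLM_API_KEY")
```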
Error Handling:
The service handles various error scenarios:
- Missing API key: Returns None
- Network errors: Logs and returns None
- Timeout errors: Logs and returns None
- Malformed response: Logs and returns None
- Rate limiting: Would need to be added with retry logic
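A retry wrapper with exponential backoff could be layered on without touching call_llm itself; a sketch of what that might look like (not present in the project):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying on exceptions with exponential backoff.

    Returns fn()'s result, or None if every attempt fails.
    """
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                return None
            time.sleep(base_delay * (2 ** i))  # 1s, 2s, 4s, ...
```

Usage would be e.g. `article = with_retries(lambda: call_llm(system, user))`, with call_llm changed to raise on rate-limit responses instead of returning None.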
Dev.to Service (devto_service.py)
This service handles publishing articles to the Dev.to platform.
Create Article:
def create_devto_article(
    title: str,
    body_markdown: str,
    tags: list[str],
    published: bool = True,
) -> str | None:
    """
    Create an article on Dev.to.
    Returns the Dev.to article ID or None on failure.
    """
    api_key = os.getenv("DEVTO_API_KEY")
    if not api_key:
        log.warning("No Dev.to API key")
        return None
    try:
        response = requests.post(
            "https://dev.to/api/articles",
            headers={
                "api-key": api_key,
                "Content-Type": "application/json",
            },
            json={
                "article": {
                    "title": title,
                    "body_markdown": body_markdown,
                    "tags": tags,
                    "published": published,
                }
            },
            timeout=30,
        )
        response.raise_for_status()
        return str(response.json()["id"])
    except requests.RequestException as e:
        log.error(f"Dev.to publish failed: {e}")
        return None