Building a portfolio website with an AI assistant

by: Seif Bassem
Jan 31, 2026

Building a Portfolio Website with an AI Assistant

I’ve always noticed something in my career: my best career conversations never happened over a resume alone—they happened when I had a career deck and could show what I built, how I think, and the impact behind it.

During the quiet holiday period, I wanted to work on a small side project to build something with AI or focus on my personal brand.

So, I decided: why not mix both?

I built my own portfolio website, but with a twist: https://seifbassem.com

It includes an AI assistant that takes any job description and generates a personalized summary showing:

How my skills match the role
Which projects are most relevant
And where my experience creates the strongest fit

It started as a “holiday project”, but it ended up becoming the most natural way I’ve ever represented my career.

In this post, I will explain how I built it including:

🏗️ Architecture and design trade-offs (Azure free services/tiers without compromising security)
📚 AI frameworks, techniques and optimizations (caching, rate limiting, RAG, ..etc)
🏭 Automation (CI/CD)
🤖 Peer/vibe coding tips (what made me go faster vs slowed me down)
💲 How I manage to keep costs under 10$ while not sacrificing security best practices

Architecture

The architecture follows a modern serverless approach with clear separation of concerns between the frontend, backend API, and AI services.

Screenshot showing the diagram

Component Overview

Component	Purpose	Trade-off
Cloudflare	CDN, proxy, SSL termination	Free tier provides enterprise-grade edge security
Astro Frontend	Front-end using Astro content-driven framework	Allows server-side API routes to proxy requests to the internal backend, keeping it hidden from the internet
FastAPI Backend	AI API with streaming responses	Internal-only ingress (not exposed to internet)
Azure Container Apps	Hosting frontend and backend services	Scaling to zero to optimize cost and integrate easily with container registry
Microsoft Foundry	LLM inference (DeepSeek-V3.2) + Embeddings	Pay-per-token, open-source model provides great value for performance vs cost
Azure AI Search	Vector + semantic search for RAG	Free tier provides more than enough capabilities for this project
Azure Table Storage	Persistent rate limiting across restarts	Essentially free for low-volume use
Key Vault	Secrets management	Standard tier provides low cost given limited operations
Container Registry	Storing images for frontend and backend	Basic tier provides low cost for pull/push operations

Key Trade-offs

Scale-to-Zero vs Cold Starts: Both container apps scale to zero when idle, saving costs but introducing ~5-10 second cold starts. This is a trade-off I can live with given that this is a portfolio website. A warmup endpoint is used to reduce wait time for the AI assistant endpoint.
Internal API Ingress: The FastAPI container app backend has ingressExternal: false, making it only accessible within the Container Apps environment. This eliminates the need for additional API authentication.
DeepSeek vs OpenAI: Using DeepSeek-V3.2 via Microsoft Foundry provides excellent quality at a fraction of GPT costs while maintaining security best practices and guardrails.
Container Apps Vs Static web apps: Another option was to use Azure Static Web Apps instead of Container apps, the main issue was using a custom backend would force me to use the standard plan which alone costs 9$ per month. This would not allow me to stay within the 10$ goal given the other costs.
Network security perimeter (NSP): Using private endpoints would require more expensive SKUs for the different Azure services so I decided to use NSP to lock down PaaS services and allow communication between them internally.
Managed Identity for everything: All PaaS services communicate with each other using managed identities. Local authentication, shared access keys, all are disabled.

Tech Stack

Infrastructure Layer

Technology	Purpose
Bicep	Infrastructure as Code with Azure Verified Modules
Github Actions	CI/CD for continuous integration and deployment to Azure using OpenID Connect
Azure Container Apps	Serverless container hosting with scale-to-zero
Azure Container Registry	Private container image storage (Basic tier)
Network Security Perimeter	Zero-trust network isolation for PaaS services

Frontend Layer

The frontend is built with Astro, a content-driven framework that supports Server-Side Rendering (SSR). It runs as a containerized Node.js application, enabling server-side API routes that proxy requests to the internal backend.

The Docker image is optimized for production using a few key techniques:

Multi-stage builds: The build process happens in separate stages, so development tools and source code don’t end up in the final image
Alpine Linux base: A lightweight Linux distribution (~5MB) that keeps the image small
Production-only dependencies: Only the packages needed to run the app are included
Non-root user: The container runs as a regular user instead of root for better security

Backend Layer

Technology	Purpose
FastAPI	High-performance async API framework
LiteLLM	Unified LLM interface with caching
SlowAPI	Rate limiting for API endpoints
Python	Runtime with optimization flags
Azure Python SDKs	Identity, Search, Tables integration

This is a containerized FastAPI-based AI assistant API that answers questions about my professional expertise. It combines Retrieval Augmented Generation (RAG) with Azure services to provide intelligent, context-aware responses streamed in real-time. It’s following the same best practices like the frontend to optimize security and image size.

Configuration & Setup

The application loads configuration from environment variables and a YAML file for prompts. It sets up key parameters like cache TTL, rate limits, maximum question length, and search retrieval settings. System prompts for both the assistant and question classification are loaded from an external prompts.yaml file at startup.

Environment variables with critical secrets are stored in Azure Key Vault for better security.

Azure Services Integration

The app authenticates with Azure using DefaultAzureCredential, enabling seamless authentication across local development (Azure CLI) and production (Managed Identity). It connects to three core Azure services:

Azure AI Search for retrieving relevant portfolio content and expertise
Microsoft Foundry for generating embeddings and LLM responses
Azure Table Storage for persistent rate limiting across container restarts
Azure Key Vault for storing different keys and secrets
Caching Response caching is handled via LiteLLM’s built-in cache (in-memory by default, with an optional disk cache for persistence). I didn’t use the Azure file share caching option which would provide better caching as it requires using the Storage Account Shared Access Key which I disabled for security reasons.
Rate Limiting Rate limiting is implemented using SlowAPI to protect the API from abuse. Custom fun messages are returned when users exceed their daily quota.

Two-Stage AI Pipeline

The AI assistant uses a cost-efficient two-stage approach:

Classification Stage: A lightweight, low-token call determines if the question is relevant to my career/expertise. Off-topic questions are filtered out before expensive RAG processing.
RAG + Response Stage: For relevant questions, the system generates embeddings, performs hybrid semantic search on Azure AI Search, retrieves relevant document chunks, and streams the LLM response.

Context Retrieval (RAG)

The RAG function implements hybrid search combining keyword matching and vector similarity, enhanced with semantic ranking. It converts the user’s question to an embedding, searches the Azure AI Search index, and returns concatenated document chunks to provide grounded context for the LLM. All my expertise is stored in markdown files in blob storage.

Streaming Responses

Responses are delivered as Server-Sent Events (SSE), enabling word-by-word streaming to the client. This provides a responsive user experience where answers appear progressively rather than all at once. This allows the user experience to be smooth and avoid excessive wait times.

Security & Input Validation

The API includes multiple layers of protection:

Input sanitization to detect and deflect prompt injection attempts
Request validation for empty or overly long questions
IP-based rate limiting to avoid API abuse and unexpected costs for AI tokens
Non-root user execution (the API runs internally, not exposed externally)

API Endpoints

The application exposes two main endpoints:

GET /: A simple health check returning service status
POST /ask: The main endpoint that accepts a question, applies rate limiting, sanitizes input, and streams the AI-generated response. This endpoint is also used to warmup the AI assistant service once the website is loaded

AI services

Service	Model/SKU	Purpose
Microsoft Foundry	S0	Unified AI services account
DeepSeek-V3.2	GlobalStandard	Primary LLM for responses
text-embedding-3-small	Standard	Query embeddings for RAG
Azure AI Search	Free	Hybrid + semantic search + Semantic ranker

AI pipeline

RAG Architecture

The AI assistant uses Retrieval-Augmented Generation (RAG) to answer questions grounded in my portfolio content:

User Question → Embedding → Hybrid Search (Keyword + Vector) → Semantic Reranking → LLM with Context

Document Ingestion

Portfolio markdown files are:

Stored in blob storage
Chunked into smaller segments
Embedded using text-embedding-3-small
Indexed in Azure AI Search with semantic configuration

Retrieval

def retrieve_context(query: str, top_k: int = 5) -> str:
    query_embedding = get_embedding(query)
    
    results = search_client.search(
        search_text=query,  # Keyword component
        vector_queries=[VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=top_k,
            fields="text_vector"
        )],
        query_type="semantic",  # Enable semantic ranking
        semantic_configuration_name=semantic_config,
    )
    return "\n\n".join([doc["chunk"] for doc in results])

Two-Stage Question Processing

To optimize costs and improve relevance, questions go through two stages:

Stage 1: Classification (~150 tokens)

classification_prompt: |
  Classify if this input is asking ABOUT Seif or his professional work.
  Reply TRUE if asking about experience, skills, job fit, etc.
  Reply FALSE if asking for code, tutorials, or off-topic.

Stage 2: RAG Response (only for relevant questions)

assistant_prompt: |
  You are Seif's AI assistant. Answer using the provided context.
  Include MULTIPLE specific examples (at least 2-3 when available).
  Reference specific projects, contributions, or roles.

Model Selection

Model	Use Case	Why
DeepSeek-V3.2	Main LLM responses	High quality, low cost, fast inference
text-embedding-3-small	Query embeddings	Industry standard, good price/performance

Rate Limiting

Rate limiting is implemented at multiple levels:

Cloudflare: Provides basic DDOS and security features protection within its free plan

Application: 3 requests/day per IP using SlowAPI:

@app.post("/ask")
@limiter.limit(RATE_LIMIT)
async def ask_question(request: Request, question_request: QuestionRequest):

Persistent Storage: Azure Table Storage ensures limits persist across container restarts:

class AzureTableStorage(Storage):
    def _sanitize_key(self, key: str) -> str:
        # Extract IP for consistent rate limiting
        ip_match = re.search(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', key)
        return ip_match.group(1) if ip_match else key

Response Caching

LiteLLM provides built-in caching to avoid redundant LLM calls for identical questions. This is crucial for cost optimization and response latency:

# Cache TTL from environment (default: 1 hour)
CACHE_TTL = int(os.getenv("CACHE_TTL_SECONDS", "3600"))

# In-memory cache for fast repeated responses
litellm.cache = litellm.Cache(type="local", ttl=CACHE_TTL)
litellm.enable_cache()

Caching Trade-offs

Strategy	Pros	Cons
In-Memory (current)	Fast, no dependencies	Resets on cold start/restart
Disk Cache	Persists across restarts	Requires volume mount, slightly slower. At the time I built this, integration with Azure File Share was available but required the Storage Account shared access key (which I disabled for better security)
Redis	Distributed, persistent	Additional cost and complexity

The architecture supports disk caching via an EmptyDir volume mount:

volumes: [
  {
    name: 'cache-volume'
    storageType: 'EmptyDir'
  }
]

For a low-traffic portfolio site, in-memory caching strikes the right balance—cache hits during active sessions provide instant responses, while cold starts naturally refresh the cache with current data.

Streaming Responses

Responses are streamed using Server-Sent Events (SSE) for real-time user experience:

async def stream_ai_response(question: str) -> AsyncGenerator[str, None]:
    response = await litellm.acompletion(
        model=f"azure/{DEPLOYMENT_NAME}",
        messages=messages,
        stream=True,
    )
    async for chunk in response:
        if chunk.choices[0].delta.content:
            yield sse_message(content=chunk.choices[0].delta.content)
    yield sse_message(done=True)

System Prompt Design

The system prompt is externalized to a YAML file for easy updates without code changes:

assistant_prompt: |
  You are Seif's AI assistant, helping visitors understand his expertise.

  ## Response Guidelines
  - Base answers primarily on provided context
  - Include MULTIPLE specific examples (at least 2-3)
  - Cite concrete metrics and achievements
  - Reference specific projects and contributions
  - Format response in markdown for readability

Security

Despite the low cost, security is not compromised. Here’s how:

Zero-Trust Network Architecture

module nsp 'br/public:avm/res/network/network-security-perimeter:0.1.3' = {
  params: {
    resourceAssociations: [
      { privateLinkResource: keyVault.outputs.resourceId, accessMode: 'Enforced' },
      { privateLinkResource: storageAccount.outputs.resourceId, accessMode: 'Enforced' },
      { privateLinkResource: foundry.outputs.foundryResourceId, accessMode: 'Enforced' }
    ]
  }
}

Network Security Perimeter (NSP) enforces that Key Vault, Storage, and Microsoft Foundry can only be accessed from within the security perimeter, no public internet exposure.

Authentication & Authorization

Security Feature	Implementation
No API Keys	All services use `disableLocalAuth: true`
Managed Identity	User-assigned identity with least-privilege RBAC
Key Vault RBAC	Secrets accessed via identity, not access policies
TLS 1.2+	Enforced on all storage and service endpoints

module storageAccount 'br/public:avm/res/storage/storage-account:0.31.0' = {
  params: {
    allowSharedKeyAccess: false,  // No storage keys
    minimumTlsVersion: 'TLS1_2',
    allowBlobPublicAccess: false
  }
}

Container Security

Both Dockerfiles follow security best practices:

# Non-root user execution
RUN groupadd --gid 1000 appgroup && \
    useradd --uid 1000 --gid appgroup appuser
USER appuser

# Multi-stage builds (no build tools in production)
FROM python:3.12-slim AS runtime

Input Sanitization & Prompt Injection Protection

The backend includes regex-based detection for prompt injection attempts:

SUSPICIOUS_PATTERNS = [
    r"ignore\s+(previous|above|all)\s+instructions?",
    r"disregard\s+(previous|above|all)",
    r"you\s+are\s+now",
    r"system\s*:\s*",
    r"\{\{.*\}\}",  # Template injection
]

Suspicious inputs are logged and deflected with a friendly message rather than processed.

Edge Security via Cloudflare

WAF Rules: Blocks common attack patterns before reaching Azure
DDoS Protection: Automatic mitigation at the edge
SSL Termination: Cloudflare handles certificate management
Origin Certificates: Mutual TLS between Cloudflare and Azure

Cost

Here’s the breakdown of monthly costs for this architecture:

Service	SKU	Monthly Cost
Container Apps (Astro)	Consumption	~$2-4 (scale-to-zero) Sometimes for managed domain and certificates, there are requests that get send to the app so it scales to 1 in different times through the day
Container Apps (API)	Consumption	~$0-1 (scale-to-zero)
Azure AI Search	Free	$0
Microsoft Foundry (DeepSeek)	Pay-per-token	~$0.08 (light usage)
Microsoft Foundry (Embeddings)	Pay-per-token	~$0.02
Container Registry	Basic	~$5
Storage Account	Standard LRS, Cold	<$0.01
Key Vault	Standard	<$0.02
Cloudflare	Free tier	$0
Total		~$8-10/month

Cost Optimization Strategies

Scale-to-Zero: Both container apps scale down to zero replicas when idle:

scaleSettings: {
  maxReplicas: 3,
  minReplicas: 0,  // Scale to zero
  rules: [{ name: 'http-scaling', http: { metadata: { concurrentRequests: '10' }}}]
}

Minimal Container Resources: Each container uses only 0.25 vCPU and 0.5 GB RAM.
Cold Storage Tier: The storage account uses the Cold access tier since RAG documents are indexed, not frequently accessed.
In-Memory Caching with Disk Fallback: LiteLLM caches responses to avoid duplicate LLM calls:
```
litellm.cache = litellm.Cache(type="local", ttl=CACHE_TTL)
```

Two-Stage Classification: Off-topic questions are rejected with a lightweight LLM call (~10 tokens) before expensive RAG retrieval:

async def classify_question(question: str) -> bool:
    response = await litellm.acompletion(
        max_tokens=10,  # Minimal tokens for TRUE/FALSE
        temperature=0,  # Deterministic
    )

Rate Limiting: 3 questions per day per IP prevents abuse:
```
RATE_LIMIT = os.getenv("RATE_LIMIT", "3/day")
```

Networking

Traffic Flow

User → Cloudflare CDN → Azure Container Apps (Astro) → Internal API (FastAPI) → AI Services

Cloudflare as Reverse Proxy: All traffic first hits Cloudflare, which provides caching, compression, and security filtering.

Custom Domain with Managed Certificate: The Astro container app has a custom domain binding with a certificate from Key Vault:

customDomains: [{
  name: containerAppDomain,
  bindingType: 'Auto',
  certificateId: appEnvironment.outputs.appCertResourceId
}]

Internal Ingress for API: The FastAPI backend is not exposed to the internet:
```
ingressExternal: false
```

Service-to-Service Communication: The Astro frontend proxies API requests internally:

// Astro API route proxies to internal container app
const response = await fetch(`${AI_API_URL}/ask`, {
  headers: { 'X-Forwarded-For': request.headers.get('x-forwarded-for') }
});

Client IP Preservation: The X-Forwarded-For header is forwarded through the proxy chain for accurate rate limiting:

def get_client_ip(request: Request) -> str:
    forwarded = request.headers.get("x-forwarded-for")
    if forwarded:
        return forwarded.split(",")[0].strip()
    return get_remote_address(request)

Network Security Perimeter

The NSP creates a logical boundary around sensitive services, allowing only intra-subscription communication:

profiles: [{
  accessRules: [
    { name: 'inbound', direction: 'Inbound', subscriptions: [{ id: subscription().id }] },
    { name: 'outbound', direction: 'Outbound', fullyQualifiedDomainNames: ['*.search.windows.net'] }
  ]
}]

Vibe/peer coding tips

When building this project, I leaned heavily on GitHub Copilot with Claude Opus 4.5 to help me get up and running quickly on areas where I’m not an expert, specifically with the frontend framework and navigating the different Python libraries. There are a couple of tips that help me go from idea to full project in a couple of days.

Knowing what you want and having some level of knowledge about the topic can help you get what you need in 1 or more shots. As I have some development background, being explicit on what I need in the frontend made it easier for the model to write the right code quicker. *Example: I want to have a modal tied to the onClick trigger for that button*
If you are asking it about something relatively new, like Azure Network Security Perimeter, I noticed that it provided wrong answers most of the time, that’s why I recommend for those situations to not rely heavily on LLMs
I either use the Ask mode or explicitly tell the model to lay out what it’s going to implement before making any changes. Then I can switch to Agent mode for execution. This makes me validate its approach and modify it before it touches any files.
Always do one change at a time, especially if it’s a big one. I usually break down asks into multiple turns to avoid confusing the model and messing up multiple files at the same time.
I didn’t know LiteLLM existed, so when I tried to ask it to suggest open-source libraries, it wasn’t one of them. So I found doing my due-diligence first by understanding the landscape and then choosing one to work with, makes things easier
I found it to hallucinate sometimes when I ask it to implement a specific feature of LiteLLM (which can be expected given there is not a lot of content out there on that). To overcome that, I found it really helpful that I find the right docs, share it and ask explicitly for what I want it to implement. That way, it knew exactly what to do and took minimum shots to get it right.
It’s very helpful to use it as a reviewer, after bringing everything together, I used it to review my Dockerfiles, Bicep code, Python API for security best practices, optimizations, unnecessary code blocks,…etc. This allows you to have additional eyes on your code, sometimes even different perspectives and paths

Conclusion

Building a production AI assistant for a portfolio doesn’t have to be expensive or insecure. By leveraging:

Serverless architecture with scale-to-zero
Managed identity and zero-trust networking
Cost-effective models open source models
Smart caching and rate limiting to prevent abuse

…you can create a professional, AI-powered experience for under $10/month.

You can view the code on Github for inspiration.

Have questions about this architecture? Ask my AI assistant! (Just kidding—there’s a rate limit 😉)

Share on:

Level Up your workflows with GitHub Copilot’s custom chat modes

GitHub Copilot has evolved far beyond just completing lines of code — …

by: Seif
Aug 09, 2025

CI/CD evaluation of Large Language Models using OpenEvals

Reliability of Large Language Models: Why continuous Evaluation …

by: Seif
Jun 29, 2025

Simplifying private deployment of Azure AI services using AVM

I recently worked with a couple of customers on designing an …

by: Seif
May 06, 2025

Building a portfolio website with an AI assistant

Building a Portfolio Website with an AI Assistant

Architecture

Component Overview

Key Trade-offs

Tech Stack

Infrastructure Layer

Frontend Layer

Backend Layer

Configuration & Setup

Azure Services Integration

Two-Stage AI Pipeline

Context Retrieval (RAG)

Streaming Responses

Security & Input Validation

API Endpoints

AI services

AI pipeline

RAG Architecture

Document Ingestion

Retrieval

Two-Stage Question Processing

Model Selection

Rate Limiting

Response Caching

Caching Trade-offs

Streaming Responses

System Prompt Design

Security

Zero-Trust Network Architecture

Authentication & Authorization

Container Security

Input Sanitization & Prompt Injection Protection

Edge Security via Cloudflare

Cost

Cost Optimization Strategies

Networking

Traffic Flow

Network Security Perimeter

Vibe/peer coding tips

Conclusion

Level Up your workflows with GitHub Copilot’s custom chat modes

CI/CD evaluation of Large Language Models using OpenEvals

Simplifying private deployment of Azure AI services using AVM

You May Also Like

Level Up your workflows with GitHub Copilot’s custom chat modes

CI/CD evaluation of Large Language Models using OpenEvals

Simplifying private deployment of Azure AI services using AVM