I build and share production-grade AI systems — from research to rollout.

Documenting the projects, experiments, and lessons that help turn research ideas into reliable applications.

Portrait of Vitor Sousa

About

Hello, I'm Vitor Sousa

I'm a Data Scientist & AI Engineer building production-grade AI systems at Wellhub. My work spans the full machine learning lifecycle — from research and experimentation to deploying robust applications at scale. I specialise in large language models and intelligent agents, bringing cutting-edge ML research (LLMs, fine-tuning, RAG, reinforcement learning) into real-world use.

I created this site to share what I'm working on. Here you'll find the projects I've built and the articles (or "essays") I've written about machine learning, data science, and software engineering. The aim is to document experiments, insights, and lessons learned — bridging research and practice — rather than to craft a glossy self-promotional page.

Current Work

Right now I'm focused on a few active projects that connect research ideas with production requirements.

  • LLM agents

    Building production-ready agents with tools, memory, and planning using frameworks like LangGraph and CrewAI.

  • Fine-tuning & alignment

    Designing efficient LoRA/QLoRA pipelines plus alignment techniques such as DPO and RLHF to shape model behaviour.

  • RAG systems

    Optimising retrieval-augmented generation with smart chunking, hybrid search, and reranking for sharper answers.

  • Personalisation

    Developing contextual bandit algorithms for adaptive content personalisation and recommendations.

  • Evaluation frameworks

    Combining automated metrics, human review, and A/B tests to monitor model performance in production.

Research Interests

I'm continuously exploring new ideas in AI. A few areas currently on my mind:

  • Prompt optimisation

    Automating prompt engineering (for example with DSPy) to systematically improve how we instruct LLMs.

  • RLHF & alignment

    Using reinforcement learning from human feedback to align models with human preferences and safety guardrails.

  • Multi-agent systems

    Exploring how multiple AI agents coordinate, collaborate, and reason together on complex tasks.

  • Hallucination mitigation

    Reducing model hallucinations with retrieval, fact-checking, and verification loops.

  • Efficient serving

    Designing low-latency, cost-effective serving architectures for deploying AI at scale.

I stay close to research from OpenAI, DeepMind, Anthropic, and voices I admire like Eugene Yan, Sebastian Raschka, Andrej Karpathy, and Chip Huyen.

Reading & Learning

I keep a steady rotation of books, papers, and hobbies to broaden my perspective.

  • Hands-On Large Language Models

    Working through practical patterns for shipping LLM applications.

  • Reinforcement Learning (Sutton & Barto)

    Revisiting the fundamentals of reinforcement learning theory.

  • Agent research papers

    Diving into recent publications on agent architectures and advanced ML systems.

  • "Four Thousand Weeks"

    Taking a non-technical pause to think about time, focus, and sustainable pace.

  • Strategic hobbies

    Learning chess and playing football — both sharpen strategic thinking and keep the work balanced.

Tech Stack

My day-to-day toolkit spans languages, frameworks, and infrastructure for training, evaluating, and shipping ML systems.

Writing

Beyond the Vibe Check: A Systematic Approach to LLM Evaluation

Drafting a practical playbook for building trustworthy LLM evaluation pipelines that go beyond surface-level vibes.

Projects