Domain-Specific Code Generation System
Problem Background: Network studies in clinical research enable standardized analysis across multiple institutions using common data models such as OMOP CDM, ensuring reproducibility and privacy-preserving analytics. However, defining such studies typically requires expertise in R programming and domain-specific analytical constructs, creating a barrier for non-technical researchers.
Proposed Approach: The OHDSI Strategus framework enables standardized study representation through JSON-based specifications, improving reproducibility and portability. However, constructing these specifications still requires significant technical expertise in R and an understanding of complex study design components.
This project introduces a natural-language-driven interface that generates Strategus-compatible study specifications by augmenting LLMs with domain-specific context and tools via the Model Context Protocol (MCP). The system makes study definition accessible to non-programmers while preserving structural and methodological correctness.
Key Features:
- Designed an MCP-based tool orchestration layer to inject structured domain context, study templates, and validation rules into LLM-driven generation workflows
- Enabled text-to-R generation of analytical workflows for standardized clinical datasets, reducing dependence on hand-written R specifications
- Built a Python-based pipeline that transforms natural language inputs into validated, structured JSON study specifications compatible with Strategus execution frameworks
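
Example: a minimal sketch of the validation step such a pipeline might apply to an LLM-generated draft before handing it to Strategus. The function name validate_spec, the required top-level keys (sharedResources, moduleSpecifications), and the per-module "module" field are assumptions made for illustration. They are a simplified stand-in, not the project's actual rules or the full Strategus schema.

```python
import json

# Assumed top-level layout of a study specification (hypothetical simplification).
REQUIRED_KEYS = {"sharedResources": list, "moduleSpecifications": list}


def validate_spec(raw):
    """Return a list of problems found in a JSON draft; an empty list means it passed."""
    try:
        spec = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ["not valid JSON: " + exc.msg]
    if not isinstance(spec, dict):
        return ["top level must be a JSON object"]

    errors = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in spec:
            errors.append("missing key: " + key)
        elif not isinstance(spec[key], expected_type):
            errors.append(key + " must be a " + expected_type.__name__)

    # Each module entry is assumed to be an object naming its module (hypothetical rule).
    modules = spec.get("moduleSpecifications")
    if isinstance(modules, list):
        for index, module in enumerate(modules):
            if not isinstance(module, dict) or not isinstance(module.get("module"), str):
                errors.append("moduleSpecifications[%d] needs a string 'module' field" % index)
    return errors
```

Returning a list of messages rather than raising on the first failure lets the pipeline feed every problem back to the model in a single correction round.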