- Abstract:
Single-cell RNA sequencing (scRNA-seq) enables gene expression analysis at the resolution of individual cells, revealing cellular heterogeneity and developmental pathways. It also generates massive, complex datasets that pose significant bioinformatics challenges for processing, storage, and analysis. This master's thesis presents an integrated platform that addresses these challenges by combining automated data processing through Nextflow workflows with search and analysis systems. These include centralized metadata storage in optimized relational databases and data search using both traditional indexing and artificial intelligence approaches.
The system's key innovation is a Retrieval-Augmented Generation (RAG) architecture that enables natural-language contextual search and semantic analysis of scientific studies. It is integrated with both a Python API and an intuitive web interface for researchers. The platform is designed for scalability, high performance, and data security, with support for distributed computing. It improves the accessibility and usability of single-cell RNA sequencing data, with the aim of accelerating research in functional genomics, developmental biology, biomedicine, and personalized medicine while providing practical value to the bioinformatics community.
- Keywords: Automation in Bioinformatics, Genomic Bioinformatics, Single-Cell RNA Sequencing, Nextflow, Python API, RAG Search System, scRNA-seq, Genomic Data Management, Artificial Intelligence in Biology
- Published on website: 2025-07-01
- Attached files: kcesarevic.pdf
Ksenija Cesarevic