Beyond Static Documents: The Multimodal Shift
The pipeline moves beyond simple prompt-response interactions toward multi-agent "Think → Plan → Act" workflows, automating the conversion of research articles into narrated, pedagogical presentations without manual fine-tuning.
Logic Layer: Mistral 7B
Parsing: LayoutLMv3
Audio Synthesis: Suno Bark
Strategic Advantage
- Modularity: 8 specialized agents provide "best-in-class" processing at every stage.
- Grounded Reasoning: A FAISS-based vector DB reduces hallucinations by anchoring generated content to the PDF text.
- Scalable Efficiency: The inference-only workflow requires no custom fine-tuning or specialized training.
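To make the Grounded Reasoning point concrete, here is a minimal sketch of retrieval-based grounding. The page names FAISS but does not show the indexing code, so this example swaps in a brute-force cosine search in plain Python and a toy bag-of-words vector in place of a learned embedding model. The function names and the chunk IDs (such as `p4-c1`) are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: a real pipeline would use an embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k (chunk_id, text) pairs most similar to the query.

    Brute-force stand-in for a vector index lookup; not the FAISS API.
    """
    q = embed(query)
    ranked = sorted(chunks.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]

# Hypothetical chunks extracted from a PDF, keyed by a grounding ID.
chunks = {
    "p1-c0": "transformers use self attention to model long range context",
    "p2-c3": "the dataset contains ten thousand annotated radiology reports",
    "p4-c1": "training used the adam optimizer with a warmup schedule",
}

top = retrieve("which optimizer was used for training", chunks, k=1)
```

Passing the retrieved chunk text, together with its ID, into the prompt is what lets a generated slide or script line be traced back to a passage in the source PDF.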
See It In Action
Watch a live demonstration of how the pipeline transforms research documents into narrated PowerPoint presentations.
The "Think → Plan → Act" Architecture
Unlike simple OCR, this pipeline employs a cognitive loop. It deconstructs the document, plans a pedagogical narrative, and then synthesizes media.
🧠 THINK
📝 PLAN
⚡ ACT
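The three phases can be sketched as functions that pass a shared state object along. This is an illustrative skeleton only: the `PipelineState` class, the paragraph-splitting heuristic, and the artifact naming are assumptions and are not taken from the project's code.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    """Hypothetical shared state handed from one phase to the next."""
    source_text: str
    sections: list = field(default_factory=list)   # filled by think()
    outline: list = field(default_factory=list)    # filled by plan()
    artifacts: list = field(default_factory=list)  # filled by act()

def think(state):
    # THINK: deconstruct the document into non-empty paragraphs.
    state.sections = [p.strip() for p in state.source_text.split("\n\n") if p.strip()]
    return state

def plan(state):
    # PLAN: one slide per section, titled by the section's first sentence.
    state.outline = [
        {"slide": i + 1, "title": section.split(".")[0]}
        for i, section in enumerate(state.sections)
    ]
    return state

def act(state):
    # ACT: emit one placeholder artifact per planned slide.
    state.artifacts = [f"slide_{item['slide']:02d}: {item['title']}" for item in state.outline]
    return state

def run(text):
    """Run the three phases in order on a fresh state."""
    state = PipelineState(source_text=text)
    for phase in (think, plan, act):
        state = phase(state)
    return state

result = run("Attention is central. It scales well.\n\nResults improve on prior work.")
```

Keeping each phase a plain function over one state object makes it straightforward to log or replay a single phase in isolation.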
Inside the Pipeline: 8 Specialized Nodes
Explore how each agent contributes to the transformation from raw pixels to pedagogical narration.
Technical Performance Insights
Chart: Process Load Distribution (relative processing loads and semantic breakdown of the pipeline).
Layout Awareness (Agent 1)
Preserving "bounding boxes" ensures the system respects original hierarchy—distinguishing primary headers from footnotes to maintain context.
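As a rough illustration of why bounding boxes matter, the sketch below labels text blocks as header, body, or footnote from box geometry alone. It does not call LayoutLMv3; the `Block` type, the coordinate convention (origin at the top-left, units in points), and the thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """Hypothetical parsed text block with its bounding box."""
    text: str
    bbox: tuple  # (x0, y0, x1, y1), origin at top-left of the page (assumed)

def classify(block, page_height, body_line_height=12.0):
    """Label a block using only its box height and vertical position.

    Illustrative heuristic thresholds, not values from the project.
    """
    x0, y0, x1, y1 = block.bbox
    height = y1 - y0
    if height >= 1.5 * body_line_height:
        return "header"    # noticeably taller than body text
    if y0 > 0.9 * page_height and height < body_line_height:
        return "footnote"  # small text in the bottom 10% of the page
    return "body"

page_height = 792.0  # US Letter height in points
blocks = [
    Block("1 Introduction", (72, 100, 300, 120)),
    Block("We study document understanding.", (72, 130, 540, 142)),
    Block("1 Funded by a research grant.", (72, 740, 400, 749)),
]
labels = [classify(b, page_height) for b in blocks]
```

Without the boxes, the header and the footnote above would be indistinguishable from body text, which is the loss of hierarchy the layout-aware parser is meant to avoid.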
Inference-Only Scaling
No custom RAG retraining required. The system pivots from academic texts to medical journals instantly using Mistral 7B's base reasoning capabilities.
Pedagogical Constraints
Visuals follow the 5/12 rule (max 5 bullets, max 12 words), while the Script Agent, adopting an "Academic Lecturer" persona, provides the depth.
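A minimal sketch of enforcing the 5/12 rule on a slide's bullets follows. It assumes the 12-word limit applies per bullet and uses plain truncation as the enforcement strategy; the actual pipeline may instead ask the language model to rewrite overlong bullets. The function name is hypothetical.

```python
MAX_BULLETS = 5   # at most 5 bullets per slide
MAX_WORDS = 12    # at most 12 words per bullet (assumed to be a per-bullet limit)

def enforce_5_12(bullets):
    """Keep the first MAX_BULLETS bullets and cut each to MAX_WORDS words."""
    trimmed = []
    for bullet in bullets[:MAX_BULLETS]:
        words = bullet.split()
        trimmed.append(" ".join(words[:MAX_WORDS]))
    return trimmed

# Seven raw bullets, the first one 15 words long.
raw = ["word " * 15] + ["short bullet"] * 6
slide = enforce_5_12(raw)
```

Truncation keeps the example deterministic; whatever is cut from the slide can still be carried by the narration script.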
Pipeline Execution Results
The 8 agents write their results to a standardized structure within the output/ directory.
output/slides/
The finalized .pptx file with text, layout, and linked narration audio ready for delivery.
output/audio/
Individual .wav and .mp4 files, optimized via dynamic range compression to prevent clipping.
output/metadata/
Execution JSON logs tracking agent dependencies and grounding IDs for full auditability.
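The dynamic range compression mentioned for output/audio/ can be illustrated with a static hard-knee compressor over float samples. This is a simplified sketch and not the project's audio code: the threshold and ratio values are assumptions, samples are assumed to be floats with full scale at ±1.0, and there is no attack or release smoothing.

```python
def compress(samples, threshold=0.5, ratio=4.0):
    """Reduce the portion of each sample's magnitude above `threshold` by `ratio`.

    Samples at or below the threshold pass through unchanged; the sign is preserved.
    """
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(mag if s >= 0 else -mag)
    return out

# A quiet sample, a loud sample, and one that would clip at full scale (1.0).
processed = compress([0.2, 0.9, -1.3])
```

With these settings the 1.3 peak is pulled back to roughly 0.7, inside full scale. A compressor alone does not guarantee that for arbitrarily large inputs, so a final limiter stage is a common addition.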
$ python "Agentic Systems/Master Orchestrator Agent/master_agent.py"
[*] Initializing Agentic Pipeline...
[*] Agent 1: Parsing PDF Layout [DONE]
[*] Agent 3: Vector Indexing Chunks [DONE]
[*] Agent 7: Synthesizing Audio with Suno Bark (Temp 0.7) [IN PROGRESS]
|############################----------| 75%
The Team
Mahmoud Alyosify
LinkedIn Profile
Mirna Embaby
LinkedIn Profile