
# 2.4 System Architecture

The system architecture integrates **web scraping**, **knowledge base creation**, **vector store management**, and a **conversational AI pipeline**. Data is sourced from the [MediaWiki](https://gna.cultura.gov.it/wiki/index.php/Pagina_principale) **web platform** that hosts **GNA's User Manual** and is converted into structured formats such as **XML** and **CSV** using tools such as **BeautifulSoup** and **Pandas**. This structured data is then transformed into a **knowledge base** comprising **text chunks** and their **embeddings**. The embeddings are stored in a **FAISS vector store** for efficient retrieval.
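The data-preparation flow described above (page HTML, then structured rows, then text chunks) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the prototype's code: `html.parser` stands in for BeautifulSoup and the `csv` module for Pandas, and the sample HTML, the function names, the column names, and the 50-word chunk size are all assumptions made for the example.

```python
import csv
import io
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text of every <p> element (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Join the collected pieces and normalise whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def chunk_text(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def page_to_csv(html, page_title, max_words=50):
    """Return CSV text with one row per chunk: page title, chunk index, chunk text."""
    parser = ParagraphExtractor()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["page", "chunk_id", "text"])
    chunk_id = 0
    for paragraph in parser.paragraphs:
        for chunk in chunk_text(paragraph, max_words):
            writer.writerow([page_title, chunk_id, chunk])
            chunk_id += 1
    return out.getvalue()


# Hypothetical sample page; real input would be HTML fetched from the wiki.
SAMPLE_HTML = (
    "<html><body><h1>Title</h1>"
    "<p>First  paragraph.</p><p>Second <b>bold</b> one.</p>"
    "</body></html>"
)
print(page_to_csv(SAMPLE_HTML, "Demo"))
```

Each CSV row produced this way corresponds to one text chunk, which is the unit that would later be embedded and stored in the vector store.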

An **embedding model** provided by **Mistral** generates the embeddings, which are used to query the vector store. Users interact with the system through a **Streamlit interface**, which communicates with a **conversational retrieval chain** powered by **LangChain**. This chain combines the results retrieved from the vector store with **Mistral's language model NeMo** to generate responses. **Chat history** is managed using **LangChain's memory capabilities**, ensuring continuity in conversations. The overall architecture emphasizes seamless **data flow** from source to **conversational output**.
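The retrieval step described above can be illustrated with a small, self-contained sketch. It is not the prototype's implementation and uses none of the real libraries: a word-count vector stands in for the Mistral embedding model, a brute-force in-memory search stands in for the FAISS index, and a plain function that assembles a prompt stands in for the LangChain chain and its memory. All class names, function names, and sample chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory store with brute-force search (stand-in for a FAISS index)."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, store, history, k=2):
    """Assemble the text a language model would receive: context, history, question."""
    context = "\n".join(store.search(question, k))
    past = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return f"Context:\n{context}\n\nHistory:\n{past}\n\nQuestion: {question}"


# Hypothetical chunks; real ones would come from the knowledge base.
store = ToyVectorStore([
    "upload the project file through the template",
    "contact the support desk by email",
    "the map layer shows archaeological sites",
])
history = [("hello", "hi, how can I help?")]
prompt = build_prompt("how do I upload a project file", store, history, k=1)
print(prompt)
```

In this sketch the chat history is simply a list of question and answer pairs prepended to each new prompt, which is one simple way to give the model conversational continuity.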

Although the overall application architecture has been designed as a comprehensive framework, the detailed system architecture presented here applies specifically to the prototype version of the chatbot. Focusing on the **prototype's architecture** supports an iterative development approach, in which key functionalities are validated before scaling up to a full deployment. This ensures that core components, such as **retrieval-augmented generation (RAG) mechanisms, knowledge base integrations, and response generation modelling**, are tested and refined early in the process. Moreover, by detailing the prototype architecture separately, developers can **identify potential bottlenecks, optimize performance**, and **incorporate feedback** before transitioning to a more robust production system.

{% content-ref url="/pages/p6DHz9YUGuYYxFdFSCV0" %}
[2.4.1 General application diagram](/pmse-dhdk/2.-software-requirement-specification/2.4-system-architecture/2.4.1-general-application-diagram.md)
{% endcontent-ref %}

{% content-ref url="/pages/hDPVE8pUmzryyfFk5Kpc" %}
[2.4.2 System architecture diagram](/pmse-dhdk/2.-software-requirement-specification/2.4-system-architecture/2.4.2-system-architecture-diagram.md)
{% endcontent-ref %}

