# Reference Doc

### **1. Setting Up the Environment**

* **Install Required Libraries**: Install all the Python libraries listed in the [Requirements](/pmse-dhdk/2.-software-requirement-specification/2.6-requirements-list.md), such as `langchain`, `streamlit`, `pandas`, `faiss`, `dotenv`, and `mistralai`. On PyPI, `faiss` and `dotenv` are typically installed as `faiss-cpu` and `python-dotenv`.
* **Create a Virtual Environment**: Use tools like `venv` or `conda` to manage dependencies and ensure a clean setup.
* **Environment Variables**: Create a `.env` file to store sensitive API keys, such as `MISTRAL_API_KEY`, and keep that file out of version control.
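
The following is a minimal sketch of reading the key at startup, assuming the `dotenv` import comes from the `python-dotenv` package. The helper names `require_env` and `load_api_key` are hypothetical and are not taken from the project's code.

```python
import os


def require_env(name, environ=None):
    """Return a non-empty environment variable or raise a clear error."""
    environ = os.environ if environ is None else environ
    value = (environ.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return value


def load_api_key():
    """Load a .env file with python-dotenv and return the Mistral key."""
    # Imported here so require_env stays usable without python-dotenv.
    from dotenv import load_dotenv

    load_dotenv()  # copies key=value pairs from a .env file into os.environ
    return require_env("MISTRAL_API_KEY")
```

Failing early with a named variable makes a missing key easier to diagnose than an authentication error raised later by the API client.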

### **2. Data Preparation**

* **Dataset Loading**:
  * Ensure the dataset (`gna_kg_dataset.csv`) exists in the `./data` directory.
  * Validate the dataset contains the required columns: `body`, `title`, `description`, and `url`.
  * If the file or a required column is missing, report a clear error instead of crashing.
* **Chunk Creation**:
  * Use `RecursiveCharacterTextSplitter` to split the dataset into manageable chunks.
  * Validate that chunks are created correctly and ensure no empty or malformed chunks are processed.
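
The loading and chunking steps above could be sketched as follows. This is an illustration under assumptions: the function names, the `chunk_size` and `chunk_overlap` values, and the choice to split only the `body` column are hypothetical, and the splitter import path (`langchain_text_splitters`) may differ in older LangChain releases, where it lives in `langchain.text_splitter`.

```python
REQUIRED_COLUMNS = ("body", "title", "description", "url")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


def clean_chunks(chunks):
    """Drop chunks that are not strings or contain only whitespace."""
    return [c.strip() for c in chunks if isinstance(c, str) and c.strip()]


def load_and_chunk(path="./data/gna_kg_dataset.csv", chunk_size=1000, chunk_overlap=100):
    """Load the CSV, validate its columns, and split each `body` into chunks."""
    # Third-party imports are local so the helpers above work without them.
    import pandas as pd
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    df = pd.read_csv(path)  # raises FileNotFoundError if the file is absent
    missing = missing_columns(df.columns)
    if missing:
        raise ValueError(f"Dataset is missing required columns: {missing}")

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,        # assumed value
        chunk_overlap=chunk_overlap,  # assumed value
    )
    chunks = []
    for body in df["body"].dropna().astype(str):
        chunks.extend(clean_chunks(splitter.split_text(body)))
    return chunks
```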

### **3. Vector Store Management**

* **Create or Load Vector Store**:
  * Check if a FAISS-based vector store exists in the `./db` directory.
  * If not, generate embeddings using `MistralAIEmbeddings` and create the vector store.
  * Save the vector store locally for future use.
  * Handle edge cases, such as empty documents or failed embedding generation.
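
A hedged sketch of the create-or-load logic is shown here. It assumes the FAISS wrapper from `langchain_community`, which by default saves its index as `index.faiss`; the function names are hypothetical, and the `allow_dangerous_deserialization` flag is only accepted by more recent `langchain_community` releases.

```python
from pathlib import Path


def index_exists(db_dir="./db"):
    """Return True if a saved FAISS index file is present in `db_dir`."""
    return (Path(db_dir) / "index.faiss").is_file()


def get_vector_store(documents, embeddings, db_dir="./db"):
    """Load the saved store if present; otherwise build it and save it."""
    found = index_exists(db_dir)
    if not found and not documents:
        # Edge case: nothing on disk and nothing to embed.
        raise ValueError("No saved vector store and no documents to embed.")

    # Imported after the cheap checks so they run without the dependency.
    from langchain_community.vectorstores import FAISS

    if found:
        # The docstore is pickled, so only load files you created yourself.
        return FAISS.load_local(
            db_dir, embeddings, allow_dangerous_deserialization=True
        )

    store = FAISS.from_documents(documents, embeddings)  # calls the embedding API
    store.save_local(db_dir)
    return store
```

The `embeddings` argument would typically be a `MistralAIEmbeddings` instance from `langchain_mistralai`. A failed embedding call surfaces as an exception from `FAISS.from_documents`, so the caller can catch it and show a readable message.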

### **4. Language Model Integration**

* **Initialize Mistral LLM**:
  * Set up the `ChatMistralAI` model with appropriate parameters (e.g., `temperature`, `max_retries`).
  * Verify that the model returns sensible output for a short test prompt before wiring it into the chain.
* **Configure Conversation Chain**:
  * Integrate the LLM with a conversational retrieval chain.
  * Use `ConversationBufferMemory` to manage chat history.
  * Store the memory object in a variable (e.g., `memory`) and link the chain to the vector store's retriever for efficient information retrieval.
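
One possible wiring of these pieces is sketched below. The model name and parameter values are assumptions, `build_chain` is a hypothetical name, and the `langchain.chains` and `langchain.memory` import paths belong to the legacy chain API, so they may have moved in newer LangChain releases.

```python
LLM_PARAMS = {
    "model": "mistral-small-latest",  # assumed model name
    "temperature": 0.2,               # assumed value
    "max_retries": 3,                 # assumed value
}


def build_chain(vector_store, llm_params=LLM_PARAMS):
    """Wire the LLM, chat memory, and retriever into one conversational chain."""
    # Local imports keep this module importable without the LangChain packages.
    from langchain.chains import ConversationalRetrievalChain
    from langchain.memory import ConversationBufferMemory
    from langchain_mistralai import ChatMistralAI

    llm = ChatMistralAI(**llm_params)  # reads MISTRAL_API_KEY from the environment
    memory = ConversationBufferMemory(
        memory_key="chat_history",  # key the chain uses to read past turns
        return_messages=True,
    )
    return ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vector_store.as_retriever(),
        memory=memory,
    )
```

With a chain built this way, a question is typically asked with `chain.invoke({"question": "..."})` and the reply is read from the `"answer"` key of the result.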

### **5. Error Handling and Retries**

* **API Retry Logic**:
  * Implement retry logic for API calls to handle errors such as rate limits (HTTP 429).
  * Log errors and ensure retries are spaced with appropriate delays.
* **Graceful Failure**:
  * Provide user-friendly error messages in case of issues with API calls or data processing.
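
The retry behaviour described above can be illustrated with a small exponential-backoff helper. This is a sketch that uses only the standard library; `call_with_retries` and its default values are hypothetical, and in real code the broad `except` should be narrowed to the API client's rate-limit error.

```python
import logging
import time

logger = logging.getLogger(__name__)


def call_with_retries(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `func()`; after a failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:  # narrow this to the client's HTTP 429 error
            if attempt == max_attempts - 1:
                logger.error("Giving up after %d attempts: %s", max_attempts, exc)
                raise
            delay = base_delay * (2 ** attempt)
            logger.warning(
                "Attempt %d failed (%s); retrying in %.1fs", attempt + 1, exc, delay
            )
            sleep(delay)
```

The caller can wrap the final call in `try`/`except` and show a short, non-technical message in the UI when the last attempt also fails. Passing `sleep` as a parameter lets tests run without real delays.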

### **6. Streamlit User Interface**

* **Setup Streamlit Application**:
  * Configure the Streamlit page with a title, caption statement, icon, image, and layout settings.
* **Chatbot Interface**:
  * Implement a chat interface to handle user input and display chatbot responses.
  * Use markdown with custom styles to differentiate between user and chatbot assistant messages.
* **Session State Management**:
  * Store conversation history in `st.session_state` to maintain state across interactions.
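
A compact sketch of such an interface follows. The page title, icon, colours, and function names are placeholders rather than the project's actual values, and `answer_fn` stands in for whatever function queries the conversation chain.

```python
import html

STYLES = {  # assumed inline styles, one per role
    "user": "background:#e8f0fe;padding:8px;border-radius:8px;",
    "assistant": "background:#f1f3f4;padding:8px;border-radius:8px;",
}


def render_message(role, content):
    """Return an HTML snippet for one message, escaping the text."""
    style = STYLES.get(role, STYLES["assistant"])
    return (
        f'<div style="{style}"><b>{html.escape(role)}:</b> '
        f"{html.escape(content)}</div>"
    )


def add_message(state, role, content):
    """Append a message to the history stored under state['messages']."""
    if "messages" not in state:
        state["messages"] = []
    state["messages"].append({"role": role, "content": content})


def main(answer_fn=lambda question: "(no chain connected)"):
    """Streamlit entry point; call main() at the bottom of the app script."""
    import streamlit as st

    # set_page_config must be the first Streamlit command on the page.
    st.set_page_config(page_title="Chatbot", page_icon="💬", layout="centered")
    st.title("Chatbot")
    st.caption("Ask a question about the dataset.")

    question = st.chat_input("Your question")
    if question:
        add_message(st.session_state, "user", question)
        add_message(st.session_state, "assistant", answer_fn(question))

    # Streamlit reruns the script on every interaction, so redraw the history.
    for msg in st.session_state.get("messages", []):
        st.markdown(render_message(msg["role"], msg["content"]), unsafe_allow_html=True)
```

Escaping the text before passing it to `st.markdown` with `unsafe_allow_html=True` prevents user input from being interpreted as HTML.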

### **7. Testing and Debugging**

* **Unit Tests**:
  * Test the `get_dataset.py` script for dataset creation.
  * Test each individual function in the `main.py` script.
* **End-to-End Testing**:
  * Validate the entire workflow from dataset loading to chatbot UI interaction.
* **Debugging Tools**:
  * Add logging and print statements to identify and resolve issues during development.
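
As an illustration of a unit test combined with logging, the sketch below tests a hypothetical column-validation helper with the standard `unittest` module. The function `missing_columns` is a stand-in written for this example; the real functions in `main.py` may be named and shaped differently.

```python
import logging
import unittest

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("chatbot")  # assumed logger name

REQUIRED_COLUMNS = ("body", "title", "description", "url")


def missing_columns(columns):
    """Hypothetical stand-in for a validation helper in main.py."""
    present = set(columns)
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        logger.warning("Missing columns: %s", missing)
    return missing


class MissingColumnsTest(unittest.TestCase):
    def test_complete_header_passes(self):
        self.assertEqual(missing_columns(REQUIRED_COLUMNS), [])

    def test_reports_absent_columns(self):
        self.assertEqual(missing_columns(["body", "url"]), ["title", "description"])
```

Saved as a test module, this can be run with `python -m unittest`. Using `logging` instead of `print` allows the output level to be lowered in production without editing the code.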

***

## Future Development and System Expansion

### **Deployment**

* **Host the Application**: Deploy the Streamlit application on platforms like Streamlit Cloud, AWS, or Heroku.
* **Secure Deployment**: Ensure environment variables are securely stored and accessed during deployment.

### **Documentation**

* **User Guide**: Provide a clear guide on how to run the application, including installation steps and usage instructions.
* **Code Comments**: Add detailed comments to explain each function and block of code.

### **Maintenance**

* **Monitoring**: Monitor application performance and address issues reported by users.
* **Updates**: Regularly update dependencies and models to maintain compatibility and improve performance.

