Frequently Asked Questions (FAQ)¶
This comprehensive FAQ addresses common issues and questions you might encounter while using DNALLM.
Table of Contents¶
- Installation Issues
- Training and Fine-tuning Issues
- Model Loading and Inference Issues
- Model-Specific Issues
- Performance and Memory Issues
- Task-Specific Issues
- General Usage Questions
- Troubleshooting Guides
Installation Issues¶
Q: mamba-ssm or flash-attn Installation Fails¶
Problem: These packages require specific versions of the CUDA toolkit and a C++ compiler, and compilation often fails.
Solution:
- Ensure you have a compatible NVIDIA GPU and a CUDA toolkit version that matches your PyTorch build (a quick post-install check is shown after this list).
- Install the necessary build tools: conda install -c conda-forge gxx clang.
- Try installing pre-compiled wheels if available for your system. Check the official repositories for mamba-ssm and flash-attention for installation instructions.
- For Mamba, use the provided installation script: sh scripts/install_mamba.sh.
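After installation, a quick sanity check (assuming PyTorch is already installed) is to confirm that PyTorch sees CUDA and that mamba_ssm imports cleanly:

import torch

print(torch.cuda.is_available())  # should be True on a working GPU setup
print(torch.version.cuda)         # CUDA version PyTorch was built against

import mamba_ssm                  # raises ImportError if the build failed
print("mamba_ssm imported successfully")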
Q: uv pip install Fails Due to Network Issues¶
Problem: Your network may be blocking access to PyPI or GitHub.
Solution: Configure uv or pip to use a proxy or a mirror. For example, you can set environment variables:
export HTTP_PROXY="http://your.proxy.server:port"
export HTTPS_PROXY="http://your.proxy.server:port"
Q: ImportError: ... not installed for Specific Models¶
Error Messages:
- ImportError: EVO-1 package is required...
- ImportError: No module named 'mamba_ssm'
- ImportError: No module named 'gpn'
- ImportError: No module named 'ai2_olmo'
Solution: You must install the required dependencies for the specific model you are trying to use.
Example for Mamba:
uv pip install -e '.[mamba]' --no-cache-dir --no-build-isolation
Example for Evo-1:
uv pip install evo-model
Q: flash_attn Installation Fails¶
Error Message: HTTP Error 404: Not Found during pip install, or compilation errors during the wheel build.
Cause: FlashAttention is highly specific to your Python, PyTorch, and CUDA versions. A pre-compiled wheel might not be available for your exact environment.
Solution:
1. Check Compatibility: Visit the FlashAttention GitHub releases page and find a wheel that matches your Python, PyTorch, and CUDA versions (the snippet after this list prints them).
2. Install Manually: Download the .whl file and install it directly:
uv pip install /path/to/flash_attn-2.5.8+cu122torch2.3-cp312-cp312-linux_x86_64.whl
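To pick a matching wheel you need your exact Python, PyTorch, and CUDA versions; this snippet prints all three (for example, Python 3.12 + PyTorch 2.3 + CUDA 12.2 corresponds to the cp312 / torch2.3 / cu122 wheel shown above):

import sys
import torch

print(sys.version)         # Python version -> the cpXYZ tag of the wheel
print(torch.__version__)   # PyTorch version -> the torchX.Y tag
print(torch.version.cuda)  # CUDA version PyTorch was built with -> the cuXYZ tag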
Training and Fine-tuning Issues¶
Q: CUDA out of memory Error¶
Problem: Your model, data, and gradients are too large to fit in your GPU's VRAM.
Solution: This is the most common training error. Try these steps in order (a combined config sketch follows the list):
1. Enable Gradient Accumulation: In your config file, set training_args.gradient_accumulation_steps to a value like 4 or 8. This is the most effective solution.
2. Reduce Batch Size: Lower training_args.per_device_train_batch_size to 4, 2, or even 1.
3. Enable Mixed Precision: Set training_args.fp16: true. This roughly halves the memory required for activations and intermediate computations.
4. Use an 8-bit Optimizer: Set training_args.optim: "adamw_8bit". This requires the bitsandbytes library.
5. Enable Gradient Checkpointing: Set training_args.gradient_checkpointing: true. This saves a lot of memory at the cost of slower training.
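As a combined illustration, the settings above might look like the following in the training section of your YAML config. The key names come from the steps above; the surrounding structure and the exact values are only a sketch and may differ from your project's config layout:

training_args:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 8
  fp16: true
  optim: "adamw_8bit"           # requires bitsandbytes
  gradient_checkpointing: true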
Q: Loss is NaN or Explodes¶
Problem: The training process is unstable. This can be caused by a learning rate that is too high, or numerical instability with FP16.
Solution:
- Lower the Learning Rate: Decrease training_args.learning_rate by a factor of 10 (e.g., from 5e-5 to 5e-6).
- Use a Learning Rate Scheduler: Ensure lr_scheduler_type is set to linear or cosine.
- Use BF16 instead of FP16: If you have an Ampere-based GPU (A100, RTX 30xx/40xx) or newer, use bf16: true instead of fp16: true. Bfloat16 is more numerically stable; a quick way to check support is shown below.
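If you are unsure whether your GPU supports bfloat16, PyTorch can tell you directly:

import torch

# True on Ampere (A100, RTX 30xx/40xx) and newer architectures
print(torch.cuda.is_bf16_supported())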
Model Loading and Inference Issues¶
Q: trust_remote_code=True is Required¶
Problem: You are trying to load a model with a custom architecture (e.g., Hyena, Caduceus, Evo) that is not yet part of the main transformers library.
Solution: You must pass trust_remote_code=True when loading the model. This allows transformers to download and run the model's defining Python code from the Hugging Face Hub.
model, tokenizer = load_model_and_tokenizer(
"togethercomputer/evo-1-131k-base",
trust_remote_code=True
)
Q: ValueError: Model ... not found locally.¶
Cause: You specified source: "local" but the path provided in model_name is incorrect or does not point to a valid model directory.
Solution:
- Double-check that the path in your configuration or code is correct.
- Ensure the directory contains the necessary model files (e.g., pytorch_model.bin, config.json); a quick check is sketched below.
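A quick way to verify the path (the directory below is a placeholder) is to confirm it exists and list its contents:

from pathlib import Path

model_dir = Path("/path/to/your/model")  # placeholder; use the value you passed as model_name
if model_dir.is_dir():
    print(sorted(p.name for p in model_dir.iterdir()))  # should include config.json and the model weights
else:
    print(f"Not a directory: {model_dir}")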
Q: ValueError: Failed to load model: ...¶
This is a general error that can have several causes.
Common Causes & Solutions:
1. Incorrect task_type: You are trying to load a model for a task it wasn't designed for without a proper configuration.
- Fix: Ensure your task configuration in the YAML file is correct. For classification/regression, num_labels must be specified.
2. Corrupted Model Cache: The downloaded model files may be incomplete or corrupted.
- Fix: Clear the cache and let DNALLM re-download the model.
from dnallm.models.model import clear_model_cache

# For models from Hugging Face
clear_model_cache(source="huggingface")

# For models from ModelScope
clear_model_cache(source="modelscope")
3. Network Issues: The model download failed due to an unstable connection.
- Fix: Use a mirror by setting use_mirror=True:
model, tokenizer = load_model_and_tokenizer(
    "zhihan1996/DNABERT-2-117M",
    task_config=configs['task'],
    source="huggingface",
    use_mirror=True  # This uses hf-mirror.com
)
Q: Tokenizer Mismatch or Poor Performance¶
Problem: You are using a model pre-trained on natural language (like the original LLaMA) directly on DNA sequences. The tokenizer doesn't understand DNA k-mers, leading to poor results.
Solution: Always use a model that has been specifically pre-trained or fine-tuned on DNA. These models, like DNABERT or GENERator, come with a tokenizer designed for DNA. Check the model card on Hugging Face to confirm it's intended for genomic data.
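A quick sanity check is to tokenize a short DNA string and inspect the resulting tokens; a genomics tokenizer should split it into k-mers or DNA byte-pair tokens rather than unknown or character-level natural-language pieces. The snippet assumes tokenizer was returned by load_model_and_tokenizer as in the earlier examples:

# `tokenizer` obtained from load_model_and_tokenizer(...) as shown above
sequence = "ATCGATCGATCGATCG"
tokens = tokenizer.tokenize(sequence)
print(tokens)                                   # expect k-mer or DNA BPE tokens, not [UNK]
print(tokenizer.convert_tokens_to_ids(tokens))  # token IDs fed to the model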
Model-Specific Issues¶
Q: EVO Model Installation and Usage¶
Problem: ImportError: EVO-1 package is required... or EVO2 package is required...
Solution: You have not installed the required package. Follow the installation steps:
EVO-1 Installation:
uv pip install evo-model
EVO-2 Installation:
# 1. Install the Transformer Engine from NVIDIA
uv pip install "transformer-engine[pytorch]==2.3.0" --no-build-isolation --no-cache-dir
# 2. Install the EVO-2 package
uv pip install evo2
# 3. (Optional but Recommended) Install Flash Attention for performance
uv pip install "flash_attn<=2.7.4.post1" --no-build-isolation --no-cache-dir
Q: CUDA Out-of-Memory with EVO-2¶
Cause: EVO-2 models, especially the larger ones, are very memory-intensive.
Solution:
1. Ensure you are using a GPU with sufficient VRAM (e.g., A100, H100).
2. Reduce the batch_size in your configuration to 1 if necessary.
3. If you are on a Hopper-series GPU (H100/H200), ensure FP8 is enabled, as DNALLM's EVO-2 handler attempts to use it automatically for efficiency (a quick way to check your GPU generation is shown below).
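FP8 is only available on Hopper-class hardware. You can confirm which GPU generation you have by checking its compute capability with PyTorch; Hopper (H100/H200) reports (9, 0):

import torch

print(torch.cuda.get_device_name(0))        # e.g., "NVIDIA H100 80GB HBM3"
print(torch.cuda.get_device_capability(0))  # (8, 0)=A100, (8, 6)/(8, 9)=RTX 30xx/40xx, (9, 0)=Hopper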
Performance and Memory Issues¶
Q: CUDA Out-of-Memory During Inference¶
Cause: The model, data, and intermediate activations require more GPU VRAM than is available.
Solutions:
- Primary: Reduce batch_size in your inference or training configuration. This is the most effective way to lower memory usage.
- Secondary: Reduce max_length. With standard attention, memory scales quadratically with sequence length.
- Use Half-Precision: Set use_fp16: true or use_bf16: true. This can nearly halve the model's memory footprint (see the quick check after this list).
- Disable Interpretability Features: For large-scale runs, ensure output_hidden_states and output_attentions are False.
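For a rough sense of how much VRAM the weights alone occupy (and how much half precision saves), you can sum the parameter sizes of a loaded model. The snippet assumes model was returned by load_model_and_tokenizer as in the earlier examples:

# `model` obtained from load_model_and_tokenizer(...) as shown above
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"Model weights: {param_bytes / 1024**3:.2f} GiB")
# element_size() is 4 bytes for FP32 and 2 bytes for FP16/BF16, so loading in
# half precision roughly halves this figure. Activations come on top of it.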
Task-Specific Issues¶
Q: Model outputs unexpected scores or flat predictions¶
Cause: There is a mismatch between the model's architecture and the task it's being used for.
Solutions:
- Check Model Type vs. Task:
- For classification/regression, fine-tuned models are generally required. Using a base MLM/CLM model without fine-tuning will likely produce random or uniform predictions on a classification task.
- For zero-shot mutation analysis, you should use a base MLM or CLM model with the appropriate task_type (mask or generation) to get meaningful likelihood scores.
- Verify Tokenizer: Ensure the tokenizer is appropriate for the model.
- Check max_length: If your sequences are being truncated too much, the model may not have enough information to make accurate predictions.
Q: IndexError: Target out of bounds during training/evaluation¶
Cause: The labels in your dataset do not match the num_labels specified in your task configuration. For example, your data has labels [0, 1, 2] but you set num_labels: 2.
Solution:
- Verify num_labels: Ensure num_labels in your YAML configuration correctly reflects the number of unique classes in your dataset.
- Check Label Encoding: Make sure your labels are encoded as integers starting from 0 (i.e., 0, 1, 2, ...). If your labels are strings or start from 1, re-encode them first; a minimal sketch follows.
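A minimal sketch for checking and re-encoding labels, assuming a pandas-readable CSV with a label column (the file and column names here are hypothetical):

import pandas as pd

df = pd.read_csv("train.csv")         # hypothetical file name
classes = sorted(df["label"].unique())
print(len(classes), classes)          # this count must equal num_labels in your config

# Re-encode arbitrary labels (strings, 1-based integers, ...) as 0-based integers
df["label"] = df["label"].map({c: i for i, c in enumerate(classes)})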
General Usage Questions¶
Q: How do I choose the right model for my task?¶
Answer:
- For Classification Tasks: Choose BERT-based models (DNABERT, Plant DNABERT)
- For Generation Tasks: Use CausalLM models (Plant DNAGPT, GenomeOcean)
- For Large-scale Analysis: Consider Nucleotide Transformer or EVO models
- For Plant-specific Tasks: Prefer Plant-prefixed models
See the Model Selection Guide for detailed guidance.
Q: What are the system requirements for DNALLM?¶
Answer:
- Python: 3.10 or higher (Python 3.12 recommended)
- GPU: NVIDIA GPU with at least 8GB VRAM recommended for optimal performance
- Memory: 16GB RAM minimum, 32GB+ recommended for large models
- Storage: At least 10GB free space for model downloads and cache
Q: How can I improve inference speed?¶
Answer:
- Use smaller models for faster inference
- Enable mixed precision (FP16/BF16)
- Reduce sequence length when possible
- Use batch processing for multiple sequences
- Consider model quantization for deployment
See the Performance Optimization Guide for detailed tips.
Q: Where can I find example configurations?¶
Answer: Example configurations are available in the example/ directory of the DNALLM repository. You can also use the interactive configuration generator:
dnallm model-config-generator --output my_config.yaml
Troubleshooting Guides¶
For specific troubleshooting guides by topic:
- Installation Troubleshooting - Common installation issues and solutions
- Models Troubleshooting - Common model-related issues and solutions
- Benchmark Troubleshooting - Common issues with model benchmarking
- Fine-tuning Troubleshooting - Common issues with model fine-tuning
- Data Processing Troubleshooting - Common issues with data preparation and processing
- CLI Troubleshooting - Common issues with command-line interface
- Inference Troubleshooting - Common issues with model inference
- MCP Troubleshooting - Common issues with Model Context Protocol server
Still Need Help?¶
If you can't find the answer to your question in this FAQ:
- Check the Documentation: Browse the User Guide for detailed tutorials and guides
- Search Issues: Look through existing GitHub Issues
- Create an Issue: If your problem isn't documented, create a new issue with:
- A clear description of the problem
- Steps to reproduce the issue
- Your system information (OS, Python version, CUDA version)
- Relevant error messages and logs
- Join Discussions: Participate in community discussions on GitHub
This FAQ is regularly updated. If you find a solution that's not documented here, please consider contributing to help other users.