Installation¶
DNALLM is a comprehensive, open-source toolkit designed for fine-tuning and inference with DNA Language Models. This guide will help you install DNALLM and its dependencies.
Prerequisites¶
- Python 3.10 or higher (Python 3.12 recommended)
- Git
- CUDA-compatible GPU (optional, for GPU acceleration)
- Environment Manager: Choose one of the following:
- Python venv (built-in)
- Conda/Miniconda (recommended for scientific computing)
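You can confirm these prerequisites from a shell before proceeding:
# Check prerequisite versions
python --version   # should report 3.10 or higher
git --version
nvidia-smi         # optional, only needed for GPU acceleration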
Quick Installation with uv (Recommended)¶
DNALLM uses uv for dependency management and packaging.
uv is a fast Python package manager that is 10-100x faster than traditional tools like pip.
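uv exposes a pip-compatible command-line interface, so familiar pip commands carry over directly:
# uv's pip subcommand mirrors pip's syntax
uv pip install numpy   # equivalent to: pip install numpy
uv pip list            # equivalent to: pip list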
Method 1: Using venv + uv¶
# Clone repository
git clone https://github.com/zhangtaolab/DNALLM.git
cd DNALLM
# Create virtual environment
python -m venv .venv
# Activate virtual environment
source .venv/bin/activate # Linux/MacOS
# or
.venv\Scripts\activate # Windows
# Upgrade pip (recommended)
pip install --upgrade pip
# Install uv in virtual environment
pip install uv
# Install DNALLM with base dependencies
uv pip install -e '.[base]'
# Verify installation
python -c "import dnallm; print('DNALLM installed successfully!')"
Method 2: Using conda + uv¶
# Clone repository
git clone https://github.com/zhangtaolab/DNALLM.git
cd DNALLM
# Create conda environment
conda create -n dnallm python=3.12 -y
# Activate conda environment
conda activate dnallm
# Install uv in conda environment
conda install uv -c conda-forge
# Install DNALLM with base dependencies
uv pip install -e '.[base]'
# Verify installation
python -c "import dnallm; print('DNALLM installed successfully!')"
GPU Support¶
For GPU acceleration, install the dependency group that matches your CUDA version:
# For venv users: activate virtual environment
source .venv/bin/activate # Linux/MacOS
# or
.venv\Scripts\activate # Windows
# For conda users: activate conda environment
# conda activate dnallm
# CUDA 12.4 (recommended for recent GPUs)
uv pip install -e '.[cuda124]'
# Other supported hardware groups: cpu, cuda121, cuda126, cuda128
uv pip install -e '.[cuda121]'
Dependency Groups¶
DNALLM provides multiple dependency groups for different use cases:
Core Dependency Groups¶
| Dependency Group | Purpose | When to Use |
|---|---|---|
| base | Development tools + ML libraries | Recommended for most users |
| dev | Complete development environment | For contributors |
| test | Testing environment only | For running tests |
| notebook | Jupyter and Marimo support | For interactive notebooks |
| docs | Documentation building | For building docs |
| mcp | MCP server support | For MCP deployment |
Note: Core ML libraries (torch, transformers, datasets, peft, accelerate) are installed automatically as main dependencies. The groups above add additional functionality.
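Dependency groups can be combined in a single install command. For example, to install the base toolset together with Jupyter/Marimo notebook support:
# Combine dependency groups as needed
uv pip install -e '.[base,notebook]'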
Hardware-Specific Groups¶
| Dependency Group | PyTorch Version | GPU Type | When to Use |
|---|---|---|---|
| cpu | 2.4.0-2.7 | CPU only | Development without GPU |
| cuda121 | 2.2.0-2.7 | NVIDIA (older) | Volta/Turing/early Ampere GPUs |
| cuda124 | 2.4.0-2.7 | NVIDIA (recommended) | Most modern GPUs |
| cuda126 | 2.6.0-2.7 | NVIDIA (latest) | Ada/Hopper with Flash Attention |
| cuda128 | 2.7.0+ | NVIDIA (cutting-edge) | Latest hardware |
| rocm | 2.5.0-2.7 | AMD GPUs | AMD GPU users |
| mamba | 2.4.0-2.7 | NVIDIA + Mamba | For Mamba architecture models |
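The CUDA version printed in the top-right corner of the nvidia-smi header is the highest runtime your driver supports; choose the group at or below it. For example, for a driver reporting CUDA 12.4:
# Check the highest CUDA version your driver supports
nvidia-smi | head -n 4   # look for "CUDA Version: 12.4" in the header
# Install the matching hardware group
uv pip install -e '.[base,cuda124]'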
Installation Scenarios¶
Scenario 1: CPU-only Development¶
For development and testing without GPU acceleration:
# Create environment
conda create -n dnallm-cpu python=3.12 -y
conda activate dnallm-cpu
# Install base dependencies and CPU version
uv pip install -e '.[base,cpu]'
# Verify installation
python -c "import dnallm; print('DNALLM installed successfully!')"
Scenario 2: Using NVIDIA GPU for Training and Inference¶
For GPU-accelerated training and inference:
# Determine CUDA version
nvidia-smi
# Create environment (using CUDA 12.4 as example)
conda create -n dnallm-gpu python=3.12 -y
conda activate dnallm-gpu
# Install base dependencies and CUDA 12.4 support
uv pip install -e '.[base,cuda124]'
# Verify installation
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
Scenario 3: Using Mamba Model Architecture¶
For models with Mamba architecture (Plant DNAMamba, Caduceus, Jamba-DNA):
# Create environment
conda create -n dnallm-mamba python=3.12 -y
conda activate dnallm-mamba
# Install base dependencies
uv pip install -e '.[base]'
# Install Mamba support (requires GPU)
uv pip install -e '.[cuda124,mamba]' --no-cache-dir --no-build-isolation
# Verify installation
python -c "from mambapy import Mamba; print('Mamba installed successfully!')"
Scenario 4: Complete Development Environment¶
For contributors and developers:
# Create environment
conda create -n dnallm-dev python=3.12 -y
conda activate dnallm-dev
# Install complete development dependencies
uv pip install -e '.[dev,notebook,docs,mcp,cuda124]'
# Verify installation
python -c "
import dnallm
import torch
print('DNALLM:', dnallm.__version__)
print('PyTorch:', torch.__version__)
print('CUDA:', torch.version.cuda if torch.cuda.is_available() else 'CPU')
"
Scenario 5: Running MCP Server Only¶
For MCP server deployment:
# Create environment
conda create -n dnallm-mcp python=3.12 -y
conda activate dnallm-mcp
# Install MCP-related dependencies
uv pip install -e '.[base,mcp,cuda124]'
# Verify installation
python -c "from dnallm.mcp import server; print('MCP server dependencies installed!')"
Verification¶
Basic Verification¶
# Verify DNALLM import
python -c "import dnallm; print(f'DNALLM version: {dnallm.__version__}')"
# Verify core modules
python -c "
from dnallm import load_config, load_model_and_tokenizer
from dnallm.datahandling import DNADataset
from dnallm.finetune import DNATrainer
from dnallm.inference import DNAInference
print('All core modules imported successfully!')
"
Hardware Verification¶
# Verify PyTorch and CUDA
python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
print(f'CUDA version: {torch.version.cuda}')
print(f'GPU: {torch.cuda.get_device_name(0)}')
print(f'Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB')
"
# Verify Mamba (if installed)
python -c "
try:
from mambapy import Mamba
print('Mamba: Available')
except ImportError:
print('Mamba: Not installed')
"
Troubleshooting¶
CUDA Version Mismatch¶
Issue: Installed PyTorch CUDA version doesn't match system CUDA version
Solution:
# 1. Check system CUDA version
nvidia-smi
nvcc --version
# 2. Uninstall installed torch
uv pip uninstall torch torchvision torchaudio
# 3. Reinstall matching version
uv pip install -e '.[cuda121]' # choose the group that matches your system's CUDA version
Mamba Installation Failure¶
Issue: mamba-ssm or causal_conv1d installation fails
Solution:
# 1. Install compilation dependencies
conda install -c conda-forge gxx clang ninja
# 2. Clear cache and reinstall
rm -rf .venv/lib/python*/site-packages/mamba_ssm*
rm -rf .venv/lib/python*/site-packages/causal_conv1d*
uv pip install -e '.[mamba]' --no-cache-dir --no-build-isolation
# 3. Or use installation script
sh scripts/install_mamba.sh
Dependency Conflicts¶
Issue: Dependency conflicts during installation
Solution:
# 1. Create new environment
conda create -n dnallm-new python=3.12 -y
conda activate dnallm-new
# 2. Use uv to resolve dependencies
uv pip install -e '.[base]' --resolution=lowest
Native Mamba Support¶
The native Mamba architecture runs significantly faster than the transformers-compatible Mamba implementation, but it requires an NVIDIA GPU.
If you need native Mamba support, run the following commands after installing the DNALLM dependencies:
# For venv users: activate virtual environment
source .venv/bin/activate # Linux/MacOS
# For conda users: activate conda environment
# conda activate dnallm
# Install Mamba support
uv pip install -e '.[mamba]' --no-cache-dir --no-build-isolation
# If you encounter network or compilation issues, use the dedicated Mamba install script instead (optional)
sh scripts/install_mamba.sh # the script lets you select a GitHub proxy
Please ensure your machine can connect to GitHub; otherwise, the Mamba dependencies may fail to download.
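A quick way to confirm connectivity:
# Verify GitHub is reachable (expect an HTTP 200 response)
curl -sI https://github.com | head -n 1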
Additional Model Dependencies¶
Specialized Model Dependencies¶
Some models use custom architectures that have not yet been integrated into Hugging Face's transformers library. Fine-tuning and inference with these models therefore require installing the corresponding dependency libraries first:
EVO2¶
EVO2 fine-tuning and inference depend on the model's own software package or on third-party Python libraries:
# evo2 requires Python >= 3.11
# Install Transformer Engine (PyTorch build)
uv pip install "transformer-engine[pytorch]==2.3.0" --no-build-isolation --no-cache-dir
# Install evo2
uv pip install evo2
# (Optional) Install flash attention 2
uv pip install "flash_attn<=2.7.4.post1" --no-build-isolation --no-cache-dir
# Note: building the transformer-engine and flash-attn packages can take a long time.
# Add the cuDNN libraries to the dynamic linker path
export LD_LIBRARY_PATH=[path_to_DNALLM]/.venv/lib64/python3.11/site-packages/nvidia/cudnn/lib:${LD_LIBRARY_PATH}
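As a quick sanity check after installation (assuming the import names match the package names, evo2 and transformer_engine):
# Quick import check for the EVO2 dependencies
python -c "import evo2, transformer_engine; print('EVO2 dependencies OK')"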
EVO-1¶
# Install evo-1 model
uv pip install evo-model
# (Optional) Install flash attention
uv pip install "flash_attn<=2.7.4.post1" --no-build-isolation --no-cache-dir
GPN¶
Project address: https://github.com/songlab-cal/gpn
uv pip install git+https://github.com/songlab-cal/gpn.git
megaDNA¶
Note that the megaDNA weights hosted on Hugging Face can only be accessed after requesting permission from the author.
Project address: https://github.com/lingxusb/megaDNA
git clone https://github.com/lingxusb/megaDNA
cd megaDNA
uv pip install .
LucaOne¶
Project address: https://github.com/LucaOne/LucaOneTasks
uv pip install lucagplm
Omni-DNA¶
Project address: https://huggingface.co/zehui127/Omni-DNA-20M
uv pip install ai2-olmo
Enformer¶
Project address: https://github.com/lucidrains/enformer-pytorch
uv pip install enformer-pytorch
Borzoi¶
Project address: https://github.com/johahi/borzoi-pytorch
uv pip install borzoi-pytorch
Some models require support from additional dependencies. We will continue to add dependency requirements for more models.
Flash Attention Support¶
Some models support Flash Attention acceleration. If you need this dependency, refer to the project's GitHub for installation instructions. Note that flash-attn wheels are tied to specific Python, PyTorch, and CUDA versions; check the project's GitHub Releases for a matching prebuilt package first, otherwise the install may fail with an HTTP Error 404: Not Found.
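To pick a matching wheel, first record the exact Python, PyTorch, and CUDA versions in your environment:
# Print the versions a flash-attn wheel must match
python -c "
import sys, torch
print(f'Python: {sys.version.split()[0]}')
print(f'PyTorch: {torch.__version__}')
print(f'CUDA: {torch.version.cuda}')
"
The generic command below falls back to compiling from source when no matching wheel is found: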
uv pip install flash-attn --no-build-isolation --no-cache-dir
Compilation Dependencies¶
If a package needs to be compiled during installation and the build fails, first install the build dependencies that may be required. We recommend installing them with conda:
conda install -c conda-forge gxx clang
Verify Installation¶
Check if installation was successful:
# Test basic functionality
python -c "import dnallm; print('DNALLM installed successfully!')"
# Run comprehensive tests
sh tests/test_all.sh