# Model Guides
This page provides comprehensive guides for the different DNA language model architectures and their usage with DNALLM.
## Model Architecture Guides

### Core Architectures
- BERT Models: DNABERT, DNABERT-2, and BERT-based models for DNA sequence analysis
- Caduceus Models: Caduceus-Ph, Caduceus-Ps, and PlantCaduceus models
- ESM Models: Nucleotide Transformer and ESM-based models
- Hyena Models: HyenaDNA and Hyena-based architectures
- Llama Models: GENERator, OmniNA, and Llama-based models
### Specialized Architectures
- EVO Models: EVO-1 and EVO-2 models for ultra-long sequence modeling
- Mamba Models: Mamba-based models for efficient sequence processing
- Flash Attention Models: Models optimized with Flash Attention
- Special Models: Other specialized model architectures
## Model Resources

### Selection and Troubleshooting
- Model Selection Guide: Choose the right model for your specific task
- Model Troubleshooting: Common issues and solutions for model usage
- Model Zoo: Complete list of supported models and their capabilities
## Quick Reference

### By Task Type
| Task Type | Recommended Models | Guide |
|---|---|---|
| Classification | DNABERT, Plant DNABERT | BERT Models |
| Generation | Plant DNAGPT, GenomeOcean | Llama Models |
| Long Sequences | EVO-1, EVO-2 | EVO Models |
| Efficient Processing | DNAMamba, Mamba variants | Mamba Models |
| Plant-specific | Plant DNABERT, PlantCaduceus | Plant Models |
### By Model Size
| Size Category | Examples | Use Case |
|---|---|---|
| Small (<100M) | Caduceus-Ph, HyenaDNA | Fast inference, real-time applications |
| Medium (100M-1B) | DNABERT, Plant models | Balanced performance and speed |
| Large (1B-10B) | Nucleotide Transformer, EVO-1 | High accuracy, complex tasks |
| Extra Large (>10B) | EVO-2 (40B) | State-of-the-art performance |
## Getting Started

### Basic Model Loading
```python
from dnallm import load_model_and_tokenizer

# Load a DNA-specific model
model, tokenizer = load_model_and_tokenizer(
    "zhangtaolab/plant-dnabert-BPE",
    source="huggingface"
)
```
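Once loaded, the model can be applied to a DNA sequence. The snippet below is a minimal sketch that continues from the `model` and `tokenizer` variables above and assumes they follow the Hugging Face `transformers` interface; the exact output attributes depend on the head attached to the checkpoint, so adjust it to the actual DNALLM API.

```python
import torch

# A short DNA sequence to embed (assumed example input).
sequence = "ATGCGTACGTTAGCCTAGGATCGATCGTAGCTAGCTAGGCT"

# Tokenize and run a forward pass without tracking gradients.
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Depending on the checkpoint's head, the outputs expose
# `last_hidden_state` (base encoder) or `logits` (task head).
hidden = getattr(outputs, "last_hidden_state", None)
logits = getattr(outputs, "logits", None)
print(hidden.shape if hidden is not None else logits.shape)
```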
### Model Selection Tips
- For Classification Tasks: Choose BERT-based models (DNABERT, Plant DNABERT)
- For Generation Tasks: Use CausalLM models (Plant DNAGPT, GenomeOcean)
- For Large-scale Analysis: Consider Nucleotide Transformer or EVO models
- For Plant-specific Tasks: Prefer Plant-prefixed models (a selection sketch follows this list)
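As a rough illustration of these tips, the sketch below maps task types to candidate checkpoints and loads one through `load_model_and_tokenizer`. Only `zhangtaolab/plant-dnabert-BPE` appears on this page; the other repository IDs are hypothetical placeholders, so substitute the entries listed in the Model Zoo.

```python
from dnallm import load_model_and_tokenizer

# Task-to-checkpoint mapping based on the tips above. All IDs except
# "zhangtaolab/plant-dnabert-BPE" are placeholders, not real repositories.
TASK_TO_CHECKPOINT = {
    "classification": "zhangtaolab/plant-dnabert-BPE",  # BERT-based encoder
    "generation": "org/plant-dnagpt-placeholder",       # CausalLM model
    "long_sequences": "org/evo-1-placeholder",          # EVO-style model
}

def load_for_task(task: str):
    """Pick a checkpoint by task type and load it through DNALLM."""
    checkpoint = TASK_TO_CHECKPOINT[task]
    return load_model_and_tokenizer(checkpoint, source="huggingface")

model, tokenizer = load_for_task("classification")
```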
## Related Resources
- Installation Guide: Set up your environment
- Quick Start: Get started with DNALLM
- Performance Optimization: Optimize model performance
- Fine-tuning Guide: Train models on your data
- Inference Guide: Use models for predictions
For detailed information about specific model architectures and their usage, refer to the individual guides linked under Model Architecture Guides above.