Getting Started with Benchmarking¶
This guide will walk you through the basics of benchmarking DNA language models using DNALLM. You'll learn how to set up your first benchmark, configure models and datasets, and interpret results.
Overview¶
Benchmarking in DNALLM allows you to:
- Compare multiple DNA language models on the same tasks
- Evaluate performance across different datasets
- Measure accuracy, speed, and resource usage
- Generate comprehensive performance reports
Prerequisites¶
Ensure you have the following installed and configured:
# Install DNALLM
pip install dnallm
# Or with uv (recommended)
uv pip install dnallm
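To confirm the installation worked, a quick import check is enough (the `__version__` attribute is assumed here; fall back gracefully if it is absent):
# Quick sanity check that the package imports
import dnallm
print(getattr(dnallm, "__version__", "installed (version attribute not found)"))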
Basic Setup¶
1. Import Required Modules¶
from dnallm import load_config, Benchmark
from dnallm.inference import load_model_and_tokenizer
from dnallm.datahandling import DNADataset
2. Create a Simple Configuration¶
Create a benchmark_config.yaml file:
# benchmark_config.yaml
benchmark:
  name: "My First Benchmark"
  description: "Comparing DNA models on promoter prediction"
  models:
    - name: "Plant DNABERT"
      path: "zhangtaolab/plant-dnabert-BPE"
      source: "huggingface"
      task_type: "classification"
    - name: "Plant DNAGPT"
      path: "zhangtaolab/plant-dnagpt-BPE"
      source: "huggingface"
      task_type: "generation"
  datasets:
    - name: "promoter_data"
      path: "path/to/your/data.csv"
      task: "binary_classification"
      text_column: "sequence"
      label_column: "label"
  metrics:
    - "accuracy"
    - "f1_score"
    - "precision"
    - "recall"
  evaluation:
    batch_size: 16
    max_length: 512
    device: "cuda"  # or "cpu"
  output:
    format: "html"
    path: "benchmark_results"
3. Load Your Data¶
# Load your dataset
dataset = DNADataset.load_local_data(
    "path/to/your/data.csv",
    seq_col="sequence",
    label_col="label",
    max_length=512
)
# Split if needed
if not dataset.is_split:
    dataset.split_data(test_size=0.2, val_size=0.1)
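If you do not have labelled data at hand yet, you can generate a small synthetic CSV to exercise the pipeline end to end; the sequences below are random and purely illustrative:
import csv
import random

random.seed(0)
with open("toy_promoters.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["sequence", "label"])
    for _ in range(200):
        seq = "".join(random.choice("ACGT") for _ in range(300))
        writer.writerow([seq, random.randint(0, 1)])

# Load it the same way as above
dataset = DNADataset.load_local_data(
    "toy_promoters.csv", seq_col="sequence", label_col="label", max_length=512
)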
4. Run the Benchmark¶
# Load configuration
config = load_config("benchmark_config.yaml")
# Initialize benchmark
benchmark = Benchmark(config=config)
# Run benchmark
results = benchmark.run()
# Display results
print("Benchmark Results:")
print("=" * 50)
for model_name, model_results in results.items():
    print(f"\n{model_name}:")
    for dataset_name, metrics in model_results.items():
        print(f"  {dataset_name}:")
        for metric, value in metrics.items():
            print(f"    {metric}: {value:.4f}")
Command Line Interface¶
DNALLM also provides a convenient command-line interface:
# Basic benchmark run
dnallm-benchmark --config benchmark_config.yaml
# Generate detailed report
dnallm-benchmark --config config.yaml --output report.html
# Run with custom parameters
dnallm-benchmark --config config.yaml --batch-size 32 --device cuda
Understanding Results¶
Basic Metrics¶
| Metric | Description | Range | Best Value |
|---|---|---|---|
| Accuracy | Correct predictions / Total predictions | 0.0 - 1.0 | 1.0 |
| F1 Score | Harmonic mean of precision and recall | 0.0 - 1.0 | 1.0 |
| Precision | True positives / (True positives + False positives) | 0.0 - 1.0 | 1.0 |
| Recall | True positives / (True positives + False negatives) | 0.0 - 1.0 | 1.0 |
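If you want to reproduce these numbers outside of DNALLM, the same metrics are available in scikit-learn. A self-contained example with dummy labels and predictions:
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(f"accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"f1_score:  {f1_score(y_true, y_pred):.4f}")
print(f"precision: {precision_score(y_true, y_pred):.4f}")
print(f"recall:    {recall_score(y_true, y_pred):.4f}")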
Performance Metrics¶
| Metric | Description | Unit |
|---|---|---|
| Inference Time | Time to process one batch | seconds |
| Memory Usage | GPU/RAM memory consumption | MB/GB |
| Throughput | Samples processed per second | samples/sec |
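Inference time and throughput can also be measured directly with the standard library around any prediction call; `predict_batch` below is only a placeholder for whatever inference function your setup exposes:
import time

def predict_batch(batch):
    # Stand-in for a real model call
    time.sleep(0.05)
    return [0] * len(batch)

batch = ["ACGT" * 100] * 16
start = time.perf_counter()
predict_batch(batch)
elapsed = time.perf_counter() - start

print(f"Inference time: {elapsed:.3f} s per batch")
print(f"Throughput:     {len(batch) / elapsed:.1f} samples/sec")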
Example: Complete Benchmark¶
Here's a complete working example:
import os
from dnallm import load_config, Benchmark
from dnallm.datahandling import DNADataset
# 1. Prepare your data
data_path = "path/to/your/dna_sequences.csv"
if not os.path.exists(data_path):
    print("Please provide a valid data path")
    raise SystemExit(1)

# 2. Load and prepare dataset
dataset = DNADataset.load_local_data(
    data_path,
    seq_col="sequence",
    label_col="label",
    max_length=512
)
# 3. Create configuration
config = {
    "benchmark": {
        "name": "DNA Model Comparison",
        "models": [
            {
                "name": "Plant DNABERT",
                "path": "zhangtaolab/plant-dnabert-BPE",
                "source": "huggingface",
                "task_type": "classification"
            },
            {
                "name": "Plant DNAGPT",
                "path": "zhangtaolab/plant-dnagpt-BPE",
                "source": "huggingface",
                "task_type": "generation"
            }
        ],
        "datasets": [dataset],
        "metrics": ["accuracy", "f1_score", "precision", "recall"],
        "evaluation": {
            "batch_size": 16,
            "max_length": 512,
            "device": "cuda"
        },
        "output": {
            "format": "html",
            "path": "my_benchmark_results"
        }
    }
}
# 4. Run benchmark
benchmark = Benchmark(config=config)
results = benchmark.run()
# 5. Generate report
benchmark.generate_report(
    output_path="my_benchmark_results",
    format="html",
    include_predictions=True
)
print("Benchmark completed! Check 'my_benchmark_results' folder for results.")
Data Format Requirements¶
Your dataset should be in one of these formats:
CSV/TSV Format¶
sequence,label
ATCGATCGATCG,1
GCTAGCTAGCTA,0
TATATATATATA,1
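A quick check that a CSV matches this layout before benchmarking; only the `sequence` and `label` column names from this guide are assumed:
import csv

valid_chars = set("ACGTN")
with open("path/to/your/data.csv") as fh:
    reader = csv.DictReader(fh)
    assert {"sequence", "label"} <= set(reader.fieldnames or []), "missing columns"
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not set(row["sequence"].upper()) <= valid_chars:
            print(f"Line {line_no}: unexpected characters in sequence")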
JSON Format¶
[
{"sequence": "ATCGATCGATCG", "label": 1},
{"sequence": "GCTAGCTAGCTA", "label": 0}
]
FASTA Format¶
>sequence1|label:1
ATCGATCGATCG
>sequence2|label:0
GCTAGCTAGCTA
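Because the label is carried in the FASTA header here, a short parser can recover (id, sequence, label) records; this sketch assumes exactly the `>id|label:<value>` header convention shown above, and the file name is just an example:
def read_labeled_fasta(path):
    """Yield (sequence_id, sequence, label) from headers like '>seq1|label:1'."""
    seq_id, label, chunks = None, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, "".join(chunks), label
                seq_id, _, label_part = line[1:].partition("|label:")
                label = int(label_part) if label_part else None
                chunks = []
            elif line:
                chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks), label

for record in read_labeled_fasta("sequences.fasta"):  # example path
    print(record)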
Common Tasks¶
Binary Classification¶
task: "binary_classification"
num_labels: 2
label_names: ["Negative", "Positive"]
threshold: 0.5
Multi-class Classification¶
task: "multiclass"
num_labels: 4
label_names: ["Class_A", "Class_B", "Class_C", "Class_D"]
Regression¶
task: "regression"
num_labels: 1
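These task settings mirror how a sequence-classification head is usually configured in Hugging Face Transformers, which the Hugging Face-sourced models in this guide build on. A sketch using the checkpoint from earlier (some DNA checkpoints may additionally require trust_remote_code=True):
from transformers import AutoModelForSequenceClassification

# Binary / multi-class classification: num_labels sets the size of the output layer
clf = AutoModelForSequenceClassification.from_pretrained(
    "zhangtaolab/plant-dnabert-BPE", num_labels=2
)

# Regression: a single output trained with a regression loss
reg = AutoModelForSequenceClassification.from_pretrained(
    "zhangtaolab/plant-dnabert-BPE", num_labels=1, problem_type="regression"
)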
Next Steps¶
After completing this basic tutorial:
- Explore Advanced Features: Learn about cross-validation and custom metrics
- Optimize Performance: Discover performance profiling techniques
- Customize Output: Learn about advanced configuration options
- Real-world Examples: See practical use cases
Troubleshooting¶
Common Issues¶
"Model not found" error
# Check if model exists on Hugging Face
# Visit: https://huggingface.co/models?search=dna
Memory errors
# Reduce batch size in config
evaluation:
  batch_size: 8  # Reduced from 16
Slow performance
# Enable mixed precision
evaluation:
  use_fp16: true
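You can check up front whether a CUDA device (and therefore mixed precision) is available on your machine with PyTorch:
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GB")
else:
    print('No CUDA device found; set device: "cpu" and skip use_fp16')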
Additional Resources¶
- Configuration Guide - Detailed configuration options
- Advanced Techniques - Cross-validation and custom metrics
- Examples and Use Cases - Real-world scenarios
- Troubleshooting - Common problems and solutions
Ready for more? Continue to Advanced Techniques to learn about cross-validation, custom metrics, and performance profiling.