CLI Usage Guide¶

Overview¶

DNALLM provides two ways to use the command-line interface:

1. After package installation: use the dnallm-* commands.
2. Development environment: run the CLI directly from the project root.

Usage After Installation¶

After installing the DNALLM package, you can use the following commands:

# Training
dnallm-train --config config.yaml
dnallm-train --model model_name --data data_path --output output_dir

# Run inference
dnallm-inference --config config.yaml
dnallm-inference --model model_name --input input_file

# Generate configuration files
dnallm-model-config-generator --output config.yaml
dnallm-model-config-generator --preview

# MCP server
dnallm-mcp-server --config config.yaml

Development Environment Usage¶

In the project root directory, you can use the following methods:

1. Using the Launcher Script¶

# Main CLI
python run_cli.py --help

# Training
python run_cli.py train --config config.yaml

# Inference
python run_cli.py inference --config config.yaml

# Generate configuration
python run_cli.py model-config-generator --output config.yaml

2. Running CLI Modules Directly¶

# Main CLI
python cli/cli.py --help

# Training
python cli/train.py config.yaml model_path data_path

# Inference
python cli/inference.py config.yaml model_path

# Configuration generator
python cli/model_config_generator.py --output config.yaml

3. Using Package Modules¶

# Package CLI
python -m dnallm.cli.cli --help

# Package training
python -m dnallm.cli.train config.yaml model_path data_path

# Package inference
python -m dnallm.cli.inference config.yaml model_path

# Package configuration generator
python -m dnallm.cli.model_config_generator --output config.yaml
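
The choice between the installed commands and the development launcher can also be made programmatically. The following is a minimal sketch, not part of DNALLM: `resolve_cli` is a hypothetical helper that prefers a `dnallm-<subcommand>` executable found on PATH and otherwise falls back to `python run_cli.py <subcommand>`, assuming `run_cli.py` sits in the given project root.

```python
import shutil
import sys
from pathlib import Path


def resolve_cli(subcommand: str, project_root: str = ".") -> list[str]:
    """Return the argv prefix for a DNALLM subcommand (hypothetical helper).

    Prefers the installed `dnallm-<subcommand>` entry point; otherwise falls
    back to the launcher script, which must be run with cwd=project_root.
    """
    installed = shutil.which(f"dnallm-{subcommand}")
    if installed:
        return [installed]

    launcher = Path(project_root) / "run_cli.py"
    if launcher.is_file():
        return [sys.executable, "run_cli.py", subcommand]

    raise FileNotFoundError(
        f"dnallm-{subcommand} is not on PATH and run_cli.py was not found "
        f"in {Path(project_root).resolve()}"
    )


if __name__ == "__main__":
    try:
        print(resolve_cli("train") + ["--config", "config.yaml"])
    except FileNotFoundError as err:
        print(err)
```

When the fallback branch is taken, pass `cwd=project_root` to `subprocess.run` so the launcher runs from the project root.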

Command Reference¶

`dnallm-train`¶

Train a DNA language model with specified configuration.

Options:

- --config, -c: Path to training configuration file
- --model, -m: Model name or path
- --data, -d: Path to training data
- --output, -o: Output directory for training results

Examples:

# Using configuration file
dnallm-train --config finetune_config.yaml

# Using command line arguments
dnallm-train --model zhangtaolab/plant-dnagpt-BPE --data ./data --output ./outputs
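
When training is launched from a script rather than a shell, the same options can be assembled as an argument list. This is a sketch under assumptions: `build_train_command` is a hypothetical helper, and it only uses the long flags listed above; it does not check which combinations the CLI actually accepts.

```python
import shlex


def build_train_command(config=None, model=None, data=None, output=None):
    """Assemble a dnallm-train argv list from the documented long options."""
    cmd = ["dnallm-train"]
    options = (
        ("--config", config),
        ("--model", model),
        ("--data", data),
        ("--output", output),
    )
    for flag, value in options:
        if value is not None:  # skip options the caller did not set
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_train_command(
        model="zhangtaolab/plant-dnagpt-BPE", data="./data", output="./outputs"
    )
    # Print the shell-quoted command instead of running it (dry run).
    print(shlex.join(cmd))
    # To launch for real: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.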

`dnallm-inference`¶

Run inference with a trained DNA language model.

Options:

- --config, -c: Path to inference configuration file
- --model, -m: Model name or path
- --input, -i: Path to input data file
- --output, -o: Output file path

Examples:

# Using configuration file
dnallm-inference --config inference_config.yaml

# Using command line arguments
dnallm-inference --model ./models/trained_model --input ./test_data.csv

`dnallm-model-config-generator`¶

Generate configuration files for DNALLM tasks.

Options:

- --output, -o: Output file path for configuration
- --preview: Preview configuration without saving
- --template: Template type (training, inference, benchmark)

Examples:

# Generate training configuration
dnallm-model-config-generator --output training_config.yaml

# Preview configuration
dnallm-model-config-generator --preview

`dnallm-mcp-server`¶

Start MCP (Model Context Protocol) server.

Options:

- --config, -c: Path to configuration file
- --port, -p: Server port (default: 8000)
- --host, -h: Server host (default: localhost)

Examples:

# Start server with configuration
dnallm-mcp-server --config mcp_config.yaml

# Start server on specific port
dnallm-mcp-server --port 9000
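
Before choosing a port it can help to confirm that nothing else is bound to it. The sketch below is a generic TCP check, not a DNALLM feature: `port_is_free` is a hypothetical helper, and the host and port defaults are the ones listed above. The result is only a snapshot, since another process could take the port between the check and the server start.

```python
import socket


def port_is_free(port: int = 8000, host: str = "localhost") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission problem.
            return False
    return True


if __name__ == "__main__":
    for candidate in (8000, 9000):
        state = "free" if port_is_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
```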

Configuration Examples¶

Training Configuration (config.yaml)¶

model:
  name_or_path: "zhangtaolab/plant-dnagpt-BPE"
  source: "huggingface"

task:
  task_type: "binary_classification"
  num_labels: 2
  label_names: ["negative", "positive"]

data:
  train_file: "path/to/train.csv"
  eval_file: "path/to/eval.csv"
  text_column: "sequence"
  label_column: "label"

training:
  num_train_epochs: 3
  per_device_train_batch_size: 8
  learning_rate: 5e-5
  save_steps: 1000
  eval_steps: 1000
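
A quick structural check can catch a missing section before a training run starts. The sketch below works on an already loaded configuration dictionary (for example, the result of PyYAML's `yaml.safe_load`). The `REQUIRED_KEYS` table is an assumption drawn from the example above; this guide does not state which keys DNALLM actually requires.

```python
# Assumed minimum, taken from the example configuration; adjust as needed.
REQUIRED_KEYS = {
    "model": ["name_or_path", "source"],
    "task": ["task_type", "num_labels"],
    "data": ["train_file", "text_column", "label_column"],
    "training": ["num_train_epochs", "per_device_train_batch_size", "learning_rate"],
}


def find_missing_keys(config: dict) -> list[str]:
    """List missing sections ("training") and keys ("data.label_column")."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        block = config.get(section)
        if not isinstance(block, dict):
            missing.append(section)  # whole section absent or not a mapping
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in block)
    return missing


if __name__ == "__main__":
    partial = {"model": {"name_or_path": "zhangtaolab/plant-dnagpt-BPE"}}
    for item in find_missing_keys(partial):
        print("missing:", item)
```

If the file is loaded with PyYAML, note that a value written as `5e-5` is read as a string rather than a float, because YAML 1.1 floats need a decimal point (`5.0e-5`). Whether DNALLM converts the value itself is not covered in this guide.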

Inference Configuration (config.yaml)¶

model:
  name_or_path: "path/to/trained/model"
  source: "local"

task:
  task_type: "binary_classification"
  num_labels: 2

data:
  input_file: "path/to/input/data"
  output_file: "predictions.csv"

Benchmark Configuration (config.yaml)¶

benchmark:
  name: "DNA Model Benchmark"
  description: "Comparing DNA language models on various tasks"

models:
  - name: "Model 1"
    source: "huggingface"
    path: "zhangtaolab/plant-dnagpt-BPE"
    task_type: "binary_classification"
  - name: "Model 2"
    source: "modelscope"
    path: "zhangtaolab/plant-dnabert"
    task_type: "binary_classification"

datasets:
  - name: "Test Dataset"
    path: "path/to/dataset.csv"
    format: "csv"
    task: "binary_classification"

evaluation:
  metrics: ["accuracy", "precision", "recall", "f1", "mcc"]
  save_predictions: true
  output_dir: "./benchmark_results"

Project Structure¶

DNALLM/
├── cli/                    # Root directory CLI entry points
│   ├── cli.py            # Main CLI
│   ├── train.py          # Training CLI
│   ├── inference.py      # Inference CLI
│   └── model_config_generator.py # Configuration generator
├── ui/                    # UI applications
│   ├── run_config_app.py # Configuration generator launcher
│   └── ...
├── dnallm/               # Core package
│   ├── cli/             # Package CLI modules
│   │   ├── cli.py       # Package CLI implementation
│   │   ├── train.py     # Package training module
│   │   ├── inference.py # Package inference module
│   │   └── model_config_generator.py # Package config generator
│   └── ...
├── run_cli.py           # Root directory CLI launcher
└── pyproject.toml       # Package configuration

Important Notes¶

- Development Environment: Ensure you're running commands from the project root directory
- Dependencies: Make sure all dependencies are properly installed
- Path Configuration: Use absolute paths or paths relative to the project root
- Python Version: Requires Python 3.10 or higher
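
Two of these notes can be checked mechanically. The sketch below is a hypothetical preflight script, not something shipped with DNALLM: it verifies the Python version and treats the presence of `run_cli.py` and `pyproject.toml` (both shown in the project structure above) as a sign that the given directory is the project root.

```python
import sys
from pathlib import Path

MIN_PYTHON = (3, 10)
# Files that sit at the project root according to the structure listing.
ROOT_MARKERS = ("run_cli.py", "pyproject.toml")


def preflight(root: str = ".") -> list[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.10 or higher is required, found {found}")
    for marker in ROOT_MARKERS:
        if not (Path(root) / marker).is_file():
            problems.append(f"{marker} not found in {Path(root).resolve()}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "environment looks fine")
```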

Troubleshooting¶

Import Errors¶

- Ensure you're running from the project root directory
- Check Python path settings
- Verify the package is properly installed
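
To see whether Python can find the package at all, and which copy it would import, the standard library's `importlib.util.find_spec` is enough. This is a small diagnostic sketch; `locate_package` is a hypothetical helper name.

```python
import importlib.util
import sys


def locate_package(name: str = "dnallm"):
    """Return the file the package would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Regular packages report their __init__.py; namespace packages have no
    # origin, so fall back to their search locations.
    return spec.origin or list(spec.submodule_search_locations or [])


if __name__ == "__main__":
    location = locate_package()
    if location is None:
        print("dnallm is not importable from this interpreter:", sys.executable)
    else:
        print("dnallm resolves to:", location)
    print("first sys.path entries:", sys.path[:3])
```

A location inside the project checkout suggests the development copy is in use, while a `site-packages` path points to an installed copy.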

Configuration Errors¶

- Check configuration file format
- Verify file paths are correct
- Ensure all required configuration parameters are present
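
File paths are an easy item to verify up front. The sketch below takes an already loaded configuration dictionary and reports data files that do not exist. The list of path-valued keys is an assumption based on the configuration examples in this guide, and `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# (section, key) pairs that hold file paths in the example configurations.
PATH_KEYS = [("data", "train_file"), ("data", "eval_file"), ("data", "input_file")]


def missing_files(config: dict, base_dir: str = ".") -> list[str]:
    """Return "section.key: path" entries whose files do not exist.

    Relative paths are resolved against base_dir; absolute paths are used
    unchanged. Keys that are absent from the configuration are skipped.
    """
    missing = []
    for section, key in PATH_KEYS:
        block = config.get(section)
        if not isinstance(block, dict):
            continue
        value = block.get(key)
        if value and not (Path(base_dir) / value).exists():
            missing.append(f"{section}.{key}: {value}")
    return missing


if __name__ == "__main__":
    example = {"data": {"train_file": "path/to/train.csv", "eval_file": "path/to/eval.csv"}}
    for entry in missing_files(example):
        print("not found ->", entry)
```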

Permission Errors¶

- Check file and directory permissions
- Ensure you have write permissions for output directories
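
Write access to an output directory can be tested before a long run. The sketch below is a generic check with a hypothetical helper name: if the directory does not exist yet, it walks up to the nearest existing ancestor and tests that instead, on the assumption that the directory will be created there.

```python
import os
from pathlib import Path


def can_write_output(path: str) -> bool:
    """Check whether `path`, or its nearest existing ancestor, is writable."""
    candidate = Path(path).resolve()
    # Walk upward until something exists; the filesystem root always does.
    while not candidate.exists():
        candidate = candidate.parent
    # Creating files requires write and search permission on a directory.
    return candidate.is_dir() and os.access(candidate, os.W_OK | os.X_OK)


if __name__ == "__main__":
    for target in ("./outputs", "./benchmark_results"):
        verdict = "writable" if can_write_output(target) else "NOT writable"
        print(f"{target}: {verdict}")
```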

Getting Help¶

# Show help for a specific command
dnallm-train --help

# Show help for configuration generator
dnallm-model-config-generator --help

# Show help for MCP server
dnallm-mcp-server --help

Next Steps¶

- Configuration Generator: Learn how to create configuration files
- MCP Server: Learn about the Model Context Protocol server
- Fine-tuning Tutorials: Learn to train models
- Benchmark Tutorials: Compare model performance
- Inference Tutorials: Run model inference
