Model

dnallm.models.model

DNA Model loading and management utilities.

This module provides functions for downloading, loading, and managing DNA language models from various sources including Hugging Face Hub, ModelScope, and local storage.

Classes

DNALLMforSequenceClassification

DNALLMforSequenceClassification(config, custom_model=None)

Bases: PreTrainedModel

An automated wrapper that selects an appropriate pooling strategy based on the underlying model architecture and appends a customizable MLP head for sequence classification or regression tasks.

Functions
from_base_model classmethod
from_base_model(model_name_or_path, config, module=None)

Handles weight diffusion (copying the pretrained weights into the wrapper) when loading a model from a pre-trained base model.
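A hedged sketch of constructing the classification wrapper from a pretrained base model. The model id is only an example, and the exact class of the `config` object (which must carry the head/label settings) is not documented on this page:

```python
from dnallm.models.model import DNALLMforSequenceClassification

# Placeholder: build this with your project's configuration helper;
# it is assumed to hold the classification/regression head settings.
config = ...

model = DNALLMforSequenceClassification.from_base_model(
    "zhihan1996/DNABERT-2-117M",  # example base model id; substitute your own
    config,
)
```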

Functions

clear_model_cache

clear_model_cache(source='huggingface')

Remove all cached models for the given source.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source` | `str` | Source to clear the model cache from (`'huggingface'` or `'modelscope'`), default `'huggingface'`. | `'huggingface'` |
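A minimal usage sketch, assuming `clear_model_cache` is imported directly from `dnallm.models.model`:

```python
from dnallm.models.model import clear_model_cache

# Remove all models cached from the Hugging Face Hub (the default source)
clear_model_cache(source="huggingface")

# Remove models cached from ModelScope instead
clear_model_cache(source="modelscope")
```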

download_model

download_model(
    model_name, downloader, revision=None, max_try=10
)

Download a model with retry mechanism for network issues.

In case of network issues, this function will attempt to download the model multiple times before giving up.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the model to download. | *required* |
| `downloader` | `Any` | Download function to use (e.g., `snapshot_download`). | *required* |
| `max_try` | `int` | Maximum number of download attempts, default 10. | `10` |

Returns:

| Type | Description |
| --- | --- |
| `str` | Path where the model files are stored. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the model download fails after all attempts. |
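A hedged sketch of calling the retry wrapper with `huggingface_hub.snapshot_download` as the downloader; the model id is only an example, and any callable with a compatible interface should work:

```python
from huggingface_hub import snapshot_download

from dnallm.models.model import download_model

# Retry the download up to 5 times before raising ValueError
local_path = download_model(
    "zhihan1996/DNABERT-2-117M",  # example model id; substitute your own
    snapshot_download,
    revision=None,
    max_try=5,
)
print(local_path)  # directory where the model files are stored
```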

load_model_and_tokenizer

load_model_and_tokenizer(
    model_name,
    task_config,
    source="local",
    use_mirror=False,
    revision=None,
    custom_tokenizer=None,
)

Load model and tokenizer from either HuggingFace or ModelScope.

This function handles loading of various model types based on the task configuration, including sequence classification, token classification, masked language modeling, and causal language modeling.

Args:
    model_name: Model name or path
    task_config: Task configuration object containing task type and label information
    source: Source to load model and tokenizer from ('local', 'huggingface', 'modelscope'), default 'local'
    use_mirror: Whether to use HuggingFace mirror (hf-mirror.com), default False

Returns:
    Tuple containing (model, tokenizer)

Raises:
    ValueError: If model is not found locally or loading fails
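A minimal sketch of loading a model and tokenizer from the Hub. The model id is illustrative, and the task configuration object is assumed to come from the package's own configuration utilities, which this page does not describe:

```python
from dnallm.models.model import load_model_and_tokenizer

# Placeholder: build this with your project's task-configuration helper;
# it must carry the task type and label information.
task_config = ...

model, tokenizer = load_model_and_tokenizer(
    "zhihan1996/DNABERT-2-117M",  # example model id; substitute your own
    task_config,
    source="huggingface",         # or 'local' / 'modelscope'
    use_mirror=False,
)
```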

load_preset_model

load_preset_model(model_name, task_config)

Load a preset model and tokenizer based on the task configuration.

This function loads models from the preset model registry, which contains pre-configured models for various DNA analysis tasks.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name or path of the model. | *required* |
| `task_config` |  | Task configuration object containing task type and label information. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `tuple[Any, Any] \| int` | Tuple containing (model, tokenizer) if successful, 0 if the model is not found. |

Note

If the model is not found in the preset models, the function will print a warning and return 0. Use the `load_model_and_tokenizer` function for custom model loading.
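A hedged usage sketch; the preset name below is hypothetical, since the actual keys depend on the installed preset model registry:

```python
from dnallm.models.model import load_model_and_tokenizer, load_preset_model

task_config = ...  # placeholder: task configuration object (task type + labels)

result = load_preset_model("example_preset_model", task_config)  # hypothetical preset name
if result == 0:
    # Not in the preset registry: fall back to custom loading
    model, tokenizer = load_model_and_tokenizer("path/to/model", task_config, source="local")
else:
    model, tokenizer = result
```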

peft_forward_compatiable

peft_forward_compatiable(model)

Convert the base model's forward method to be compatible with the HF (Hugging Face) interface.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `Any` | Base model. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `Any` | Model with the changed forward function. |
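A minimal sketch of where this helper would typically sit before wrapping a base model with PEFT adapters; the base-model loading call and model id are assumptions for illustration:

```python
from transformers import AutoModel

from dnallm.models.model import peft_forward_compatiable

# Load an arbitrary base model (example id; substitute your own)
base_model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

# Patch its forward method so downstream HF/PEFT wrappers can call it consistently
base_model = peft_forward_compatiable(base_model)
```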