Model

dnallm.models.model

DNA Model loading and management utilities.

This module provides functions for downloading, loading, and managing DNA language models from various sources including Hugging Face Hub, ModelScope, and local storage.

Classes

BasicCNNHead

BasicCNNHead(
    input_dim,
    num_classes,
    task_type="binary",
    num_filters=128,
    kernel_sizes=None,
    dropout=0.2,
    **kwargs,
)

Bases: Module

A CNN-based head for processing Transformer output sequences. This head applies multiple 1D convolutional layers with different kernel sizes to capture local patterns in the sequence data, followed by a fully connected layer for classification or regression tasks.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_dim | int | Dimension of the input features | required |
| num_classes | int | Number of output classes (for classification tasks) | required |
| task_type | str | Type of task - 'binary', 'multiclass', 'multilabel', or 'regression' | 'binary' |
| num_filters | int | Number of convolutional filters applied for each kernel size | 128 |
| kernel_sizes | list \| None | Kernel sizes of the 1D convolutional layers | None |
| dropout | float | Dropout probability | 0.2 |
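
Example

A minimal usage sketch; the (batch, seq_len, hidden) input shape and the call convention of the forward pass are assumptions for illustration, not taken from this reference:

import torch
from dnallm.models.model import BasicCNNHead

# Hypothetical transformer output: 8 sequences, 512 tokens, 768-dim hidden states
hidden_states = torch.randn(8, 512, 768)

head = BasicCNNHead(input_dim=768, num_classes=2, task_type="binary", num_filters=128)
logits = head(hidden_states)  # assumed output shape: (8, num_classes)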

BasicLSTMHead

BasicLSTMHead(
    input_dim,
    num_classes,
    task_type="binary",
    hidden_size=256,
    num_layers=1,
    dropout=0.1,
    bidirectional=True,
    **kwargs,
)

Bases: Module

An LSTM-based head for processing Transformer output sequences. This head applies a multi-layer LSTM to capture sequential dependencies in the sequence data, followed by a fully connected layer for classification or regression tasks.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_dim | int | Dimension of the input features | required |
| num_classes | int | Number of output classes (for classification tasks) | required |
| task_type | str | Type of task - 'binary', 'multiclass', 'multilabel', or 'regression' | 'binary' |
| hidden_size | int | Number of features in the hidden state of the LSTM | 256 |
| num_layers | int | Number of recurrent layers in the LSTM | 1 |
| dropout | float | Dropout probability between LSTM layers | 0.1 |
| bidirectional | bool | Whether to use a bidirectional LSTM | True |
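
Example

A minimal sketch along the same lines, assuming the head consumes per-token transformer hidden states of shape (batch, seq_len, input_dim):

import torch
from dnallm.models.model import BasicLSTMHead

hidden_states = torch.randn(8, 512, 768)  # hypothetical transformer output

# Bidirectional LSTM over the token embeddings, then a classification layer
head = BasicLSTMHead(input_dim=768, num_classes=3, task_type="multiclass",
                     hidden_size=256, num_layers=2, bidirectional=True)
logits = head(hidden_states)  # assumed output shape: (8, num_classes)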

BasicMLPHead

BasicMLPHead(
    input_dim,
    num_classes=2,
    task_type="binary",
    hidden_dims=None,
    activation_fn="relu",
    use_normalization=True,
    norm_type="layernorm",
    dropout=0.1,
    **kwargs,
)

Bases: Module

A universal and customizable MLP model designed to be appended after the embedding output of models like Transformers to perform various downstream tasks such as classification and regression.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_dim | int | Dimension of the input features | required |
| num_classes | int | Number of output classes (for classification tasks) | 2 |
| task_type | str | Type of task - 'binary', 'multiclass', 'multilabel', or 'regression' | 'binary' |
| hidden_dims | list \| None | List of hidden layer dimensions | None |
| activation_fn | str | Activation function to use ('relu', 'gelu', 'silu', 'tanh', 'sigmoid') | 'relu' |
| use_normalization | bool | Whether to use normalization layers | True |
| norm_type | str | Type of normalization - 'batchnorm' or 'layernorm' | 'layernorm' |
| dropout | float | Dropout probability | 0.1 |
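
Example

A minimal sketch, assuming the head is applied to a pooled embedding of shape (batch, input_dim); the hidden layer sizes below are illustrative:

import torch
from dnallm.models.model import BasicMLPHead

pooled = torch.randn(16, 768)  # hypothetical pooled transformer embedding

head = BasicMLPHead(input_dim=768, num_classes=2, task_type="binary",
                    hidden_dims=[512, 128], activation_fn="gelu",
                    use_normalization=True, norm_type="layernorm", dropout=0.1)
logits = head(pooled)  # assumed output shape: (16, num_classes)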

BasicUNet1DHead

BasicUNet1DHead(
    input_dim,
    num_classes,
    task_type="binary",
    num_layers=2,
    initial_filters=64,
    **kwargs,
)

Bases: Module

A U-Net architecture adapted for 1D sequence data, suitable for classification and regression tasks. This model consists of an encoder-decoder structure with skip connections, allowing it to capture both local and global features in the inputs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_dim | int | The number of input features (channels) in the inputs | required |
| num_classes | int | The number of output classes for the classification task | required |
| task_type | str | The type of task (e.g., 'binary' or 'multiclass') | 'binary' |
| num_layers | int | The number of downsampling/upsampling layers in the U-Net | 2 |
| initial_filters | int | The number of filters in the first convolutional layer | 64 |
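
Example

A minimal sketch; whether the head expects (batch, seq_len, input_dim) or (batch, input_dim, seq_len) inputs is not stated here, so the layout below is an assumption:

import torch
from dnallm.models.model import BasicUNet1DHead

hidden_states = torch.randn(4, 1024, 768)  # hypothetical per-token embeddings

head = BasicUNet1DHead(input_dim=768, num_classes=2, task_type="binary",
                       num_layers=2, initial_filters=64)
logits = head(hidden_states)  # assumed output shape: (4, num_classes)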

DNALLMforSequenceClassification

DNALLMforSequenceClassification(config, custom_model=None)

Bases: PreTrainedModel

An automated wrapper that selects an appropriate pooling strategy based on the underlying model architecture and appends a customizable MLP head for sequence classification or regression tasks.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | | Model configuration object, including the settings used to build the classification head | required |
| custom_model | | Optional pre-loaded base model to wrap instead of instantiating one from the configuration | None |
Functions
from_base_model classmethod
from_base_model(model_name_or_path, config, module=None)

Handles weight transfer when loading the model from a pre-trained base model.
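
Example

A hedged sketch of building the wrapper from a pre-trained base model; the checkpoint path is a placeholder and the exact structure of the configuration object is not documented in this reference:

from dnallm.models.model import DNALLMforSequenceClassification

config = ...  # model/task configuration, including the classification head settings (structure assumed)
model = DNALLMforSequenceClassification.from_base_model(
    "path/to/pretrained-dna-model",  # placeholder checkpoint path
    config=config,
)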

DoubleConv

DoubleConv(in_channels, out_channels)

Bases: Module

(Convolution => [BatchNorm] => ReLU) * 2

MegaDNAMultiScaleHead

MegaDNAMultiScaleHead(
    embedding_dims=None,
    num_classes=2,
    task_type="binary",
    hidden_dims=None,
    dropout=0.2,
    **kwargs,
)

Bases: Module

A classification head tailored for the multi-scale embedding outputs of the MegaDNA model. It takes a list of embedding tensors, pools each tensor, and concatenates the results before passing them to an MLP for classification.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| embedding_dims | list \| None | A list of integers representing the dimensions of the input embeddings | None |
| num_classes | int | The number of output classes for classification | 2 |
| task_type | str | The type of task (e.g., 'binary' or 'multiclass') | 'binary' |
| hidden_dims | list \| None | A list of integers representing the sizes of hidden layers in the MLP | None |
| dropout | float | Dropout probability for regularization | 0.2 |
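
Example

A minimal sketch; the number of embedding scales and their shapes are illustrative assumptions:

import torch
from dnallm.models.model import MegaDNAMultiScaleHead

# Hypothetical multi-scale embeddings produced by MegaDNA (one tensor per scale)
embeddings = [torch.randn(4, 128, 512), torch.randn(4, 64, 256), torch.randn(4, 32, 128)]

head = MegaDNAMultiScaleHead(embedding_dims=[512, 256, 128], num_classes=2, task_type="binary")
logits = head(embeddings)  # assumed output shape: (4, num_classes)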

Functions

clear_model_cache

clear_model_cache(source='huggingface')

Remove all cached models for the given source.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | str | Source to clear the model cache from ('huggingface' or 'modelscope'), default 'huggingface' | 'huggingface' |
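
Example

A short usage sketch covering both supported cache sources:

from dnallm.models.model import clear_model_cache

# Remove models cached from the Hugging Face Hub (the default source)
clear_model_cache(source="huggingface")

# Or clear the ModelScope cache instead
clear_model_cache(source="modelscope")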

download_model

download_model(
    model_name, downloader, revision=None, max_try=10
)

Download a model with a retry mechanism for network issues.

In case of network issues, this function will attempt to download the model multiple times before giving up.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_name | str | Name of the model to download | required |
| downloader | Any | Download function to use (e.g., snapshot_download) | required |
| revision | str \| None | Specific model revision to download, default None | None |
| max_try | int | Maximum number of download attempts, default 10 | 10 |

Returns:

| Type | Description |
| --- | --- |
| str | Path where the model files are stored |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If model download fails after all attempts |
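
Example

A sketch using huggingface_hub's snapshot_download as the downloader; the model id is a placeholder, not a real repository:

from huggingface_hub import snapshot_download
from dnallm.models.model import download_model

local_path = download_model(
    "some-org/some-dna-model",  # placeholder model id
    downloader=snapshot_download,
    max_try=5,
)
print(local_path)  # directory where the model files are stored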

is_flash_attention_capable

is_flash_attention_capable()

Check if Flash Attention has been installed.

Returns:

| Type | Description |
| --- | --- |
| bool | True if Flash Attention is installed and the device supports it, False otherwise |

is_fp8_capable

is_fp8_capable()

Check if the current CUDA device supports FP8 precision.

Returns:

| Type | Description |
| --- | --- |
| bool | True if the device supports FP8 (compute capability >= 9.0), False otherwise |
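
Example

A sketch of using the two capability checks to pick loading options; wiring these values into model loading is an assumption, not something this module prescribes:

from dnallm.models.model import is_flash_attention_capable, is_fp8_capable

attn_impl = "flash_attention_2" if is_flash_attention_capable() else "eager"
use_fp8 = is_fp8_capable()  # True only on GPUs with compute capability >= 9.0
print(attn_impl, use_fp8)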

load_model_and_tokenizer

load_model_and_tokenizer(
    model_name,
    task_config,
    source="local",
    use_mirror=False,
    revision=None,
)

Load model and tokenizer from either HuggingFace or ModelScope.

This function handles loading of various model types based on the task configuration, including sequence classification, token classification, masked language modeling, and causal language modeling.

Args:
    model_name: Model name or path
    task_config: Task configuration object containing task type and label information
    source: Source to load the model and tokenizer from ('local', 'huggingface', 'modelscope'), default 'local'
    use_mirror: Whether to use the HuggingFace mirror (hf-mirror.com), default False
    revision: Specific model revision to load, default None

Returns:
    Tuple containing (model, tokenizer)

Raises:
    ValueError: If model is not found locally or loading fails
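
Example

A sketch of loading a local model; the path is a placeholder and the construction of the task configuration object is not covered in this reference:

from dnallm.models.model import load_model_and_tokenizer

task_config = ...  # task configuration object (task type and label information), assumed to be built elsewhere
model, tokenizer = load_model_and_tokenizer(
    "path/to/local-dna-model",  # placeholder path
    task_config,
    source="local",
)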

load_preset_model

load_preset_model(model_name, task_config)

Load a preset model and tokenizer based on the task configuration.

This function loads models from the preset model registry, which contains pre-configured models for various DNA analysis tasks.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_name | str | Name or path of the model | required |
| task_config | | Task configuration object containing task type and label information | required |

Returns:

| Type | Description |
| --- | --- |
| tuple[Any, Any] \| int | Tuple containing (model, tokenizer) if successful, 0 if model not found |

Note

If the model is not found in the preset models, the function will print a warning and return 0. Use the `load_model_and_tokenizer` function for custom model loading.
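
Example

A sketch of the documented fallback pattern: try the preset registry first and fall back to custom loading when 0 is returned; the model name is a placeholder:

from dnallm.models.model import load_preset_model, load_model_and_tokenizer

task_config = ...  # task configuration object, assumed to be built elsewhere
result = load_preset_model("some-preset-model", task_config)
if result == 0:
    # Not in the preset registry; load it as a custom model instead
    model, tokenizer = load_model_and_tokenizer("some-preset-model", task_config, source="huggingface")
else:
    model, tokenizer = result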

peft_forward_compatiable

peft_forward_compatiable(model)

Convert the base model's forward method to be compatible with Hugging Face (HF).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Any | Base model | required |

Returns:

| Type | Description |
| --- | --- |
| Any | Model with a changed forward function |
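
Example

A sketch of patching a base model before wrapping it with a PEFT adapter; the LoRA settings are illustrative and the base model is assumed to be loaded already:

from peft import LoraConfig, get_peft_model
from dnallm.models.model import peft_forward_compatiable

base_model = ...  # an already-loaded base model
base_model = peft_forward_compatiable(base_model)  # make its forward HF-compatible
peft_model = get_peft_model(base_model, LoraConfig(r=8, lora_alpha=16))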