Run

Robust protonation processing module.

This module provides the main Protonate class for processing SMILES strings and generating protonated variants. Each function is focused on a specific aspect of the protonation workflow with comprehensive error handling.

Protonate(smiles_input=None, ph_min=6.4, ph_max=8.4, precision=1.0, label_identifiers=False, label_states=False, max_variants=128, validate_output=True, **smiles_processor_kwargs)

Generator class for protonating SMILES strings with comprehensive error handling.

Processes molecules one at a time, generating protonated variants based on pH conditions and pKa data. Includes validation, fallback mechanisms, and detailed statistics tracking.

Design goals: safety through validation, performance through batching, and a good developer experience through clear error messages and statistics.

Initialize the protonation generator with explicit parameters.

PARAMETER DESCRIPTION
smiles_input

SMILES string, file path, or iterable of SMILES

TYPE: str | Iterable[str] | Iterator[str] | None DEFAULT: None

ph_min

Minimum pH to consider.

TYPE: float DEFAULT: 6.4

ph_max

Maximum pH to consider.

TYPE: float DEFAULT: 8.4

precision

pKa precision factor.

TYPE: float DEFAULT: 1.0

label_identifiers

When returning SMILES, format the string to include any identifier.

TYPE: bool DEFAULT: False

label_states

When returning SMILES, format the string to include states.

TYPE: bool DEFAULT: False

max_variants

Maximum number of variants to generate per input compound.

TYPE: int DEFAULT: 128

validate_output

Whether to validate generated SMILES.

TYPE: bool DEFAULT: True

**smiles_processor_kwargs

Additional arguments for SMILESProcessor

DEFAULT: {}
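The parameters above say little about how `ph_min`, `ph_max`, and `precision` interact. The sketch below shows one plausible reading, assuming each ionizable site carries a mean pKa and a standard deviation, and that `precision` scales that deviation into a pKa range compared against the pH window. The names `classify_site`, `SiteState`, `pka_mean`, and `pka_std` are hypothetical and are not part of the documented API.

```python
from enum import Enum


class SiteState(Enum):
    """Possible outcomes for one ionizable site (hypothetical names)."""
    PROTONATED = "protonated"
    DEPROTONATED = "deprotonated"
    BOTH = "both"


def classify_site(pka_mean: float, pka_std: float, ph_min: float = 6.4,
                  ph_max: float = 8.4, precision: float = 1.0) -> SiteState:
    """Classify a site against a pH window. Illustrative assumption only."""
    # Assumption: precision widens the pKa into [mean - k*std, mean + k*std].
    spread = pka_std * precision
    low, high = pka_mean - spread, pka_mean + spread
    # If the pKa range overlaps the pH window, both states are plausible.
    if low <= ph_max and ph_min <= high:
        return SiteState.BOTH
    # pKa entirely above the window: the site keeps its proton.
    if pka_mean > ph_max:
        return SiteState.PROTONATED
    # pKa entirely below the window: the site loses its proton.
    return SiteState.DEPROTONATED
```

Under this reading, a larger `precision` makes more sites overlap the window, which in turn increases the number of variants generated per molecule.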

current_results_queue = []

label_identifiers = label_identifiers

label_states = label_states

max_variants = max_variants

ph_max = ph_max

ph_min = ph_min

pka_data = PKaData()

precision = precision

site_detector = ProtonationSiteDetector(validate_sites=True, max_sites_per_molecule=50)

smiles_input = smiles_input

smiles_processor = SMILESProcessor(**smiles_processor_kwargs)

stats = ProtonationStats()

validate_output = validate_output

__iter__()

Return this generator object for iteration.

__next__()

Generate the next protonated SMILES string.

RETURNS DESCRIPTION
str

String containing protonated SMILES with metadata

RAISES DESCRIPTION
StopIteration

When no more SMILES are available to process

ProtonationError

When processing encounters a critical error
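The `__iter__`/`__next__` pair together with the `current_results_queue` attribute suggests a buffered iterator: one input molecule is expanded into several variants, which are then handed out one per call. The following is a toy stand-in for that pattern, not the class's actual implementation. `QueuedVariantIterator`, `expand`, and `fake_expand` are hypothetical names, and `fake_expand` only tags strings rather than doing any chemistry.

```python
from collections.abc import Callable, Iterable, Iterator


class QueuedVariantIterator:
    """Toy stand-in: yields one variant per call, one input at a time."""

    def __init__(self, smiles_input: Iterable[str],
                 expand: Callable[[str], list[str]],
                 max_variants: int = 128) -> None:
        self._source: Iterator[str] = iter(smiles_input)
        self._expand = expand
        self.max_variants = max_variants
        self.current_results_queue: list[str] = []

    def __iter__(self) -> "QueuedVariantIterator":
        return self

    def __next__(self) -> str:
        # Refill from the next input only when the queue is empty.
        while not self.current_results_queue:
            smiles = next(self._source)  # StopIteration ends the iteration
            # Cap the number of variants kept for this input.
            self.current_results_queue = self._expand(smiles)[: self.max_variants]
        return self.current_results_queue.pop(0)


def fake_expand(smiles: str) -> list[str]:
    # Placeholder for real protonation: tags the input instead.
    return [f"{smiles}|a", f"{smiles}|b"]


variants = list(QueuedVariantIterator(["CCO", "CCN"], fake_expand))
```

Because the object is its own iterator, it can be consumed lazily in a `for` loop or materialized with `list(...)`, which mirrors the split between `stream_all()` and `to_list()`.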

get_stats()

Get comprehensive processing statistics.

RETURNS DESCRIPTION
dict[str, int | dict[str, int]]

Dictionary containing processing statistics

reset_stats()

Reset all processing statistics to zero.

stream_all()

Stream all protonated SMILES as a generator.

YIELDS DESCRIPTION
str

Protonated SMILES strings with metadata

to_list()

Return all protonated SMILES as a list.

RETURNS DESCRIPTION
list[str]

List of protonated SMILES strings with metadata

ProtonationError

Bases: Exception

Raised when protonation processing encounters an error.

ProtonationResult(smiles, identifier, states='')

Data structure for protonation results with explicit fields.

identifier

smiles

states = ''

__post_init__()

Validate result data after initialization.

to_string(include_identifier=False, include_states=False, separator=',')

Convert result to output string format.

PARAMETER DESCRIPTION
include_identifier

Whether to include the identifier.

TYPE: bool DEFAULT: False

include_states

Whether to include state information.

TYPE: bool DEFAULT: False

separator

What to separate additional information with.

TYPE: str DEFAULT: ','

RETURNS DESCRIPTION
str

Formatted string representation
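As a rough illustration of how the `to_string` flags might combine, here is a minimal stand-in dataclass. It assumes the output order is SMILES, then identifier, then states, and that `__post_init__` rejects an empty SMILES string. Neither detail is stated in the reference above, and `ResultSketch` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class ResultSketch:
    """Stand-in mirroring the documented fields of ProtonationResult."""
    smiles: str
    identifier: str
    states: str = ""

    def __post_init__(self) -> None:
        # Assumed validation rule: a result must carry a SMILES string.
        if not self.smiles:
            raise ValueError("smiles must be a non-empty string")

    def to_string(self, include_identifier: bool = False,
                  include_states: bool = False, separator: str = ",") -> str:
        # Assumed field order: SMILES first, then the optional extras.
        parts = [self.smiles]
        if include_identifier:
            parts.append(self.identifier)
        if include_states:
            parts.append(self.states)
        return separator.join(parts)


result = ResultSketch("CC(=O)[O-]", "acetate", "DEPROTONATED")
plain = result.to_string()
labeled = result.to_string(include_identifier=True, include_states=True)
```

With the default flags only the SMILES string is returned, so downstream tools that expect bare SMILES keep working unless labels are explicitly requested.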

ProtonationStats(molecules_processed=0, total_variants_generated=0, variants_validated=0, variants_rejected=0, molecules_with_sites=0, molecules_without_sites=0, fallback_used=0)

Statistics tracking for protonation processing with explicit counters.

fallback_used = 0

molecules_processed = 0

molecules_with_sites = 0

molecules_without_sites = 0

total_variants_generated = 0

variants_rejected = 0

variants_validated = 0

__post_init__()

Validate statistics after initialization.
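The counters and the `__post_init__` hook can be pictured with a small dataclass. This sketch assumes validation means "every counter is a non-negative integer", which the reference does not spell out. `StatsSketch` is a hypothetical name.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class StatsSketch:
    """Stand-in mirroring the documented counters of ProtonationStats."""
    molecules_processed: int = 0
    total_variants_generated: int = 0
    variants_validated: int = 0
    variants_rejected: int = 0
    molecules_with_sites: int = 0
    molecules_without_sites: int = 0
    fallback_used: int = 0

    def __post_init__(self) -> None:
        # Assumed rule: counters must be non-negative integers.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or value < 0:
                raise ValueError(
                    f"{f.name} must be a non-negative int, got {value!r}"
                )


stats = StatsSketch()
stats.molecules_processed += 1
stats.total_variants_generated += 3
snapshot = asdict(stats)  # plain dict, convenient for logging or reports
```

A plain-dict snapshot like this is one way a `get_stats()`-style method could expose counters without handing out the mutable statistics object itself.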

protonate_smiles(smiles_input, ph_min=6.4, ph_max=8.4, precision=1.0, label_identifiers=False, label_states=False, max_variants=128, validate_output=True, **kwargs)

Convenience function to protonate SMILES with explicit parameters.

PARAMETER DESCRIPTION
smiles_input

SMILES string, file path, or iterable of SMILES

TYPE: str | Iterable[str] | Iterator[str]

ph_min

Minimum pH to consider

TYPE: float DEFAULT: 6.4

ph_max

Maximum pH to consider

TYPE: float DEFAULT: 8.4

precision

pKa precision factor

TYPE: float DEFAULT: 1.0

label_states

When returning SMILES, format the string to include states.

TYPE: bool DEFAULT: False

label_identifiers

When returning SMILES, format the string to include any identifier.

TYPE: bool DEFAULT: False

max_variants

Maximum number of variants per input compound

TYPE: int DEFAULT: 128

validate_output

Whether to validate generated SMILES

TYPE: bool DEFAULT: True

**kwargs

Additional arguments for SMILESProcessor

DEFAULT: {}

RETURNS DESCRIPTION
list[str]

List of protonated SMILES strings