ExtractSmilesFromText


2025-10-09

📘 Overview

ExtractSmilesFromText is an operator designed to extract or parse chemical SMILES expressions from OCR text. It uses a given prompt_template to construct model inputs, combining text content and optional abbreviation/monomer information, then calls a Large Language Model (LLM) to produce structured outputs which are parsed into JSON format.
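The flow described above (build a prompt from the text plus optional abbreviation info, call the model, parse the reply as JSON) can be sketched with the standard library alone. This is a minimal illustration, not the operator's actual implementation: `build_prompt`, `extract_smiles`, `fake_generate`, the prompt wording, and the shape of the JSON reply are all hypothetical stand-ins.

```python
import json
from typing import Callable, List, Optional


def build_prompt(text: str, abbreviations: Optional[str] = None) -> str:
    """Hypothetical prompt builder: OCR text plus optional abbreviation/monomer info."""
    parts = [
        "Extract every chemical structure in the text as SMILES.",
        "Answer with a JSON list only.",
        f"Text:\n{text}",
    ]
    if abbreviations:
        # The abbreviation block is only added when something was supplied.
        parts.append(f"Abbreviations/monomers:\n{abbreviations}")
    return "\n\n".join(parts)


def extract_smiles(
    text: str,
    generate: Callable[[List[str]], List[str]],
    abbreviations: Optional[str] = None,
):
    """Build the prompt, call the model, and parse the reply as JSON."""
    raw = generate([build_prompt(text, abbreviations)])[0]
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return []  # mirror the documented "[] on failure" behaviour


def fake_generate(prompts: List[str]) -> List[str]:
    """Stand-in for an LLM serving call; always returns the same JSON string."""
    return ['[{"name": "ethanol", "smiles": "CCO"}]' for _ in prompts]


result = extract_smiles(
    "The solvent was ethanol (EtOH).", fake_generate, "EtOH: ethanol"
)
print(result)
```

Passing the model call in as a plain callable keeps the sketch independent of any particular serving backend.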

__init__

def __init__(self, llm_serving: LLMServingABC, prompt_template=None)

Parameter	Type	Default	Description
llm_serving	LLMServingABC	Required	The Large Language Model serving instance for executing inference.
prompt_template	Any	None	The prompt template object used to construct the model input.

Prompt Template Descriptions

Prompt Template Name	Primary Use	Applicable Scenarios	Feature Description

run

def run(self, storage: DataFlowStorage, content_key: str = "text", abbreviation_key: str = "abbreviations", output_key: str = "synth_smiles")

Parameter	Type	Default	Description
storage	DataFlowStorage	Required	The data flow storage instance responsible for reading and writing data.
content_key	str	"text"	The name of the input column containing the OCR text.
abbreviation_key	str	"abbreviations"	The name of the input column containing abbreviation/monomer information.
output_key	str	"synth_smiles"	The name of the output column where the extracted results will be stored.

🧠 Example Usage
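As a rough illustration of how the three key parameters of `run` relate to the data, the sketch below mimics the read/write pattern on a plain list of dictionaries. It does not use the real `DataFlowStorage` or `LLMServingABC` classes: `run_over_rows` and `stub_generate` are hypothetical stand-ins, and the prompt text is an assumption.

```python
import json
from typing import Callable, Dict, List


def run_over_rows(
    rows: List[Dict],
    generate: Callable[[List[str]], List[str]],
    content_key: str = "text",
    abbreviation_key: str = "abbreviations",
    output_key: str = "synth_smiles",
) -> List[Dict]:
    """Read the input columns, query the model, and write the output column."""
    prompts = []
    for row in rows:
        prompt = f"Text:\n{row[content_key]}"
        abbreviations = row.get(abbreviation_key)
        if abbreviations:  # the abbreviation column is treated as optional
            prompt += f"\n\nAbbreviations:\n{abbreviations}"
        prompts.append(prompt)

    results = []
    for row, raw in zip(rows, generate(prompts)):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = []  # documented fallback on failure
        # Keep every input column and add the output column next to them.
        results.append({**row, output_key: parsed})
    return results


def stub_generate(prompts: List[str]) -> List[str]:
    """Stand-in model: valid JSON only when the prompt mentions ethanol."""
    return ['["CCO"]' if "ethanol" in p else "not json" for p in prompts]


rows = [
    {"text": "Dissolved in ethanol.", "abbreviations": "EtOH: ethanol"},
    {"text": "No structures here.", "abbreviations": ""},
]
result = run_over_rows(rows, stub_generate)
print(result)
```

With the default keys, each returned row keeps its `text` and `abbreviations` columns and gains a `synth_smiles` column.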

🧾 Default Output Format

Field	Type	Description
...	...	Input columns from the source dataframe.
synth_smiles (default)	list/dict	The JSON-parsed SMILES structure extracted by the model. Returns `[]` on failure.
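The "list/dict, `[]` on failure" contract in the table can be sketched as a small parsing helper. This is an assumption about how such parsing might look, not the operator's own code: `parse_model_output` is a hypothetical name, and stripping a Markdown code fence from the reply is an added convenience that the source does not describe.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_model_output(raw):
    """Return the parsed list/dict from a model reply, or [] on any failure."""
    if not isinstance(raw, str):
        return []
    cleaned = raw.strip()
    match = FENCED_BLOCK.search(cleaned)
    if match:
        # Assumption: if the reply contains a fenced block, the JSON is inside it.
        cleaned = match.group(1).strip()
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError:
        return []
    # Only list and dict results match the documented output type.
    return parsed if isinstance(parsed, (list, dict)) else []


plain = parse_model_output('[{"smiles": "CCO"}]')
fenced = parse_model_output(FENCE + 'json\n{"monomer": "C=C"}\n' + FENCE)
failed = parse_model_output("Sorry, I could not find any structures.")
print(plain, fenced, failed)
```

Returning `[]` instead of raising keeps a single malformed reply from interrupting a batch run.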
