AgenticRAGWidthQAGenerator

About 449 wordsAbout 2 min

2025-10-09

📘 Overview

The AgenticRAGWidthQAGenerator is an operator designed to combine two question-answer (QA) pairs to generate a new, more complex question. This process involves multiple steps of generation, verification, and scoring, using a large language model (LLM) to ensure the quality and relevance of the newly created questions. It is a key component in workflows aimed at broadening the scope and complexity of a question dataset.

`init` function

def __init__(self, llm_serving: LLMServingABC = None)

Parameter	Type	Default	Description
llm_serving	LLMServingABC	None	The large language model serving instance, used for executing inference and generation.

Prompt Template Descriptions

Prompt Template Name	Primary Purpose	Applicable Scenarios	Feature Description
This operator does not use prompt templates; it generates, merges, and verifies complex QA pairs directly through multi-step reasoning without relying on intermediate prompt templates.

`run`

def run(
    self,
    storage: DataFlowStorage,
    input_question_key: str = "question",
    input_identifier_key: str = "identifier",
    input_answer_key: str = "answer",
    output_question_key: str = "generated_width_task"
)

Parameter	Type	Default	Description
storage	DataFlowStorage	Required	The DataFlow storage instance, responsible for reading and writing data.
input_question_key	str	"question"	The column name for the input questions.
input_identifier_key	str	"identifier"	The column name for the input identifiers.
input_answer_key	str	"answer"	The column name for the input answers.
output_question_key	str	"generated_width_task"	The column name for the generated wider questions.

🧠 Example Usage

from dataflow.operators.agentic_rag.generate.agenticrag_width_qa_generator import AgenticRAGWidthQAGenerator
from dataflow.utils.storage import DataFlowStorage

# Initialize the operator
generator = AgenticRAGWidthQAGenerator(
    llm_serving=your_llm_serving_instance
)

# Run the operator
storage = DataFlowStorage()
generator.run(
    storage=storage,
    input_question_key="question",
    input_identifier_key="identifier",
    input_answer_key="answer",
    output_question_key="generated_width_task"
)

🧾 Output Format

The operator modifies the input DataFrame by adding several new columns.

Field	Type	Description
generated_width_task	str	The generated complex question combining multiple QA pairs
content_identifier	str	Combined content identifier from merged questions
qa_index	list	Indices of the original QA pairs that were merged
index	int	Index of the generated question
original_answer	list	List of original answers from merged QA pairs
original_question	list	List of original questions from merged QA pairs
state	int	Validation state (1 for valid, 0 for invalid)

Example Input:

[
  {
    "question": "What is machine learning?",
    "identifier": "ml_concepts",
    "answer": "Machine learning is a subset of AI that enables computers to learn without explicit programming."
  },
  {
    "question": "How does deep learning work?",
    "identifier": "dl_concepts", 
    "answer": "Deep learning uses neural networks with multiple layers to process data and make predictions."
  }
]

Example Output:

{
  "generated_width_task": "How do machine learning and deep learning work together in modern AI systems?",
  "content_identifier": "ml_dl_integration",
  "qa_index": [0, 1],
  "index": 0,
  "original_answer": [
    "Machine learning is a subset of AI that enables computers to learn without explicit programming.",
    "Deep learning uses neural networks with multiple layers to process data and make predictions."
  ],
  "original_question": [
    "What is machine learning?",
    "How does deep learning work?"
  ],
  "state": 1
}

eval

generate

eval

generate

eval

filter

generate

eval

filter

generate

generate

eval

filter

refine

generate

generate

generate

eval

filter

refine

generate

generate

eval

filter

generate

eval

filter

generate

eval

generate

filter

eval

filter

generate

refine

AgenticRAGWidthQAGenerator

📘 Overview

__init__ function

Prompt Template Descriptions

run

🧠 Example Usage

🧾 Output Format

`init` function

`run`