EMScore Evaluator (EMScoreEval)

About 441 wordsAbout 1 min

2025-01-20

📘 Overview

EMScoreEval is a video frame-level EMScore evaluation operator. It extracts frames from videos using specified strategies, uses CLIP model to compute semantic similarity between candidate text and reference text/video frames, providing fine-grained and coarse-grained evaluation metrics.

🏗️ `init` Function

def __init__(
    self,
    every_n_seconds: Optional[float] = None,
    every_n_frames: Optional[int] = None,
    return_all_frames: bool = False,
    clip_model_path: Optional[str] = None,
    score_types: Optional[List[str]] = None,
    metrics: Optional[List[str]] = None
):
    ...

🧾 `init` Parameters

Parameter	Type	Default	Description
`every_n_seconds`	`Optional[float]`	`None`	Extract frame every N seconds
`every_n_frames`	`Optional[int]`	`None`	Extract frame every N frames
`return_all_frames`	`bool`	`False`	Return per-frame detailed scores
`clip_model_path`	`Optional[str]`	`None`	CLIP model path
`score_types`	`Optional[List]`	`None`	Score types to compute (default: all)
`metrics`	`Optional[List]`	`None`	Metrics to output (default: all)

Available Score Types

Score Type	Description
`EMScore(X,X*)`	Candidate text vs reference text
`EMScore(X,V)`	Candidate text vs video frames
`EMScore(X,V,X*)`	Combined score (text + video)

Available Metrics

Metric	Description
`figr_P`	Fine-grained Precision
`figr_R`	Fine-grained Recall
`figr_F`	Fine-grained F1
`cogr`	Coarse-grained (global) consistency
`full_P`	Full Precision (fine + coarse)
`full_R`	Full Recall (fine + coarse)
`full_F`	Full F1 (fine + coarse)

⚡ `run` Function

def run(
    self,
    storage: DataFlowStorage,
    video_key: str = 'video_path',
    candidate_key: str = 'candidate',
    reference_key: str = 'reference'
) -> list[str]:
    ...

Execute main logic: reads video paths, candidate text, and reference text from storage, extracts video frame features, computes EMScore, and writes back to storage.

Returns: list[str] - List of output field names, including all computed score fields (e.g., EMScore(X,X*)_figr_F) and optional frame_details field

🧾 `run` Parameters

Parameter	Type	Default	Description
`storage`	`DataFlowStorage`	-	DataFlow storage object
`video_key`	`str`	`"video_path"`	Field name for video paths
`candidate_key`	`str`	`"candidate"`	Field name for candidate text
`reference_key`	`str`	`"reference"`	Field name for reference text (optional)

🧠 Example Usage

from dataflow.utils.storage import FileStorage
from dataflow.operators.core_vision import EMScoreEval

storage = FileStorage(
    first_entry_file_name="data/emscore_input.jsonl",
    cache_path="./cache_local",
    file_name_prefix="emscore",
    cache_type="jsonl"
)

evaluator = EMScoreEval(
    every_n_seconds=2.0,
    return_all_frames=True,
    score_types=['EMScore(X,X*)', 'EMScore(X,V)', 'EMScore(X,V,X*)'],
    metrics=['figr_F', 'cogr', 'full_F']
)

evaluator.run(
    storage=storage.step(),
    video_key="video_path",
    candidate_key="candidate",
    reference_key="reference"
)

🧾 Default Output Format

Added Fields:

Each combination of score type and metric generates a field, e.g.:

Field	Type	Description
`EMScore(X,X*)_figr_F`	`float`	Fine-grained F1 (text vs text)
`EMScore(X,V)_cogr`	`float`	Coarse-grained (text vs video)
`EMScore(X,V,X*)_full_F`	`float`	Full F1 (combined)
`frame_details`	`str`	Per-frame scores (JSON, when `return_all_frames=True`)

Code: EMScoreEval
Paper: EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

generate

eval

filter

refine

generate

eval

filter

generate

eval

filter

generaterow

refine

EMScore Evaluator (EMScoreEval)

📘 Overview

🏗️ `init` Function

🧾 `init` Parameters

Available Score Types

Available Metrics

⚡ `run` Function

🧾 `run` Parameters

🧠 Example Usage

🧾 Default Output Format