Storage Module
2025-06-12
DataFlow’s storage system is built around the DataFlowStorage abstract base class, which fully decouples the storage layer from algorithms, data-flow control, and other logic. Users only need to subclass DataFlowStorage and implement the read and write interfaces to seamlessly integrate custom file systems, object-storage services, or databases as backends—without modifying existing operators or pipeline code.
```python
from abc import ABC, abstractmethod
from typing import Any


class DataFlowStorage(ABC):
    """
    Abstract base class for data storage.
    """

    @abstractmethod
    def read(self, output_type) -> Any:
        """
        Read data from storage.
        output_type: the type to read into, such as "dataframe", List[dict], etc.
        """
        pass

    @abstractmethod
    def write(self, data: Any) -> Any:
        """
        Write data to storage.
        """
        pass
```
We provide a built-in default implementation in the DataFlow system called FileStorage, which supports reading and writing common formats on the local file system (JSON/JSONL, CSV, Parquet, Pickle), helping users get started quickly and covering the majority of scenarios.
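The decoupling described above means an operator only ever talks to the abstract interface, so FileStorage can be swapped for any other backend without touching operator code. The sketch below illustrates this with a hypothetical in-memory backend and a hypothetical `uppercase_operator`; both names are assumptions for this example, not part of DataFlow.

```python
from abc import ABC, abstractmethod
from typing import Any, List


# Restated here so the example runs on its own.
class DataFlowStorage(ABC):
    @abstractmethod
    def read(self, output_type) -> Any: ...

    @abstractmethod
    def write(self, data: Any) -> Any: ...


# Hypothetical in-memory backend, convenient for unit tests.
class InMemoryStorage(DataFlowStorage):
    def __init__(self, records: List[dict] = None):
        self.records = list(records or [])

    def read(self, output_type="dict") -> List[dict]:
        return list(self.records)

    def write(self, data: List[dict]) -> List[dict]:
        self.records = list(data)
        return self.records


# A hypothetical operator: it depends only on the abstract interface,
# so FileStorage, JSONL files, or an object store all work unchanged.
def uppercase_operator(storage: DataFlowStorage) -> None:
    records = storage.read("dict")
    storage.write([{**r, "text": r["text"].upper()} for r in records])
```

Because `uppercase_operator` never references a concrete backend, switching from local files to, say, an object store is purely a matter of passing a different `DataFlowStorage` subclass.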