分布式流处理框架:Apache Samza

jopen 11年前

Apache Samza 是一个分布式流处理框架。它使用 Apache Kafka 用于消息发送,采用 Apache Hadoop YARN 来提供容错,处理器隔离,安全性和资源管理。专用于实时数据的处理,非常像推ter的流处理系统Storm。它具有以下特性:

  • Simple API: Unlike most low-level messaging system APIs, Samza provides a very simple call-back based "process message" API that should be familiar to anyone who's used Map/Reduce.
  • Managed state: Samza manages snapshotting and restoration of a stream processor's state. Samza will restore a stream processor's state to a snapshot consistent with the processor's last read messages when the processor is restarted.
  • Fault tolerance: Samza will work with YARN to restart your stream processor if there is a machine or processor failure.
  • Durability: Samza uses Kafka to guarantee that messages will be processed in the order they were written to a partition, and that no messages will ever be lost.
  • Scalability: Samza is partitioned and distributed at every level. Kafka provides ordered, partitioned, re-playable, fault-tolerant streams. YARN provides a distributed environment for Samza containers to run in.
  • Pluggable: Though Samza works out of the box with Kafka and YARN, Samza provides a pluggable API that lets you run Samza with other messaging systems and execution environments.
  • Processor isolation: Samza works with Apache YARN, which supports processor security through Hadoop's security model, and resource isolation through Linux CGroups.

分布式流处理框架:Apache Samza

项目主页:http://www.open-open.com/lib/view/home/1379911648961