skip to main content
10.1145/2588555.2595641acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Storm@twitter

Published:18 June 2014Publication History

ABSTRACT

This paper describes the use of Storm at Twitter. Storm is a real-time fault-tolerant and distributed stream data processing system. Storm is currently being used to run various critical computations in Twitter at scale, and in real-time. This paper describes the architecture of Storm and its methods for distributed scale-out and fault-tolerance. This paper also describes how queries (aka. topologies) are executed in Storm, and presents some operational stories based on running Storm at Twitter. We also present results from an empirical evaluation demonstrating the resilience of Storm in dealing with machine failures. Storm is under active development at Twitter and we also present some potential directions for future work.

References

  1. Arvind Arasu, Brian Babcock, Shivnath Babu, Mayur Datar, Keith Ito, Rajeev Motwani, Itaru Nishizawa, Utkarsh Srivastava, Dilys Thomas, Rohit Varma, Jennifer Widom: STREAM: The Stanford Stream Data Manager. IEEE Data Eng. Bull. 26(1): 19--26 (2003)Google ScholarGoogle Scholar
  2. Hari Balakrishnan, Magdalena Balazinska, Donald Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Eduardo F. Galvez, Jon Salz, Michael Stonebraker, Nesime Tatbul, Richard Tibbetts, Stanley B. Zdonik: Retrospective on Aurora. VLDB J. 13(4): 370--383 (2004) Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Minos N. Garofalakis, Johannes Gehrke: Querying and Mining Data Streams: You Only Get One Look. VLDB 2002Google ScholarGoogle Scholar
  4. Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Çetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, Nesime Tatbul, Ying Xing, Stanley B. Zdonik: The Design of the Borealis Stream Processing Engine. CIDR 2005: 277--289Google ScholarGoogle Scholar
  5. S4 Distributed stream computing platform. http://incubator.apache.org/s4/Google ScholarGoogle Scholar
  6. Tyler Akidau, Alex Balikov, Kaya Bekiroglu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, Sam Whittle: MillWheel: Fault-Tolerant Stream Processing at Internet Scale. PVLDB 6(11): 1033--1044 (2013) Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Apache Samza. http://samza.incubator.apache.orgGoogle ScholarGoogle Scholar
  8. Spark Streaming. http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.htmlGoogle ScholarGoogle Scholar
  9. Mohamed H. Ali, Badrish Chandramouli, Jonathan Goldstein, Roman Schindlauer: The extensibility framework in Microsoft StreamInsight. ICDE 2011: 1242--1253 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Sankar Subramanian, Srikanth Bellamkonda, Hua-Gang Li, Vince Liang, Lei Sheng, Wayne Smith, James Terry, Tsae-Feng Yu, Andrew Witkowski: Continuous Queries in Oracle. VLDB 2007: 1173--1184 Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. IBM Infosphere Streams. http://www-03.ibm.com/software/products/en/infosphere-streams/Google ScholarGoogle Scholar
  12. Namit Jain, Shailendra Mishra, Anand Srinivasan, Johannes Gehrke, Jennifer Widom, Hari Balakrishnan, Ugur Çetintemel, Mitch Cherniack, Richard Tibbetts, Stanley B. Zdonik: Towards a streaming SQL standard. PVLDB 1(2): 1379--1390 (2008) Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jay Kreps, Neha Narkhede, and Jun Rao. Kafka: a distributed messaging system for log processing. SIGMOD Workshop on Networking Meets Databases, 2011.Google ScholarGoogle Scholar
  14. Kestrel: A simple, distributed message queue system. http://robey.github.com/kestrelGoogle ScholarGoogle Scholar
  15. Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: a platform for fine-grained resource sharing in the data center. In NSDI, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Goetz Graefe: Encapsulation of Parallelism in the Volcano Query Processing System. SIGMOD Conference 1990: 102--111 Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Apache Zookeeper. http://zookeeper.apache.org/Google ScholarGoogle Scholar
  18. Summingbird. https://github.com/Twitter/summingbirdGoogle ScholarGoogle Scholar
  19. Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, Shivakumar Venkataraman: Photon: fault-tolerant and scalable joining of continuous data streams. SIGMOD Conference 2013: 577--588 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Nathan Marz: (Storm) Tutorial. https://github.com/nathanmarz/storm/wiki/TutorialGoogle ScholarGoogle Scholar
  21. Storm, Stream Data Processing: http://hortonworks.com/labs/storm/Google ScholarGoogle Scholar
  22. Apache Storm: http://hortonworks.com/hadoop/storm/Google ScholarGoogle Scholar
  23. Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O'Malley, Sanjay Radia, Benjamin Reed, Eric Baldeschwieler: Apache Hadoop YARN: yet another resource negotiator. SoCC 2013: 5Google ScholarGoogle Scholar
  24. Nathan Marz: Trident API Overview. https://github.com/nathanmarz/storm/wiki/Trident-API-OverviewGoogle ScholarGoogle Scholar
  25. ZeroMQ: http://zeromq.org/Google ScholarGoogle Scholar

Index Terms

  1. Storm@twitter

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
          June 2014
          1645 pages
          ISBN:9781450323765
          DOI:10.1145/2588555

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 June 2014

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          SIGMOD '14 Paper Acceptance Rate107of421submissions,25%Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader