Data Store for Low-Latency Big Data Apps

Already using Apache Hadoop for batch data deliveries?
Need an efficient store for your analytical application?

jumboDB allows you to store, index and query large amounts of data using Apache Hadoop core features.

jumboDB: Big Data for the masses!

Balancing performance and cost efficiency.

Affordable Big Data

Low I/O requirements, efficient use of disk space and a low memory footprint.

Fast disk access through compression

Snappy compression shrinks data by up to a factor of 5, increasing disk I/O efficiency and saving storage costs.

Batch processing: a delivery-driven approach

“Write once, read many”: each batch of data is written atomically, with built-in delivery versioning and the option to instantly roll back, replace or combine deliveries.
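The delivery model can be illustrated with a small Java sketch. It is a minimal illustration under assumptions, not jumboDB's API: `DeliveryRegistry` and its methods are hypothetical names, each delivery is held in memory as an immutable list, and "atomic" here only means that readers switch from one complete version to the next through a single reference update. Combining deliveries is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch of "write once, read many" delivery versioning
 * (illustrative only, not jumboDB's API).
 */
final class DeliveryRegistry<T> {

    /** An immutable, fully written batch of documents. */
    record Delivery<D>(String version, List<D> documents) {
        Delivery {
            documents = List.copyOf(documents); // freeze the batch once written
        }
    }

    private final List<Delivery<T>> history = new ArrayList<>();
    private final AtomicReference<Delivery<T>> active = new AtomicReference<>();
    private int activeIndex = -1; // guarded by "this"

    /** Stores a complete batch, then makes it visible with one reference update. */
    synchronized void deliver(String version, List<T> documents) {
        history.add(new Delivery<>(version, documents));
        activeIndex = history.size() - 1;
        active.set(history.get(activeIndex));
    }

    /** Switches readers back to the previous delivery; no data is rewritten. */
    synchronized void rollback() {
        if (activeIndex <= 0) {
            throw new IllegalStateException("no earlier delivery to roll back to");
        }
        activeIndex--;
        active.set(history.get(activeIndex));
    }

    /** Readers take no lock; they always see one complete delivery. */
    Delivery<T> active() {
        return active.get();
    }
}
```

Because a delivery is never modified after it has been written, a rollback in this sketch only moves the active pointer, which is what makes it instant.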

Supports JSON documents

Schema flexibility for rapid application development.

Power and scalability of Apache Hadoop

Use Hadoop for batch processing, aggregation and indexing of your data. Example: jumboDB writes up to 500,000 JSON documents per second on a single AWS m1.xlarge instance (document size: 420 bytes).
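To put the write figure in perspective, the quoted numbers can be multiplied out. This is plain arithmetic on the figures stated here, not an additional measurement; it describes the uncompressed JSON payload and ignores Snappy compression and index data.

```latex
500{,}000\,\frac{\text{documents}}{\text{s}} \times 420\,\frac{\text{bytes}}{\text{document}}
  = 2.1 \times 10^{8}\,\frac{\text{bytes}}{\text{s}} \approx 210\ \text{MB/s}
```

In binary units this is roughly 200 MiB/s of raw document data.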

Low read latency for end-user apps

Queries stay fast even for large result sets, thanks to multithreading and efficient data streaming. Example: 100,000 JSON documents returned in less than a second on a single AWS m1.xlarge instance.
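One way to combine multithreading with streaming is to decode the parts of a large result set in parallel while still emitting documents in a fixed order. This is an assumption-laden illustration, not how jumboDB is implemented: `ParallelChunkReader`, the notion of a "chunk" (for example one compressed block of a result set) and the `decode` function are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical sketch of a multithreaded, streaming read (not jumboDB's code):
 * result chunks are decoded concurrently and handed to the sink in order.
 */
final class ParallelChunkReader {

    static <C, D> void stream(List<C> chunks, Function<C, List<D>> decode,
                              Consumer<D> sink, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<D>>> pending = new ArrayList<>();
            for (C chunk : chunks) {
                pending.add(pool.submit(() -> decode.apply(chunk))); // decode in parallel
            }
            for (Future<List<D>> future : pending) {
                // emit each chunk as soon as it and all earlier chunks are done
                future.get().forEach(sink);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("read interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("chunk decoding failed", e.getCause());
        } finally {
            pool.shutdownNow(); // stop any remaining work if we bail out early
        }
    }
}
```

Order is preserved because the futures are consumed in submission order; the trade-off is that one slow early chunk delays the emission of later chunks that have already finished.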

Hadoop Connector and Java Driver available

Easy to integrate into any JVM-based application.

Open Source: Join our community!

Dedicated to cost-efficient Big Data.

While working on Big Data projects with Telefónica Digital, Carsten Hufe and the Comsysto Reply team started looking for an efficient and affordable way to store and query large amounts of data delivered in large batches through Apache Hadoop. Our goal was to build a data visualisation app for end users issuing different kinds of selective queries on already processed data. Some of these queries returned large result sets of up to 800,000 JSON documents, representing data points for visualisation in the browser.

We faced three major challenges:

  • Limited write performance from Hadoop into other databases (including index builds)
  • Poor read performance for large result sets
  • Pressure to cut AWS infrastructure costs

Instead of waiting for the database to write and index the data, Carsten proposed organizing the batch result data, including the index builds, within the Hadoop process itself. Using Snappy compression, he reduced the overall hardware and network footprint of the stored data while increasing write and read performance. Index builds are now part of the standard Hadoop processing, saving valuable processing time and money while still allowing fast queries on large data sets. With jumboDB we were able to deliver all required features based on a full data set while saving infrastructure costs.
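The idea of building the index inside the batch job, rather than in the target database, can be sketched in plain Java. This is a minimal illustration under stated assumptions, not jumboDB's implementation: `BatchIndexBuilder`, its records and the newline-delimited layout with a key-to-byte-range map are hypothetical, the output is held in memory instead of being written to HDFS by a Hadoop job, and compression is left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: the index is built in the same batch pass that writes
 * the data, so no database-side index build is needed afterwards.
 * The layout (newline-delimited JSON plus key -> byte range) is an assumption.
 */
final class BatchIndexBuilder {

    /** One input record: the indexed key and its JSON document. */
    record Doc(String key, String json) {}

    /** Byte range holding all documents that share one key. */
    record Range(int offset, int length) {}

    /** Result of one batch run: the written data plus its index. */
    record Output(byte[] data, Map<String, Range> index) {}

    static Output write(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(Doc::key)); // equal keys become contiguous

        ByteArrayOutputStream data = new ByteArrayOutputStream();
        Map<String, Range> index = new TreeMap<>();
        for (Doc doc : sorted) {
            int start = data.size();
            byte[] line = (doc.json() + "\n").getBytes(StandardCharsets.UTF_8);
            data.write(line, 0, line.length);
            // open a range for a new key, or extend the current key's range
            index.merge(doc.key(), new Range(start, line.length),
                    (old, add) -> new Range(old.offset(), old.length() + add.length()));
        }
        return new Output(data.toByteArray(), index);
    }

    /** Query side: one index lookup, then a single contiguous read. */
    static List<String> query(Output out, String key) {
        Range range = out.index().get(key);
        if (range == null) {
            return List.of();
        }
        String chunk = new String(out.data(), range.offset(), range.length(),
                StandardCharsets.UTF_8);
        return List.of(chunk.split("\n"));
    }
}
```

Sorting by key before writing is what lets each key map to one contiguous byte range, so a query becomes a lookup plus a sequential read rather than many random reads.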

Use jumboDB to:

  • Store data for complex data visualisation apps
  • Improve write performance for Hadoop batch jobs
  • Run low-latency queries on indexed data with large result sets

Don’t:

  • Use SQL on jumboDB
  • Use jumboDB without Apache Hadoop
  • Update single documents

About TDI

Telefónica Dynamic Insights provides near real-time data, collected 24 hours a day, 7 days a week, 365 days a year, and presents it through simple web interfaces to enable visualisation and understanding. We fuse this data with the best consumer preference, attitude and behavioural insight through our global partnership with GfK. Find out more about TDI.

About Comsysto Reply

Comsysto Reply is a Munich-based software company specialized in lean business and technology development. While supporting all three steps of the well-known Build-Measure-Learn lean feedback loop, Comsysto Reply focuses on open source frameworks and software as major enablers of short, agile Build-Measure-Learn iterations and fast gains in validated learning. Powerful MongoDB technology provides the flexibility and agility needed to turn ideas into products, as well as the performance to handle Big Data while turning data into knowledge. We also enjoy developing with the Spring Framework and its subprojects, Apache Wicket, Gradle, Git, Oracle DB and Oracle BI. Comsysto Reply has been dedicated to eliminating waste in both business and technology since 2005. Find out more about Comsysto Reply.

Initiated by Carsten Hufe, Lean Java Expert at Comsysto Reply GmbH, created in the Telefónica Big Data Lab.
Powered by Telefónica Dynamic Insights and Comsysto Reply.