Large Memory Consumption on OSX - Yosemite 10.10.2 #3712

Closed
marchon opened this issue Feb 2, 2015 · 34 comments

marchon commented Feb 2, 2015

https://cloudup.com/iDdsKmsLXWB

Attached is a screenshot of RethinkDB consuming all of my available RAM and slowing my machine down to the point that I could barely type in a terminal window.

My database was a mere 6,000,000 records with 4 fields each. It was a small sample of what I was hoping to accomplish with RethinkDB.

I will admit it seems like a great idea, but there are probably some issues with memory usage that need to be addressed.

marchon commented Feb 2, 2015

Georges-MacBook-Pro:parts marchon$ system_profiler SPMemoryDataType
Memory:

Memory Slots:

  ECC: Disabled
  Upgradeable Memory: Yes

    BANK 0/DIMM0:

      Size: 8 GB
      Type: DDR3
      Speed: 1600 MHz
      Status: OK
      Manufacturer: 0x859B
      Part Number: 0x43543130323436344246313630422E4D3136
      Serial Number: -

    BANK 1/DIMM0:

      Size: 8 GB
      Type: DDR3
      Speed: 1600 MHz
      Status: OK
      Manufacturer: 0x859B
      Part Number: 0x43543130323436344246313630422E4D3136
      Serial Number: -

marchon commented Feb 2, 2015

SATA/SATA Express:

Intel 6 Series Chipset:

  Vendor: Intel
  Product: 6 Series Chipset
  Link Speed: 6 Gigabit
  Negotiated Link Speed: 6 Gigabit
  Physical Interconnect: SATA
  Description: AHCI Version 1.30 Supported

    Samsung SSD 840 EVO 1TB:

      Capacity: 1 TB (1,000,204,886,016 bytes)
      Model: Samsung SSD 840 EVO 1TB                 
      Revision: EXT0BB6Q
      Serial Number: S1D9NSAF203409L     
      Native Command Queuing: Yes
      Queue Depth: 32
      Removable Media: No
      Detachable Drive: No
      BSD Name: disk0
      Medium Type: Solid State
      TRIM Support: No
      Partition Map Type: GPT (GUID Partition Table)
      S.M.A.R.T. status: Verified
      Volumes:
        EFI:
          Capacity: 209.7 MB (209,715,200 bytes)
          BSD Name: disk0s1
          Content: EFI
          Volume UUID: 85D67001-D93E-3687-A1C2-79D677F0C2E0
        disk0s2:
          Capacity: 999.35 GB (999,345,127,424 bytes)
          BSD Name: disk0s2
          Content: Apple_CoreStorage
        Recovery HD:
          Capacity: 650 MB (650,002,432 bytes)
          BSD Name: disk0s3
          Content: Apple_Boot
          Volume UUID: 4699C544-EB46-31E7-A622-8FF8422DDB57

marchon commented Feb 2, 2015

Georges-MacBook-Pro:Downloads marchon$ brew update && brew install rethinkdb
fatal: Unable to create '/usr/local/Library/Taps/homebrew/homebrew-versions/.git/index.lock': Permission denied
Error: Failed to update tap: homebrew/versions
Updated Homebrew from 3fbd645b to 98ae4e05.
==> New Formulae
fb-adb
==> Updated Formulae
duplicity mercurial pygobject3
==> Installing rethinkdb dependency: openssl
==> Downloading https://downloads.sf.net/project/machomebrew/Bottles/openssl-1.0.2.yosemite.bottle.tar.gz
######################################################################## 100.0%
==> Pouring openssl-1.0.2.yosemite.bottle.tar.gz
==> Caveats
A CA file has been bootstrapped using certificates from the system
keychain. To add additional certificates, place .pem files in
/usr/local/etc/openssl/certs

and run
/usr/local/opt/openssl/bin/c_rehash

This formula is keg-only, which means it was not symlinked into /usr/local.

Mac OS X already provides this software and installing another version in
parallel can cause all kinds of trouble.

Apple has deprecated use of OpenSSL in favor of its own TLS and crypto libraries

Generally there are no consequences of this for you. If you build your
own software and it requires this formula, you'll need to add to your
build variables:

LDFLAGS:  -L/usr/local/opt/openssl/lib
CPPFLAGS: -I/usr/local/opt/openssl/include

==> Summary
🍺 /usr/local/Cellar/openssl/1.0.2: 459 files, 18M
==> Installing rethinkdb
==> Downloading https://downloads.sf.net/project/machomebrew/Bottles/rethinkdb-1.16.0-1.yosemite.bottle.tar.gz
######################################################################## 100.0%
==> Pouring rethinkdb-1.16.0-1.yosemite.bottle.tar.gz
==> Caveats
To have launchd start rethinkdb at login:
ln -sfv /usr/local/opt/rethinkdb/*.plist ~/Library/LaunchAgents
Then to load rethinkdb now:
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.rethinkdb.plist
Error: The brew link step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink bin/rethinkdb
Target /usr/local/bin/rethinkdb
already exists. You may want to remove it:
rm '/usr/local/bin/rethinkdb'

To force the link and overwrite all conflicting files:
brew link --overwrite rethinkdb

To list all files that would be deleted:
brew link --overwrite --dry-run rethinkdb

Possible conflicting files are:
/usr/local/bin/rethinkdb
==> Summary
🍺 /usr/local/Cellar/rethinkdb/1.16.0-1: 6 files, 34M

marchon commented Feb 2, 2015

Georges-MacBook-Pro:Downloads marchon$ rethinkdb
Initializing directory /Users/marchon/Downloads/rethinkdb_data
Running rethinkdb 1.16.0-1 (CLANG 4.2 (clang-425.0.28))...
Running on Darwin 14.1.0 x86_64
Loading data from directory /Users/marchon/Downloads/rethinkdb_data
warn: Cache size does not leave much memory for server and query overhead (available memory: 577 MB).
warn: Cache size is very low and may impact performance.
Listening for intracluster connections on port 29015
Listening for client driver connections on port 28015
Listening for administrative HTTP connections on port 8080
Listening on addresses: 127.0.0.1, ::1
To fully expose RethinkDB on the network, bind to all addresses by running rethinkdb with the --bind all command line option.
Server ready, "Georges_MacBook_Pro_local_56o" f0a66394-31c4-4a79-978b-fdd3337d6baa

marchon commented Feb 3, 2015

after terminated

marchon commented Feb 3, 2015

after running for a while

timmaxw commented Feb 3, 2015

Hi @marchon. Sorry you ran into this, and thank you for your detailed report. Could you provide us some more information to help diagnose this problem?

  • Can you upload your RethinkDB log file? If you run RethinkDB from the command line, the log file will be located by default in the rethinkdb_data directory.
  • After reproducing the problem, can you run the query r.db("rethinkdb").table("stats") and upload the result?

Thanks!

marchon commented Feb 3, 2015

Here are the two different log files, from two different runs.

{
  "id": ["cluster"],
  "query_engine": {
    "client_connections": 4,
    "clients_active": 1,
    "queries_per_sec": 0,
    "read_docs_per_sec": 1638.273040425,
    "written_docs_per_sec": 6550.0820782
  }
}
{
  "id": ["server", "7fc67717-08fa-4a6f-a91d-82e23825b28c"],
  "query_engine": {
    "client_connections": 4,
    "clients_active": 1,
    "queries_per_sec": 0,
    "queries_total": 2,
    "read_docs_per_sec": 1638.273040425,
    "read_docs_total": 5485,
    "written_docs_per_sec": 6550.0820782,
    "written_docs_total": 0
  },
  "server": "Georges_MacBook_Pro_local_i0d"
}
{
  "db": "nsntest",
  "id": ["table", "5935b332-99f1-4335-b9e2-8ca13ca7343e"],
  "query_engine": {
    "read_docs_per_sec": 1638.273040425,
    "written_docs_per_sec": 6550.0820782
  },
  "table": "nsnnumbers"
}
{
  "db": "nsntest",
  "id": ["table_server", "5935b332-99f1-4335-b9e2-8ca13ca7343e", "7fc67717-08fa-4a6f-a91d-82e23825b28c"],
  "query_engine": {
    "read_docs_per_sec": 1638.273040425,
    "read_docs_total": 5485,
    "written_docs_per_sec": 6550.0820782,
    "written_docs_total": 0
  },
  "server": "Georges_MacBook_Pro_local_i0d",
  "storage_engine": {
    "cache": {
      "in_use_bytes": 686796800
    },
    "disk": {
      "read_bytes_per_sec": 210011287.06772992,
      "read_bytes_total": 790092800,
      "space_usage": {
        "data_bytes": 1935568384,
        "garbage_bytes": 354521600,
        "metadata_bytes": 35651584,
        "preallocated_bytes": 2097152
      },
      "written_bytes_per_sec": 21817309.08668211,
      "written_bytes_total": 59974656
    }
  },
  "table": "nsnnumbers"
}

marchon commented Feb 3, 2015

From the first execution

{
  "id": ["cluster"],
  "query_engine": {
    "client_connections": 4,
    "clients_active": 1,
    "queries_per_sec": 0,
    "read_docs_per_sec": 0,
    "written_docs_per_sec": 0
  }
}
{
  "id": ["server", "f0a66394-31c4-4a79-978b-fdd3337d6baa"],
  "query_engine": {
    "client_connections": 4,
    "clients_active": 1,
    "queries_per_sec": 0,
    "queries_total": 4,
    "read_docs_per_sec": 0,
    "read_docs_total": 0,
    "written_docs_per_sec": 0,
    "written_docs_total": 0
  },
  "server": "Georges_MacBook_Pro_local_56o"
}
{
  "db": "nsntest",
  "id": ["table", "9e1633ae-2082-41dc-9e6e-089d6a52fe78"],
  "query_engine": {
    "read_docs_per_sec": 0,
    "written_docs_per_sec": 0
  },
  "table": "nsnnumbers"
}
{
  "db": "nsntest",
  "id": ["table_server", "9e1633ae-2082-41dc-9e6e-089d6a52fe78", "f0a66394-31c4-4a79-978b-fdd3337d6baa"],
  "query_engine": {
    "read_docs_per_sec": 0,
    "read_docs_total": 0,
    "written_docs_per_sec": 0,
    "written_docs_total": 0
  },
  "server": "Georges_MacBook_Pro_local_56o",
  "storage_engine": {
    "cache": {
      "in_use_bytes": 36864
    },
    "disk": {
      "read_bytes_per_sec": 0,
      "read_bytes_total": 11872768,
      "space_usage": {
        "data_bytes": 874811392,
        "garbage_bytes": 140210176,
        "metadata_bytes": 20971520,
        "preallocated_bytes": 2097152
      },
      "written_bytes_per_sec": 22267747.50129869,
      "written_bytes_total": 150952448
    }
  },
  "table": "nsnnumbers"
}

timmaxw commented Feb 3, 2015

Thanks for the stats dumps, @marchon. We'll try to reproduce the memory leak locally.

danielmewes added this to the 1.16.x milestone Feb 3, 2015

marchon commented Feb 3, 2015

Would you like the files to go with it?

I can give you the data file and the Python script if that is useful.

G.

@danielmewes (Member)

@marchon Thanks a lot for your help in tracking this down. Your data files and the Python script would be very helpful.
Since I imagine they'll be a little too big for an email attachment, I'm going to send you an email with instructions on how to upload the data.

marchon commented Feb 3, 2015

They have now been sent to you.

@mglukhovsky (Member)

@marchon, thanks for your help! We've transferred your data to our internal servers, so we can try to reproduce this. (cc @danielmewes)

@danielmewes (Member)

@marchon I think the memory issue might have to do with the secondary index construction that was going on at the same time as the inserts.

If you want to give it another shot, I recommend one of these work-arounds:

  • Create all indexes first, wait until they are ready, then start inserting data
  • First insert all data, then create the indexes

We're working on rewriting how secondary indexes are constructed (see #2134) because of issues like this, but the new implementation will likely take a few months before it's ready and sufficiently tested for shipping.

Another problem with the current index construction implementation is that it can sometimes slow other queries down significantly (e.g. #1955). This is also going to be solved with #2134, but still exists as of now.

marchon commented Feb 4, 2015

I offered this help in the interest of advancing your project, which looks like it has promise.

These changes are not going to have a direct positive result for me, as I needed an immediate solution and thought that I should consider your solution as an alternative to MongoDB.

Thank you for keeping me up to date and good luck with your project.

G.

@danielmewes (Member)

@marchon I understand, thank you for your help again and sorry it didn't work out.

I did some more research, and this is rather interesting. It seems that this problem has to do with how RethinkDB is compiled / which libraries we are using.
In both cases I'm using a cache size of 400 MB, no secondary indexes.

  • compiling RethinkDB 1.16.0 from source (configure output below): Inserting ~6 million documents keeps the memory usage below 830 MB.
  • using our dmg installer: Memory usage almost monotonically increases while inserting ~6 million documents, and reaches 4300 MB towards the end.

@AtnNn which configure / make flags are we using for our dmg build?

configure output:

* Detecting system configuration
Bash:                           3.2.53(1)-release
Use ccache:                     no
C++ Compiler:                   CLANG 6.0 (/usr/bin/c++)
Host System:                    x86_64-apple-darwin13.4.0
Build System:                   Darwin 13.4.0 x86_64
Cross-compiling:                no
Host Operating System:          Darwin
Build Architecture:             x86_64
stdlib:                         -lc++
C++11:                          ok
Precompiled web assets:         no
Protobuf compiler:              /usr/local/bin/protoc
python:                         python 2.7.5
Node.js package manager:        /usr/local/bin/npm
LESS css:                       external/less_1.6.2
CoffeeScript:                   external/coffee-script_1.7.1
Handlebars:                     external/handlebars_1.3.0
Browserify:                     external/browserify_3.24.13
bluebird:                       external/bluebird_2.3.2
wget:                           no
curl:                           /usr/bin/curl
Google Test:                    external/gtest_1.6.0
termcap:                        -ltermcap
boost_system:                   -lboost_system
ssl:                            -lssl
protobuf:                       -lprotobuf
v8:                             external/v8_3.22.24.17
RE2:                            external/re2_20140111
z:                              -lz
crypto:                         -lcrypto
curl:                           -lcurl
malloc:                         system
Test protobuf:                  ok
Test boost:                     ok
Installation prefix:            /usr/local
Configuration prefix:           /usr/local/etc
Runtime data prefix:            /usr/local/var
* Wrote configuration to config.mk

marchon commented Feb 4, 2015

Glad that I could help.

That is what makes this type of collaboration helpful.

G.

@danielmewes (Member)

Seems like it has to do with Boost.

This causes high memory consumption:

$ ./configure --fetch boost --allow-fetch
[...]
boost_system:                   external/boost_1.56.0
[...]

This one works as expected (low memory consumption):

$ ./configure --allow-fetch
[...]
boost_system:                   -lboost_system
[...]

I'll do some memory profiling next to find out what's using the memory exactly.

@danielmewes (Member)

Boost 1.55 appears to be good, but both Boost 1.56.0 and 1.57.0 exhibit the problem.
Next question is whether this is a Boost bug or if we're using it wrongly.

@danielmewes (Member)

I can also reproduce this on Linux with Boost 1.56.0

@danielmewes (Member)

Valgrind:

==18761== 1,064,825 bytes in 4,600 blocks are definitely lost in loss record 18 of 18
==18761==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18761==    by 0x167A714: rmalloc(unsigned long) (utils.cc:235)
==18761==    by 0x156B518: shared_buf_t::create(unsigned long) (shared_buffer.cc:11)
==18761==    by 0x1C2EB34: ql::datum_deserialize(read_stream_t*, ql::datum_t*) (serialize_datum.cc:694)
==18761==    by 0x18A292C: archive_result_t ql::deserialize<(cluster_version_t)4>(read_stream_t*, ql::datum_t*) (serialize_datum.hpp:84)
==18761==    by 0x1A994E5: archive_result_t std::deserialize<(cluster_version_t)4, ql::datum_t>(read_stream_t*, std::vector<ql::datum_t, std::allocator<ql::datum_t> >*) (stl_types.hpp:174)
==18761==    by 0x1C4F366: archive_result_t deserialize<(cluster_version_t)4>(read_stream_t*, batched_insert_t*) (protocol.cc:1283)
==18761==    by 0x1C5F12F: _ZN20variant_deserializerIL17cluster_version_t4ELi2EN5boost7variantI17batched_replace_t16batched_insert_t13point_write_t14point_delete_t15sindex_create_t13sindex_drop_t15sindex_rename_t6sync_t13dummy_write_tNS1_6detail7variant5void_ESE_SE_SE_SE_SE_SE_SE_SE_SE_SE_EEJS4_S5_S6_S7_S8_S9_SA_SB_SE_SE_SE_SE_SE_SE_SE_SE_SE_SE_SE_EE19deserialize_variantEiP13read_stream_tPSF_ (boost_types.hpp:93)
==18761==    by 0x1C5F088: _ZN20variant_deserializerIL17cluster_version_t4ELi1EN5boost7variantI17batched_replace_t16batched_insert_t13point_write_t14point_delete_t15sindex_create_t13sindex_drop_t15sindex_rename_t6sync_t13dummy_write_tNS1_6detail7variant5void_ESE_SE_SE_SE_SE_SE_SE_SE_SE_SE_EEJS3_S4_S5_S6_S7_S8_S9_SA_SB_SE_SE_SE_SE_SE_SE_SE_SE_SE_SE_SE_EE19deserialize_variantEiP13read_stream_tPSF_ (boost_types.hpp:99)
==18761==    by 0x1C57C93: archive_result_t deserialize<(cluster_version_t)4, batched_replace_t, batched_insert_t, point_write_t, point_delete_t, sindex_create_t, sindex_drop_t, sindex_rename_t, sync_t, dummy_write_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>(read_stream_t*, boost::variant<batched_replace_t, batched_insert_t, point_write_t, point_delete_t, sindex_create_t, sindex_drop_t, sindex_rename_t, sync_t, dummy_write_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>*) (boost_types.hpp:117)
==18761==    by 0x1C4F9B6: archive_result_t deserialize<(cluster_version_t)4>(read_stream_t*, write_t*) (protocol.cc:1298)
==18761==    by 0x19EB616: archive_result_t deserialize<(cluster_version_t)4>(read_stream_t*, master_business_card_t::write_request_t*) (metadata.cc:14)

The number of lost bytes seems to grow the longer I'm running the inserts (this was after just ~4500 inserts).
I don't get this on v1.15.x, so it's likely a regression.
I do however get this even with Boost 1.55.0 (at least on Linux), so I'm not sure how this relates to the Boost version.

marchon commented Feb 5, 2015

If you are doing a simple memory allocation or object creation in Boost, write a quick 10-line sample that you can compare against two versions of Boost so that you have a sample test case. If they behave differently, throw the ball at the makers of the Boost library with the quick sample.

If they do behave the same, then the problem will be in your code.

Hope that helps.

G.
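
A minimal stand-alone probe along these lines might look like the following. This is hypothetical code, not taken from the RethinkDB tree; it only checks whether boost::optional's constructor moves from an rvalue argument, which (per the findings later in this thread) is the behavior that changed between Boost 1.55 and 1.56:

#include <boost/optional.hpp>
#include <iostream>
#include <string>
#include <utility>

int main() {
    std::string src(1 << 20, 'x');                    // ~1 MB payload
    boost::optional<std::string> opt(std::move(src)); // construct from an rvalue

    // If boost::optional's move constructor was used (Boost 1.56+), src is
    // typically left empty afterwards; if the value was copied (Boost 1.55),
    // src still holds its contents.
    std::cout << "optional holds " << opt->size() << " bytes, "
              << "source still holds " << src.size() << " bytes" << std::endl;
    return 0;
}

Building this same file against Boost 1.55 and 1.56 and comparing the output would show whether the behavior difference comes from Boost alone.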

@danielmewes (Member)

@marchon Yeah I doubt it's that easy. We use boost::variant all over the place, but only this one is causing an issue.

I did a git bisect between v1.15.x and v1.16.x and it seems that this commit introduced the memory leak that Valgrind reports:
890814f#diff-cc9504334637dde0a679d9cbf47c3683R345

The parent 0f2b17a was still good.

Also pinging @mlucy.

marchon commented Feb 5, 2015

Debugging memory problems in C++ between your code base and Boost is well beyond my pay grade, I am sorry to admit.

I can do it in C, but setting up a debugging environment, finding the line that the code breaks on, and then deciphering the hieroglyphics of it is not my strong suit, I am sorry. But I am glad that you are on the path to tracking it down. I am glad that we have brought some light to the subject, and I think your responsiveness is impressive.

You might get me to switch back yet. If you have time for a quick Google Hangout, we can chat about it and you can tell me if your solution fits my needs.

G.

@danielmewes (Member)

The Valgrind-reported leak was probably a red herring. We traced it down to what looks like a Clang bug in at least Clang 3.2. We don't know yet whether this specific bug even affects the OS X build, since we're using a more recent version there.

In any case, after working around the leak that Valgrind reported, the memory consumption is still way higher with Boost 1.56.0 than with 1.55.0, both on Linux and on OS X. Valgrind doesn't report any leaks, so we're mostly back to square one.

@danielmewes (Member)

Found the problem.
We are moving something into a boost::optional. However we later rely on the value still being there in its original place. So using the move here is a bug.
It seems that older versions of Boost didn't support/use the move constructor for boost::optional, and newer versions do. So the bug was hidden in older versions of Boost.

As a consequence of the move, we were never evicting pages from the cache, thereby filling up the memory.

Going to put a fix up soon, so we can roll out a point release next week.
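
For readers following the diagnosis, here is an illustrative sketch of the bug class described above, with made-up names (this is not RethinkDB's actual cache code): a value is moved into a boost::optional while the caller keeps relying on the moved-from original.

#include <boost/optional.hpp>
#include <iostream>
#include <utility>
#include <vector>

struct page_t {
    std::vector<char> buf;
};

boost::optional<page_t> stashed;

void stash_page(page_t &page) {
    // With a move-aware boost::optional (Boost 1.56+) this moves page.buf out,
    // typically leaving it empty; with an older, copy-only optional the same
    // line silently behaved like a copy.
    stashed = boost::optional<page_t>(std::move(page));
}

int main() {
    page_t page{std::vector<char>(4096, 'x')};
    stash_page(page);

    // Any code that still reads `page` here now sees an empty buffer, so
    // bookkeeping driven by it (for example, size accounting that decides
    // when to evict) silently stops working.
    std::cout << "caller still sees " << page.buf.size() << " bytes" << std::endl;

    // The generic fix direction is to either keep copy semantics, or to read
    // the value from `stashed` instead of from the moved-from original.
    return 0;
}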

@danielmewes (Member)

The fix is implemented in branch daniel_3712_2 and currently in code review 2582.

danielmewes self-assigned this Feb 7, 2015

marchon commented Feb 7, 2015

congratulations guys - quick fix

G

@danielmewes (Member)

Thanks @marchon, your original testing and script helped a lot.

@danielmewes (Member)

Fix merged into v1.16.x (811ecd1) and next (c8e393d).

@danielmewes (Member)

Still have to double-check the Valgrind-reported leak with newer Clang.

danielmewes reopened this Feb 9, 2015

@danielmewes (Member)

Just tested with CLANG 3.5 and the Clang-specific leak is gone.

AtnNn modified the milestones: 1.16.x, 1.16.1 Feb 11, 2015

@danielmewes (Member)

@marchon we just shipped RethinkDB 1.16.1 with the fix for this problem. You can download it through the usual channels: http://rethinkdb.com/docs/install/osx/
Sorry again that you had to run into this, and thank you for your help in debugging the problem.
