
Proposal: Logging drivers #7195

Closed
crosbymichael opened this issue Jul 23, 2014 · 93 comments
Labels
area/logging kind/feature roadmap

Comments

@crosbymichael
Contributor

Improved logging support

Topics:

  • Logging drivers
  • Initial logging drivers
  • Default driver improvements

Logging drivers

The driver interface should be able to support the smallest subset available for logging drivers to
implement their functionality. Stdout and stderr will still be the source of logging for containers
in this proposal. Docker will, however, take the raw streams from the containers and create discrete
messages delimited by writes. This parsed struct will then be sent to the logging drivers.

type Message struct {
    // ContainerID is the container id where the message originated from
    ContainerID string 

    // RawMessage is the raw bytes from the write
    RawMessage []byte 

    // Source specifies where this message originated, stderr, stdout, syslog
    Source string

    // Time is the time the message was received
    Time time.Time

    // Fields are user defined fields attached to the message
    Fields map[string]string
}

type Driver interface {
    // Log records a single message captured from a container's stdout or stderr stream
    Log(message *Message) error

    // ReadLog fetches the messages for a specific id
    ReadLog(containerID string) (messages []*Message, err error)

    // CloseLog tells the driver that no more log messages will be written for the specific id
    // drivers can implement this to their requirements, it may mean compressing the logs or deleting
    // them off of the disk
    CloseLog(containerID string) error

    // Close ensures that any writes for the logger are properly flushed and can be
    // stopped without data loss
    Close() error
}

When creating or initializing the drivers they will be provided with a key/value map with the user defined configuration specific to the driver. Each driver will also be provided a root directory where it is able to store and manage any type of state on disk.
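As a rough illustration (not part of the interface contract), a minimal driver satisfying this interface might look something like the following sketch; the package name, constructor signature, and behaviour are hypothetical:

package nonedriver

import "time"

// Message mirrors the struct defined in the proposal above.
type Message struct {
    ContainerID string
    RawMessage  []byte
    Source      string
    Time        time.Time
    Fields      map[string]string
}

// New is a hypothetical constructor: root is the directory handed to the driver
// for its on-disk state, config is the user-supplied key/value options.
func New(root string, config map[string]string) (*NoneDriver, error) {
    return &NoneDriver{root: root, config: config}, nil
}

// NoneDriver discards everything, roughly what the "none" driver described below would do.
type NoneDriver struct {
    root   string
    config map[string]string
}

func (d *NoneDriver) Log(m *Message) error                           { return nil }
func (d *NoneDriver) ReadLog(containerID string) ([]*Message, error) { return nil, nil }
func (d *NoneDriver) CloseLog(containerID string) error              { return nil }
func (d *NoneDriver) Close() error                                   { return nil }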

Initial logging drivers

none - This driver will ignore the streams and log nothing for the containers. This is a totally valid
driver: because the docker daemon currently has to manage the logs for every container, logging can be a
memory and performance bottleneck on the daemon.

default - This driver will be the implementation of logs that docker currently has: a single file on disk
containing JSON objects with the message, timestamp, and stream of each log entry, separated by a newline
character.
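As a rough illustration of that format (the field names here are an assumption for the sketch, not part of this proposal), each write would serialize to one line along these lines:

package main

import (
    "encoding/json"
    "fmt"
    "time"
)

// jsonLogEntry approximates one newline-delimited record written by the default driver.
type jsonLogEntry struct {
    Log    string    `json:"log"`
    Stream string    `json:"stream"`
    Time   time.Time `json:"time"`
}

func main() {
    entry := jsonLogEntry{Log: "hello world\n", Stream: "stdout", Time: time.Now().UTC()}
    b, _ := json.Marshal(entry)
    fmt.Println(string(b)) // one JSON object per write, appended to the container's log file
}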

syslog - This driver will write to a syslog socket and use the tag field to insert the container id.

Default driver improvements

One of the biggest issues with the default driver is that there is no log truncation or rotation. Both of
these issues need to be addressed. We can either truncate based on filesize or date. I believe filesize
is better.

Truncation size can default to 10mb, with an option to override it when you select the driver. Rotation
can also be set to a specific size limit, defaulting to 500mb. To change the defaults I propose a
--logging-opt flag on the daemon, similar to --storage-opt for the storage drivers.

Usage

The usage for this feature will be managed via the daemon:

docker -d --logging none
docker -d --logging default --logging-opt truncation=20mb --logging-opt rotation=1gb
@LK4D4
Contributor

LK4D4 commented Jul 23, 2014

What about choosing the driver per container? For example, I don't want logs for elasticsearch, but I do want logs for prosody.

@crosbymichael
Contributor Author

A few questions that I have: what should we do about timestamps? Should the read return some type of structured data, or should we still manage this as just streams?

Any suggestions and modifications to this proposal are welcome.

@cpuguy83
Member

I would still manage them as just streams and allow the implementation to handle it.

@LK4D4
Contributor

LK4D4 commented Jul 23, 2014

Hm, actually I think that we have the perfect interface for logging - io.Writer :) And for none we have ioutil.Discard.
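Something like this, just a sketch with the standard library (the tag and priority values are arbitrary examples):

package main

import (
    "io"
    "io/ioutil"
    "log/syslog"
)

func main() {
    // syslog "driver": the stdlib already gives us an io.Writer
    sysw, err := syslog.New(syslog.LOG_INFO|syslog.LOG_DAEMON, "container-abc123")
    if err != nil {
        panic(err)
    }
    defer sysw.Close()

    // pick whichever writer is configured; "none" is just ioutil.Discard
    writers := map[string]io.Writer{
        "none":   ioutil.Discard,
        "syslog": sysw,
    }
    io.WriteString(writers["syslog"], "hello from the container\n")
}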

@brianm
Contributor

brianm commented Jul 23, 2014

I encourage being able to attach streams to syslog out of the box; while syslog may not be sexy in 2014, it works everywhere.

@crosbymichael
Contributor Author

@brianm would you be interested in working on a syslog driver for this initial push?

@brianm
Contributor

brianm commented Jul 23, 2014

Happy to!

@crosbymichael
Contributor Author

@brianm sounds good to me. I'd like to have a few different drivers so that it keeps the interface honest and makes sure that we are accounting for different needs within the driver.

I'm guessing for things like syslog we will need to pass options when we create the driver. Maybe something like:

driver, err := syslog.NewDriver("/var/lib/docker/logging/syslog", map[string]string{
    "priority": "1",
    "socket": "/somepath",
})

@jamtur01
Contributor

+1 to syslog driver and config.

@crosbymichael
Contributor Author

I just added the syslog driver to the proposal

@kuon
Contributor

kuon commented Jul 24, 2014

I am currently evaluating a gazillion ways of getting my logs to the right place with docker (app forwarding logs directly, agent in the container, agent in another container, agent on the host, syslog, ...), and having docker log directly to syslog would solve it easily. All apps could just use stdout/err.

As for per-container configuration, we should be able to set the sender name and the facility.

The other possibility is to turn off logging in docker and use systemd or supervisord to forward the logs of each container to syslog.

@LK4D4
Contributor

LK4D4 commented Jul 24, 2014

In case someone missed my first comment:

  • Drivers should be configurable per container
  • Drivers should implement io.Writer, so we get the null writer and the syslog writer from the stdlib for free. This is the Go-ish way to do it.

@crosbymichael
Contributor Author

I can see the need for this to be per container, but configuration will be awkward with the daemon running multiple logging drivers.

-1 on the io.Writer: we need to distinguish stdout and stderr in some of the drivers, so we cannot use one interface. The streams are already io.Writers coming in, so we are still good.
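A sketch of how the two views could meet (illustrative only, assuming the Message and Driver types from the proposal live in the same package): wrap the driver in small per-stream writers, so stdout and stderr stay distinguishable while the copy loop still only sees io.Writer.

package logdriver

import "time"

// sourceWriter adapts one stream (stdout or stderr) of one container to io.Writer,
// turning each write into a Message tagged with its source before handing it to the driver.
// It assumes the Message and Driver types from the proposal are defined in this package.
type sourceWriter struct {
    driver      Driver
    containerID string
    source      string // "stdout" or "stderr"
}

func (w *sourceWriter) Write(p []byte) (int, error) {
    msg := &Message{
        ContainerID: w.containerID,
        RawMessage:  append([]byte(nil), p...), // copy: the caller may reuse p
        Source:      w.source,
        Time:        time.Now(),
    }
    if err := w.driver.Log(msg); err != nil {
        return 0, err
    }
    return len(p), nil
}

// Attach returns the writers that a container's stdout and stderr can be copied into.
func Attach(d Driver, containerID string) (stdout, stderr *sourceWriter) {
    return &sourceWriter{d, containerID, "stdout"}, &sourceWriter{d, containerID, "stderr"}
}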

@crosbymichael
Contributor Author

I updated the Driver interface to include CloseLog for signaling to the driver that no more logs will be written for a specific id.

@LK4D4
Contributor

LK4D4 commented Jul 24, 2014

we need to distinguish stdout and stderr

Yup, and I definitely want the possibility of having different drivers for them.

@markcartertm

Should the logging driver support a syslog server option?
This will make it easier to troubleshoot when containers are dynamically assigned to hosts by an orchestration layer.
docker -d --logging syslog

@cpuguy83
Member

@markcartertm Are you questioning the idea of including a syslog driver (which is much discussed above and part of the proposal) or did you not see the discussion?

@solidsnack

An implementation that chunked logs by time could be helpful for async log archiving strategies. For example, if logs were stored by minute. I'm not sure how this would interact with the truncation option.

@kuon
Contributor

kuon commented Jul 28, 2014

A first step that would allow "usual" logging processing system to work is to make docker logrotate compliant. At present there is no way to make docker re-create the log files after a rotation without restarting the containers. kill -HUP on the docker daemon restarts all the containers.

@crosbymichael
Contributor Author

I think the last question here that needs to be answered is, should this be per container or a daemon wide option?

@cpuguy83
Member

@crosbymichael People will want per container with a default set at the daemon level.

@kuon
Contributor

kuon commented Jul 30, 2014

Syslog configuration could be (in order of priority, from low to high):

  • Daemon level default
  • Image default
  • Container config

The configuration at the image level would obviously not include all options (like where to log) but rather what to log (stderr/stdout/both, include timestamp or not or add formatting).

@crosbymichael
Contributor Author

@kuon we cannot do anything at the image level because it makes images lose portability. Things like this should be host-specific/runtime dependencies.

@wking

wking commented Jul 30, 2014

On Wed, Jul 30, 2014 at 10:49:07AM -0700, Michael Crosby wrote:

we cannot do anything at the image level because it makes images lose portability.

Setting defaults at the image level shouldn't compromise portability. We already do this with other image metadata (e.g. via a Dockerfile's CMD, ENTRYPOINT, EXPOSE, …).

@cpuguy83
Member

@wking all these things you listed are things that happen inside the container.
Log handling happens outside the container.

@crosbymichael
Contributor Author

@wking Yes, they do when it's something specific to what type of logging drivers you have installed on a specific docker host. The settings in the image are all portable. This is the reason why VOLUME /home/michael:/root is not allowed in the image: not every docker host will have a folder structure with /home/michael.

@kuon
Contributor

kuon commented Jul 30, 2014

I guess relying on environment variables (like -e LOGLEVEL=INFO and such) is OK for this use case. It was more an idea than a "thought through" proposal.

@randywallace

IMVHO a syslog plugin for docker is completely pointless. Syslog daemons that run on practically every distribution already support everything that has been discussed here. This comment outlines precisely why I feel this way. This is not meant as a flame, but as an alternative discussion that relieves some pressure from the docker devs and puts more responsibility on the Dockerfile maintainer, where in this case I feel (again, IMVHO) it belongs.

I am not saying that the logging doesn't need some work; the piece about handling/rotating the stderr/stdout of the container itself is incredibly useful, b/c for long-running containers, pushing a lot of logs to those pipes results in the issues previously described regarding disk usage. This will at some point need to be solved, though, to cover the bevy of trusted builds that currently send everything to stderr/stdout.

configuring syslog output within the container

I find that the following options work beautifully (these should obviously be expanded):

  • Most apps themselves provide syslog output via configuration. If one doesn't, it should (and probably can be set up, but just isn't documented very well). This is especially true for java apps via slf4j, logback, log4j, etc. Dockerfiles should modify/ADD correct syslog daemon configuration endpoints. My example is for elasticsearch's logging config (this is for @LK4D4), usually found in config/logging.yaml. The conversionPattern could be mangled by a startup script via sed, etc. to throw in the container id, hostname, or whatever you want (instead of elasticsearch:, or perhaps nothing if you are fine with just the hostname showing up in the syslog). Here is the relevant snippet (I didn't include the default console appender and level in this example):
rootLogger: INFO, syslog
appender:
  syslog:
    type: syslog
    header: true
    syslogHost: <THE_HOST_SYSLOG_DAEMON>
    Facility: USER
    layout:
      type: pattern
      conversionPattern: "elasticsearch: [%p] %t/%c{1} - %m"
  • A wrapper startup script that exec's out everything to logger. For non-daemonized processes running in a wrapper, this just magically works and handles stderr/stdout appropriately (written as a boilerplate that can be modified to run in a sourced file easily)
#!/bin/bash

if host syslog > /dev/null 2>&1; then HOST_SYSLOG_DAEMON=$(host syslog | head -n 1 | cut -d' ' -f 4); fi

_enable_syslog=${SYSLOG:-true}
_host_syslog_daemon="${HOST_SYSLOG_DAEMON:-172.17.42.1}" # perhaps loaded by --env/-e; ${VAR:-DEFAULT} notation sets default if ENV variable does not exist
_unique_proc_name="randywallace/test_syslog_image"
_facility='local0' # or local1 thru local7, cron, user, etc...
_syslog_and_stdout_stderr=${TEE_OUTPUT:-false} # true/false; also could be a --env

# docker run -e TEE_OUTPUT=true -e HOST_SYSLOG_DAEMON=1.2.3.4 -d my_image
# or
# docker run --link my-rsyslog-container:syslog -d my_image
# or disabled completely
# docker run -e SYSLOG=false -d my_image

__logger() {
  local LEVEL=${1:-info}
  sed -u -r -e 's/\\n/ /g' -e 's/\s\-{3,}/;/g' -e 's/\-{3,}\s//g' |\
  /usr/bin/logger -p ${_facility}.${LEVEL}  -t "${_unique_proc_name}[$$]" -n "${_host_syslog_daemon}"
}

run_logger() {
  if $_enable_syslog; then
    if $_syslog_and_stdout_stderr; then
      tee -a >(__logger $1)
    else
      __logger $1
    fi
  else
    if [ "$1" = "err" ]; then
      cat >&2
    else
      cat
    fi
  fi
}

# Catch all STDOUT and STDERR traffic
exec > >(run_logger info) 2> >(run_logger err)

log() { echo "INFO: $*" | run_logger ; }
error() { echo "ERROR: $*" | run_logger err ; }
critical() { echo "EMERGENCY: $*" | run_logger emerg; exit 1 ; }
alert() { echo "ALERT: $*" | run_logger alert ; }
notice() { echo "NOTICE: $*" | run_logger notice ; }
debug() { echo "DEBUG: $*" | run_logger debug ; }
warning() { echo "WARN: $*" | run_logger warn ; }

log "info"
error "error"
alert "alert"
notice "notice"
debug "debug"
warning "warning"

echo "STDOUT output"
echo "STDERR output" >&2

critical "critical... exiting"

identifying the host to receive syslog traffic

  • use an ENV setting in the dockerfile (see wrapper example above) to indicate your preferred default syslog host. Or, for public Dockerfiles, use a default config (SYSLOG=false in the wrapper above) that is caught at startup to disable syslog output.
  • use a docker container with a volume on /var/log to the host (perhaps on /var/log/docker/syslog/ at the host) and a syslog daemon (I use rsyslog personally). Then EXPOSE the syslog port (514) and link that container to your other containers and specify that link alias in your wrapper (no need to specify the dynamic IP b/c it shows up in /etc/hosts, an example is given in the wrapper that uses 'syslog' for the link alias).
  • Use the actual host daemon, if there is one (boot2docker does not have a syslog daemon, so I use a container and volume). This defaults to 172.17.42.1 unless docker is configured differently, but I don't ever need to change that so I set this IP statically. It would be nice if the docker0 Bridge Gateway IP was configured in /etc/hosts on the containers so that I could specify that in cases in which I may need to change the bridge subnet or something. It may already be there, food for thought.

Profit

  • The hostname of the container shows up in the syslog in all cases. Why not set this when you run the container to something useful? If you're forwarding logs from syslog to logstash/splunk/etc... The IP of the forwarding syslog server will show up, so you can always identify where container X came from.
  • The syslog daemon does not have to exist on the same host as the container. Why fight with tailing /var/log/syslog on 10 docker hosts if you could do it on one?
  • You can use syslog daemon configs to do whatever you want with the data getting thrown at them.

Conclusion

Logging is not a new problem, and I seriously doubt that docker could create enough plugins, command line options, etc. to satisfy everybody. This is why rsyslog, syslog-ng, syslogd, papertrail, logstash, graylog2, splunk, fluentd, etc. exist. We've already seen this battle start here, and I don't want to be around when the smoke clears. I hope what I've said here, though, may help some of you to come up with your own solutions that could be working today!

And, if you have problems with the container's logs getting too full (those that are generated from stderr/stdout), don't send them there at all and use my example wrapper above to get rid of that problem completely!

@kuon
Contributor

kuon commented Aug 30, 2014

For what it's worth, I am now using systemd to launch containers and forward logs to syslog, combined with logrotate in copytruncate mode. This works fine.

I am not arguing one way or the other, just saying that this setup works today and gives per-container configuration options.

@LK4D4
Contributor

LK4D4 commented Aug 30, 2014

@kuon This is a good way, but docker's internal mechanism of writing logs to stdout is not perfect. If you write long lines or have a high flow of logs, you will get huge memory and CPU overhead just for writing logs to the container's stdout. So having native syslog support will be great anyway.

@cpuguy83
Member

@randywallace The point is to do something with the collected logs that we already have in Docker. Instead of forcing people to implement something on their own, Docker can provide the facility to do it without having to hack stuff around (like running a syslog daemon inside your container).

@iainlowe

iainlowe commented Sep 9, 2014

I would like to echo the sentiment of @brianm and @zepouet and maybe suggest that there are really two discussions here:

The first one is, frankly, up to @crosbymichael and co. and it concerns how Docker handles an issue with the current implementation of logging. I think the guys are trying hard to come up with forward-facing solutions that will provide paths to new features, and I applaud that effort.

The second discussion, however, is being alluded to in previous comments; that is: is it appropriate for Docker to dictate a "logging framework"? No matter how many drivers "we" add, no matter how many options and config files, the tacit assumption becomes "everybody does logging like (or a subset of how) Docker does it".

The UNIX way is to do one thing and do it well. In the case of logging, this means let syslog do the work; for rotating logs we have logrotate, etc.

I think it would be better to have more intelligent ways to handle mounting and cross-mnt namespace access so that solutions described above like mounting /dev/log actually work. In the real world, I can't just drop all my YetAnotherSyslog code because I want to containerize something and I don't want to be a second-class citizen just because of that.

@randywallace provides an example of how easy it is to setup logging already using existing tools. I just think we're not thinking outside the box on how to provide a generally useful solution that also handles this case.

All of this is without mentioning the performance issues in containers at high load if the Docker daemon has to handle each packet. In high traffic situations, this is an absolute non-starter. We need to have access to kernel primitives at this point and a multi-layered userspace logging solution is going to force me to disable it and cobble my own each time.

As I said above, the more immediate decision is important for handling issues with log management. Of course, that should be solved. But I would hate to feel that Docker as an organization was wasting money and time building stuff I already have. The featureset so far is so out-of-the-box (both in terms of innovation and usability) that I really hate to see such a mundane and already-solved concern become the responsibility of the docker daemon.

To head off and forestall any other comments to the effect that "wouldn't it be nice if docker managed your logfiles" let me say that yes, it would. I would also like a built-in webserver so that I can launch my Node.js apps. I just don't feel that beyond the scope of improving the current stdout/stderr system this path leads anywhere but to having a whole group of people disabling docker log management and bending over backwards to use something else.

Of course, this all should be taken in the context of an assumption that the goal is to have advanced container technology that does its thing and otherwise lets you do what you want (i.e. maintaining neutrality). If docker is becoming a bit more of an "app hosting platform" where you can fully customize the inside and all the plumbing is handled for you, then this is definitely the way to go.

In case code speaks louder than words, I'm willing to work on a PR with a sketch of something if I get at least two people who will read it.

@frank-dspeed

I reviewed it all. I think the only thing that needs to happen is that the log files become rotatable; everything else will probably break my existing setups. If I need logs from a process, I can gather them via the docker host at the OS level with existing tools; I don't need docker to handle that.

@discordianfish
Contributor

My 2 cents: It should be easy to get some default logging (aka something without that huge memory and CPU overhead) and possible (without unnecessary overhead) to integrate your own logging as @randywallace suggested, all of that as lean as possible. Therefore I wouldn't try to interpret the log stream and would just implement the bare minimal features for the default logging (truncation, rotation, and maybe some 'tailing' to get only the last x lines).

@kuon
Contributor

kuon commented Sep 15, 2014

@frank-dspeed You can rotate docker logs with logrotate using copytruncate, see #7333
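For reference, a copytruncate setup along these lines is roughly what that approach looks like (a sketch only; the glob assumes the default json-file location under /var/lib/docker/containers, so adjust it to your setup):

/var/lib/docker/containers/*/*-json.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}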

@tve

tve commented Sep 25, 2014

After reading all the comments I still feel that docker needs to do something to simplify logging. Some comments mentioned using -v /dev/log:/dev/log, but that apparently doesn't really work: the link gets broken if syslog is restarted, because it creates a fresh /dev/log, leaving all running containers logging into a dead pipe. One can work around that by moving /dev/log to a directory of its own, such as /tmp/syslog/log, and doing -v /mnt/syslog:/dev (suggested in http://jpetazzo.github.io/2014/08/24/syslog-docker/), but now all containers share /dev.

The suggestions made by @randywallace don't help me at all unless I'm missing something: many apps don't have the capability to log to a remote syslog; they expect a local syslog device.

@mhart

mhart commented Oct 8, 2014

Any news on this?

What @tve and others have said is very important for anyone using -v /dev/log:/dev/log – the link does indeed get broken if syslog restarts:

$ docker run -d -v /dev/log:/dev/log ubuntu sh -c 'while true; do logger hello; sleep 5; done'

$ tail -f /var/log/syslog
2014-10-08T04:04:17.009793+00:00 notice logger: hello
2014-10-08T04:04:22.014052+00:00 notice logger: hello
2014-10-08T04:04:27.018377+00:00 notice logger: hello
^C

$ sudo restart rsyslog

$ tail -f /var/log/syslog
... no new logs from docker container

Which makes this a very brittle solution...

@afolarin

Is there a reason I shouldn't just do the following (as of v1.3) if I want to inspect things container by container, log file by log file?

$ docker exec -it my-container cat /path/to/my.log

@randywallace

@mhart You shouldn't need to mount the log device. If your syslog daemon on the host is listening on the gateway of the docker bridge (172.17.42.1 by default), this should work just fine, even across host syslog daemon restarts:

docker run -d --name logger_test ubuntu:raring /bin/bash -c 'while true; do /usr/bin/logger -n $(grep default < <(ip route) | grep -Eo "([0-9]{1,3}[\.]){3}[0-9]{1,3}") -p user.info -P 514 -i -t logger_test -u /tmp/unused hello; sleep 1; done'

@lennartkoopmann

Graylog2 developer here. Not having much practical experience with Docker yet so I can't go much into Docker configuration specifics but I thought I'd join and leave a few comments based on my experience from building a logging system in the last 4-5 years:

Keep it as simple as possible and hand off the logging to, for example, the local syslog subsystem as soon as possible. Tools like rsyslog or syslog-ng have spent enormous amounts of time letting the user configure things flexibly. Simple tasks like choosing the facility are easy to build, but you can spend a lot of time implementing different TCP syslog framing methods, for example. Do not try to build anything yourself that the popular syslog daemons are doing already. They should be available on basically every platform that Docker runs on.

If you want to ship Docker with log management capabilities like basic search, archiving or live tailing then use a log management system that already exists. Graylog2 for example has REST APIs that can be built upon while all the data management is abstracted. Even implementing something like log rotation yourself can go wrong in many different ways and cause OS compatibility nightmares.

You will also want to avoid a Docker log silo that only contains Docker logs. You need to have all your logs (network hardware, OS, applications) in one place for proper correlation.

Key=Value pairs are a good way to structure data. There is also GELF, which Graylog2, Logstash, fluentd and nxlog speak. Structured syslog as defined in RFC5424 is probably the most compliant approach but could cause issues with maximum message length.
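To make that concrete, the rough shape of a GELF payload carrying the proposal's fields could look like the sketch below (illustrative only, not a full GELF client; the field names follow GELF's underscore-prefixed convention for additional fields, but consult the GELF spec for the authoritative list):

package main

import (
    "encoding/json"
    "fmt"
    "time"
)

func main() {
    // Approximate GELF-style payload built from a container log message.
    gelf := map[string]interface{}{
        "version":       "1.1",
        "host":          "docker-host-01",
        "short_message": "hello from the container",
        "timestamp":     float64(time.Now().UnixNano()) / 1e9, // seconds with fractional part
        "level":         6,                                    // informational
        "_container_id": "abc123",
        "_source":       "stdout",
    }
    b, _ := json.Marshal(gelf)
    fmt.Println(string(b))
}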

Just my suggestions. :)

@maximkulkin
Contributor

Hey guys, any volunteers to try #9513 patch and provide some feedback?

@crosbymichael
Contributor Author

I think @LK4D4 said my current proposal here is a little too complex and he is looking into something much simpler.

@LK4D4
Contributor

LK4D4 commented Jan 30, 2015

@crosbymichael Not very much simpler, but yes :) I'll prepare "proposal with code"

@LK4D4 LK4D4 mentioned this issue Feb 4, 2015
@jessfraz jessfraz added the kind/feature label Feb 26, 2015
@icecrime
Contributor

I think this is closed by #10568.

@varshneyjayant

Can we configure it to send application logs to the rsyslog of the host machine?

@thom4parisot

@psquickitjayant by using --log-driver=syslog :-) cf. https://docs.docker.com/reference/run/#logging-drivers-log-driver
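For example, per container (assuming a Docker version that includes the --log-driver flag referenced in those docs):

docker run --log-driver=syslog nginx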

@varshneyjayant

@oncletom Thanks for the information. As I understand it, we send stdout/stderr logs to the host syslog. Is it possible to send logs from applications running inside the container, like Apache, cron, Nginx, etc.?
