Python decides for certificate validation
Python offers library functions for establishing secure HTTPS connections between a client and server, but few users are aware that those routines suffer from what some would deem a fatal flaw. By default, Python does not check the validity of the SSL/TLS certificate presented by remote servers, which leaves users vulnerable to man-in-the-middle attacks. The project recently decided to correct this shortcoming, although there was considerable disagreement about whether doing so would break existing applications—and, if it does, whether or not such breakage is acceptable in light of the potential security threat.
On August 29, Alex Gaynor posted Python Enhancement Proposal (PEP) 476 to the Python development list. Currently, he explained, the standard Python library does not actually check that the server in an HTTPS connection has an SSL/TLS certificate that is signed by a certificate authority (CA) in any trust root (such as the operating system's certificate database), and it does not check that the Common Name on the certificate matches the server name. The result, of course, is that all programs using the standard library are vulnerable to man-in-the-middle attacks—and, moreover, users are not made aware of this vulnerability. The application developer may reasonably expect that if the SSL/TLS connection is established, then all of the proper steps were taken during the handshake and setup stages. If the user notices the connection at all, he or she, understandably, might also assume it was established securely.
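As a rough illustration of the two checks described above, chain verification and hostname matching correspond to two settings on an ssl.SSLContext object. The sketch below assumes Python 3.4 or later, where ssl.create_default_context() is available; the helper name validation_settings is hypothetical and is not part of PEP 476.

```python
import ssl


def validation_settings(context):
    """Report which of the two checks a context will perform."""
    return {
        # CERT_REQUIRED: the server's chain must lead to a trusted CA.
        "verifies_chain": context.verify_mode == ssl.CERT_REQUIRED,
        # check_hostname: the certificate must name the host contacted.
        "verifies_hostname": context.check_hostname,
    }


# create_default_context() turns both checks on and loads the
# system's default CA certificates.
strict = ssl.create_default_context()
print(validation_settings(strict))
```

A connection that skips either check can be intercepted: without chain verification any certificate is accepted, and without hostname matching a valid certificate issued for some other site is accepted.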
The fix proposed by Gaynor in PEP 476 is straightforward. Python would attempt to verify the certificate presented by the server by querying the system's certificate database. Failure to locate the database would be handled by raising an exception. For situations where an unverified certificate should be trusted (e.g., a self-signed certificate, which would not be signed by a certificate in the system's database), developers (or users) would be able to manually modify their application to accept the certificate.
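One way an application might accept a self-signed or internal certificate without giving up validation is to add that certificate to the set of trusted roots for its own connections. The following is a minimal sketch under that assumption, not the mechanism the PEP prescribes; the function name context_trusting and its parameter are hypothetical.

```python
import ssl


def context_trusting(extra_ca_file=None):
    """Build a verifying context, optionally trusting one extra PEM file.

    The system's CA certificates are loaded first, so the extra file is
    an addition to the trust store rather than a replacement for it.
    """
    context = ssl.create_default_context()
    if extra_ca_file is not None:
        # Raises an exception if the file is missing or not valid PEM.
        context.load_verify_locations(cafile=extra_ca_file)
    return context
```

A developer testing against a self-signed server could then pass the result of context_trusting() with the path to the test certificate to any API that accepts an SSL context, leaving chain and hostname checks enabled.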
Earlier proposals had suggested bundling a certificate database into Python (specifically, Mozilla's). But relying on the system's database removes the necessity for Python maintainers to keep their copy of the database up to date, and it simplifies matters for corporate Python deployments that include internal CAs in their system databases.
Gaynor's original proposal suggested making the fix in both Python 3 and Python 2, although the specifics of that part of the plan later became a far more involved debate.
Defaults
The list participants were overwhelmingly in support of PEP 476, with one small caveat: the original PEP wording suggests that the change should be enacted immediately, changing the default behavior in the next Python release. Marc-Andre Lemburg, among others, pointed out that a transition plan would be wiser than introducing a backward-incompatible change without warning. He suggested adding certificate-validation failures as a warning in 3.5, then making them an exception in 3.6. In addition, he suggested making it possible to turn the validation behavior off with a command-line switch or environment variable, and perhaps making it possible to pass in certificates not in the system's trust store from the command line.
The primary justification for allowing such workarounds is that real-world Python applications tend not to be deployed in pristine network conditions: corporate intranets, embedded devices, and third-party content-delivery networks all routinely sport problematic security settings. Users may not be able to modify the Python code in question, nor to add certificates to the operating system's trust store. As Nick Coghlan observed, corporate intranets may be rife with internal servers running HTTPS because of company-wide edicts that have little connection to the intranet's real security needs.
In addition, Python developers have a real need to test their code with self-signed certificates at times. Providing a workaround allows the developer to test the application in more normal conditions—in essence, not triggering exceptions caused by the network environment. But in such cases, installing a self-signed certificate into the operating system's trust store is rarely the ideal approach.
Few parties in the discussion, however, thought that allowing validation checks to be bypassed by a command-line switch or environment variable was a safe approach. There are just too many ways such a feature could be exploited, not to mention the temptation that would exist for developers to misuse the switch and write code that depends on it.
Christian Heimes suggested one possible solution: making site-wide configuration for certificate validation possible through an "sslcustomize" module akin to the existing sitecustomize. It would allow corporate system administrators to properly configure machines for the vagaries of the intranet, as well as allow individual users to add a personal certificate store to the validation process (without necessarily modifying the platform's built-in certificate database). Heimes's plan seemed reasonable to others, although Coghlan eventually decided that it deserved to be written up as a separate PEP, since many of its potential uses are orthogonal to the specifics of PEP 476.
When to throw the switch
Implementing a flexible method for users to tweak SSL behavior to fit the peculiarities of their network answers many of the questions about how Python should transition to a validate-by-default stance, but it does not address when such a change should be rolled out. Donald Stufft argued in favor of moving the change up from the 3.5 release to the next update for 3.4, noting that "otherwise it’s going to be 2.5+ years until we stop being unsafe by default."
Specifically, Stufft suggested introducing certificate-validation warnings in Python 3.4.2, and changing the default behavior (thus throwing exceptions for validation failures) in 3.5. Not everyone was on board with that time frame; Glyph Lefkowitz argued against the need for a one-cycle warning period, saying that Twisted implemented a similar fix in 14.0, without warning, and had literally received no complaints from its users.
At the crux of the timing question, of course, is determining what Python's responsibility is when users' code stops functioning because of an untrusted SSL/TLS certificate. The responsibility is crystal-clear when the untrusted certificate is actually a man-in-the-middle attack, of course: users need to be alerted to the security threat without delay. But things are more nebulous, again, when the corporate intranet deployment is considered.
The consensus that emerged was that Heimes's sslcustomize solution is viable for Python 3 users, but it soon became apparent that there was a push to backport the fix to Python 2 as well. As Coghlan pointed out, Python 2's status as a maintenance release mandates an even better migration strategy: introducing a break in backward compatibility is a far worse move in a maintenance branch (particularly one with a large installed base).
Lefkowitz again advocated an immediate switch-over, saying "this is not a break in backwards compatibility, it's a bug fix. Yes, systems might break, but that breakage represents an increase in security which may well be operationally important." Antoine Pitrou disagreed with that notion, responding: "saying it doesn't make it magically true. Besides, it can perfectly well be a bug fix *as well as* a break in backwards compatibility. Which is why we sometimes choose to fix bugs only in the feature development branch."
There were, of course, several parties in each camp, which led to a rather lengthy discussion. Ultimately, though, the backport camp managed to convince Guido van Rossum, effectively ending the debate. Van Rossum decided that certificate validation by default should be backported to the next Python 2.7 release.
Heimes pointed out that the next 2.7 release (which would be numbered 2.7.8) was not a viable option, since there would not be any way to support a workaround for self-signed and intranet certificates. A fix for that issue is underway, but it will not land until Python 2.7.9. Van Rossum agreed, but reiterated his decision that the validate-by-default behavior should be ported to the next possible Python 2.7 update—that update will simply be one or two releases later.
As it stands now, then, the 3.4.2 release will issue warnings when an SSL/TLS certificate does not validate. In addition, 3.4.2 will add a simple workaround (described by Coghlan) for code needing to bypass certificate validation: urllib.request.urlopen() will support a new "SSL context" parameter with which developers can opt in to the old behavior on a per-call basis. The exact details are still under discussion, naturally, but such a bypass operation might look like:
urllib.request.urlopen(url, context=ssl._create_unverified_context())
as one example described it. An officially recommended monkeypatching fix has also been discussed, which would be used to tweak application behavior. 3.5 will introduce the more full-featured sslcustomize module. "SSL Context" will also be added to Python 2.7.9 and, assuming that goes well, certificate validation could be switched on by default as early as 2.7.10.
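The per-call bypass and the monkeypatching approach mentioned above can be sketched together as follows. This is an illustration rather than the final API: make_context, fetch, and disable_verification_process_wide are hypothetical names, and the code assumes an interpreter that provides the private helpers ssl._create_unverified_context and ssl._create_default_https_context (the underscore-prefixed names that PEP 476 ended up describing) as well as the context parameter of urlopen().

```python
import ssl
import urllib.request


def make_context(verify=True):
    """Return a verifying context, or the old unverified behavior."""
    if verify:
        return ssl.create_default_context()
    # Private helper named in the article; it disables both the
    # chain check and the hostname check.
    return ssl._create_unverified_context()


def fetch(url, verify=True):
    """Fetch a URL, opting out of validation only for this one call."""
    context = make_context(verify)
    with urllib.request.urlopen(url, context=context) as response:
        return response.read()


def disable_verification_process_wide():
    """Monkeypatch sketch: make unverified contexts the HTTPS default.

    Returns False on interpreters that lack the private helper.
    """
    try:
        unverified = ssl._create_unverified_context
    except AttributeError:
        return False
    ssl._create_default_https_context = unverified
    return True
```

The per-call form keeps the exposure limited to the connections that need it, whereas the process-wide form affects every HTTPS connection made through the standard library's HTTP client and is best treated as a last resort.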
In all likelihood, there will be users (and perhaps developers) whose first encounter with any of this work will be when they have a previously working Python application unexpectedly fail with a scary-looking exception about untrusted security certificates. Some of those exceptions will be fixable with workarounds, albeit frustrating ones. But the point of the whole endeavor is that there will be other exceptions, too: users who have been exposed to bad or even malicious SSL/TLS servers and simply never heard about it in the past.
Python decides for certificate validation
Posted Sep 11, 2014 5:35 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]
Python decides for certificate validation
Posted Sep 11, 2014 16:52 UTC (Thu) by josh (subscriber, #17465) [Link]
Python decides for certificate validation
Posted Sep 11, 2014 14:43 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
Python decides for certificate validation
Posted Sep 11, 2014 17:41 UTC (Thu) by leoluk (guest, #97665) [Link]
http://docs.python-requests.org/en/latest/user/advanced/#...
Nice to hear that they'll implement verification in the standard library.
Python decides for certificate validation
Posted Sep 12, 2014 15:11 UTC (Fri) by DavidS (guest, #84675) [Link]
In the same time I spent checking how I could (not) solve my TLS use case with urllib I had it already implemented and running with requests. Never looked back.
small correction
Posted Sep 11, 2014 21:08 UTC (Thu) by gutworth (guest, #96497) [Link]
Python decides for certificate validation
Posted Sep 18, 2014 6:18 UTC (Thu) by thedevil (guest, #32913) [Link]
I am thinking of smtplib.SMTP_SSL and imaplib.IMAP4_SSL in particular.
Or have we really reached the point where "networking" == "HTTP"? If
so, I quit.
Python decides for certificate validation
Posted Sep 18, 2014 17:47 UTC (Thu) by raven667 (subscriber, #5198) [Link]
Python decides for certificate validation
Posted Sep 18, 2014 23:43 UTC (Thu) by ssokolow (guest, #94568) [Link]
It's bad enough that many of them break cross-domain web fonts by being so hyper-paranoid about HTTP headers that they strip CORS headers.
Python decides for certificate validation
Posted Sep 19, 2014 3:35 UTC (Fri) by mathstuf (subscriber, #69389) [Link]
Python decides for certificate validation
Posted Sep 19, 2014 3:42 UTC (Fri) by thedevil (guest, #32913) [Link]
does mean I have to do it again for every new one. One of these days
I'll follow your example :-P
Python decides for certificate validation
Posted Sep 22, 2014 3:42 UTC (Mon) by ssokolow (guest, #94568) [Link]
Some sites do REALLY stupid things. The worst being Scribd, which prevents downloading documents without creating an account by applying a substitution cipher to the text beyond the Google-visible preview snippet and then using a dynamically generated web font to unscramble the glyph mappings for display.
Python decides for certificate validation
Posted Sep 22, 2014 3:47 UTC (Mon) by mathstuf (subscriber, #69389) [Link]
I've gotten used to sites just barfing text everywhere without their CSS's CDN access, so garbled icons isn't that bad.
Python decides for certificate validation
Posted Sep 22, 2014 10:51 UTC (Mon) by ssokolow (guest, #94568) [Link]
(Maybe it's just that RequestPolicy is a hair too far on the side of hassle for me so I stick to stuff like RefControl and NoScript with some custom ABE rules to block 3rd-party requests on sites I actually visit like Twitter.)
Python decides for certificate validation
Posted Sep 19, 2014 3:36 UTC (Fri) by thedevil (guest, #32913) [Link]
tone for which I apologize; my point was serious. I wasn't asking about
new protocols but about existing code using protocols like IMAP and SMTP
over SSL. That code will break with the new change just like HTTP will,
but I see no mention of a fix equivalent to urlopen(context=...) for
them. So what's the plan, just abandon that code or force it to be
rewritten in Python 3 (where the new sslcustomize module is or will be
available) ?
Python decides for certificate validation
Posted Sep 21, 2014 11:59 UTC (Sun) by intgr (subscriber, #39733) [Link]
I assume those protocols will continue to ignore certificate validation like in previous versions. Why do you think they will break?
Python decides for certificate validation
Posted Sep 23, 2014 17:19 UTC (Tue) by thedevil (guest, #32913) [Link]
default) is in the underlying SSL code, on which all the individual
protocol modules depend. Was this a misunderstanding?
Python decides for certificate validation
Posted Sep 19, 2014 2:33 UTC (Fri) by mathstuf (subscriber, #69389) [Link]
[1]Ignore the Python3 problems though, please.