Fixed #24604 -- Added JSONField to contrib.postgres. #4427
Conversation
JSONField.register_lookup(lookups.DataContains)
JSONField.register_lookup(lookups.ContainedBy)
JSONField.register_lookup(lookups.HasKey)
JSONField.register_lookup(lookups.HasKeys)
There's also another operator related to HasKeys: ?|, which means has at least one of the keys.
I guess that could be inferred from the type of the lookup argument: if it's a list/tuple, then use the ?| operator, else use the ? operator.
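The inference described above can be sketched in a few lines of pure Python (the helper name is illustrative, not part of the patch):

```python
# Illustrative sketch only: choose the Postgres operator based on the type of
# the lookup's right-hand side, as suggested above.
def key_lookup_operator(rhs):
    if isinstance(rhs, (list, tuple)):
        return '?|'  # true if the jsonb value has at least one of the keys
    return '?'       # true if the jsonb value has this single key
```

So `key_lookup_operator('name')` picks `?`, while `key_lookup_operator(['a', 'b'])` picks `?|`.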
True, it also applies to hstore where we don't implement it. We might as well for completeness I guess.
So, override has_key, or add has_any_key?
I think has_any_key or has_some_keys are appropriate. "any" probably makes most sense, as it mimics the any() Python builtin.
One of the things that I have had many feature requests for (and implemented in both JSON fields I've built) is the ability to customise encoding and decoding. I've done this in the most recent one by allowing custom encoder and decoder functions to be passed to the field.

The related thing I have been playing with is sane defaults for these. My gut feeling is that we should use sensible defaults, but I'm not sure it will be possible to do this per field though: http://initd.org/psycopg/docs/extras.html#json-adaptation

It seems that overriding the dumping function might be possible, but the loading function is global.
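For the dumping side, the customisation being discussed boils down to passing a hook into json.dumps. A hedged, stdlib-only sketch (the hook name is made up for illustration):

```python
import datetime
import json

# Illustrative sketch: a custom "default" hook lets a field serialise types the
# stock encoder rejects, such as datetime (the function name is hypothetical).
def encode_extra_types(obj):
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    raise TypeError('%r is not JSON serializable' % obj)

dumped = json.dumps({'created': datetime.datetime(2015, 4, 14, 12, 30)},
                    default=encode_extra_types)
# dumped == '{"created": "2015-04-14T12:30:00"}'
```

The per-field decoding direction is the hard part, because (as noted above) psycopg2 registers the loading function globally against the jsonb type, not per column.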
Aha. As of psycopg2 2.5.4, JSONB is automatically registered (and converted by psycopg2) into dict/list structures. Is this something we want to override, allowing the flexibility of parsing per-field? I don't see that it would be a performance hit, as it will be basically the same code, just running in the context of the Django field rather than the psycopg2 layer. It would also mean that the JSON field could be used (without the lookup functionality) for other database backends.
Hmm. It seems that the loading function is registered globally in psycopg2. I'm not sure if there is any nice way to handle this.
At present, I'd just let psycopg2 do its thing.

We can handle the dumping on a field by field basis quite easily, as we wrap the objects in a psycopg2 json object which can take a dumping function, but it's more difficult to customise the loading on a field by field basis, as this is registered in psycopg2 against the field type, not the column specifically. I'm not familiar enough with the internals of psycopg2 to know how this works.

It may be worth investigating the performance difference. If it's insignificant, we're ok.
I looked a bit more into implementing something along these lines. I think there is a possible solution to override the encoders on a field by field basis, but I am concerned about the performance implications. Django+postgres is almost completely free of "converter functions", which are a known performance hit (due to a second loop through all the rows after psycopg2). That said, loading large json blobs for a large number of rows will have a performance hit no matter how you look at it.

Nonetheless, I feel this whole issue can be delayed for future consideration. We should get a basic functioning version in and then we can add improvements like this later.
I've been thinking about this too (whilst playing with CompositeField stuff, which is only related in the sense that it deals with conversion of data). It's always going to be problematic converting data back into whatever it was, as things like datetime are indistinguishable from a string that happens to be an ISO 8601 string, for instance. Perhaps a better solution is to allow passing in a "converter" as needed, which will be applied after the value is loaded.

I think there's still value in allowing an override of the encoder kwargs, as that can be done on a per-field basis, as it's still done by django.
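A "converter" along those lines could be sketched with the stdlib's object_hook. Note the ambiguity mentioned above: any string matching the format gets revived, whether or not it was originally a datetime (all names here are illustrative):

```python
import datetime
import json

# Illustrative converter: revive ISO 8601-looking strings into datetimes after
# loading. A plain string that merely looks like a date gets revived too, which
# is exactly the ambiguity discussed above.
def revive_datetimes(obj):
    for key, value in obj.items():
        if isinstance(value, str):
            try:
                obj[key] = datetime.datetime.strptime(value, '%Y-%m-%dT%H:%M:%S')
            except ValueError:
                pass
    return obj

loaded = json.loads('{"created": "2015-04-14T12:30:00", "note": "hi"}',
                    object_hook=revive_datetimes)
```

Here `loaded['created']` comes back as a datetime while `loaded['note']` stays a plain string.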
Given the difficulty of decoding, I'd be inclined to suggest that someone who wants to do it can subclass the field and implement it themselves.

If this becomes a common request we could easily add a utility to deal with it; personally, right now I'm not completely sure it's a Good Idea ™️
Yes, that was my other approach. Perhaps a documented example of how to do that, too. |
Docs added now, should be ready for review. I'll squash together after integrating feedback from a detailed review. |
def to_python(self, value):
    try:
        return json.loads(value)
    except ValueError:
If this is loading user-supplied data, it can fairly easily be made to raise errors other than ValueError, which would then propagate upwards uncaught. For example:
>>> s = '['*1000 + ']'*1000
>>> json.loads(s)
...
RuntimeError: maximum recursion depth exceeded while decoding a JSON array from a byte string
There are probably other kinds of sneaky things we can do here, too. There's an argument to be made that this is OK, that a server crash is the right thing to do here, but I thought I'd point it out.
It is handling user supplied data. I'd argue a 500 isn't the worst thing here, as it's receiving weird data designed to break the site (or just weird data!). If there was a general security issue here then anyone using json.loads(request.body) would have a problem. It may be we want to be more careful, but IANASE (I am not a security expert).
One thing we might want to think about: using json.loads() before actually saving the object to the database. Otherwise malicious data could prevent your page from showing up if the item is loaded as part of an evaluated queryset.
Yeah, that seems reasonable. I'm wondering now if json.loads is really safe for user-supplied data -- in theory it ought to be, but see the history of JSON vulnerabilities in Ruby for how "theory" and "practice" differ. This is probably something we should think about further, and maybe test -- this could be an RCE waiting to happen.
Maybe we should include some sort of security note in the docs about being careful about feeding raw JSON directly to a JSON field?
/cc @PaulMcMillan any thoughts here?
If this is a genuine security concern, then we shouldn't ship an actual JSONField at all, but then we make ourselves vulnerable to someone not checking user data at all, and we get into the problem Markus has suggested.

Markus - the form layer would normally do this kind of validation I think, which is what's going on here. The model layer will always receive the decoded JSON object to save to the database. (If it got a string, that could just be a valid "JSON string", i.e. a string literal.)
If this is a genuine security concern
I don't know that it is; it just triggered my spidey-sense. Again in theory this should be OK, since JSON is designed to be safe in this way (unlike, say, YAML). I'm doing some digging right now, but having a look at the Ruby vulns I mentioned, they're usually related to "extra things" that various folks added on top, not the core JSON functionality.
I don't think this should block merge, but it is something I want to investigate further.
Shouldn't we only be attempting json.loads(value) if value is a string object?
From https://docs.djangoproject.com/en/1.8/howto/custom-model-fields/#converting-values-to-python-objects:
As a general rule, to_python() should deal gracefully with any of the following arguments:
- An instance of the correct type (e.g., Hand in our ongoing example).
- A string
- None (if the field allows null=True)
This is the to_python of the form field, not the model field. It would under normal circumstances always receive a string. We do need to handle empty values though; I'll add that.
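Handling empty values in the form field's to_python could look like this stdlib-only sketch (the class name and empty_values attribute are illustrative, not the merged API):

```python
import json

# Illustrative sketch of a form-field to_python that tolerates empty values
# before attempting to decode. Names are hypothetical, not the final API.
class JSONFormFieldSketch:
    empty_values = (None, '')

    def to_python(self, value):
        if value in self.empty_values:
            return None
        try:
            return json.loads(value)
        except ValueError:
            raise ValueError('%r is not valid JSON.' % value)

field = JSONFormFieldSketch()
```

With this, `field.to_python('')` and `field.to_python(None)` return None instead of raising, while non-empty strings are decoded as before.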
Could you add a mention to the 1.9 release notes (contrib.postgres section)? I only had minor edits to the docs: http://dpaste.com/21RWTQ3

When I submit an invalid value in the admin, the value that's displayed on the subsequent form rendering is escaped with backslashes.

I also updated Jenkins to PG 9.4.
What is the status of this PR? Has it been abandoned? #24604 says it needs improvement.
Any plans to allow this field to work with non-Postgres DBs, using a text field when the DB doesn't support JSONB? This would be along the lines of UUIDField, which uses the native type where available.
Needs some tweaks, have been busy, sorry. Will definitely get done by the end of DjangoCon EU. As for non-PG dbs, all you'd get is the ability to store a JSON string and load it again; no querying operations would be fast enough to be practically useful, and they would require extensive SQL scripting for each database. Consequently, you may as well do this yourself with a TextField (and multiple third party solutions do this).
Even though the querying operations would be slow / non-existent, having the ability to quickly run unit tests using SQLite on my local dev box is still helpful. |
@mjtamlyn No worries! Was just discussing an internal implementation of the same with a coworker and came across this PR. |
Merged in 33ea472
This adds a JSONField to django.contrib.postgres using the underlying jsonb data type available on Postgres 9.4+. We do not support the json data type, as it lacks an equality operator and is generally otherwise a bit useless.
The implementation seems to be remarkably simple all things considered, making good use of existing code for hstore and a bit of transform magic.