Atomic: SPINLOCK is not safe in iOS #2619

Closed
NachoSoto opened this issue Dec 15, 2015 · 20 comments · Fixed by #2670

Comments

@NachoSoto
Member

https://twitter.com/steipete/status/676851647042203648

I'll refrain from making any comments (I'll just say... if it's illegal on iOS, why is it a public API???)

Apparently pthread mutexes are faster now. We should create a benchmark to compare, and try to convert Atomic to use them (also RACCompoundDisposable and RACSerialDisposable?).
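
For illustration, a minimal micro-benchmark along those lines might look like the following. This is only a sketch (not the benchmark actually used), and it measures the uncontended case only, which is exactly where these numbers are least interesting:

```swift
import Foundation

// Hypothetical micro-benchmark sketch: time N uncontended lock/unlock pairs.
// Contention and thread priorities, where the real differences show up, are
// not exercised here.
let iterations = 1_000_000

var spinlock = OS_SPINLOCK_INIT          // deprecated on newer SDKs
var mutex = pthread_mutex_t()
pthread_mutex_init(&mutex, nil)

func time(_ label: String, _ body: () -> Void) {
    let start = DispatchTime.now().uptimeNanoseconds
    body()
    let elapsed = DispatchTime.now().uptimeNanoseconds - start
    print("\(label): \(elapsed / UInt64(iterations)) ns per lock/unlock")
}

time("OSSpinLock") {
    for _ in 0..<iterations {
        OSSpinLockLock(&spinlock)
        OSSpinLockUnlock(&spinlock)
    }
}

time("pthread_mutex") {
    for _ in 0..<iterations {
        pthread_mutex_lock(&mutex)
        pthread_mutex_unlock(&mutex)
    }
}

pthread_mutex_destroy(&mutex)
```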

@NachoSoto added the bug label Dec 15, 2015
@kastiglione
Member

I'm not even sure what "illegal" means.

@joshaber
Member

@NachoSoto
Member Author

nope

@liscio
Contributor

liscio commented Dec 21, 2015

I'm just curious, as I wasn't around from the beginning, but why was OSSpinLock used for the Atomic type to begin with?

Was NSLock used first and shown to be too slow, and then pthread_mutex_t tested and found to be too slow as well? Unfortunately the git history doesn't go back before the "move sources…" commit.

Spinlocks are not something I'd think to use in a non-kernel, non-device-driver context, which is why I ask. If anything, I'd say that the bug here is that OSSpinLock is used at all. 😛

@liscio
Contributor

liscio commented Dec 21, 2015

To add a counter-point to my comment above, this isn't as big a deal as the title might make it out to be: https://twitter.com/Catfish_Man/status/676854531615883265

Even under CPU starvation that critical section is so tiny it’s likely ok in real situations.
—@Catfish_Man on Twitter

As for the comment above from @kastiglione:

I'm not even sure what "illegal" means.

I don't have the answer, but I would guess that something in the ARM CPU doesn't allow the spinlock primitive to enjoy many of the same benefits/guarantees/assumptions as it does on Intel CPUs. (But likely not all of them, according to https://twitter.com/Catfish_Man/status/676851988596809728)

Anyway, there's likely "nothing to see here" in practice; instead, some investigation should be done to replace it with a safer primitive and measure for any performance regressions, as @NachoSoto suggests.

@jspahrsummers
Member

See https://twitter.com/jspahrsummers/status/679245419931660288 and follow-up, including #278.

@jspahrsummers
Member

I do think we should replace our uses of OSSpinLock, but I think our/my reasoning for using them was valid given the documentation available to us.

@liscio
Contributor

liscio commented Dec 22, 2015

Yeah, and thanks for the relevant history commits. Not sure why GitHub doesn't do a git log --follow when history is requested for a given file.

At any rate, I will reiterate that "it's probably fine." Using a "now faster" mutex is probably a straightforward task, however it is less so without knowing how you profiled / tested them to begin with. Got any suggestions/pointers there?

As for a way forward if/when performance becomes a concern, lock-free queues are worth investigating. (Here's a pretty good article for readers unfamiliar with the concept: http://www.linuxjournal.com/content/lock-free-multi-producer-multi-consumer-queue-ring-buffer?page=0,0). I built something similar for an internal audio library a while ago, and I may be able to adapt it for general use like this.
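
For readers curious what that might look like in Swift, here is a heavily simplified sketch of a bounded single-producer/single-consumer ring buffer (the linked article covers the harder multi-producer/multi-consumer case). It assumes the swift-atomics package for the atomic head/tail indices and is illustrative only, not production code:

```swift
import Atomics  // swift-atomics package (assumed); not part of the standard library

/// Simplified sketch of a bounded, lock-free single-producer/single-consumer
/// ring buffer. The MPMC variant described in the article additionally needs
/// per-slot sequence numbers and CAS on the indices.
final class SPSCQueue<Element> {
    private let capacity: Int
    private let buffer: UnsafeMutablePointer<Element?>
    private let head = ManagedAtomic<Int>(0)   // next slot to read (consumer-owned)
    private let tail = ManagedAtomic<Int>(0)   // next slot to write (producer-owned)

    init(capacity: Int) {
        self.capacity = capacity
        buffer = .allocate(capacity: capacity)
        buffer.initialize(repeating: nil, count: capacity)
    }

    deinit {
        buffer.deinitialize(count: capacity)
        buffer.deallocate()
    }

    /// Must only be called from the single producer thread.
    func enqueue(_ element: Element) -> Bool {
        let t = tail.load(ordering: .relaxed)
        let h = head.load(ordering: .acquiring)
        guard t - h < capacity else { return false }   // full
        buffer[t % capacity] = element
        tail.store(t + 1, ordering: .releasing)        // publish the slot to the consumer
        return true
    }

    /// Must only be called from the single consumer thread.
    func dequeue() -> Element? {
        let h = head.load(ordering: .relaxed)
        let t = tail.load(ordering: .acquiring)
        guard h < t else { return nil }                // empty
        let element = buffer[h % capacity]
        buffer[h % capacity] = nil
        head.store(h + 1, ordering: .releasing)        // hand the slot back to the producer
        return element
    }
}
```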

@jspahrsummers
Member

Using a "now faster" mutex is probably a straightforward task, however it is less so without knowing how you profiled / tested them to begin with. Got any suggestions/pointers there?

I was running GitHub Desktop in Instruments' Time Profiler and Allocations, and focusing on RAC-heavy stacks. You'll have to talk to @joshaber or @mdiep for that now. 😉

@liscio
Contributor

liscio commented Dec 22, 2015

OK, good to know. I imagine the Swift-side stuff will require a similarly large/deep code base to test with. I have grown some of my own over the last few months¹, but I don't know whether I push RAC "hard enough" to get enough samples out of it. Plenty of other things tend to bubble up in the Instruments charts with apps like mine. 😄

¹ Plug! http://capoapp.com, with http://capoapp.com/neptune being the most recent, totally-designed-using-RAC4 component.

@lilyball

I wrote a blog post today that goes into more detail about why spinlocks are "illegal" on iOS: http://engineering.postmates.com/Spinlocks-Considered-Harmful-On-iOS/

@Adlai-Holler
Contributor

I broke off Atomic a little while ago and migrated it to pthread_mutex_lock. By all accounts it's the best lock there is: no dynamic dispatch (unlike NSLock), it locks instantly when uncontended (like a spinlock), and it doesn't waste energy under contention (unlike a spinlock).

Feel free to copy-paste the source from my repo or do whatever you like with it.

https://github.com/Adlai-Holler/Atomic
https://github.com/Adlai-Holler/Atomic/blob/master/Atomic/Atomic.swift
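
For reference, the core of a pthread_mutex-backed Atomic in Swift is small. A rough sketch (not the exact code from the repo above) might look like this:

```swift
import Foundation

/// Minimal sketch of a mutex-backed Atomic wrapper (hypothetical; see the
/// linked repo for a complete implementation).
/// Note: production code often heap-allocates the pthread_mutex_t to
/// guarantee a stable address.
final class Atomic<Value> {
    private var mutex = pthread_mutex_t()
    private var _value: Value

    init(_ value: Value) {
        pthread_mutex_init(&mutex, nil)
        _value = value
    }

    deinit {
        pthread_mutex_destroy(&mutex)
    }

    /// Atomically reads or replaces the wrapped value.
    var value: Value {
        get {
            pthread_mutex_lock(&mutex)
            defer { pthread_mutex_unlock(&mutex) }
            return _value
        }
        set {
            pthread_mutex_lock(&mutex)
            defer { pthread_mutex_unlock(&mutex) }
            _value = newValue
        }
    }

    /// Atomically mutates the wrapped value in place.
    func modify<Result>(_ body: (inout Value) throws -> Result) rethrows -> Result {
        pthread_mutex_lock(&mutex)
        defer { pthread_mutex_unlock(&mutex) }
        return try body(&_value)
    }
}
```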

@NachoSoto
Member Author

@kballard that's a great explanation, thanks!

I don't have strong opinions on this as I'm not an expert on the matter. I'll defer to others on the decision to change (or not change) what type of lock we use, though I'd be happy to make the change myself!

@lilyball

A benchmark like that is completely useless without source. What's more, you're only showing a single time for each construct, without documenting what that time actually represents. The various constructs will behave differently depending on whether they're under contention, the priorities of the threads contending on the lock, etc. @synchronized is also a curious beast: its performance will depend heavily on the access patterns, such as how many @synchronized blocks are being entered/exited at the same time, and whether you're locking the same object repeatedly or locking new objects, etc.

@jspahrsummers
Member

NB: dispatch_semaphore does not donate priority, which is another potential pitfall.
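
To illustrate the pitfall: a semaphore used as a binary lock records no owning thread, so the kernel cannot boost a low-priority holder while a high-priority thread waits on it. A hypothetical sketch of the pattern in question:

```swift
import Foundation

// DispatchSemaphore used as a binary lock. Unlike pthread_mutex, the semaphore
// has no concept of an owner, so a low-QoS thread holding it cannot be
// priority-boosted while a high-QoS thread waits: a recipe for priority inversion.
let lock = DispatchSemaphore(value: 1)

func withLock<T>(_ body: () throws -> T) rethrows -> T {
    lock.wait()
    defer { lock.signal() }
    return try body()
}

let answer = withLock { 6 * 7 }   // 42
```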

@russbishop

@jspahrsummers @kballard I have an implementation using atomic CAS. If there is any interest, I'm happy to submit a PR.

edit: Looking at the use of Atomic, I don't think my implementation will be useful. If you can do idempotent updates, CAS lets you do a kind of optimistic in-memory transaction, which can be great, but that would require quite a bit of refactoring (it works really well if everything is a CoW immutable value, because then you really can treat mutation as an in-memory transaction).
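
For context, the optimistic pattern being described is roughly the retry loop below. This is a hypothetical sketch assuming the swift-atomics package, shown with an Int for simplicity; the same loop applies when swapping an immutable CoW value held behind a reference:

```swift
import Atomics  // swift-atomics package (assumed)

let counter = ManagedAtomic<Int>(0)

/// Optimistic update: read the current value, compute a new one from it, and
/// retry the compare-and-exchange until no other thread raced in between.
/// The transform must be safe to re-run, i.e. idempotent.
func update(_ transform: (Int) -> Int) -> Int {
    var current = counter.load(ordering: .relaxed)
    while true {
        let desired = transform(current)
        let (exchanged, original) = counter.compareExchange(
            expected: current,
            desired: desired,
            ordering: .acquiringAndReleasing
        )
        if exchanged { return desired }
        current = original   // another thread won the race; retry with the fresh value
    }
}

let newValue = update { $0 + 1 }
```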

@RomanTruba

I found this thread today, investigated the different types of synchronization, and ran a benchmark on the iOS simulator and on real devices.
The results are very interesting to me. In iOS 10 there is a visible performance degradation of dispatch_semaphore, which probably changed its behaviour with respect to thread priority.
Here is the summary diagram of the basic synchronization mechanisms available in iOS. All tests were run with the release configuration (Swift optimization -O).

[Summary chart of the benchmark results (screenshot, 2016-06-29)]

Benchmark code for SDK9
Benchmark code for SDK10

@lilyball

Your numbers look very suspect to me. How many times did you run the tests?

For comparison, I ran my own benchmark (source) on OS X 10.11.5 using a hacky benchmark harness I wrote a while back. I ran the entire benchmark 10 times, dropped the bottom and top 2 numbers from each test (to get rid of outliers), and the rest of the numbers produced the following ranges:

no sync: 22-41ns
spinlock: 24ns
semaphore: 29ns
NSLock: 45-71ns
mutex: 40-64ns
synchronized: 75-122ns
queue: 505-554ns

From this you can see that spinlock is by far the cheapest, followed by semaphore, then mutex, with NSLock just slightly behind mutex (since it's basically mutex + objc_msgSend); synchronized is almost twice as expensive, and queues end up being extremely expensive, much more so than I expected.

@RomanTruba

@kballard As you can see, my benchmark is quite simple, but it uses the default XCTest mechanism for measuring execution time. The numbers are the average run time for each test, not for each synchronization. In general, your results and mine correlate, except for queues, which are not as bad as we used to think.
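
For readers unfamiliar with it, XCTest's built-in measurement mechanism looks roughly like the sketch below (by default, measure runs the block ten times and reports the average):

```swift
import XCTest

final class LockBenchmarkTests: XCTestCase {
    func testMutexLockUnlock() {
        var mutex = pthread_mutex_t()
        pthread_mutex_init(&mutex, nil)
        // `measure` runs the closure 10 times and reports the average time.
        measure {
            for _ in 0..<1_000_000 {
                pthread_mutex_lock(&mutex)
                pthread_mutex_unlock(&mutex)
            }
        }
        pthread_mutex_destroy(&mutex)
    }
}
```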

@kiddhmh

kiddhmh commented Sep 7, 2016

okok
