
Supporting filesystems in persistent memory


By Jonathan Corbet
September 2, 2014
For a few years now, we have been told that upcoming non-volatile memory (NVM) devices are going to change how we use our systems. These devices provide large amounts (possibly terabytes) of memory that is persistent and that can be accessed at RAM speeds. Just what we will do with so much persistent memory is not entirely clear, but it is starting to come into focus. It seems that we'll run ordinary filesystems on it — but those filesystems will have to be tweaked to allow users to get full performance from NVM.

It is easy enough to wrap a block device driver around an NVM device and make it look like any other storage device. Doing so, though, forces all data on that device to be copied to and from the kernel's page cache. Given that the data could be accessed directly, this copying is inefficient at best. Performance-conscious users would rather avoid use of the page cache whenever possible so that they can get full-speed performance out of NVM devices.

The kernel has actually had some support for direct access to non-volatile memory since 2005, when execute-in-place (XIP) support was added to the ext2 filesystem. This code allows files from a directly-addressable device to be mapped into user space, allowing file data to be accessed without going through the page cache. The XIP code has apparently seen little use, though, and has not been improved in some years; it does not work with current filesystems.

Last year, Matthew Wilcox began work on improving the XIP code and integrating it into the ext4 filesystem. Along the way, he found that it was not well suited to the needs of contemporary filesystems; there are a number of unpleasant race conditions in the code as well. So over time, his work shifted from enhancing XIP to replacing it. That work, currently a 21-part patch set, is getting closer to being ready for merging into the mainline, so it is beginning to get a bit more attention.

Those patches replace the XIP code with a new subsystem called DAX (for "direct access," apparently). At the block device level, it replaces the existing direct_access() function in struct block_device_operations with one that looks like this:

    long (*direct_access)(struct block_device *dev, sector_t sector,
			  void **addr, unsigned long *pfn, long size);

This function accepts a sector number and a size value saying how many bytes the caller wishes to access. If the given space is directly addressable, the base (kernel) address should be returned in addr and the appropriate page frame number goes into pfn. The page frame number is meant to be used in page tables when arranging direct user-space access to the memory.
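To make that contract concrete, a driver for a simple, fully mapped NVM device might implement the operation as in the following sketch. This is not code from the patch set: struct nvm_dev and its fields are invented for illustration, and the convention of returning the number of bytes addressable from the given offset is this sketch's reading of the interface.

    /* Hypothetical driver state for a flat, directly addressable device. */
    struct nvm_dev {
        void *virt_base;          /* kernel mapping of the whole device */
        unsigned long base_pfn;   /* page frame number of the first page */
        loff_t size;              /* device size in bytes */
    };

    static long nvm_direct_access(struct block_device *bdev, sector_t sector,
                                  void **addr, unsigned long *pfn, long size)
    {
        struct nvm_dev *dev = bdev->bd_disk->private_data;
        loff_t offset = (loff_t)sector << 9;    /* 512-byte sectors */

        if (offset + size > dev->size)
            return -ERANGE;
        *addr = dev->virt_base + offset;
        *pfn = dev->base_pfn + (offset >> PAGE_SHIFT);
        return dev->size - offset;    /* bytes addressable from here on */
    }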

The use of page frame numbers and addresses may seem a bit strange; most of the kernel deals with memory at this level via struct page. That cannot be done here, though, for one simple reason: non-volatile memory is not ordinary RAM and has no page structures associated with it. Those missing page structures have a number of consequences; perhaps most significant is the fact that NVM cannot be passed to other devices for DMA operations. That rules out, for example, zero-copy network I/O to or from a file stored on an NVM device. Boaz Harrosh is working on a patch set allowing page structures to be used with NVM, but that work is in a relatively early state.

Moving up the stack, quite a bit of effort has gone into pushing NVM support into the virtual filesystem layer so that it can be used by all filesystems. Various generic helpers have been set up for common operations (reading, writing, truncating, memory-mapping, etc.). For the most part, the filesystem need only mark DAX-capable inodes with the new S_DAX flag and call the helper functions in the right places; see the documentation in the patch set for (a little) more information. The patch set finishes by adding the requisite support to ext4.
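In outline, that glue might look like the sketch below. The dax_fault() helper and the S_DAX flag reflect this writer's reading of the patch set; the myfs_* names are invented placeholders for a filesystem's own routines.

    /* Mark a DAX-capable inode when it is set up. */
    static void myfs_set_inode_flags(struct inode *inode)
    {
        if (myfs_mounted_with_dax(inode->i_sb))    /* placeholder test */
            inode->i_flags |= S_DAX;
    }

    /* Faults on mmap()ed DAX files map NVM pages in directly, bypassing
     * the page cache; myfs_get_block() stands in for the filesystem's
     * usual block-mapping callback. */
    static int myfs_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
    {
        return dax_fault(vma, vmf, myfs_get_block);
    }

    static const struct vm_operations_struct myfs_dax_vm_ops = {
        .fault         = myfs_dax_fault,
        .page_mkwrite  = myfs_dax_fault,
    };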

Andrew Morton expressed some skepticism about this work, though. At the top of his list of questions was: why not use a "suitably modified" version of an in-memory filesystem (ramfs or tmpfs, for example) instead? It seems like a reasonable question; those filesystems are already designed for directly-addressable memory and have the necessary optimizations. But RAM-based filesystems are designed for RAM; it turns out that they are not all that well suited to the NVM case.

For the details of why that is, this message from Dave Chinner is well worth reading in its entirety. To a great extent, it comes down to this: the RAM-based filesystems have not been designed to deal with persistence. They start fresh at each boot and need never cope with something left over from a previous run of the system. Data stored in NVM, instead, is expected to persist over reboots, be robust in the face of crashes, not go away when the kernel is upgraded, etc. That adds a whole set of requirements that RAM-based filesystems do not have to satisfy.

So, for example, NVM filesystems need all the tools that traditional filesystems have to recognize filesystems on disk, check them, deal with corruption, etc. They need all of the techniques used by filesystems to ensure that the filesystem image in persistent storage is in a consistent state at all times; metadata operations must be carefully ordered and protected with barriers, for example. Since compatibility with different kernels is important, no in-kernel data structures can be directly stored in the filesystem; they must be translated to and from an on-disk format. Ordinary filesystems do these things; RAM-based filesystems do not.
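To illustrate that last requirement: on-disk metadata lives in fixed-width, fixed-endian fields and is converted to and from the in-memory form explicitly. The structure below is invented for the example.

    /* An on-disk inode (illustrative only): fixed-width little-endian
     * fields that any kernel version or architecture can interpret. */
    struct myfs_disk_inode {
        __le64 size;
        __le32 mode;
        __le32 mtime;
    };

    /* Translate into the live kernel structure at load time; the
     * reverse translation happens before the inode is written back. */
    static void myfs_inode_from_disk(struct inode *inode,
                                     const struct myfs_disk_inode *raw)
    {
        inode->i_size = le64_to_cpu(raw->size);
        inode->i_mode = le32_to_cpu(raw->mode);
        inode->i_mtime.tv_sec = le32_to_cpu(raw->mtime);
    }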

Then, as Dave explained, there is the little issue of scalability:

Further, it's going to need to scale to very large amounts of storage. We're talking about machines with *tens of TB* of NVDIMM capacity in the immediate future and so free space management and concurrency of allocation and freeing of used space is going to be fundamental to the performance of the persistent NVRAM filesystem. So, you end up with block/allocation groups to subdivide the space. Looking a lot like ext4 or XFS at this point.

And now you have to scale to indexing tens of millions of everything. At least tens of millions - hundreds of millions to billions is more likely, because storing tens of terabytes of small files is going to require indexing billions of files. And because there is no performance penalty for doing this, people will use the filesystem as a great big database. So now you have to have a scalable posix compatible directory structures, scalable freespace indexation, dynamic, scalable inode allocation, freeing, etc. Oh, and it also needs to be highly concurrent to handle machines with hundreds of CPU cores.

Dave concluded by pointing out that the kernel already has a couple of "persistent storage implementations" that can handle those needs: the XFS and ext4 filesystems (though he couldn't resist poking at the scalability of ext4). Both of them will work now on a block device based on persistent memory. The biggest thing that is missing is a way to allow users to directly address all of that data without copying it through the page cache; that is what the DAX code is meant to provide.

There are groups working on filesystems designed for NVM from the beginning. But most of that work is in an early stage; none has been posted to the kernel mailing lists, much less proposed for merging. So users wanting to get full performance out of NVM will find little help in that direction for some years yet. It is thus not unreasonable to conclude that there will be some real demand for the ability to use today's filesystems with NVM systems.

The path toward that capability would appear to be DAX. All that is needed is to get the patch set reviewed to the point that the relevant subsystem maintainers are comfortable merging it. That review has been somewhat slow in coming; the patch set is complex and touches a number of different subsystems. Still, the code has changed considerably in response to the reviews that have come in and appears to be getting close to a final state. Perhaps this functionality will find its way into the mainline in a near-future development cycle.




Supporting filesystems in persistent memory

Posted Sep 3, 2014 2:36 UTC (Wed) by flewellyn (subscriber, #5047) [Link]

I was going to ask essentially the same question as Andrew Morton, along the lines of "Why not just treat it as persistent RAM, aka a RAM disk?" But the point about persistence requiring consistency checks and the other features that a full-on filesystem provides is a good one.

Now I just find myself hoping that the DAX feature, somewhere, has something called "Jadzia"...

Supporting filesystems in persistent memory

Posted Sep 3, 2014 23:56 UTC (Wed) by nix (subscriber, #2304) [Link]

Pick the previous host. It's quite plausible that a filesystem could be split into zones of some kind and have some identifier somewhere relating to the current zone...

Supporting filesystems in persistent memory

Posted Sep 3, 2014 8:41 UTC (Wed) by lpremoli (guest, #94065) [Link]

For some time I have been studying PRAMFS and I think that PRAMFS pretty much satisfies most requirements an NVM has (persistency, memory mappable contents, simple structures and small code base).

Does anybody have comments about the fitting of PRAMFS with mmappable NVMs?

Supporting filesystems in persistent memory

Posted Sep 3, 2014 10:11 UTC (Wed) by smurf (subscriber, #17840) [Link]

Looking at PRAMFS's tech page on SourceForge, I seriously doubt its suitability for multi-terabyte file systems. A free-block bitmap, linear directories, and no hard links? No way.

Supporting filesystems in persistent memory

Posted Sep 3, 2014 21:48 UTC (Wed) by ssl (guest, #98177) [Link]

> free block bitmap

By the way, is there a comparison of how modern filesystems keep tabs on free space?

Supporting filesystems in persistent memory

Posted Sep 12, 2014 19:40 UTC (Fri) by vedantk (guest, #88435) [Link]

Btrfs: Btrfs keeps track of the space it allocates in its extent tree [1]. Holes between extents correspond to free space. There's a free space cache but no explicit free space tracking [2].

ZFS: ZFS appends allocation/deallocation metadata to a log, which it replays into an in-memory AVL tree. You can read about ZFS-style space maps in [3].

Those are the only two `modern' (read: non-bitmappy) approaches I know of, but I could be missing others.

[1]: https://btrfs.wiki.kernel.org/index.php/Btrfs_design#Exte...

[2]: https://btrfs.wiki.kernel.org/index.php/Glossary

[3]: https://blogs.oracle.com/bonwick/en/entry/space_maps
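(A toy rendering of the ZFS approach, for the curious: allocations and frees are appended to a log, and the log is replayed at load time. ZFS replays into an AVL tree of free ranges; a bare counter stands in here.)

    #include <stdint.h>
    #include <stdio.h>

    enum sm_op { SM_ALLOC, SM_FREE };

    /* One log record: a range of the device was allocated or freed. */
    struct sm_record {
        uint64_t offset, length;
        enum sm_op op;
    };

    /* Replay the log; a real implementation would build a tree of free
     * ranges rather than a single running total. */
    static uint64_t sm_replay(const struct sm_record *log, size_t n)
    {
        uint64_t allocated = 0;
        for (size_t i = 0; i < n; i++) {
            if (log[i].op == SM_ALLOC)
                allocated += log[i].length;
            else
                allocated -= log[i].length;
        }
        return allocated;
    }

    int main(void)
    {
        struct sm_record log[] = {
            { 0,    4096, SM_ALLOC },
            { 4096, 8192, SM_ALLOC },
            { 0,    4096, SM_FREE  },  /* the first range is freed again */
        };
        printf("allocated: %llu bytes\n",
               (unsigned long long)sm_replay(log, 3));
        return 0;
    }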

Supporting filesystems in persistent memory

Posted Sep 3, 2014 11:33 UTC (Wed) by dgc (subscriber, #6611) [Link]

> Does anybody have comments about the fitting of PRAMFS with mmappable NVMs

I've got a quote for you:

| That's where we started about two years ago with that
| horrible pramfs trainwreck.

Bonus points for working out which village idiot said that and where it fits in the context of the discussion. ;)

-Dave.

Supporting filesystems in persistent memory

Posted Sep 3, 2014 11:50 UTC (Wed) by lpremoli (guest, #94065) [Link]

Thanks Dave. I googled your quote and found the village person you mention and his comments ;)

Supporting filesystems in persistent memory

Posted Sep 9, 2014 11:41 UTC (Tue) by compudj (subscriber, #43335) [Link]

About PRAMFS: we tried using it to map the LTTng tracer ring buffer, so we could recover user-space and kernel traces after crash. Unfortunately, PRAMFS does not support a very useful and obvious use-case: memory mapping a file backed by persistent memory with MAP_SHARED; it only supports MAP_PRIVATE, which actually keeps the written pages in the page cache, which pretty much defeats the purpose of non-volatile memory.

In the case of tracing, we really want to bypass the page cache. Moreover, we don't want the performance hit of going through a system call for each event.

So this DAX patchset, assuming it supports MAP_SHARED memory maps, is really welcome from a tracing POV. Having the ability to associate struct page to those memory regions will be useful for kernel tracing too.

Thanks!

Mathieu
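(A minimal userspace sketch of the mapping Mathieu describes; the mount point and sizes are made up. On a DAX filesystem, the store below would land directly in persistent memory, with no page-cache copy and no system call per event.)

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define BUF_SIZE (1 << 20)

    int main(void)
    {
        int fd = open("/mnt/pmem/trace.buf", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, BUF_SIZE) < 0)
            return 1;

        /* MAP_SHARED is the crucial part: writes must reach the file's
         * backing store, not a private page-cache copy. */
        char *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED)
            return 1;

        memcpy(buf, "event record", 12);   /* a plain store, no syscall */
        munmap(buf, BUF_SIZE);
        close(fd);
        return 0;
    }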

Supporting filesystems in persistent memory

Posted Sep 3, 2014 14:48 UTC (Wed) by markusw (guest, #53024) [Link]

Wait a second. When talking about "persistent memory", are we looking at (block) devices (speaking ATA or SCSI) or something closer to byte-addressable RAM?

In the former case, I'm wondering if the page cache really is inefficient, as I think RAM is likely to keep some latency margin compared to any kind of sufficiently big NVM device. (Why else would you keep the RAM if NVM was available in "tens of TB" and equal latency?) In any case: sure, use a block based filesystem for what basically is a block device.

In the latter case, I don't quite think a block based filesystem is a good fit. Nor a non-persistent one.

(And no, with all its queue management and 64 byte long commands, I don't consider NVMe to be byte-addressable. Not efficiently. It may well "ensur[e] efficient small random I/O operation" [1] as long as you consider 4K a small I/O operation.)

Markus

[1]: http://www.nvmexpress.org/about/nvm-express-overview/

Supporting filesystems in persistent memory

Posted Sep 3, 2014 16:01 UTC (Wed) by busterb (subscriber, #560) [Link]

The article mentions NVDIMMs, which are byte-addressable (well, as byte addressable as a DDR3 can be, if we're being pedantic). Current implementations are usually RAM + FLASH + some sort of backup power, but I assume there's something greater on the horizon.

I think it'd be great to have a form of hibernation that would allow the DRAM refresh to be completely disabled. I've worked on embedded server systems where whether or not to enable the memory controllers when the CPU was not in use was a real bone of contention between the software developers (who wanted to use more memory for the management processor cores) and the hardware guys (who wanted the lowest power consumption when the main processor cores were powered off).

Supporting filesystems in persistent memory

Posted Sep 4, 2014 4:55 UTC (Thu) by dgc (subscriber, #6611) [Link]

persistent memory == cpu addressable memory that is persistent.

That means PCIe based devices such as NVMe storage as well as NVDIMMs that you plug into normal ddr3/ddr4 slots. All "byte addressable" but the actual minimum read/write resolution is that of a CPU cacheline. i.e. any sub-cacheline size modification requires a RMW cycle by the CPU on that cacheline. So really there is no difference between NVMe devices or NVDIMMs apart from protocols and access latency.

However, all this NVRAM is still *page based* due to requiring the physical memory to be virtually mapped. This means the "block sizes" for storage are the page sizes the hardware TLB can support, i.e. 4k or 2M for x86-64. IOWs, to optimise storage layout to minimise page faults we really need to allocate in page-aligned chunks and not share pages between files. Which is exactly what a block based filesystem does ;).

So, really, when you look at persistent memory, it's really just a very fast block device where the block size is determined by the CPU TLB architecture rather than the sector size used by traditional storage.

Like flash based SSDs before NVRAM, the architectural structure is not very different to storage we've been using for 50 years. The speeds and feeds are faster, but the same access and optimisation algorithms apply to the problem. We don't need to completely redesign the wheel to take advantage of the new technology - we only need to do that if we want to eke every last bit of performance out of it.

As for the question of "why do we need RAM if we have tens of TB of NVRAM" I'll just say this: most of the information the kernel keeps for running the machine is volatile. It's not designed to be persistent. If you just want a machine to have persistent memory, the OS needs to be designed from the ground up to be stateless and to *never corrupt itself*. When everything is persistent, rebooting does not fix problems...

-Dave.

Supporting filesystems in persistent memory

Posted Sep 4, 2014 13:23 UTC (Thu) by ballombe (subscriber, #9523) [Link]

> So, really, when you look at persistent memory, it's really just a very fast block device where the block size is determined by the CPU TLB architecture rather than the sector size used by traditional storage.

Does that mean that a particular device will only be readable on some hardware and not others?

Supporting filesystems in persistent memory

Posted Sep 5, 2014 3:00 UTC (Fri) by dgc (subscriber, #6611) [Link]

Yup, nothing new there. We already have to deal with that with filesystems made on a 64k page architecture with a 64k block size. They can't be mounted and used on an architecture with a 4k page size, because the linux kernel does not support block size > page size.

-Dave.

Supporting filesystems in persistent memory

Posted Sep 5, 2014 9:47 UTC (Fri) by etienne (guest, #25256) [Link]

> because the linux kernel does not support block size > page size

FAT filesystems can have a "cluster size" bigger than the page size; Linux probably only rejects "device minimum access block size" > page size.

Supporting filesystems in persistent memory

Posted Feb 11, 2016 14:15 UTC (Thu) by smurf (subscriber, #17840) [Link]

What does one do in that situation?

This is not an idle question; I have inherited an apparently-Linux-based hard disk video recorder (now deceased) which seems to have created an ext3 file system with 8k blocks.

Supporting filesystems in persistent memory

Posted Feb 13, 2016 2:19 UTC (Sat) by dlang (guest, #313) [Link]

mount it on something that has a larger than 4K page size. IIRC this can be done with powerpc, sparc, and I think even AMD64 systems

Supporting filesystems in persistent memory

Posted Sep 4, 2014 14:26 UTC (Thu) by etienne (guest, #25256) [Link]

> That means PCIe based devices such as NVMe storage as well as NVDIMMs that you plug into normal ddr3/ddr4 slots.

I am not sure both types can be treated the same way.
- PCIe devices are hot-plug with sufficiently good hardware.
- PCIe devices are better accessed with DMA, because then the length of the transfer is clearly known by the hardware before the start of the PCIe transaction (one PCIe transaction to transfer 4096 bytes versus a lot of transactions, one per cacheline or one per assembly instruction)
- I am not sure memory caching is easy to enable on PCIe mapped memory, there is plenty of small details there...

Also, are there conditions where a modification of the page cache is not sent to the media, for instance a modified memory-mapped file receiving a revoke system call?

Supporting filesystems in persistent memory

Posted Sep 5, 2014 13:32 UTC (Fri) by marcH (subscriber, #57642) [Link]

> I am not sure memory caching is easy to enable on PCIe mapped memory, there is plenty of small details there...

Like for instance? I naively thought it was MTRR and done.

Supporting filesystems in persistent memory

Posted Sep 8, 2014 10:43 UTC (Mon) by etienne (guest, #25256) [Link]

> > I am not sure memory caching is easy to enable on PCIe mapped memory, there is plenty of small details there...
>
> Like for instance? I naively thought it was MTRR and done.

If your processor supports MTRRs (most do, but maybe do not crash if not), then you can use them - the problem is that their number is limited and their size/alignment constraints are a bit strict.
On newer processors you should use the PAT (http://en.wikipedia.org/wiki/Page_attribute_table).
But the problem I was thinking of is that the PCIe connection to your persistent memory device may include PCIe bridges, and that bridge may have a different "Cache Line Size", its memory base address may not include the "Prefetchable" bit (see PCI specs), and a few other things like that (like error recovery) due to PCI history...
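(For concreteness: the attribute is chosen when the region is mapped, and the ioremap variants below are real kernel interfaces of this era; whether a cacheable mapping is safe depends on exactly the bridge and prefetchable-bit details mentioned above. A sketch only.)

    #include <linux/io.h>
    #include <linux/pci.h>

    /* Map an NVM device's BAR, picking the caching attribute based on
     * whether the BAR is advertised as prefetchable. */
    static void __iomem *map_pmem_bar(struct pci_dev *pdev, int bar)
    {
        resource_size_t start = pci_resource_start(pdev, bar);
        resource_size_t len = pci_resource_len(pdev, bar);

        if (pci_resource_flags(pdev, bar) & IORESOURCE_PREFETCH)
            return ioremap_cache(start, len);   /* cacheable mapping */
        return ioremap_nocache(start, len);     /* uncached, always safe */
    }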

Supporting filesystems in persistent memory

Posted Sep 4, 2014 18:35 UTC (Thu) by markusw (guest, #53024) [Link]

Granted, I omitted the cacheline size. However, the CPU/MMU happily writes individual cachelines to RAM, despite the VM mapping. I'd also argue that the VM page size barely has an influence on the block size of block devices (otherwise, moving from 512B to 4K sectors would have a noticeable impact; let alone the 2M pages).

To me, the "architectural structure" of a NAND based SSD looks very different from magnetic tapes or disks. It's not that we're lucky to not have to modify our designs because the architecture didn't change. It did. A lot. It's our existing designs that force new technology to immitate something we already have, know and designed for.

I certainly agree that it makes sense to use a file-system that was designed with block devices in mind on anything that resembles a block device. It also seems logical that a filesystem not designed for persistency is an utterly bad fit for NVM. And I can imagine getting quick results with a block device based filesystem on NVM. However, if the NVM device can write individual cachelines (and assuming that's not a lot less efficient than 4K block writes, as I would expect from something I connect to the CPU's memory bus), then a filesystem that assumes it needs to write a sector at a time doesn't feel like the right thing to me.

I don't follow your last argument, either. By definition, volatile information doesn't need to persist across a reboot (unlike an FS on NVM, where your argument holds). If NVM would be cheaper than RAM (at the same capacity and latency), you could simply wipe the device (or relevant portions) upon boot to make it behave like RAM.

However, persistence clearly is an additional feature (when compared to RAM). And I bet we continue to have to pay for that. Either with higher latency or higher price.

Markus

Supporting filesystems in persistent memory

Posted Sep 11, 2014 14:45 UTC (Thu) by kevinm (guest, #69913) [Link]

> However, if the NVM device can write individual cachelines (and assuming that's not a lot less efficient than 4K block writes, as I would expect from something I connect to the CPU's memory bus), then a filesystem that assumes it needs to write a sector at a time doesn't feel like the right thing to me.

That's the point of this DAX work, though - when a filesystem supports direct access, the NVM page *is* the page cache page, and at least for file data there's no writing of sectors at all. A write() of 64 bytes will just result in a 64 byte copy_from_user() directly into the NVM.

(The sticking point is that there will still be an in-memory inode structure and an "on-disk" inode structure stored in the NVM, which will probably mean that the entire NVM copy of the inode gets rewritten whenever an attribute like mtime changes).
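(A conceptual sketch of that write path, reusing the direct_access() prototype from the article; the locking, block allocation, and the helper itself are invented or elided.)

    /* Resolve the file offset to an NVM address via the block driver
     * and copy the user's bytes straight in: no page-cache page, no
     * sector-sized write. */
    static ssize_t dax_write_sketch(struct block_device *bdev,
                                    const char __user *ubuf,
                                    size_t len, loff_t pos)
    {
        void *kaddr;
        unsigned long pfn;
        long avail;

        avail = bdev->bd_disk->fops->direct_access(bdev, pos >> 9,
                                                   &kaddr, &pfn, len);
        if (avail < 0)
            return avail;
        if (copy_from_user(kaddr + (pos & 511), ubuf, len))
            return -EFAULT;
        return len;    /* a 64-byte write is just a 64-byte copy */
    }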

Supporting filesystems in persistent memory

Posted Sep 20, 2014 22:13 UTC (Sat) by Lennie (subscriber, #49641) [Link]

"If NVM would be cheaper than RAM .., you could simply wipe the device .. upon boot to make it behave like RAM."

The security community is going to laugh at that if you don't have some kind of policy around not writing private keys to NVM. Which will probably not work. Existing applications don't have any concept of different kinds of memory for holding keys, so you'd need to modify all of them?

The security community isn't even happy with keeping keys in RAM as it is. A simple can of compressed air can already prevent your RAM from being wiped:

http://en.wikipedia.org/wiki/Cold_boot_attack

Supporting filesystems in persistent memory

Posted Sep 22, 2014 20:09 UTC (Mon) by zlynx (guest, #2285) [Link]

> The security community isn't even happy with keeping keys in RAM as it is.

And then they invent things like TPM chips and "secure element" chips. But then they make them so impossible to use that programmers have no choice except to store encryption keys in RAM.

Supporting filesystems in persistent memory

Posted Sep 5, 2014 13:24 UTC (Fri) by marcH (subscriber, #57642) [Link]

> Like flash based SSDs before NVRAM, the architectural structure is not very different to storage we've been using for 50 years.

... except SSDs and flash memory generally speaking are all elaborate smoke and mirrors especially designed to maintain backward compatibility. Very fast smoke and mirrors in the case of SSDs, but still. So, probably not a great example :-)

http://arstechnica.com/information-technology/2012/06/ins...

http://www.bunniestudios.com/blog/?p=3554 (On Hacking MicroSD Cards)

Supporting filesystems in persistent memory

Posted May 19, 2015 14:22 UTC (Tue) by Aissen (subscriber, #59976) [Link]

This is changing. PCIe and NVMe SSDs are already on the market without all the cruft of backward compatibility.

Supporting filesystems in persistent memory

Posted Sep 5, 2014 13:29 UTC (Fri) by marcH (subscriber, #57642) [Link]

> As for the question of "why do we need RAM if we have tens of TB of NVRAM" I'll just say this: most of the information the kernel keeps for running the machine is volatile. It's not designed to be persistent. If you just want a machine to have persistent memory, the OS needs to be designed from the ground up to be stateless and to *never corrupt itself*. When everything is persistent, rebooting does not fix problems...

Well, you can always do less with more. Name a partition of your NVRAM as volatile and ignore its former content when you boot, job done?

Supporting filesystems in persistent memory

Posted Sep 6, 2014 22:04 UTC (Sat) by pedrocr (guest, #57415) [Link]

> the OS needs to be designed from the ground up to be stateless and to *never corrupt itself*. When everything is persistent, rebooting does not fix problems...

You'd just need to clearly mark the parts of the NVRAM that are persistent filesystem structures and the parts that are kernel structures. The filesystem parts are read just as today and the kernel structure parts ignored and created from scratch. That doesn't seem too hard.

Supporting filesystems in persistent memory

Posted Sep 4, 2014 12:59 UTC (Thu) by martin.langhoff (guest, #61417) [Link]

The first use I can think of is speeding up writes to traditional rotational media: an outsized, kernel-controlled writeback cache, or a "data-writeback" journal.

Can these NVM devices be used for that workload easily today?

Supporting filesystems in persistent memory

Posted Sep 5, 2014 14:32 UTC (Fri) by pkern (subscriber, #32883) [Link]

XIP was designed for readonly blobs of memory on mainframes shared by multiple VMs. It needed a persistable structure, so ext2 was picked. I'm not sure why you'd try to port it to ext4 instead. You do not need the journaling of ext4 and extents are probably also not all that useful if you are mapping in page sized hunks anyway (huge pages aside).

Supporting filesystems in persistent memory

Posted Sep 6, 2014 17:20 UTC (Sat) by willy (subscriber, #9762) [Link]

Yes, the ext2-XIP design was for that purpose, but the infrastructure that it put in looked suitable as a basis for supporting persistent memory with a modern filesystem like ext4/XFS. It turned out that the ->direct_access() block device API was close, needing minor improvements, but the ->get_xip_mem() filesystem API was all wrong and needed to be replaced. We ought not have support for two direct access APIs in Linux, so my patchset migrates ext2 from XIP to DAX as the appropriate pieces of DAX are added.

One of the ext* maintainers/developers told me that having XIP support in ext2 and not in ext4 was one of the few remaining reasons not to remove ext2 from the Linux codebase, so we may see ext2-XIP deprecated in favour of ext4-DAX soon. That wasn't particularly my intent, but if it happens to help others, then that's just gravy.

Speaking of huge pages, I'm currently working on support for those. I haven't finished debugging them yet, but I've definitely seen them inserted into the user's address space and removed again.

Supporting filesystems in persistent memory

Posted Sep 8, 2014 1:18 UTC (Mon) by mithro (subscriber, #50469) [Link]

Anyone know where you can actually buy an NVM device as a home consumer? It seems like a really cool thing to experiment with, even if it doesn't deliver on the 100TB of fast storage for less than DDR cost.

Supporting filesystems in persistent memory

Posted Sep 10, 2014 14:28 UTC (Wed) by reubenhwk (guest, #75803) [Link]

I suspect this is going to be more vaporware. Even if somebody does produce a storage device which is as fast as RAM, somebody else will build on that technology by making it much faster and more volatile.

Volatile storage has always been considerably faster, smaller in capacity, and far more expensive, and we've seen how a tiny bit of cache (compared to the size of RAM) can massively improve performance. A tiny bit of RAM (compared to HDD storage space) can greatly improve the performance of disk access (using the page cache). I see no reason why that will change now. If we suddenly have 100 TB of NVM which costs maybe $100, the MOBO will undoubtedly have a slot for 8-16 TB of even faster volatile memory which is going to do exactly what RAM does today and will also cost about $100. In other words, nothing will change, but everything will be faster and larger.

Supporting filesystems in persistent memory

Posted Sep 11, 2014 9:01 UTC (Thu) by dakas (guest, #88146) [Link]

> Volatile storage has always been considerably faster, smaller in capacity and far more expensive [...]

"Always" is such a strong word. There were literally decades where core memory was persistent because, well, it consisted of magnetic cores.

HP at the current point in time appears to be betting a significant part of its assets on memristor technology. It does not look feasible to make it "much faster and more volatile": they want to move the CPUs to that technology as well, so there will be no issue of "better fit/worse fit" as there is with static RAM vs dynamic RAM vs block devices.

I'd like to see them pull this off and make computing more exciting again than the "more of the same, just more so" developments of the last 30+ years.

Supporting filesystems in persistent memory

Posted Feb 12, 2016 9:24 UTC (Fri) by roblucid (guest, #48964) [Link]

And as one of the links in the article explains, core memory was not only slower than RAM, but its persistence was only 80% reliable; frequently a colder-style boot was required.

Fast memory has needed to be close to the core - modern L1 cache, for example; technologies which put RAM inside the CPU package (EDRAM/HBM2) are intended for high bandwidth whilst providing low latency. A device on the end of some bus, that's plugged in, can NEVER compete with the short physical connections of on-chip RAM, so there's very good reason to believe the miracle people want to believe in will prove to have engineering tradeoffs (for example the EROS persistent RAM store was disk-backed, so its log-write throughput was FAR lower than RAM, and its performance relied on cache and the hot checkpoint area of disk reducing seeks for "85%" of accesses).

Supporting filesystems in persistent memory

Posted Sep 11, 2014 15:02 UTC (Thu) by rrcsraghu (guest, #87071) [Link]

"RAM-based filesystems are designed for RAM; it turns out that they are not all that well suited to the NVM case"

In this context, I'd like to point out some of the work that folks at Lawrence Livermore National Laboratory, and we at The Ohio State University, have done in designing a highly scalable in-memory file system built to handle persistent memory devices.

http://go.osu.edu/cruise-paper

Some of you might be familiar with the IBM BlueGene/Q supercomputer's persistent memory capability. We used that as our playground when prototyping CRUISE. While the file system was designed for checkpoint-restart workloads, the design allows extension for typical file system workloads as well.

Our prototype is up on github for those who want to look around:
https://github.com/hpc/cruise

Supporting filesystems in persistent memory

Posted Sep 12, 2014 3:37 UTC (Fri) by kevinm (guest, #69913) [Link]

Thinking a bit about NVM, it seems like NVM that is directly addressable by the entire kernel vastly increases the probability that a write through a wild pointer by $RANDOM_DODGY_DRIVER will corrupt data on your "disk".

Supporting filesystems in persistent memory

Posted Sep 14, 2014 21:40 UTC (Sun) by zlynx (guest, #2285) [Link]

It's possible. But it's also possible for the kernel to write to a random position on your SATA drive. Granted, this isn't as likely to happen by accident. Although it could drop random bytes into the file cache, which would result in corrupted data...

Also, I was under the impression that the kernel switches memory maps in kernel mode, locking itself into a 1 or 2 gigabyte range. Writes outside that range that don't go through copy_to_user result in a BUG. Although maybe that was only done in 32-bit 4G/4G mode...Hmm.

If it doesn't do that currently, it certainly could do it in the future.


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds