Wild this hasn't been caught until now.
Because the exact ordering of the comparison in lfs_bd_cmp is a bit
ambiguous, lfs_dir_find_match returned the wrong result when one
filename was a prefix of the other, i.e. when the names differed only
in length.
For example:
- cmp("a", "aa") should be LFS_CMP_LT
- cmp("aaa", "aa") should be LFS_CMP_GT
We're quite lucky that none of the littlefs internals currently depend
on the sorted order, otherwise we'd probably be stuck with this weird
ordering for backwards compatibility reasons...
Fixed, and added some test cases over directory ordering to prevent
regression in the future.
Found by andriyndev
Superblock expansion is an irreversible operation. In an effort to
prevent superblock expansion from claiming valuable scratch space
(important for small, <~8 block filesystems), littlefs prevents
superblock expansion when the disk is "mostly full".
In true computer-scientist fashion, this "mostly full" threshold was
set to ~50%.
As pointed out by gbolgradov and rojer, >~50% utilization is not
uncommon, and it can lead to a situation where superblock expansion does
not occur in a relatively healthy filesystem, causing focused wear at
the root.
To remedy this, the threshold is now increased to ~88% (7/8) full.
This may change in the future and should probably eventually be made
user-configurable.
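As a rough sketch of the guard (hypothetical names, not littlefs's
actual internals):

    // allow superblock expansion only while usage is at most ~7/8 of
    // the disk; previously the cutoff was roughly block_count/2
    bool can_expand = in_use < block_count - block_count/8;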
Found by gbolgradov and rojer
The idea is that in the future this function may be extended to support
other block janitorial work. In such a case, calling this lfs_fs_gc
provides a more general name that can grow to include other operations.
This is currently just wishful thinking, however.
The initial implementation for this was provided by kaetemi, originally
as a mount flag. However, it has been modified here to be self-contained
in an explicit runtime function that can be called after mount.
The reasons for an explicit function:
1. lfs_mount stays a strictly readonly operation, and avoids pulling in
all of the write machinery.
2. filesystem-wide operations such as lfs_fs_grow can be a bit risky,
and irreversible. The action of growing the filesystem should be very
intentional.
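As a sketch of the intended call pattern (cfg and new_block_count here
are stand-ins for your own configuration):

    lfs_t lfs;
    int err = lfs_mount(&lfs, &cfg);  // mount stays strictly read-only
    if (!err) {
        // deliberate, irreversible: grow to the new number of blocks
        err = lfs_fs_grow(&lfs, new_block_count);
    }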
---
One concern with this change is that this will be the first function
that changes metadata in the superblock. This might break tools that
expect the first valid superblock entry to contain the most recent
metadata, since only the last superblock in the superblock chain will
contain the updated metadata.
The code-cost wasn't that bad: 16556 B -> 16754 B (+1.2%)
But moving write support of older versions behind a compile-time flag
allows us to be a bit more liberal with what gets added to support older
versions, since the cost won't hit most users.
The intention is to help interop with older minor versions of littlefs.
Unfortunately, since lfs2.0 drivers cannot mount lfs2.1 images, there
are situations where it would be useful to write strictly
lfs2.0-compatible images. The solution here adds a "disk_version"
configuration option which determines the behavior of lfs2.1-dependent
features.
Normally you would expect this to only change write behavior. But since the
main change in lfs2.1 increased validation of erased data, we also need to
skip this extra validation (fcrc) or see terrible slowdowns when writing.
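A sketch of the configuration (other fields elided):

    struct lfs_config cfg = {
        // ... block device functions and geometry as usual ...
        .disk_version = 0x00020000,  // write strictly lfs2.0-compatible
                                     // images; 0 means the current version
    };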
In terms of ease-of-use, a user familiar with other filesystems expects
block_usage in fsinfo. But in terms of practicality, block_usage can be
expensive to find in littlefs, so if it's not needed in the resulting
fsinfo, that operation is wasteful.
It's not clear to me what the best course of action is, but since
block_usage can always be added to fsinfo later, but not removed without
breaking backwards compatibility, I'm leaving this out for now.
Block usage can still be found by explicitly calling lfs_fs_size.
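For example (error handling abbreviated):

    lfs_ssize_t in_use = lfs_fs_size(&lfs);  // allocated blocks, or a
    if (in_use < 0) {                        // negative error code
        // handle error
    }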
Versions are now returned with major/minor packed into 32 bits,
so 0x00020001 is the current disk version, for example.
1. This needed to change to use a disk_* prefix for consistency with the
defines that already exist for LFS_VERSION/LFS_DISK_VERSION.
2. Encoding the version this way has the nice side effect of making 0 an
invalid value. This is useful for adding a similar config option
that needs to have reasonable default behavior for backwards
compatibility.
In theory this uses more space, but in practice most other config/status
is 32 bits in littlefs; we would be wasting this space for alignment
anyway.
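Unpacking is just shifts and masks:

    uint32_t version = 0x00020001;        // disk version 2.1
    uint16_t major   = version >> 16;     // -> 2
    uint16_t minor   = version & 0xffff;  // -> 1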
Currently this includes:
- minor_version - on-disk minor version
- block_usage - estimated number of in-use blocks
- name_max - configurable name limit
- file_max - configurable file limit
- attr_max - configurable attr limit
These are currently the only configuration options that need to be
written to disk. Other configuration is either needed to mount, such as
block_size, or does not change the on-disk representation, such as
read/prog_size.
This also includes the current block usage, which is common in other
filesystems, though more expensive to find in littlefs. I figure it's
not unreasonable to make lfs_fs_stat no worse than block allocation;
hopefully this isn't a mistake. It may be worth caching the current
usage after the most recent lookahead scan.
More configuration may be added to this struct in the future.
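A usage sketch; the field names here follow this commit's description
and may differ in later releases:

    struct lfs_fsinfo fsinfo;
    int err = lfs_fs_stat(&lfs, &fsinfo);
    if (!err) {
        // e.g. fsinfo.minor_version, fsinfo.block_usage,
        // fsinfo.name_max, fsinfo.file_max, fsinfo.attr_max
    }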
This fixes the overflowing left shift on 8 bit platforms.
littlefs2/lfs.c: In function ‘lfs_dir_commitcrc’:
littlefs2/lfs.c:1654:51: error: left shift count >= width of type [-Werror=shift-count-overflow]
commit->ptag = ntag ^ ((0x80 & ~eperturb) << 24);
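One way to fix this is to promote the operand to a 32-bit type before
shifting, so the shift is well-defined even where int is 16 bits,
something like:

    commit->ptag = ntag ^ ((uint32_t)(0x80 & ~eperturb) << 24);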
lfs_fs_mkconsistent allows running the internal consistency operations
(desuperblock/deorphan/demove) on demand and without any other
filesystem changes.
This can be useful for front-loading and persisting consistency operations
when you don't want to pay for this cost on the first write to the
filesystem.
Conveniently, this also offers a way to force the on-disk minor version
to bump, if that is wanted behavior.
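Usage is a single call after mount:

    int err = lfs_fs_mkconsistent(&lfs);  // desuperblock/deorphan/demove
                                          // now, not on the first write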
Idea from kasper0
The underlying issue is that lfs_fs_deorphan did not update gstate
correctly. The way it determined if there were any orphans remaining in
the filesystem was by subtracting the number of found orphans from an
internal counter.
This internal counter is a leftover from a previous implementation that
allowed leaving the lfs_fs_deorphan loop early if we knew the number of
expected orphans. This can happen during recursive mdir relocations, but
with only a single bit in the gstate, can't happen during mount. If we
detect orphans during mount, we set this internal counter to 1, assuming
we will find at least one orphan.
But this presents a problem: what if we find _no_ orphans? If this
happens we never decrement the internal counter of orphans, so we never
clear the bit in the gstate. This leads to running lfs_fs_deorphan on
more-or-less every mutable operation in the filesystem, resulting in an
extreme performance hit.
The solution here is to not subtract the number of found orphans, but to
assume that when our lfs_fs_deorphan loop finishes, we will have no
orphans, because that's the whole point of lfs_fs_deorphan.
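A compressed sketch of the change (the orphans counter here is a
simplification; in littlefs it's packed into the gstate tag):

    static int lfs_fs_deorphan(lfs_t *lfs) {
        // ... scan the metadata linked-list, fixing any orphans found ...

        // before: lfs->orphans -= fixed;  // stuck if fixed == 0
        // after: a finished scan means no orphans remain, full stop
        lfs->orphans = 0;
        return 0;
    }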
Note that the early termination of lfs_fs_deorphan was dropped because
it would not actually change the runtime complexity of lfs_fs_deorphan,
added code cost, and risked fragile corner cases such as this one.
---
Also added tests to assert we run lfs_fs_deorphan at most once.
Found by kasper0 and Ldd309
This just means a rewrite of the superblock entry with the new minor
version.
Though it's interesting to note that we don't need to rewrite the
superblock entry until the first write operation to the filesystem, an
optimization already in use for the fixing of orphans and in-flight
moves.
To keep track of any outdated minor version found during lfs_mount, we
can carve out a bit from the reserved bits in our gstate. These are
currently used for a counter tracking the number of orphans in the
filesystem, but this is usually a very small number so this hopefully
won't be an issue.
In-device gstate tag:

    [--       32      --]
    [1|- 11 -| 10 |1| 9 ]
     ^   ^     ^   ^  ^-- 9-bit orphan count
     |   |     |   '----- 1-bit needs superblock
     |   |     '--------- 10-bit move id
     |   '--------------- 11-bit move type
     '------------------- 1-bit has orphans
This was just an oversight. Seeking to the end of the directory should
not error, but instead restore the directory to the state where the next
read returns 0.
Note this matches the behavior of lfs_file_tell/lfs_file_seek.
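A seek/tell round-trip at the end of a directory now behaves like the
file equivalent:

    struct lfs_info info;
    lfs_soff_t off = lfs_dir_tell(&lfs, &dir);  // may be end-of-directory
    int err = lfs_dir_seek(&lfs, &dir, off);    // no longer errors here
    int res = lfs_dir_read(&lfs, &dir, &info);  // 0 here means end-of-dir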
Found by sosthene-nitrokey
There was already an assert for this, but because it included the
underlying equation for the requirement, it was too confusing for
users who had no prior knowledge of why the assert could trigger.
The math works out such that 128 bytes is a reasonable minimum
requirement, so I've added that number as an explicit assert.
Hopefully this makes this sort of situation easier to debug.
Note that this requirement would need to be increased to 512 bytes if
block addresses are ever increased to 64-bits. DESIGN.md goes into more
detail why this is.
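Roughly, the new assert amounts to the following (the exact expression
in lfs.c may differ):

    // 128 bytes is the minimum block_size while block addresses are
    // 32 bits; see DESIGN.md for the underlying math
    LFS_ASSERT(cfg->block_size >= 128);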
This has caught enough people that an explicit assert is warranted.
How littlefs, a c99 project, should be integrated with c89 projects
is still an open question, but no one deserves to debug this sort of
undetected casting issue.
Found by johnernberg and XinStellaris
Before, once converted to a CTZ skip-list, a file would remain a CTZ
skip-list even if truncated back to a size that could be inlined.
This was just a shortcut in implementation. And since the fix for boundary
truncates needed special handling for size==0, it made sense to extend
this special condition to allow reverting to inline files.
---
The only case I can think of where reverting to an inline file would be
detrimental is a readonly file that you would otherwise not need to pay
the metadata overhead for. But as a tradeoff, inlining the file would
free up the block it was on, so it's unclear if this really is a net
loss.
If the truncate is followed by a write, reverting to an inline file will
always be beneficial. We assume writes will change the data, so in the
non-inlined case there's no way to avoid copying the underlying block,
even if we assume padding issues are solved.
There has been a bug in the filesystem for a while where truncating to a
block boundary suffers from an off-by-one mistake that corrupts the
internal representation of the CTZ skip-list.
This mostly appears when the file_size == block_size, as file_size >
block_size includes CTZ skip-list metadata, so the underlying block
boundaries appear at slightly different offsets.
---
The reason for the off-by-one issue is a nuance in lfs_ctz_find that we
sort of abuse to get two different behaviors.
Consider the situation where this bug occurs:
      block 0     block 1
    .--------.  .--------.
    | abcdef |<-| {ptr0} |
    | ghijkl |  | yzabcd |
    | mnopqr |  |        |
    | stuvwx |  |        |
    '--------'  '--------'
With these 24-byte blocks, there's an ambiguity if we want to point to
offset 24. We could point before the block boundary, or we could point
after the block boundary:
Before:

      block 0     block 1
    .--------.  .--------.
    | abcdef |<-| {ptr0} |
    | ghijkl |  | yzabcd |
    | mnopqr |  |        |
    | stuvwx |  |        |
    '-------^'  '--------'
            '-- off=24 is here
After:

      block 0     block 1
    .--------.  .--------.
    | abcdef |<-| {ptr0} |
    | ghijkl |  | yzabcd |
    | mnopqr |  | ^      |
    | stuvwx |  | |      |
    '--------'  '-|------'
                  '-- off=24 is here
Which of these two offsets we want depends on the context. We want the
offset to be conservative if it represents a size, but eager if it is
being used to prepare a block for writing.
The workaround/hack is to prefer the eager offset, after the block boundary,
but use `size-1` as the argument if we need the conservative offset.
This finds the correct block, but is off-by-one in the calculated
block-offset. Fortunately we happen to not use the block-offset in the
places we need this workaround/hack.
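A sketch of the two modes, with a simplified signature (the real
lfs_ctz_find is internal and also takes cache arguments):

    lfs_block_t block;
    lfs_off_t   boff;

    // eager: find the block where writing at offset pos would continue
    lfs_ctz_find(lfs, head, size, pos, &block, &boff);

    // conservative: find the block holding the last byte of a size-byte
    // file; note boff comes back off-by-one here and must go unused
    lfs_ctz_find(lfs, head, size, size - 1, &block, &boff);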
---
To get back to the bug, the wrong mode of lfs_ctz_find was used in
lfs_file_truncate, leading to internal corruption of the CTZ skip-list.
The correct argument is size-1, with care to avoid underflow.
Also I've tweaked the code to make it clear the calculated block-offset
goes unused in these situations.
Thanks to ghost, ajaybhargav, and others for reporting the issue,
colin-foster-advantage for a reproducible test case, and rvanschoren
and hgspbs for the initial solution.
This of course should never happen normally, as two half-orphans require
two parents, which is disallowed in littlefs for this reason. But it can
happen if there is an outdated half-orphan later in the metadata
linked-list. The two half-orphans can cause the deorphan step to get
stuck, constantly "fixing" the first half-orphan before it has a chance
to remove the problematic, outdated half-orphan later in the list.
The solution here is to do a full check for half-orphans before
restarting the half-orphan loop. This strategy has the potential to
visit more metadata blocks unnecessarily, but avoids getting stuck when
removing a later half-orphan is what eventually allows an earlier
half-orphan to resolve itself.
Found by heuristic powerloss testing with test_relocations_reentrant_renames
after 192 nested powerlosses.
Initially I thought the fcrc would be sufficient for all of the
end-of-commit context, since indicating that there is a new commit is as
simple as invalidating the fcrc. But it turns out there are cases that
make this impossible.
The surprising, and actually common, case is that of an fcrc that
ends up containing a full commit. This is common as soon as the
prog_size is big, as small commits are padded to at least the
prog_size.
    .------------------. \
    |     metadata     | |
    |                  | |
    |                  | +-.
    |------------------| | |
    |   forward CRC ------------.
    |------------------| / |    |
    |    commit CRC -------'    |
    |------------------|        |
    |      padding     |        |
    |                  |        |
    |------------------| \    \ |
    |     metadata     | |    | |
    |                  | +-.  | |
    |                  | | |  +-'
    |------------------| / |  |
    |    commit CRC -------'  |
    |------------------|      |
    |                  |      /
    '------------------'
When the commit + crc is all contained in the fcrc, something silly
happens with the math behind crcs. Everything in the commit gets
canceled out:
    crc(m) = m(x) x^|P|-1 mod P(x)

    m ++ crc(m) = m(x) x^|P|-1 + (m(x) x^|P|-1 mod P(x))

    crc(m ++ crc(m)) = (m(x) x^|P|-1 + (m(x) x^|P|-1 mod P(x))) x^|P|-1 mod P(x)
                     = (m(x) x^|P|-1 + m(x) x^|P|-1) x^|P|-1 mod P(x)
                     = 0 x^|P|-1 mod P(x)
                     = 0
This is the reason the crc of a message + naive crc is zero. Even with an
initializer/bit-fiddling, the crc of the whole commit ends up as some
constant.
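This is easy to demonstrate standalone with the common reflected CRC-32
(littlefs's lfs_crc differs in initializer/bit-order details, but the
same cancellation applies):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    // common reflected crc32 (poly 0xedb88320, init/xorout 0xffffffff)
    static uint32_t crc32(uint32_t crc, const void *buf, size_t len) {
        const uint8_t *p = buf;
        crc = ~crc;
        for (size_t i = 0; i < len; i++) {
            crc ^= p[i];
            for (int j = 0; j < 8; j++) {
                crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0);
            }
        }
        return ~crc;
    }

    int main(void) {
        uint8_t msg[16] = "hello commit";  // any 12-byte message works
        uint32_t crc = crc32(0, msg, 12);
        // append crc(m) little-endian, as a reflected crc is stored
        msg[12] = (uint8_t)(crc >>  0);
        msg[13] = (uint8_t)(crc >>  8);
        msg[14] = (uint8_t)(crc >> 16);
        msg[15] = (uint8_t)(crc >> 24);
        // prints 0x2144df1c, this variant's residue, for *any* message
        printf("crc(m ++ crc(m)) = 0x%08x\n",
                (unsigned)crc32(0, msg, 16));
        return 0;
    }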
So no manipulation of the commit can change the fcrc...
But even if this did work, or we changed this scheme to use two
different checksums, it would still require calculating the fcrc of
the whole commit to know if we need to tweak the first bit to invalidate
the unlikely-but-problematic case where we happen to match the fcrc. This
would add a large amount of complexity to the commit code.
It's much simpler and cheaper to keep the 1-bit counter in the tag, even
if it adds another moving part to the system.
This fixes most of the remaining bugs (except one with multiple padding
commits + noop erases in test_badblocks), with some other code tweaks.
The biggest change was dropping reliance on end-of-block commits to know
when to stop parsing commits. We can just continue to parse tags and
rely on the crc to catch bad commits, avoiding a backwards-compatibility
hiccup. So no new commit tag.
Also renamed nprogcrc -> fcrc and commitcrc -> ccrc and made naming in
the code a bit more consistent.
Previously, forward-looking CRCs were just two new CRC types, one for
commits with forward-looking CRCs, one without. These both contained the
CRC needed to complete the current commit (note that the commit CRC
must come last!):

             [--    32    --|--    32    --|--    32    --|--    32    --]
    with:    [   crc3 tag   |  nprog size  |  nprog crc   |  commit crc  ]
    without: [   crc2 tag   |  commit crc  ]
This meant there had to be several checks for the two possible structure
sizes, messing up the implementation. The new tag-based layout:

             [--    32    --|--    32    --|--    32    --|--    32    --|--    32    --]
    with:    [ nprogcrc tag |  nprog size  |  nprog crc   |  commit tag  |  commit crc  ]
    without: [  commit tag  |  commit crc  ]
But we already have a mechanism for storing optional metadata! The
different metadata tags! So why not use a separate tag for the
forward-looking CRC, separate from the commit CRC?
I wasn't sure this would actually help that much, since there are still
necessary conditions for whether or not a forward-looking CRC is there,
but in the end it simplified the code quite nicely, and resulted in a
~200 byte code-cost saving.
- General cleanup from integration, including cleaning up some older
commit code
- Partial-prog tests do not make sense when prog_size == block_size
(there can't be partial-progs!)
- Fixed signed-comparison issue in modified filebd