Commit Graph

2441 Commits

c9691503bc scripts: plot[mpl].py: Added --x/ylim-ratio for simpler limits
I've been struggling to keep plots readable with --x/ylim-stddev; it may
have been the wrong tool for the job.

This adds --x/ylim-ratio as an alternative, which just sets the limit to
include x-percent of the data (I avoided "percent" in the name because
it should be --x/ylim-ratio=0.98, not 98, though I'm not sure "ratio" is
great either...).

Like --x/ylim-stddev, this can be used in both one and two argument
forms:

  $ ./scripts/plot.py --ylim-ratio=0.98
  $ ./scripts/plot.py --ylim-ratio=-0.98,+0.98

So far, --x/ylim-ratio has proven much easier to use, maybe because our
amortized results don't follow a normal distribution? --x/ylim-ratio
seems to do a good job of clipping runaway amortized results without too
much information loss.
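
plot.py is Python, but here's a rough C sketch of one plausible reading
of the one-argument form (clip to the central ratio of the data); the
exact semantics are an assumption for illustration, not necessarily what
plot.py does:

  #include <stdlib.h>
  #include <math.h>

  static int cmp_double(const void *a, const void *b) {
      double x = *(const double*)a;
      double y = *(const double*)b;
      return (x > y) - (x < y);
  }

  // e.g. ratio=0.98 => limits that keep the central 98% of the data
  static void ylim_ratio(double *data, size_t n, double ratio,
          double *lo, double *hi) {
      qsort(data, n, sizeof(double), cmp_double);
      size_t lo_i = (size_t)floor((1-ratio)/2 * (n-1));
      size_t hi_i = (size_t)ceil((1+ratio)/2 * (n-1));
      *lo = data[lo_i];
      *hi = data[hi_i];
  }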
2025-10-01 17:57:32 -05:00
92af5de3ca emubd: Added optional nor-masking emulation
This adds NOR-style masking emulation to emubd when erase_value is set
to -2:

  erase     => 0xff
  prog 0xf0 => 0xf0
  prog 0xcc => 0xc0

We do _not_ rely on this property in littlefs, and so this feature will
probably go unused in our tests, but it's useful for running other
filesystems (SPIFFS) on top of emubd.

It may be a bit of a scope violation to merge this into littlefs's core
repo, but it's useful to centralize emubd's features somewhere...
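
For illustration, a minimal sketch of the masking rule (not emubd's
actual implementation, just the behavior in isolation):

  #include <stdint.h>
  #include <stddef.h>

  // erase => all bits set
  static void emu_erase(uint8_t *block, size_t size) {
      for (size_t i = 0; i < size; i++) {
          block[i] = 0xff;
      }
  }

  // NOR-style prog: bits can only go 1 -> 0
  static void emu_prog(uint8_t *block, size_t off,
          const uint8_t *data, size_t size) {
      for (size_t i = 0; i < size; i++) {
          block[off+i] &= data[i];
      }
  }

  // erase, prog 0xf0, prog 0xcc => 0xff & 0xf0 & 0xcc = 0xc0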
2025-10-01 17:57:28 -05:00
6a57258558 make: Adopted lowercase for foreach variables
This seems to be the common style in other Makefiles, and avoids
confusion with global/env variables.
2025-10-01 17:57:23 -05:00
a1b75497d6 bmap: rdonly: Got LFS3_RDONLY + LFS3_BMAP compiling
Counterintuitively, LFS3_RDONLY + LFS3_BMAP _does_ make sense for cases
where you want to include the bmap in things like ckmeta/ckdata scans.

Though this is another argument for a LFS3_RDONLY + LFS3_NO_TRV build.
Traversals add quite a bit of code to the rdonly build that is probably
not always needed.

---

This just required another bunch of ifdefs.

Current bmap rdonly code size:

                code          stack          ctx
  rdonly:      10616            896          532
  rdonly+bmap: 10892 (+2.6%)    896 (+0.0%)  636 (+19.5%)
2025-10-01 17:57:15 -05:00
60ef118dcd rdonly: Got LFS3_RDONLY compiling again
Just a few alloc/eoff references slipped through in the bmap work.

Current rdonly code size:

            code           stack           ctx
  default: 37024            2352           684
  rdonly:  10616 (-71.3%)    896 (-61.9%)  532 (-22.2%)

The biggest change was tweaking our mtortoise again to use the unused
trunk field for the power-of-two bound. The original intention of using
eoff was an extra precaution to avoid the mtortoise looking like a valid
shrub at any point, but eoff is not available in LFS3_RDONLY.

And we definitely want our mtortoise in LFS3_RDONLY!

---

Note I haven't actually tested LFS3_RDONLY + LFS3_BMAP. Does this config
even make sense? I guess ckmeta/ckdata will need to traverse the bmap,
so, counterintuitively, yes?
2025-10-01 17:57:14 -05:00
664d99dbeb Fixed crystallize_ losing track of ungrafted leaves on error
Whoops, this was an oversight when readopting lazy grafting.

It turns out the crystallization refactor that led to
lfs3_file_crystallize_ operating directly on file->leaf.bptr was a bit
incompatible with lazy grafting.

If we encounter an error and need to relocate, we need to rewrite any
data in our crystal, _including data in ungrafted leaves_.

By pure luck, the previous lazy grafting implementation side-stepped
this issue by including ungrafted leaves in lfs3_file_lookupnext calls.
This implicitly included the ungrafted leaf in any recrystallizations,
as long as it wasn't modified on error.

---

The fix required two tweaks:

- Recrystallize into a copy in case we hit an error.

  Instead of a full lfs3_bptr_t, I just copied the relevant
  block_/off_/pos_ pieces we need.

- Include file leaves in the crystallization logic.

  Fortunately the multi-data-prioritization loop we already have for
  any cached data was relatively easy to adapt for this.

As a plus lfs3_file_crystallize_ can also now short-circuit a
bshrub/btree lookup if the data we're crystallizing happens to be in the
file leaf.

This adds a bit more code, but doesn't break if we hit an error. In
theory this would add stack for the recrystallization copy, but
lfs3_file_crystallize_ is just off the stack hot-path:

           code          stack          ctx
  before: 36972           2352          684
  after:  37024 (+0.1%)   2352 (+0.0%)  684 (+0.0%)

Another fix I considered -- calling lfs3_file_graft on error -- may have
been a bit less code, but would have moved the stack hot-path under
lfs3_file_crystallize_. An error triggering _more_ progs/commits also
doesn't really sound like the greatest of ideas.

Found by test_ck_spam_fwrite_fuzz.
2025-10-01 17:57:12 -05:00
8c04482ea3 Disable LFS3_BMAP when LFS3_BIGGEST for now
Currently LFS3_BMAP implies LFS3_YES_BMAP, which is an ugly hack because
I don't want to figure out the BMAP flag logic right now.

As a side-effect, this makes it impossible to test LFS3_BIGGEST without
LFS3_BMAP, which breaks a number of tests that have not been updated to
support >2 format blocks.
2025-10-01 17:57:09 -05:00
83196ed67a Dropped redundant isuncryst check in lfs3_file_flush_
Saves a bit of code, at the cost of making this logic a bit more
difficult to read:

           code          stack          ctx
  before: 36992           2352          684
  after:  36972 (-0.1%)   2352 (+0.0%)  684 (+0.0%)
2025-10-01 17:57:08 -05:00
be3e61dd13 Dropped lfs3_file_weight_
Now that ungrafted leaves are much more limited in scope, I'm not sure
lfs3_file_weight_ still makes sense as a separate function.

The only call was in lfs3_file_read, where we decide if we bother
flushing things before actually flushing things. Given that we should
now generally graft before expecting lfs3_file_lookupnext to make sense,
relying on lfs3_file_weight_ too much probably hints at a logic mistake.

This logic ends up inlined anyways, so no code changes:

           code          stack          ctx
  before: 36992           2352          684
  after:  36992 (+0.0%)   2352 (+0.0%)  684 (+0.0%)
2025-10-01 17:57:06 -05:00
15c3d2f87a Flattened lfs3_file_crystallize_
We no longer need to discard the leaf, since we can just leave ungrafted
leaves around as long as LFS3_o_UNGRAFT is set.

This lets us flatten lfs3_file_crystallize_, saving a bit of code and
cleaning up the logic a bit:

           code          stack          ctx
  before: 37032           2352          684
  after:  36992 (-0.1%)   2352 (+0.0%)  684 (+0.0%)

Note that lfs3_file_crystallize_ is still NOINLINE to force it off the
stack hot-path in lfs3_file_flush_, etc.
2025-10-01 17:57:04 -05:00
58c5506e85 Brought back lazy grafting, but not too lazy
Continued benchmarking efforts are indicating this isn't really an
optional optimization.

This brings back lazy grafting, where the file leaf is allowed to fall
out-of-date to minimize bshrub/btree updates. This is controlled by
LFS3_o_UNGRAFT, which is similar, but independent from LFS3_o_UNCRYST:

- LFS3_o_UNCRYST - File's leaf not fully crystallized
- LFS3_o_UNGRAFT - File's leaf does not match disk

Note it makes sense for files to be UNGRAFT only, in the case where the
current crystal terminates at the end-of-file but future appends are
likely. And it makes sense for files to be UNCRYST only, in cases where
we graft uncrystallized blocks so the bshrub/btree makes sense.

Which brings us to the main change from the previous lazy-grafting
implementation: lfs3_file_lookupnext no longer includes ungrafted
leaves.

Instead, functions should call lfs3_file_graft if they need
lfs3_file_lookupnext to make sense.

This significantly reduces the code cost of lazy grafting, at the risk
of needing to graft more frequently. Fortunately we don't actually need
to call lfs3_file_graft all that often:

- lfs3_file_read already flushes caches/leaves before attempting any
  bshrub/btree reads for simplicity (heavy mixed read/write workloads are
  not currently considered a priority; if you need this, consider opening
  two file handles).

- lfs3_file_flush_ _does_ need to call lfs3_file_graft before the
  crystallization heuristic pokes, but if we can't resume
  crystallization, we would probably need to graft the crystal to
  satisfy the flush anyways.

---

Lazy grafting, i.e. procrastinating on bshrub/btree updates during block
appends, is an optimization previously dropped due to perceived
nicheness:

- We can only lazily graft blocks, inlined data fragments always require
  bshrub/btree updates since they live in the bshrub/btree.

- Sync forces bshrub/btree updates anyways, so lazy grafting has no
  benefit for most logging applications.

- The performance penalty of eagerly grafting goes away if your caches
  are large enough.

Note that the last argument is a non-argument in littlefs's case. The
whole point of littlefs is that you _don't_ need RAM to fix things.

However these arguments are all moot when you consider that the "niche
use case" -- linear file writes -- is the default bottleneck for most
applications. Any file operation becomes a linear write bottleneck when
the arguments are large enough. And this becomes a noticeable issue when
benchmarking.

So... This brings back lazy grafting. But with a more limited scope
w.r.t. internal file operations (the above lfs3_file_lookupnext/
lfs3_file_graft changes).

---

Long story short, lazy grafting is back again, reverting the ~3x
performance regression for linear file writes.

But now with quite a bit less code/stack cost:

           code          stack          ctx
  before: 36820           2368          684
  after:  37032 (+0.6%)   2352 (-0.7%)  684 (+0.0%)
2025-10-01 17:57:01 -05:00
68424f8cda bmap: t: Added the on-disk bmap (bmap_p) to mtree traversals
I was wrong! A new bmap containing the old bmap is _not_ sufficient for
lookahead scans in the case where we rebuild the bmap multiple times
before an mdir commit! The previous bug with lfs3_trv_read +
lfs3_alloc_ckpoint _was_ a bug because the traversal was rdonly, but in
theory the bmap shouldn't have been corrupted.

Since this case is possible, we also need to traverse the on-disk bmap,
though fortunately only when the on-disk bmap does not match the active
bmap.

This is probably safer behavior anyways, and means ckmeta/ckdata will
traverse both the in-RAM and on-disk bmaps, which is probably a good
thing in case we need to revert to the on-disk bmap due to power-loss/
error.

---

The implementation got a bit messy, since we only track the on-disk
encoding of the on-disk bmap (we need the encoding for gdeltas to work).
This ended up adding code, and an annoying (avoidable? TODO?) stack
cost, but correct behavior is better than incorrect behavior:

                code          stack          ctx
  before:      36856           2368          684
  after:       36820 (-0.1%)   2368 (+0.0%)  684 (+0.0%)

                code          stack          ctx
  bmap before: 38452           2400          812
  bmap after:  38504 (+0.1%)   2464 (+2.7%)  812 (+0.0%)

Once again the inaccuracy of our stack frame calculations strikes...
2025-10-01 17:56:58 -05:00
27a722456e scripts: Added support for SI-prefixes as iI punescape modifiers
This adds %i and %I as punescape modifiers for limited printing of
integers with SI prefixes:

- %(field)i - base-10 SI prefixes
  - 100   => 100
  - 10000 => 10K
  - 0.01  => 10m

- %(field)I - base-2 SI prefixes
  - 128   => 128
  - 10240 => 10Ki
  - 0.125 => 128mi

These can also easily include units as a part of the punescape string:

- %(field)iops/s => 10Kops/s
- %(field)IB => 10KiB

This is particularly useful in plotmpl.py for adding explicit
x/yticklabels without sacrificing the automatic SI-prefixes.
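
The scripts are Python, but the base-10 idea is simple enough to sketch
in C (the si_format name is made up for the sketch): scale the value
into [1, 1000) and append the matching prefix, with %I doing the same in
steps of 1024:

  #include <stdio.h>
  #include <math.h>
  #include <stddef.h>

  // enough prefixes for the examples above, 1e-9 .. 1e+9
  static void si_format(double x, char *buf, size_t size) {
      static const char *prefixes[] = {"n", "u", "m", "", "K", "M", "G"};
      int i = 3; // "" => no prefix
      while (fabs(x) >= 1000 && i < 6) { x /= 1000; i++; }
      while (fabs(x) < 1 && x != 0 && i > 0) { x *= 1000; i--; }
      snprintf(buf, size, "%g%s", x, prefixes[i]);
  }

  // si_format(100, ...)   => "100"
  // si_format(10000, ...) => "10K"
  // si_format(0.01, ...)  => "10m"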
2025-10-01 17:56:51 -05:00
2a4e0496b6 scripts: csv.py: Fixed lexing of signed float exponents
So now these lex correctly:

- 1e9  =>  1000000000
- 1e+9 =>  1000000000
- 1e-9 =>  0.000000001

A bit tricky when you think about how these could be confused for binary
addition/subtraction. To fix we just eagerly grab any signs after the e.

These are particularly useful for manipulating simulated benchmarks,
where we need to convert things to/from nanoseconds.
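
csv.py is Python, but the lexing trick translates directly; a sketch in
C (lex_number is a made-up name), where the key step is eagerly
consuming an optional sign after the e:

  #include <ctype.h>
  #include <stddef.h>

  // returns the length of the numeric token starting at s
  static size_t lex_number(const char *s) {
      size_t i = 0;
      while (isdigit((unsigned char)s[i])) i++;
      if (s[i] == '.') {
          i++;
          while (isdigit((unsigned char)s[i])) i++;
      }
      if (s[i] == 'e' || s[i] == 'E') {
          size_t j = i+1;
          // eagerly grab any sign after the e
          if (s[j] == '+' || s[j] == '-') j++;
          if (isdigit((unsigned char)s[j])) {
              i = j;
              while (isdigit((unsigned char)s[i])) i++;
          }
      }
      return i;
  }

  // lex_number("1e9")  => 3
  // lex_number("1e-9") => 4
  // lex_number("1e-x") => 1, leaving the e/- for the expression parser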
2025-10-01 17:56:29 -05:00
18d1f68445 t: Limited lfs3_alloc_ckpoint to LFS3_T_LOOKAHEAD
This was causing a problem where the bmap was being rebuilt on every
lfs3_trv_read, even though the traversal was opened LFS3_T_RDONLY!

Also added a note on why we don't need to traverse both the active and
on-disk bmaps. Counterintuitively, we don't need to because the new bmap
always contains the entirety of the on-disk bmap.

Code changes minimal:

                code          stack          ctx
  before:      36840           2368          684
  after:       36856 (+0.0%)   2368 (+0.0%)  684 (+0.0%)

                code          stack          ctx
  bmap before: 38440           2400          812
  bmap after:  38452 (+0.0%)   2400 (+0.0%)  812 (+0.0%)
2025-10-01 17:56:27 -05:00
507c04db70 Fixed clobbered shrub estimates when redundantly syncing shrubs
Found while benchmarking, our shrub estimates were being recalculated
much more frequently than they should be (every shrub commit). The
problem is that we never staged shrub estimates!

In theory this is fine, shrub estimates don't necessarily need staging.
We update shrub.r.eoff/estimate directly in lfs3_bshrub_commitroot_
after the mdir commit succeeds. But then we _redundantly_ sync
shrub_ -> shrub.r in lfs3_bshrub_commit. Since we never staged the
shrub estimate, we end up with garbage.

---

It's not clear to me this (staging the shrub estimate) is the best fix
for this, but the reason for the redundant shrub sync is the shared
bshrub/btree post-commit path. Added a TODO comment and should look at
this again when not in a time crunch.

Code changes minimal:

           code          stack          ctx
  before: 36836           2368          684
  after:  36840 (+0.0%)   2368 (+0.0%)  684 (+0.0%)
2025-10-01 17:56:25 -05:00
7289619859 Tweaked lfs3_mdir_commit to imply lfs3_alloc_ckpoint
Now that lfs3_alloc_ckpoint is more complicated, and can error, it makes
sense for lfs3_alloc_ckpoint to be implied by lfs3_mdir_commit.

Most lfs3_mdir_commit calls represent an atomic transaction from one
state -> another, so this saves a bit of code:

                code          stack          ctx
  before:      36912           2368          684
  after:       36836 (-0.2%)   2368 (+0.0%)  684 (+0.0%)

                code          stack          ctx
  bmap before: 38512           2400          812
  bmap after:  38436 (-0.2%)   2400 (+0.0%)  812 (+0.0%)

The notable exception being bshrub-related commits in
lfs3_bshrub_commitroot_. Bshrub commits are trying to resolve an
in-flight btree, so the relevant blocks are very much _not_ at rest.

---

I've been hesitant to adopt this mostly just because it makes the
lfs3_mdir_commit* names even more of a mess:

- lfs3_mdir_commit__   -> lfs3_mdir_commit___
- lfs3_mdir_commit_    -> lfs3_mdir_commit__
- lfs3_mdir_commit     -> lfs3_mdir_commit_
- added lfs3_mdir_commit
- lfs3_mdir_compact    -> lfs3_mdir_compact_
- added lfs3_mdir_compact
- lfs3_mdir_alloc__    -> lfs3_mdir_alloc___
- lfs3_mdir_estimate__ -> lfs3_mdir_estimate___
- lfs3_mdir_swap__     -> lfs3_mdir_swap___
2025-10-01 17:56:24 -05:00
27e3e10634 bmap: Added error propagation to ckpoints and cleaned up test TODOs
The main change is error propagation in lfs3_alloc_ckpoint. Since
lfs3_alloc_ckpoint writes to disk during bmap rebuilds, it can now fail
in all sorts of ways. Fortunately lfs3_alloc_ckpoint should only ever be
called by write operations, where these errors are to be expected.

With bmap rebuild errors now reported correctly, this unblocks most of
the remaining test TODOs:

- Passing test_badblocks
- Passing test_ck
- Passing test_trvs

With this, LFS3_YES_BMAP is now passing all but two tests, which are
still ifndef-disabled as a temporary measure:

- test_btree - We make some low-level assumptions about the lookahead
  allocator when testing btrees. It's probably not worth trying to get
  this passing with the bmap allocator.

- test_grow - This one does need fixing! We currently don't update
  on-disk bmaps correctly when growing the filesystem.

Code changes minimal:

                code          stack          ctx
  before:      36912           2368          684
  after:       36912 (+0.0%)   2368 (+0.0%)  684 (+0.0%)

                code          stack          ctx
  bmap before: 38456           2400          812
  bmap after:  38512 (+0.1%)   2400 (+0.0%)  812 (+0.0%)
2025-10-01 17:56:22 -05:00
41be512272 bmap: Fixed up low-hanging fruit, tests and things
- Consistent grm_op -> alloc_ckpoint -> mdir_commit order
- Drop some low priority TODOs
- Got test_alloc at least passing existing tests
- Got test_gc passing
- Got test_mount passing
- test_relocations was already passing, lol

No code changes:

                code          stack          ctx
  before:      36912           2368          684
  after:       36912 (+0.0%)   2368 (+0.0%)  684 (+0.0%)

                code          stack          ctx
  bmap before: 38456           2400          812
  bmap after:  38456 (+0.0%)   2400 (+0.0%)  812 (+0.0%)
2025-10-01 17:56:20 -05:00
047fb83b62 dread: Fixed lingering orphans affecting dir positions
We need to adjust mids to ignore orphans during dir traversal, but we
shouldn't also adjust the dir position. In theory it shouldn't matter if
we use adjusted/non-adjusted dir positions, but it becomes a problem if
intermediate writes cause those orphans to be cleaned up. Now all your
dir positions are wrong.

Not entirely sure why this only started to fail with the bmap. I'm
guessing it's just due to the additional gstate causing the mdirs to
split differently.

Code changes minimal:

           code          stack          ctx
  before: 36920           2368          684
  after:  36912 (-0.0%)   2368 (+0.0%)  684 (+0.0%)

Tangential, but toss this on the pile of problems with dir positions.
I'm increasingly convinced we should just remove the concept if we can
get away with it.
2025-10-01 17:56:17 -05:00
726cccfe76 bmap: Tweaked bmapcache algo to piggyback on mdir commits
There's really no reason to immediately commit the bmap to disk, at
least not until the first mdir commit, when we need to at least discard
the previous bmap state.

We already do all the gstate handling in lfs3_mdir_commit anyways, and
piggybacking on mdir commit lets us get rid of the annoying extra mdir
param in lfs3_alloc_ckpoint.

This does mean a slightly higher risk of needing to re-rebuild the bmap
after a powerloss, but in theory only if the user does something weird
like writing to a file and never calling sync. Most on-disk operations
terminate in an mdir commit as that's how any state change becomes
atomically visible in littlefs.

Saves a nice bit of stack:

                code          stack          ctx
  before:      36920           2368          684
  after:       36920 (+0.0%)   2368 (+0.0%)  684 (+0.0%)

                code          stack          ctx
  bmap before: 38552           2472          812
  bmap after:  38464 (-0.2%)   2400 (-2.9%)  812 (+0.0%)
2025-10-01 17:56:16 -05:00
316ca1cc05 bmap: The initial bmapcache algorithm seems to be working
At least at a proof-of-concept level, there's still a lot of cleanup
needed.

To make things work, lfs3_alloc_ckpoint now takes an mdir, which
provides the target for gbmap gstate updates.

When the bmap is close to empty (configurable via bmap_scan_thresh), we
opportunistically rebuild it during lfs3_alloc_ckpoints. The nice thing
about lfs3_alloc_ckpoint is we know the state of all in-flight blocks,
so rebuilding the bmap just requires traversing the filesystem + in-RAM
state.

We might still fall back to the lookahead buffer, but in theory a well
tuned bmap_scan_thresh can prevent this from becoming a bottleneck (at
the cost of more frequent bmap rebuilds).

---

This is also probably a good time to resume measuring code/ram costs,
though it's worth repeating the above note about the bmap work still
needing cleanup:

             code          stack          ctx
  before:   36840           2368          684
  after:    36920 (+0.2%)   2368 (+0.0%)  684 (+0.0%)

Haha, no, the bmap isn't basically free, it's just an opt-in feature.
With -DLFS3_YES_BMAP=1:

             code          stack          ctx
  no bmap:  36920           2368          684
  yes bmap: 38552 (+4.4%)   2472 (+4.4%)  812 (+18.7%)
2025-10-01 17:56:14 -05:00
8666830515 bmap: scripts: Fixed missing geometry race condition
The gbmap's weight is defined by the block count stored in the geometry
config field, which should always be present in valid littlefs3 images.

But our scripts routinely try to parse _invalid_ littlefs3 images when
running in parallel with benchmarks/tests (littlefs3 does _not_ support
multiple read/writers), so this was causing exceptions to be thrown.

The fix is to just assume weight=0 when the geometry field is missing.
The image isn't valid, and the gbmap is optional anyways.
2025-10-01 17:56:13 -05:00
71b9ad2412 bmap: Enabled at least opportunistic bmap allocations
This doesn't fully replace the lookahead buffer, but at least augments
it with known bmap state when available.

To be honest, this is a minimal effort hack to try to get something
benchmarkable without dealing with all the catch-22 issues that a
self-support bmap allocator would encounter (allocating blocks for the
bmap requires a bmap, oh no).

Though now that I'm writing this, maybe this is a reasonable long-term
solution? Having the lookahead buffer to fall back on solves a lot of
problems, and, realistically, it's unlikely to be a performance
bottleneck unless the user has extreme write requests (>available
storage?).

---

Also tweaked field naming to be consistent between the bmap and
lookahead buffer.
2025-10-01 17:56:12 -05:00
838a4beee1 bmap: Moved gbmap traversal to the end
This avoids issues with the different traversal paths between an mtree
and an inline mtree. Previously this was broken when the mtree was
inlined.

This order also makes more sense if we want to check mdirs before we
consider the gstate to be trustworthy enough for gbmap traversal.
2025-10-01 17:56:10 -05:00
ebae43898e bmap: Changing direction, store bmap mode in wcompat flags
The idea behind separate ctrled+unctrled airspaces was to try to avoid
multiple interpretations of the on-disk bmap, but I'm starting to think
this adds more complexity than it's worth.

The main conflict is the meaning of "in-flight" blocks. When using the
"uncontrolled" bmap algorithm, in-flight blocks need to be
double-checked by traversing the filesystem. But in the "controlled"
bmap algorithm, blocks are only marked as "in-flight" while they are
truly in-flight (in-use in RAM, but not yet in use on disk).
Representing these both with the same "in-flight" state risks
incompatible algorithms misinterpreting the bmap across different
mounts.

In theory the separate airspaces solve this, but now all the algorithms
need to know how to convert the bmap from different modes, adding
complexity and code cost.

Well, in theory at least. I'm unsure separate airspaces actually solves
this due to subtleties between what "in-flight" means in the different
algorithms (note both in-use and free blocks are "in-flight" in the
unknown airspace!). It really depends on how the "controlled" algorithm
actually works, which isn't implemented/fully designed yet.

---

Long story short, due to a time crunch, I'm ripping this out for now and
just storing the current algorithm in the wcompat flags:

  LFS3_WCOMPAT_GBMAP       0x00006000  Global block-map in use
  LFS3_WCOMPAT_GBMAPNONE   0x00000000  Gbmap not in use
  LFS3_WCOMPAT_GBMAPCACHE  0x00002000  Gbmap in cache mode
  LFS3_WCOMPAT_GBMAPVFR    0x00004000  Gbmap in VFR mode
  LFS3_WCOMPAT_GBMAPIFR    0x00006000  Gbmap in IFR mode

Note GBMAPVFR/IFR != BMAPSLOW/FAST! At least BMAPSLOW/FAST can share
bmap representations:

- GBMAPVFR => Uncontrolled airspace, i.e. in-flight blocks may or may
  not be in use, need to traverse open files.

- GBMAPIFR => Controlled airspace, i.e. in-flight blocks are in use,
  at least until powerloss, no traversal needed, but requires more bmap
  writes.

- BMAPSLOW => Treediff by checking what blocks are in B but not in A,
  and what blocks are in A but not in B, O(n^2), but minimizes bmap
  updates.

  Can be optimized with a bloom filter.

- BMAPFAST => Treediff by clearing all blocks in A, and then setting all
  blocks in B, O(n), but also writes all blocks to the bmap twice even
  on small changes.

  Can be optimized with a sliding bitmap window (or a block hashtable,
  though a bitmap converges to the same thing in both algorithms when
  >=disk_size).
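
To make the BMAPSLOW/BMAPFAST contrast concrete, a toy sketch over a
plain in-RAM bitmap (the real bmap is an on-disk range tree, which is
why the number of updates matters):

  #include <stdbool.h>
  #include <stddef.h>

  // BMAPFAST-style: clear everything in A, then set everything in B,
  // O(n), but touches every block in both trees even for small diffs
  static void treediff_fast(bool *bmap,
          const size_t *a, size_t a_count,
          const size_t *b, size_t b_count) {
      for (size_t i = 0; i < a_count; i++) bmap[a[i]] = false;
      for (size_t i = 0; i < b_count; i++) bmap[b[i]] = true;
  }

  static bool contains(const size_t *set, size_t count, size_t block) {
      for (size_t i = 0; i < count; i++) {
          if (set[i] == block) return true;
      }
      return false;
  }

  // BMAPSLOW-style: only touch blocks that actually changed, at the
  // cost of O(n^2) membership checks (a bloom filter could cheapen
  // these)
  static void treediff_slow(bool *bmap,
          const size_t *a, size_t a_count,
          const size_t *b, size_t b_count) {
      for (size_t i = 0; i < a_count; i++) {
          if (!contains(b, b_count, a[i])) bmap[a[i]] = false;
      }
      for (size_t i = 0; i < b_count; i++) {
          if (!contains(a, a_count, b[i])) bmap[b[i]] = true;
      }
  }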

It will probably be worth unifying the bmap representation later (the
more algorithm-specific flags there are, the harder interop becomes for
users), but for now this opens a path to implementing/experimenting with
bmap algorithms without dealing with this headache.
2025-10-01 17:56:08 -05:00
beb1f1346a bmap: Started implementing ctrled/unctrled lfs3_alloc paths 2025-10-01 17:56:07 -05:00
e7c3755e21 bmap: Split known into ctrled+unctrled 2025-10-01 17:56:05 -05:00
732d6079e3 bmap: Added low-level bmap set algorithm and related tests
The neat thing about the on-disk bmap is that it's a range tree. We can
leverage order-statistic properties to compactly represent ranges of
similar blocks.

However, this does make updating the bmap slightly more complicated...
2025-10-01 17:55:39 -05:00
98f016b07e bmap: Added initial gbatc interactions, up until out-of-known or remount
This only works immediately after format, and only for one pass of the
disk, but it's a good way to test bmap lookups/allocation without
worrying about more complicated filesystem-wide interactions.
2025-10-01 17:55:31 -05:00
357526e775 rbyd: Allow refetching after claiming erased-state
Except for niche file snapshotting, most btree updates until this point
are probably linear, i.e. a successful commit replaces any internal
rbyd state that has been claimed. In this model it makes sense to mark
claimed rbyd as "invalid", since failure to replace the claimed state
indicates something went wrong during the commit.

But this isn't necessarily true when snapshotting, since we don't
replace the state of claimed snapshots.

But wait, shouldn't snapshotted rbyds become readonly? Not necessarily!
Rbyds can have multiple trunks with unrelated (or in this case, shared)
histories, so there's nothing wrong with refetching an rbyd and
continuing to commit after another snapshot commits to the block.

Eventually both snapshots will need to compact and diverge into two
blocks, but until then sharing an rbyd makes the most of available
erased state.

At least in theory, experience will show us how well this works.

---

Also note this is not true for mdirs. We view mdirs as atomic and always
up-to-date, so snapshotting doesn't really make sense.
2025-10-01 17:55:29 -05:00
59a4ae6f61 bmap: Taught littlefs how to traverse the gbmap
Fortunately the btree traversal logic is pretty reusable, so this just
required an additional tstate (LFS3_TSTATE_BMAP).

This raises an interesting question: _when_ do we traverse the bmap? We
need to wait until at least mtree traversal completes for gstate to be
reconstructed during lfs3_mount, but I think traversing before file
btrees makes sense.
2025-10-01 17:55:27 -05:00
1537f6a430 bmap: Decoding the gbmap in gstate now works
This is just a copy + edit of the grm logic, which raises the question
of whether this logic can be generalized.
2025-10-01 17:55:24 -05:00
5f65b49ef8 bmap: scripts: Added on-disk bmap traversal to dbgbmap and friends
And yes, dbgbmapsvg.py's parents are working, thanks to a hacky blocks
@property (Python to the rescue!)
2025-10-01 17:55:16 -05:00
88180b6081 bmap: Initial scaffolding for on-disk block map
This is pretty exploratory work, so I'm going to try to be less thorough
in commit messages until the dust settles.

---

New tag for gbmapdelta:

  LFS3_TAG_GBMAPDELTA   0x0104  v--- ---1 ---- -1rr

New tags for in-bmap block types:

  LFS3_TAG_BMRANGE      0x033u  v--- --11 --11 uuuu
  LFS3_TAG_BMFREE       0x0330  v--- --11 --11 ----
  LFS3_TAG_BMINFLIGHT   0x0331  v--- --11 --11 ---1
  LFS3_TAG_BMINUSE      0x0332  v--- --11 --11 --1-
  LFS3_TAG_BMBAD        0x0333  v--- --11 --11 --11
  LFS3_TAG_BMERASED     0x0334  v--- --11 --11 -1--

New gstate decoding for gbmap:

  .---+- -+- -+- -+- -. cursor: 1 leb128  <=5 bytes
  | cursor            | known:  1 leb128  <=5 bytes
  +---+- -+- -+- -+- -+ block:  1 leb128  <=5 bytes
  | known             | trunk:  1 leb128  <=4 bytes
  +---+- -+- -+- -+- -+ cksum:  1 le32    4 bytes
  | block             | total:            23 bytes
  +---+- -+- -+- -+- -'
  | trunk         |
  +---+- -+- -+- -+
  |     cksum     |
  '---+---+---+---'
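
As a rough illustration of this layout (littlefs has its own leb128
helpers, so the names below are hypothetical):

  #include <stdint.h>
  #include <stddef.h>

  // returns bytes consumed, or 0 on truncation
  static size_t leb128_decode(const uint8_t *p, size_t size,
          uint32_t *v) {
      *v = 0;
      for (size_t i = 0; i < size && i < 5; i++) {
          *v |= (uint32_t)(p[i] & 0x7f) << (7*i);
          if (!(p[i] & 0x80)) {
              return i+1;
          }
      }
      return 0;
  }

  // cursor/known/block/trunk leb128s, followed by a 4-byte le32 cksum
  static int gbmap_decode(const uint8_t *p, size_t size,
          uint32_t *cursor, uint32_t *known,
          uint32_t *block, uint32_t *trunk, uint32_t *cksum) {
      uint32_t *fields[] = {cursor, known, block, trunk};
      size_t off = 0;
      for (int i = 0; i < 4; i++) {
          size_t d = leb128_decode(p+off, size-off, fields[i]);
          if (!d) {
              return -1;
          }
          off += d;
      }
      if (size-off < 4) {
          return -1;
      }
      *cksum = (uint32_t)p[off+0]
              | (uint32_t)p[off+1] << 8
              | (uint32_t)p[off+2] << 16
              | (uint32_t)p[off+3] << 24;
      return 0;
  }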

New bmap node revdbg string:

  vvv---- -111111- -11---1- -11---1-  (62 62 7e v0  bb~r)  bmap node

New mount/format/info flags (still unsure about these):

  LFS3_M_BMAPMODE     0x03000000  On-disk block map mode
  LFS3_M_BMAPNONE     0x00000000  Don't use the bmap
  LFS3_M_BMAPCACHE    0x01000000  Use the bmap to cache lookahead scans
  LFS3_M_BMAPSLOW     0x02000000  Use the slow bmap algorithm
  LFS3_M_BMAPFAST     0x03000000  Use the fast bmap algorithm

New gbmap wcompat flag:

  LFS3_WCOMPAT_GBMAP  0x00002000  Global block-map in use
2025-10-01 17:55:13 -05:00
238dbc705d Abandoned data-backed cache, use indirect lfs3_data_t on stack
This abandons the data-backed cache idea due to concerns around
readability and maintainability. Mixing const/mutable buffers in
lfs3_data_t was not great.

Instead, we now just allocate an indirect lfs3_data_t on the stack in
lfs3_file_sync_ to avoid the previous undefined behavior.

This actually results in less total stack usage, due to lfs3_file_t
allocations in lfs3_set/read, and avoids the more long-term memory cost
in lfs3_file_t:

              code          stack          ctx
  before:    36832           2376          684
  after:     36840 (+0.0%)   2368 (-0.3%)  684 (+0.0%)

Oh. And lfs3_file_sync_ isn't even on the stack hot-path, so this is a
net benefit over the previous cache -> data cast:

              code          stack          ctx
  before sa: 36844           2368          684
  after sa:  36840 (-0.0%)   2368 (+0.0%)  684 (+0.0%)

Still less cool though.
2025-07-22 13:39:43 -05:00
5035aa566b Adopted data-backed cache in lfs3_file_t to avoid undefined behavior
This fixes a strict aliasing violation in lfs3_file_sync_, where we cast
the file cache -> lfs3_data_t to avoid an extra stack allocation, by
modifying the file's cache struct to use an lfs3_data_t directly.

- file.cache.pos -> file.cache.pos
- file.cache.buffer -> file.cache.d.u.buffer_
- file.cache.size -> file.cache.d.size
- (const lfs3_data_t*)&file->cache -> &file->cache.d

Note the underscore_ in file.cache.d.u.buffer_. This did not fit
together as well as I had hoped, due to different const expectations
between the file cache and lfs3_data_t.

Up until this point lfs3_data_t has only been used to refer to const
data (ignoring side-band pointer casting in lfs3_mtree_traverse*), while
the file cache very much contains mutable data. To work around this I
added data.u.buffer_ as a mutable variant, which works, but risks an
accidental const violation in the future.
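
For anyone not steeped in strict aliasing, the issue is the general
pattern below (these are not lfs3's actual structs, just the shape of
the problem): reading a cache through an unrelated struct type is
undefined behavior, while embedding the struct we want to hand out is
well-defined.

  #include <stdint.h>
  #include <stddef.h>

  struct data {
      const uint8_t *buffer;
      size_t size;
  };

  struct cache {
      uint8_t *buffer;
      size_t size;
      size_t pos;
  };

  // the old trick: the layouts happen to line up, but reading through
  // this pointer is still a strict aliasing violation
  static const struct data *cache_data_cast(const struct cache *cache) {
      return (const struct data*)cache;
  }

  // this commit's approach, loosely: embed the struct we want to hand
  // out, so the pointer really does point at a struct data
  struct cache2 {
      struct data d;
      size_t pos;
  };

  static const struct data *cache_data_embed(
          const struct cache2 *cache) {
      return &cache->d;
  }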

---

Unfortunately this does come with a minor RAM cost, since we no longer
hide file.cache.pos in lfs3_data_t's buffer padding:

           code          stack          ctx
  before: 36844           2368          684
  after:  36832 (-0.0%)   2376 (+0.3%)  684 (+0.0%)

  lfs3_file_t before: 164
  lfs3_file_t after:  168 (+2.4%)

I think it's pretty fair to call C's strict aliasing rules a real wet
blanket. It would be interesting to create a -fno-strict-aliasing
variant of littlefs in the future, to see how much code/RAM could be
saved if we were given free rein to abuse the available memory.

Probably not enough to justify the extra work, but it would be an
interesting experiment.
2025-07-22 13:31:16 -05:00
14d1f4778f trv: mtortoise: Gave up on a reasonable type, abuse shrub fields
I think modern C simply doesn't let us do what we want to do here, so
I'm giving up, discarding the lfs3_mtortoise_t type, and just abusing
various unrelated shrub fields to implement the tortoise. This
sacrifices readability, but at least avoids undefined behavior without
a RAM penalty:

- shrub.blocks => tortoise blocks
- shrub.weight => cycle distance
- shrub.eoff => power-of-two bound

Note this keeps trunk=0, which is a nice safety net in case some code
ever tries to read from the shrub in the future.

Fortunately the mtortoise logic is fairly self-contained in
lfs3_mtree_traverse_, so with enough comments hopefully the code is not
too confusing.
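
For context, the blocks/distance/power-of-two-bound trio is essentially
Brent's cycle detection; a generic sketch over a linked list rather than
the mtree, with the shrub-field mapping noted in comments:

  #include <stdbool.h>
  #include <stddef.h>

  struct node {
      struct node *next;
  };

  static bool has_cycle(struct node *head) {
      struct node *tortoise = head; // ~ shrub.blocks
      size_t distance = 0;          // ~ shrub.weight
      size_t bound = 1;             // ~ shrub.eoff (power-of-two bound)
      for (struct node *hare = head; hare; hare = hare->next) {
          // hare caught up to the tortoise => cycle
          if (distance > 0 && hare == tortoise) {
              return true;
          }
          distance += 1;
          // teleport the tortoise and double the bound
          if (distance == bound) {
              tortoise = hare;
              distance = 0;
              bound *= 2;
          }
      }
      // hit the end of the list => no cycle
      return false;
  }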

---

Apparently shaves off a couple more bytes of code. I'm guessing this is
just because of the slightly different struct offsets (we're reusing the
root's rbyd instead of the leaf's rbyd now):

           code          stack          ctx
  before: 36852           2368          684
  after:  36844 (-0.0%)   2368 (+0.0%)  684 (+0.0%)
2025-07-22 12:50:39 -05:00
fadf0cbd0e trv: Moved cycle detection tortoise into the shrub leaf
This forces our cycle detection tortoise (previously trv.u.mtortoise)
into the unused shrub leaf via pointer shenanigans.

This reclaims the remaining stack (and apparently code) we theoretically
gained from the btree traversal rework, up until the compiler got in the
way:

           code          stack          ctx
  before: 36876           2384          684
  after:  36852 (-0.1%)   2368 (-0.7%)  684 (+0.0%)

And it only required some _questionably_ defined behavior.

---

It's probably not well-defined behavior, but trying to understand what
the standard actually means on this is giving me a headache. I think I
have to agree C99+strict-aliasing lost the plot on this one. Note
mtortoise is only ever written/read through the same type.

What I want:

  lfs3_trv_t:          lfs3_bshrub_t:       lfs3_handle_t:
  .---+---+---+---. .. .---+---+---+---. .. .---+---+---+---.
  |     handle    |    |     handle    |    |     handle    |
  |               |    |               |    |               |
  +---+---+---+---+    +---+---+---+---+ .. '---+---+---+---'
  |   root rbyd   |    |   root rbyd   |
  |               |    |               |    lfs3_mtortoise_t:
  +---+---+---+---+    +---+---+---+---+ .. .---+---+---+---.
  |   leaf rbyd   |    |   leaf rbyd   |    |   mtortoise   |
  |               |    |               |    |               |
  +---+---+---+---+    +---+---+---+---+ .. '---+---+---+---'
  | staging rbyd  |    | staging rbyd  |
  |               |    |               |
  +---+---+---+---+ .. '---+---+---+---'
  |               |
  :               :

But I'm starting to think this is simply not possible in modern C.

At least this shows what is theoretically possible if we didn't have to
fight the compiler.
2025-07-22 12:49:49 -05:00
70872b5703 trv: Renamed trv.htrv -> trv.h
Just moving away from the *trv when unnecessary. This matches the h
variable used for local iteration.
2025-07-21 17:24:46 -05:00
4b7a5c9201 trv: Renamed OMDIRS -> HANDLES, OBTREE -> HBTREE
Looks like these traversal states were missed in the omdir -> handle
rename. I think HANDLES and HBTREE states make sense:

- LFS3_TSTATE_OMDIRS -> LFS3_TSTATE_HANDLES
- LFS3_TSTATE_OBTREE -> LFS3_TSTATE_HBTREE
2025-07-21 16:47:24 -05:00
ff7e196f92 btree: Renamed btree.leaf.rbyd -> btree.leaf.r
This matches other internal rbyds: btree.r, mdir.r, etc.

The intention of the single-char names is to reduce clutter around these
severely nested structs. Both btrees and mdirs _are_ rbyds, so the name
doesn't really add anything besides C-level type info.

I was hesitant on btree.leaf.rbyd, but decided consistency probably wins
here.
2025-07-21 16:43:39 -05:00
a871e02354 btree: Reworked btree traversal to leverage leaf caches
This comes from an observation that we never actually use the leaf cache
during traversals, and there is surprisingly little risk of a lookup
creating a conflict in the future.

Btree traversals fall into two categories:

1. Full traversals, where we traverse a full btree all at once. These
   are unlikely to have lookup conflicts because everything is
   usually self-contained in one chunk of logic.

2. Incremental traversals. These _are_ at risk, but in our current
   design limited to lfs3_trv_t, which already creates a full
   bshrub/btree copy for tracking purposes.

   This copy unintentionally, but conveniently, protects against lookup
   conflicts.

So, why not reuse the btree leaf cache to hold the rbyd state during
traversals? In theory this makes lfs3_btree_traverse the same cost as
lfs3_btree_lookupnext, drops the need for lfs3_btrv_t, and simplifies
the internal API.

The only extra bit of state we need is the current target bid, which is
now expected as a caller-incremented argument similar to
lfs3_btree_lookupnext iteration.

There was a bit of futzing around with bid=-1 being necessary to
initialize traversal (to avoid conflicts with bid=-1 => 0 caused by
empty btrees). But the end result is a btree traversal that only needs
one extra word of state.

---

Unfortunately, in practice, the savings were not as great as expected:

           code          stack          ctx
  before: 36792           2400          684
  after:  36876 (+0.2%)   2384 (-0.7%)  684 (+0.0%)

This does claw back some stack, but less than a full rbyd due to the
union with the mtortoise in lfs3_trv_t. The mtortoise now dominates. It
might be possible to union the mtortoise and the bshrub/btree state
better (both are not needed at the same time), but strict aliasing rules
in C make this tricky.

The new lfs3_btree_traverse is also a bit more complicated in terms of
code cost. In theory this would be offset by the simpler traversal setup
logic, but we only actually call lfs3_btree_traverse twice:

1. In lfs3_mtree_traverse
2. In lfs3_file_ck

Still, some stack savings + a simpler internal API makes this worthwhile
for now. lfs3_trv_t is also due for a revisit, and hopefully it's
possible to better union things with btree leaf caches somehow.
2025-07-21 16:36:50 -05:00
e3e719a0f5 btree: Dropped bcommit->bid TODO comment
I was confused, but this commit->bid update is used to limit the
commit->bid to the btree weight. Note we limit the bid after storing it
as the initial rid.
2025-07-20 15:17:13 -05:00
ae53c326d6 btree: Limited leaf discarding in mdir commit to shrub roots only
What a mouthful.

The unconditional bshrub leaf discarding in lfs3_mdir_commit was copied
from the previous btree leaf caching implementation, but discarding
_all_ bshrub leaves on _every_ mdir commit is a bit insane.

Really, the only bshrub leaves that ever need to be discarded here are
the shrub roots, which are already questionable leaf caching targets
because they're already cached as the root rbyd.

An alternative option would be to just never cache shrub roots, but
tinkering around with the idea showed it would be more costly than
conditionally discarding leaves in lfs3_mdir_commit. At least here we
can reuse some of the logic that discards file leaves.

I'm also probably overthinking what is only a small code cost:

           code          stack          ctx
  before: 36784           2400          684
  after:  36792 (+0.0%)   2400 (+0.0%)  684 (+0.0%)

This doesn't take into account how much CPU time is spent creating rbyd
copies, but that is not something we are optimizing for.
2025-07-20 15:03:02 -05:00
cd9f93d859 btree: Resurrected btree leaf caching
This is an indulgence to simplify the upcoming auxiliary btree work.

Brings back the previously-reverted per-btree leaf caches, where each
lfs3_btree_t keeps track of two rbyds: The root and the most recently
accessed leaf.

At the surface level, this optimizes repeated access to the same btree
leaf. A common pattern for a number of littlefs's operations that has
proven tricky to manually optimize:

- Btree iteration
- Pokes for our crystallization heuristic
- Checksum collision resolution for dids and (FUTURE) ddkeys
- Related rattrs attached to a single bid

But the real motivation is to drop lfs3_btree_*lookupleaf and simplify
the internal APIs. If repeated lfs3_btree_lookup*s are already
efficient, there's no reason for extra leaf-level APIs, and in theory
any logic that interacts with btrees will be simpler.

---

This comes at a cost (humorously about the same amount as the
tag-returning refactor, if you ignore the extra 28 bytes of ctx).
Unsurprisingly, increasing the size of lfs3_btree_t has the biggest
impact on stack and ctx:

           code          stack          ctx
  before: 36084           2336          656
  after:  36784 (+1.9%)   2400 (+2.7%)  684 (+4.3%)

Also note from the previous commit messages: Btree leaf caching has
resulted in surprisingly little performance improvement for our current
benchmarks + implementation. It turns out that if you're dominated by
write cost, optimizing btree lookups -- which already skip rbyd
fetches -- has a barely noticeable impact.

---

A note on reverting!

Eventually (after the auxiliary btree work) it will probably make sense
to revert this -- or at least provide a non-leaf-caching build for
code/RAM sensitive users.

I don't think this should be reverted as-is. Instead, I think we should
allow the option to just disable the leaf cache, while keeping the
simpler internal API. This would give us the best of all three worlds:

- A small code/RAM option
- Optimal btree iteration/nearby-lookup performance
- Simpler internal APIs

The only reason this isn't already implemented is because I want to
avoid fragmenting the codebase further while we're still in development
mode.
2025-07-20 13:57:50 -05:00
ba9a45aa01 runners: Don't include case-less suites in -Y/--summary
Note --list-suite-paths was already skipping case-less suites! I think
only -Y/--summary was an outlier.

This is consistent with test.py's matching of suite ids when no cases
are found (test_runner itself doesn't really care, it just reports no
matching cases). Though we do still compile case-less suites and include
them in the test_suites array, which may be confusing in the future.
2025-07-20 10:22:22 -05:00
c87361508b scripts: test.py/bench.py: Added --no-internal to skip internal tests
The --no-internal flag avoids building any internal tests/benches
(tests/benches with in="lfs3.c"), which can be useful for quickly
testing high-level things while refactoring. Refactors tend to break all
the internal tests, and it can be a real pain to update everything.

Note that --no-internal can be injected into the build with TESTCFLAGS:

  TESTCFLAGS=--no-internal make test-runner -j \
      && ./scripts/test.py -j -b

For a curious data point, here's the current number of
internal/non-internal tests:

                suites          cases                  perms
  total:            24            808          633968/776298
  internal:         22 (91.7%)    532 (65.8%)  220316/310247 (34.8%)
  non-internal:      2 ( 8.3%)    276 (34.2%)  413652/466051 (65.2%)

It's interesting to note that while internal tests have more test cases,
the non-internal tests generate a larger number of test permutations.
This is probably because internal tests tend to target specific corner
cases/known failure points, and don't invite many variants.

---

While --no-internal may be useful for high-level testing during a
refactor, I'm not sure it's a good idea to rely on it for _debugging_ a
refactor.

The whole point of internal testing is to catch low-level bugs early,
with as little unnecessary state as possible. Skipping these to debug
integration tests is a bit counterproductive!
2025-07-20 09:53:53 -05:00
4aaa928554 Renamed scmp -> cmp
- enum lfs3_scmp -> enum lfs3_cmp
- scmp -> cmp

lfs3_scmp_t is still used as the type, as the s prefix indicates the
type is signed, usually for muxing with error codes.

I think that led to the enum also being named lfs3_scmp, but that's not
quite right.

But none of this really matters because enums are so useless and broken
in C.
2025-07-18 18:38:29 -05:00
2e47172fa4 Renamed error -> err
- enum lfs3_error -> enum lfs3_err
- error -> err

Really this just updates `enum lfs3_err` to match the prefixes used
everywhere else. And because enum types are kind of useless in C, this
has no effect on any other part of the codebase.
2025-07-18 18:38:20 -05:00