Commit Graph

71 Commits

b01a385bc9 Added LFS3_F_REGBMAP and LFS3_F_COMPACTMETA
These are unlikely to make much progress, but that doesn't seem like a
great reason to disallow these flags in lfs3_format:

  LFS3_F_REGBMAP      0x00002000  Repopulate the gbmap
  LFS3_F_COMPACTMETA  0x00008000  Compact metadata logs

These are actually guaranteed to do _no_ work when formatting _without_
the gbmap, but with the gbmap it's less clear. Looking forward to the
planned ckfactory feature, these may be useful for cleaning up any rbyd
commits created as a part of building the initial gbmap.
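
A hypothetical invocation (the exact lfs3_format signature isn't shown
in this log, so the argument order here is an assumption):

  // hypothetical: format with the gbmap, and also ask format to
  // repopulate it and compact any leftover metadata commits
  int err = lfs3_format(&lfs3, &cfg,
          LFS3_F_GBMAP | LFS3_F_REGBMAP | LFS3_F_COMPACTMETA);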

---

Also tweaked the formatting for LFS3_F_* flags a bit, including making
all ifdefs explicit (mainly ifdef LFS3_RDONLY). Mixed ifdefs are a real
pain to read.

No code changes.
2025-11-13 16:14:56 -06:00
9e75138f7a Rearranged O/M/F/GC/I flags
Now that we don't need to encode tstate info in our traversal flags, we
can move things around to be a bit more comfortable.

This is also after some tweaking to make space for planned features:

O flags:

  O_MODE             0x00000003  ---- ---- ---- ---- ---- ---- ---- --11
  O_RDONLY           0x00000000  ---- ---- ---- ---- ---- ---- ---- ----
  O_WRONLY           0x00000001  ---- ---- ---- ---- ---- ---- ---- ---1
  O_RDWR             0x00000002  ---- ---- ---- ---- ---- ---- ---- --1-
  O_CREAT            0x00000004  ---- ---- ---- ---- ---- ---- ---- -1--
  O_EXCL             0x00000008  ---- ---- ---- ---- ---- ---- ---- 1---
  O_TRUNC            0x00000010  ---- ---- ---- ---- ---- ---- ---1 ----
  O_APPEND           0x00000020  ---- ---- ---- ---- ---- ---- --1- ----
  O_FLUSH            0x00000040  ---- ---- ---- ---- ---- ---- -1-- ----
  O_SYNC             0x00000080  ---- ---- ---- ---- ---- ---- 1--- ----
  O_DESYNC           0x00100000  ---- ---- ---1 ---- ---- ---- ---- ----
  O_DEDAG*           0x00000100  ---- ---- ---- ---- ---- ---1 ---- ----
  O_DEDUP*           0x00000200  ---- ---- ---- ---- ---- --1- ---- ----
  O_COMPR?           0x00000400  ---- ---- ---- ---- ---- -1-- ---- ----

  O_CKMETA           0x00010000  ---- ---- ---- ---1 ---- ---- ---- ----
  O_CKDATA           0x00020000  ---- ---- ---- --1- ---- ---- ---- ----
  O_REPAIRMETA*      0x00040000  ---- ---- ---- -1-- ---- ---- ---- ----
  O_REPAIRDATA*      0x00080000  ---- ---- ---- 1--- ---- ---- ---- ----

  o_WRSET            0x00000003  ---- ---- ---- ---- ---- ---- ---- --11
  o_TYPE             0xf0000000  1111 ---- ---- ---- ---- ---- ---- ----
  o_ZOMBIE           0x08000000  ---- 1--- ---- ---- ---- ---- ---- ----
  o_UNCREAT          0x04000000  ---- -1-- ---- ---- ---- ---- ---- ----
  o_UNSYNC           0x02000000  ---- --1- ---- ---- ---- ---- ---- ----
  o_UNCRYST          0x01000000  ---- ---1 ---- ---- ---- ---- ---- ----
  o_UNGRAFT          0x00800000  ---- ---- 1--- ---- ---- ---- ---- ----
  o_UNFLUSH          0x00400000  ---- ---- -1-- ---- ---- ---- ---- ----

  * Planned
  ? Hypothetical

T flags:

  T_MODE             0x00000001  ---- ---- ---- ---- ---- ---- ---- ---1
  T_RDONLY           0x00000000  ---- ---- ---- ---- ---- ---- ---- ----
  T_RDWR             0x00000001  ---- ---- ---- ---- ---- ---- ---- ---1
  T_MTREEONLY        0x00000002  ---- ---- ---- ---- ---- ---- ---- --1-
  T_EXCL             0x00000008  ---- ---- ---- ---- ---- ---- ---- 1---
  T_MKCONSISTENT     0x00000800  ---- ---- ---- ---- ---- 1--- ---- ----
  T_RELOOKAHEAD      0x00001000  ---- ---- ---- ---- ---1 ---- ---- ----
  T_REGBMAP          0x00002000  ---- ---- ---- ---- --1- ---- ---- ----
  T_PREERASE*        0x00004000  ---- ---- ---- ---- -1-- ---- ---- ----
  T_COMPACTMETA      0x00008000  ---- ---- ---- ---- 1--- ---- ---- ----
  T_CKMETA           0x00010000  ---- ---- ---- ---1 ---- ---- ---- ----
  T_CKDATA           0x00020000  ---- ---- ---- --1- ---- ---- ---- ----
  T_REPAIRMETA*      0x00040000  ---- ---- ---- -1-- ---- ---- ---- ----
  T_REPAIRDATA*      0x00080000  ---- ---- ---- 1--- ---- ---- ---- ----

  t_EVICT*           0x00000010  ---- ---- ---- ---- ---- ---- ---1 ----
  t_TYPE             0xf0000000  1111 ---- ---- ---- ---- ---- ---- ----
  t_ZOMBIE           0x08000000  ---- 1--- ---- ---- ---- ---- ---- ----
  t_CKPOINTED        0x04000000  ---- -1-- ---- ---- ---- ---- ---- ----
  t_DIRTY            0x02000000  ---- --1- ---- ---- ---- ---- ---- ----
  t_STALE            0x01000000  ---- ---1 ---- ---- ---- ---- ---- ----
  t_BTYPE            0x00f00000  ---- ---- 1111 ---- ---- ---- ---- ----

  * Planned

M/F flags:

  M_MODE             0x00000001  ---- ---- ---- ---- ---- ---- ---- ---1
  M_RDWR             0x00000000  ---- ---- ---- ---- ---- ---- ---- ----
  M_RDONLY           0x00000001  ---- ---- ---- ---- ---- ---- ---- ---1
  M_STRICT?          0x00000002  ---- ---- ---- ---- ---- ---- ---- --1-
  M_FORCE?           0x00000004  ---- ---- ---- ---- ---- ---- ---- -1--
  M_FORCEWITHRECKLESSABANDON?
                     0x00000008  ---- ---- ---- ---- ---- ---- ---- 1---
  M_FLUSH            0x00000040  ---- ---- ---- ---- ---- ---- -1-- ----
  M_SYNC             0x00000080  ---- ---- ---- ---- ---- ---- 1--- ----
  M_DEDAG*           0x00000100  ---- ---- ---- ---- ---- ---1 ---- ----
  M_DEDUP*           0x00000200  ---- ---- ---- ---- ---- --1- ---- ----
  M_COMPR?           0x00000400  ---- ---- ---- ---- ---- -1-- ---- ----
  M_REVDBG           0x00000010  ---- ---- ---- ---- ---- ---- ---1 ----
  M_REVNOISE         0x00000020  ---- ---- ---- ---- ---- ---- --1- ----
  M_CKPROGS          0x00100000  ---- ---- ---1 ---- ---- ---- ---- ----
  M_CKFETCHES        0x00200000  ---- ---- --1- ---- ---- ---- ---- ----
  M_CKMETAPARITY     0x00400000  ---- ---- -1-- ---- ---- ---- ---- ----
  M_CKMETAREDUND*    0x00800000  ---- ---- 1--- ---- ---- ---- ---- ----
  M_CKDATACKSUMS     0x01000000  ---- ---1 ---- ---- ---- ---- ---- ----
  M_CKREADS*         0x01800000  ---- ---1 1--- ---- ---- ---- ---- ----

  M_MKCONSISTENT     0x00000800  ---- ---- ---- ---- ---- 1--- ---- ----
  M_RELOOKAHEAD      0x00001000  ---- ---- ---- ---- ---1 ---- ---- ----
  M_REGBMAP          0x00002000  ---- ---- ---- ---- --1- ---- ---- ----
  M_PREERASE*        0x00004000  ---- ---- ---- ---- -1-- ---- ---- ----
  M_COMPACTMETA      0x00008000  ---- ---- ---- ---- 1--- ---- ---- ----
  M_CKMETA           0x00010000  ---- ---- ---- ---1 ---- ---- ---- ----
  M_CKDATA           0x00020000  ---- ---- ---- --1- ---- ---- ---- ----
  M_REPAIRMETA*      0x00040000  ---- ---- ---- -1-- ---- ---- ---- ----
  M_REPAIRDATA*      0x00080000  ---- ---- ---- 1--- ---- ---- ---- ----

  F_CKFACTORY*       0x00000002  ---- ---- ---- ---- ---- ---- ---- --1-
  F_GBMAP            0x02000000  ---- --1- ---- ---- ---- ---- ---- ----
  F_GDDTREE*         0x04000000  ---- -1-- ---- ---- ---- ---- ---- ----
  F_GPTREE*          0x08000000  ---- 1--- ---- ---- ---- ---- ---- ----

  F_METAR1*          0x10000000  ---1 ---- ---- ---- ---- ---- ---- ----
  F_METAR2*          0x20000000  --1- ---- ---- ---- ---- ---- ---- ----
  F_METAR3*          0x30000000  --11 ---- ---- ---- ---- ---- ---- ----
  F_DATAR1*          0x40000000  -1-- ---- ---- ---- ---- ---- ---- ----
  F_DATAR2*          0x80000000  1--- ---- ---- ---- ---- ---- ---- ----
  F_DATAR3*          0xc0000000  11-- ---- ---- ---- ---- ---- ---- ----

  * Planned
  ? Hypothetical

It's a bit concerning that _all_ 32-bit mount flags end up used, but
what can you do...
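
As a small usage sketch, the 2-bit mode field is intended to be read by
masking (names taken from the table above, with the usual LFS3_ prefix):

  // the low 2 bits of the O flags encode the open mode
  if ((flags & LFS3_O_MODE) == LFS3_O_RDWR) {
      // opened for both reading and writing
  }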

Code changes minimal:

                 code          stack          ctx
  before:       35964           2280          660
  after:        35968 (+0.0%)   2280 (+0.0%)  660 (+0.0%)

                 code          stack          ctx
  gbmap before: 38828           2296          772
  gbmap after:  38828 (+0.0%)   2296 (+0.0%)  772 (+0.0%)
2025-11-13 16:13:24 -06:00
673fa7876f Reduced the scope of LFS3_REVDBG/REVNOISE
LFS3_REVDBG introduced a lot of overhead for something I'm not sure
anyone will actually use (I have enough tooling that the state of an
rbyd is rarely a mystery, see dbgbmap.py). That, and we're running out
of flags!

So this reduces LFS3_REVDBG to just store one of "himb" in the first
(lowest) byte of the revision count; information that is easily
available:

  vvvv---- -------- -------- --------
  vvvvrrrr rrrrrr-- -------- --------
  vvvvrrrr rrrrrrnn nnnnnnnn nnnnnnnn
  vvvvrrrr rrrrrrnn nnnnnnnn dddddddd
  '-.''----.----''----.- - - '---.--'
    '------|----------|----------|---- 4-bit relocation revision
           '----------|----------|---- recycle-bits recycle counter
                      '----------|---- pseudorandom noise (if revnoise)
                                 '---- h, i, m, or b (if revdbg)
                             -11-1---  - h = mroot anchor
                             -11-1--1  - i = mroot
                             -11-11-1  - m = mdir
                             -11---1-  - b = btree node
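
As a rough sketch (not the actual lfs3 code), packing the debug
character into the lowest byte of the revision count looks something
like:

  // keep the relocation revision/recycle/noise bits, replace only the
  // lowest byte with the 'h'/'i'/'m'/'b' marker
  rev = (rev & ~(uint32_t)0xff) | (uint8_t)dbg;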

Some other notes:

- Enabled LFS3_REVDBG and LFS3_REVNOISE to work together, now that
  LFS3_REVDBG doesn't consume all unused rev bits.

  Note that LFS3_REVDBG has priority over LFS3_REVNOISE, but _not_
  recycle-bits, etc. Otherwise problems would happen for recycle-bits
  >2^20 (though do we care?).

- Fixed an issue where using the gcksum as a noise source results in
  noise=0 when there is only an mroot. This is due to how we xor out
  the current mdir cksum during an mdir commit.

  Fixed by using gcksum_p instead of gcksum.

- Added missing LFS3_I_REVDBG/REVNOISE flags in the tests, so now you
  can actually run the tests with LFS3_REVDBG/REVNOISE (this probably
  just fell out-of-date at some point).

---

Curiously, despite LFS3_REVDBG/REVNOISE being disabled by default, this
did save some code. I'm guessing the non-tail-call mtree/gbmap commit
functions prevented some level of inlining?:

                 code          stack          ctx
  before:       35964           2280          660
  after:        35964 (+0.0%)   2280 (+0.0%)  660 (+0.0%)

                 code          stack          ctx
  gbmap before: 38940           2296          772
  gbmap after:  38828 (-0.3%)   2296 (+0.0%)  772 (+0.0%)
2025-11-13 01:44:37 -06:00
4010afeafd trv: Reintroduced LFS3_T_EXCL
With the relaxation of traversal behavior under mutation, I think it
makes sense to bring back LFS3_T_EXCL, if only to allow traversals to
guarantee termination under mutation. Now that traversals no longer
guarantee forward progress, it's possible to get stuck looping
indefinitely if the filesystem is constantly being mutated.

Non-excl traversals are probably still useful for GC work and debugging
threads, but LFS3_T_EXCL now allows traversals to terminate immediately
with LFS3_ERR_BUSY at the first sign of unrelated filesystem mutation:

  LFS3_T_EXCL  0x00000008  Error if filesystem modified
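
For example, a checking traversal that bails out rather than looping
under concurrent writes might be opened like this (the read loop and
error handling are omitted/assumed):

  lfs3_trv_t trv;
  lfs3_trv_open(&lfs3, &trv, LFS3_T_CKMETA | LFS3_T_EXCL) => 0;
  // subsequent traversal reads return LFS3_ERR_BUSY if anything else
  // mutates the filesystem before the traversal completes
  ...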

Internally, we already track unrelated mutation to avoid corrupt state
(LFS3_t_DIRTY), so this is a very low-cost feature:

                 code          stack          ctx
  before:       35944           2280          660
  after:        35964 (+0.1%)   2280 (+0.0%)  660 (+0.0%)

                 code          stack          ctx
  gbmap before: 38916           2296          772
  gbmap after:  38940 (+0.1%)   2296 (+0.0%)  772 (+0.0%)

                 code          stack          ctx
  gc before:    36016           2280          768
  gc after:     36036 (+0.1%)   2280 (+0.0%)  768 (+0.0%)
2025-11-12 13:30:11 -06:00
e9f2944573 Renamed bshrub.shrub[_] -> bshrub.b[_]
Mostly for consistency with mtrv.b and gbmap.b, but also (1) this
hopefully reduces confusion around the fact that these can refer to both
bshrubs and btrees, and (2) saves a bit of typing with the messy struct
namespaces forced by C's strict aliasing.
2025-11-08 22:31:46 -06:00
14c369af93 trv: Adopted LFS3_t_STALE for marking block queue as stale
This solves the previous gc-needs-block-queue-so-we-can-clobber-block-
queue issue by adding an additional LFS3_t_STALE flag to indicate when
any block queues would be invalid.

So instead of clearing block queues in lfs3_alloc_ckpoint, we just set
LFS3_t_STALE, and any lfs3_trv_ts can clear their block queues in
lfs3_trv_read. This allows lfs3_mgc_ts to be allocated without a block
queue when doing any LFS3_M_*/LFS3_F_*/LFS3_GC_* work.

LFS3_t_STALE is set at the same time as LFS3_t_CKPOINTED and LFS3_t_DIRTY,
but we need a separate bit so lfs3_trv_read can clear the flag after
flushing without losing ckpoint/dirty information.
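
A minimal sketch of the idea (field names here are assumptions):

  // in lfs3_trv_read: lazily drop the block queue if a checkpoint
  // marked it stale, instead of eagerly clearing every queue in
  // lfs3_alloc_ckpoint
  if (trv->flags & LFS3_t_STALE) {
      trv->blocks[0] = -1;
      trv->blocks[1] = -1;
      trv->flags &= ~LFS3_t_STALE;
  }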

---

Unfortunately, none of the stack-allocated lfs3_mgc_ts are on the stack
hot-path, so we don't see immediate savings. But note the 2 words saved
in ctx when compiling in LFS3_GC mode:

                 code          stack          ctx
  before:       35940           2280          660
  after:        35944 (+0.0%)   2280 (+0.0%)  660 (+0.0%)

                 code          stack          ctx
  gbmap before: 38916           2296          772
  gbmap after:  38916 (+0.0%)   2296 (+0.0%)  772 (+0.0%)

                 code          stack          ctx
  gc before:    36012           2280          776
  gc after:     36016 (+0.0%)   2280 (+0.0%)  768 (-1.0%)
2025-11-08 22:31:42 -06:00
d1d69c0a52 trv: Greatly simplified filesystem traversal
The main idea here is to drop the flag-encoded tstate state machine, and
replace it with a matrix controlled by special mid + bid values:

                    -- mid ->
             -5   -4   -3   -2 >=-1
  bid   -2    x    x              x  --> mdir
   v  >=-1         x  gbm  gbm    x  --> bshrub/btree

              '----|----|----|----|----> mroot anchor
                   '----|----|----|----> mroot chain + mtree
                        '----|----|----> gbmap   (in-ram gbmap)
                             '----|----> gbmap_p (on-disk gbmap)
                                  '----> file bshrubs/btrees

This was motivated by the observation that everything in our filesystem
can be modeled as mdir + bshrub/btree tuples, as long as some states are
noops. And we can cleanly encode these tuples in the unused negative
mid + bid ranges without needing an explicit state machine.

Well, that and the previous tstate state machine approach being an ugly
pile of switch cases and messy logic.
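
In rough C, the matrix above reduces to a couple of range checks on the
special mid values (a sketch, not the actual lfs3 code):

  // what does this traversal position need to visit?
  int visits_mdir  = (mid == -5 || mid == -4 || mid >= -1);
  int visits_btree = (mid == -4 || mid == -3 || mid == -2 || mid >= -1);
  // and at mid=-3/-2 the btree being traversed is the gbmap/gbmap_p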

Note though that some mids may need to traverse multiple mdirs/bshrub/
btrees:

- The mroot chain + mtree (mid=-4) needs to traverse all mroots in the
  mroot chain, and detect any cycles.

- File mdirs (mid>=-1) need to traverse both the on-disk bshrub/btree
  and any opened file handles' bshrubs/btrees before moving onto the
  next mid.

  This grows O(n^2) because all file handles are in one big unsorted
  linked-list, but as usual we don't care.

In addition to the greatly simplified traversal logic, the new state
matrix simplifies traversal clobbering: Setting bid=-2 always forces a
bshrub/btree refetch.

This comes at the cost of traversal _precision_, i.e. we can now revisit
previously visited bshrub/btree nodes. But I think this is well worth it
for more robust traversal clobbering. Traversal clobbering is delicate
and difficult to get right.

Besides, we can already revisit blocks due to CoW references, so what's
the harm in revisiting blocks when under mutation?

---

The simpler traversal logic leads to a nice amount of code savings
across the board:

                 code          stack          ctx
  before:       36476           2304          660
  after:        35940 (-1.5%)   2280 (-1.0%)  660 (+0.0%)

                 code          stack          ctx
  gbmap before: 39524           2320          772
  gbmap after:  38916 (-1.5%)   2296 (-1.0%)  772 (+0.0%)

                 code          stack          ctx
  gc before:    36548           2304          804
  gc after:     36012 (-1.5%)   2280 (-1.0%)  776 (-3.5%)

Note the ctx savings in LFS3_GC mode. Most of the stack/ctx savings
comes from the smaller lfs3_mtrv_t struct, which no longer needs to
stage bshrubs (we no longer care about bshrubs across mdir commit as a
part of the above clobbering simplifications):

                before  after
  lfs3_mtrv_t:     128    100 (-21.9%)
  lfs3_mgc_t:      128    100 (-21.9%)
  lfs3_trv_t:      136    108 (-20.6%)

Unfortunately, the simpler clobbering means that any gc work now needs the
block queue (i.e. lfs3_trv_t), solely so clobbering the block queue
doesn't clobber unallocated memory. Not great but hopefully fixable.

---

Some other notes:

- As a part of simplifying traversal clobbering, everything is triggered
  by lfs3_alloc_ckpoint (via lfs3_trv_ckpoint_).

  This may clobber traversals more than is strictly necessary, but
  that's kinda the idea. Better safe than sorry.

  And no longer needing explicit lfs3_handle_clobber calls is nice.

- Opened file handle iteration is now tracked by the traversal handle's
  position in the handle linked-list, instead of a separate handle
  pointer. This means one less thing to disentangle and makes traversals
  no longer a special case for things like lfs3_handle_close.

  You may think this bumps traversals up to O(n^3) in-ram, but because
  we only ever visit each unique handle + mid once, we can keep the
  total O(n^2) if we're smart about linked-list updates!

- lfs3_mdir_commit needed to be tweaked to accept mids<=-1, instead of
  just mid=-1 for the mroot. Unfortunately I don't know how much this
  costs on its own.

- The reorganization of lfs3_mtrv_t means lfs3_mtortoise_t gets its own
  struct again!

- No more tstate state machine also frees up a big chunk of the
  traversal flag space, which was getting pretty cramped.
2025-11-08 19:46:22 -06:00
a01b1b73b2 btree: Moved leaf caching behind LFS3_BLEAFCACHE ifdef
This is motivated by the observation that the O(n log_b n) btree
iteration really just hasn't been a bottleneck in our benchmarks.

Our write performance is mostly dominated by compaction costs, and while
filesystem _traversals_ are a concern, it's easy to explicitly track
rbyds in lfs3_btrv_t.

Additionally:

- We track mdirs during mtree iteration, which are the true mtree
  leaves.

- We already cache file leaves, i.e. bptrs and read-fragments.

On top of this, leaf caching adds complexity, both in terms of
code/stack costs and in terms of reliability. It introduces the
need for cache invalidation, which is infamously one of the two hard
problems in computer science!

This is the second(?) time btree leaf traversals have been reverted, so
see previous commit messages for even more arguments against.

---

Eventually, we should probably just delete the btree leaf cache logic to
avoid the maintenance headache (cache invalidation + opt-in/less
testing = ouch). But I want to do a bit more benchmarking comparing the
two modes, so just moving this behind an ifdef for now.

Saves code, and of course RAM:

                              code          stack          ctx
  before btrv:               37160           2352          688
  before:                    37088 (-0.2%)   2384 (+1.4%)  688 (+0.0%)
  after:                     36480 (-1.8%)   2304 (-2.0%)  660 (-4.1%)

But note that while this keeps the performance benefits of btree leaf
caching (when enabled), it does not keep the code/stack optimizations
that internally reused the leaf cache for things (btrv, lookupnext_ rbyd
side-channel, etc).

In _theory_ these could have been kept with enough ifdefs, but it would
have made the codebase hell to maintain:

                              code          stack          ctx
  always-bleafcache:         37160           2352          688
  no-bleafcache:             36480 (-1.8%)   2304 (-2.0%)  660 (-4.1%)
  yes-bleafcache:            37044 (-0.3%)   2384 (+1.4%)  688 (+0.0%)

Gbmap mode has even more savings due to how many gbmap copies we have
flying around:

                              code          stack          ctx
  gbmap + always-bleafcache: 40132           2368          856
  gbmap + no-bleafcache:     39464 (-1.7%)   2320 (-2.0%)  772 (-9.8%)
  gbmap + yes-bleafcache:    40052 (-0.2%)   2400 (+1.4%)  856 (+0.0%)

---

In the future, _maybe_ we can revisit this. But I think a better design
would be to cache btree leaves globally, in lfs3_t, similarly to the
theoretical mdir cache. This would allow a user-configurable number of
cached btree nodes, and may make cache invalidation easier.

Note, however, that btree nodes don't need to be fetched (even for
commits now!), so the benefits would be much smaller than for the
theoretical mdir cache.

But hey, it would defend the lack of low-level rbyd tracking during
iteration/rattr queries!
2025-10-26 15:33:27 -05:00
39a265ce90 btree: Dropped reliance on leaf cache during traversals
Brings back lfs3_btrv_t, but keeps some of the btree internal changes.

I think the biggest one is dropping the internal branch pointer: now,
instead of internally pointing to the root rbyd, we just unconditionally
sync the rbyd state anytime the rbyd matches the root's weight. This is
necessary to avoid out-of-sync state when traversing bshrubs under
mutation.

Also after refactoring I think the current btree traversal logic is
easier to read.

---

This is in preparation for removing the leaf cache, or at least making
it opt-in.

It adds a chunk of stack, but in theory we can reclaim this by allowing
leaf caches to be disabled:

           code          stack          ctx
  before: 37160           2352          688
  after:  37088 (-0.2%)   2384 (+1.4%)  688 (+0.0%)
2025-10-25 16:54:41 -05:00
5d905e6da4 Dropped LFS3_KVONLY and LFS3_2BONLY modes for now
I think these are good ideas to bring back when littlefs3 is more
mature, but at the moment the number of different builds is creating too
much friction.

LFS3_KVONLY and LFS3_2BONLY in particular _add_ significant chunks of
code (lfs3_file_readget_, lfs3_file_flushset_, and various extra logic
sprinkled throughout the codebase), and the current state of testing
means I have no idea if any of it still works.

These are also low-risk for introducing any disk related changes.

So, ripping out for now to keep the current experimental development
tractable. May reintroduce in the future (probably after littlefs3 is
stabilized) if there is sufficient user interest. But doing so will
probably also need to come with actual testing in CI.
2025-10-24 00:20:53 -05:00
207446223b rdonly: Fixed various LFS3_RDONLY compile errors
This just fell out-of-sync a bit during the gbmap work. Note we _do_
support LFS3_RDONLY + LFS3_GBMAP, as fetching the gbmap is necessary for
CKMETA to check all metadata. Fortunately this is relatively cheap:

                 code          stack          ctx
  rdonly:       10716            896          532
  rdonly+gbmap: 10988 (+2.5%)    896 (+0.0%)  680 (+27.8%)

Though this does highlight that a sort of LFS3_NO_TRV mode could remove
quite a bit of code.
2025-10-24 00:19:49 -05:00
3ab7ecb2b0 Renamed file_cache -> fcache and gbmap_re -> regbmap
This walks back some of the attempt at strict object namespacing in
struct lfs3_cfg:

- cfg.file_cache_size  -> cfg.fcache_size
- filecfg.cache_size   -> filecfg.fcache_size
- filecfg.cache_buffer -> filecfg.fcache_buffer
- cfg.gbmap_re_thresh  -> cfg.regbmap_thresh

Motivation:

- cfg.regbmap_thresh now matches cfg.gc_regbmap_thresh, instead of using
  awkwardly different namespacing patterns.

- Giving fcache a more unique name is useful for discussion. Having
  pcache, rcache, and then file_cache was a bit awkward.

  Hopefully it's also more clear that cfg.fcache_size and
  filecfg.fcache_size are related.

- Config in struct lfs3_cfg is named a bit more consistently, well, if
  you ignore gc_*_* options.

- Less typing.

Though this gets into pretty subjective naming territory. May revert
this if the new terms are uncomfortable after use.
2025-10-24 00:18:54 -05:00
b49d9e9ece Renamed REPOP* -> RE*
So:

- cfg.gc_repoplookahead_thresh -> cfg.gc_relookahead_thresh
- cfg.gc_repopgbmap_thresh     -> cfg.gc_regbmap_thresh
- cfg.gbmap_repop_thresh       -> cfg.gbmap_re_thresh
- LFS3_*_REPOPLOOKAHEAD        -> LFS3_*_RELOOKAHEAD
- LFS3_*_REPOPGBMAP            -> LFS3_*_REGBMAP

Mainly trying to reduce the mouthful that is REPOPLOOKAHEAD and
REPOPGBMAP.

As a plus this also avoids potential confusion of "repop" as a push/pop
related operation.
2025-10-24 00:16:37 -05:00
8a58954828 trv: Reduced LFS3_t_CKPOINTED + LFS3_t_MUTATED -> LFS3_t_CKPOINTED
This drops LFS3_t_MUTATED in favor of just using LFS3_t_CKPOINTED
everywhere:

1. These meant roughly the same thing, with LFS3_t_MUTATED being a bit
   tighter at the cost of needing to be explicitly set.

2. The implicit setting of LFS3_t_CKPOINTED by lfs3_alloc_ckpoint -- a
   function that already needs to be called before mutation -- means we
   have one less thing to worry about.

   Implicit properties like LFS3_t_CKPOINTED are great for building a
   reliable system. Manual flags like LFS3_t_MUTATED, not so much.

3. Why use two flags when we can get away with one?

The only downside is we may unnecessarily clobber gc/traversal work when
we don't actually mutate the filesystem. Failed file open calls are a
good example.

However this tradeoff seems well worth it for an overall simpler +
more reliable system.

---

Saves a bit of code:

                 code          stack          ctx
  before:       37220           2352          688
  after:        37160 (-0.2%)   2352 (+0.0%)  688 (+0.0%)

                 code          stack          ctx
  gbmap before: 40184           2368          856
  gbmap after:  40132 (-0.1%)   2368 (+0.0%)  856 (+0.0%)
2025-10-24 00:12:32 -05:00
5d70e47708 trv: Reverted LFS3_t_NOSPC, forward gbmap repop errors
Note: This affects the blocking lfs3_alloc_repopgbmap as well as
incremental gc/traversal repopulations. Now all repop attempts return
LFS3_ERR_NOSPC when we don't have space for the gbmap, motivation below.

This reverts the previous LFS3_t_NOSPC soft error, in which traversals
were allowed to continue some gc/traversal work when encountering
LFS3_ERR_NOSPC. This results in a simpler implementation and fewer error
cases to worry about.

Observation/motivation:

- The main motivation is noticing that when we're in low-space
  conditions, we just start spamming gbmap repops even if they all fail.

  That's really not great! We might as well just mark the flash as dead
  if we're going to start spamming erases!

  At least with an error the user can call rmgbmap to try to make
  progress.

- If we're in a low-space condition, something else will probably return
  LFS3_ERR_NOSPC anyways. Might as well report this early and simplify
  our system.

- It's a simpler model, and littlefs3 is already much more complicated
  than littlefs2. Maybe we should lean more towards a simpler system
  at the cost of some niche optimizations.

---

This had the side-effect of causing more lfs3_alloc_ckpoints to return
errors during testing, which revealed a bug in our uz/uzd_fuzz tests:

- We weren't flushing after writes to the opened RDWR files, which could
  cause delayed errors to occur during the later read checks in the
  test.

  Fortunately LFS3_O_FLUSH provides a quick and easy fix!

  Note we _don't_ adopt this in all uz/uzd_fuzz tests, only those that
  error. It's good to test both with and without LFS3_O_FLUSH to test
  that read-flushing also works under stress.

Saves a bit of code:

                 code          stack          ctx
  before:       37260           2352          688
  after:        37220 (-0.1%)   2352 (+0.0%)  688 (+0.0%)

                 code          stack          ctx
  gbmap before: 40220           2368          856
  gbmap after:  40184 (-0.1%)   2368 (+0.0%)  856 (+0.0%)
2025-10-24 00:03:14 -05:00
f892d299dd trv: Added LFS3_t_NOSPC, avoid ENOSPC errors in traversals
This relaxes errors encountered during lfs3_mtree_gc to _not_ propagate,
but instead just log a warning and prevent the relevant work from being
checked off during EOT.

The idea is this allows other work to make progress in low-space
conditions.

I originally meant to limit this to gbmap repopulations, to match the
behavior of lfs3_alloc_repopgbmap, but I think extending the idea to all
filesystem mutating operations makes sense (LFS3_T_MKCONSISTENT +
LFS3_T_REPOPGBMAP + LFS3_T_COMPACTMETA).

---

To avoid incorrectly marking traversal work as completed, we need to
track if we hit any ENOSPC errors, thus the new LFS3_t_NOSPC flag:

  LFS3_t_NOSPC  0x00800000  Optional gc work ran out of space

Not the happiest just throwing flags at problems, but I can't think of a
better solution at the moment.

This doesn't differentiate between ENOSPC errors during the different
types of work, but in theory if we're hitting ENOSPC errors whatever
work returns the error is a toss-up anyways.

---

Adds a bit of code:

                 code          stack          ctx
  before:       37208           2352          688
  after:        37248 (+0.1%)   2352 (+0.0%)  688 (+0.0%)

                 code          stack          ctx
  gbmap before: 40120           2368          856
  gbmap after:  40204 (+0.2%)   2368 (+0.0%)  856 (+0.0%)
2025-10-24 00:00:39 -05:00
12874bff76 gbmap: Added gc_repoplookahead_thresh and gc_repopgbmap_thresh
To allow relaxing when LFS3_I_REPOPLOOKAHEAD and LFS3_I_REPOPGBMAP will
be set, potentially reducing gc workload after allocating only a couple
blocks.

The relevant cfg comments have quite a bit more info.

Note that -1 (not the default, which is 0; maybe we should explicitly
flip this?) restores the previous behavior of setting these flags on the
first block allocation.

---

Also tweaked gbmap repops during gc/traversals to _not_ try to repop
unless LFS3_I_REPOPGBMAP is set. We probably should have done this from
the beginning since repopulating the gbmap writes to disk and is
potentially destructive.

Adds code, though hopefully we can claw this back with future config
rework:

                 code          stack          ctx
  before:       37176           2352          684
  after:        37208 (+0.1%)   2352 (+0.0%)  688 (+0.6%)

                 code          stack          ctx
  gbmap before: 40024           2368          848
  gbmap after:  40120 (+0.2%)   2368 (+0.0%)  856 (+0.9%)
2025-10-23 23:56:50 -05:00
1dc1a26f11 gc: Added LFS3_GC_ALL to make running all gc work easier
This is an alias for all possible gc work, which is a bit more
complicated than you might think due to compile-time features (example:
LFS3_GC_REPOPGBMAP).

The intention is to make loops like the following easy to write:

  struct lfs3_fsinfo fsinfo;
  lfs3_fs_stat(&lfs3, &fsinfo) => 0;

  lfs3_trv_t trv;
  lfs3_trv_open(&lfs3, &trv, fsinfo.flags & LFS3_GC_ALL) => 0;
  ...

It's possible to do this by explicitly setting all gc flags, but that
requires quite a bit of knowledge from the user.

Another option is allowing -1 for gc/traversal flags, but that loses
assert protection against unknown/misplaced flags.

---

This raises more questions about the prefix naming: it feels a bit weird
to take LFS3_I_* flags, mask with LFS3_GC_* flags, and pass them as
LFS3_T_* flags, but it gets the job done.

Limiting LFS3_GC_ALL to the LFS3_GC_* namespace avoids issues with
opt-out/mode flags such as LFS3_T_RDONLY, LFS3_T_MTREEONLY, etc. For
this reason it probably doesn't make sense to add something similar to
the other namespaces.
2025-10-23 23:55:54 -05:00
1f824a029b Renamed LFS3_T_COMPACT -> LFS3_T_COMPACTMETA (and gc_compactmeta_thresh)
- LFS3_T_COMPACT -> LFS3_T_COMPACTMETA
- gc_compact_thresh -> gc_compactmeta_thresh

And friends:

  LFS3_M_COMPACTMETA   0x00000800  Compact metadata logs
  LFS3_GC_COMPACTMETA  0x00000800  Compact metadata logs
  LFS3_I_COMPACTMETA   0x00000800  Filesystem may have uncompacted metadata
  LFS3_T_COMPACTMETA   0x00000800  Compact metadata logs

---

This does two things:

1. Highlights that LFS3_T_COMPACTMETA only interacts with metadata logs,
   and has no effect on data blocks.

2. Better matches the verb+noun names used for other gc/traversal flags
   (REPOPGBMAP, CKMETA, etc).

It is a bit more of a mouthful, but I'm not sure that's entirely a bad
thing. These are pretty low-level flags.
2025-10-23 23:54:57 -05:00
9bdfb25a09 Renamed LFS3_T_LOOKAHEAD -> LFS3_T_REPOPLOOKAHEAD
And friends:

  LFS3_M_REPOPLOOKAHEAD   0x00000200  Repopulate lookahead buffer
  LFS3_GC_REPOPLOOKAHEAD  0x00000200  Repopulate lookahead buffer
  LFS3_I_REPOPLOOKAHEAD   0x00000200  Lookahead buffer is not full
  LFS3_T_REPOPLOOKAHEAD   0x00000200  Repopulate lookahead buffer

To match LFS3_T_REPOPGBMAP, which is more-or-less the same operation.
Though this does turn into quite the mouthful...
2025-10-23 23:54:02 -05:00
ced63a4c73 Renamed inline_size -> shrub_size
There's a strong argument for naming this inline_size as that's more
likely what users expect, but shrub_size is just the more correct name
and avoids confusion around having multiple names for the same thing.

It also highlights that shrubs in littlefs3 are a bit different than
inline files in littlefs2, and that this config also affects large files
with a shrubbed root.

May rerevert this in the future, but probably only if there is
significant user confusion.
2025-10-23 23:53:02 -05:00
3b4e1e9e0b gbmap: Renamed gbmap_rebuild_thresh -> gbmap_repop_thresh
And tweaked a few related comments.

I'm still on the fence about this name (I don't think it's great), but
it at least better describes the "repopulation" operation than
"rebuilding". The important distinction is that we don't throw away
information. Bad/erased block info (future) is still carried over into
the new gbmap snapshot, and persists unless you explicitly call
rmgbmap + mkgbmap.

So, adopting gbmap_repop_thresh for now to see if it's just a habit
thing, but may adopt a different name in the future.

As a plus, gbmap_repop_thresh is two characters shorter.
2025-10-23 23:51:18 -05:00
fb90bf976c trv: Split lfs3_trv_t -> lfs3_trv_t, lfs3_mgc_t, and lfs3_mtrv_t
A big downside of LFS3_T_REBUILDGBMAP is the addition of an lfs3_btree_t
struct to _every_ traversal object.

Unfortunately, I don't see a way around this. We need to track the new
gbmap snapshot _somewhere_, and other options (such as a global gbmap.b_
snapshot) just move the RAM around without actually saving anything.

To at least mitigate this internally, this splits lfs3_trv_t into
distinct lfs3_trv_t, lfs3_mgc_t, and lfs3_mtrv_t structs that capture
only the relevant state for internal traversal layers:

- lfs3_mtree_traverse <- lfs3_mtrv_t
- lfs3_mtree_gc       <- lfs3_mgc_t (contains lfs3_mtrv_t)
- lfs3_trv_read       <- lfs3_trv_t (contains lfs3_mgc_t)
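
Roughly, the layering looks like this (member names and contents here
are illustrative assumptions, not the actual lfs3 definitions):

  typedef struct lfs3_mtrv {
      // raw mtree traversal state
      uint32_t flags;
  } lfs3_mtrv_t;

  typedef struct lfs3_mgc {
      // adds gc work state, including the gbmap rebuild snapshot
      lfs3_mtrv_t t;
  } lfs3_mgc_t;

  typedef struct lfs3_trv {
      // adds the 2-block block queue used by lfs3_trv_read
      lfs3_mgc_t gc;
  } lfs3_trv_t;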

This minimizes the impact of the gbmap rebuild snapshots, and saves a
big chunk of RAM. As a plus it also saves RAM in the default build by
limiting the 2-block block queue to the high-level lfs3_trv_read API:

                 code          stack          ctx
  before:       37176           2360          684
  after:        37176 (+0.0%)   2352 (-0.3%)  684 (+0.0%)

                 code          stack          ctx
  gbmap before: 40060           2432          848
  gbmap after:  40024 (-0.1%)   2368 (-2.6%)  848 (+0.0%)

The main downside? Our field names are continuing in their
ridiculousness:

  lfs3.gc.gc.t.b.h.flags // where else would the global gc flags be?
2025-10-23 23:49:58 -05:00
06bc4dff04 trv: Simplified MUTATED/DIRTY flags, no more swapping
A bit less simplified than I hoped: we don't _strictly_ need both
LFS3_t_DIRTY + LFS3_t_MUTATED if we're ok with either (1) making
multiple passes to confirm fixorphans succeeded or (2) clearing the
COMPACT flag after one pass (which may introduce new uncompacted
metadata). But both of these have downsides, and we're not _that_
stressed for flag
space yet...

So keeping all three of:

  LFS3_t_DIRTY      0x04000000  Filesystem modified outside traversal
  LFS3_t_MUTATED    0x02000000  Filesystem modified during traversal
  LFS3_t_CKPOINTED  0x01000000  Filesystem ckpointed during traversal

But I did manage to get rid of the bit swapping by tweaking LFS3_t_DIRTY
to imply LFS3_t_MUTATED instead of being exclusive. This removes the
"failed" gotos in lfs3_mtree_gc and makes things a bit more readable.

---

I also split lfs3_fs/handle_clobber into separate lfs3_fs/handle_clobber
and lfs3_fs/handle_mutate functions. This added a bit of code, but I
think it's worth it for a simpler internal API. A confusing internal API
is no good.

In total these simplifications saved a bit of code:

                 code          stack          ctx
  before:       37208           2360          684
  after:        37176 (-0.1%)   2360 (+0.0%)  684 (+0.0%)

                 code          stack          ctx
  gbmap before: 40100           2432          848
  gbmap after:  40060 (-0.1%)   2432 (+0.0%)  848 (+0.0%)
2025-10-23 23:41:43 -05:00
f5508a1b6c gbmap: Added LFS3_T_REBUILDGBMAP and friends
This adds LFS3_T_REBUILDGBMAP and friends, and enables incremental gbmap
rebuilds as a part of gc/traversal work:

  LFS3_M_REBUILDGBMAP   0x00000400  Rebuild the gbmap
  LFS3_GC_REBUILDGBMAP  0x00000400  Rebuild the gbmap
  LFS3_I_REBUILDGBMAP   0x00000400  The gbmap is not full
  LFS3_T_REBUILDGBMAP   0x00000400  Rebuild the gbmap

On paper, this is more or less identical to repopulating the lookahead
buffer -- traverse the filesystem, mark blocks as in-use, adopt the new
gbmap/lookahead buffer on success -- but a couple nuances make
rebuilding the gbmap a bit trickier:

- Unlike the lookahead buffer, which eagerly zeros during allocation, we
  need an explicit zeroing pass before we start marking blocks as
  in-use. This means multiple traversals can potentially conflict with
  each other, risking the adoption of a clobbered gbmap.

- The gbmap, which stores information on disk, relies on block
  allocation and the temporary "in-flight window" defined by allocator
  ckpoints to avoid circular block states during gbmap rebuilds. This
  makes gbmap rebuilds sensitive to allocator ckpoints, which we
  consider more-or-less a noop in other parts of the system.

  Though now that I'm writing this, it might have been possible to
  instead include gbmap rebuild snapshots in fs traversals... but that
  would probably have been much more complicated.

- Rebuilding the gbmap requires writing to disk and is generally much
  more expensive/destructive. We want to avoid trying to rebuild the
  gbmap when it's not possible to actually make progress.

On top of this, the current trv-clobber system is a delicate,
error-prone mess.

---

To simplify everything related to gbmap rebuilds, I added a new
internal traversal flag: LFS3_t_CKPOINTED:

  LFS3_t_CKPOINTED  0x04000000  Filesystem ckpointed during traversal

LFS3_t_CKPOINTED is set, unconditionally, on all open traversals in
lfs3_alloc_ckpoint, and provides a simple, robust mechanism for checking
if _any_ allocator checkpoints have occurred since a traversal was
started. Since lfs3_alloc_ckpoint is required before any block
allocation, this provides a strong guarantee that nothing funny happened
to any allocator state during a traversal.
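
A minimal sketch of the mechanism (the handle list and type check here
are assumptions):

  // in lfs3_alloc_ckpoint: mark every open traversal, so each one can
  // later tell that allocator state may have changed underneath it
  for (lfs3_handle_t *h = lfs3->handles; h; h = h->next) {
      if (lfs3_handle_istrv(h)) {
          h->flags |= LFS3_t_CKPOINTED;
      }
  }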

This makes lfs3_alloc_ckpoint a bit less cheap, but the strong
guarantees that allocator state is unmodified during traversal are well
worth it.

This makes both lookahead and gbmap passes simpler, safer, and easier to
reason about.

I'd like to adopt something similar+stronger for LFS3_t_MUTATED, and
reduce this back to two flags, but that can be a future commit.

---

Unfortunately due to the potential for recursion, this ended up reusing
less logic between lfs3_alloc_rebuildgbmap and lfs3_mtree_gc than I had
hoped, but at least the main chunks (lfs3_alloc_remap,
lfs3_gbmap_setbptr, lfs3_alloc_adoptgbmap) could be split out into
common functions.

The result is a decent chunk of code and stack, but the value is high as
incremental gbmap rebuilds are the only option to reduce the latency
spikes introduced by the gbmap allocator (it's not significantly worse
than the lookahead buffer, but both do require traversing the entire
filesystem):

                 code          stack          ctx
  before:       37164           2352          684
  after:        37208 (+0.1%)   2360 (+0.3%)  684 (+0.0%)

                 code          stack          ctx
  gbmap before: 39708           2376          848
  gbmap after:  40100 (+1.0%)   2432 (+2.4%)  848 (+0.0%)

Note the gbmap build is now measured with LFS3_GBMAP=1, instead of
LFS3_YES_GBMAP=1 (maybe-gbmap) as before. This includes the cost of
mkgbmap, lfs3_f_isgbmap, etc.
2025-10-23 23:39:55 -05:00
61dc21ccb7 gbmap: Renamed/moved lookahead.bmapped -> gbmap.known
And:

- Tweaked the behavior of gbmap.window/known to _not_ match disk.
  gbmap.known matching disk is what required a separate
  lookahead.bmapped in the first place, but we never use both fields.

- _Don't_ revert gbmap on failed mdir commits!

  This was broken! If we reverted we risked inheriting outdated
  in-flight block information.

  This could be fixed by also zeroing lookahead.bmapped, but would force
  a gbmap rebuild. And why? The only interaction between mdir commit and
  the gbmap is block allocation, which is intentionally allowed to go
  out-of-sync to relax issues like this.

  Note we still revert in lfs3_fs_grow, as the new gbmap we create
  there is incompatible with the previous disk size.

As a part of these changes, gbmap.window now behaves roughly the same as
gbmap.known and updates eagerly on block allocation.

This makes lookahead.window and gbmap.window somewhat redundant, but
simplifies the relevant logic (especially due to how lookahead.window
lags behind lookahead.off).

---

A bunch of bugs fell out of this, the interactions with lfs3_fs_mkgbmap
and lfs3_fs_grow being especially tricky, but fortunately our testing is
doing a good job.

At least the code changes were minimal, saves a bit of RAM:

                       code          stack          ctx
  no-gbmap before:    37168           2352          684
  no-gbmap after:     37168 (+0.0%)   2352 (+0.0%)  684 (+0.0%)

                       code          stack          ctx
  maybe-gbmap before: 39688           2392          852
  maybe-gbmap after:  39720 (+0.1%)   2376 (-0.7%)  848 (-0.5%)

                       code          stack          ctx
  yes-gbmap before:   39156           2392          852
  yes-gbmap after:    39208 (+0.1%)   2376 (-0.7%)  848 (-0.5%)
2025-10-17 14:02:47 -05:00
b5a94f3397 gbmap: Added mkgbmap and rmgbmap for enabling/disabling the gbmap
These two functions allow changing whether or not the gbmap is in use
after format:

  // Enable the global on-disk block-map
  //
  // Returns a negative error code on failure. Does nothing if a gbmap
  // already exists.
  int lfs3_fs_mkgbmap(lfs3_t *lfs3);

  // Disable the global on-disk block-map
  //
  // Returns a negative error code on failure. Does nothing if no gbmap
  // is found.
  int lfs3_fs_rmgbmap(lfs3_t *lfs3);

rmgbmap was easy enough, but implementing mkgbmap turned out to be
surprisingly tricky due to how gstate permeates the system:

- Even if we zero gstate when removing the gbmap, mounting the
  image on a driver that doesn't understand the gbmap results in garbage
  gstate over time as mdir compacts drop unknown gdeltas.

  I think this sort of implicit gdelta cleanup is a good thing, but the
  possibility of garbage gstate is a bit annoying.

  Example A: the dbg scripts are currently printing a bunch of warnings
  for corrupt gstate that can be safely ignored.

  To support recovering from garbage gstate in mkgbmap, I changed
  lfs3_fs_commitgdelta to _always_ track p state even when disabled. We
  already needed to do this in lfs3_fs_flush/consumegdelta anyways,
  since we don't know if the gbmap is used until parsing wcompat flags.

- The commit that enables the gbmap is tricky. We need the gbmap enabled
  to calculate the new gdelta, but we also need it disabled so we don't
  traverse the existing gbmap_p (which may be garbage).

  As a workaround I added gbmap.b_p, which is in theory redundant with
  gbmap_p, but (1) avoids needing to decode gbmap_p during traversals,
  and (2) allows the two to temporarily fall out-of-sync in mkgbmap.

  This means we potentially have 5 (!) snapshots flying around when
  rebuilding the gbmap, which is starting to get a bit silly. But this
  was also motivated by gbmap_p decoding adding roughly the same amount
  of RAM to lfs3_mtree_traverse_, so the total RAM usage should in
  theory be roughly the same.

  There might be a better solution, but this at least gets mkgbmap
  working. The gbmap builds are not our most RAM-sensitive configurations
  anyways.

---

Also added a couple more tests in test_gbmap to test these:

- test_gbmap_files
- test_gbmap_rmgbmap
- test_gbmap_mkgbmap
- test_gbmap_rmmkgbmap
- test_gbmap_mkrmgbmap

And an explicit wraparound test to test_alloc. This was loosely implied
by the nospc tests, but it's probably better to have an explicit test.
The only downside is this implementation is limited to files:

- test_alloc_wraparound_files

---

Note we are currently dealing with three different configurations:
no-gbmap (the default), yes-gbmap (LFS3_YES_GBMAP), and maybe-gbmap
(LFS3_GBMAP + LFS3_F_GBMAP at runtime).

It only makes sense to include these in maybe-gbmap mode, so this is the
only mode with a notable code increase. However these functions are
relatively cheap. The stack/ctx changes also affect yes-gbmap, but
should mostly cancel out, see above:

                       code          stack          ctx
  no-gbmap before:    37168           2352          684
  no-gbmap after:     37168 (+0.0%)   2352 (+0.0%)  684 (+0.0%)

                       code          stack          ctx
  maybe-gbmap before: 39292           2456          800
  maybe-gbmap after:  39688 (+1.0%)   2392 (-2.6%)  852 (+6.5%)

                       code          stack          ctx
  yes-gbmap before:   39116           2456          800
  yes-gbmap after:    39156 (+0.1%)   2392 (-2.6%)  852 (+6.5%)
2025-10-17 14:02:05 -05:00
cb9bda5a94 gbmap: Renamed gbmap_scan_thresh -> gbmap_rebuild_thresh
I think a good rule of thumb is if you refer to some variable/config/
field with a different name in comments/writing/etc more often than not,
you should just rename the variable/config/field to match.

So yeah, gbmap_rebuild_thresh controls when the gbmap is rebuilt.

Also touched up the doc comment a bit.
2025-10-09 14:33:27 -05:00
ea05ad04b9 gbmap: Cleanup of gbmap comments, TODOs, code formatting, etc
Just cleaning up a bunch of outdated TODOs and commented out code, as
well as a little bit of code formatting, and scrubbing airspace/gbatc
names as these are no longer used and will just confuse new users.
2025-10-09 14:33:27 -05:00
9b4ee982bc gbmap: Tried to adopt the gbmap name more consistently
Having gbmap/bmap used in different places for the same thing was
confusing. Preferring gbmap as it is consistent with other gstate (grm
queue, gcksums), even if it is a bit noisy.

It's interesting to note what didn't change:

- The BM* range tags: LFS3_TAG_BMFREE, etc. These already differ from
  the GBMAP* prefix enough, and adopting GBM* would risk confusion with
  actual gstate.

- The gbmap revdbg string: "bb~r". We don't have enough characters for
  anything else!

- dbgbmap.py/dbgbmapsvg.py. These aren't actually related to the gbmap,
  so the name difference is a good thing.
2025-10-09 14:33:27 -05:00
9d322741ca bmap: Simplified bmap configs, reduced to one LFS3_F_GBMAP flag
TLDR: This drops the idea of different bmap strategies/modes, and sorts
out most of the compile-time/runtime conditional bmap interactions.

---

Motivation: Benchmarking (at least up to the 32-bit word limit) has
shown the bmap is unlikely to be a significant bottleneck, even on large
disks. The largest disks tend to be NAND, and NAND's ridiculous block
size limits pressure on block allocation.

There are still concerns for areas I haven't measured yet:

- SD/eMMC/FTL - Small blocks, so more pressure on block allocation. In
  theory the logical block size can be artificially increased, but this
  comes with a granularity tradeoff.

- I've only measured throughput, latency is a whole other story.

  However, users have reported lfs3_fs_gc is useful for mitigating this,
  so maybe latency is less of a concern now?

But while there may still be room for improvement via alternative bmap
strategies, they risk a concerning amount of complexity. Yes,
configuration gets more complicated, but the real issue is any bmap
strategies that try to track _deallocations_ (the original idea being
treediffing) risk leaking blocks if all cases aren't covered.

The current "bmap cache" strategy strikes a really nice balance where it
reduces _amortized_ block allocation -> ~O(log n) without RAM, while
retaining the safe, bug-resistant, single-source-of-truth properties
that come with lookahead-based allocation.

---

So, long story short, dropping other strategies, and now the presence of
the bmap is a boolean flag.

This is also the first format-specific flag:

- Define LFS3_BMAP to enable the bmap logic, but note by default the
  bmap will still not be used.

- Define LFS3_YES_BMAP to force the bmap to be used.

- With LFS3_BMAP, passing LFS3_F_GBMAP to lfs3_format will include the
  on-disk block-map.

- No flag is needed during mount; the presence of the bmap is determined
  by the on-disk wcompat flags (LFS3_WCOMPAT_GBMAP). This also prevents
  rw mounting if the bmap is not supported, but rdonly mounting is
  allowed.

- Users can check if the bmap is in use via lfs3_fs_stat, which reports
  LFS3_I_GBMAP in the flags field.
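
For example, checking for the bmap at runtime might look like this
(mirroring the lfs3_fs_stat usage shown in the LFS3_GC_ALL commit
elsewhere in this log):

  struct lfs3_fsinfo fsinfo;
  lfs3_fs_stat(&lfs3, &fsinfo) => 0;
  if (fsinfo.flags & LFS3_I_GBMAP) {
      // the on-disk block-map is in use
  }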

There are still some missing pieces, but these will be a bit more
involved:

- lfs3_fs_grow needs to be made bmap aware!

- We probably want something like lfs3_fs_mkgbmap and lfs3_fs_rmgbmap to
  allow converting between bmap backed/not-backed filesystem images.

Code changes minimal:

                code          stack          ctx
  before:      37172           2352          684
  after:       37172 (+0.0%)   2352 (+0.0%)  684 (+0.0%)

                code          stack          ctx
  bmap before: 38844           2456          800
  bmap after:  38852 (+0.0%)   2456 (+0.0%)  800 (+0.0%)
2025-10-09 14:33:27 -05:00
2f6f7705f1 Limit crystal_thresh to >=prog_size
I confused myself a bit while benchmarking because crystal_thresh <
prog_size was showing some very confusing results. But it turns out the
relevant code was just not written well enough to support this
configuration.

And, to be fair, this configuration really doesn't make sense. The whole
point of the fragment + crystallization system is so we never have to
write unaligned data to blocks. I mean, we could explicitly write
padding in this case, but why?

---

This should probably eventually be either an assert or mutable limit,
but in the meantime I'm just adjusting crystal_thresh at runtime, which
adds a bit of code:

           code          stack          ctx
  before: 37076           2352          684
  after:  37112 (+0.1%)   2352 (+0.0%)  684 (+0.0%)

On the plus side, this prevents crystal_thresh=0 issues much more
elegantly.
2025-10-01 17:58:01 -05:00
8cc91ffa9e Prevent oscillation when crystal_thresh < fragment_size
When crystal_thresh < fragment_size, there was a risk that repeated
write operations would oscillate between crystallizing and fragmenting
every operation. Not only would this wreck performance, it would also
violently wear down blocks as each crystallization would trigger an
erase.

Fortunately all we need to do to prevent this is check both
fragment_size and crystal_thresh before fragmenting. Note this also
affects the fragment checks in truncate/fruncate.
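
Conceptually (a sketch, not the actual lfs3 code):

  // only write a fragment if the data is below _both_ thresholds,
  // otherwise crystallize; this breaks the oscillation when
  // crystal_thresh < fragment_size
  if (size < lfs3->cfg->fragment_size
          && size < lfs3->cfg->crystal_thresh) {
      // write as an inlined fragment
  } else {
      // crystallize into a block
  }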

---

crystal_thresh < fragment_size is kind of a weird configuration; to be
honest we should probably just assert if configured this way (we never
write fragments > crystal_thresh, because at that point we would just
crystallize).

But at the moment the extra leniency is useful for benchmarking.

Adds a bit of code, but will probably either assert or mutably limit in
the future:

           code          stack          ctx
  before: 37028           2352          684
  after:  37076 (+0.1%)   2352 (+0.0%)  684 (+0.0%)
2025-10-01 17:57:58 -05:00
eab526ad9f Fixed crystal_thresh=0 bugs
There was a mismatch between the lfs3_cfg comment and the actual
crystal_thresh math where crystal_thresh=0 would break things:

- In lfs3_file_flush_, crystal_thresh=0 meant we would never resume
  crystallization, leading to terrible, _terrible_, linear write
  performance.

- In lfs3_file_sync and lfs3_set, it's unclear if small file commit
  optimizations were working properly. I went ahead and added a
  lfs3_max(lfs3->cfg->crystal_thresh, 1) just to be safe.

The other references to crystal_thresh all check for >= crystal_thresh
conditions, so shouldn't be broken (except for an unrelated bug in
lfs3_file_flushset_).

The reason for this is that crystal_thresh=1 is technically the lower
bound for this math. Allowing crystal_thresh=0 is just a convenience,
and honestly allowing it may have been a bad idea. Maybe we should
require crystal_thresh=1 at minimum? I added a TODO.

All the new v3 config needs revisiting anyways, for defaults, etc.

---

Curiously, this actually saved code? My best guess is maybe some weird
code path in lfs3_file_flush_ was eliminated:

           code          stack          ctx
  before: 37036           2352          684
  after:  37028 (-0.0%)   2352 (+0.0%)  684 (+0.0%)
2025-10-01 17:57:54 -05:00
14d0c4121c bmap: Dropped treediff buffers for now
We're not currently using these (at the moment it's unclear if the
original intention behind the treediff algorithms is worth pursuing),
and they are showing up in our heap benchmarks.

The good news is that means our heap benchmarks are working.

Also saves a bit of code/ctx in bmap mode:

                code          stack          ctx
  before:      37024           2352          684
  after:       37024 (+0.0%)   2352 (+0.0%)  684 (+0.0%)

                code          stack          ctx
  bmap before: 38752           2456          812
  bmap after:  38704 (-0.1%)   2456 (+0.0%)  800 (-1.5%)
2025-10-01 17:57:42 -05:00
a1b75497d6 bmap: rdonly: Got LFS3_RDONLY + LFS3_BMAP compiling
Counterintuitively, LFS3_RDONLY + LFS3_BMAP _does_ make sense for cases
where you want to include the bmap in things like ckmeta/ckdata scans.

Though this is another argument for a LFS3_RDONLY + LFS3_NO_TRV build.
Traversals add quite a bit of code to the rdonly build that is probably
not always needed.

---

This just required another bunch of ifdefs.

Current bmap rdonly code size:

                code          stack          ctx
  rdonly:      10616            896          532
  rdonly+bmap: 10892 (+2.6%)    896 (+0.0%)  636 (+19.5%)
2025-10-01 17:57:15 -05:00
58c5506e85 Brought back lazy grafting, but not too lazy
Continued benchmarking efforts are indicating this isn't really an
optional optimization.

This brings back lazy grafting, where the file leaf is allowed to fall
out-of-date to minimize bshrub/btree updates. This is controlled by
LFS3_o_UNGRAFT, which is similar, but independent from LFS3_o_UNCRYST:

- LFS3_o_UNCRYST - File's leaf not fully crystallized
- LFS3_o_UNGRAFT - File's leaf does not match disk

Note it makes sense for files to be UNGRAFT only, in the case where the
current crystal terminates at the end-of-file but future appends are
likely. And it makes sense for files to be UNCRYST only, in cases where
we graft uncrystallized blocks so the bshrub/btree makes sense.

Which brings us to the main change from the previous lazy-grafting
implementation: lfs3_file_lookupnext no longer includes ungrafted
leaves.

Instead, functions should call lfs3_file_graft if they need
lfs3_file_lookupnext to make sense.
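
Roughly (a sketch; the exact lfs3_file_graft signature and flag field
are assumptions):

  // before relying on lfs3_file_lookupnext, make sure any lazily
  // grafted leaves are reflected in the bshrub/btree
  if (file->flags & LFS3_o_UNGRAFT) {
      int err = lfs3_file_graft(lfs3, file);
      if (err) {
          return err;
      }
  }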

This significantly reduces the code cost of lazy grafting, at the risk
of needing to graft more frequently. Fortunately we don't actually need
to call lfs3_file_graft all that often:

- lfs3_file_read already flushes caches/leaves before attempting any
  bshrub/btree reads for simplicity (heavy read/write interleaving is
  not currently considered a priority; if you need this, consider
  opening two file handles).

- lfs3_file_flush_ _does_ need to call lfs3_file_graft before the
  crystallization heuristic pokes, but if we can't resume
  crystallization, we would probably need to graft the crystal to
  satisfy the flush anyways.

---

Lazy grafting, i.e. procrastinating on bshrub/btree updates during block
appends, is an optimization previously dropped due to perceived
nicheness:

- We can only lazily graft blocks, inlined data fragments always require
  bshrub/btree updates since they live in the bshrub/btree.

- Sync forces bshrub/btree updates anyways, so lazy grafting has no
  benefit for most logging applications.

- The performance penalty of eager grafting goes away if your caches
  are large enough.

Note that the last argument is a non-argument in littlefs's case. The
whole point of littlefs is that you _don't_ need RAM to fix things.

However these arguments are all moot when you consider that the "niche
use case" -- linear file writes -- is the default bottleneck for most
applications. Any file operation becomes a linear write bottleneck when
the arguments are large enough. And this becomes a noticeable issue when
benchmarking.

So... This brings back lazy grafting. But with a more limited scope
w.r.t. internal file operations (the above lfs3_file_lookupnext/
lfs3_file_graft changes).

---

Long story short, lazy grafting is back again, reverting the ~3x
performance regression for linear file writes.

But now with quite a bit less code/stack cost:

           code          stack          ctx
  before: 36820           2368          684
  after:  37032 (+0.6%)   2352 (-0.7%)  684 (+0.0%)
2025-10-01 17:57:01 -05:00
316ca1cc05 bmap: The initial bmapcache algorithm seems to be working
At least at a proof-of-concept level, there's still a lot of cleanup
needed.

To make things work, lfs3_alloc_ckpoint now takes an mdir, which
provides the target for gbmap gstate updates.

When the bmap is close to empty (configurable via bmap_scan_thresh), we
opportunistically rebuild it during lfs3_alloc_ckpoint. The nice thing
about lfs3_alloc_ckpoint is we know the state of all in-flight blocks,
so rebuilding the bmap just requires traversing the filesystem + in-RAM
state.

We might still fall back to the lookahead buffer, but in theory a well
tuned bmap_scan_thresh can prevent this from becoming a bottleneck (at
the cost of more frequent bmap rebuilds).
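
Very roughly, the checkpoint path does something like this (a sketch
only; bmap_scan_thresh and lfs3_alloc_ckpoint are named above,
everything else, including lfs3_bmap_rebuild, is a made-up stand-in):

  // illustrative only: if the bmap's known-free blocks drop below the
  // scan threshold, rebuild it now, while all in-flight blocks are
  // known to lfs3_alloc_ckpoint
  if (lfs3->gbmap.known < lfs3->cfg->bmap_scan_thresh) {
      // one filesystem traversal + in-RAM state is enough to
      // repopulate the bmap at this point
      int err = lfs3_bmap_rebuild(lfs3, mdir);
      if (err) {
          return err;
      }
  }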

---

This is also probably a good time to resume measuring code/ram costs,
though it's worth repeating the above note about the bmap work still
needing cleanup:

             code          stack          ctx
  before:   36840           2368          684
  after:    36920 (+0.2%)   2368 (+0.0%)  684 (+0.0%)

Haha, no, the bmap isn't basically free, it's just an opt-in feature.
With -DLFS3_YES_BMAP=1:

             code          stack          ctx
  no bmap:  36920           2368          684
  yes bmap: 38552 (+4.4%)   2472 (+4.4%)  812 (+18.7%)
2025-10-01 17:56:14 -05:00
71b9ad2412 bmap: Enabled at least opportunistic bmap allocations
This doesn't fully replace the lookahead buffer, but at least augments
it with known bmap state when available.

To be honest, this is a minimal effort hack to try to get something
benchmarkable without dealing with all the catch-22 issues that a
self-supporting bmap allocator would encounter (allocating blocks for the
bmap requires a bmap, oh no).

Though now that I'm writing this, maybe this is a reasonable long-term
solution? Having the lookahead buffer to fall back on solves a lot of
problems, and, realistically, it's unlikely to be a performance
bottleneck unless the user has extreme write requests (>available
storage?).
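
The allocation path then reads roughly as (again a sketch; both helpers
and the error code here are hypothetical):

  // illustrative only: prefer known-free blocks from the bmap, fall
  // back to a lookahead scan when the bmap has nothing to offer
  lfs3_block_t block;
  int err = lfs3_bmap_alloc(lfs3, &block);
  if (err == LFS3_ERR_NOSPC) {
      err = lfs3_alloc_lookahead(lfs3, &block);
  }
  if (err) {
      return err;
  }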

---

Also tweaked field naming to be consistent between the bmap and
lookahead buffer.
2025-10-01 17:56:12 -05:00
ebae43898e bmap: Changing direction, store bmap mode in wcompat flags
The idea behind separate ctrled+unctrled airspaces was to try to avoid
multiple interpretations of the on-disk bmap, but I'm starting to think
this adds more complexity than it removes.

The main conflict is the meaning of "in-flight" blocks. When using the
"uncontrolled" bmap algorithm, in-flight blocks need to be
double-checked by traversing the filesystem. But in the "controlled"
bmap algorithm, blocks are only marked as "in-flight" while they are
truly in-flight (in-use in RAM, but not yet in use on disk).
Representing these both with the same "in-flight" state risks
incompatible algorithms misinterpreting the bmap across different
mounts.

In theory the separate airspaces solve this, but now all the algorithms
need to know how to convert the bmap from different modes, adding
complexity and code cost.

Well, in theory at least. I'm unsure separate airspaces actually solve
this due to subtleties between what "in-flight" means in the different
algorithms (note both in-use and free blocks are "in-flight" in the
unknown airspace!). It really depends on how the "controlled" algorithm
actually works, which isn't implemented/fully designed yet.

---

Long story short, due to a time crunch, I'm ripping this out for now and
just storing the current algorithm in the wcompat flags:

  LFS3_WCOMPAT_GBMAP       0x00006000  Global block-map in use
  LFS3_WCOMPAT_GBMAPNONE   0x00000000  Gbmap not in use
  LFS3_WCOMPAT_GBMAPCACHE  0x00002000  Gbmap in cache mode
  LFS3_WCOMPAT_GBMAPVFR    0x00004000  Gbmap in VFR mode
  LFS3_WCOMPAT_GBMAPIFR    0x00006000  Gbmap in IFR mode

Note GBMAPVFR/IFR != BMAPSLOW/FAST! At least BMAPSLOW/FAST can share
bmap representations:

- GBMAPVFR => Uncontrolled airspace, i.e. in-flight blocks may or may
  not be in use, need to traverse open files.

- GBMAPIFR => Controlled airspace, i.e. in-flight blocks are in use,
  at least until powerloss, no traversal needed, but requires more bmap
  writes.

- BMAPSLOW => Treediff by checking what blocks are in B but not in A,
  and what blocks are in A but not in B, O(n^2), but minimizes bmap
  updates.

  Can be optimized with a bloom filter.

- BMAPFAST => Treediff by clearing all blocks in A, and then setting all
  blocks in B, O(n), but also writes all blocks to the bmap twice even
  on small changes.

  Can be optimized with a sliding bitmap window (or a block hashtable,
  though a bitmap converges to the same thing in both algorithms when
  >=disk_size).

It will probably be worth unifying the bmap representation later (the
more algorithm-specific flags there are, the harder interop becomes for
users), but for now this opens a path to implementing/experimenting with
bmap algorithms without dealing with this headache.
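
To make the BMAPSLOW/BMAPFAST distinction above concrete, here's a toy,
standalone treediff over plain block-number arrays (none of this is
lfs3 code; the real thing walks btrees and an on-disk bmap):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  // toy bmap: one bit per block, 1 = in use
  static void bmap_set(uint8_t *bmap, uint32_t b, bool inuse) {
      if (inuse) bmap[b/8] |=  (uint8_t)(1 << (b%8));
      else       bmap[b/8] &= (uint8_t)~(1 << (b%8));
  }

  static bool contains(const uint32_t *bs, size_t n, uint32_t b) {
      for (size_t i = 0; i < n; i++) {
          if (bs[i] == b) return true;
      }
      return false;
  }

  // BMAPSLOW: only touch blocks that actually changed, O(n^2) without
  // a set/bloom filter, but minimizes bmap updates
  static void treediff_slow(uint8_t *bmap,
          const uint32_t *a, size_t an,
          const uint32_t *b, size_t bn) {
      for (size_t i = 0; i < an; i++) {
          if (!contains(b, bn, a[i])) bmap_set(bmap, a[i], false);
      }
      for (size_t i = 0; i < bn; i++) {
          if (!contains(a, an, b[i])) bmap_set(bmap, b[i], true);
      }
  }

  // BMAPFAST: clear everything in A, then set everything in B, O(n),
  // but touches every block in both trees, shared blocks twice
  static void treediff_fast(uint8_t *bmap,
          const uint32_t *a, size_t an,
          const uint32_t *b, size_t bn) {
      for (size_t i = 0; i < an; i++) bmap_set(bmap, a[i], false);
      for (size_t i = 0; i < bn; i++) bmap_set(bmap, b[i], true);
  }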
2025-10-01 17:56:08 -05:00
beb1f1346a bmap: Started implementing ctrled/unctrled lfs3_alloc paths 2025-10-01 17:56:07 -05:00
e7c3755e21 bmap: Split known into ctrled+unctrled 2025-10-01 17:56:05 -05:00
98f016b07e bmap: Added initial gbatc interactions, up until out-of-known or remount
This only works immediately after format, and only for one pass of the
disk, but it's a good way to test bmap lookups/allocation without
worrying about more complicated filesystem-wide interactions.
2025-10-01 17:55:31 -05:00
88180b6081 bmap: Initial scaffolding for on-disk block map
This is pretty exploratory work, so I'm going to try to be less thorough
in commit messages until the dust settles.

---

New tag for gbmapdelta:

  LFS3_TAG_GBMAPDELTA   0x0104  v--- ---1 ---- -1rr

New tags for in-bmap block types:

  LFS3_TAG_BMRANGE      0x033u  v--- --11 --11 uuuu
  LFS3_TAG_BMFREE       0x0330  v--- --11 --11 ----
  LFS3_TAG_BMINFLIGHT   0x0331  v--- --11 --11 ---1
  LFS3_TAG_BMINUSE      0x0332  v--- --11 --11 --1-
  LFS3_TAG_BMBAD        0x0333  v--- --11 --11 --11
  LFS3_TAG_BMERASED     0x0334  v--- --11 --11 -1--

New gstate decoding for gbmap:

  .---+- -+- -+- -+- -. cursor: 1 leb128  <=5 bytes
  | cursor            | known:  1 leb128  <=5 bytes
  +---+- -+- -+- -+- -+ block:  1 leb128  <=5 bytes
  | known             | trunk:  1 leb128  <=4 bytes
  +---+- -+- -+- -+- -+ cksum:  1 le32    4 bytes
  | block             | total:            23 bytes
  +---+- -+- -+- -+- -'
  | trunk         |
  +---+- -+- -+- -+
  |     cksum     |
  '---+---+---+---'
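
For illustration, a standalone sketch of decoding this layout
(struct/field names are made up; only the wire format above is from
this commit):

  #include <stddef.h>
  #include <stdint.h>

  // decode one leb128, returns bytes consumed or -1 on truncation
  static ptrdiff_t leb128(const uint8_t *p, size_t n, uint32_t *v) {
      *v = 0;
      for (size_t i = 0; i < n && i < 5; i++) {
          *v |= (uint32_t)(p[i] & 0x7f) << (7*i);
          if (!(p[i] & 0x80)) return (ptrdiff_t)i+1;
      }
      return -1;
  }

  struct gbmapdelta {
      uint32_t cursor, known, block, trunk, cksum;
  };

  // cursor/known/block/trunk as leb128s, followed by an le32 cksum
  static int gbmapdelta_decode(const uint8_t *p, size_t n,
          struct gbmapdelta *d) {
      uint32_t *fields[] = {&d->cursor, &d->known, &d->block, &d->trunk};
      for (size_t i = 0; i < 4; i++) {
          ptrdiff_t c = leb128(p, n, fields[i]);
          if (c < 0) return -1;
          p += c;
          n -= (size_t)c;
      }
      if (n < 4) return -1;
      d->cksum = (uint32_t)p[0]
              | ((uint32_t)p[1] << 8)
              | ((uint32_t)p[2] << 16)
              | ((uint32_t)p[3] << 24);
      return 0;
  }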

New bmap node revdbg string:

  vvv---- -111111- -11---1- -11---1-  (62 62 7e v0  bb~r)  bmap node

New mount/format/info flags (still unsure about these):

  LFS3_M_BMAPMODE     0x03000000  On-disk block map mode
  LFS3_M_BMAPNONE     0x00000000  Don't use the bmap
  LFS3_M_BMAPCACHE    0x01000000  Use the bmap to cache lookahead scans
  LFS3_M_BMAPSLOW     0x02000000  Use the slow bmap algorithm
  LFS3_M_BMAPFAST     0x03000000  Use the fast bmap algorithm

New gbmap wcompat flag:

  LFS3_WCOMPAT_GBMAP  0x00002000  Global block-map in use
2025-10-01 17:55:13 -05:00
238dbc705d Abandoned data-backed cache, use indirect lfs3_data_t on stack
This abandons the data-backed cache idea due to concerns around
readability and maintainability. Mixing const/mutable buffers in
lfs3_data_t was not great.

Instead, we now just allocate an indirect lfs3_data_t on the stack in
lfs3_file_sync_ to avoid the previous undefined behavior.

This actually results in less total stack usage, due to the lfs3_file_t
allocations in lfs3_set/read, and avoids the more long-term memory cost
in lfs3_file_t:

              code          stack          ctx
  before:    36832           2376          684
  after:     36840 (+0.0%)   2368 (-0.3%)  684 (+0.0%)

Oh. And lfs3_file_sync_ isn't even on the stack hot-path, so this is a
net benefit over the previous cache -> data cast:

              code          stack          ctx
  before sa: 36844           2368          684
  after sa:  36840 (-0.0%)   2368 (+0.0%)  684 (+0.0%)

Still less cool though.
2025-07-22 13:39:43 -05:00
5035aa566b Adopted data-backed cache in lfs3_file_t to avoid undefined behavior
This fixes a strict aliasing violation in lfs3_file_sync_, where we cast
the file cache -> lfs3_data_t to avoid an extra stack allocation, by
modifying the file's cache struct to use an lfs3_data_t directly.

- file.cache.pos -> file.cache.pos
- file.cache.buffer -> file.cache.d.u.buffer_
- file.cache.size -> file.cache.d.size
- (const lfs3_data_t*)&file->cache -> &file->cache.d

Note the underscore_ in file.cache.d.u.buffer_. This did not fit
together as well as I had hoped, due to different const expectations
between the file cache and lfs3_data_t.

Up until this point lfs3_data_t has only been used to refer to const
data (ignoring side-band pointer casting in lfs3_mtree_traverse*), while
the file cache very much contains mutable data. To work around this I
added data.u.buffer_ as a mutable variant, which works, but risks an
accidental const violation in the future.
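
Roughly what this looks like (a simplified illustration, not the actual
definitions; lfs3_size_t is a stand-in here):

  #include <stdint.h>

  typedef uint32_t lfs3_size_t;  // stand-in for the real type

  // illustrative only: a const view plus an underscored mutable
  // variant sharing one union; writers must take care to only touch
  // buffer_ when the data is known to be mutable
  typedef struct lfs3_data {
      lfs3_size_t size;
      union {
          const uint8_t *buffer;  // normal, const view
          uint8_t *buffer_;       // mutable variant, use with care
      } u;
  } lfs3_data_t;

  // and the file cache embedding it, so &file->cache.d is a valid
  // lfs3_data_t without any casting
  struct lfs3_cache {
      lfs3_size_t pos;
      lfs3_data_t d;
  };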

---

Unfortunately this does come with a minor RAM cost, since we no longer
hide file.cache.pos in lfs3_data_t's buffer padding:

           code          stack          ctx
  before: 36844           2368          684
  after:  36832 (-0.0%)   2376 (+0.3%)  684 (+0.0%)

  lfs3_file_t before: 164
  lfs3_file_t after:  168 (+2.4%)

I think it's pretty fair to call C's strict aliasing rules a real wet
blanket. It would be interesting to create a -fno-strict-aliasing
variant of littlefs in the future, to see how much code/RAM could be
saved if we were given free rein to abuse the available memory.

Probably not enough to justify the extra work, but it would be an
interesting experiment.
2025-07-22 13:31:16 -05:00
fadf0cbd0e trv: Moved cycle detection tortoise into the shrub leaf
This forces our cycle detection tortoise (previously trv.u.mtortoise)
into the unused shrub leaf via pointer shenanigans.

This reclaims the remaining stack (and apparently code) we theoretically
gained from the btree traversal rework, up until the compiler got in the
way:

           code          stack          ctx
  before: 36876           2384          684
  after:  36852 (-0.1%)   2368 (-0.7%)  684 (+0.0%)

And it only required some _questionably_ defined behavior.

---

It's probably not well-defined behavior, but trying to understand what
the standard actually means on this is giving me a headache. I think I
have to agree C99+strict-aliasing lost the plot on this one. Note
mtortoise is only ever written/read through the same type.

What I want:

  lfs3_trv_t:          lfs3_bshrub_t:       lfs3_handle_t:
  .---+---+---+---. .. .---+---+---+---. .. .---+---+---+---.
  |     handle    |    |     handle    |    |     handle    |
  |               |    |               |    |               |
  +---+---+---+---+    +---+---+---+---+ .. '---+---+---+---'
  |   root rbyd   |    |   root rbyd   |
  |               |    |               |    lfs3_mtortoise_t:
  +---+---+---+---+    +---+---+---+---+ .. .---+---+---+---.
  |   leaf rbyd   |    |   leaf rbyd   |    |   mtortoise   |
  |               |    |               |    |               |
  +---+---+---+---+    +---+---+---+---+ .. '---+---+---+---'
  | staging rbyd  |    | staging rbyd  |
  |               |    |               |
  +---+---+---+---+ .. '---+---+---+---'
  |               |
  :               :

But I'm starting to think this is simply not possible in modern C.

At least this shows what is theoretically possible if we didn't have to
fight the compiler.
2025-07-22 12:49:49 -05:00
70872b5703 trv: Renamed trv.htrv -> trv.h
Just moving away from the *trv when unnecessary. This matches the h
variable used for local iteration.
2025-07-21 17:24:46 -05:00
ff7e196f92 btree: Renamed btree.leaf.rbyd -> btree.leaf.r
This matches other internal rbyds: btree.r, mdir.r, etc.

The intention of the single-char names is to reduce clutter around these
severely nested structs; both btrees and mdirs _are_ rbyds, so the name
doesn't really add anything besides C-level type info.

I was hesitant on btree.leaf.rbyd, but decided consistency probably wins
here.
2025-07-21 16:43:39 -05:00
a871e02354 btree: Reworked btree traversal to leverage leaf caches
This comes from an observation that we never actually use the leaf cache
during traversals, and there is surprisingly little risk of a lookup
creating a conflict in the future.

Btree traversals fall into two categories:

1. Full traversals, where we traverse a full btree all at once. These
   are unlikely to have lookup conflicts because everything is
   usually self-contained in one chunk of logic.

2. Incremental traversals. These _are_ at risk, but in our current
   design limited to lfs3_trv_t, which already creates a fully
   bshrub/btree copy for tracking purposes.

   This copy unintentionally, but conveniently, protects against lookup
   conflicts.

So, why not reuse the btree leaf cache to hold the rbyd state during
traversals? In theory this makes lfs3_btree_traverse the same cost as
lfs3_btree_lookupnext, drops the need for lfs3_btrv_t, and simplifies
the internal API.

The only extra bit of state we need is the current target bid, which is
now expected as a caller-incremented argument similar to
lfs3_btree_lookupnext iteration.

There was a bit of futzing around with bid=-1 being necessary to
initialize traversal (to avoid conflicts with bid=-1 => 0 caused by
empty btrees). But the end result is a btree traversal that only needs
one extra word of state.
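
The resulting iteration pattern is roughly this (the signature and the
end-of-btree error are guesses for illustration; only the
caller-incremented bid and the bid=-1 bootstrap are from this change):

  // illustrative only: the caller drives the traversal by bumping bid,
  // starting from -1 so the first call can initialize the leaf cache
  lfs3_bid_t bid = -1;
  while (true) {
      int err = lfs3_btree_traverse(lfs3, btree, bid, &bid, &rbyd);
      if (err == LFS3_ERR_NOENT) {
          break;
      }
      if (err) {
          return err;
      }
      // ... visit rbyd ...
      bid += 1;
  }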

---

Unfortunately, in practice, the savings were not as great as expected:

           code          stack          ctx
  before: 36792           2400          684
  after:  36876 (+0.2%)   2384 (-0.7%)  684 (+0.0%)

This does claw back some stack, but less than a full rbyd due to the
union with the mtortoise in lfs3_trv_t. The mtortoise now dominates. It
might be possible to union the mtortoise and the bshrub/btree state
better (both are not needed at the same time), but strict aliasing rules
in C make this tricky.

The new lfs3_btree_traverse is also a bit more complicated in terms of
code cost. In theory this would be offset by the simpler traversal setup
logic, but we only actually call lfs3_btree_traverse twice:

1. In lfs3_mtree_traverse
2. In lfs3_file_ck

Still, some stack savings + a simpler internal API makes this worthwhile
for now. lfs3_trv_t is also due for a revisit, and hopefully it's
possible to better union things with btree leaf caches somehow.
2025-07-21 16:36:50 -05:00