Commit Graph

136 Commits

6d9c077261 Reordered LFSR_TAG_NAMELIMIT/FILELIMIT
Not sure why, but this just seems more intuitive/correct. Maybe because
LFSR_TAG_NAME is always the first tag in a file's attr set:

  LFSR_TAG_NAMELIMIT    0x0039  v--- ---- --11 1--1
  LFSR_TAG_FILELIMIT    0x003a  v--- ---- --11 1-1-

Seeing as several parts of the codebase still use the previous order,
it seems reasonable to switch back to that.

No code changes.
2025-05-24 21:51:06 -05:00
55ea13b994 scripts: Reverted del to resolve shadowed builtins
I don't know how I completely missed that this doesn't actually work!

Using del _does_ work in Python's repl, but it makes sense the repl may
differ from actual function execution in this case.

The problem is Python still thinks the relevant builtin is a local
variable after deletion, raising an UnboundLocalError instead of
performing a global lookup. In theory this would work if the variable
could be made global, but since global/nonlocal statements are lifted,
Python complains with "SyntaxError: name 'list' is parameter and
global".

And that's A-Ok! Intentionally shadowing language builtins already puts
this code deep into ugly hacks territory.
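
For posterity, a minimal sketch of the difference (toy example, not from
the scripts):

  # module scope: del really does fall back to the builtin
  list = [1, 2, 3]
  list_ = list; del list
  list(range(3))             # ok, global lookup falls back to builtins

  # function scope: the name stays local, del or not
  def f(list):
      list_ = list; del list
      return list(range(3))  # raises UnboundLocalError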
2025-05-15 14:10:42 -05:00
de7564e448 Added phase bits to cksum tags
This carves out two more bits in cksum tags to store the "phase" of the
rbyd block (maybe the name is too fancy, this is just the lowest 2 bits
of the block address):

  LFSR_TAG_CKSUM        0x300p  v-11 ---- ---- -pqq
                                                ^ ^
                                                | '-- phase bits
                                                '---- perturb bit

The intention here is to catch mrootanchors that are "out-of-phase",
i.e. they've been shifted by a small number of blocks.

This can happen if we find the wrong mrootanchor (after, say, a magic
scan), and risks filesystem corruption:

                formatted
  .-----------------'-----------------.
                          mounted
           .-----------------'-----------------.
  .--------+--------+--------+--------+ ...
  |(erased)| mroot  |
  |        | anchor |                   ...
  |        |        |
  '--------+--------+--------+--------+ ...

Including the lower 2 bits of the block address in cksum tags avoids
this, for up to a 3 block shift (the maximum number of redund
mrootanchors).
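
In Python terms the check is roughly (a sketch, not the lfs.c
implementation):

  def ckphase(tag, block):
      # phase = lowest 2 bits of the block address, stored in the
      # qq bits of LFSR_TAG_CKSUM
      return (tag & 0x3) == (block & 0x3)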

---

Note that cksum tags really are the only place we could put these bits.
Anywhere else and they would interfere with the canonical cksum, which
would break error correction. By definition these need to be different
per block.

We include these phase bits in every cksum tag (because it's easier),
but these don't really say much about mdirs that are not the
mrootanchor. Non-anchor mdirs can have arbitrary block addresses,
therefore arbitrary phase bits.

You _might_ be able to do something interesting if you sort the rbyd
addresses and use the index as the phase bits, but that would add quite
a bit of code for questionable benefit...

You could argue this adds noise to our cksums, but:

1. 2 bits seems like a really small amount of noise
2. our cksums are just crc32cs
3. the phase bits humorously never change when you rewrite a block

---

As with any feature this adds code, but only a small amount. I think
it's worth the extra protection:

           code          stack          ctx
  before: 35792           2368          636
  after:  35824 (+0.1%)   2368 (+0.0%)  636 (+0.0%)

Also added test_mount_incompat_out_of_phase to test this.

The dbg scripts _don't_ error (block mismatch seems likely when
debugging), but dbgrbyd.py at least adds phase mismatch notes in
-l/--log mode.
2025-04-30 00:57:17 -05:00
f2e6b60f36 Reworked grm encoding a bit
This drops the leading count/mode byte, and instead uses mid=0 to
terminate grms. This shaves off 1 byte from grmdeltas.

Previously, we needed the count/mode byte for a couple reasons:

- We needed to know the number of grm entries somehow, and there wasn't
  always an obvious sentinel value. mid=-1, for example, is
  unrepresentable with our unsigned leb128 encoding.

  But now that development has settled, we can use mid=0.0 to mark the
  end-of-queue (see the decoding sketch below). mid=0.0 should always
  map to the root bookmark, which doesn't make sense to delete, so it
  makes for a reasonable null terminator here.

- It provided a route for future grm extensions, which could use the >2
  count/mode encodings.

  But I think we can use additional grm tag encodings for this.

  There's only one gdelta tag so far, but the current plan for future
  gdelta tags is to carve out the bottom 2 bits for redund like we do
  with the struct tags:

    LFSR_TAG_GDELTA        0x01tt  v--- ---1 -ttt ttrr
    LFSR_TAG_GRMDELTA      0x0100  v--- ---1 ---- ----
    LFSR_TAG_GBMAPDELTA    0x0104  v--- ---1 ---- -1rr
    LFSR_TAG_GDDTREEDELTA  0x0108  v--- ---1 ---- 1-rr
    LFSR_TAG_GPTREEDELTA   0x010c  v--- ---1 ---- 11rr
    ...

  Decoding is a bit more complicated for gstate, since we will need to
  xor those bits if mutable, but this avoids needing a full byte just
  for redund in every auxiliary tree.

  Long story short, we can leverage the lower 2 bits of the grm tag for
  future extensions using the same mechanism.

This may seem like a lot of effort for only a handful of bytes, but keep
in mind each gdelta lives in more-or-less every mdir in the filesystem.
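
A rough sketch of decoding under the new scheme, assuming each mid is a
single unsigned leb128 (the exact on-disk details may differ):

  def fromleb128(data, j=0):
      # minimal unsigned leb128 decoder, returns (word, size)
      word, d = 0, 0
      while True:
          b = data[j+d]
          word |= (b & 0x7f) << (7*d)
          d += 1
          if not (b & 0x80):
              return word, d

  def fromgrm(data):
      # collect queued mids until the mid=0.0 null terminator
      mids, j = [], 0
      while j < len(data):
          mid, d = fromleb128(data, j)
          j += d
          if mid == 0:
              break
          mids.append(mid)
      return mids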

Also saves a bit of code/ctx:

           code          stack          ctx
  before: 35772           2368          640
  after:  35768 (-0.0%)   2368 (+0.0%)  636 (-0.6%)
2025-04-30 00:53:33 -05:00
1f4d7b3b7e scripts: dbgmtree.py: Dropped Mtree.lookupnext
I was toying with making this look more like the mtree API in lfs.c (so
no lookupleaf/namelookupleaf, only lookup/namelookup), but dropped the
idea:

- It would be tedious

- The Mtree class's lookupleaf/namelookupleaf are also helpful for
  returning inner btree nodes when printing debug info

- Not embedding mids in the Mdir class would complicate things

It's ok for these classes to not match littlefs's internal API
_exactly_. The goal is easy access for debug info, not to port the
filesystem to Python.

At least I dropped Mtree.lookupnext, because that function really makes no
sense.
2025-04-30 00:44:16 -05:00
677c078b50 Added LFSR_TAG_BNAME/MNAME, stop btree lookups at first tag
Now that we don't have to worry about name tag conflicts as much, we
can add name tags for things that aren't files.

This adds LFSR_TAG_BNAME for branch names, and LFSR_TAG_MNAME for mtree
names. Note that the upper 4 bits of the subtype match LFSR_TAG_BRANCH
and LFSR_TAG_MDIR respectively:

  LFSR_TAG_BNAME        0x0200  v--- --1- ---- ----
  LFSR_TAG_MNAME        0x0220  v--- --1- --1- ----

  LFSR_TAG_BRANCH       0x030r  v--- --11 ---- --rr
  LFSR_TAG_MDIR         0x0324  v--- --11 --1- -1rr

The encoding is somewhat arbitrary, but I figured reserving ~31 types
for files is probably going to be plenty for littlefs. POSIX seems to
do just fine with only ~7 all these years, and I think custom attributes
will be more enticing for "niche" file types (symlinks, compressed
files, etc), given the easy backwards compatibility.

---

In addition to the debugging benefits, the new name tags let us stop
btree lookups on the first non-bname/branch tag. Previously we always
had to fetch the first struct tag as well to check if it was a branch.

In theory this saves one rbyd lookup, but in practice it's a bit muddy.

The problem is that there are two ways to use named btrees:

1. As buckets: mtree -> mdir -> mid
2. As a table: ddtree -> ddid

The only named btree we _currently_ have is the mtree. And the mtree
operates in bucket mode, with each mdir acting more-or-less as an
extension to the btree. So we end up needing to do the second tag lookup
anyways, and all we've done is complicate the code.

But we will _eventually_ need the table mode for the ddtree, where we
care if the ddname is an exact match.

And returning the first tag is arguably the more "correct" internal API,
vs arbitrarily the first struct tag.

But then again this change is pretty pricey...

           code          stack          ctx
  before: 35732           2440          640
  after:  35888 (+0.4%)   2480 (+1.6%)  640 (+0.0%)

---

It's worth noting the new BNAME/MNAME tags don't _require_ the btree
lookup changes (which is why we can get away with not touching the dbg
scripts). The previous algorithm of always checking for branch tags
still works.

Maybe there's an argument for conditionally using the previous API when
compiling without the ddtree, but that sounds horrendously messy...
2025-04-30 00:25:30 -05:00
d308ec8322 Reworked tag encoding a little bit
Mainly to make room for some future planned stuff:

- Moved the mroot's redund bits from LFSR_TAG_GEOMETRY to
  LFSR_TAG_MAGIC:

    LFSR_TAG_MAGIC        0x003r  v--- ---- --11 --rr

  This has the benefit of living in a fixed location (off=0x5), which
  may make mounting/debugging easier. It also makes LFSR_TAG_GEOMETRY
  less of a special case (LFSR_TAG_MAGIC is already a _very_ special
  case).

  Unfortunately, this does get in the way of our previous magic=0x3
  encoding. To compensate (and to avoid conflicts with LFSR_TAG_NULL),
  I've added the 0x3_ prefix. This has the funny side-effect of
  rendering redunds 0-3 as ascii 0-3 (0x30-0x33), which is a complete
  accident but may actually be useful when debugging.

  Currently all config tags fit in the 0x3_ prefix, which is nice for
  debugging but not a hard requirement.

- Flipped LFSR_TAG_FILELIMIT/NAMELIMIT:

    LFSR_TAG_FILELIMIT    0x0039  v--- ---- --11 1--1
    LFSR_TAG_NAMELIMIT    0x003a  v--- ---- --11 1-1-

  The file limit is a _bit_ more fundamental. It's effectively the
  required integer size for the filesystem.

  These may also be followed by LFSR_TAG_ATTRLIMIT based on how future
  attr revisits go.

- Rearranged struct tags so that LFSR_TAG_BRANCH = 0x300:

    LFSR_TAG_BRANCH       0x030r  v--- --11 ---- --rr
    LFSR_TAG_DATA         0x0304  v--- --11 ---- -1--
    LFSR_TAG_BLOCK        0x0308  v--- --11 ---- 1err
    LFSR_TAG_DDKEY*       0x0310  v--- --11 ---1 ----
    LFSR_TAG_DID          0x0314  v--- --11 ---1 -1--
    LFSR_TAG_BSHRUB       0x0318  v--- --11 ---1 1---
    LFSR_TAG_BTREE        0x031c  v--- --11 ---1 11rr
    LFSR_TAG_MROOT        0x032r  v--- --11 --1- --rr
    LFSR_TAG_MDIR         0x0324  v--- --11 --1- -1rr
    LFSR_TAG_MTREE        0x032c  v--- --11 --1- 11rr

    *Planned

  LFSR_TAG_BRANCH is a very special tag when it comes to bshrub/btree
  traversal, so I think it deserves the subtype=0 slot.

  This also just makes everything fit together better, and makes room
  for the future planned ddkey tag.

Code changes minimal:

           code          stack          ctx
  before: 35728           2440          640
  after:  35732 (+0.0%)   2440 (+0.0%)  640 (+0.0%)
2025-04-29 16:25:00 -05:00
7dd473df82 Tweaked LFSR_TAG_STICKYNOTE encoding 0x205 -> 0x203
Now that LFS_TYPE_STICKYNOTE is a real type users can interact with, it
makes sense to group it with REG/DIR. This also has the side-effect of
making these contiguous.

---

LFSR_TAG_BOOKMARKs, however, are still hidden from the user. This
unfortunately means there will be a bit of a jump if we ever add
LFS_TYPE_SYMLINK in the future, but I'm starting to wonder if that's the
best way to approach symlinks in littlefs...

If instead LFS_TYPE_SYMLINK were implied via a custom attribute, you
could avoid the headache that comes with adding a new tag encoding, and
allow perfect compatibility with non-symlink drivers. Win win.

This seems like a better approach for _all_ of the theoretical future
types (compressed files, device files, etc), and avoids the risk of
oversaturating the type space.

---

This had a surprising impact on code for just a minor encoding tweak. I
guess the contiguousness pushed the compiler to use tables/ranges for
more things? Or maybe 3 vs 5 is just an easier constant to encode?

           code          stack          ctx
  before: 35952           2440          640
  after:  35928 (-0.1%)   2440 (+0.0%)  640 (+0.0%)
2025-04-24 14:35:52 -05:00
a73f221317 scripts: Fixed issue where rbyd lookups rejected shrub tags
This was caused by including the shrub bit in the tag comparison in
Rbyd.lookup.

Fixed by adding an extra key mask (0xfff). Note this is already how
lfsr_rbyd_lookup works in lfs.c.
2025-04-23 23:19:37 -05:00
6d97398efc scripts: dbglfs.py: Fixed a couple mid=-1 issues
- Fixed Mtree.lookupleaf accepting mbid=0, which caused dbglfs.py to
  double print all files with mbid=-1

- Fixed grm mids not being mapped to mbid=-1 and related orphan false
  positives
2025-04-23 23:19:05 -05:00
8f1ccf089e Adopted lookupleaf, reworked internal btree APIs
This was a surprising side-effect of the script rework: Realizing the
internal btree/rbyd lookup APIs were awkwardly inconsistent and could be
improved with a couple tweaks:

- Adopted lookupleaf name for functions that return leaf rbyds/mdirs.

  There's an argument this should be called lookupnextleaf, since it
  returns the next bid, unlike lookup, but I'm going to ignore that
  argument because:

  1. A non-next lookupleaf doesn't really make sense for trees where
     you don't have to fetch the leaf (the mtree)

  2. It would be a bit too verbose

- Adopted commitleaf name for functions that accept leaf rbyds.

  This makes the lfsr_bshrub_commit -> lfsr_btree_commit__ mess a bit
  more readable.

- Strictly limited lookup and lookupnext to return rattrs, even in
  complex trees like the mtree.

  Most use cases will probably stick to the lookupleaf variants, but at
  least the behavior will be consistent.

- Strictly limited lookup to expect a known bid/rid.

  This only really matters for lfsr_btree/bshrub_lookup, which as a
  quirk of their implementation _can_ lookup both bid + rattr at the
  same time. But I don't think we'll need this functionality, and
  limiting the behavior may allow for future optimizations.

  Note there is no lfsr_file_lookup. File btrees currently only ever
  have a single leaf rattr, so this API doesn't really make sense.

Internal API changes:

- lfsr_btree_lookupnext_ -> lfsr_btree_lookupleaf
- lfsr_btree_lookupnext  -> lfsr_btree_lookupnext
- lfsr_btree_lookup      -> lfsr_btree_lookup
- added                     lfsr_btree_namelookupleaf
- lfsr_btree_namelookup  -> lfsr_btree_namelookup
- lfsr_btree_commit__    -> lfsr_btree_commit_
- lfsr_btree_commit_     -> lfsr_btree_commitleaf
- lfsr_btree_commit      -> lfsr_btree_commit

- added                     lfsr_bshrub_lookupleaf
- lfsr_bshrub_lookupnext -> lfsr_bshrub_lookupnext
- lfsr_bshrub_lookup     -> lfsr_bshrub_lookup
- lfsr_bshrub_commit_    -> lfsr_bshrub_commitleaf
- lfsr_bshrub_commit     -> lfsr_bshrub_commit

- lfsr_mtree_lookup      -> lfsr_mtree_lookupleaf
- added                     lfsr_mtree_lookupnext
- added                     lfsr_mtree_lookup
- added                     lfsr_mtree_namelookupleaf
- lfsr_mtree_namelookup  -> lfsr_mtree_namelookup

- added                     lfsr_file_lookupleaf
- lfsr_file_lookupnext   -> lfsr_file_lookupnext
- added                     lfsr_file_commitleaf
- lfsr_file_commit       -> lfsr_file_commit

Also added lookupnext to Mdir/Mtree in the dbg scripts.

Unfortunately this did add both code and stack, but only because of the
optional mdir returns in the mtree lookups:

           code          stack          ctx
  before: 35520           2440          636
  after:  35548 (+0.1%)   2472 (+1.3%)  636 (+0.0%)
2025-04-20 15:53:18 -05:00
3ca6670dcd Always log mbid=-1 for mroots and inlined mdirs
So mbid=0 now implies the mdir is not inlined.

Downsides:

- A bit more work to calculate
- May lose information due to masking everything when mtree.weight==0
- Risk of confusion when in-lfs.c state doesn't match (mbid=-1 is
  implied by mtree.weight==0)

Upsides:

- Includes more information about the topology of the mtree
- Avoids multiple dbgmbids for the same physical mdir

Also added lfsr_dbgmbid and lfsr_dbgmrid to help make logging
easier/more consistent.

And updated dbg scripts.
2025-04-20 15:53:18 -05:00
04d3002f3a Adopted ceiling division in mbits formula
So now:
               (block_size)
  mbits = nlog2(----------) = nlog2(block_size) - 3
               (     8    )

Instead of:

               (     (block_size))
  mbits = nlog2(floor(----------)) = nlog2(block_size & ~0x7) - 3
               (     (     8    ))

This makes the post-log - 3 formula simpler, which we probably want to
prefer as it avoids a division. And ceiling is arguably more intuitive
corner case behavior.
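
A quick way to convince yourself of the identity, assuming nlog2 is
ceil(log2(x)) (the lfs.c helper may be defined differently):

  def nlog2(x):
      # ceil(log2(x)) for x >= 1
      return (x - 1).bit_length()

  # ceiling division agrees with the simpler post-log formula
  for block_size in range(8, 1 << 16):
      assert nlog2((block_size + 7) // 8) == nlog2(block_size) - 3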

This may seem like a minor detail, but because mbits is purely
block_size derived and not configurable, any quirks here will become
a permanent compatibility requirement.

And hey, it saves a couple bytes (I'm not really sure why, the division
should've been optimized to a shift):

           code          stack          ctx
  before: 35528           2440          636
  after:  35520 (-0.0%)   2440 (+0.0%)  636 (+0.0%)
2025-04-20 15:53:18 -05:00
bd70270e11 scripts: Added -w/--word-bits to bound dbgleb128/dbgle32 parsing
This is limited to dbgle32.py, dbgleb128.py, and dbgtag.py for now.

This more closely matches how littlefs behaves, in that we read a
bounded number of bytes before leb128 decoding. This minimizes bugs
related to leb128 overflow and avoids reading inherently undecodable
data.

The previous unbounded behavior is still available with -w0.

Note this gives dbgle32.py much more flexibility in that it can now
decode other integer widths. Uh, ignore the name for now. At least it's
self documenting that the default is 32-bits...

---

Also fixed a bug in fromleb128 where size was reported incorrectly on
offset + truncated leb128.
2025-04-16 15:23:12 -05:00
0cea8b96fb scripts: Fixed O(n^2) slicing in Rbyd.fetch
Do you see the O(n^2) behavior in this loop?

  j = 0
  while j < len(data):
      word, d = fromleb(data[j:])
      j += d

The slice, data[j:], creates an O(n) copy every iteration of the loop.

A bit tricky. Or at least I found it tricky to notice. Maybe because
array indexing being cheap is baked into my brain...

Long story short, this repeated slicing resulted in O(n^2) behavior in
Rbyd.fetch and probably some other functions. Even though we don't care
_too_ much about performance in these scripts, having Rbyd.fetch run in
O(n^2) isn't great.

Tweaking all from* functions to take an optional index solves this, at
least on paper.
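
So the loop above becomes roughly this, with fromleb taking the offset
instead of a slice:

  j = 0
  while j < len(data):
      word, d = fromleb(data, j)  # decode at offset j, no copy
      j += d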

---

In practice I didn't actually find any measurable performance gain. I
guess array slicing in Python is optimized enough that the constant
factor takes over?

(Maybe it's being helped by us limiting Rbyd.fetch to block_size in most
scripts? I haven't tested NAND block sizes yet...)

Still, it's good to at least know this isn't a bottleneck.
2025-04-16 15:23:11 -05:00
b5c3b97ae1 scripts: Reworked dbgtag.py, added -i/--input, included hex in output
This just gives dbgtag.py a few more bells and whistles that may be
useful:

- Can now parse multiple tags from hex:

    $ ./scripts/dbgtag.py -x 71 01 01 01 12 02 02 02
    71 01 01 01    altrgt 0x101 w1 -1
    12 02 02 02    shrubdir w2 2

  Note this _does_ skip attached data, which risks some confusion, but
  not skipping it would probably end up printing a bunch of garbage for
  most use cases:

    $ ./scripts/dbgtag.py -x 01 01 01 04 02 02 02 02 03 03 03 03
    01 01 01 04    gdelta 0x01 w1 4
    03 03 03 03    struct 0x03 w3 3

- Included hex in output. This is helpful for learning about the tag
  encoding and also helps identify tags when parsing multiple tags.

  I considered also including offsets, which might help with
  understanding attached data, but decided it would be too noisy. At
  some point you should probably jump to dbgrbyd.py anyways...

- Added -i/--input to read tags from a file. This is roughly the same as
  -x/--hex, but allows piping from other scripts:

    $ ./scripts/dbgcat.py disk -b4096 0 -n4,8 | ./scripts/dbgtag.py -i-
    80 03 00 08    magic 8

  Note this reads the entire file in before processing. We'd need to fit
  everything into RAM anyways to figure out padding.
2025-04-16 15:23:10 -05:00
a5747bb2b2 scripts: dbgmtree.py: Fixed minor mtree rendering/traversal issues
- Added TreeArt __bool__ and __len__.

  This was causing a crash in _treeartfrommtreertree when rtree was
  empty.

  The code was not updated in the set -> TreeArt class transition, and
  went unnoticed because it's unlikely to be hit unless the filesystem
  is corrupt.

  Fortunately(?) realtime rendering creates a bunch of transiently
  corrupt filesystem images.

- Tweaked lookupleaf to not include mroots in their own paths.

  This matches the behavior of leaf mdirs, and is intentionally
  different from btree's lookupleaf which needs to lookup the leaf rattr
  to terminate.

- Tweaked leaves to not remove the last path entry if it is an mdir.

  This hid the previous lookupleaf inconsistency. We only remove the
  last rbyd from the path because it is redundant, and for mdirs/mroots
  it should never be redundant.

  I ended up just replacing the corrupt check with an explicit check
  that the rbyd is redundant. This should be more precise and avoid
  issues like this in the future.

  Also adopted explicit redundant checks in Btree.leaves and
  Lfs.File.leaves.
2025-04-16 15:23:08 -05:00
b715e9a749 scripts: Prefer 1;30-37m ansi codes over 90-97m
Reading Wikipedia:

> Later terminals added the ability to directly specify the "bright"
> colors with 90–97 and 100–107.

So if we want to stick to one pattern, we should probably go with
brightness as a separate modifier.

This shouldn't noticeably change any script, unless your terminal
interprets 90-97m colors differently from 1;30-37m, in which case things
should be more consistent now.
2025-04-16 15:22:43 -05:00
3ff25a4fdf scripts: dbgbmap[d3].py: Disabled gcksum checking by default
By default, we don't actually do anything if we find an invalid gcksum,
so there's no reason to calculate it every time.

Though this performance improvement may not be very noticeable:

  dbgbmap.py w/  crc32c lib w/  no_ck --no-ckdata: 0m0.221s
  dbgbmap.py w/  crc32c lib w/o no_ck --no-ckdata: 0m0.269s
  dbgbmap.py w/o crc32c lib w/  no_ck --no-ckdata: 0m0.388s
  dbgbmap.py w/o crc32c lib w/o no_ck --no-ckdata: 0m0.490s
  dbgbmap.old.py:                                  0m0.231s

Note that there's no point in adopting this in dbgbmapd3.py: 1. svg
rendering dominates (probably, I haven't measured this), and 2. we
default to showing the littlefs mount string instead of mdir/btree/data
percentages.
2025-04-16 15:22:36 -05:00
3820be180d scripts: Adopted crc32c lib when available
Jumping from a simple Python implementation to the fully hardware
accelerated crc32c library basically deletes any crc32c related
bottlenecks:

  crc32c.py disk (1MiB) w/  crc32c lib: 0m0.027s
  crc32c.py disk (1MiB) w/o crc32c lib: 0m0.844s

This uses the same try-import trick we use for inotify_simple, so we get
the speed improvement without losing portability.
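
The pattern looks roughly like this (a sketch; the crc32c package's
entry point is assumed to be crc32c.crc32c(data, crc)):

  try:
      import crc32c as _crc32c
      def crc32c(data, crc=0):
          # hardware/table accelerated path
          return _crc32c.crc32c(data, crc)
  except ImportError:
      def crc32c(data, crc=0):
          # slow bitwise fallback, reflected polynomial 0x82f63b78
          crc ^= 0xffffffff
          for b in data:
              crc ^= b
              for _ in range(8):
                  crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
          return crc ^ 0xffffffff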

---

In dbgbmap.py:

  dbgbmap.py w/  crc32c lib:             0m0.273s
  dbgbmap.py w/o crc32c lib:             0m0.697s
  dbgbmap.py w/  crc32c lib --no-ckdata: 0m0.269s
  dbgbmap.py w/o crc32c lib --no-ckdata: 0m0.490s
  dbgbmap.old.py:                        0m0.231s

The bulk of the runtime is still in Rbyd.fetch, but this is now
dominated by leb128 decoding, which makes sense. We do ~twice as many
fetches in the new dbgbmap.py in order to calculate the gcksum (which
we then ignore...).
2025-04-16 15:22:34 -05:00
6ea18e6579 scripts: Tweaked bd.read to behave like an actual bd_read callback
This better matches what you would expect from a function called
bd.read, at least in the context of littlefs, while also decreasing the
state (seek) we have to worry about.

Note that bd.readblock already behaved mostly like this, and is
preferred by every class except for Bptr.
2025-04-16 15:22:32 -05:00
b2911fbbe7 scripts: Removed item/iter magic methods from fs object classes
So no more __getitem__, __contains__, or __iter__ for Rbyd, Btree, Mdir,
Mtree, Lfs.File, etc.

These were way too error-prone, especially when accidental unpacking
triggered unintended disk traversal and weird error states. We didn't
even use the implicit behavior because we preferred the full name for
heavy disk operations.

The motivation for this was Python not catching this bug, which is a bit
silly:

  rid, rattr, *path_ = rbyd
2025-04-16 15:22:28 -05:00
33120bf930 scripts: Reworked dbgbmap.py
This is a rework of dbgbmap.py to match dbgbmapd3.py, adopt the new
Rbyd/Lfs class abstractions, as well as Canvas, -k/--keep-open, etc.

Some of the main changes:

- dbgbmap.py now reports corrupt/conflict blocks, which can be useful
  for debugging.

  Note though that you will probably get false positives if running with
  -k/--keep-open while something is writing to the disk. littlefs is
  powerloss safe, not multi-write safe! Very different problem!

- dbgbmap.py now groups by blocks before mapping to the space filling
  curve. This matches dbgbmapd3.py and I think is more intuitive now
  that we have a bmap tiling algorithm.

  -%/--usage still works, but is rendered as a second space filling
  curve _inside_ the block tile. Different blocks can end up with
  slightly different sizes due to rounding, but it's not the end of the
  world.

  I wasn't originally going to keep it around, but ended up caving, so
  you can still get the original byte-level curve via -u/--contiguous.

- Like the other ascii rendering script, dbgbmap.py now supports
  -k/--keep-open and friends as a thin main wrapper. This just makes it
  a bit easier to watch a realtime bmap without needing to use watch.py.

- --mtree-only is supported, but filtering via --mdirs/--btrees/--data
  is _not_ supported. This was too much complexity for a minor feature,
  and doesn't cover other niche blocks like corrupted/conflict or parity
  in the future.

- Things are more customizable thanks to the Attr class. For example,
  you can now use the littlefs mount string as the title via
  --title-littlefs.

- Support for --to-scale and -t/--tiny mode, if you want to scale based
  on block_size.

One of the bigger differences dbgbmapd3.py -> dbgbmap.py is that
dbgbmap.py still supports -%/--usage. Should we backport -%/--usage to
dbgbmapd3.py? Uhhhh...

This ends up a funny example of raster graphics vs vector graphics. A
pixel-level space filling curve is easy with raster graphics, but with
an svg you'd need some sort of pixel -> path wrapping algorithm...

So no -%/--usage in dbgbmapd3.py for now.

Also just ripped out all of the -@/--blocks byte-level range stuff. Way
too complicated for what it was worth. -@/--blocks is limited to simple
block ranges now. High-level scripts should stick to high-level options.

One last thing to note is the adoption of "if '%' in label__" checks
before applying punescape. I wasn't sure if we should support punescape
in dbgbmap.py, since it's quite a bit less useful here, and may be
costly due to the lazy attr generation. Adding this simple check avoids
the cost and consistency question, so I adopted it in all scripts.
2025-04-16 15:22:24 -05:00
202636cccd scripts: Tweaked corrupt rbyd coloring to include addresses
This matches the coloring in dbglfs.py for other erroneous conditions,
and also matches how we color hidden items when shown.

Also fixed some minor bugs in grm printing.
2025-04-16 15:22:23 -05:00
5682fd6163 scripts: dbglfs.py: Added --ckmeta/--ckdata
For more aggressive checking of filesystem state. These should match the
behavior of LFS_M_CKMETA/CKDATA in lfs.c.

Also tweaked dbgbmapd3.py (and eventually dbgmap.py) to match, though we
don't need new flags there since we're already checking every block in
the filesystem.
2025-04-16 15:22:21 -05:00
97e2786545 scripts: Synced dbgbmapd3.py Lfs class changes
- Added Lfs.traverse for full filesystem traversal
- Added Rbyd.shrub flag so we can tell if an Rbyd is a shrub
- Removed redundant leaves from paths in leaf iters
2025-04-16 15:22:19 -05:00
5f06558cbe scripts: Added dbgbmapd3.py for bmap -> svg rendering
Like codemapd3.py, this includes an interactive UI for viewing the
underlying filesystem graph, including:

- mode-tree - Shows all reachable blocks from a given block
- mode-branches - Shows immediate children of a given block
- mode-references - Shows parents of a given block
- mode-redund - Shows sibling blocks in redund groups (This is
  currently just mdir pairs, but the plan is to add more)

This is _not_ a full filesystem explorer, so we don't embed all block
data/metadata in the svg. That's probably a project for another time.
However we do include interesting bits such as trunk addresses,
checksums, etc.

An example:

  # create a filesystem image
  $ make test-runner -j
  $ ./scripts/test.py -B test_files_many -a -ddisk -O- \
          -DBLOCK_SIZE=1024 \
          -DCHUNK=10 \
          -DSIZE=2050 \
          -DN=128 \
          -DBLOCK_RECYCLES=1
  ... snip ...
  done: 2/2 passed, 0/2 failed, 164pls!, in 0.16s

  # generate bmap svg
  $ ./scripts/dbgbmapd3.py disk -b1024 -otest.svg \
          -W1400 -H750 -Z --dark
  updated test.svg, littlefs v0.0 1024x1024 0x{26e,26f}.d8 w64.128, cksum 41ea791e

And open test.svg in a browser of your choice.

Here's what the current colors mean:

- yellow => mdirs
- blue   => btree nodes
- green  => data blocks
- red    => corrupt/conflict issue
- gray   => unused blocks

But like codemapd3.py the output is decently customizable. See -h/--help
for more info.

And, just like codemapd3.py, this is based on ideas from d3 and
brendangregg's flamegraphs:

- d3 - https://d3js.org
- brendangregg's flamegraphs - https://github.com/brendangregg/FlameGraph

Note we don't actually use d3... the name might be a bit confusing...

---

One interesting change from the previous dbgbmap.py is the addition of
"corrupt" (bad checksum) and "conflict" (multiple parents) blocks, which
can help find bugs.

You may find the "conflict" block reporting a bit strange. Yes it's
useful for finding block allocation failures, but won't naturally formed
dags in file btrees also be reported as "conflicts"?

Yes, but the long-term plan is to move away from dags and make littlefs
a pure tree (for block allocator and error correction reasons). This
hasn't been implemented yet, so for now dags will result in false
positives.

---

Implementation wise, this script was pretty straightforward given prior
dbglfs.py and codemapd3.py work.

However there was an interesting case of https://xkcd.com/1425:

- Traverse the filesystem and build a graph - easy
- Tile a rectangle with n nice looking rectangles - uhhh

I toyed around with an analytical approach (something like block width =
sqrt(canvas_width*canvas_height/n) * block_aspect_ratio), but ended up
settling on an algorithm that divides the number of columns by 2 until
we hit our target aspect ratio.

This algorithm seems to work quite well, runs in only O(log n), and
perfectly tiles the grid for powers-of-two. Honestly the result is
better than I was expecting.
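
Something along these lines (a sketch of the idea, not the actual
dbgbmapd3.py code):

  import math as mt

  def tile(n, width, height, target=1.0):
      # start with a single row of n columns, then halve the column
      # count until the tiles reach the target aspect ratio
      cols = max(n, 1)
      while cols > 1:
          rows = mt.ceil(n / cols)
          if (width/cols) / (height/rows) >= target:
              break
          cols = mt.ceil(cols / 2)
      return cols, mt.ceil(n / cols)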
2025-04-16 15:22:17 -05:00
27370dec66 scripts: Tweaked mdir/shrub address printing
This fixes an issue where shrub trunks were never printed even with
-i/--internal.

While only showing mdir/shrub/btree/bptr addresses on block changes is
nice in theory, it results in shrub trunks never being printed because
the mdir -> shrub block doesn't change.

Checking for changes in the block type as well avoids this.
2025-04-16 15:22:16 -05:00
f550fa9a80 scripts: Changed most tree renderers to be pseudo-standalone
I'm trying to avoid having classes with different implementations across
scripts, as it makes updating things error-prone, but at same time
copying all the tree renderers to all dbg scripts would be a bit much.

Monkey-patching the TreeArt class in relevant scripts seems like a
reasonable compromise.
2025-04-16 15:22:15 -05:00
682f12a953 scripts: Moved tree renderers out into their own class
These are pretty script specific, so probably shouldn't be in the
abstract littlefs classes. This also avoids the tree renderers getting
copied into scripts that don't need them (mtree -> dbglfs.py, dbgbmap.py
in the future, etc).

This also makes TreeArt consistent with JumpArt and LifetimeArt.
2025-04-16 15:22:14 -05:00
002c2ea1e6 scripts: Tried to simplify optional path returns
So, instead of trying to be clever with python's tuple globbing, just
rely on lazy tuple unpacking and a whole bunch of if statements.

This is more verbose, but less magical. And generally, the less magic
there is, the easier things are to read.

This also drops the always-tupled lookup_ variants, which were
cluttering up the various namespaces.
2025-04-16 15:22:12 -05:00
82f4fd3c0f scripts: Dropped list/tuple distinction in Rbyd.fetch
Also tweaked how we fetch shrubs, adding Rbyd.fetchshrub and
Btree.fetchshrub instead of overloading the bd argument.

Oh, and also added --trunk to dbgmtree.py and dbglfs.py. Actually
_using_ --trunk isn't advised, since it will probably just result in a
corrupted filesystem, but these scripts are for accessing things that
aren't normally allowed anyways.

The reason for dropping the list/tuple distinction is because it was a
big ugly hack, unpythonic, and likely to catch users (and myself) by
surprise. Now, Rbyd.fetch and friends always require separate
block/trunk arguments, and the exercise of deciding which trunk to use
is left up to the caller.
2025-04-16 15:22:11 -05:00
270230a833 scripts: Adopted del to resolve shadowed builtins
So:

  all_ = all; del all

Instead of:

  import builtins
  all_, all = all, builtins.all

The del exposes the globally scoped builtin we accidentally shadow.

This requires less magic, and no module imports, though tbh I'm
surprised it works.

It also works in the case where you change a builtin globally, but
that's a bit too crazy even for me...
2025-04-16 15:22:08 -05:00
8324786121 scripts: Reverted skipped branches in -t/--tree render
The inconsistency between inner/non-inner (-i/--inner) views was a bit
too confusing.

At least now the bptr rendering in dbglfs.py matches behavior, showing
the bptr tag -> bptr jump even when not showing inner nodes.

If the point of these renderers is to show all jumps necessary to reach
a given piece of data, hiding bptr jumps only sometimes is somewhat
counterproductive...
2025-04-16 15:22:07 -05:00
97b6489883 scripts: Reworked dbglfs.py, adopted Lfs, Config, Gstate, etc
I'm starting to regret these reworks. They've been a big time sink. But
at least these should be much easier to extend with the future planned
auxiliary trees?

New classes:

- Bptr - A representation of littlefs's data-only block pointers.

  Extra fun is the lazily checked Bptr.__bool__ method, which should
  prevent slowing down scripts that don't actually verify checksums.

- Config - The set of littlefs config entries.

- Gstate - The set of littlefs gstate.

  I may have had too much fun with Config and Gstate. Not only do these
  provide lookup functions for config/gstate, but known config/gstate
  get lazily parsed classes that can provide easy access to the relevant
  metadata.

  These even abuse Python's __subclasses__, so all you need to do to add
  a new known config/gstate is extend the relevant Config.Config/
  Gstate.Gstate class.

  The __subclasses__ API is a weird but powerful one.

- Lfs - The big one, a high-level abstraction of littlefs itself.

  Contains subclasses for known files: Lfs.Reg, Lfs.Dir, Lfs.Stickynote,
  etc, which can be accessed by path, did+name, mid, etc. It even
  supports iterating over orphaned files, though it's expensive (but
  incredibly valuable for debugging!).

  Note that all file types can currently have attached bshrubs/btrees.
  In the existing implementation only reg files should actually end up
  with bshrubs/btrees, but the whole point of these scripts is to debug
  things that _shouldn't_ happen.

  I intentionally gave up on providing depth bounds in Lfs. Too
  complicated for something so high-level.

One noteworthy change is not recursing into directories by default. This
hopefully avoids overloading new users and matches the behavior of most
other Linux/Unix tools.

This adopts -r/--recurse/--file-depth for controlling how far to recurse
down directories, and -z/--depth/--tree-depth for controlling how far to
recurse down tree structures (mostly files). I like this API. It's
consistent with -z/--depth in the other dbg scripts, and -r/--recurse is
probably intuitive for most Linux/Unix users.

To make this work we did need to change -r/--raw -> -x/--raw. But --raw
is already a bit of a weird name for what really means "include a hex
dump".

Note that -z/--depth/--tree-depth does _not_ imply --files. Right now
only files can contain tree structures, but this will change when we get
around to adding the auxiliary trees.

This also adds the ability to specify a file path to use as the root
directory, though we need the leading slash to disambiguate file paths
and mroot addresses.

---

Also tagrepr has been tweaked to include the global/delta names,
toggleable with the optional global_ kwarg.

Rattr now has its own lazy parsers for did + name. A more organized
codebase would probably have a separate Name type, but it just wasn't
worth the hassle.

And the abstraction classes have all been tweaked to require the
explicit Rbyd.repr() function for a CLI-friendly representation. Relying
on __str__ hurt readability and debugging, especially since Python
prefers __str__ over __repr__ when printing things.
2025-04-16 15:22:06 -05:00
73127470f9 scripts: Adopted rbydaddr/tagrepr changes across scripts
Just some minor tweaks:

- rbydaddr: Return list instead of tuple, note we rely on the type
  distinction in Rbyd.fetch now.

- tagrepr: Rename w -> weight.
2025-04-16 15:21:59 -05:00
192c58318f scripts: Changed dbglfs.py's cksum mismatch hint to ?
Maybe it's just me, but it seems a bit more obvious cksum 2f629db4? is
an error vs cksum 2f629db4!. The second one just makes it seem like
dbglfs.py is really excited.
2025-02-08 15:02:31 -06:00
68f0534dd0 rbyd: Dropped special altn/alta encoding
altas, and to a lesser extent altns, are just too problematic for our
rbyd-append algorithm.

Main issue is these break our "narrowing" invariant, where each alt only
ever decreases the bounds.

I wanted to use altas to simplify lfsr_rbyd_appendcompaction, but
decided it wasn't worth it. Handling them correctly would require adding
a number of special cases to lfsr_rbyd_appendrat, adding complexity to
an already incredibly complex function.

---

Fortunately, we don't really need altns/altas on-disk, but we _do_ need
a way to mark alts as unreachable internally in order to know when we
can collapse alts when recoloring (at this point bounds information is
lost).

I was originally going to use the alt's sign bit for this, but it turns
out we already have this information thanks to setting jump=0 to assert
that an alt is unreachable. So no explicit flag needed!

This ends up saving a surprising amount of code for what is only a
couple lines of changes:

           code          stack          ctx
  before: 38512           2624          640
  after:  38440 (-0.2%)   2624 (+0.0%)  640 (+0.0%)
2025-02-08 14:53:47 -06:00
1c5adf71b3 Implemented self-validating global-checksums (gcksums)
This was quite a puzzle.

The problem: How do we detect corrupt mdirs?

Seems like a simple question, but we can't just rely on mdir cksums. Our
mdirs are independently updateable logs, and logs have this annoying
tendency to "rollback" to previously valid states when corrupted.

Rollback issues aren't littlefs-specific, but what _is_ littlefs-
specific is that when one mdir rolls back, it can disagree with other
mdirs, resulting in wildly incorrect filesystem state.

To solve this, or at least protect against disagreeable mdirs, we need
to somehow include the state of all other mdirs in each mdir commit.

---

The first thought: Why not use gstate?

We already have a system for storing distributed state. If we add the
xor of all of our mdir cksums, we can rebuild it during mount and verify
that nothing changed:

   .--------.   .--------.   .--------.   .--------.
  .| mdir 0 |  .| mdir 1 |  .| mdir 2 |  .| mdir 3 |
  ||        |  ||        |  ||        |  ||        |
  || gdelta |  || gdelta |  || gdelta |  || gdelta |
  |'-----|--'  |'-----|--'  |'-----|--'  |'-----|--'
  '------|-'   '------|-'   '------|-'   '------|-'
  '--.------'  '--.------'  '--.------'  '--.------'
   cksum |      cksum |      cksum |      cksum |
     |   |        v   |        v   |        v   |
     '---------> xor -------> xor -------> xor -------> gcksum
         |            v            v            v         =?
         '---------> xor -------> xor -------> xor ---> gcksum

Unfortunately it's not that easy. Consider what this looks like
mathematically (g is our gcksum, c_i is an mdir cksum, d_i is a
gcksumdelta, and +/-/sum is xor):

  g = sum(c_i) = sum(d_i)

If we solve for a new gcksumdelta, d_i:

  d_i = g' - g
  d_i = g + c_i - g
  d_i = c_i

The gcksum cancels itself out! We're left with an equation that depends
only on the current mdir, which doesn't help us at all.

Next thought: What if we permute the gcksum with a function t before
distributing it over our gcksumdeltas?

   .--------.   .--------.   .--------.   .--------.
  .| mdir 0 |  .| mdir 1 |  .| mdir 2 |  .| mdir 3 |
  ||        |  ||        |  ||        |  ||        |
  || gdelta |  || gdelta |  || gdelta |  || gdelta |
  |'-----|--'  |'-----|--'  |'-----|--'  |'-----|--'
  '------|-'   '------|-'   '------|-'   '------|-'
  '--.------'  '--.------'  '--.------'  '--.------'
   cksum |      cksum |      cksum |      cksum |
     |   |        v   |        v   |        v   |
     '---------> xor -------> xor -------> xor -------> gcksum
         |            |            |            |   .--t--'
         |            |            |            |   '-> t(gcksum)
         |            v            v            v          =?
         '---------> xor -------> xor -------> xor ---> t(gcksum)

In math terms:

  t(g) = t(sum(c_i)) = sum(d_i)

In order for this to work, t needs to be non-linear. If t is linear, the
same thing happens:

  d_i = t(g') - t(g)
  d_i = t(g + c_i) - t(g)
  d_i = t(g) + t(c_i) - t(g)
  d_i = t(c_i)

This was quite funny/frustrating (funnistrating?) during development,
because it means a lot of seemingly obvious functions don't work!

- t(g) = g              - Doesn't work
- t(g) = crc32c(g)      - Doesn't work because crc32cs are linear
- t(g) = g^2 in GF(2^n) - g^2 is linear in GF(2^n)!?

Fortunately, powers coprime with 2 finally give us a non-linear function
in GF(2^n), so t(g) = g^3 works:

  d_i = g'^3 - g^3
  d_i = (g + c_i)^3 - g^3
  d_i = (g^2 + gc_i + gc_i + c_i^2)(g + c_i) - g^3
  d_i = (g^2 + c_i^2)(g + c_i) - g^3
  d_i = g^3 + gc_i^2 + g^2c_i + c_i^3 - g^3
  d_i = gc_i^2 + g^2c_i + c_i^3

---

Bleh, now we need to implement finite-field operations? Well, not
entirely!

Note that our algorithm never uses division. This means we don't need a
full finite-field (+, -, *, /), but can get away with a finite-ring (+,
-, *). And conveniently for us, our crc32c polynomial defines a ring
epimorphic to a 31-bit finite-field.

All we need to do is define crc32c multiplication as polynomial
multiplication mod our crc32c polynomial:

  crc32cmul(a, b) = pmod(pmul(a, b), P)

And since crc32c is more-or-less just pmod(x, P), this lets us take
advantage of any crc32c hardware/tables that may be available.
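
In Python the whole ring fits in a few lines (a sketch; bit-order/
reflection details are glossed over, so this won't match lfs.c
bit-for-bit):

  CRC32C_P = 0x11edc6f41  # the crc32c polynomial

  def pmul(a, b):
      # carry-less polynomial multiplication over GF(2)
      r = 0
      while b:
          if b & 1:
              r ^= a
          a <<= 1
          b >>= 1
      return r

  def pmod(a, p):
      # polynomial remainder mod p
      while a.bit_length() >= p.bit_length():
          a ^= p << (a.bit_length() - p.bit_length())
      return a

  def crc32cmul(a, b):
      return pmod(pmul(a, b), CRC32C_P)

  def t(g):
      # our non-linear permutation, t(g) = g^3
      return crc32cmul(crc32cmul(g, g), g)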

---

Bunch of notes:

- Our 2^n-bit crc-ring maps to a 2^n-1-bit finite-field because our crc
  polynomial is defined as P(x) = Q(x)(x + 1), where Q(x) is a 2^n-1-bit
  irreducible polynomial.

  This is a common crc construction as it provides optimal odd-bit/2-bit
  error detection, so it shouldn't be too difficult to adapt to other
  crc sizes.

- t(g) = g^3 is not the only function that works, but it turns out to be
  a pretty good one:

  - 3 and 2^(2^n-1)-1 are coprime, which means our function t(g) = g^3
    provides a one-to-one mapping in the underlying fields of all crc
    rings of size 2^(2^n).

    We know 3 and 2^(2^n-1)-1 are coprime because 2^(2^n-1)-1 =
    (2^(2^n)-1) - 2^(2^n-1), where 3 divides 2^(2^n)-1 (a product of
    Fermat numbers, 3 = F_0 among them, A023394) but not 2^(2^n-1) (a
    power-of-2).

  - Our delta, when viewed as a polynomial in g: d(g) = gc^2 + g^2c +
    c^3, has degree 2, which implies there are at most 2 solutions or
    1-bit of information loss in the underlying field.

    This is optimal since the original definition already had 2
    solutions before we even chose a function:

      d(g) = t(g + c) - t(g)
      d(g) = t(g + c) - t((g + c) - c)
      d(g) = t((g + c) + c) - t(g + c)
      d(g) = d(g + c)

  Though note the mapping of our crc-ring to the underlying field
  already represents 1-bit of information loss.

- If you're using a cryptographic hash or other non-crc, you should
  probably just use an equal sized finite-field.

  Though note changing from a 2^n-1-bit field to a 2^n-bit field does
  change the math a bit, with t(g) = g^7 being a better non-linear
  function:

  - 7 is the smallest odd number coprime with 2^(2^n)-1 (a product of
    Fermat numbers), which makes t(g) = g^7 a one-to-one mapping.

    3 humorously divides 2^(2^n)-1 for every n, being the smallest
    Fermat number.

  - Expanding delta with t(g) = g^7 gives us a 6 degree polynomial,
    which implies at most 6 solutions or ~3-bits of information loss.

    This isn't actually the best you can do, some exhaustive searching
    over small fields (<=2^16) suggests t(g) = g^(2^(n-1)-1) _might_ be
    optimal, but that's a heck of a lot more multiplications.

- Because our crc32cs preserve parity/are epimorphic to parity bits,
  addition (xor) and multiplication (crc32cmul) also preserve parity,
  which can be used to show our entire gcksum system preserves parity.

  This is quite neat, and means we are guaranteed to detect any odd
  number of bit-errors across the entire filesystem.

- Another idea was to use two different addition operations: xor and
  overflowing addition (or mod a prime).

  This probably would have worked, but lacks the rigor of the above
  solution.

- You might think an RS-like construction would help here, where g =
  sum(c_ia^i), but this suffers from the same problem:

    d_i = g' - g
    d_i = g + c_ia^i - g
    d_i = c_ia^i

  Nothing here depends on anything outside of the current mdir.

- Another question is should we be using an RS-like construction anyways
  to include location information in our gcksum?

  Maybe in another system, but I don't think it's necessary in littlefs.

  While our mdirs are independently updateable, they aren't _entirely_
  independent. The location of each mdir is stored in either the mtree
  or a parent mdir, so it always gets mixed into the gcksum somewhere.

  The only exception being the mrootanchor which is always at the fixed
  blocks 0x{0,1}.

- This does _not_ catch "global-rollback" issues, where the most recent
  commit in the entire filesystem is corrupted, revealing an older, but
  still valid, filesystem state.

  But as far as I am aware this is just a fundamental limitation of
  powerloss-resilient filesystems, short of doing destructive
  operations.

  At the very least, exposing the gcksum would allow the user to store
  it externally and prevent this issue.

---

Implementation details:

- Our gcksumdelta depends on the rbyd's cksum, so there's a catch-22 if
  we include it in the rbyd itself.

  We can avoid this by including it in the commit tags (actually the
  separate canonical cksum makes this easier than it would have been
  earlier), but this does mean LFSR_TAG_GCKSUMDELTA is not an
  LFSR_TAG_GDELTA subtype. Unfortunate but not a dealbreaker.

- Reading/writing the gcksumdelta gets a bit annoying with it not being
  in the rbyd. For now I've extended the low-level lfsr_rbyd_fetch_/
  lfsr_rbyd_appendcksum_ to accept an optional gcksumdelta pointer,
  which is a bit awkward, but I don't know of a better solution.

- Unlike the grm, _every_ mdir commit involves the gcksum, which means
  we either need to propagate the gcksumdelta up the mroot chain
  correctly, or somehow keep track of partially flushed gcksumdeltas.

  To make this work I modified the low-level lfsr_mdir_commit__
  functions to accept start_rid=-2 to indicate when gcksumdeltas should
  be flushed.

  It's a bit of a hack, but I think it might make sense to extend this
  to all gdeltas eventually.

The gcksum cost both code and RAM, but I think it's well worth it for
removing an entire category of filesystem corruption:

           code          stack          ctx
  before: 37796           2608          620
  after:  38428 (+1.7%)   2640 (+1.2%)  644 (+3.9%)
2025-02-08 14:53:30 -06:00
b6ab323eb1 Dropped the q-bit (previous-perturb) from cksum tags
Now that we perturb commit cksums with the odd-parity zero, the q-bit no
longer serves a purpose other than extra debug info. But this is a
double-edged sword, because redundant info just means another thing that
can go wrong.

For example, should we assert? If the q-bit doesn't reflect the
previous-perturb state it's a bug, but the only thing that would break
would be the q-bit itself. And if we don't assert what's the point of
keeping the q-bit around?

Dropping the q-bit avoids answering this question and saves a bit of
code:

           code          stack          ctx
  before: 37772           2608          620
  after:  37768 (-0.0%)   2608 (+0.0%)  620 (+0.0%)
2025-01-28 14:41:45 -06:00
d08d254cd2 Switched to writing compat flags as le32s
Most of littlefs's metadata is encoded in leb128s now, with the
exception of tags (be16, sort of), revision counts (le32), cksums
(le32), and flags.

It makes sense for tags to be a special case, these are written and
rewritten _everywhere_, but less so for flags, which are only written to
the mroot and updated infrequently.

We might as well save a bit of code by reusing our le32 machinery.

---

This changes lfsr_format to just write out compat flags as le32s, saving
a tiny bit of code at the cost of a tiny bit of disk usage (the real
benefit being a tiny bit of code simplification):

           code          stack          ctx
  before: 37792           2608          620
  after:  37772 (-0.1%)   2608 (+0.0%)  620 (+0.0%)

Compat flags already need to handle trailing zeros gracefully, so this
doesn't change anything at mount time.

Also had to switch from enums to #defines thanks to C's broken enums.
Wooh. We already use #defines for the other flags for this reason.
2025-01-28 14:41:45 -06:00
e5609c98ec Renamed bsprout -> bmoss, bleaf -> bsprout
I just really don't like saying bleaf. Also I think the term moss
describes inlined data a bit better.
2025-01-28 14:41:45 -06:00
66bf005bb8 Renamed LFSR_TAG_ORPHAN -> LFSR_TAG_STICKYNOTE
I've been unhappy with LFSR_TAG_ORPHAN for a while now. While it's true
these represent orphaned files, they also represent zombied files. And
as long as a reference to the file exists in-RAM, I find it hard to say
these files are truly "orphaned".

We're also just using the term "orphan" for too many things.

Really this tag just represents an mid reservation. The term stickynote
works well enough for this, and fits in with the other internal tag,
LFSR_TAG_BOOKMARK.
2025-01-28 14:41:45 -06:00
62cc4dbb14 scripts: Disabled local import hack on import
Moved local import hack behind if __name__ == "__main__"

These scripts aren't really intended to be used as python libraries.
Still, it's useful to import them for debugging and to get access to
their juicy internals.
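
So the hack now looks something like this:

  if __name__ == "__main__":
      # prevent local imports only when run as a script
      __import__('sys').path.pop(0)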
2025-01-28 14:41:30 -06:00
7cfcc1af1d scripts: Renamed summary.py -> csv.py
This seems like a more fitting name now that this script has evolved
into more of a general purpose high-level CSV tool.

Unfortunately this does conflict with the standard csv module in Python,
breaking every script that imports csv (which is most of them).
Fortunately, Python is flexible enough to let us remove the current
directory before imports with a bit of an ugly hack:

  # prevent local imports
  __import__('sys').path.pop(0)

These scripts are intended to be standalone anyways, so this is probably
a good pattern to adopt.
2024-11-09 12:31:16 -06:00
a0ab7bda26 scripts: Avoid rereading shrub blocks
This extends Rbyd.fetch to accept another rbyd, in which case we inherit
the RAM-backed block without rereading it from disk. This avoids an
issue where shrubs can become corrupted if the disk is being
simultaneously written and debugged.

Normally we can detect the checksum mismatch and toss out the rbyd
during fetch, but shrub pointers don't include a checksum since they
assume the containing rbyd has already been checksummed.

It's interesting to note this even avoids the memory copy thanks to
Python's reference counting.
2024-11-08 02:24:56 -06:00
0260f0bcee scripts: Added better branch cksum checks
If we're fetching branches anyways, we might as well check that the
checksums match. This helps protect against infinite loops in B-tree
branches.

Also fixed an issue where we weren't xoring perturb state on finding an
explicit trunk.

Note this is equivalent to LFS_M_CKFETCHES in lfs.c.

---

This doesn't mean we always need LFS_M_CKFETCHES. Our dbg scripts just
need to be a little bit tougher because 1. running tests with -j creates
wildly corrupted and entangled littlefs images, and 2. Rbyd.fetch is
almost too forgiving in choosing the nearest trunk.
2024-11-08 02:20:19 -06:00
e3fdc3dbd7 scripts: Added simple mroot cycle detectors to dbg scripts
These work by keeping a set of all seen mroots as we descend down the
mroot chain. Simple, but it works.
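
Roughly, in Python (a sketch with hypothetical names):

  def mroots(mroot):
      # walk the mroot chain, stopping if we see a repeat
      seen = set()
      while mroot is not None:
          if mroot.blocks in seen:
              break  # mroot cycle detected
          seen.add(mroot.blocks)
          yield mroot
          mroot = mroot.childmroot()  # hypothetical accessor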

The downside of this approach is that the mroot set grows unbounded, but
it's unlikely we'll ever have enough mroots in a system for this to
really matter.

This fixes scripts like dbgbmap.py getting stuck on intentional mroot
cycles created for testing. It's not a problem for a foreground script
to get stuck in an infinite loop, since you can just kill it, but a
background script getting stuck at 100% CPU is a bit more annoying.
2024-11-07 11:46:39 -06:00
007ac97bec scripts: Adopted double-indent on multiline expressions
This matches the style used in C, which is good for consistency:

  a_really_long_function_name(
          double_indent_after_first_newline(
              single_indent_nested_newlines))

We were already doing this for multiline control-flow statements, simply
because I'm not sure how else you could indent this without making
things really confusing:

  if a_really_long_function_name(
          double_indent_after_first_newline(
              single_indent_nested_newlines)):
      do_the_thing()

This was the only real difference style-wise between the Python code and
C code, so now both should be following roughly the same style (80 cols,
double-indent multiline exprs, prefix multiline binary ops, etc).
2024-11-06 15:31:17 -06:00
48c2e7784b scripts: Renamed import math alias m -> mt
Mainly to avoid conflicts with match results m, this frees up the single
letter variable m for other purposes.

Choosing a two letter alias was surprisingly difficult, but mt is nice
in that it somewhat matches it (for itertools) and ft (for functools).
2024-11-05 01:58:40 -06:00