littlefs

mirror of https://github.com/littlefs-project/littlefs.git synced 2025-12-01 12:20:02 +00:00

Author	SHA1	Message	Date
Christopher Haster	7da44f12ae	Added redund hints to more tags

Well, kinda. At the moment we don't have any redund support (it's a TODO), so arguably redund=0 and this is just a comment tweak. Though our mdirs _are_ already redund=1... so maybe these should actually set redund=1? It's unclear, so for now I've just tweaked the comment, and we should probably revisit when _actually_ implementing meta/data redundancy.

---

Note this only really affects struct tags:

  LFS3_TAG_STRUCT       0x04tt  v--- -1-- +ttt tttt
  LFS3_TAG_BRANCH       0x040r  v--- -1-- +--- --rr
  LFS3_TAG_DATA         0x0404  v--- -1-- +--- -1rr
  LFS3_TAG_BLOCK        0x0408  v--- -1-- +--- 1err
  LFS3_TAG_DDKEY*       0x0410  v--- -1-- +--1 --rr
  LFS3_TAG_DID          0x0420  v--- -1-- +-1- ----
  LFS3_TAG_BSHRUB       0x0428  v--- -1-- +-1- 1-rr
  LFS3_TAG_BTREE        0x042c  v--- -1-- +-1- 11rr
  LFS3_TAG_MROOT        0x0431  v--- -1-- +-11 --rr
  LFS3_TAG_MDIR         0x0435  v--- -1-- +-11 -1rr
  LFS3_TAG_MSHRUB+      0x0438  v--- -1-- +-11 1-rr
  LFS3_TAG_MTREE        0x043c  v--- -1-- +-11 11rr
  LFS3_TAG_BMRANGE      0x044u  v--- -1-- +1-- ++uu
  LFS3_TAG_BMFREE       0x0440  v--- -1-- +1-- ----
  LFS3_TAG_BMINUSE      0x0441  v--- -1-- +1-- ---1
  LFS3_TAG_BMERASED     0x0442  v--- -1-- +1-- --1-
  LFS3_TAG_BMBAD        0x0443  v--- -1-- +1-- --11
  LFS3_TAG_DDRC*        0x0450  v--- -1-- +1-1 ----
  LFS3_TAG_DDPCOEFF*    0x0451  v--- -1-- +1-1 ---1
  LFS3_TAG_PCOEFFMAP*   0x0460  v--- -1-- +11- ----

This redund hint may be useful for debugging and the theoretical CKMETAREDUND feature.	2025-11-18 00:58:18 -06:00
Christopher Haster	cf34ba9aca	Rearranged tag encodings, reserved suptype=0 for internal tags

This was motivated by a discussion with a gh user, in which it was noted that not having a reserved suptype for internal tags risks potential issues with long-term future tag compatibility.

I think the risk is low, but, without a reserved suptype, it _is_ possible for a future tag to conflict with an internal tag in an older driver version, potentially and unintentionally breaking compatibility. Note this is especially concerning during mdir compactions, where we copy tags we may not understand otherwise.

In littlefs2 we reserved suptype=0x100, though this was mostly an accident due to saturating the 3-bit suptype space. With the larger tag space in littlefs3, the reserved suptype=0x100 was dropped.

---

Long story short, this reserves suptype=0 for internal flags (well, and null, which is _mostly_ internal only, but does get written to disk as unreachable tags).

Unfortunately, adding a new suptype _did_ require moving a bunch of stuff around:

  LFS3_TAG_NULL           0x0000  v--- ---- +--- ----
  LFS3_TAG_INTERNAL       0x00tt  v--- ---- +ttt tttt
  LFS3_TAG_CONFIG         0x01tt  v--- ---1 +ttt tttt
  LFS3_TAG_MAGIC          0x0131  v--- ---1 +-11 --rr
  LFS3_TAG_VERSION        0x0134  v--- ---1 +-11 -1--
  LFS3_TAG_RCOMPAT        0x0135  v--- ---1 +-11 -1-1
  LFS3_TAG_WCOMPAT        0x0136  v--- ---1 +-11 -11-
  LFS3_TAG_OCOMPAT        0x0137  v--- ---1 +-11 -111
  LFS3_TAG_GEOMETRY       0x0138  v--- ---1 +-11 1---
  LFS3_TAG_NAMELIMIT      0x0139  v--- ---1 +-11 1--1
  LFS3_TAG_FILELIMIT      0x013a  v--- ---1 +-11 1-1-
  LFS3_TAG_ATTRLIMIT?     0x013b  v--- ---1 +-11 1-11
  LFS3_TAG_GDELTA         0x02tt  v--- --1- +ttt tttt
  LFS3_TAG_GRMDELTA       0x0230  v--- --1- +-11 ----
  LFS3_TAG_GBMAPDELTA     0x0234  v--- --1- +-11 -1rr
  LFS3_TAG_GDDTREEDELTA*  0x0238  v--- --1- +-11 1-rr
  LFS3_TAG_GPTREEDELTA*   0x023c  v--- --1- +-11 11rr
  LFS3_TAG_NAME           0x03tt  v--- --11 +ttt tttt
  LFS3_TAG_BNAME          0x0300  v--- --11 +--- ----
  LFS3_TAG_REG            0x0301  v--- --11 +--- ---1
  LFS3_TAG_DIR            0x0302  v--- --11 +--- --1-
  LFS3_TAG_STICKYNOTE     0x0303  v--- --11 +--- --11
  LFS3_TAG_BOOKMARK       0x0304  v--- --11 +--- -1--
  LFS3_TAG_SYMLINK?       0x0305  v--- --11 +--- -1-1
  LFS3_TAG_SNAPSHOT?      0x0306  v--- --11 +--- -11-
  LFS3_TAG_MNAME          0x0330  v--- --11 +-11 ----
  LFS3_TAG_DDNAME*        0x0350  v--- --11 +1-1 ----
  LFS3_TAG_DDTOMB*        0x0351  v--- --11 +1-1 ---1
  LFS3_TAG_STRUCT         0x04tt  v--- -1-- +ttt tttt
  LFS3_TAG_BRANCH         0x040r  v--- -1-- +--- --rr
  LFS3_TAG_DATA           0x0404  v--- -1-- +--- -1--
  LFS3_TAG_BLOCK          0x0408  v--- -1-- +--- 1err
  LFS3_TAG_DDKEY*         0x0410  v--- -1-- +--1 ----
  LFS3_TAG_DID            0x0420  v--- -1-- +-1- ----
  LFS3_TAG_BSHRUB         0x0428  v--- -1-- +-1- 1---
  LFS3_TAG_BTREE          0x042c  v--- -1-- +-1- 11rr
  LFS3_TAG_MROOT          0x0431  v--- -1-- +-11 --rr
  LFS3_TAG_MDIR           0x0435  v--- -1-- +-11 -1rr
  LFS3_TAG_MSHRUB+        0x0438  v--- -1-- +-11 1---
  LFS3_TAG_MTREE          0x043c  v--- -1-- +-11 11rr
  LFS3_TAG_BMRANGE        0x044u  v--- -1-- +1-- ++uu
  LFS3_TAG_BMFREE         0x0440  v--- -1-- +1-- ----
  LFS3_TAG_BMINUSE        0x0441  v--- -1-- +1-- ---1
  LFS3_TAG_BMERASED       0x0442  v--- -1-- +1-- --1-
  LFS3_TAG_BMBAD          0x0443  v--- -1-- +1-- --11
  LFS3_TAG_DDRC*          0x0450  v--- -1-- +1-1 ----
  LFS3_TAG_DDPCOEFF*      0x0451  v--- -1-- +1-1 ---1
  LFS3_TAG_PCOEFFMAP*     0x0460  v--- -1-- +11- ----
  LFS3_TAG_ATTR           0x06aa  v--- -11a +aaa aaaa
  LFS3_TAG_UATTR          0x06aa  v--- -11- +aaa aaaa
  LFS3_TAG_SATTR          0x07aa  v--- -111 +aaa aaaa
  LFS3_TAG_SHRUB          0x1kkk  v--1 kkkk +kkk kkkk
  LFS3_TAG_ALT            0x4kkk  v1cd kkkk +kkk kkkk
  LFS3_TAG_CKSUM          0x300p  v-11 ---- ++++ +pqq
  LFS3_TAG_NOTE           0x3100  v-11 ---1 ++++ ++++
  LFS3_TAG_ECKSUM         0x3200  v-11 --1- ++++ ++++
  LFS3_TAG_GCKSUMDELTA    0x3300  v-11 --11 ++++ ++++

  * Planned + Reserved ?
Hypothetical

Some additional notes:

- I was on the fence on keeping the 0x30 prefix on config tags now that it is no longer needed to differentiate from null, but ultimately decided to keep it because: 1. it's fun, 2. it decreases the chance of false positives, 3. it keeps the redund bits readable in hexdumps, and 4. it reserves some tags < config, which is useful since order matters. Instead, I pushed the 0x30 prefix to _more_ tags, mainly gstate. As a coincidence, meta related tags (MNAME, MROOT, MTREE) all shifted to also have the 0x30 prefix, which is a nice bit of unexpected consistency.

- I also considered reserving the redund bits across the config tags similarly to what we've done in struct/gstate tags, but decided against it as 1. it significantly reduces the config tag space available, and 2. makes alignment with VERSION + R/W/OCOMPAT a bit awkward. Instead I think we should relax the redund bit alignment in other suptypes, though in practice the intermixing of non-redund and redund tags makes this a bit difficult. Maybe we should consider including redund bits as a hint for things like DATA? DDKEY? BSHRUB? etc?

- I created a bit more space for file btree struct tags, allowing for both the future planned DDKEY, and BLOCK with optional erased-bit. We don't currently use this, but it may be useful for the future planned gddtree, which in-theory can track erased-state in partially written file blocks. Currently tracking erased-state in file blocks is difficult due to the potential of multiple references, and inability to prevent ecksum conflicts in raw data blocks.

- UATTR/SATTR bumped up to 0x600/0x700 to keep the 1-bit alignment, leaving the suptype 0x500 unused. Though this may be useful if we ever run out of struct tags (suptype=0x400), which is likely where most new tags will go.

---

Code changes were minimal, but with a bunch of noise:

                 code          stack          ctx
  before:       35912           2280          660
  after:        35920 (+0.0%)   2280 (+0.0%)  660 (+0.0%)

                 code          stack          ctx
  gbmap before: 38800           2296          772
  gbmap after:  38812 (+0.0%)   2296 (+0.0%)  772 (+0.0%)	2025-11-18 00:56:48 -06:00
Christopher Haster	ee519f43b5	scripts: Renamed lookupleaf -> lookupnext_ to match lfs3.c - lookupleaf -> lookupnext_ - namelookupleaf -> namelookup_ I want to move away from lookupleaf usage in general in the dbg scripts, like we have in lfs3.c, but I also just really don't want to touch these scripts again unless I need to. They've been useful, but also a big time sink. Maybe I should actually learn Python's new type system. That would probably help here...	2025-10-26 15:34:45 -05:00
Christopher Haster	ffc40da878	scripts: Reworked tagrepr -> Tag.repr to rely more on self-parsing

This should make tag editing less tedious/error-prone. We already used self-parsing to generate -l/--list in dbgtag.py, but this extends the idea to tagrepr (now Tag.repr), which is used in quite a few more scripts.

To make this work the little tag encoding spec had to become a bit more rigorous, fortunately the only real change was the addition of '+' characters to mark reserved-but-expected-zero bits.

Example:

  TAG_CKSUM = 0x3000 ## v-11 ---- ++++ +pqq
                        ^--^----^----^--^-^-- valid bit, unmatched
                           '----|----|--|-|-- matches 1
                                '----|--|-|-- matches 0
                                     '--|-|-- reserved 0, unmatched
                                        '-|-- perturb bit, unmatched
                                          '-- phase bits, unmatched

  dbgtag.py 0x3000  =>  cksumq0
  dbgtag.py 0x3007  =>  cksumq3p
  dbgtag.py 0x3017  =>  cksumq3p 0x10
  dbgtag.py 0x3417  =>  0x3417

Though Tag.repr still does a bit of manual formatting for the differences between shrub/normal/null/alt tags.

Still, this should reduce the number of things that need to be changed from 2 -> 1 when adding/editing most new tags.	2025-10-24 00:15:21 -05:00
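As a rough illustration of the self-parsing idea in ffc40da878, the sketch below derives a match mask from the bit-pattern comment and applies it to the four dbgtag.py examples from the commit message. The function names (`pattern_mask`, `tag_matches`, `tag_field`, `tag_reserved`) are hypothetical and are not taken from the littlefs scripts; only the pattern syntax and the example tags come from the commit message.

```python
# Hypothetical sketch, not the actual Tag.repr implementation.
CKSUM_PATTERN = 'v-11 ---- ++++ +pqq'

def pattern_mask(pattern):
    """Return (mask, value) for the bits a tag must match.

    '1' must be 1 and '-' must be 0; 'v', '+', and field letters are
    left unmatched.
    """
    mask, value = 0, 0
    for c in pattern.replace(' ', ''):
        mask = (mask << 1) | (c in '1-')
        value = (value << 1) | (c == '1')
    return mask, value

def tag_matches(tag, pattern):
    mask, value = pattern_mask(pattern)
    return (tag & mask) == value

def tag_field(tag, pattern, name):
    """Collect the bits labeled with the letter `name`, msb first."""
    bits = pattern.replace(' ', '')
    x = 0
    for i, c in enumerate(bits):
        if c == name:
            x = (x << 1) | ((tag >> (len(bits) - 1 - i)) & 1)
    return x

def tag_reserved(tag, pattern):
    """Return any set bits that sit under '+' (expected zero)."""
    mask = 0
    for c in pattern.replace(' ', ''):
        mask = (mask << 1) | (c == '+')
    return tag & mask

for tag in [0x3000, 0x3007, 0x3017, 0x3417]:
    print('0x%04x' % tag,
          tag_matches(tag, CKSUM_PATTERN),
          'q=%d' % tag_field(tag, CKSUM_PATTERN, 'q'),
          'p=%d' % tag_field(tag, CKSUM_PATTERN, 'p'),
          'reserved=0x%x' % tag_reserved(tag, CKSUM_PATTERN))
```

Under these rules 0x3017 still matches but reports a leftover 0x10 in the reserved bits, and 0x3417 fails the match, which lines up with the dbgtag.py output quoted in the commit.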
Christopher Haster	e622656538	bmap: Tweaked bmap ranges, dropped in-flight tag for now

New bmap range tags:

  LFS3_TAG_BMRANGE   0x033u  v--- --11 --11 uuuu
  LFS3_TAG_BMFREE    0x0330  v--- --11 --11 ----
  LFS3_TAG_BMINUSE   0x0331  v--- --11 --11 ---1
  LFS3_TAG_BMERASED  0x0332  v--- --11 --11 --1-
  LFS3_TAG_BMBAD     0x0333  v--- --11 --11 --11

Note 0x334-0x33f are still reserved for future bmap tags, but the new encoding fits in the surprisingly common 2-bit subfield that may deduplicate some decoding code.

Fitting in 2-bits is the main reason for this, now that in-flight ranges look like they won't be worth exploring further. Worst case we can always add more bm tags in the future. And it may even make sense to use an entire bit for in-flight tags, since in theory the concept can apply to more than just in-use blocks.

---

Another benefit of this encoding: In-use vs free is a bit check, and I like the implication that an in-use + erased block can only be a bad block.

No code changes:

                code          stack          ctx
  before:      37172           2352          684
  after:       37172 (+0.0%)   2352 (+0.0%)  684 (+0.0%)

                code          stack          ctx
  bmap before: 38844           2456          800
  bmap after:  38844 (+0.0%)   2456 (+0.0%)  800 (+0.0%)	2025-10-09 14:33:24 -05:00
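The "bit check" remark in e622656538 can be illustrated with a small sketch. The tag values are the ones listed in that commit; the helper names and the reading of bit 0 as "in-use" and bit 1 as "erased" are assumptions made for illustration, not code from lfs3.c.

```python
# Tag values from commit e622656538; helper names are hypothetical.
LFS3_TAG_BMFREE   = 0x0330
LFS3_TAG_BMINUSE  = 0x0331
LFS3_TAG_BMERASED = 0x0332
LFS3_TAG_BMBAD    = 0x0333

def bm_inuse(tag):
    # assumed: bit 0 distinguishes in-use from free
    return bool(tag & 0x1)

def bm_erased(tag):
    # assumed: bit 1 marks erased state
    return bool(tag & 0x2)

def bm_bad(tag):
    # in-use + erased is only meaningful as a bad block
    return bm_inuse(tag) and bm_erased(tag)

for name, tag in [('free', LFS3_TAG_BMFREE),
                  ('inuse', LFS3_TAG_BMINUSE),
                  ('erased', LFS3_TAG_BMERASED),
                  ('bad', LFS3_TAG_BMBAD)]:
    print(name, bm_inuse(tag), bm_erased(tag), bm_bad(tag))
```

The later re-encoding to 0x044u keeps the same low two bits, so the same checks would apply there under the same assumptions.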
Christopher Haster	88180b6081	bmap: Initial scaffolding for on-disk block map

This is pretty exploratory work, so I'm going to try to be less thorough in commit messages until the dust settles.

---

New tag for gbmapdelta:

  LFS3_TAG_GBMAPDELTA  0x0104  v--- ---1 ---- -1rr

New tags for in-bmap block types:

  LFS3_TAG_BMRANGE     0x033u  v--- --11 --11 uuuu
  LFS3_TAG_BMFREE      0x0330  v--- --11 --11 ----
  LFS3_TAG_BMINFLIGHT  0x0331  v--- --11 --11 ---1
  LFS3_TAG_BMINUSE     0x0332  v--- --11 --11 --1-
  LFS3_TAG_BMBAD       0x0333  v--- --11 --11 --11
  LFS3_TAG_BMERASED    0x0334  v--- --11 --11 -1--

New gstate decoding for gbmap:

  .---+- -+- -+- -+- -.  cursor: 1 leb128  <=5 bytes
  | cursor            |  known:  1 leb128  <=5 bytes
  +---+- -+- -+- -+- -+  block:  1 leb128  <=5 bytes
  | known             |  trunk:  1 leb128  <=4 bytes
  +---+- -+- -+- -+- -+  cksum:  1 le32    4 bytes
  | block             |  total:            23 bytes
  +---+- -+- -+- -+- -'
  | trunk         |
  +---+- -+- -+- -+
  | cksum         |
  '---+---+---+---'

New bmap node revdbg string:

  vvv---- -111111- -11---1- -11---1-  (62 62 7e v0  bb~r)  bmap node

New mount/format/info flags (still unsure about these):

  LFS3_M_BMAPMODE   0x03000000  On-disk block map mode
  LFS3_M_BMAPNONE   0x00000000  Don't use the bmap
  LFS3_M_BMAPCACHE  0x01000000  Use the bmap to cache lookahead scans
  LFS3_M_BMAPSLOW   0x02000000  Use the slow bmap algorithm
  LFS3_M_BMAPFAST   0x03000000  Use the fast bmap algorithm

New gbmap wcompat flag:

  LFS3_WCOMPAT_GBMAP  0x00002000  Global block-map in use	2025-10-01 17:55:13 -05:00
Christopher Haster	6d9c077261	Reordered LFSR_TAG_NAMELIMIT/FILELIMIT

Not sure why, but this just seems more intuitive/correct. Maybe because LFSR_TAG_NAME is always the first tag in a file's attr set:

  LFSR_TAG_NAMELIMIT  0x0039  v--- ---- --11 1--1
  LFSR_TAG_FILELIMIT  0x003a  v--- ---- --11 1-1-

Seeing as several parts of the codebase still use the previous order, it seems reasonable to switch back to that.

No code changes.	2025-05-24 21:51:06 -05:00
Christopher Haster	de7564e448	Added phase bits to cksum tags

This carves out two more bits in cksum tags to store the "phase" of the rbyd block (maybe the name is too fancy, this is just the lowest 2 bits of the block address):

  LFSR_TAG_CKSUM  0x300p  v-11 ---- ---- -pqq
                                          ^ ^
                                          | '-- phase bits
                                          '---- perturb bit

The intention here is to catch mrootanchors that are "out-of-phase", i.e. they've been shifted by a small number of blocks.

This can happen if we find the wrong mrootanchor (after, say, a magic scan), and risks filesystem corruption:

                formatted
  .-----------------'-----------------.
                          mounted
           .-----------------'-----------------.
  .--------+--------+--------+--------+ ...
  |(erased)| mroot  |
  |        | anchor |                   ...
  |        |        |
  '--------+--------+--------+--------+ ...

Including the lower 2 bits of the block address in cksum tags avoids this, for up to a 3 block shift (the maximum number of redund mrootanchors).

---

Note that cksum tags really are the only place we could put these bits. Anywhere else and they would interfere with the canonical cksum, which would break error correction. By definition these need to be different per block.

We include these phase bits in every cksum tag (because it's easier), but these don't really say much about mdirs that are not the mrootanchor. Non-anchor mdirs can have arbitrary block addresses, therefore arbitrary phase bits.

You _might_ be able to do something interesting if you sort the rbyd addresses and use the index as the phase bits, but that would add quite a bit of code for questionable benefit...

You could argue this adds noise to our cksums, but:

1. 2 bits seems like a really small amount of noise
2. our cksums are just crc32cs
3. the phase bits humorously never change when you rewrite a block

---

As with any feature this adds code, but only a small amount. I think it's worth the extra protection:

           code          stack          ctx
  before: 35792           2368          636
  after:  35824 (+0.1%)   2368 (+0.0%)  636 (+0.0%)

Also added test_mount_incompat_out_of_phase to test this.

The dbg scripts _don't_ error (block mismatch seems likely when debugging), but dbgrbyd.py at least adds phase mismatch notes in -l/--log mode.	2025-04-30 00:57:17 -05:00
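To make the phase-bit idea in de7564e448 concrete, here is a minimal sketch that builds a cksum tag from a block address and checks it against the block it was read from. The tag layout (`0x300p`, perturb bit above two phase bits) is from the commit; `cksum_tag` and `in_phase` are hypothetical helper names, not functions from lfs.c or the dbg scripts.

```python
LFSR_TAG_CKSUM = 0x3000  # v-11 ---- ---- -pqq

def cksum_tag(block, perturb=0):
    """Build a cksum tag carrying the low 2 bits of the block address."""
    return LFSR_TAG_CKSUM | ((perturb & 0x1) << 2) | (block & 0x3)

def in_phase(tag, block):
    """True if the tag's phase bits agree with where we found it."""
    return (tag & 0x3) == (block & 0x3)

# an mrootanchor written at block 1...
tag = cksum_tag(1)
print(hex(tag), in_phase(tag, 1))

# ...looks out-of-phase if the image is shifted by 1, 2, or 3 blocks,
# but a shift by exactly 4 blocks wraps around and goes unnoticed
for shift in range(1, 5):
    print(shift, in_phase(tag, 1 + shift))
```

The wrap-around at 4 blocks is why the commit describes the protection as covering "up to a 3 block shift".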
Christopher Haster	677c078b50	Added LFSR_TAG_BNAME/MNAME, stop btree lookups at first tag Now that we don't have to worry about name tag conflicts as much, we can add name tags for things that aren't files. This adds LFSR_TAG_BNAME for branch names, and LFSR_TAG_MNAME for mtree names. Note that the upper 4 bits of the subtype match LFSR_TAG_BRANCH and LFSR_TAG_MDIR respectively: LFSR_TAG_BNAME 0x0200 v--- --1- ---- ---- LFSR_TAG_MNAME 0x0220 v--- --1- --1- ---- LFSR_TAG_BRANCH 0x030r v--- --11 ---- --rr LFSR_TAG_MDIR 0x0324 v--- --11 --1- -1rr The encoding is somewhat arbitrary, but I figured reserving ~31 types for files is probably going to be plenty for littlefs. POSIX seems to do just fine with only ~7 all these years, and I think custom attributes will be more enticing for "niche" file types (symlinks, compressed files, etc), given the easy backwards compatibility. --- In addition to the debugging benefits, the new name tags let us stop btree lookups on the first non-bname/branch tag. Previously we always had to fetch the first struct tag as well to check if it was a branch. In theory this saves one rbyd lookup, but in practice it's a bit muddy. The problem is that there's two ways to use named btrees: 1. As buckets: mtree -> mdir -> mid 2. As a table: ddtree -> ddid The only named btree we _currently_ have is the mtree. And the mtree operates in bucket mode, with each mdir acting more-or-less as an extension to the btree. So we end up needing to do the second tag lookup anyways, and all we've done is complicated up the code. But we will _eventually_ need the table mode for the ddtree, where we care if the ddname is an exact match. And returning the first tag is arguably the more "correct" internal API, vs arbitrarily the first struct tag. But then again this change is pretty pricey... 
code stack ctx before: 35732 2440 640 after: 35888 (+0.4%) 2480 (+1.6%) 640 (+0.0%) --- It's worth noting the new BNAME/MNAME tags don't _require_ the btree lookup changes (which is why we can get away with not touching the dbg scripts). The previous algorithm of always checking for branch tags still works. Maybe there's an argument for conditionally using the previous API when compiling without the ddtree, but that sounds horrendously messy...	2025-04-30 00:25:30 -05:00
Christopher Haster	d308ec8322	Reworked tag encoding a little bit

Mainly to make room for some future planned stuff:

- Moved the mroot's redund bits from LFSR_TAG_GEOMETRY to LFSR_TAG_MAGIC:

    LFSR_TAG_MAGIC  0x003r  v--- ---- --11 --rr

  This has the benefit of living in a fixed location (off=0x5), which may make mounting/debugging easier. It also makes LFSR_TAG_GEOMETRY less of a special case (LFSR_TAG_MAGIC is already a _very_ special case).

  Unfortunately, this does get in the way of our previous magic=0x3 encoding. To compensate (and to avoid conflicts with LFSR_TAG_NULL), I've added the 0x3_ prefix. This has the funny side-effect of rendering redunds 0-3 as ascii 0-3 (0x30-0x33), which is a complete accident but may actually be useful when debugging.

  Currently all config tags fit in the 0x3_ prefix, which is nice for debugging but not a hard requirement.

- Flipped LFSR_TAG_FILELIMIT/NAMELIMIT:

    LFSR_TAG_FILELIMIT  0x0039  v--- ---- --11 1--1
    LFSR_TAG_NAMELIMIT  0x003a  v--- ---- --11 1-1-

  The file limit is a _bit_ more fundamental. It's effectively the required integer size for the filesystem.

  These may also be followed by LFSR_TAG_ATTRLIMIT based on how future attr revisits go.

- Rearranged struct tags so that LFSR_TAG_BRANCH = 0x300:

    LFSR_TAG_BRANCH  0x030r  v--- --11 ---- --rr
    LFSR_TAG_DATA    0x0304  v--- --11 ---- -1--
    LFSR_TAG_BLOCK   0x0308  v--- --11 ---- 1err
    LFSR_TAG_DDKEY*  0x0310  v--- --11 ---1 ----
    LFSR_TAG_DID     0x0314  v--- --11 ---1 -1--
    LFSR_TAG_BSHRUB  0x0318  v--- --11 ---1 1---
    LFSR_TAG_BTREE   0x031c  v--- --11 ---1 11rr
    LFSR_TAG_MROOT   0x032r  v--- --11 --1- --rr
    LFSR_TAG_MDIR    0x0324  v--- --11 --1- -1rr
    LFSR_TAG_MTREE   0x032c  v--- --11 --1- 11rr

    *Planned

  LFSR_TAG_BRANCH is a very special tag when it comes to bshrub/btree traversal, so I think it deserves the subtype=0 slot.

  This also just makes everything fit together better, and makes room for the future planned ddkey tag.

Code changes minimal:

           code          stack          ctx
  before: 35728           2440          640
  after:  35732 (+0.0%)   2440 (+0.0%)  640 (+0.0%)	2025-04-29 16:25:00 -05:00
Christopher Haster	7dd473df82	Tweaked LFSR_TAG_STICKYNOTE encoding 0x205 -> 0x203 Now that LFS_TYPE_STICKYNOTE is a real type users can interact with, it makes sense to group it with REG/DIR. This also has the side-effect of making these contiguous. --- LFSR_TAG_BOOKMARKs, however, are still hidden from the user. This unfortunately means there will be a bit of a jump if we ever add LFS_TYPE_SYMLINK in the future, but I'm starting to wonder if that's the best way to approach symlinks in littlefs... If instead LFS_TYPE_SYMLINKS were implied via custom attribute, you could avoid the headache that comes with adding a new tag encoding, and allow perfect compatibility with non-symlink drivers. Win win. This seems like a better approach for _all_ of the theoretical future types (compressed files, device files, etc), and avoids the risk of oversaturating the type space. --- This had a surprising impact on code for just a minor encoding tweak. I guess the contiguousness pushed the compiler to use tables/ranges for more things? Or maybe 3 vs 5 is just an easier constant to encode? code stack ctx before: 35952 2440 640 after: 35928 (-0.1%) 2440 (+0.0%) 640 (+0.0%)	2025-04-24 14:35:52 -05:00
Christopher Haster	a73f221317	scripts: Fixed issue where rbyd lookups rejected shrub tags This was caused by including the shrub bit in the tag comparison in Rbyd.lookup. Fixed by adding an extra key mask (0xfff). Note this is already how lfsr_rbyd_lookup works in lfs.c.	2025-04-23 23:19:37 -05:00
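A small sketch of the key-mask fix in a73f221317: comparing tags only after masking with 0xfff lets a lookup for a plain tag find its shrub-flagged counterpart. The linear list here is a hypothetical stand-in for the real rbyd tree search, and `lookup` is not the actual Rbyd.lookup; the 0x1202 "shrubdir" value is taken from the dbgtag.py example in b5c3b97ae1.

```python
TAG_SHRUB    = 0x1000  # shrub bit, see the 0x1kkk encoding
TAG_KEY_MASK = 0x0fff  # key bits used for comparison

def lookup(rattrs, tag):
    """Return the first (tag, data) whose key bits match, else None.

    A linear scan stands in for the real tree search.
    """
    for tag_, data in rattrs:
        if (tag_ & TAG_KEY_MASK) == (tag & TAG_KEY_MASK):
            return tag_, data
    return None

rattrs = [(0x1202, b'shrubdir')]
# without the mask, 0x0202 != 0x1202 and the lookup would miss
print(lookup(rattrs, 0x0202))
```

Returning the stored tag unmasked keeps the shrub bit visible to the caller, which is an illustrative choice here rather than a statement about the scripts.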
Christopher Haster	bd70270e11	scripts: Added -w/--word-bits to bound dbgleb128/dbgle32 parsing This is limited to dbgle32.py, dbgleb128.py, and dbgtag.py for now. This more closely matches how littlefs behaves, in that we read a bounded number of bytes before leb128 decoding. This minimizes bugs related to leb128 overflow and avoids reading inherently undecodable data. The previous unbounded behavior is still available with -w0. Note this gives dbgle32.py much more flexibility in that it can now decode other integer widths. Uh, ignore the name for now. At least it's self documenting that the default is 32-bits... --- Also fixed a bug in fromleb128 where size was reported incorrectly on offset + truncated leb128.	2025-04-16 15:23:12 -05:00
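The bounded decoding described in bd70270e11 might look roughly like the sketch below: the word width caps how many bytes are consumed before decoding gives up, and a width of 0 restores the unbounded behavior. The signature and the choice to raise `ValueError` on truncated or overlong input are assumptions for illustration; the real `fromleb128` in the scripts may report errors differently.

```python
def fromleb128(data, j=0, word_bits=32):
    """Decode one leb128 at offset j, returning (word, size).

    At most ceil(word_bits/7) bytes are read; word_bits=0 means
    unbounded.
    """
    limit = (word_bits + 6) // 7 if word_bits else len(data) - j
    word = 0
    for d in range(min(limit, len(data) - j)):
        b = data[j + d]
        word |= (b & 0x7f) << (7 * d)
        if not b & 0x80:
            return word, d + 1
    # assumed error handling: ran out of bytes or out of budget
    raise ValueError('truncated or overlong leb128')

print(fromleb128(bytes([0xe5, 0x8e, 0x26])))

overlong = b'\x80' * 5 + b'\x01'
try:
    fromleb128(overlong)              # 32-bit bound, at most 5 bytes
except ValueError as e:
    print('rejected:', e)
print(fromleb128(overlong, word_bits=0))  # unbounded, like -w0
```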
Christopher Haster	0cea8b96fb	scripts: Fixed O(n^2) slicing in Rbyd.fetch

Do you see the O(n^2) behavior in this loop?

    j = 0
    while j < len(data):
        word, d = fromleb(data[j:])
        j += d

The slice, data[j:], creates an O(n) copy every iteration of the loop.

A bit tricky. Or at least I found it tricky to notice. Maybe because array indexing being cheap is baked into my brain...

Long story short, this repeated slicing resulted in O(n^2) behavior in Rbyd.fetch and probably some other functions. Even though we don't care _too_ much about performance in these scripts, having Rbyd.fetch run in O(n^2) isn't great.

Tweaking all from* functions to take an optional index solves this, at least on paper.

---

In practice I didn't actually find any measurable performance gain. I guess array slicing in Python is optimized enough that the constant factor takes over?

(Maybe it's being helped by us limiting Rbyd.fetch to block_size in most scripts? I haven't tested NAND block sizes yet...)

Still, it's good to at least know this isn't a bottleneck.	2025-04-16 15:23:11 -05:00
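The index-taking variant that 0cea8b96fb describes can be sketched as follows. `fromleb128` with an optional `j` argument is an assumed shape for the reworked from* helpers, and `decode_all` is a hypothetical driver loop; neither is copied from the scripts.

```python
def fromleb128(data, j=0):
    """Decode one leb128 starting at index j, returning (word, size).

    No slice is taken, so each call only touches the bytes it decodes.
    Raises IndexError if the data ends mid-word.
    """
    word = 0
    d = 0
    while True:
        b = data[j + d]
        word |= (b & 0x7f) << (7 * d)
        d += 1
        if not b & 0x80:
            return word, d

def decode_all(data):
    words = []
    j = 0
    while j < len(data):
        # pass the index instead of slicing data[j:]
        word, d = fromleb128(data, j)
        words.append(word)
        j += d
    return words

print(decode_all(bytes([0x01, 0x80, 0x01, 0xe5, 0x8e, 0x26])))
```

The loop is now linear in the length of `data`, though as the commit notes, the measured difference may be negligible for block-sized inputs.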
Christopher Haster	b5c3b97ae1	scripts: Reworked dbgtag.py, added -i/--input, included hex in output

This just gives dbgtag.py a few more bells and whistles that may be useful:

- Can now parse multiple tags from hex:

    $ ./scripts/dbgtag.py -x 71 01 01 01 12 02 02 02
    71 01 01 01    altrgt 0x101 w1 -1
    12 02 02 02    shrubdir w2 2

  Note this _does_ skip attached data, which risks some confusion but not skipping attached data will probably end up printing a bunch of garbage for most use cases:

    $ ./scripts/dbgtag.py -x 01 01 01 04 02 02 02 02 03 03 03 03
    01 01 01 04    gdelta 0x01 w1 4
    03 03 03 03    struct 0x03 w3 3

- Included hex in output. This is helpful for learning about the tag encoding and also helps identify tags when parsing multiple tags.

  I considered also including offsets, which might help with understanding attached data, but decided it would be too noisy. At some point you should probably jump to dbgrbyd.py anyways...

- Added -i/--input to read tags from a file. This is roughly the same as -x/--hex, but allows piping from other scripts:

    $ ./scripts/dbgcat.py disk -b4096 0 -n4,8 | ./scripts/dbgtag.py -i-
    80 03 00 08    magic 8

  Note this reads the entire file in before processing. We'd need to fit everything into RAM anyways to figure out padding.	2025-04-16 15:23:10 -05:00
Christopher Haster	a5747bb2b2	scripts: dbgmtree.py: Fixed minor mtree rendering/traversal issues - Added TreeArt __bool__ and __len__. This was causing a crash in _treeartfrommtreertree when rtree was empty. The code was not updated in the set -> TreeArt class transition, and went unnoticed because it's unlikely to be hit unless the filesystem is corrupt. Fortunately(?) realtime rendering creates a bunch of transiently corrupt filesystem images. - Tweaked lookupleaf to not include mroots in their own paths. This matches the behavior of leaf mdirs, and is intentionally different from btree's lookupleaf which needs to lookup the leaf rattr to terminate. - Tweaked leaves to not remove the last path entry if it is an mdir. This hid the previous lookupleaf inconsistency. We only remove the last rbyd from the path because it is redundant, and for mdirs/mroots it should never be redundant. I ended up just replacing the corrupt check with an explicit check that the rbyd is redundant. This should be more precise and avoid issues like this in the future. Also adopted explicit redundant checks in Btree.leaves and Lfs.File.leaves.	2025-04-16 15:23:08 -05:00
Christopher Haster	b715e9a749	scripts: Prefer 1;30-37m ansi codes over 90-97m Reading Wikipedia: > Later terminals added the ability to directly specify the "bright" > colors with 90–97 and 100–107. So if we want to stick to one pattern, we should probably go with brightness as a separate modifier. This shouldn't noticeably change any script, unless your terminal interprets 90-97m colors differently from 1;30-37m, in which case things should be more consistent now.	2025-04-16 15:22:43 -05:00
Christopher Haster	3820be180d	scripts: Adopted crc32c lib when available Jumping from a simple Python implementation to the fully hardware accelerated crc32c library basically deletes any crc32c related bottlenecks: crc32c.py disk (1MiB) w/ crc32c lib: 0m0.027s crc32c.py disk (1MiB) w/o crc32c lib: 0m0.844s This uses the same try-import trick we use for inotify_simple, so we get the speed improvement without losing portability. --- In dbgbmap.py: dbgbmap.py w/ crc32c lib: 0m0.273s dbgbmap.py w/o crc32c lib: 0m0.697s dbgbmap.py w/ crc32c lib --no-ckdata: 0m0.269s dbgbmap.py w/o crc32c lib --no-ckdata: 0m0.490s dbgbmap.old.py: 0m0.231s The bulk of the runtime is still in Rbyd.fetch, but this is now dominated by leb128 decoding, which makes sense. We do ~twice as many fetches in the new dbgbmap.py in order to calculate the gcksum (which we then ignore...).	2025-04-16 15:22:34 -05:00
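The try-import trick mentioned in 3820be180d can be sketched like this: use the `crc32c` package when it is installed, and fall back to a slow pure-Python loop otherwise. The wrapper name and the bitwise fallback are illustrative assumptions, not the code in the littlefs scripts; the fallback uses the standard CRC-32C conventions (reflected polynomial 0x82f63b78, inverted init and output), which may differ from how the scripts seed their checksums.

```python
try:
    import crc32c as crc32c_lib   # optional, hardware accelerated
except ImportError:
    crc32c_lib = None

def crc32c(data, crc=0):
    """CRC-32C of data, optionally continuing from a previous crc."""
    if crc32c_lib is not None:
        return crc32c_lib.crc32c(data, crc)
    # portable fallback, one bit at a time
    crc ^= 0xffffffff
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82f63b78 if crc & 1 else 0)
    return crc ^ 0xffffffff

print('%08x' % crc32c(b'123456789'))
```

Both paths should agree on the result, so callers do not need to know which one is active.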
Christopher Haster	6ea18e6579	scripts: Tweaked bd.read to behave like an actual bd_read callback This better matches what you would expect from a function called bd.read, at least in the context of littlefs, while also decreasing the state (seek) we have to worry about. Note that bd.readblock already behaved mostly like this, and is preferred by every class except for Bptr.	2025-04-16 15:22:32 -05:00
Christopher Haster	b2911fbbe7	scripts: Removed item/iter magic methods from fs object classes So no more __getitem__, __contains__, or __iter__ for Rbyd, Btree, Mdir, Mtree, Lfs.File, etc. These were way too error-prone, especially when accidental unpacking triggered unintended disk traversal and weird error states. We didn't even use the implicit behavior because we preferred the full name for heavy disk operations. The motivation for this was Python not catching this bug, which is a bit silly: rid, rattr, *path_ = rbyd	2025-04-16 15:22:28 -05:00
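The unpacking pitfall behind b2911fbbe7 is easy to reproduce. In the sketch below, `LeakyRbyd` and `StrictRbyd` are hypothetical stand-ins for a class with and without `__iter__`; the counter stands in for an expensive disk traversal.

```python
class LeakyRbyd:
    """Still defines __iter__, so unpacking silently iterates."""
    def __init__(self, rattrs):
        self.rattrs = rattrs
        self.traversals = 0

    def __iter__(self):
        self.traversals += 1  # pretend this walks the disk
        return iter(self.rattrs)

class StrictRbyd:
    """No __iter__; callers must ask for iteration by name."""
    def __init__(self, rattrs):
        self.rattrs = rattrs

    def rattrs_iter(self):
        return iter(self.rattrs)

def unpack(rbyd):
    # the buggy line from the commit message: rbyd was meant to be
    # the result of a lookup, not the rbyd itself
    rid, rattr, *path_ = rbyd
    return rid, rattr, path_

leaky = LeakyRbyd([10, 11, 12])
print(unpack(leaky), leaky.traversals)

try:
    unpack(StrictRbyd([10, 11, 12]))
except TypeError as e:
    print('caught:', e)
```

With the magic method removed, the same mistake fails immediately with a `TypeError` instead of triggering an unintended traversal.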
Christopher Haster	33120bf930	scripts: Reworked dbgbmap.py This is a rework of dbgbmap.py to match dbgbmapd3.py, adopt the new Rbyd/Lfs class abstractions, as well as Canvas, -k/--keep-open, etc. Some of the main changes: - dbgbmap.py now reports corrupt/conflict blocks, which can be useful for debugging. Note though that you will probably get false positives if running with -k/--keep-open while something is writing to the disk. littlefs is powerloss safe, not multi-write safe! Very different problem! - dbgbmap.py now groups by blocks before mapping to the space filling curve. This matches dbgbmapd3.py and I think is more intuitive now that we have a bmap tiling algorithm. -%/--usage still works, but is rendered as a second space filling curve _inside_ the block tile. Different blocks can end up with slightly different sizes due to rounding, but it's not the end of the world. I wasn't originally going to keep it around, but ended up caving, so you can still get the original byte-level curve via -u/--contiguous. - Like the other ascii rendering script, dbgbmap.py now supports -k/--keep-open and friends as a thin main wrapper. This just makes it a bit easier to watch a realtime bmap without needing to use watch.py. - --mtree-only is supported, but filtering via --mdirs/--btrees/--data is _not_ supported. This was too much complexity for a minor feature, and doesn't cover other niche blocks like corrupted/conflict or parity in the future. - Things are more customizable thanks to the Attr class. For an example you can now use the littlefs mount string as the title via --title-littlefs. - Support for --to-scale and -t/--tiny mode, if you want to scale based on block_size. One of the bigger differences dbgbmapd3.py -> dbgbmap.py is that dbgbmap.py still supports -%/--usage. Should we backport -%/--usage to dbgbmapd3.py? Uhhhh... This ends up a funny example of raster graphics vs vector graphics. 
A pixel-level space filling curve is easy with raster graphics, but with an svg you'd need some sort of pixel -> path wrapping algorithm... So no -%/--usage in dbgbmapd3.py for now. Also just ripped out all of the -@/--blocks byte-level range stuff. Way too complicated for what it was worth. -@/--blocks is limited to simple block ranges now. High-level scripts should stick to high-level options. One last thing to note is the adoption of "if '%' in label__" checks before applying punescape. I wasn't sure if we should support punescape in dbgbmap.py, since it's quite a bit less useful here, and may be costly due to the lazy attr generation. Adding this simple check avoids the cost and consistency question, so I adopted it in all scripts.	2025-04-16 15:22:24 -05:00
Christopher Haster	202636cccd	scripts: Tweaked corrupt rbyd coloring to include addresses This matches the coloring in dbglfs.py for other erroneous conditions, and also matches how we color hidden items when shown. Also fixed some minor bugs in grm printing.	2025-04-16 15:22:23 -05:00
Christopher Haster	e5b430cb8c	scripts: Adopted -q/--quiet in most debug scripts This can be useful when you just want to check for errors. The only exception being dbgblock.py/dbgcat.py, since these don't really have a concept of an error.	2025-04-16 15:22:22 -05:00
Christopher Haster	97e2786545	scripts: Synced dbgbmapd3.py Lfs class changes - Added Lfs.traverse for full filesystem traversal - Added Rbyd.shrub flag so we can tell if an Rbyd is a shrub - Removed redundant leaves from paths in leaf iters	2025-04-16 15:22:19 -05:00
Christopher Haster	5f06558cbe	scripts: Added dbgbmapd3.py for bmap -> svg rendering

Like codemapd3.py this includes an interactive UI for viewing the underlying filesystem graph, including:

- mode-tree - Shows all reachable blocks from a given block
- mode-branches - Shows immediate children of a given block
- mode-references - Shows parents of a given block
- mode-redund - Shows sibling blocks in redund groups (This is currently just mdir pairs, but the plan is to add more)

This is _not_ a full filesystem explorer, so we don't embed all block data/metadata in the svg. That's probably a project for another time. However we do include interesting bits such as trunk addresses, checksums, etc.

An example:

  # create a filesystem image
  $ make test-runner -j
  $ ./scripts/test.py -B test_files_many -a -ddisk -O- \
          -DBLOCK_SIZE=1024 \
          -DCHUNK=10 \
          -DSIZE=2050 \
          -DN=128 \
          -DBLOCK_RECYCLES=1
  ... snip ...
  done: 2/2 passed, 0/2 failed, 164pls!, in 0.16s

  # generate bmap svg
  $ ./scripts/dbgbmapd3.py disk -b1024 -otest.svg \
          -W1400 -H750 -Z --dark
  updated test.svg, littlefs v0.0 1024x1024 0x{26e,26f}.d8 w64.128, cksum 41ea791e

And open test.svg in a browser of your choice.

Here's what the current colors mean:

- yellow => mdirs
- blue => btree nodes
- green => data blocks
- red => corrupt/conflict issue
- gray => unused blocks

But like codemapd3.py the output is decently customizable. See -h/--help for more info.

And, just like codemapd3.py, this is based on ideas from d3 and brendangregg's flamegraphs:

- d3 - https://d3js.org
- brendangregg's flamegraphs - https://github.com/brendangregg/FlameGraph

Note we don't actually use d3... the name might be a bit confusing...

---

One interesting change from the previous dbgbmap.py is the addition of "corrupt" (bad checksum) and "conflict" (multiple parents) blocks, which can help find bugs.

You may find the "conflict" block reporting a bit strange. Yes it's useful for finding block allocation failures, but won't naturally formed dags in file btrees also be reported as "conflicts"?

Yes, but the long-term plan is to move away from dags and make littlefs a pure tree (for block allocator and error correction reasons). This hasn't been implemented yet, so for now dags will result in false positives.

---

Implementation wise, this script was pretty straightforward given prior dbglfs.py and codemapd3.py work.

However there was an interesting case of https://xkcd.com/1425:

- Traverse the filesystem and build a graph - easy
- Tile a rectangle with n nice looking rectangles - uhhh

I toyed around with an analytical approach (something like block width = sqrt(canvas_width*canvas_height/n)*block_aspect_ratio), but ended up settling on an algorithm that divides the number of columns by 2 until we hit our target aspect ratio.

This algorithm seems to work quite well, runs in only O(log n), and perfectly tiles the grid for powers-of-two. Honestly the result is better than I was expecting.	2025-04-16 15:22:17 -05:00
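One possible reading of the column-halving idea in 5f06558cbe is sketched below. This is a guess at the shape of the algorithm based only on the one-sentence description in the commit, not the code in dbgbmapd3.py; the function name `tile`, its parameters, and the stopping rule (stop once a single tile is at least as wide as the target aspect ratio) are all assumptions.

```python
import math

def tile(n, width, height, aspect=1.0):
    """Pick a (cols, rows) grid for n tiles on a width x height canvas.

    Start with one long row and halve the column count until each
    tile's width/height reaches the target aspect ratio.
    """
    cols = n
    while cols > 1:
        rows = math.ceil(n / cols)
        tile_aspect = (width / cols) / (height / rows)
        if tile_aspect >= aspect:
            break
        cols = math.ceil(cols / 2)
    return cols, math.ceil(n / cols)

print(tile(64, 800, 800))
print(tile(128, 1400, 750))
```

Because the column count halves each step, the loop runs O(log n) times, and when n is a power of two every candidate grid has exactly n cells, which is consistent with the "perfectly tiles the grid for powers-of-two" remark.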
Christopher Haster	27370dec66	scripts: Tweaked mdir/shrub address printing This fixes an issue where shrub trunks were never printed even with -i/--internal. While only showing mdir/shrub/btree/bptr addresses on block changes is nice in theory, it results in shrub trunks never being printed because the mdir -> shrub block doesn't change. Also checking for changes in block type avoids this.	2025-04-16 15:22:16 -05:00
Christopher Haster	f550fa9a80	scripts: Changed most tree renderers to be pseudo-standalone I'm trying to avoid having classes with different implementations across scripts, as it makes updating things error-prone, but at same time copying all the tree renderers to all dbg scripts would be a bit much. Monkey-patching the TreeArt class in relevant scripts seems like a reasonable compromise.	2025-04-16 15:22:15 -05:00
Christopher Haster	682f12a953	scripts: Moved tree renderers out into their own class These are pretty script specific, so probably shouldn't be in the abstract littlefs classes. This also avoids the tree renderers getting copied into scripts that don't need them (mtree -> dbglfs.py, dbgbmap.py in the future, etc). This also makes TreeArt consistent with JumpArt and LifetimeArt.	2025-04-16 15:22:14 -05:00
Christopher Haster	002c2ea1e6	scripts: Tried to simplify optional path returns So, instead of trying to be clever with python's tuple globbing, just rely on lazy tuple unpacking and a whole bunch of if statements. This is more verbose, but less magical. And generally, the less magic there is, the easier things are to read. This also drops the always-tupled lookup_ variants, which were cluttering up the various namespaces.	2025-04-16 15:22:12 -05:00
Christopher Haster	82f4fd3c0f	scripts: Dropped list/tuple distinction in Rbyd.fetch Also tweaked how we fetch shrubs, adding Rbyd.fetchshrub and Btree.fetchshrub instead of overloading the bd argument. Oh, and also added --trunk to dbgmtree.py and dbglfs.py. Actually _using_ --trunk isn't advised, since it will probably just result in a corrupted filesystem, but these scripts are for accessing things that aren't normally allowed anyways. The reason for dropping the list/tuple distinction is because it was a big ugly hack, unpythonic, and likely to catch users (and myself) by surprise. Now, Rbyd.fetch and friends always require separate block/trunk arguments, and the exercise of deciding which trunk to use is left up to the caller.	2025-04-16 15:22:11 -05:00
Christopher Haster	8324786121	scripts: Reverted skipped branches in -t/--tree render The inconsistency between inner/non-inner (-i/--inner) views was a bit too confusing. At least now the bptr rendering in dbglfs.py matches behavior, showing the bptr tag -> bptr jump even when not showing inner nodes. If the point of these renderers is to show all jumps necessary to reach a given piece of data, hiding bptr jumps only sometimes is somewhat counterproductive...	2025-04-16 15:22:07 -05:00
Christopher Haster	97b6489883	scripts: Reworked dbglfs.py, adopted Lfs, Config, Gstate, etc I'm starting to regret these reworks. They've been a big time sink. But at least these should be much easier to extend with the future planned auxiliary trees? New classes: - Bptr - A representation of littlefs's data-only block pointers. Extra fun is the lazily checked Bptr.__bool__ method, which should prevent slowing down scripts that don't actually verify checksums. - Config - The set of littlefs config entries. - Gstate - The set of littlefs gstate. I may have had too much fun with Config and Gstate. Not only do these provide lookup functions for config/gstate, but known config/gstate get lazily parsed classes that can provide easy access to the relevant metadata. These even abuse Python's __subclasses__, so all you need to do to add a new known config/gstate is extend the relevant Config.Config/ Gstate.Gstate class. The __subclasses__ API is a weird but powerful one. - Lfs - The big one, a high-level abstraction of littlefs itself. Contains subclasses for known files: Lfs.Reg, Lfs.Dir, Lfs.Stickynote, etc, which can be accessed by path, did+name, mid, etc. It even supports iterating over orphaned files, though it's expensive (but incredibly valuable for debugging!). Note that all file types can currently have attached bshrubs/btrees. In the existing implementation only reg files should actually end up with bshrubs/btrees, but the whole point of these scripts is to debug things that _shouldn't_ happen. I intentionally gave up on providing depth bounds in Lfs. Too complicated for something so high-level. One noteworthy change is not recursing into directories by default. This hopefully avoids overloading new users and matches the behavior of most other Linux/Unix tools. This adopts -r/--recurse/--file-depth for controlling how far to recurse down directories, and -z/--depth/--tree-depth for controlling how far to recurse down tree structures (mostly files). 
I like this API. It's consistent with -z/--depth in the other dbg scripts, and -r/--recurse is probably intuitive for most Linux/Unix users. To make this work we did need to change -r/--raw -> -x/--raw. But --raw is already a bit of a weird name for what really means "include a hex dump". Note that -z/--depth/--tree-depth does _not_ imply --files. Right now only files can contain tree structures, but this will change when we get around to adding the auxiliary trees. This also adds the ability to specify a file path to use as the root directory, though we need the leading slash to disambiguate file paths and mroot addresses. --- Also tagrepr has been tweaked to include the global/delta names, toggleable with the optional global_ kwarg. Rattr now has its own lazy parsers for did + name. A more organized codebase would probably have a separate Name type, but it just wasn't worth the hassle. And the abstraction classes have all been tweaked to require the explicit Rbyd.repr() function for a CLI-friendly representation. Relying on __str__ hurt readability and debugging, especially since Python prefers __str__ over __repr__ when printing things.	2025-04-16 15:22:06 -05:00
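The __subclasses__ trick mentioned in the commit above might look something like this. It is a simplified sketch only: the real scripts nest these as Config.Config/Gstate.Gstate, and the `parse` classmethod, the `Version` subclass, its tag value, and its 2-byte encoding are all made up here for illustration.

```python
from functools import cached_property

class Config:
    """Fallback for config entries we don't have a parser for."""
    tag = None  # subclasses set the tag they understand

    def __init__(self, data):
        self.data = data

    @classmethod
    def parse(cls, tag, data):
        # every direct subclass is discoverable just by being defined
        for sub in cls.__subclasses__():
            if sub.tag == tag:
                return sub(data)
        return cls(data)


class Version(Config):
    tag = 0x1  # made-up tag value, not a real littlefs tag

    @cached_property
    def version(self):
        # parsed lazily on first access (made-up 2-byte encoding)
        return (self.data[0], self.data[1])


entry = Config.parse(0x1, b'\x03\x00')
print(type(entry).__name__, entry.version)
```

Adding a new known entry is then just a matter of declaring another subclass; no central table needs updating.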
Christopher Haster	cc20610488	scripts: Skip branches in -t/--tree render, fixed color-repro issues The main difference between -t/--tree and -R/--tree-rbyd is that only the latter shows all internal jumps (unconditional alt->alt), so it makes sense to also hide internal branches (rbyd->rbyd). Note that we already hide the rbyd->block branches in dbglfs.py. Also added color-ignoring comparison operators to our internal TreeBranch struct. This fixes an issue where our non-inner branch merging logic could end up with identical branches with different colors, resulting in different colorings per run. Not the end of the world, but something we want to avoid.	2025-04-16 15:22:05 -05:00
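Color-ignoring comparisons of the kind described in the commit above could be sketched like this. The field names (`a`, `b`, `depth`, `color`) are assumptions for illustration, not the actual TreeBranch definition.

```python
import functools

@functools.total_ordering
class TreeBranch:
    """Hypothetical branch from a to b; color is cosmetic only."""
    def __init__(self, a, b, depth=0, color='b'):
        self.a = a
        self.b = b
        self.depth = depth
        self.color = color

    def _key(self):
        # color is deliberately left out of the comparison key
        return (self.a, self.b, self.depth)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())


# identical branches with different colors collapse to one entry
branches = {TreeBranch(1, 2, 0, 'r'), TreeBranch(1, 2, 0, 'b')}
print(len(branches))
```

With equality and hashing blind to color, merging branches through a set or a sort no longer depends on which color happened to be assigned first.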
Christopher Haster	582f92d073	scripts: Reworked dbgmtree.py, adopted Mtree, Mdir, etc This is where the high-level structure of littlefs starts to reveal itself. This is also where a lot of really annoying Mtree vs Btree API questions come to a head, like should Mtree.lookup return an Mdir or an Rattr? What about Btree.lookup? What gets included in the returned path in all of these? Well, at least this is an interesting exercise in rethinking littlefs's internal APIs... New classes: - Mid - A representation of littlefs's metadata ids. I've just gone ahead and included the block_size-dependent mbits as a field in every Mid instance to try to make Mid operations easier. It's not like we care about one extra word of storage in Python. - Mdir - Again, we intentionally _don't_ inherit Rbyd to try to reduce type errors, though Mdirs really are just Rbyds in this design. - Mtree - The skeleton of littlefs. Tricky bits include traversing the mroot chain and handling mroot-inlined mdirs. Note mroots are included in the mdir/mid iteration methods. Getting the tree renderers all working again was a real pain in the ass.	2025-04-16 15:22:03 -05:00
Christopher Haster	0e2a302d35	scripts: Dropped tag/weight when returning rattrs Now that these are contained in the Rattr class, including the tag/weight just clutters these APIs and makes things more confusing. To make this more convenient, I've added __iter__ methods that allow unpacking both the Rattr and Ralt classes. These more-or-less represent tag+weight+data tuples anyways.	2025-04-16 15:22:02 -05:00
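For readers unfamiliar with the pattern, an __iter__ method that enables tuple-style unpacking might look like the following. The class body is a stand-in, assuming only the tag+weight+data fields named in the commit message; the tag value is an arbitrary example.

```python
class Rattr:
    """Stand-in for the scripts' Rattr; fields assumed from the message."""
    def __init__(self, tag, weight, data):
        self.tag = tag
        self.weight = weight
        self.data = data

    def __iter__(self):
        # lets callers unpack an Rattr as if it were a tuple
        return iter((self.tag, self.weight, self.data))


rattr = Rattr(0x0404, 1, b'hello')
tag, weight, data = rattr
print(hex(tag), weight, data)
```

Callers that only want the object can keep passing `rattr` around, while older code that expected a tuple can still unpack it in place.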
Christopher Haster	46c55722a5	scripts: Reworked dbgbtree.py, adopted Btree class Like the Rbyd class, Btree serves as an abstraction for littlefs's btrees in Python. New classes: - Btree - btree abstraction, note this does _not_ inherit from Rbyd. I find that sort of inheritance too error-prone. Instead Btree _contains_ the root rbyd, which can always be accessed via Btree.rbyd. If you want low-level root-rbyd details, just access Btree.rbyd. Though most fields that are relevant to the Btree are also forwarded via Python's @property properties. - Bd - This just serves as a handle for the disk file that includes block_size/block_count metadata. One important change to note is the adoption of required vestigial names in all btree nodes (yes this script was written... checks notes... 2 years ago... even the same month huh). This means we don't need the parent name mapping, so the non-inner btree printing code no longer needs to be extremely confusing at all times. Also adopted the Rbyd class and friends, and backported Bd to dbgrbyd.py. Also tried to give a couple useful algorithms their own self-contained functions, mainly: - pathdelta - for emulating a traversal over exhaustive paths - treerepr - for the common ascii tree rendering code	2025-04-16 15:22:00 -05:00
Christopher Haster	73127470f9	scripts: Adopted rbydaddr/tagrepr changes across scripts Just some minor tweaks: - rbydaddr: Return list instead of tuple, note we rely on the type distinction in Rbyd.fetch now. - tagrepr: Rename w -> weight.	2025-04-16 15:21:59 -05:00
Christopher Haster	68f0534dd0	rbyd: Dropped special altn/alta encoding altas, and to a lesser extent altns, are just too problematic for our rbyd-append algorithm. Main issue is these break our "narrowing" invariant, where each alt only ever decreases the bounds. I wanted to use altas to simplify lfsr_rbyd_appendcompaction, but decided it wasn't worth it. Handling them correctly would require adding a number of special cases to lfsr_rbyd_appendrat, adding complexity to an already incredibly complex function. --- Fortunately, we don't really need altns/altas on-disk, but we _do_ need a way to mark alts as unreachable internally in order to know when we can collapse alts when recoloring (at this point bounds information is lost). I was originally going to use the alt's sign bit for this, but it turns out we already have this information thanks to setting jump=0 to assert that an alt is unreachable. So no explicit flag needed! This ends up saving a surprising amount of code for what is only a couple lines of changes: code stack ctx before: 38512 2624 640 after: 38440 (-0.2%) 2624 (+0.0%) 640 (+0.0%)	2025-02-08 14:53:47 -06:00
Christopher Haster	1c5adf71b3	Implemented self-validating global-checksums (gcksums) This was quite a puzzle. The problem: How do we detect corrupt mdirs? Seems like a simple question, but we can't just rely on mdir cksums. Our mdirs are independently updateable logs, and logs have this annoying tendency to "rollback" to previously valid states when corrupted. Rollback issues aren't littlefs-specific, but what _is_ littlefs- specific is that when one mdir rolls back, it can disagree with other mdirs, resulting in wildly incorrect filesystem state. To solve this, or at least protect against disagreeable mdirs, we need to somehow include the state of all other mdirs in each mdir commit. --- The first thought: Why not use gstate? We already have a system for storing distributed state. If we add the xor of all of our mdir cksums, we can rebuild it during mount and verify that nothing changed: .--------. .--------. .--------. .--------. .\| mdir 0 \| .\| mdir 1 \| .\| mdir 2 \| .\| mdir 3 \| \|\| \| \|\| \| \|\| \| \|\| \| \|\| gdelta \| \|\| gdelta \| \|\| gdelta \| \|\| gdelta \| \|'-----\|--' \|'-----\|--' \|'-----\|--' \|'-----\|--' '------\|-' '------\|-' '------\|-' '------\|-' '--.------' '--.------' '--.------' '--.------' cksum \| cksum \| cksum \| cksum \| \| \| v \| v \| v \| '---------> xor -------> xor -------> xor -------> gcksum \| v v v =? '---------> xor -------> xor -------> xor ---> gcksum Unfortunately it's not that easy. Consider what this looks like mathematically (g is our gcksum, c_i is an mdir cksum, d_i is a gcksumdelta, and +/-/sum is xor): g = sum(c_i) = sum(d_i) If we solve for a new gcksumdelta, d_i: d_i = g' - g d_i = g + c_i - g d_i = c_i The gcksum cancels itself out! We're left with an equation that depends only on the current mdir, which doesn't help us at all. Next thought: What if we permute the gcksum with a function t before distributing it over our gcksumdeltas? .--------. .--------. .--------. .--------. 
.\| mdir 0 \| .\| mdir 1 \| .\| mdir 2 \| .\| mdir 3 \| \|\| \| \|\| \| \|\| \| \|\| \| \|\| gdelta \| \|\| gdelta \| \|\| gdelta \| \|\| gdelta \| \|'-----\|--' \|'-----\|--' \|'-----\|--' \|'-----\|--' '------\|-' '------\|-' '------\|-' '------\|-' '--.------' '--.------' '--.------' '--.------' cksum \| cksum \| cksum \| cksum \| \| \| v \| v \| v \| '---------> xor -------> xor -------> xor -------> gcksum \| \| \| \| .--t--' \| \| \| \| '-> t(gcksum) \| v v v =? '---------> xor -------> xor -------> xor ---> t(gcksum) In math terms: t(g) = t(sum(c_i)) = sum(d_i) In order for this to work, t needs to be non-linear. If t is linear, the same thing happens: d_i = t(g') - t(g) d_i = t(g + c_i) - t(g) d_i = t(g) + t(c_i) - t(g) d_i = t(c_i) This was quite funny/frustrating (funnistrating?) during development, because it means a lot of seemingly obvious functions don't work! - t(g) = g - Doesn't work - t(g) = crc32c(g) - Doesn't work because crc32cs are linear - t(g) = g^2 in GF(2^n) - g^2 is linear in GF(2^n)!? Fortunately, powers coprime with 2 finally give us a non-linear function in GF(2^n), so t(g) = g^3 works: d_i = g'^3 - g^3 d_i = (g + c_i)^3 - g^3 d_i = (g^2 + gc_i + gc_i + c_i^2)(g + c_i) - g^3 d_i = (g^2 + c_i^2)(g + c_i) - g^3 d_i = g^3 + gc_i^2 + g^2c_i + c_i^3 - g^3 d_i = gc_i^2 + g^2c_i + c_i^3 --- Bleh, now we need to implement finite-field operations? Well, not entirely! Note that our algorithm never uses division. This means we don't need a full finite-field (+, -, *, /), but can get away with a finite-ring (+, -, *). And conveniently for us, our crc32c polynomial defines a ring epimorphic to a 31-bit finite-field. All we need to do is define crc32c multiplication as polynomial multiplication mod our crc32c polynomial: crc32cmul(a, b) = pmod(pmul(a, b), P) And since crc32c is more-or-less just pmod(x, P), this lets us take advantage of any crc32c hardware/tables that may be available. 
--- Bunch of notes: - Our 2^n-bit crc-ring maps to a 2^n-1-bit finite-field because our crc polynomial is defined as P(x) = Q(x)(x + 1), where Q(x) is a 2^n-1-bit irreducible polynomial. This is a common crc construction as it provides optimal odd-bit/2-bit error detection, so it shouldn't be too difficult to adapt to other crc sizes. - t(g) = g^3 is not the only function that works, but it turns out to be a pretty good one: - 3 and 2^(2^n-1)-1 are coprime, which means our function t(g) = g^3 provides a one-to-one mapping in the underlying fields of all crc rings of size 2^(2^n). We know 3 and 2^(2^n-1)-1 are coprime because 2^(2^n-1)-1 = 2^(2^n)-1 (a Fermat number) - 2^(2^n-1) (a power-of-2), and 3 divides Fermat numbers >=3 (A023394) and is not 2. - Our delta, when viewed as a polynomial in g: d(g) = gc^2 + g^2c + c^3, has degree 2, which implies there are at most 2 solutions or 1-bit of information loss in the underlying field. This is optimal since the original definition already had 2 solutions before we even chose a function: d(g) = t(g + c) - t(g) d(g) = t(g + c) - t((g + c) - c) d(g) = t((g + c) + c) - t(g + c) d(g) = d(g + c) Though note the mapping of our crc-ring to the underlying field already represents 1-bit of information loss. - If you're using a cryptographic hash or other non-crc, you should probably just use an equal sized finite-field. Though note changing from a 2^n-1-bit field to a 2^n-bit field does change the math a bit, with t(g) = g^7 being a better non-linear function: - 7 is the smallest odd-number coprime with 2^n-1, a Fermat number, which makes t(g) = g^7 a one-to-one mapping. 3 humorously divides all 2^n-1 Fermat numbers. - Expanding delta with t(g) = g^7 gives us a 6 degree polynomial, which implies at most 6 solutions or ~3-bits of information loss. 
This isn't actually the best you can do, some exhaustive searching over small fields (<=2^16) suggests t(g) = g^(2^(n-1)-1) _might_ be optimal, but that's a heck of a lot more multiplications. - Because our crc32cs preserve parity/are epimorphic to parity bits, addition (xor) and multiplication (crc32cmul) also preserve parity, which can be used to show our entire gcksum system preserves parity. This is quite neat, and means we are guaranteed to detect any odd number of bit-errors across the entire filesystem. - Another idea was to use two different addition operations: xor and overflowing addition (or mod a prime). This probably would have worked, but lacks the rigor of the above solution. - You might think an RS-like construction would help here, where g = sum(c_ia^i), but this suffers from the same problem: d_i = g' - g d_i = g + c_ia^i - g d_i = c_ia^i Nothing here depends on anything outside of the current mdir. - Another question is should we be using an RS-like construction anyways to include location information in our gcksum? Maybe in another system, but I don't think it's necessary in littlefs. While our mdir are independently updateable, they aren't _entirely_ independent. The location of each mdir is stored in either the mtree or a parent mdir, so it always gets mixed into the gcksum somewhere. The only exception being the mrootanchor which is always at the fixed blocks 0x{0,1}. - This does _not_ catch "global-rollback" issues, where the most recent commit in the entire filesystem is corrupted, revealing an older, but still valid, filesystem state. But as far as I am aware this is just a fundamental limitation of powerloss-resilient filesystems, short of doing destructive operations. At the very least, exposing the gcksum would allow the user to store it externally and prevent this issue. --- Implementation details: - Our gcksumdelta depends on the rbyd's cksum, so there's a catch-22 if we include it in the rbyd itself. 
We can avoid this by including it in the commit tags (actually the separate canonical cksum makes this easier than it would have been earlier), but this does mean LFSR_TAG_GCKSUMDELTA is not an LFSR_TAG_GDELTA subtype. Unfortunate but not a dealbreaker. - Reading/writing the gcksumdelta gets a bit annoying with it not being in the rbyd. For now I've extended the low-level lfsr_rbyd_fetch_/ lfsr_rbyd_appendcksum_ to accept an optional gcksumdelta pointer, which is a bit awkward, but I don't know of a better solution. - Unlike the grm, _every_ mdir commit involves the gcksum, which means we either need to propagate the gcksumdelta up the mroot chain correctly, or somehow keep track of partially flushed gcksumdeltas. To make this work I modified the low-level lfsr_mdir_commit__ functions to accept start_rid=-2 to indicate when gcksumdeltas should be flushed. It's a bit of a hack, but I think it might make sense to extend this to all gdeltas eventually. The gcksum cost both code and RAM, but I think it's well worth it for removing an entire category of filesystem corruption: code stack ctx before: 37796 2608 620 after: 38428 (+1.7%) 2640 (+1.2%) 644 (+3.9%)	2025-02-08 14:53:30 -06:00
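The linear-vs-non-linear argument in the gcksum commit above can be checked numerically with a small sketch. This is not the lfs.c implementation: it assumes a plain, non-bit-reflected polynomial representation (real crc32c implementations are bit-reflected), and the helper names other than pmul/pmod/crc32cmul, which come from the commit message, are made up for illustration.

```python
# crc32c (Castagnoli) polynomial including the x^32 term, in plain
# (non-reflected) bit order; real crc32c code is bit-reflected
P = 0x11edc6f41

def pmul(a, b):
    """Carry-less multiplication of polynomials over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, p=P):
    """Remainder of polynomial a divided by p over GF(2)."""
    plen = p.bit_length()
    while a.bit_length() >= plen:
        a ^= p << (a.bit_length() - plen)
    return a

def crc32cmul(a, b):
    return pmod(pmul(a, b), P)

def square(g):
    return crc32cmul(g, g)

def cube(g):
    return crc32cmul(crc32cmul(g, g), g)

def delta(t, g, c):
    """d = t(g') - t(g) where g' = g + c, and +/- are both xor."""
    return t(g ^ c) ^ t(g)

c = 1
# squaring is linear: the delta is the same no matter what g is
print(delta(square, 1, c), delta(square, 2, c))
# cubing is not: the delta now depends on the rest of the filesystem
print(delta(cube, 1, c), delta(cube, 2, c))
```

The second print line is the point of the exercise: with t(g) = g^3, the delta written into one mdir changes when g changes, so a rolled-back mdir no longer agrees with the others.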
Christopher Haster	b6ab323eb1	Dropped the q-bit (previous-perturb) from cksum tags Now that we perturb commit cksums with the odd-parity zero, the q-bit no longer serves a purpose other than extra debug info. But this is a double-edged sword, because redundant info just means another thing that can go wrong. For example, should we assert? If the q-bit doesn't reflect the previous-perturb state it's a bug, but the only thing that would break would be the q-bit itself. And if we don't assert what's the point of keeping the q-bit around? Dropping the q-bit avoids answering this question and saves a bit of code: code stack ctx before: 37772 2608 620 after: 37768 (-0.0%) 2608 (+0.0%) 620 (+0.0%)	2025-01-28 14:41:45 -06:00
Christopher Haster	66bf005bb8	Renamed LFSR_TAG_ORPHAN -> LFSR_TAG_STICKYNOTE I've been unhappy with LFSR_TAG_ORPHAN for a while now. While it's true these represent orphaned files, they also represent zombied files. And as long as a reference to the file exists in-RAM, I find it hard to say these files are truly "orphaned". We're also just using the term "orphan" for too many things. Really this tag just represents an mid reservation. The term stickynote works well enough for this, and fits in with the other internal tag, LFSR_TAG_BOOKMARK.	2025-01-28 14:41:45 -06:00
Christopher Haster	62cc4dbb14	scripts: Disabled local import hack on import Moved local import hack behind if __name__ == "__main__" These scripts aren't really intended to be used as python libraries. Still, it's useful to import them for debugging and to get access to their juicy internals.	2025-01-28 14:41:30 -06:00
Christopher Haster	7cfcc1af1d	scripts: Renamed summary.py -> csv.py This seems like a more fitting name now that this script has evolved into more of a general purpose high-level CSV tool. Unfortunately this does conflict with the standard csv module in Python, breaking every script that imports csv (which is most of them). Fortunately, Python is flexible enough to let us remove the current directory before imports with a bit of an ugly hack: # prevent local imports __import__('sys').path.pop(0) These scripts are intended to be standalone anyways, so this is probably a good pattern to adopt.	2024-11-09 12:31:16 -06:00
Christopher Haster	a0ab7bda26	scripts: Avoid rereading shrub blocks This extends Rbyd.fetch to accept another rbyd, in which case we inherit the RAM-backed block without rereading it from disk. This avoids an issue where shrubs can become corrupted if the disk is being simultaneously written and debugged. Normally we can detect the checksum mismatch and toss out the rbyd during fetch, but shrub pointers don't include a checksum since they assume the containing rbyd has already been checksummed. It's interesting to note this even avoids the memory copy thanks to Python's reference counting.	2024-11-08 02:24:56 -06:00
Christopher Haster	0260f0bcee	scripts: Added better branch cksum checks If we're fetching branches anyways, we might as well check that the checksums match. This helps protect against infinite loops in B-tree branches. Also fixed an issue where we weren't xoring perturb state on finding an explicit trunk. Note this is equivalent to LFS_M_CKFETCHES in lfs.c. --- This doesn't mean we always need LFS_M_CKFETCHES. Our dbg scripts just need to be a little bit tougher because 1. running tests with -j creates wildly corrupted and entangled littlefs images, and 2. Rbyd.fetch is almost too forgiving in choosing the nearest trunk.	2024-11-08 02:20:19 -06:00
Christopher Haster	e3fdc3dbd7	scripts: Added simple mroot cycle detectors to dbg scripts These work by keeping a set of all seen mroots as we descend down the mroot chain. Simple, but it works. The downside of this approach is that the mroot set grows unbounded, but it's unlikely we'll ever have enough mroots in a system for this to really matter. This fixes scripts like dbgbmap.py getting stuck on intentional mroot cycles created for testing. It's not a problem for a foreground script to get stuck in an infinite loop, since you can just kill it, but a background script getting stuck at 100% CPU is a bit more annoying.	2024-11-07 11:46:39 -06:00
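A seen-set cycle detector like the one described in the commit above might be sketched as follows. The `next_mroot` callback and the dict-backed fake chain are hypothetical stand-ins for actually fetching mroots from disk; only the fixed 0x{0,1} starting pair comes from the log.

```python
def walk_mroots(next_mroot, start=(0, 1)):
    """Yield mroot block pairs down the chain, stopping on a cycle.

    next_mroot is a hypothetical callback that returns the next mroot's
    block pair, or None at the end of the chain.
    """
    seen = set()
    mroot = start
    while mroot is not None:
        # treat (a, b) and (b, a) as the same pair
        key = frozenset(mroot)
        if key in seen:
            # we've been here before, bail instead of looping forever
            return
        seen.add(key)
        yield mroot
        mroot = next_mroot(mroot)


# a fake chain with an intentional cycle: 0x{8,9} points back to 0x{4,5}
chain = {(0, 1): (4, 5), (4, 5): (8, 9), (8, 9): (4, 5)}
print(list(walk_mroots(chain.get)))
```

The set grows with the length of the chain, which matches the tradeoff the commit message accepts.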
Christopher Haster	007ac97bec	scripts: Adopted double-indent on multiline expressions This matches the style used in C, which is good for consistency: a_really_long_function_name( double_indent_after_first_newline( single_indent_nested_newlines)) We were already doing this for multiline control-flow statements, simply because I'm not sure how else you could indent this without making things really confusing: if a_really_long_function_name( double_indent_after_first_newline( single_indent_nested_newlines)): do_the_thing() This was the only real difference style-wise between the Python code and C code, so now both should be following roughly the same style (80 cols, double-indent multiline exprs, prefix multiline binary ops, etc).	2024-11-06 15:31:17 -06:00
Christopher Haster	48c2e7784b	scripts: Renamed import math alias m -> mt Mainly to avoid conflicts with match results m, this frees up the single-letter variable m for other purposes. Choosing a two letter alias was surprisingly difficult, but mt is nice in that it somewhat matches it (for itertools) and ft (for functools).	2024-11-05 01:58:40 -06:00
Christopher Haster	4d8bfeae71	attrs: Reduced UATTR/SATTR range down to 7-bits It would be nice to have a full 8-bit range for both user attrs and system attrs, for both backwards compatibility and maximizing the available attr space, but I think it just doesn't make sense from an API perspective. Sure we could finagle the user/sys bit into a flags argument, or provide separate lfsr_getuattr/getsattr functions, but asking users to use a 9-bit int for higher-level operations (dynamic attrs, iteration, etc) is a bit much... So this reduces the two attr ranges down to 7-bits, requiring 8-bits total to store all possible attr types in the current system: TAG_ATTR 0x0400 v--- -1-a -aaa aaaa TAG_UATTR 0x04aa v--- -1-- -aaa aaaa TAG_SATTR 0x05aa v--- -1-1 -aaa aaaa This really just affects scripts, since we haven't actually implemented attributes yet. Worst case we still have the 9-bit encoding space carved out, so we can always add an additional set of attrs in the future if we start running into attr pressure. Or, you know, just turn on the subtype leb128 encoding the 8th subtype bit is reserved for. Then you'd only be limited by internal driver details, probably 24-bits per attr range if we make tags 32-bits internally. Though this would probably come with quite a code cost...	2024-08-22 00:59:09 -05:00
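The 7-bit attr encoding in the commit above (TAG_UATTR 0x04aa, TAG_SATTR 0x05aa) can be illustrated with a pair of helpers. The function names `attr_tag` and `attr_split` are made up for this sketch, and the valid bit is ignored entirely.

```python
TAG_UATTR = 0x0400  # v--- -1-- -aaa aaaa
TAG_SATTR = 0x0500  # v--- -1-1 -aaa aaaa

def attr_tag(attr, sys=False):
    """Build a user/system attr tag from a 7-bit attr type."""
    if not 0 <= attr <= 0x7f:
        raise ValueError('attr type must fit in 7 bits: %#x' % attr)
    return (TAG_SATTR if sys else TAG_UATTR) | attr

def attr_split(tag):
    """Split an attr tag back into (is_system, 7-bit attr type)."""
    return bool(tag & 0x0100), tag & 0x7f


print(hex(attr_tag(0x12)), hex(attr_tag(0x12, sys=True)))
print(attr_split(0x057f))
```

The user/system bit plus the 7-bit type fit in 8 bits total, which is the API simplification the commit is after.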
Christopher Haster	c00e0b2af6	Fixed explicit trunks messing with canonical checksums Updating the canonical checksum should only depend on if the tag is a trunkish tag (not a checksum tag), and not if the tag is in the current trunk. The trunk parameter to lfsr_rbyd_fetch should have no effect on the canonical checksum. Fixed in both lfsr_rbyd_fetch and scripts. Curiously no code changes: code stack before: 36416 2616 after: 36416 (+0.0%) 2616 (+0.0%)	2024-08-20 12:03:48 -05:00


141 Commits