Commit Graph

98 Commits

5dc88e3e00 scripts: csv.py: Added delta expr
This is the inverse of accumulate, returning the difference between
subsequent results. In theory accumulate(delta(x)) and
delta(accumulate(x)) are no-ops.

This is particularly useful for normalizing our bench n value in
scripts. It's the only value still returned as a cumulative measurement,
which is a bit inconsistent, but necessary for uniquely identifying
probe steps.
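The inverse relationship can be sketched in plain Python (a minimal sketch; the real delta lives in csv.py's expr engine):

```python
from itertools import accumulate

def delta(xs):
    # Difference between subsequent results: the inverse of a running sum.
    out, prev = [], 0
    for x in xs:
        out.append(x - prev)
        prev = x
    return out

xs = [3, 1, 4, 1, 5]
assert delta(list(accumulate(xs))) == xs  # delta(accumulate(x)) is a no-op
assert list(accumulate(delta(xs))) == xs  # accumulate(delta(x)) is a no-op
```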
2026-02-19 14:04:25 -06:00
484b7dd1e8 scripts: csv.py: Ignore missing by fields in enumerate/accumulate
Note this matches the behavior of mods, e.g. I would expect this to not
break if ORDER is missing:

  ./scripts/csv.py \
      -bcase='%(case)s+%(probe)s+%(ORDER)s' \
      -ft=accumulate(bench_simtime, case, probe, ORDER)

Normally the expr compiler would force typechecking of ORDER, giving it
a default value of int(0) if missing, but we intentionally bypass
typechecking in enumerate/accumulate's by fields since they may be
strings.
2026-02-19 14:04:10 -06:00
8a35b9870b scripts: Tweaked table renderer to not hide conflicting results
I think this is currently only possible with overlapping by/field
fields, but hiding results with conflicting by fields is not ideal.
Especially since this function is central to so many scripts:

  cat test.csv
  a,b,c
  x,2,1
  x,1,2
  x,1,3

Before:

  ./scripts/csv.py test.csv -ba -bb -fb -fc
  warning: by fields are unstable
  a,b            b        c
  x,2            2        1
  TOTAL          4        6

After:

  ./scripts/csv.py test.csv -ba -bb -fb -fc
  a,b            b        c
  x,2            2        5
  x,2            2        1
  TOTAL          4        6

This solves the main issue with unstable by fields, so no more warning.

Note that some features rely on by being unique to work (added/removed
numbers, compare fields, etc). They shouldn't error, but may be
incorrect/unintuitive with conflicting by fields, so avoiding
conflicting by fields is still a good idea.
2026-02-19 14:01:35 -06:00
a3082437df scripts: Relaxed lost results due to unstable by fields to a warning
So it turns out this _can_ happen, without an in-script coding error.

Consider the behavior of a script with overlapping by/field fields:

  $ cat test.csv
  a,b
  x,2
  x,1
  x,1
  $ ./scripts/csv.py test.csv -ba -bb -fb

During the first fold, rows 2 and 3 will contain b=1, but during the
second fold they will have been merged, resulting in b=2.

So, relaxing to a warning for now. Maybe the table renderer should be
rewritten to avoid folding? (note diffing results may be tricky)
2026-02-19 13:59:32 -06:00
85b7a48df7 scripts: csv.py: Added bounded examples to -l/--list-fields
Now -l/--list-fields includes however many results fit in 36 chars:

  $ ./scripts/csv.py --list-fields test.csv
  i              int    # 16,17,14,18,19,15,20,13,12,29,27,28,...
  suite          ?      # bench_p26_wt
  case           ?      # bench_p26_wt_linear,bench_p26_wt_ran...
  NO_FRUNCATE    int    # 0
  SIZE           int    # 2097152
  SEED           int    # 42

The whole point of -l/--list-fields is to give a quick information dump
about what's inside a csv file, and we're already parsing everything to
try to figure out types, so why not?

Much easier to read than head:

  $ head -n5 test.csv
  i,suite,case,NO_FRUNCATE,SIZE,SEED,BLOCK_SIZE,FILE_SIZE,SIM_...
  16,bench_p26_wt,bench_p26_wt_linear,0,2097152,42,65536,64,36...
  16,bench_p26_wt,bench_p26_wt_linear,0,2097152,42,65536,64,36...
  16,bench_p26_wt,bench_p26_wt_linear,0,2097152,42,65536,64,36...
  16,bench_p26_wt,bench_p26_wt_linear,0,2097152,42,65536,64,36...
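The truncation itself is simple join-and-clip (hypothetical helper name; the real rendering is csv.py's):

```python
def bounded_examples(values, width=36):
    # Join unique example values in first-seen order, clipping with "..."
    # once the joined string exceeds width.
    s = ",".join(str(v) for v in dict.fromkeys(values))
    if len(s) > width:
        s = s[:width - 3] + "..."
    return s

assert bounded_examples(["bench_p26_wt"]) == "bench_p26_wt"
assert len(bounded_examples(range(1000))) == 36
```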
2026-02-19 13:35:21 -06:00
efde754f88 scripts: csv.py: Optional by fields for unique enumerates/accumulates
This extends csv.py's enumerate/accumulate exprs with optional by field
arguments. Each set of by fields gets its own state, allowing multiple
parallel enumerates/accumulates to be processed simultaneously.

This is especially useful when the number of by field sets is unknown.
In theory you could split/merge each by field set with a separate csv.py
call, but it'd be a real pain.

Consider some bench results:

  case,n,simtime
  bench_rbyd,1,100
  bench_rbyd,2,10
  bench_rbyd,3,100
  bench_btree,1,200
  bench_rbyd,4,10
  bench_btree,2,20
  bench_btree,3,2000
  bench_btree,4,200

It was a bit awkward to handle these with csv.py's accumulate, as
accumulate operated strictly per-row, ignoring the case field.

But now with optional by fields:

  $ ./scripts/csv.py test.csv \
        -bcase -bn \
        -fsimtime='accumulate(simtime, case)'
  case,n           simtime
  bench_btree,1        200
  bench_btree,2        220
  bench_btree,3       2220
  bench_btree,4       2420
  bench_rbyd,1         100
  bench_rbyd,2         110
  bench_rbyd,3         210
  bench_rbyd,4         220
  TOTAL               5700

Note that these by fields are a bit special in csv.py's grammar. So far,
they are the only fields in field exprs that aren't typechecked. The
alternative would be string types in csv.py, but I'm not sure I want to
go that far.

---

It's tempting to try to invert this logic (accumulate(simtime, n)), but
I'm not sure how it would work internally. The duplicate by fields
("case") do get annoying, but specifying them in the expr helps make the
relevant state explicit.

Keep in mind we don't evaluate the actual by fields until much later in
csv.py. Entangling these stages risks confusion (-ba='%(b)s'
-c='enumerate(n)'? hidden by fields? overlapping by+field fields?).
2026-02-19 13:07:08 -06:00
cf7e0e3fef scripts: csv.py: Tweaked foldchecking to check that folds match
I mean, what would you expect this to do?

  max(a) + sum(b)

Whatever your answer is, it's wrong (the way csv.py works, we always
compute folds after expr evaluation). The best option is to error,
matching the behavior of mismatched types.
2026-02-19 13:07:01 -06:00
9a224a1c52 scripts: csv.py: Fixed incorrect fold type when type changes
csv.py's -L/--list-computed was returning some confusing types:

  $ ./scripts/csv.py /dev/null -fa='float(1)' -L
          a  int  sum
              ^-- huh!?

Turns out csv.py's fold typechecking was all broken. Folds can change
the type, but only at the invocation:

  $ ./scripts/csv.py /dev/null -fa='sum(float(1))' -L
          a  int  sum
  $ ./scripts/csv.py /dev/null -fa='avg(int(1))' -L
          a  float  avg
  $ ./scripts/csv.py /dev/null -fa='int(avg(1))' -L
          a  float  avg

This is maybe defensible for explicit folds, since their evaluation is
also lifted, but not so much for things like literals/fields/etc.

---

Fixed by allowing None to indicate a generic fold, and allowing types to
be lazily figured out in csv.compile.
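The None-as-generic-fold fix can be sketched roughly (a hypothetical simplification; the real typechecking is in csv.compile):

```python
# fold -> result type, where None means "generic": the fold inherits
# whatever type its operand has, while folds like avg force their own
# result type at the invocation.
FOLD_TYPES = {"sum": None, "min": None, "max": None, "avg": float}

def fold_type(fold, operand_type):
    # Lazily resolve generic folds from the operand they fold over.
    t = FOLD_TYPES.get(fold, None)
    return operand_type if t is None else t

assert fold_type("sum", float) is float  # sum(float(1)) stays float
assert fold_type("avg", int) is float    # avg(int(1)) becomes float
```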
2026-02-19 13:00:35 -06:00
fe93d62523 scripts: csv.py: Added -l and -L shortform flags
These seem useful enough to have shortform flags:

- -l/--list-fields - Input fields before processing
- -L/--list-computed - Computed fields and expr dependencies

Note while -L/--list-computed has more information, it's also more
likely to trigger an assert/error due to poorly implemented field exprs.
2026-02-13 13:45:01 -06:00
6093fa79ac scripts: csv.py: Tweaked --list-computed to infer all input field types
On one hand, only inferring the used input fields is conceptually
correct because that's how csv.py works. On the other, it doesn't really
make sense for --list-computed to show _less_ information than
--list-fields.

So, showing all inferred types now:

  $ ./scripts/csv.py --list-computed test.csv \
        -bcase='%(case)s+%(m)s' \
        -fsimtime='float(bench_simtime)/1.0e9' \
        -fsimthroughput='float(n)/max(float(bench_simtime)/1.0e9,1.0e-9)'
  i              int   .-->  case           ?    ?
  suite          ?     |.->  simtime        int  sum
  case           ?    -+|.>  simthroughput  int  sum
  SKIP_WARMUP    int   |||
  FILE_SIZE      int   |||
  SEED           int   |||
  ...
  m              ?    -'||
  n              int  ---+
  bench_reads    int    ||
  bench_progs    int    ||
  bench_erases   int    ||
  bench_readed   int    ||
  bench_progged  int    ||
  bench_erased   int    ||
  bench_simtime  int  ---'

I think this makes --list-computed a strict superset of --list-fields
now.
2026-02-13 13:45:01 -06:00
60dec6b77d scripts: csv.py: Tweaked expr-less -F to still typecheck
I was expecting -ba -Fa to sort numerically, but it wasn't. Turns out
hidden field fields (-F/--hidden-field) without exprs were never
typechecked.

This is not an issue for non-hidden field fields (-f/--field), because
we typecheck these explicitly in compile.
2026-02-13 13:45:01 -06:00
49e3b22907 scripts: csv.py: Fixed bottleneck from overlapping by/from fields
Found from some confusing behavior when by/from fields overlap. It turns
out when this happens (-bhi -Fhi, for example), the generated getattr
for the by field would trigger the __getattribute__ for the overlapping
field field, resulting in a fold on _every add operation_.

Hopefully you can see why this is a bit of a problem when summing a
large number of results (O(n^2)?).

---

Fixed by switching getattr to object.__getattribute__ and reconsidering
csv.py's entire design.
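The pattern is easy to reproduce outside csv.py (hypothetical Result class; the real __getattribute__ is generated):

```python
class Result:
    def __init__(self, hi):
        object.__setattr__(self, "_hi", hi)

    def __getattribute__(self, name):
        # Imagine this triggering an expensive fold for field fields.
        if name == "hi":
            return expensive_fold(object.__getattribute__(self, "_hi"))
        return object.__getattribute__(self, name)

calls = 0
def expensive_fold(x):
    global calls
    calls += 1
    return x

r = Result(42)
assert getattr(r, "hi") == 42                   # goes through __getattribute__
assert object.__getattribute__(r, "_hi") == 42  # bypasses the override
assert calls == 1                               # no extra fold triggered
```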
2026-02-13 13:45:01 -06:00
68de9efd17 scripts: csv.py: Added --list-computed to expose expr deps/types/etc
Less useful than --list-fields, but fun.

This shows more of the internal expr eval info: input fields + types,
output fields + types + folds, and a small dependency graph showing what
goes where:

  $ ./scripts/csv.py --list-computed test.csv \
        -bcase='%(case)s+%(m)s' \
        -fsimtime='float(bench_simtime)/1.0e9' \
        -fsimthroughput='float(n)/max(float(bench_simtime)/1.0e9,1.0e-9)'
  i              ?     .-->  case           ?    ?
  suite          ?     |.->  simtime        int  sum
  case           ?    -+|.>  simthroughput  int  sum
  SKIP_WARMUP    ?     |||
  FILE_SIZE      ?     |||
  SEED           ?     |||
  ...
  m              ?    -'||
  n              int  ---+
  bench_reads    ?      ||
  bench_progs    ?      ||
  bench_erases   ?      ||
  bench_readed   ?      ||
  bench_progged  ?      ||
  bench_erased   ?      ||
  bench_simtime  int  ---'

Maybe I was just itching to write another ascii-art renderer.
2026-02-13 13:00:22 -06:00
187f35df61 scripts: csv.py: Added --list-fields for quick field access
One issue I keep running into with csv.py is that it's difficult to get
started with a new/unfamiliar csv file.

csv.py itself doesn't know what to do until you start specifying fields,
but you can't start specifying fields until you know what fields there
are. Add to this the fact that our csv files have so much info shoved in
them that their "human readability" is mostly theoretical.

The --list-fields flag provides a quick solution to this:

  $ ./scripts/csv.py --list-fields test.csv
  i              int
  suite          ?
  case           ?
  SKIP_WARMUP    int
  FILE_SIZE      int
  SEED           int
  ...

csv.py doesn't have much info at this stage, but we can at least include
the best-effort type guessing we use for field exprs.
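The best-effort guessing can be sketched as try-int, then try-float, else unknown (a hypothetical simplification of csv.py's inference):

```python
def guess_type(values):
    # Best-effort: int if everything parses as int, float if everything
    # parses as float, else unknown ("?").
    for parse, name in [(int, "int"), (float, "float")]:
        try:
            for v in values:
                parse(v)
            return name
        except ValueError:
            continue
    return "?"

assert guess_type(["16", "17", "14"]) == "int"
assert guess_type(["1.5", "2"]) == "float"
assert guess_type(["bench_p26_wt"]) == "?"
```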
2026-02-13 12:41:02 -06:00
6c458b321c scripts: csv.py: Simplified -i/--enumerate to alias -bi -Fi=enumerate()
Now that we have the enumerate expr, -i/--enumerate can be implemented
entirely during expr eval:

- -i/--enumerate        => -bi -Fi=enumerate()
- -I/--hidden-enumerate => -Bi -Fi=enumerate()

Instead of internally reimplementing the same behavior.

This is what our help text implies, so might as well put our money where
our mouth is. And the less special internals we have, the better.

I considered removing -i/-I completely, but it's quite a convenient flag
when debugging csv.py expressions.
2026-02-10 17:33:01 -06:00
ac338e66f0 scripts: csv.py: Added explicit z field, reusing -Z/--children
In an effort to move away from magic usage of -i/--enumerate, this adds
an explicit z field for differentiating -r/--hot results (and for normal
recursive results).

Instead of trying to think of a new flag to control this, this just
piggybacks on -Z/--children, which now accepts a tuple:

- ./scripts/csv.py -z3 -Z
- ./scripts/csv.py -z3 -Zchildren
- ./scripts/csv.py -z3 -Zz,children

The only tricky bit was needing to insert z in front of the by fields,
otherwise it was mostly a simplification from the enumerate mess.

Another positive side-effect: -r/--hot (and -z/--depth) now implies
-Zz,children, removing the annoying/confusing behavior of hotify folding
results by default.
2026-02-10 17:33:01 -06:00
078a1fb4c6 scripts: Adopted explicit underscore in Result._prefix
For consistency with the --prefix flag.

I confused myself while debugging some script behavior, and that's
no good.
2026-02-10 17:32:55 -06:00
98279a0b36 scripts: csv.py: Simplified --prefix, moved to collect_csv
The current... attempt at an approach was broken and becoming horribly
unmaintainable. Two issues found without even looking:

1. Field inference didn't understand prefixes, leading to duplicate
   by/field fields when attempting to infer by fields with --prefix.

2. Sort wasn't working for some reason, probably because the behavior
   of sort, defines, etc is really weird since they apply to both by
   fields and field fields.

I considered just dropping support for --prefix completely (this really
isn't worth the time), but instead found a simple solution of moving
prefix handling to one of the first steps in collect_csv.

This has the downside of creating conflicts when a prefixed/non-prefixed
field has the same name, but I don't care. --prefix is a niche flag that
shouldn't mess with the rest of the code like this, and none of the
other scripts really handle field conflicts correctly anyways.
2026-02-10 17:14:49 -06:00
695b1e94df scripts: csv.py: Fixed issues with non-default children/notes fields
- Fixed the initial filter using explicit 'children'/'notes' literals

  Whoops, how did this happen?

- Fixed fold using default children/notes result attributes

  This one is a bit more excusable, self.children is easy to overlook.
  But not actual string literals, that's silly.
2026-02-10 17:13:13 -06:00
84b2e73a30 scripts: csv.py: Added enumerate/accumulate exprs
This adds two new exprs to csv.py, useful for sequential data:

  enumerate()    A number incremented each result
  accumulate(a)  A running sum across results

To make these work required adding support for cross-row state, thus the
new state field in CsvExpr.Expr.eval.

Once you have that cross-row state, implementing enumerate/accumulate is
pretty straightforward. The only complication being that we need to hash
state by the unique Python id (`id(self)`), otherwise multiple exprs
would share state, which would be pretty weird.

Note that csv.py's pipeline is now quite complex, and stage order is
important!

  input --> define    --> expr --> folding --> sorting --> output
            filtering     eval

As a result, it's unfortunately not possible to organize enumerate/
accumulate by by fields. I poked around with the idea but decided it was
too complex (aren't I supposed to be building a filesystem?). The
guiding principle behind csv.py is that most problems can be solved with
more process substitution.

---

This is a bit clunky since we can't use the existing fold system, but
csv.py is already a pile of hacks, so what's one more?

The reason for the clunkiness is that the original idea behind csv.py
was to treat each folded row independently and order-agnostic. Not the
greatest idea in hindsight; cross-row operations are useful!
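The id(self) state hashing can be sketched as (hypothetical Accumulate class; the real state threads through CsvExpr.Expr.eval):

```python
class Accumulate:
    def eval(self, state, x):
        # Hash cross-row state by this expr's unique Python id, so two
        # accumulate() exprs in one invocation don't share a running sum.
        sum_ = state.get(id(self), 0) + x
        state[id(self)] = sum_
        return sum_

state = {}
a, b = Accumulate(), Accumulate()
assert [a.eval(state, x) for x in (1, 2, 3)] == [1, 3, 6]
assert b.eval(state, 10) == 10  # independent state, not 16
```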
2026-02-10 16:52:58 -06:00
3978a32156 scripts: csv.py: Started adding -g/--accumulate
The idea here is to add some sort of accumulate operation to csv.py, so
we can stop cumulative-result clunkiness. It would also be immensely
useful as a general function, and -i/--enumerate already sets a
precedent for this sort of cross-row behavior.

But I'm starting to think using flags here is not the best way, maybe
this would be better as a field expr?
2026-02-10 16:52:11 -06:00
45f1850028 scripts: csv.py: Fixed missing -F/--hidden-field parsing
For some reason -F/--hidden-field fields weren't being parsed as a
CsvExpr, breaking any attempt to use exprs with hidden fields. Probably
just broken during a refactor.

Fortunately an easy fix.
2026-02-10 16:40:15 -06:00
be118ab93d scripts: Fixed -s/-S sorting of .csv/.json outputs
I'm not sure if this was ever implemented, or broken during a refactor,
but we were ignoring -s/-S flags when writing .csv/.json output with
-o/-O.

Curious, because the functionality _was_ implemented in fold, just
unused. All this required was passing -s/-S to fold correctly.

Note we _don't_ sort diff_results, because these are never written to
.csv/.json output.

At some point this behavior may have been a bit more questionable, since
we used to allow mixing -o/-O and table rendering. But now that -o/-O is
considered an exclusive operation, ignoring -s/-S doesn't really make
sense.

---

Why did this come up? Well imagine my frustration when:

1. In tikz/pgfplots, \addplot table only really works with sorted data

2. csv.py has a -s/-S flag for sorting!

3. -s/-S doesn't work!
2025-10-01 17:57:49 -05:00
6ba3204816 scripts: Some csv script tweaks to better interact with other scripts
- Added --small-total. Like --small-header, this omits the first column
  which usually just has the informative text TOTAL.

- Tweaked -Q/--small-table so it renders with --small-total if
  -Y/--summary is provided.

- Added --total as an alias for --summary + --no-header + --small-total,
  i.e. printing only the totals (which may be multiple columns) and no
  other decoration.

  This is useful for scripting, now it's possible to extract just, say,
  the sum of some csv and embed with $():

    echo $(./scripts/code.py lfs3.o --total)

- Tweaked total to always output a number (0) instead of a dash (-),
  even if we have no results.

  This relies on Result() with no args, which risks breaking scripts
  where the Result type expects an argument. To hopefully catch this
  early, the table renderer currently creates a Result() before trying
  to fold the total result.

- If first column is empty (--small-total + --small-header, --no-header,
  etc) collapse width to zero. This avoids a bunch of extra whitespace,
but still includes the two spaces normally used to separate names from
  fields.

  But I think those spaces are a good thing. It makes it hard to miss
  the implicit padding in the table renderer that risks breaking
  dependent scripts.
2025-10-01 17:57:37 -05:00
27a722456e scripts: Added support for SI-prefixes as iI punescape modifiers
This adds %i and %I as punescape modifiers for limited printing of
integers with SI prefixes:

- %(field)i - base-10 SI prefixes
  - 100   => 100
  - 10000 => 10K
  - 0.01  => 10m

- %(field)I - base-2 SI prefixes
  - 128   => 128
  - 10240 => 10Ki
  - 0.125 => 128mi

These can also easily include units as a part of the punescape string:

- %(field)iops/s => 10Kops/s
- %(field)IB => 10KiB

This is particularly useful in plotmpl.py for adding explicit
x/yticklabels without sacrificing the automatic SI-prefixes.
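A rough sketch of the formatting (hypothetical helper; the real modifiers are wired into punescape, and the fractional base-2 "mi" prefix is csv-script-specific, not standard):

```python
def si(x, base2=False):
    # Limited SI-prefix formatting: scale into [1, k) and append a prefix.
    k = 1024 if base2 else 1000
    big, small = ["", "K", "M", "G"], ["", "m", "u", "n"]
    i = 0
    if x >= 1:
        while x >= k and i < len(big) - 1:
            x /= k
            i += 1
        p = big[i]
    else:
        while x < 1 and i < len(small) - 1:
            x *= k
            i += 1
        p = small[i]
    s = "%g" % x + p
    if base2 and p:
        s += "i"  # 10Ki, 128mi, etc
    return s

assert si(10000) == "10K"
assert si(0.01) == "10m"
assert si(10240, base2=True) == "10Ki"
assert si(0.125, base2=True) == "128mi"
```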
2025-10-01 17:56:51 -05:00
2a4e0496b6 scripts: csv.py: Fixed lexing of signed float exponents
So now these lex correctly:

- 1e9  =>  1000000000
- 1e+9 =>  1000000000
- 1e-9 =>  0.000000001

A bit tricky when you think about how these could be confused for binary
addition/subtraction. To fix we just eagerly grab any signs after the e.

These are particularly useful for manipulating simulated benchmarks,
where we need to convert things to/from nanoseconds.
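The fix amounts to eagerly grabbing an optional sign after the e in the float pattern (hypothetical regex; the real lexer is csv.py's):

```python
import re

# The optional [+-] is consumed eagerly after e/E, so "1e-9" lexes as one
# float token rather than "1e" followed by binary subtraction of "9".
FLOAT = re.compile(r'\d+\.?\d*(?:[eE][+-]?\d+)?')

assert FLOAT.fullmatch('1e9')
assert FLOAT.fullmatch('1e+9')
assert FLOAT.fullmatch('1e-9')
assert float('1e-9') == 0.000000001
```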
2025-10-01 17:56:29 -05:00
d4c772907d scripts: csv.py: Fixed completely broken float parsing
Whoops! A missing splat repetition here meant we only ever accepted
floats with a single digit of precision and no e/E exponents.

Humorously this went unnoticed because our scripts were only
_outputting_ single digit floats, but now that that's fixed, float
parsing also needs a fix.

Fixed by allowing >1 digit of precision in our CsvFloat regex.
2025-05-15 15:44:30 -05:00
d5b28df33a scripts: Fixed excessive rounding when writing floats to csv/json files
This adds __csv__ methods to all Csv* classes to indicate how to write
csv/json output, and adopts Python's default float repr. As a plus, this
also lets us use "inf" for infinity in csv/json files, avoiding
potential unicode issues.

Before this we were reusing __str__ for both table rendering and
csv/json writing, which rounded to a single decimal digit! This made
float output pretty much useless outside of trivial cases.

---

Note Python apparently does some of its own rounding (1/10 -> 0.1?), so
the result may still not be round-trippable, but this is probably fine
for our somewhat hack-infested csv scripts.
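A minimal sketch of the split between __str__ and __csv__ (hypothetical simplification of the Csv* classes):

```python
import math

class CsvFloat:
    def __init__(self, a):
        self.a = a

    def __str__(self):
        # Table rendering: rounded for humans.
        return "∞" if math.isinf(self.a) else "%.1f" % self.a

    def __csv__(self):
        # csv/json output: Python's full-precision default repr, and a
        # plain "inf" for infinity to avoid unicode issues.
        return "inf" if math.isinf(self.a) else repr(self.a)

assert str(CsvFloat(0.123456)) == "0.1"          # useless for output!
assert CsvFloat(0.123456).__csv__() == "0.123456"
assert CsvFloat(math.inf).__csv__() == "inf"
```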
2025-05-15 15:44:30 -05:00
43c2330edc scripts: csv.py: Tweaked hidden fields to not imply -b/--by defaults
So now the hidden variants of field specifiers can be used to manipulate
by fields and field fields without implying a complete field set:

  $ ./scripts/csv.py lfs.code.csv \
          -Bsubsystem=lfsr_file -Dfunction='lfsr_file_*' \
          -fcode_size

Is the same as:

  $ ./scripts/csv.py lfs.code.csv \
          -bfile -bsubsystem=lfsr_file -Dfunction='lfsr_file_*' \
          -fcode_size

Attempting to use -b/--by here would delete/merge the file field, as
csv.py assumes -b/-f specify all of the relevant field type.

Note that fields can also be explicitly deleted with -D/--define's new
glob support:

  $ ./scripts/csv.py lfs.code.csv -Dfile='*' -fcode_size

---

This solves an annoying problem specific to csv.py, where manipulating
by fields and field fields would often force you to specify all relevant
-b/-f fields. With how benchmarks are parameterized, this list ends up
_looong_.

It's a bit of a hack/abuse of the hidden flags, but the alternative
would be field globbing, which 1. would be a real pain-in-the-ass to
implement, and 2. affect almost all of the scripts. Reusing the hidden
flags for this keeps the complexity limited to csv.py.
2025-05-15 15:44:14 -05:00
7526b469b9 scripts: Adopted globs in all field matchers (-D/--define, -c/--compare)
Globs in CLI attrs (-L'*=bs=%(bs)s' for example) have been remarkably
useful. It makes sense to extend this to the other flags that match
against CSV fields, though this does add complexity to a large number of
smaller scripts.

- -D/--define can now use globs when filtering:

    $ ./scripts/code.py lfs.o -Dfunction='lfsr_file_*'

  -D/--define already accepted a comma-separated list of options, so
  extending this to globs makes sense.

  Note this differs from test.py/bench.py's -D/--define. Globbing in
  test.py/bench.py wouldn't really work since -D/--define is generative,
  not matching. But there's already other differences such as integer
  parsing, range, etc. It's not worth making these perfectly consistent
  as they are really two different tools that just happen to look the
  same.

- -c/--compare now matches with globs when finding the compare entry:

    $ ./scripts/code.py lfs.o -c'lfs*_file_sync'

  This is quite a bit less useful that -D/--define, but makes sense for
  consistency.

  Note -c/--compare just chooses the first match. It doesn't really make
  sense to compare against multiple entries.

This raised the question of globs in the field specifiers themselves
(-f'bench_*' for example), but I'm rejecting this for now as I need to
draw the complexity/scope _somewhere_, and I'm worried it's already way
over on the too-complex side.

So, for now, field names must always be specified explicitly. Globbing
field names would add too much complexity, especially considering how
many flags accept field names in these scripts.
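The matching itself is small, since Python's fnmatch does the heavy lifting (hypothetical helper; the real flag parsing is per-script):

```python
import fnmatch

def define_match(value, patterns):
    # -D/--define accepts a comma-separated list; each entry may be a glob.
    return any(fnmatch.fnmatch(value, p) for p in patterns.split(","))

assert define_match("lfsr_file_sync", "lfsr_file_*")
assert define_match("lfs_alloc", "lfs_alloc,lfs_free")
assert not define_match("lfsr_dir_read", "lfsr_file_*")
```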
2025-05-15 14:28:57 -05:00
55ea13b994 scripts: Reverted del to resolve shadowed builtins
I don't know how I completely missed that this doesn't actually work!

Using del _does_ work in Python's repl, but it makes sense the repl may
differ from actual function execution in this case.

The problem is Python still thinks the relevant builtin is a local
variables after deletion, raising an UnboundLocalError instead of
performing a global lookup. In theory this would work if the variable
could be made global, but since global/nonlocal statements are lifted,
Python complains with "SyntaxError: name 'list' is parameter and
global".

And that's A-Ok! Intentionally shadowing language builtins already puts
this code deep into ugly hacks territory.
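A minimal reproduction of the failure, with a parameter shadowing a builtin:

```python
def f(list):
    # Intentionally shadow the builtin, then try to del our way back to it.
    list_ = list
    del list
    # Python still considers "list" a local here, so this raises
    # UnboundLocalError instead of falling back to the builtin.
    return list(range(3))

try:
    f([1, 2, 3])
    assert False, "expected UnboundLocalError"
except UnboundLocalError:
    pass
```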
2025-05-15 14:10:42 -05:00
7c26bfc0a3 scripts: Simplified csv.py's func/uop/bop/top helpers
Now that I know my way around the weirdness that is Python's class
scope, this just required another function indirection to capture the
class-level dicts correctly.

I was considering using the __subclasses__ trick, but it seems like that
would actually be more complicated here.
2025-04-16 15:23:13 -05:00
71930a5c01 scripts: Tweaked openio comment
Dang, this touched like every single script.
2025-04-16 15:23:06 -05:00
c63ed79c5f scripts: Prefer .a for single entry namedtuples
- CsvInt.x -> CsvInt.a
- CsvFloat.x -> CsvFloat.a
- Rev.x -> Rev.a

This matches CsvFrac.a (paired with CsvFrac.b), and avoids confusion
with x/y variables such as Tile.x and Tile.y.

The other contender was .v, since these are cs*v* related types, but
sticking with .a gets the point across that the name really doesn't have
any meaning.

There's also some irony that we're forcing namedtuples to have
meaningless names, but it is useful to have a quick accessor for the
internal value.
2025-04-16 15:23:03 -05:00
98b16a9013 scripts: Renamed RInt (and friends) -> CsvInt (and friends)
This prefix was extremely arbitrary anyways.

The prefix Csv* has slightly more meaning than R*, since these scripts
interact with .csv files quite a bit, and it avoids confusion with
rbyd-related things such as Rattr, Ralt, etc.
2025-04-16 15:23:02 -05:00
26a29bda31 scripts: Tweaked RFrac to return +-∞ when evaluated as a float
This affects the table renderers as well as csv.py's ratio expr.

This is a bit more correct, handwaving 0/0 (mapping 0/0 -> 100% is
useful for cov.py, please don't kill me mathematicians):

  frac(1,0) => 1/0 (∞%)
  frac(0,0) => 0/0 (100.0%)
  frac(0,1) => 0/1 (0.0%)
2025-04-16 15:23:02 -05:00
613fa0f27a scripts: Reverted to -p/--percent not providing a path
So now the result scripts always require -d/--diff to diff:

- before: ./scripts/csv.py a.csv -pb.csv
- after:  ./scripts/csv.py a.csv -db.csv -p

For a couple reasons:

- Easier to toggle
- Simpler internally to only have one diff path flag
- The previous behavior was a bit unintuitive
2025-04-16 15:23:00 -05:00
8e3760c5b8 scripts: Tweaked punescape to expect dict-like attrs
This simplifies attrs a bit, and scripts can always override
__getitem__ if they want to provide lazy attr generation.

The original intention of accepting functions was to make lazy attr
generation easier, but while tinkering around with the idea I realized
the actual attr mapping/generation would be complicated enough that
you'd probably want a full class anyways.

All of our scripts are only using dict attrs anyways. And lazy attr
generation is probably a premature optimization for the same reason
everyone's ok with Python's slices being O(n).
2025-04-16 15:22:45 -05:00
270230a833 scripts: Adopted del to resolve shadowed builtins
So:

  all_ = all; del all

Instead of:

  import builtins
  all_, all = all, builtins.all

The del exposes the globally scoped builtin we accidentally shadow.

This requires less magic, and no module imports, though tbh I'm
surprised it works.

It also works in the case where you change a builtin globally, but
that's a bit too crazy even for me...
2025-04-16 15:22:08 -05:00
313696ecf9 scripts: Fixed openio issue where some scripts didn't import os
This only failed if "-" was used as an argument (for stdin/stdout), so
the issue was pretty hard to spot.

openio is a heavily copy-pasted function, so it makes sense to just add
the import os to openio directly. Otherwise this mistake will likely
happen again in the future.
2025-03-12 21:18:51 -05:00
92ac2a757e scripts: Adopted json -> is_json tweak, avoiding name conflict
This was a humorous name conflict that went unnoticed only because we
lazily import json in read_csv.
2025-03-12 21:12:12 -05:00
c60301719a scripts: Adopted dat tweak in other scripts
This just makes dat behave similarly to Python's getattr, etc:

- dat("bogus")       -> raises ValueError
- dat("bogus", 1234) -> returns 1234

This replaces try_dat, which is easy to forget about when copy-pasting
between scripts.

Though all of this wouldn't be necessary if only we could catch
exceptions in expressions...
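The getattr-like behavior can be sketched as (a hypothetical simplification, assuming dat parses integer-ish data):

```python
def dat(s, *default):
    # Like getattr: raise ValueError on bad input, unless an optional
    # default argument is given.
    try:
        return int(s, 0)
    except ValueError:
        if default:
            return default[0]
        raise

assert dat("0x2a") == 42
assert dat("bogus", 1234) == 1234
try:
    dat("bogus")
    assert False, "expected ValueError"
except ValueError:
    pass
```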
2025-03-12 21:12:12 -05:00
0d134a2830 scripts: Re-added -q/--quiet to result scripts
I forgot that this is still useful for erroring scripts, such as
stack.py when checking for recursion.

Technically this is possible with -o/dev/null, but that's both
unnecessarily complicated and includes the csv encoding cost for no
reason.
2025-03-12 20:02:19 -05:00
9e22167a31 scripts: Re-adopted result prefixes
Now that I'm looking into some higher-level scripts, being able to merge
results without first renaming everything is useful.

This gives most scripts an implicit prefix for field fields, but _not_
by fields, allowing easy merging of results from different scripts:

  $ ./scripts/stack.py lfs.ci -o-
  function,stack_frame,stack_limit
  lfs_alloc,288,1328
  lfs_alloc_discard,8,8
  lfs_alloc_findfree,16,32
  ...

At least now these have better support in scripts with the addition of
the --prefix flag (this was tricky for csv.py), which allows explicit
control over field field prefixes:

  $ ./scripts/stack.py lfs.ci -o- --prefix=
  function,frame,limit
  lfs_alloc,288,1328
  lfs_alloc_discard,8,8
  lfs_alloc_findfree,16,32
  ...

  $ ./scripts/stack.py lfs.ci -o- --prefix=wonky_
  function,wonky_frame,wonky_limit
  lfs_alloc,288,1328
  lfs_alloc_discard,8,8
  lfs_alloc_findfree,16,32
  ...
2025-03-12 19:10:17 -05:00
aae03be54b scripts: Fixed diff result sorting
This was a bit broken when r was None. Which is unusual, but happens
when rendering added/removed diff results.
2025-03-12 19:10:17 -05:00
299e2604c6 scripts: Changed -o/-O to an exclusive operation
So:

  $ ./scripts/code.py lfs.o -o- -q

Becomes:

  $ ./scripts/code.py lfs.o -o-

The original intention of -o/-O _not_ being exclusive (aka table is
still rendered unless disabled with -q/--quiet) was to allow results to
be written to csv files and rendered to tables in a single pass.

But this was never useful. Heck, we're not even using this in our
Makefile right now because it would make the rule dependencies more
complicated than it's worth. Even for long-running result scripts
(perf.py, perfbd.py, etc), most of the work is building that csv file,
the cost of rendering a table in a second pass is negligible.

In every case I've used -o/-O, I've also wanted -q/--quiet, and almost
always forget this on the first run. So might as well make the expected
behavior the actual behavior.

---

As a plus, this let us simplify some of the scripts a bit, by replacing
visibility filters with -o/-O dependent by-fields.
2025-03-12 19:10:17 -05:00
e71aca65d9 scripts: Adopted default visibility in scripts with complex fields
This makes it so scripts with complex fields will still output all
fields to output csv/json files, while only showing a user-friendly
subset unless -f/--field is explicitly provided.

While internal fields are often too much information to show by default,
csv/json files are expected to go to other scripts, not humans. So more
information is more useful up until you actually hit a performance
bottleneck.

And if you _do_ somehow manage to hit a performance bottleneck, you can
always limit the output with explicit -f/--field flags.
2025-03-12 19:10:17 -05:00
051bf66f9a scripts: Tried to handle -d/--diff results consistently
With this, we apply the same result modifiers (exprs/defines/hot/etc) to
both the input results and -d/--diff results. So if both start with the
same format, diffing/hotifying/etc should work as expected.

This is really the only way I can see -d/--diff results working with
result modifiers in a way that makes sense.

The downside of this is that you can't save results with some complex
operation applied, and then diff while applying the same operation,
since most of the newer operations (hotify) are _not_ idempotent.

Fortunately the two alternatives are not unreasonable:

1. Save results _without_ the operation applied, since the operation
   will be applied to both the input and diff results.

   This is a bit asymmetric, but should work.

2. Apply the operation to the input and then pipe to csv.py for diffing.

This used to "just work" when we did _not_ apply operations to output
csv/json, but this was really just equivalent to option 1.

I think the moral of the story is you can solve any problem with enough
chained csv.py calls.
2025-03-12 19:10:17 -05:00
2f20f53e90 scripts: csv.py: Reverted define filtering to before expr eval
It's just too unintuitive to filter after exprs.

Note this is consistent with how exprs/mods are evaluated. Exprs/mods
can't reference other exprs/mods because csv.py is only single-pass, so
allowing defines to reference exprs/mods is surprising.

And the solution to needing these sort of post-expr/mod references is
the same for defines: You can always chain multiple csv.py calls.

The reason defines were changed to evaluate after expr eval was because
this seemed inconsistent with other result scripts, but this is not
actually the case. Other result scripts simply don't have exprs/mods, so
filtering in fold is the same as filtering during collection. Note that
even in fold, filtering is done _before_ the actual fold/sum operation.

---

Also fixed a recursive-define regression when folding. Counter-
intuitively, we _don't_ want to recursively apply define filters. If we
do the results will just end up too confusing to be useful.
2025-03-12 19:10:17 -05:00
e851c654c5 scripts: Fixed typo hiding zero-sized results in table renderer
This should either have checked diff_result==None, or we should be
mapping diff_result=None => diff_result_=None. To be safe I've done
both.

This was a nasty typo and I only noticed because ctx.py stopped printing
"cycle detected" for our linked-lists (which are expected to be cyclic).
2025-03-12 19:10:17 -05:00