Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Performance bottleneck in Theory_Data.merge


view this post on Zulip Email Gateway (Jul 13 2021 at 15:03):

From: Norbert Schirmer via Cl-isabelle-users <cl-isabelle-users@lists.cam.ac.uk>
Hi,

I try to track down a performance bottleneck in theory merge.

I enabled timing information by

ML “val _ = Context.timing := true”

Unfortunately the position information of the “slow" data slot is not available, I just get the message:

Theory_Data.merge
44.390s elapsed time, 88.221s cpu time, 0.000s GC time

Compared to other data slots where the position information is available as a link or a file reference:

Theory_Data.merge⌂
0.016s elapsed time, 0.030s cpu time, 0.000s GC time

Theory_Data.merge (file "global_theory.ML")
0.264s elapsed time, 0.523s cpu time, 0.000s GC time

Any idea how to continue?

What is the reason that the position information is lost? Could this be a hint which kind of data slot it could possibly be?

Regards,
Norbert

view this post on Zulip Email Gateway (Jul 13 2021 at 15:45):

From: Thomas Sewell <tals4@cam.ac.uk>
I think that the position information there boils down to
that fetched by
let val pos = Position.thread_data () in
in the definition of the function Theory_Data' in src/Pure.context.ML.

I guess means that the data tables which don't have one are
defined at points at which the local thread_data position
wasn't set. It seems to link to ML blocks and to the position
of ML_file commands pretty consistently for those kinds, so probably
this is theory data defined in some special ML file, maybe one of the
ones in Pure. That's not much help, is it?

If you're very keen to figure it out, I'd be tempted to locally edit
context.ML to expose an alternative interface to Theory_Data that lets
the client suggest a name which appears with the position info in
timing.

I was motivated to investigate this a little because I've had issues
with
this in the past.

Best regards,
Thomas.

view this post on Zulip Email Gateway (Jul 14 2021 at 16:26):

From: Norbert Schirmer via Cl-isabelle-users <cl-isabelle-users@lists.cam.ac.uk>
I followed your advice and manually named all data. Here is what I found:

  1. The inefficient data is in locale.ML, namely: structure Thms. Sets of theorems are represented as lists there. Changing this to Thmtab.set, and doing a pointer_eq test on merge seems to solve my performance issue.

  2. Whether position information is displayed or not seems to depend on how the structure is actually generated, ie.
    via. Theory_Data’, Theory_Data, or Generic_Data and also how the arguments are declared. Some observations:

Regards,
Norbert

view this post on Zulip Email Gateway (Sep 22 2021 at 11:03):

From: Makarius <makarius@sketis.net>
(This rather old thread still requires more comments.)

On 14/07/2021 18:26, Norbert Schirmer via Cl-isabelle-users wrote:

On 13. Jul 2021, at 17:44, Thomas Sewell <tals4@cam.ac.uk

<mailto:tals4@cam.ac.uk>> wrote:

If you're very keen to figure it out, I'd be tempted to locally edit
context.ML to expose an alternative interface to Theory_Data that lets
the client suggest a name which appears with the position info in timing.

I followed your advice and manually named all data. Here is what I found:

  1. The inefficient data is in locale.ML, namely: structure Thms. Sets of
    theorems are represented as lists there. Changing this to Thmtab.set, and
    doing a pointer_eq test on merge seems to solve my performance issue.

I have already changed that some weeks ago, to use more scalable data
structures. See
https://isabelle-dev.sketis.net/rISABELLEc3b3517ef4ba

Note that thm Item_Net.T preserves the original "canonical order of
declarations": it is an index structure paired with a plain list; thus I don't
have to argue if the order matters or not. (A minor disadvantage is that the
net produces more heap garbage, so in extreme applications it might need to be
revisited.)

  1. Whether position information is displayed or not seems to depend on how the
    structure is actually generated, ie.
    via. Theory_Data’, Theory_Data, or Generic_Data and also how the arguments are
    declared. Some observations: * Theory_Data’ seems to result in a filename as position * Generic_Data(struct Type T = … end) seems to result in a link as position * Generic_Data(Type T = …) (without the explicit struct) seems to have no
    position information.

I could not reproduce this observation.

Makarius

view this post on Zulip Email Gateway (Sep 27 2021 at 07:50):

From: Norbert Schirmer via Cl-isabelle-users <cl-isabelle-users@lists.cam.ac.uk>
Below is some example output from my debug-version (based on Isabelle2021) were I have explicitly named the data slots (cf. "name" below). Note the difference in the output for
sign.ML, locale.ML (locale), classical.ML. My guess was that it might be related to how the data slot is declared. Or is it related to the build (bootstrap) process of the sessions?

From context.ML:

type kind =
{pos: Position.T,
name: string,
empty: Any.T,
extend: Any.T -> Any.T,
merge: theory * theory -> Any.T * Any.T -> Any.T};

fun invoke name f k x =
(case Datatab.lookup (Synchronized.value kinds) k of
SOME kind =>
if ! timing andalso name <> "" then
Timing.cond_timeit true ("Theory_Data." ^ name ^ Position.here (#pos kind) ^ " " ^ (#name kind))
(fn () => f kind x)
else f kind x
| NONE => raise Fail "Invalid theory data identifier");

From sign.ML:

structure Data = Theory_Data'
(
type T = sign;
val name = "sign.ML";
val extend = I;
val empty = make_sign (Syntax.empty_syntax, Type.empty_tsig, Consts.empty);
fun merge old_thys (sign1, sign2) =
let
val Sign {syn = syn1, tsig = tsig1, consts = consts1} = sign1;
val Sign {syn = syn2, tsig = tsig2, consts = consts2} = sign2;

val syn = Syntax.merge_syntax (syn1, syn2);
val tsig = Type.merge_tsig (Context.Theory (fst old_thys)) (tsig1, tsig2);
val consts = Consts.merge (consts1, consts2);
in make_sign (syn, tsig, consts) end;
);

From locale.ML:

structure Locales = Theory_Data
(
type T = locale Name_Space.table;
val name = "locale.ML (locale)";
val empty : T = Name_Space.empty_table "locale";
val extend = I;
val merge = Name_Space.join_tables (K merge_locale);
);

From classical.ML:

structure Claset = Generic_Data
(
type T = claset;
val name = "classical.ML";
val empty = empty_cs;
val extend = I;
val merge = merge_cs;
);

=======================================
Theory_Data.merge (file "sign.ML") sign.ML
0.223s elapsed time, 1.291s cpu time, 0.707s GC time
Theory_Data.merge (file "theory.ML") theory.ML
0.017s elapsed time, 0.105s cpu time, 0.000s GC time
Theory_Data.merge (file "thm.ML") thm.ML
0.003s elapsed time, 0.020s cpu time, 0.000s GC time
Theory_Data.merge (file "global_theory.ML") global_theory.ML
0.073s elapsed time, 0.432s cpu time, 0.000s GC time
Theory_Data.merge (file "raw_simplifier.ML") raw_simplifier.ML (simpset)
0.005s elapsed time, 0.032s cpu time, 0.000s GC time
Theory_Data.merge (file "ML/ml_env.ML") ml_env.ML
0.000s elapsed time, 0.001s cpu time, 0.000s GC time
Theory_Data.merge attrib.ML (attribute)
0.000s elapsed time, 0.001s cpu time, 0.000s GC time
Theory_Data.merge context_rules.ML
0.006s elapsed time, 0.047s cpu time, 0.000s GC time
Theory_Data.merge locale.ML (locale)
0.000s elapsed time, 0.001s cpu time, 0.000s GC time
Theory_Data.merge locale.ML (idents)
0.017s elapsed time, 0.089s cpu time, 0.000s GC time
Theory_Data.merge locale.ML (global_registrations)
0.000s elapsed time, 0.006s cpu time, 0.000s GC time
Theory_Data.merge locale.ML (thms)
0.001s elapsed time, 0.004s cpu time, 0.000s GC time
Theory_Data.merge bundle.ML
0.000s elapsed time, 0.002s cpu time, 0.000s GC time
Theory_Data.merge axclass.ML
0.000s elapsed time, 0.005s cpu time, 0.000s GC time
Theory_Data.merge class.ML
0.000s elapsed time, 0.002s cpu time, 0.000s GC time
Theory_Data.merge code.ML
0.001s elapsed time, 0.014s cpu time, 0.000s GC time
Theory_Data.merge spec_rules.ML
0.002s elapsed time, 0.010s cpu time, 0.000s GC time
Theory_Data.merge named_theorems.ML
0.003s elapsed time, 0.011s cpu time, 0.000s GC time
Theory_Data.merge generated_files.ML
0.000s elapsed time, 0.001s cpu time, 0.000s GC time
Theory_Data.merge (line 788 of "~~/src/HOL/HOL.thy") classical.ML
0.002s elapsed time, 0.009s cpu time, 0.000s GC time
Theory_Data.merge (line 1529 of "~~/src/HOL/HOL.thy") induct.ML
0.001s elapsed time, 0.006s cpu time, 0.000s GC time
Theory_Data.merge (line 416 of "~~/src/HOL/Inductive.thy") inductive.ML
0.000s elapsed time, 0.001s cpu time, 0.000s GC time
Theory_Data.merge (line 13 of "~~/src/HOL/Fun_Def_Base.thy") function_common.ML
0.000s elapsed time, 0.006s cpu time, 0.000s GC time
Theory_Data.merge (line 14 of "~~/src/HOL/Fun_Def_Base.thy") function_context_tree.ML
0.000s elapsed time, 0.004s cpu time, 0.000s GC time
Theory_Data.merge (line 260 of "~~/src/HOL/Transfer.thy") transfer.ML
0.001s elapsed time, 0.007s cpu time, 0.000s GC time

Regards,
Norbert


Last updated: Jul 15 2022 at 23:21 UTC