Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] find\_theorems ?


view this post on Zulip Email Gateway (Aug 22 2022 at 20:59):

From: "Dr A. Koutsoukou-Argyraki" <ak2110@cam.ac.uk>
Dear All,

could anyone explain how search in find\_theorems works from the inside?
Do you have any literature to recommend?

Many thanks in advance,

Best wishes,
Angeliki

view this post on Zulip Email Gateway (Aug 22 2022 at 20:59):

From: "Klein, Gerwin (Data61, Kensington NSW)" <Gerwin.Klein@data61.csiro.au>
There’s no literature on its internals, it was too small a tool to publish about. The user interface is documented in the Isar reference manual.

What aspect are you interested in?

It’s basically filtering a list of theorems extracted from the current context by search predicates. The main predicates are pattern matching on the term, string matching on the theorem name, and a few more such as “conclusion matches current goal”, etc.

Cheers,
Gerwin

view this post on Zulip Email Gateway (Aug 22 2022 at 20:59):

From: "Dr A. Koutsoukou-Argyraki" <ak2110@cam.ac.uk>
Dear Gerwin, CC:Yiannos
thanks a lot for your reply!
The reason I'm asking is that
my colleague Yiannos Stathopoulos at Cambridge
and I have been working on designing a new, different search engine for
the Isabelle
library and we'd like to compare its efficiency with find\_theorems.
While comparing them, looking into how find\_theorems works internally
might give us some insights about what it is that makes search work
better and why.

Thanks again,
Best,
Angeliki

view this post on Zulip Email Gateway (Aug 22 2022 at 21:00):

From: "Klein, Gerwin (Data61, Kensington NSW)" <Gerwin.Klein@data61.csiro.au>
Interesting. Is efficiency an issue with the current find_theorems?

Not saying that it can’t be improved. The main feature that people keep asking for is to be able to search outside the current context and session.

Cheers,
Gerwin

view this post on Zulip Email Gateway (Aug 22 2022 at 21:00):

From: "Dr A. Koutsoukou-Argyraki" <ak2110@cam.ac.uk>
Dear Gerwin,

On 2019-11-21 00:39, Klein, Gerwin (Data61, Kensington NSW) wrote:

On 21 Nov 2019, at 11:30, Dr A. Koutsoukou-Argyraki <ak2110@cam.ac.uk>
wrote:

Dear Gerwin, CC:Yiannos
thanks a lot for your reply!
The reason I'm asking is that
my colleague Yiannos Stathopoulos at Cambridge
and I have been working on designing a new, different search engine
for the Isabelle
library and we'd like to compare its efficiency with find\_theorems.
While comparing them, looking into how find\_theorems works internally
might give us some insights about what it is that makes search work
better and why.

Interesting. Is efficiency an issue with the current find_theorems?

most of the time it works well, but there are some cases where it
doesn't give any results where there are plenty of related lemmas.

Of course this is a very tricky problem for many different reasons.

(e.g. Here are some examples of search words that within the entire
Analysis don't give results :

Borel'', Zorn'', Gauss'', product'', operator'', Hilbert'',
Lebesgue'', derivative'', Euclidean'', rational'',
polynomial'', series'' , Weierstrass'', Noether'',
summation'', fraction'', supremum'', infimum'', pythagorean'', multiplication'', converge'', convergence'', ``mapping'')

Also it's case-sensitive, i.e.borel''gives about 509 results while Borel''gives nothing.

Not saying that it can’t be improved. The main feature that people
keep asking for is to be able to search outside the current context
and session.

We also hope to get this to work, and in general the hope is to manage
to get more results (of high relevance) for search queries.
We can keep you updated about our progress if you are interested.

Best,
Angeliki

view this post on Zulip Email Gateway (Aug 22 2022 at 21:00):

From: Tobias Nipkow <nipkow@in.tum.de>
When I use the Query panel with "Find Theorems" and type "name: product" it says
"found 179 theorem(s)" and displays 40.

Tobias
smime.p7s

view this post on Zulip Email Gateway (Aug 22 2022 at 21:00):

From: Lawrence Paulson <lp15@cam.ac.uk>
The point is that you have to know what to search for. For example, “prod”, “mult” and “times” all yield different results because these tokens are used arbitrarily in the names of theorems. The objective here is to involve mathematical knowledge in search.

Larry

view this post on Zulip Email Gateway (Aug 22 2022 at 21:00):

From: mailing-list anonymous <mailing.list.anonymous@gmail.com>
Dear All,

I hope it is not inappropriate to provide an idea/feature request for the
new find_theorems infrastructure.

I believe that it would be impossible to cover all possible use cases by
predefining filters for a command such as find_theorems. Thence, it would
be useful to provide a generic Isabelle/ML API, which would allow users to
write their own filters (e.g. values of the type "thm -> bool", "thm list
-> bool" or anything similar) and combine them using predefined logical
connectives, such as \/, /\, etc.

For example (the code below is a very rough sketch of the idea),

(* Generic find theorems *)

ML‹

infix 1 \/ /\;

signature FIND_THEOREMS_GENERIC =
sig

val /\ : (string * thm -> bool) * (string * thm -> bool) -> string * thm ->
bool
val \/ : (string * thm -> bool) * (string * thm -> bool) -> string * thm ->
bool
val find_theorems : theory -> (string * thm -> bool) -> (string * thm) list

end

structure Find_Theorems_Generic : FIND_THEOREMS_GENERIC =
struct

fun f1 /\ f2 = fn thm => f1 thm andalso f2 thm;
fun f1 \/ f2 = fn thm => f1 thm orelse f2 thm;

fun find_theorems thy f =
let val facts = Global_Theory.all_thms_of thy false |> filter f
in facts end;

end;

open Find_Theorems_Generic;

(* Use case example: custom filters *)

ML‹

(* Definition of custom filters *)

(* Type constraint *)
fun has_type_constraintt has_type_constraintT (Const (_, T)) =
has_type_constraintT T
| has_type_constraintt has_type_constraintT (Free (_, T)) =
has_type_constraintT T
| has_type_constraintt has_type_constraintT (Var (_, T)) =
has_type_constraintT T
| has_type_constraintt has_type_constraintT (Abs (_, T, t)) =
if has_type_constraintT T
then true
else has_type_constraintt has_type_constraintT t
| has_type_constraintt has_type_constraintT (t $ u) =
if has_type_constraintt has_type_constraintT t
then true
else has_type_constraintt has_type_constraintT u
| has_type_constraintt _ _ = false

fun has_type_constraint_thm has_type_constraintT thm = thm
|> Thm.full_prop_of
|> has_type_constraintt has_type_constraintT

fun has_classT _ _ (Type (_, [])) = false
| has_classT sort_rel sort (Type (c, T::Ts)) =
if has_classT sort_rel sort T
then true
else has_classT sort_rel sort (Type (c, Ts))
| has_classT sort_rel sort (TFree (_, sort')) = sort_rel (sort', sort)
| has_classT sort_rel sort (TVar (_, sort')) = sort_rel (sort', sort)

fun has_type_constructorT c (Type (c', [])) = (c = c')
| has_type_constructorT c (Type (c', T::Ts)) =
if has_type_constructorT c T
then true
else has_type_constructorT c (Type (c', Ts))
| has_type_constructorT _ _ = false

(* Presence of a given class *)
fun has_class_thm S nthm = nthm
|> snd
|> (S |> has_classT (subset op= o swap) |> has_type_constraint_thm)

(* Presence of a given type constructor *)
fun has_type_constructor_thm c nthm = nthm
|> snd
|> (has_type_constructorT c |> has_type_constraint_thm )

fun sort_and_tc_filter c S = has_class_thm S /\ has_type_constructor_thm c

fun sort_or_tc_filter c S = has_class_thm S \/ has_type_constructor_thm c

(* Use case example: all theorems that contain the sort constraint
topological_space or the type constructor filter *)

ML‹

val thms = find_theorems
@{theory}
(sort_or_tc_filter \<^type_name>‹filter› \<^sort>‹topological_space›)

val _ = thms |> length |> Int.toString |> writeln;
val _ = thms |> map fst |> map writeln;

(* Use case example: all theorems that contain the sort constraint
topological_space and the type constructor filter *)

ML‹

val thms = find_theorems
@{theory}
(sort_and_tc_filter \<^type_name>‹filter› \<^sort>‹topological_space›)

val _ = thms |> length |> Int.toString |> writeln;
val _ = thms |> map fst |> map writeln;

Thank you

view this post on Zulip Email Gateway (Aug 22 2022 at 21:00):

From: Tobias Nipkow <nipkow@in.tum.de>
On 21/11/2019 12:24, Lawrence Paulson wrote:

The point is that you have to know what to search for. For example, “prod”, “mult” and “times” all yield different results because these tokens are used arbitrarily in the names of theorems. The objective here is to involve mathematical knowledge in search.

Her point was that the search for a long list of example words yielded 0
results. I am merely pointing out that find_theorems works fine on these words
if you use "name:".

Your point is of course important.

Tobias

Larry

On 21 Nov 2019, at 06:14, Tobias Nipkow <nipkow@in.tum.de> wrote:

When I use the Query panel with "Find Theorems" and type "name: product" it says "found 179 theorem(s)" and displays 40.

smime.p7s

view this post on Zulip Email Gateway (Aug 22 2022 at 21:00):

From: mailing-list anonymous <mailing.list.anonymous@gmail.com>
Dear All,

I noticed a mistake in my previous post: (once again) I was trying to write
an email within a fixed and very limited timeframe. Of course, I did not
mean to use the term "logical connective" in the context of a function that
combines two filters together. I believe that the term "combinator" would
be more suitable.

Thank you

view this post on Zulip Email Gateway (Aug 22 2022 at 21:01):

From: "Dr A. Koutsoukou-Argyraki" <ak2110@cam.ac.uk>
Indeed, for several of them using "name:" works, thank you for noticing
, Tobias.

But when I search via the Query search box I still don't get any results
for any of the words mentioned, including "product".

Using "name:" does give me results for some of them.

I still don't get any results using "name:" for
"Gauss", "Noether", "summation", "supremum", "infimum", "pythagorean",
"multiplication".

In general it would be good if performance could get a bit improved.

Best wishes,
Angeliki

view this post on Zulip Email Gateway (Aug 22 2022 at 21:01):

From: Tobias Nipkow <nipkow@in.tum.de>
On 21/11/2019 21:11, Dr A. Koutsoukou-Argyraki wrote:

Indeed, for several  of them using "name:" works, thank you for noticing , Tobias.

But when I search via the Query search box I still don't get any results for any
of the words mentioned, including "product".

Strange, because it does work for me. I just tried it again.

Using "name:" does give me results for some of them.

I still don't get any results using "name:"  for
"Gauss", "Noether", "summation", "supremum", "infimum", "pythagorean",
"multiplication".

I can confirm that for "Gauss" but the reason is simple: there is no theorem
whose name contains "Gauss" in the Isabelle/HOL distribution. I suspect it is
the same for your other words.

Tobias

In general it would be good if performance could get a bit improved.

Best wishes,
Angeliki

Her point was that the search for a long list of example words yielded
0 results. I am merely pointing out that find_theorems works fine on
these words if you use "name:".

Your point is of course important.

Tobias

Larry

On 21 Nov 2019, at 06:14, Tobias Nipkow <nipkow@in.tum.de> wrote:

When I use the Query panel with "Find Theorems" and type "name: product" it
says "found 179 theorem(s)" and displays 40.

smime.p7s

view this post on Zulip Email Gateway (Aug 22 2022 at 21:01):

From: Manuel Eberl <eberlm@in.tum.de>

But when I search via the Query search box I still don't get any
results for any of the words mentioned, including "product".

Strange, because it does work for me. I just tried it again.

Note that, of course, it also depends what context you are in (e.g. what
theories you have loaded).

I can confirm that for "Gauss" but the reason is simple: there is no
theorem whose name contains "Gauss" in the Isabelle/HOL distribution.
I suspect it is the same for your other words.

That is not entirely true. There is something in HOL-Number_Theory, and
if you count "gauss" as well there are some in HOL-Probability and even
HOL itself.

The underlying problem, I suppose, is that the only way to attach any
kind of meta-information to a theorem in a way that the current search
can pick up is through its name. And the other problem is that it only
works for things that you have loaded.

Manuel

view this post on Zulip Email Gateway (Aug 22 2022 at 21:01):

From: "Dr A. Koutsoukou-Argyraki" <ak2110@cam.ac.uk>
well, when searching for "Gauss" within the Computational Algebra
library I would hope to get some facts related to the fundamental
theorem of algebra,
similarly when searching for "Noether" within the algebra
library I would hope to get some facts related to rings etc. even if
noetherian rings are not there.

What I'm trying to say is that it would be useful to get associated
results with the search words, not just results that have the name
itself included.
Also, e.g. "multiplication" ideally would give different kinds of
multiplication, "infimum"/"supremum"/ ideally would give related facts
etc.

So, what we are hoping to achieve in what we're developing with Yiannos
is "associated" concepts
in the search results for queries.

Best wishes,
Angeliki

view this post on Zulip Email Gateway (Aug 22 2022 at 21:01):

From: "Dr A. Koutsoukou-Argyraki" <ak2110@cam.ac.uk>

The underlying problem, I suppose, is that the only way to attach any
kind of meta-information to a theorem in a way that the current search
can pick up is through its name.

exactly! that is why I write in my other email that we are hoping for
more associations even if the name is not explicit.

And the other problem is that it only
works for things that you have loaded.

yes, I'm aware of this, but luckily the user can control this (e.g. by
loading the whole library)
Of course ideally the search would work even for unloaded things

Best wishes,
Angeliki

view this post on Zulip Email Gateway (Aug 22 2022 at 21:01):

From: Tobias Nipkow <nipkow@in.tum.de>
Yes, a theory called Gauss, but not a theorem.

I am merely pointing out that find_theorems meets its specification.

"What I'm trying to say is that it would be useful to get associated results
with the search words, not just results that have the name itself included."

I completely agree.

Tobias
smime.p7s

view this post on Zulip Email Gateway (Aug 22 2022 at 21:01):

From: "Klein, Gerwin (Data61, Kensington NSW)" <Gerwin.Klein@data61.csiro.au>

On 22 Nov 2019, at 08:16, Dr A. Koutsoukou-Argyraki <ak2110@cam.ac.uk> wrote:

The underlying problem, I suppose, is that the only way to attach any
kind of meta-information to a theorem in a way that the current search
can pick up is through its name.

exactly! that is why I write in my other email that we are hoping for more associations even if the name is not explicit.

Thanks for the explanations, I think I understand now what the goal is.

Having lived through the early days of web search before Google, I have some opinions on this. Compared to Google, the initial search engines were pretty much useless, mostly because they were based on string search only and returned too many results for the results you actually wanted to appear anywhere near the top. Many of them tried to fix the problem of returning irrelevant results by trying to match more things, i.e. make the search more fuzzy. That made it worse. Google mainly figured out how to rank search results such that popular pages (after their search rank measure) come first on the assumption that those will be the things you want. That worked spectacularly well.

For Isabelle, I also usually get too many results, not too few. That is not true in the case you describe, although the use cases so far seems to describe finding definition concepts, not theorems. There is a separate command for that (find_consts). Maybe they should be merged so the user doesn’t have to distinguish between them, but I think there is a lot more room for improvement for finding constants, possibly with the same basic ideas you have described. I.e. it’s worth looking into the concept separately.

It was an explicit design goal for find_theorems not to match fuzzily, because you can use pattern matching for that, and you want to be able to reduce the matches to what is interesting.

Maybe that was the wrong idea. Google in a sense has ruined structured search, because Google is so good at unstructured search that nobody is used to the idea any more, even where we have the structure and information that we don’t have on the web. I.e. the usual search interaction is to type in a few words, like you described, not terms or patterns. So maybe we should support that. If we find a good way of ranking results instead, it doesn’t matter so much if we return more results, as long as we return the one the user actually needs in the first 5 or 10.

So, the real problem to solve is ranking results. The other part (matching more from a search) is comparatively easy.

And the other problem is that it only
works for things that you have loaded.
yes, I'm aware of this, but luckily the user can control this (e.g. by loading the whole library)
Of course ideally the search would work even for unloaded things

I’d be keen on that feature :)

Cheers,
Gerwin

view this post on Zulip Email Gateway (Aug 22 2022 at 21:02):

From: "Dr A. Koutsoukou-Argyraki" <ak2110@cam.ac.uk>
Dear Gerwin,

many thanks for your comments!
On 2019-11-21 22:20, Klein, Gerwin (Data61, Kensington NSW) wrote:

So, the real problem to solve is ranking results. The other part
(matching more from a search) is comparatively easy.

yes, this is an excellent point of course, and we do not know how to
solve the ranking problem. Combining a number of search words sometimes
helps a lot, but it is not always enough.

And the other problem is that it only
works for things that you have loaded.
yes, I'm aware of this, but luckily the user can control this (e.g.
by loading the whole library)
Of course ideally the search would work even for unloaded things

I’d be keen on that feature :)

This will be possible with what Yiannos and I are trying
out, as the search will be done offline on the entire library. This
again may have the consequence of too many results, which brings back
the ranking problem, so we have to see how to treat this.

Best,
Angeliki

view this post on Zulip Email Gateway (Aug 22 2022 at 21:02):

From: Clemens Ballarin <ballarin@in.tum.de>
Proof objects reference theorems (or concepts), so you have a graph on
which page rank could be applied ... on theorems, not web pages, though.

Clemens

view this post on Zulip Email Gateway (Aug 22 2022 at 21:02):

From: "mailing.list.anonymous" <mailing.list.anonymous@gmail.com>
Dear All,

I hope that my previous comment
(https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2019-November/msg00068.html)
was not too out of place. I wrote and submitted it before the following
posts (providing more background information with regard to the further
development plans for the 'find_theorems' infrastructure) were submitted
to the list:
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2019-November/msg00076.html,
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2019-November/msg00081.html.
Please accept my apologies for my haste.

I have to admit that I also found the existing 'find_theorems'
infrastructure slightly limiting and, most certainly, I am not the only
one (e.g. see
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2019-November/msg00036.html).
At some point when I was working on the extension of Types-To-Sets, I
did think quite seriously about introducing something similar to
Isabelle/ML-based query language (most likely, building on the ideas
drawn from the general query languages and the language of patterns for
subterm selection) for searching the databases of theorems and constants
as an independent user-level package . At the time, it was meant to be
embedded in Isabelle/ML only (but designed in a way such that,
potentially, it could be translated to Isar). Nevertheless, it seemed
too much of a tangent at the time and I opted for a much simpler
application-specific solution. Nevertheless, I was never entirely happy
about the outcome and the thought of a possibility of introducing
something better/more general (at least, for my own application) keeps
nagging me.

I am very glad that there is an ongoing project towards the improvement
of the infrastructure for searching for theorems/constants. Of course,
having any capacity for doing unstructured search would be very useful
for the end users of Isar. However, I am curious if, as part of this
development, you are also considering providing anything akin to a
generic and application specific query language with pattern matching (I
guess, this would have to be aimed, primarily, at the structured
search). The reason why I am asking this question is that this is still
something that I am considering doing at some point in the future. Of
course, I would only be happier if someone else developed something that
provides the functionality that I seek. Otherwise, this will remain on
my “To Do List”.

In summary, I would highly appreciate if you could let me know whether
the plan for the work on the improvement of the functionality for
searching for theorems/constants does include the development of a query
language for structured search of theorems and constants with some form
of pattern matching that goes beyond the capabilities of the existing
implementation of find_theorems.

Thank you


Last updated: Nov 21 2024 at 12:39 UTC