Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] I need a fixed mutable array


view this post on Zulip Email Gateway (Aug 19 2022 at 14:35):

From: Peter Lammich <lammich@in.tum.de>
HOL itself does not support mutable arrays.

However, there is Imperative_HOL, which has a heap monad supporting
mutable arrays.

Then there is afp/Collections/Lib/Diff_Array, which provides an
implementation of arrays that behaves purely functional, but is
efficient if only the last version is accessed.

However, if you are not after efficient executability, but only
looking for an abstract model of a memory, it makes no sense using the
above types, as the efficiency comes at teh price of additional
formalization overhead.

view this post on Zulip Email Gateway (Aug 19 2022 at 14:35):

From: Gottfried Barrow <igbi@gmx.com>
Ramana,

Thanks for the suggestion. There's always more to say than anyone wants
to read.

I'll say the same thing here that I'll say to Peter Lammich about the
info he gave me. It's probably better that I didn't get this information
from you before now. In particular, my finding this link from your tip:

www.cl.cam.ac.uk/~acjf3/arm

That would be about ARMv7. The ARM Cortex-A15 was the CPU I thought I
was going to go with. Please see [1] and [2] below.

There's like three main factors in looking at CPUs: they need to be sold
on a board that's affordable, they need to have a large following, and
they need to have some lasting power.

The Sony Playstation 3 CPU looks great, and it's fairly cheap, but then
the Sony PS4 is changing to an AMD CPU. That's a bad sign.

Basically, there's GPUs, ARM, and Intel i7. Part of my pursuit began
with looking for processors that have 256-bit data registers, because I
used to work in graphics image generators, which had 256-bit wide data
busses.

It was there all the time with the i7, and even before for 128-bit, with
SSE, SSE2, SSS3, SS3, SS4, and now it's there for 256-bit, with AVX
extensions.

RISC and ARM sounds like it should be fast, but then I looked at [3],
and it's not like they blow everyone away.

I think an i7 is hard to beat, with 4 cores and 8 threads, with a clock
at 3 to 4 GHz.. Lots of people have them, and people don't have to do
anything special to run them, unlike embedded processors, or external,
single board Linux computers.

All these practical considerations have a big part in influencing what
route I think I should go. ARM is good, but then there's all sorts of
special things I have to do for, among other things, develop tools.

Regards,
GB

[1] Samsung Exynos 5 Octa:
http://www.samsung.com/global/business/semiconductor/minisite/Exynos/products5octa_5410.html

[2] ODROID-X3: ARM Cortex-A15 / Cortex A7 / Samsung Exynos 5 Octa based
computer for $169.
http://hardkernel.com/main/products/prdt_info.php?g_code=G137510300620

[3] CoreMark integers tests: http://www.eembc.org/coremark/index.php
(sort by CoreMark/MHz)

[4] Brix with i7-4770R:
http://www.amazon.com/BRIX-Pro-GB-BXI7-4770R-Desktop-Computer/dp/B00HX3OJSG/ref=sr_1_5?ie=UTF8&qid=1402845781&sr=8-5&keywords=gigabyte+brix

view this post on Zulip Email Gateway (Aug 19 2022 at 14:35):

From: Gottfried Barrow <igbi@gmx.com>
On 14-06-18 16:19, Peter Lammich wrote:

On Di, 2014-06-17 at 10:56 -0500, Gottfried Barrow wrote:

I did a few searches to find whether there are mutable arrays in
Isabelle. It appears there's not, though I found a little info:
However, there is Imperative_HOL, which has a heap monad supporting
mutable arrays.

Peter,

Thanks for the info. I never consider using anything but Isabelle/HOL,
since it has the biggest user base. Limitations can sometimes be a good
thing, by forcing me to figure out how to do things different.

Then there is afp/Collections/Lib/Diff_Array, which provides an
implementation of arrays that behaves purely functional, but is
efficient if only the last version is accessed.

I searched on "array" at the AFP site before, but I didn't find that,
since it's buried in the tar file. It's good that I didn't, but it's
good to see it now, and see how they're implementing things with lists.

It might come in handy later, and the collections in general.

However, if you are not after efficient executability, but only
looking for an abstract model of a memory, it makes no sense using the
above types, as the efficiency comes at teh price of additional
formalization overhead.

I just need something to give me a decent model for the use a subset of
assembly language instructions, which involves reading and writing to
memory.

List and nat, they're part of the foundation of HOL. They're used a lot,
so there's a lot of development around them. Proof can be a scary
thought, so I try to play it safe, and keep things simple with lists,
when I can.

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 14:35):

From: Gottfried Barrow <igbi@gmx.com>
It would also be "they", but then "you" as the primary.

Thanks,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 14:35):

From: Gerwin Klein <Gerwin.Klein@nicta.com.au>
Hi Gottfried,

are you aware of the HOL/Word library? It gives you arbitrary fixed-size words. With that, memory would be something like

type_synonym mem = “64 word => 8 word”

If the purpose is to model and reason only, basic functions are as mutable as records. If you want to generate efficient code from it, then it might be better to go with implementations like Peter’s.

A formalisation of (very) basic machine language is explained in Section 8.1 of Concrete Semantics:
http://www21.in.tum.de/~nipkow/Concrete-Semantics/

Cheers,
Gerwin


The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.

view this post on Zulip Email Gateway (Aug 19 2022 at 14:35):

From: Gottfried Barrow <igbi@gmx.com>
On 14-06-18 19:43, Gerwin Klein wrote:

A formalisation of (very) basic machine language is explained in Section 8.1 of Concrete Semantics:
http://www21.in.tum.de/~nipkow/Concrete-Semantics/

Gerwin,

Thanks again. That's short, but very useful. I kind of knew that my
instruction set is one, big datatype, but it hadn't yet entered my mind
what my execution function was going to be for the datatype, so seeing
"iexec" is helpful.

Also, I'll be looking at the info about the program counter again, and
other tips there.

are you aware of the HOL/Word library? It gives you arbitrary fixed-size words. With that, memory would be something like

type_synonym mem = “64 word => 8 word”

I've looked at that briefly now, and I start at section 3.1, where
they're starting with num0 and num1. It would take me a lot of work to
work through the details, and I'm assuming, it's safe to assume, it's
not what I need, where my assumption is based on my work with Num.num,
which I really like.

For what I'm trying to do, everything is about trying to precisely model
the specific registers of a particular CPU. With a CPU, the register set
and size of registers is fixed, and there's nothing that complex about
them, though how they're used may be complex. For 64-bit registers, you
usually only have 8-bit, 16-bit, 32-bit, and 64-bit words. Also, my
choice of bool for the bits is based on the fact that True and False are
fundamental to HOL, which might become very useful to me.

Arbitrary will be important, if I can get to it, but arbitrary has to be
implemented as it's implemented by the good math libraries, and the
precise way that they use fixed sized registers. For arbitrary, the
first place I'll look to see how arithmetic is done will the GnuMP
library. I'll be asking, "What's special about the way they deal with
registers, and assembly language instructions?"

There's another issue here. For pattern matching with "fun", it's
important that a datatype use only 0-ary type constructors, like True
and False. My awareness of this is a result of having been working with
Num.num. What I've done is use Num.num as a template, and I've implement
the functionality of Num.num as a "bool list". For example, "[]" is 1,
and [True,True] is 7. That allows me to pattern match in ways I
otherwise couldn't pattern match.

Having spent a lot of time learning how Num.num works, I'm not very
motivated to do it all again with Word, and I assume that doing things
more straightforward will help me in some ways.

There always more. I'm talking prematurely, but suppose I need a sign
bit for signed integers. I could do something like this for a 16-bit
signed integer:

datatype d7T = d7C bool bool bool bool bool bool bool
datatype d8uT = d8uC bool bool bool bool bool bool bool bool
datatype d16sT = d16C bool d7T d8uT

If the purpose is to model and reason only, basic functions are as mutable as records. If you want to generate efficient code from it, then it might be better to go with implementations like Peter’s.

In times like these, a person like me would actually like to understand
a phrase like "basic functions are as mutable as records", but I don't
worry about understanding everything.

I'm actually liking what I stumbled onto, which is to generate a list of
the CPU register and memory state for each instruction. That would get
me a debugger by default. Compiled languages are bigger hassle to use,
and with assembly language, it becomes more important to use GNU gdb, or
some other IDE.

If I could get a state machine thing accurately modeling assembly
language, then I could run it with a combination of Isabelle/HOL and ML,
in Isabelle/jEdit. That would be nice.

Well, this is all premature.

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 14:49):

From: Gottfried Barrow <igbi@gmx.com>
Hi,

I did a few searches to find whether there are mutable arrays in
Isabelle. It appears there's not, though I found a little info:

https://groups.google.com/forum/?fromgroups#!searchin/fa.isabelle/mutable$20arrayshttps://groups.google.com/forum/?fromgroups#!searchin/fa.isabelle/mutable$20arrays

I need to simulate the RAM of a cpu, and this is what I've done with a
record and list:

record 'a marray = MList :: "'a list"

definition "array_size_512 = ( (| MList = replicate 0x100 (0::nat) |),
0x100::nat )"

fun array_write :: "('a marray * nat) => nat => 'a => ('a marray * nat)
option" where
"array_write (an_array, a_size) index data = (
if (index >= a_size) then None
else Some (an_array(| MList := (MList an_array)[index := data] |),
a_size)
)"

value "array_write array_size_512 3 1234"
value "array_write array_size_512 0x100 1234"

I'm wondering if there's a better way to do that. Better efficiency
would be good, but efficiency in Isabelle is not of utmost concern,
since it will hopefully represent what will be done in assembly language.

I'm just getting started, but a partial record for a Intel 64-bit cpu
might be something like this, where "Mem" is the fixed mutable array to
represent RAM:

record cpu64 =
Ax :: nat
Bx :: nat
Mem :: "nat list"

definition cpu_mem256 :: "cpu64 * nat" where
"cpu_mem256 =
( (| Ax = 0, Bx = 0, Mem = (replicate 0x100 (0::nat)) |), 0x100::nat )"

fun mem_write :: "(cpu64 * nat) => nat => nat => (cpu64 * nat) option" where
"mem_write (cpu, memsize) addr data = (
if (addr >= memsize) then None
else Some (cpu(| Mem := (Mem cpu)[addr := data] |), memsize)
)"

This is the beginning of an attempt to tie into the SIMD instruction
sets of the AMD and Intel microprocessors. The SIMD instructions use
128-bit and 256-bit registers to do multiple operations at the same
time. I include a lot of links below.

Thanks,
GB

Wiki Pages

http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
http://en.wikipedia.org/wiki/SSE2
http://en.wikipedia.org/wiki/SSE3
http://en.wikipedia.org/wiki/SSSE3
http://en.wikipedia.org/wiki/SSE4
http://en.wikipedia.org/wiki/Advanced_Vector_Extensions

INTEL

http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
https://software.intel.com/en-us/intel-isa-extensions

AMD (See AMD64 Arch Programmer's Manuals under the manual heading)

http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/

PRO ASSEMBLY LANGUAGE (chapter 17)

http://www.wrox.com/WileyCDA/WroxTitle/Professional-Assembly-Language.productCd-0764579010,descCd-tableOfContents.html

http://media.wiley.com/product_data/excerpt/10/07645790/0764579010-2.pdf

view this post on Zulip Email Gateway (Aug 19 2022 at 14:49):

From: Gerwin Klein <Gerwin.Klein@nicta.com.au>
Hi Gottfried,

most assembly formalisation don’t represent memory as an array, but as a function from address to value (usually byte or word). With that, read is just function application, store is function update. These are easier to reason about than arrays.

There are existing large formalisations of x86 out there that you might want to look at. E.g. the Cambridge model in HOL4, which is close to what you’d do in Isabelle, or the model of Ben Pierce’s group in Coq.

Cheers,
Gerwin


The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.

view this post on Zulip Email Gateway (Aug 19 2022 at 14:49):

From: Gottfried Barrow <igbi@gmx.com>
A little feedback can go a long way towards getting me to the next step.

This all falls under a more general question: "How do I execute a list
of instructions in Isabelle/HOL, like in a normal programming language?"

Even further: "How do I use GOTO in a functional programming language?"

That led, with some past information, to this thought (possibly wrong):

1) There is nothing mutable in Isabelle/HOL. Records only have the
appearance of mutability. Consequently, for my case, a record adds
needless complexity.

So, no mutability can lead to good things.

I include some source and attach some source. The idea behind it is to
have a cpu datatype that holds the register and memory state. I execute
a list of instructions, and it produces a list of states, not that I
know anything about state machines, only that Ramon Zuniga used to
implement them in programmable logic. I wonder what that guy's up to
these days.

I did a trivial proof, to make sure I had a token proof to show, but the
value wasn't trivial. There was a simple condition I needed, and it
hadn't occurred to me that I needed it. Another example that shows that
proofs are a good thing.

With no mutability, based on the basic idea, it seems I should be able
to prove some complex things about executing a list of instructions.

Regards,
GB

(*********)
theory i140618a__mem_write_execute_state_list
imports Complex_Main "~~/src/HOL/Library/Code_Target_Nat"
begin

(16-bit dataword and a little notation.)

type_synonym b8T = "bool * bool * bool * bool * bool * bool * bool *
bool"
type_synonym b16T = "b8T * b8T"

notation (input)
False ("0\<^sub>B")
notation (input)
True ("1\<^sub>B")
abbreviation hex00 :: b8T ("00\<cdot>") where
"hex00 ==
(0\<^sub>B,0\<^sub>B,0\<^sub>B,0\<^sub>B,0\<^sub>B,0\<^sub>B,0\<^sub>B,0\<^sub>B)"
abbreviation hex55 :: b8T ("55\<cdot>") where
"hex55 ==
(0\<^sub>B,1\<^sub>B,0\<^sub>B,1\<^sub>B,0\<^sub>B,1\<^sub>B,0\<^sub>B,1\<^sub>B)"
abbreviation hexAA :: b8T ("AA\<cdot>") where
"hexAA ==
(1\<^sub>B,0\<^sub>B,1\<^sub>B,0\<^sub>B,1\<^sub>B,0\<^sub>B,1\<^sub>B,0\<^sub>B)"
abbreviation hex16 :: "b8T => b8T => b16T" where
"hex16 b1 b0 == (b1, b0)"
notation hex16 ("x\<bar>__" [1000, 1000] 1000)

(*Indexed, fixed size b16T list. The list idx won't always match the
pair idx.*)

type_synonym idxed_flist = "(nat * b16T) list * nat"

primrec idx0s_app_flist :: "(nat * b16T) list \<Rightarrow> nat
\<Rightarrow> (nat * b16T) list" where
"idx0s_app_flist nlist 0 = nlist"
|"idx0s_app_flist nlist (Suc n) = idx0s_app_flist
((n,x\<bar>00\<cdot>00\<cdot>) # nlist) n"

definition create_idxed_flist :: "nat \<Rightarrow> idxed_flist" where
"create_idxed_flist lsize = (idx0s_app_flist [] lsize, lsize)"

value "create_idxed_flist 0x10"
value "length (fst (create_idxed_flist 0x10)) = 0x10"

(The write function.)

primrec write_idxed_flist :: "idxed_flist \<Rightarrow> nat
\<Rightarrow> b16T \<Rightarrow> idxed_flist option"
where
"write_idxed_flist (flist, lsize) idx data = (
if (idx >= lsize \<or> length flist \<noteq> lsize) then None
else Some (flist[idx := (idx, data)], lsize)
)"

(Two sequential writes done manually.)

value "write_idxed_flist
(the(write_idxed_flist (create_idxed_flist 0x10) 5
x\<bar>55\<cdot>55\<cdot>))
3 x\<bar>AA\<cdot>AA\<cdot>"

(Automate execution of a list of writes, to generate a list of states.)

fun execute_list :: "(idxed_flist option) list \<Rightarrow> (nat *
b16T) list
\<Rightarrow> (idxed_flist option) list" where
"execute_list [] _ = []"
|"execute_list state_list [] = state_list"
|"execute_list [s] ((idx,data) # xs)
= execute_list ((write_idxed_flist (the s) idx data) # [s]) xs"
|"execute_list (s # ss) ((idx,data) # xs)
= execute_list ((write_idxed_flist (the s) idx data) # (s # ss)) xs"
(*Adding the third condition sped up the termination proof a lot. It
probably
needs more conditions to speed it up more.*)

(*Execute a list of three instructions. The states are shown last to
first.*)

abbreviation "three_write_list ==
[(5, x\<bar>55\<cdot>55\<cdot>), (3, x\<bar>AA\<cdot>AA\<cdot>),
(0x10, x\<bar>55\<cdot>55\<cdot>)]"

value "execute_list [Some (create_idxed_flist 0xF)] three_write_list"

(*Prove something trivial. The value of proving this wasn't trivial. It
showed
me I needed the condition "length flist \<noteq> lsize" in my
function above.*)

lemma "idx < lsize \<and> length flist = lsize
\<Longrightarrow> write_idxed_flist (flist, lsize) idx dw \<noteq>
None"
by(simp)

(******************)
end
i140618a__mem_write_execute_state_list.thy

view this post on Zulip Email Gateway (Aug 19 2022 at 14:49):

From: Ramana Kumar <rk436@cam.ac.uk>
On Tue, Jun 17, 2014 at 11:54 PM, Gottfried Barrow <igbi@gmx.com> wrote:

On 14-06-17 16:58, Gerwin Klein wrote:

most assembly formalisation don’t represent memory as an array, but as a
function from address to value (usually byte or word). With that, read is
just function application, store is function update. These are easier to
reason about than arrays.

Gerwin,

I do actually have a set of word datatypes in mind. At the moment, this is
what I'm thinking:

type_synonym b8T = "bool * bool * bool * bool * bool * bool * bool * bool"
type_synonym b16T = "b8T * b8T"
type_synonym b32T = "b16T * b16T"
type_synonym b64T = "b32T * b32T"
type_synonym b128T = "b64T * b64T"
type_synonym b256T = "b128T * b128T"

If someone showed me specifically the memory representation you're talking
about, along with some supplementary documentation and explanations, I
could probably figure out in about 15 minutes to an hour whether trying to
go that route would be within my ability.

There are existing large formalisations of x86 out there that you might

want to look at. E.g. the Cambridge model in HOL4, which is close to what
you’d do in Isabelle...

Initially, I did have the idea to try and model things at the bit and
opcode level, but I decided that's not doable. Lifetimes have an upper
limit, and it's hard to produce something as it is.

I've started to compartmentalize more between programming and proving. I
don't have to do proofs for all the functions, datatypes, and lines of
logic I define in Isabelle/HOL, though proofs are desirable. Isabelle is
useful just as a way to work with functions and datatypes at a higher
level, to experiment with ideas.

My idea now is to take a very minimal model of an Intel cpu and implement
pseudo-assembly language around that model, for the smallest number of
instructions possible.

It seems it should be very simple. I move data in and out of registers,
back and forth to memory, doing bit operations and arithmetic, and look at
flags to make decisions. As far as proving, that wouldn't be simple at all.

I have to go with what I know, and I always learn something in the process
anyway. The only mutable datatype I know about in Isabelle/HOL is a record,
so it seems a mutable array would have to be something like I defined.

I did a few searches on the HOL4 model you mentioned, but my motivation is
low to pursue that. It's got to be very complex and huge.

I recommend looking at the other (non-x86) processor models in HOL4 too.
Not all are complex and huge (though I guess that depends on your
perspective).

Thanks,
GB


Last updated: Apr 23 2024 at 20:15 UTC