@section mmo backend
The mmo object format is used exclusively together with Professor
Donald E.@: Knuth's educational 64-bit processor MMIX. The simulator
@command{mmix}, which is available at
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz},
understands this format. That package also includes a combined
assembler and linker called @command{mmixal}. The mmo format has
no advantages feature-wise compared to e.g.@: ELF. It is a simple
non-relocatable object format with no support for archives or
debugging information, except for symbol value information and
line numbers (which are not yet implemented in BFD). See
@url{http://www-cs-faculty.stanford.edu/~knuth/mmix.html} for more
information about MMIX. The ELF format is used for intermediate
object files in the BFD implementation.

@c We want to xref the symbol table node. A feature in "chew"
@c requires that "commands" do not contain spaces in the
@c arguments. Hence the hyphen in "Symbol-table".
@menu
* File layout::
* Symbol-table::
* mmo section mapping::
@end menu

@node File layout, Symbol-table, mmo, mmo
@subsection File layout
The contents of an mmo file are not partitioned into named sections as
with e.g.@: ELF. Memory areas are formed by specifying the
location of the data that follows. Only the memory area
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} is executable, so
it is used for code (and constants), and the area
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} is used for
writable data. @xref{mmo section mapping}.

There is provision for specifying ``special data'' of 65536
different types. We use type 80 (decimal), arbitrarily chosen to be
the same as the ELF @code{e_machine} number for MMIX, filling it with
section information normally found in ELF objects. @xref{mmo
section mapping}.

Contents are entered as 32-bit words, xor:ed over the previous
contents, which are always zero-initialized. A word that starts with
the byte @samp{0x98} forms a command called a @samp{lopcode}, where
the next byte distinguishes between the thirteen lopcodes. The
two remaining bytes, called the @samp{Y} and @samp{Z} fields, or
the @samp{YZ} field (a 16-bit big-endian number), are used for
various purposes that differ for each lopcode. As documented in
@url{http://www-cs-faculty.stanford.edu/~knuth/mmixal-intro.ps.gz},
the lopcodes are:

@table @code
@item lop_quote
0x98000001. The next word is contents, regardless of whether it
starts with 0x98 or not.

@item lop_loc
0x9801YYZZ, where @samp{Z} is 1 or 2. This is a location
directive, setting the location for the next data to the next
32-bit word (for @math{Z = 1}) or 64-bit word (for @math{Z = 2}),
plus @math{Y * 2^56}. Normally @samp{Y} is 0 for the text segment
and 2 for the data segment.

@item lop_skip
0x9802YYZZ. Increase the current location by @samp{YZ} bytes.

@item lop_fixo
0x9803YYZZ, where @samp{Z} is 1 or 2. Store the current location
as 64 bits into the location pointed to by the next 32-bit
(@math{Z = 1}) or 64-bit (@math{Z = 2}) word, plus @math{Y *
2^56}.

@item lop_fixr
0x9804YYZZ. @samp{YZ} is stored into the current location plus
@math{2 - 4 * YZ}.

@item lop_fixrx
0x980500ZZ. @samp{Z} is 16 or 24. A value @samp{L} derived from
the following 32-bit word is used in a manner similar to
@samp{YZ} in lop_fixr: it is xor:ed into the current location
minus @math{4 * L}. The first byte of the word is 0 or 1. If it
is 1, then @math{L = (@var{lowest 24 bits of word}) - 2^Z}; if 0,
then @math{L = (@var{lowest 24 bits of word})}.

@item lop_file
0x9806YYZZ. @samp{Y} is the file number, @samp{Z} is the count of
32-bit words. Set the file number to @samp{Y} and the line
counter to 0. The next @math{Z * 4} bytes contain the file name,
padded with zeros if the count is not a multiple of four. The
same @samp{Y} may occur multiple times, but @samp{Z} must be 0 for
all but the first occurrence.

@item lop_line
0x9807YYZZ. @samp{YZ} is the line number. Together with
lop_file, it forms the source location for the next 32-bit word.
Note that for each non-lopcode 32-bit word, line numbers are
assumed to be incremented by one.

@item lop_spec
0x9808YYZZ. @samp{YZ} is the type number. Data until the next
lopcode other than lop_quote forms special data of type @samp{YZ}.
@xref{mmo section mapping}.

Types other than 80 (or type 80 with contents that do not
parse) are stored in sections named @code{.MMIX.spec_data.@var{n}}
where @var{n} is the @samp{YZ}-type. The flags for such a
section say not to allocate or load the data. The vma is 0.
The contents of multiple occurrences of special data @var{n} are
concatenated to the data of the previous lop_spec @var{n}s. The
location in data or code at which the lop_spec occurred is lost.

@item lop_pre
0x980901ZZ. The first lopcode in a file. The @samp{Z} field forms the
length of header information in 32-bit words, where the first word
tells the time in seconds since @samp{00:00:00 GMT Jan 1 1970}.

@item lop_post
0x980a00ZZ. @math{Z > 32}. This lopcode follows after all
content-generating lopcodes in a program. The @samp{Z} field
denotes the value of @samp{rG} at the beginning of the program.
The following @math{256 - Z} big-endian 64-bit words are loaded
into global registers @samp{$G} @dots{} @samp{$255}.

@item lop_stab
0x980b0000. The next-to-last lopcode in a program. Must follow
immediately after the lop_post lopcode and its data. After this
lopcode follows all symbols in a compressed format
(@pxref{Symbol-table}).

@item lop_end
0x980cYYZZ. The last lopcode in a program. It must follow the
lop_stab lopcode and its data. The @samp{YZ} field contains the
number of 32-bit words of symbol table information after the
preceding lop_stab lopcode.
@end table
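
As a rough illustration of the word layout in the table above, the
following Python sketch splits a 32-bit word into a lopcode name and
its @samp{Y} and @samp{Z} fields. The names @code{split_tetra} and
@code{LOPCODE_NAMES} are made up for this example and are not part of
BFD or @command{mmixal}. The sketch looks at one word in isolation, so
a caller would still have to treat the word after @code{lop_quote} as
contents.

```python
# Hypothetical helper, not BFD code: classify one 32-bit mmo word.
LOP_ESCAPE = 0x98
LOPCODE_NAMES = (
    "lop_quote", "lop_loc", "lop_skip", "lop_fixo", "lop_fixr",
    "lop_fixrx", "lop_file", "lop_line", "lop_spec", "lop_pre",
    "lop_post", "lop_stab", "lop_end",
)


def split_tetra(word):
    """Return (name, y, z) for a lopcode word, or None for plain contents."""
    if (word >> 24) & 0xFF != LOP_ESCAPE:
        return None
    number = (word >> 16) & 0xFF
    if number >= len(LOPCODE_NAMES):
        raise ValueError("unknown lopcode number %d" % number)
    return LOPCODE_NAMES[number], (word >> 8) & 0xFF, word & 0xFF
```

For instance, @code{split_tetra(0x98080050)} yields the name
@code{lop_spec} with @samp{Y} equal to 0 and @samp{Z} equal to 80.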

Note that the lopcode ``fixups'', @code{lop_fixr}, @code{lop_fixrx} and
@code{lop_fixo}, are not generated by BFD, but are handled. They are
generated by @code{mmixal}.
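
The address arithmetic for the two relative fixups can be written out
as a small Python sketch. It follows only the descriptions of
@code{lop_fixr} and @code{lop_fixrx} given in the table; the function
names are hypothetical and the sketch computes addresses only, without
touching memory.

```python
# Hypothetical helpers: address arithmetic for the relative fixups.

def fixr_target(loc, yz):
    """Address written by lop_fixr: current location plus 2 - 4 * YZ."""
    return loc + 2 - 4 * yz


def fixrx_delta(z, word):
    """Derive L for lop_fixrx from its Z field and the following word."""
    if z not in (16, 24):
        raise ValueError("Z must be 16 or 24")
    first_byte = (word >> 24) & 0xFF
    low24 = word & 0xFFFFFF
    if first_byte == 0:
        return low24
    if first_byte == 1:
        return low24 - (1 << z)
    raise ValueError("first byte of the word must be 0 or 1")


def fixrx_target(loc, z, word):
    """Address patched by lop_fixrx: current location minus 4 * L."""
    return loc - 4 * fixrx_delta(z, word)
```

With @math{Z = 16} and a following word of 0x0100fffe, @samp{L} is
@math{0xfffe - 2^16 = -2}, so the patched address lies 8 bytes after
the current location.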

This trivial one-label, one-instruction file:

@example
 :Main TRAP 1,2,3
@end example

can be represented this way in mmo:

@example
 0x98090101 - lop_pre, one 32-bit word with timestamp.
 <timestamp>
 0x98010002 - lop_loc, text segment, using a 64-bit address.
              Note that mmixal does not emit this for the file above.
 0x00000000 - Address, high 32 bits.
 0x00000000 - Address, low 32 bits.
 0x98060002 - lop_file, 2 32-bit words for file-name.
 0x74657374 - "test"
 0x2e730000 - ".s\0\0"
 0x98070001 - lop_line, line 1.
 0x00010203 - TRAP 1,2,3
 0x980a00ff - lop_post, setting $255 to 0.
 0x00000000
 0x00000000
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040   @xref{Symbol-table}.
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
 0x980c0005 - lop_end; symbol table contained five 32-bit words.
@end example
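
To show how the words above turn into memory contents, here is a
minimal Python sketch of a loader. It is an illustration under stated
assumptions, not the BFD or @command{mmix} implementation: the name
@code{load_words} is hypothetical, the timestamp is a placeholder
zero, fixups and @code{lop_spec} are not handled, and reading stops at
@code{lop_post}.

```python
# Hypothetical loader sketch for simple mmo word streams.
LOP_QUOTE, LOP_LOC, LOP_SKIP = 0x00, 0x01, 0x02
LOP_FILE, LOP_LINE, LOP_PRE, LOP_POST = 0x06, 0x07, 0x09, 0x0A


def load_words(words):
    """Return a dict mapping address -> 32-bit word."""
    memory = dict()
    loc = 0
    i = 0
    while i < len(words):
        word = words[i]
        i += 1
        if word >> 24 != 0x98:
            # Plain contents: xor over the zero-initialized old value.
            memory[loc] = memory.get(loc, 0) ^ word
            loc += 4
            continue
        op, y, z = (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
        if op == LOP_QUOTE:
            # The next word is contents even if it starts with 0x98.
            memory[loc] = memory.get(loc, 0) ^ words[i]
            i += 1
            loc += 4
        elif op == LOP_LOC:
            if z == 1:
                offset = words[i]
            elif z == 2:
                offset = (words[i] << 32) | words[i + 1]
            else:
                raise ValueError("lop_loc needs Z of 1 or 2")
            i += z
            loc = (y << 56) + offset
        elif op == LOP_SKIP:
            loc += (y << 8) | z
        elif op in (LOP_FILE, LOP_PRE):
            i += z          # skip file name or header words
        elif op == LOP_LINE:
            pass            # line numbers are ignored in this sketch
        elif op == LOP_POST:
            break           # registers and symbol table follow
        else:
            raise NotImplementedError("lopcode %#x not handled" % op)
    return memory


# The trivial file from the example, with a placeholder timestamp of 0.
TRIVIAL = [
    0x98090101, 0x00000000, 0x98010002, 0x00000000, 0x00000000,
    0x98060002, 0x74657374, 0x2E730000, 0x98070001, 0x00010203,
    0x980A00FF, 0x00000000, 0x00000000,
]
memory = load_words(TRIVIAL)   # address 0 now holds 0x00010203
```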
@node Symbol-table, mmo section mapping, File layout, mmo
@subsection Symbol table format
From mmixal.w (or really, the generated mmixal.tex) in
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz}:
``Symbols are stored and retrieved by means of a @samp{ternary
search trie}, following ideas of Bentley and Sedgewick. (See
ACM--SIAM Symp.@: on Discrete Algorithms @samp{8} (1997), 360--369;
R.@:Sedgewick, @samp{Algorithms in C} (Reading, Mass.@:
Addison--Wesley, 1998), @samp{15.4}.) Each trie node stores a
character, and there are branches to subtries for the cases where
a given character is less than, equal to, or greater than the
character in the trie. There also is a pointer to a symbol table
entry if a symbol ends at the current node.''

So it's a tree encoded as a stream of bytes. The stream of bytes
acts on a single virtual global symbol, adding and removing
characters and signalling complete symbol points. Here, we read
the stream and create symbols at the completion points.

First, there's a control byte @code{m}. If any of the listed bits
in @code{m} is nonzero, we execute what stands at the right, in
the listed order:

@example
(MMO3_LEFT)
0x40 - Traverse left trie.
       (Read a new command byte and recurse.)

(MMO3_SYMBITS)
0x2f - Read the next byte as a character and store it in the
       current character position; increment character position.
       Test the bits of @code{m}:

       (MMO3_WCHAR)
       0x80 - The character is 16-bit (so read another byte,
              merge into current character).

       (MMO3_TYPEBITS)
       0xf  - We have a complete symbol; parse the type, value
              and serial number and do what should be done
              with a symbol. The type and length information
              is in j = (m & 0xf).

              (MMO3_REGQUAL_BITS)
              j == 0xf: A register variable. The following
                        byte tells which register.
              j <= 8:   An absolute symbol. Read j bytes as the
                        big-endian number the symbol equals.
                        A j = 2 with two zero bytes denotes an
                        unknown symbol.
              j > 8:    As with j <= 8, but add (0x20 << 56)
                        to the value in the following j - 8
                        bytes.

              Then comes the serial number, as a variant of
              uleb128, but better named ubeb128:
              Read bytes and shift the previous value left 7
              (multiply by 128). Add in the new byte, repeat
              until a byte has bit 7 set. The serial number
              is the computed value minus 128.

       (MMO3_MIDDLE)
       0x20 - Traverse middle trie. (Read a new command byte
              and recurse.) Decrement character position.

(MMO3_RIGHT)
0x10 - Traverse right trie. (Read a new command byte and
       recurse.)
@end example
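
The serial number encoding described above (``ubeb128'') is short
enough to write out directly. The Python sketch below is an
illustration only; the function name @code{read_serial} and its
return convention are assumptions made for this example.

```python
def read_serial(data, pos=0):
    """Decode one serial number; return (serial, position after it)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        # Shift the previous value left 7 bits and add in the new byte.
        value = (value << 7) + byte
        if byte & 0x80:
            # Bit 7 marks the last byte; it is part of the sum, so
            # 128 is subtracted at the end.
            return value - 128, pos
```

A single byte 0x81 therefore decodes to serial number 1, and the two
bytes 0x01 0x80 decode to 128.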

Let's look again at the @code{lop_stab} for the trivial file
(@pxref{File layout}).

@example
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
@end example

This forms the trivial trie (note that the path between ``:'' and
``M'' is redundant):

@example
 203a      ":"
 40       /
 40      /
 10      \
 40      /
 40     /
 204d  "M"
 2061  "a"
 2069  "i"
 016e  "n" is the last character in a full symbol, and
       with a value represented in one byte.
 00    The value is 0.
 81    The serial number is 1.
@end example
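
The control-byte rules can be tried out on these bytes with a short
recursive decoder. The Python sketch below follows the description in
this section rather than the BFD source. The name
@code{decode_symbols}, the tuple layout and the kind labels are
hypothetical, and the byte order used for 16-bit characters (first
byte high) is an assumption.

```python
def decode_symbols(data):
    """Return a list of (name, kind, value, serial) tuples."""
    symbols = []
    name = []   # characters of the single shared "virtual" symbol
    pos = 0

    def take(count=1):
        """Read count bytes as one big-endian number."""
        nonlocal pos
        chunk = bytes(data[pos:pos + count])
        if len(chunk) != count:
            raise ValueError("truncated symbol table")
        pos += count
        return int.from_bytes(chunk, "big")

    def node():
        m = take()
        if m & 0x40:                        # MMO3_LEFT
            node()
        if m & 0x2F:                        # MMO3_SYMBITS
            # MMO3_WCHAR: assumed big-endian 16-bit character.
            char = take(2) if m & 0x80 else take()
            name.append(chr(char))
            j = m & 0x0F                    # MMO3_TYPEBITS
            if j:
                if j == 0x0F:
                    kind, value = "register", take()
                elif j <= 8:
                    value = take(j)
                    unknown = j == 2 and value == 0
                    kind = "unknown" if unknown else "value"
                else:
                    kind, value = "value", (0x20 << 56) + take(j - 8)
                serial = 0
                while True:                 # ubeb128 serial number
                    byte = take()
                    serial = (serial << 7) + byte
                    if byte & 0x80:
                        break
                symbols.append(("".join(name), kind, value, serial - 128))
            if m & 0x20:                    # MMO3_MIDDLE
                node()
            name.pop()                      # decrement character position
        if m & 0x10:                        # MMO3_RIGHT
            node()

    node()
    return symbols


# The five symbol table words of the trivial file, as bytes.
STAB = bytes.fromhex("203a4040104040204d20612069016e0081000000")
symbols = decode_symbols(STAB)
```

For the bytes above, the decoder produces a single entry for
@samp{:Main} with value 0 and serial number 1; the three trailing zero
bytes are padding and are never read.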

@node mmo section mapping, , Symbol-table, mmo
@subsection mmo section mapping
The implementation in BFD uses special data type 80 (decimal) to
encapsulate and describe named sections, containing e.g.@: debug
information. If needed, any datum in the encapsulation will be
quoted using lop_quote. First comes a 32-bit word holding the
number of 32-bit words containing the zero-terminated zero-padded
section name. After the name there's a 32-bit word holding flags
describing the section type. Then comes a 64-bit big-endian word
with the section length (in bytes), then another with the section
start address. Depending on the type of section, the contents
might follow, zero-padded to a 32-bit boundary. For a loadable
section (such as data or code), the contents might follow at some
later point, not necessarily immediately, as a lop_loc with the
same start address as in the section description, followed by the
contents. This in effect forms a descriptor that must be emitted
before the actual contents. Sections described this way must not
overlap.

For areas that don't have such descriptors, synthetic sections are
formed by BFD. Consecutive contents in the two memory areas
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} and
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} are entered in
sections named @code{.text} and @code{.data} respectively. If an area
is not otherwise described, but would together with a neighboring
lower area be less than @samp{0x40000000} bytes long, it is joined
with the lower area and the gap is zero-filled. For other cases,
a new section is formed, named @code{.MMIX.sec.@var{n}}. Here,
@var{n} is a number, a running count through the mmo file,
starting at 0.
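
One possible reading of the joining rule is sketched below in Python.
This is an interpretation of the paragraph above, not a transcription
of what BFD does: the function name @code{join_areas} is hypothetical,
areas are modelled as plain (start, length) pairs, and both section
descriptors and section naming are left out.

```python
MAX_JOINED = 0x40000000   # size limit quoted in the text


def join_areas(areas):
    """Join (start, length) areas with their lower neighbor when small.

    An area is merged into the preceding one if the combined span,
    including the zero-filled gap, stays below MAX_JOINED bytes.
    """
    merged = []
    for start, length in sorted(areas):
        if merged:
            low_start, low_length = merged[-1]
            joined = start + length - low_start
            if joined < MAX_JOINED:
                merged[-1] = (low_start, max(low_length, joined))
                continue
        merged.append((start, length))
    return merged
```

Under this reading, contents at addresses 0 and 0x100 end up in one
area of 0x108 bytes, while contents at 0 and at
@samp{0x2000@dots{}00} stay separate.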

A loadable section specified as:

@example
 .section secname,"ax"
 TETRA 1,2,3,4,-1,-2009
 BYTE 80
@end example

and linked to address @samp{0x4}, is represented by the sequence:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x7365636e - "secn"
 0x616d6500 - "ame\0"
 0x00000033 - flags CODE, READONLY, LOAD, ALLOC
 0x00000000 - high 32 bits of section length
 0x0000001c - section length is 28 bytes; 6 * 4 + 1 + alignment to 32 bits
 0x00000000 - high 32 bits of section address
 0x00000004 - section address is 4
 0x98010002 - 64 bits with address of following data
 0x00000000 - high 32 bits of address
 0x00000004 - low 32 bits: data starts at address 4
 0x00000001 - 1
 0x00000002 - 2
 0x00000003 - 3
 0x00000004 - 4
 0xffffffff - -1
 0xfffff827 - -2009
 0x50000000 - 80 as a byte, padded with zeros.
@end example

Note that the lop_spec wrapping does not include the section
contents. Compare this to a non-loaded section specified as:

@example
 .section thirdsec
 TETRA 200001,100002
 BYTE 38,40
@end example

This, when linked to address @samp{0x200000000000001c}, is
represented by:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x74686972 - "thir"
 0x64736563 - "dsec"
 0x00000010 - flag READONLY
 0x00000000 - high 32 bits of section length
 0x0000000c - section length is 12 bytes; 2 * 4 + 2 + alignment to 32 bits
 0x20000000 - high 32 bits of address
 0x0000001c - low 32 bits of address 0x200000000000001c
 0x00030d41 - 200001
 0x000186a2 - 100002
 0x26280000 - 38, 40 as bytes, padded with zeros
@end example

For the latter example, the section contents must not be
loaded in memory, and are therefore specified as part of the
special data. The address is usually unimportant but might
provide information for e.g.@: the DWARF 2 debugging format.
366