@section mmo backend
The mmo object format is used exclusively together with Professor
Donald E.@: Knuth's educational 64-bit processor MMIX.  The simulator
@command{mmix}, which is available at
@url{http://mmix.cs.hm.edu/src/index.html},
understands this format.  That package also includes a combined
assembler and linker called @command{mmixal}.  The mmo format has
no advantages feature-wise compared to e.g.@: ELF.  It is a simple
non-relocatable object format with no support for archives or
debugging information, except for symbol value information and
line numbers (which are not yet implemented in BFD).  See
@url{http://mmix.cs.hm.edu/} for more
information about MMIX.  The ELF format is used for intermediate
object files in the BFD implementation.

@c We want to xref the symbol table node.  A feature in "chew"
@c requires that "commands" do not contain spaces in the
@c arguments.  Hence the hyphen in "Symbol-table".
@menu
* File layout::
* Symbol-table::
* mmo section mapping::
@end menu

@node File layout, Symbol-table, mmo, mmo
@subsection File layout
The contents of an mmo file are not partitioned into named sections as
with e.g.@: ELF.  Memory areas are formed by specifying the
location of the data that follows.  Only the memory area
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} is executable, so
it is used for code (and constants) and the area
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} is used for
writable data.  @xref{mmo section mapping}.

There is provision for specifying ``special data'' of 65536
different types.  We use type 80 (decimal), arbitrarily chosen the
same as the ELF @code{e_machine} number for MMIX, filling it with
section information normally found in ELF objects.
@xref{mmo section mapping}.

Contents are entered as 32-bit words, XORed over the previous
contents, which are always zero-initialized.
A word that starts with the
byte @samp{0x98} forms a command called a @samp{lopcode}, where
the next byte distinguishes between the thirteen lopcodes.  The
two remaining bytes, called the @samp{Y} and @samp{Z} fields, or
the @samp{YZ} field (a 16-bit big-endian number), are used for
purposes that differ between lopcodes.  As documented in
@url{http://mmix.cs.hm.edu/doc/mmixal.pdf},
the lopcodes are:

@table @code
@item lop_quote
0x98000001.  The next word is contents, regardless of whether it
starts with 0x98 or not.

@item lop_loc
0x9801YYZZ, where @samp{Z} is 1 or 2.  This is a location
directive, setting the location for the next data to the next
32-bit word (for @math{Z = 1}) or 64-bit word (for @math{Z = 2}),
plus @math{Y * 2^56}.  Normally @samp{Y} is 0 for the text segment
and 0x20 for the data segment.  Beware that the low bits of
non-tetrabyte-aligned values are silently discarded when being
automatically incremented and when storing contents (in contrast
to e.g.@: its use as current location when followed by lop_fixo
et al before the next possibly-quoted tetrabyte contents).

@item lop_skip
0x9802YYZZ.  Increase the current location by @samp{YZ} bytes.

@item lop_fixo
0x9803YYZZ, where @samp{Z} is 1 or 2.  Store the current location
as 64 bits into the location pointed to by the next 32-bit
(@math{Z = 1}) or 64-bit (@math{Z = 2}) word, plus
@math{Y * 2^56}.

@item lop_fixr
0x9804YYZZ.  @samp{YZ} is stored into the current location plus
@math{2 - 4 * YZ}.

@item lop_fixrx
0x980500ZZ.  @samp{Z} is 16 or 24.  A value @samp{L} derived from
the following 32-bit word is used in a manner similar to
@samp{YZ} in lop_fixr: it is XORed into the current location
minus @math{4 * L}.  The first byte of the word is 0 or 1.  If it
is 1, then @math{L = (@var{lowest 24 bits of word}) - 2^Z}, if 0,
then @math{L = (@var{lowest 24 bits of word})}.

@item lop_file
0x9806YYZZ.  @samp{Y} is the file number, @samp{Z} is the count of
32-bit words.  Set the file number to @samp{Y} and the line
counter to 0.  The next @math{Z * 4} bytes contain the file name,
padded with zeros if its length is not a multiple of four.  The
same @samp{Y} may occur multiple times, but @samp{Z} must be 0 for
all but the first occurrence.

@item lop_line
0x9807YYZZ.  @samp{YZ} is the line number.  Together with
lop_file, it forms the source location for the next 32-bit word.
Note that for each non-lopcode 32-bit word, line numbers are
assumed incremented by one.

@item lop_spec
0x9808YYZZ.  @samp{YZ} is the type number.  Data until the next
lopcode other than lop_quote forms special data of type @samp{YZ}.
@xref{mmo section mapping}.

Types other than 80 (or type 80 with contents that do not
parse) are stored in sections named @code{.MMIX.spec_data.@var{n}}
where @var{n} is the @samp{YZ}-type.  The flags for such
sections say not to allocate or load the data.  The vma is 0.
The contents of multiple occurrences of special data @var{n} are
concatenated to the data of the previous lop_spec @var{n}s.  The
location in data or code at which the lop_spec occurred is lost.

@item lop_pre
0x980901ZZ.  The first lopcode in a file.  The @samp{Z} field gives the
length of header information in 32-bit words, where the first word
tells the time in seconds since @samp{00:00:00 GMT Jan 1 1970}.

@item lop_post
0x980a00ZZ.  @math{Z > 32}.  This lopcode follows all
content-generating lopcodes in a program.  The @samp{Z} field
denotes the value of @samp{rG} at the beginning of the program.
The following @math{256 - Z} big-endian 64-bit words are loaded
into global registers @samp{$G} @dots{} @samp{$255}.

@item lop_stab
0x980b0000.  The next-to-last lopcode in a program.
It must follow
immediately after the lop_post lopcode and its data.  All symbols
follow this lopcode in a compressed format
(@pxref{Symbol-table}).

@item lop_end
0x980cYYZZ.  The last lopcode in a program.  It must follow the
lop_stab lopcode and its data.  The @samp{YZ} field contains the
number of 32-bit words of symbol table information after the
preceding lop_stab lopcode.
@end table

Note that the ``fixup'' lopcodes @code{lop_fixr}, @code{lop_fixrx} and
@code{lop_fixo} are not generated by BFD, but are handled by it.  They are
generated by @code{mmixal}.

This trivial one-label, one-instruction file:

@example
 :Main TRAP 1,2,3
@end example

can be represented this way in mmo:

@example
 0x98090101 - lop_pre, one 32-bit word with timestamp.
 <timestamp>
 0x98010002 - lop_loc, text segment, using a 64-bit address.
              Note that mmixal does not emit this for the file above.
 0x00000000 - Address, high 32 bits.
 0x00000000 - Address, low 32 bits.
 0x98060002 - lop_file, 2 32-bit words for file-name.
 0x74657374 - "test"
 0x2e730000 - ".s\0\0"
 0x98070001 - lop_line, line 1.
 0x00010203 - TRAP 1,2,3
 0x980a00ff - lop_post, setting $255 to 0.
 0x00000000
 0x00000000
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040   @xref{Symbol-table}.
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
 0x980c0005 - lop_end; symbol table contained five 32-bit words.
@end example

@node Symbol-table, mmo section mapping, File layout, mmo
@subsection Symbol table format
From mmixal.w (or really, the generated mmixal.tex) in the
MMIXware package, which also contains the @command{mmix} simulator:
``Symbols are stored and retrieved by means of a @samp{ternary
search trie}, following ideas of Bentley and Sedgewick.
(See
ACM--SIAM Symp.@: on Discrete Algorithms @samp{8} (1997), 360--369;
R.@: Sedgewick, @samp{Algorithms in C} (Reading, Mass.@:
Addison--Wesley, 1998), @samp{15.4}.)  Each trie node stores a
character, and there are branches to subtries for the cases where
a given character is less than, equal to, or greater than the
character in the trie.  There also is a pointer to a symbol table
entry if a symbol ends at the current node.''

So it's a tree encoded as a stream of bytes.  The stream of bytes
acts on a single virtual global symbol, adding and removing
characters and signalling complete symbol points.  Here, we read
the stream and create symbols at the completion points.

First, there's a control byte @code{m}.  If any of the listed bits
in @code{m} is nonzero, we execute what stands at the right, in
the listed order:

@example
(MMO3_LEFT)
0x40 - Traverse left trie.
       (Read a new command byte and recurse.)

(MMO3_SYMBITS)
0x2f - Read the next byte as a character and store it in the
       current character position; increment character position.
       Test the bits of @code{m}:

       (MMO3_WCHAR)
       0x80 - The character is 16-bit (so read another byte,
              merge into current character).

       (MMO3_TYPEBITS)
       0xf  - We have a complete symbol; parse the type, value
              and serial number and do what should be done
              with a symbol.  The type and length information
              is in j = (m & 0xf).

              (MMO3_REGQUAL_BITS)
              j == 0xf: A register variable.  The following
                        byte tells which register.
              j <= 8:   An absolute symbol.  Read j bytes as the
                        big-endian number the symbol equals.
                        A j = 2 with two zero bytes denotes an
                        unknown symbol.
              j > 8:    As with j <= 8, but add (0x20 << 56)
                        to the value in the following j - 8
                        bytes.

              Then comes the serial number, as a variant of
              uleb128, but better named ubeb128:
              Read bytes and shift the previous value left 7
              (multiply by 128).  Add in the new byte, repeat
              until a byte has bit 7 set.  The serial number
              is the computed value minus 128.

       (MMO3_MIDDLE)
       0x20 - Traverse middle trie.  (Read a new command byte
              and recurse.)  Decrement character position.

(MMO3_RIGHT)
0x10 - Traverse right trie.  (Read a new command byte and
       recurse.)
@end example

Let's look again at the @code{lop_stab} for the trivial file
(@pxref{File layout}).

@example
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
@end example

This forms the trivial trie (note that the path between ``:'' and
``M'' is redundant):

@example
 203a      ":"
 40       /
 40      /
 10      \
 40       /
 40      /
 204d    "M"
 2061    "a"
 2069    "i"
 016e    "n" is the last character in a full symbol, and
         with a value represented in one byte.
 00      The value is 0.
 81      The serial number is 1.
@end example

@node mmo section mapping, , Symbol-table, mmo
@subsection mmo section mapping
The implementation in BFD uses special data type 80 (decimal) to
encapsulate and describe named sections, containing e.g.@: debug
information.  If needed, any datum in the encapsulation will be
quoted using lop_quote.  First comes a 32-bit word holding the
number of 32-bit words containing the zero-terminated zero-padded
section name.  After the name there's a 32-bit word holding flags
describing the section type.  Then comes a 64-bit big-endian word
with the section length (in bytes), then another with the section
start address.  Depending on the type of section, the contents
might follow, zero-padded to a 32-bit boundary.
For a loadable
section (such as data or code), the contents might follow at some
later point, not necessarily immediately, as a lop_loc with the
same start address as in the section description, followed by the
contents.  This in effect forms a descriptor that must be emitted
before the actual contents.  Sections described this way must not
overlap.

For areas that don't have such descriptors, synthetic sections are
formed by BFD.  Consecutive contents in the two memory areas
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} and
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} are entered in
sections named @code{.text} and @code{.data} respectively.  If an area
is not otherwise described, but would together with a neighboring
lower area be less than @samp{0x40000000} bytes long, it is joined
with the lower area and the gap is zero-filled.  For other cases,
a new section is formed, named @code{.MMIX.sec.@var{n}}.  Here,
@var{n} is a number, a running count through the mmo file,
starting at 0.
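
As an illustration only, the naming rules for synthetic sections can be
sketched in a few lines of Python.  The function names are hypothetical
and do not correspond to anything in BFD.  The way the size limit is
measured (from the start of the lower area to the first address past the
new area) is one reading of the paragraph above, not a statement about
the implementation.

```python
# Sketch only: hypothetical helpers, not BFD functions.

TEXT_LO, TEXT_HI = 0x0000000000000000, 0x01FFFFFFFFFFFFFF
DATA_LO, DATA_HI = 0x2000000000000000, 0x20FFFFFFFFFFFFFF
JOIN_LIMIT = 0x40000000  # size limit quoted in the text


def synthetic_section_name(address, running_count):
    """Name for contents at `address` that no descriptor covers."""
    if TEXT_LO <= address <= TEXT_HI:
        return ".text"
    if DATA_LO <= address <= DATA_HI:
        return ".data"
    # Anything else gets a numbered section; the caller keeps the count.
    return ".MMIX.sec.%d" % running_count


def joins_lower_area(lower_start, area_end):
    """True if the area, merged with its lower neighbor, stays under the limit.

    `area_end` is taken to be the first address past the new area (an
    assumption of this sketch); the gap would then be zero-filled.
    """
    return area_end - lower_start < JOIN_LIMIT
```

With these definitions, an address of @samp{0x100} maps to @code{.text},
and an address in neither of the two areas maps to a numbered
@code{.MMIX.sec.@var{n}} section.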

A loadable section specified as:

@example
 .section secname,"ax"
 TETRA 1,2,3,4,-1,-2009
 BYTE 80
@end example

and linked to address @samp{0x4}, is represented by the sequence:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x7365636e - "secn"
 0x616d6500 - "ame\0"
 0x00000033 - flags CODE, READONLY, LOAD, ALLOC
 0x00000000 - high 32 bits of section length
 0x0000001c - section length is 28 bytes; 6 * 4 + 1 + alignment to 32 bits
 0x00000000 - high 32 bits of section address
 0x00000004 - section address is 4
 0x98010002 - 64 bits with address of following data
 0x00000000 - high 32 bits of address
 0x00000004 - low 32 bits: data starts at address 4
 0x00000001 - 1
 0x00000002 - 2
 0x00000003 - 3
 0x00000004 - 4
 0xffffffff - -1
 0xfffff827 - -2009
 0x50000000 - 80 as a byte, padded with zeros.
@end example

Note that the lop_spec wrapping does not include the section
contents.  Compare this to a non-loaded section specified as:

@example
 .section thirdsec
 TETRA 200001,100002
 BYTE 38,40
@end example

This, when linked to address @samp{0x200000000000001c}, is
represented by:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x74686972 - "thir"
 0x64736563 - "dsec"
 0x00000010 - flag READONLY
 0x00000000 - high 32 bits of section length
 0x0000000c - section length is 12 bytes; 2 * 4 + 2 + alignment to 32 bits
 0x20000000 - high 32 bits of address
 0x0000001c - low 32 bits of address 0x200000000000001c
 0x00030d41 - 200001
 0x000186a2 - 100002
 0x26280000 - 38, 40 as bytes, padded with zeros
@end example

For the latter example, the section contents must not be
loaded in memory, and are therefore specified as part of the
special data.
The address is usually unimportant but might
provide information for e.g.@: the DWARF 2 debugging format.
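
The descriptor layout used in this subsection can be decoded with a
short Python sketch.  This is not BFD code: @code{unquote} and
@code{parse_section_descriptor} are hypothetical names, and the input is
assumed to be the list of 32-bit words that follow a lop_spec 80 word,
up to (but not including) the next lopcode other than lop_quote.

```python
# Sketch only: hypothetical helpers, not BFD functions.

LOP_QUOTE = 0x98000001


def unquote(words):
    """Drop lop_quote words, keeping the word each one protects."""
    out = []
    it = iter(words)
    for w in it:
        if w == LOP_QUOTE:
            w = next(it)  # the quoted word is taken literally
        out.append(w)
    return out


def parse_section_descriptor(words):
    """Decode the special-data words of a type 80 section descriptor.

    Returns (name, flags, length, address, trailing_words), where
    trailing_words holds any contents stored inside the special data.
    """
    words = unquote(words)
    name_words = words[0]
    raw = b"".join(w.to_bytes(4, "big") for w in words[1:1 + name_words])
    name = raw.rstrip(b"\0").decode("ascii")  # strip the zero padding
    pos = 1 + name_words
    flags = words[pos]
    length = (words[pos + 1] << 32) | words[pos + 2]   # 64-bit big-endian
    address = (words[pos + 3] << 32) | words[pos + 4]  # 64-bit big-endian
    return name, flags, length, address, words[pos + 5:]


# The eight descriptor words of the "secname" example.
desc = [0x00000002, 0x7365636E, 0x616D6500, 0x00000033,
        0x00000000, 0x0000001C, 0x00000000, 0x00000004]
name, flags, length, address, rest = parse_section_descriptor(desc)
```

For the @code{secname} descriptor this yields the name @code{secname},
flags 0x33, length 28 and address 4, with no trailing words, because the
contents of a loadable section follow separately after a lop_loc.  For a
non-loaded section the trailing words are the section contents.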