@section mmo backend
The mmo object format is used exclusively together with Professor
Donald E.@: Knuth's educational 64-bit processor MMIX.  The simulator
@command{mmix}, which is available at
@url{http://mmix.cs.hm.edu/src/index.html},
understands this format.  That package also includes a combined
assembler and linker called @command{mmixal}.  The mmo format has
no advantages feature-wise compared to e.g.@: ELF.  It is a simple
non-relocatable object format with no support for archives or
debugging information, except for symbol value information and
line numbers (the latter are not yet implemented in BFD).  See
@url{http://mmix.cs.hm.edu/} for more
information about MMIX.  The ELF format is used for intermediate
object files in the BFD implementation.

@c We want to xref the symbol table node.  A feature in "chew"
@c requires that "commands" do not contain spaces in the
@c arguments.  Hence the hyphen in "Symbol-table".
@menu
* File layout::
* Symbol-table::
* mmo section mapping::
@end menu

@node File layout, Symbol-table, mmo, mmo
@subsection File layout
The contents of an mmo file are not partitioned into named sections
as with e.g.@: ELF.  Instead, memory areas are formed by specifying the
location of the data that follows.  Only the memory area
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} is executable, so
it is used for code (and constants), while the area
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} is used for
writable data.  @xref{mmo section mapping}.

There is provision for specifying ``special data'' of 65536
different types.  We use type 80 (decimal), arbitrarily chosen to be
the same as the ELF @code{e_machine} number for MMIX, and fill it with
section information normally found in ELF objects.  @xref{mmo
section mapping}.

Contents are entered as 32-bit words, xor:ed over the previous
contents, which are always zero-initialized.  A word that starts with
the byte @samp{0x98} forms a command called a @samp{lopcode}, where
the next byte distinguishes between the thirteen lopcodes.  The
two remaining bytes, called the @samp{Y} and @samp{Z} fields, or
together the @samp{YZ} field (a 16-bit big-endian number), serve
different purposes for each lopcode.  As documented in
@url{http://mmix.cs.hm.edu/doc/mmixal.pdf},
the lopcodes are:

@table @code
@item lop_quote
0x98000001.  The next word is contents, regardless of whether it
starts with 0x98 or not.

@item lop_loc
0x9801YYZZ, where @samp{Z} is 1 or 2.  This is a location
directive, setting the location for the next data to the next
32-bit word (for @math{Z = 1}) or 64-bit word (for @math{Z = 2}),
plus @math{Y * 2^56}.  Normally @samp{Y} is 0 for the text segment
and 2 for the data segment.  Beware that the low bits of
non-tetrabyte-aligned values are silently discarded when the location
is automatically incremented and when contents are stored (in contrast
to, e.g., the value's use as the current location by a lop_fixo
et al.@: that follows before the next possibly-quoted tetrabyte of
contents).

@item lop_skip
0x9802YYZZ.  Increase the current location by @samp{YZ} bytes.

@item lop_fixo
0x9803YYZZ, where @samp{Z} is 1 or 2.  Store the current location
as 64 bits into the location pointed to by the next 32-bit
(@math{Z = 1}) or 64-bit (@math{Z = 2}) word, plus @math{Y * 2^56}.

@item lop_fixr
0x9804YYZZ.  @samp{YZ} is stored into the current location plus
@math{2 - 4 * YZ}.

@item lop_fixrx
0x980500ZZ.  @samp{Z} is 16 or 24.  A value @samp{L} derived from
the following 32-bit word is used in a manner similar to
@samp{YZ} in lop_fixr: it is xor:ed into the current location
minus @math{4 * L}.  The first byte of the word is 0 or 1.  If it
is 1, then @math{L = (@var{lowest 24 bits of word}) - 2^Z}; if 0,
then @math{L = (@var{lowest 24 bits of word})}.

@item lop_file
0x9806YYZZ.  @samp{Y} is the file number, @samp{Z} is the count of
32-bit words.  Set the file number to @samp{Y} and the line
counter to 0.  The next @math{Z * 4} bytes contain the file name,
padded with zeros if its length is not a multiple of four.  The
same @samp{Y} may occur multiple times, but @samp{Z} must be 0 for
all but the first occurrence.

@item lop_line
0x9807YYZZ.  @samp{YZ} is the line number.  Together with
lop_file, it forms the source location for the next 32-bit word.
Note that for each non-lopcode 32-bit word, the line number is
assumed to be incremented by one.

@item lop_spec
0x9808YYZZ.  @samp{YZ} is the type number.  Data until the next
lopcode other than lop_quote forms special data of type @samp{YZ}.
@xref{mmo section mapping}.

Special data of types other than 80 (or of type 80 with contents that
do not parse) is stored in sections named @code{.MMIX.spec_data.@var{n}},
where @var{n} is the @samp{YZ}-type.  The flags for such a
section say not to allocate or load the data.  The vma is 0.
Contents of multiple occurrences of special data @var{n} are
concatenated to the data of the previous lop_spec @var{n}s.  The
location in data or code at which the lop_spec occurred is lost.

@item lop_pre
0x980901ZZ.  The first lopcode in a file.  The @samp{Z} field gives the
length of the header information in 32-bit words, where the first word
is the time in seconds since @samp{00:00:00 GMT Jan 1 1970}.

@item lop_post
0x980a00ZZ, where @math{Z >= 32}.  This lopcode follows all
content-generating lopcodes in a program.  The @samp{Z} field
denotes the value of @samp{rG} at the beginning of the program.
The following @math{256 - Z} big-endian 64-bit words are loaded
into global registers @samp{$G} @dots{} @samp{$255}.

@item lop_stab
0x980b0000.  The next-to-last lopcode in a program.  It must follow
immediately after the lop_post lopcode and its data.  This lopcode
is followed by all symbols in a compressed format
(@pxref{Symbol-table}).

@item lop_end
0x980cYYZZ.  The last lopcode in a program.  It must follow the
lop_stab lopcode and its data.  The @samp{YZ} field contains the
number of 32-bit words of symbol table information after the
preceding lop_stab lopcode.
@end table

Note that the ``fixup'' lopcodes @code{lop_fixr}, @code{lop_fixrx} and
@code{lop_fixo} are not generated by BFD, but are handled when an mmo
file is read.  They are generated by @code{mmixal}.
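
The address arithmetic of the fixup lopcodes, as described in the table above, can be sketched in C as follows.  The helpers are hypothetical and compute only where a fixup applies, not the value that is stored or xor:ed there; the location arithmetic is assumed to wrap modulo @math{2^64}.

```c
#include <stdint.h>

/* Hypothetical helpers following the lop_fixr/lop_fixrx descriptions;
   they only compute the address a fixup applies to. */

/* lop_fixr (0x9804YYZZ): YZ goes into current location + 2 - 4 * YZ. */
uint64_t mmo_fixr_target(uint64_t loc, unsigned yz)
{
  return loc + 2 - 4 * (uint64_t) yz;
}

/* lop_fixrx (0x980500ZZ, Z = 16 or 24): derive L from the following word.
   Returns 0 on success, -1 if Z or the first byte of the word is invalid. */
int mmo_fixrx_delta(uint32_t word, unsigned z, int64_t *l)
{
  int64_t low24 = (int64_t) (word & 0xffffff);

  if (z != 16 && z != 24)
    return -1;
  switch (word >> 24) {
  case 0:
    *l = low24;
    return 0;
  case 1:
    *l = low24 - ((int64_t) 1 << z);   /* L may be negative */
    return 0;
  default:
    return -1;
  }
}

/* Address used by lop_fixrx: current location minus 4 * L, mod 2^64. */
uint64_t mmo_fixrx_target(uint64_t loc, int64_t l)
{
  return loc - 4 * (uint64_t) l;
}
```

With the current location at 0x100, lop_fixr with @samp{YZ} = 3 targets 0xf6, while a lop_fixrx word of 0x0100fffe with @math{Z = 16} gives @math{L = -2} and the target 0x108.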

This trivial one-label, one-instruction file:

@example
 :Main TRAP 1,2,3
@end example

can be represented this way in mmo:

@example
 0x98090101 - lop_pre, one 32-bit word with timestamp.
 <timestamp>
 0x98010002 - lop_loc, text segment, using a 64-bit address.
              Note that mmixal does not emit this for the file above.
 0x00000000 - Address, high 32 bits.
 0x00000000 - Address, low 32 bits.
 0x98060002 - lop_file, 2 32-bit words for file-name.
 0x74657374 - "test"
 0x2e730000 - ".s\0\0"
 0x98070001 - lop_line, line 1.
 0x00010203 - TRAP 1,2,3
 0x980a00ff - lop_post, setting $255 to 0.
 0x00000000
 0x00000000
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040   @xref{Symbol-table}.
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
 0x980c0005 - lop_end; symbol table contained five 32-bit words.
@end example
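
The ordering rules of the table (lop_pre first, lop_stab after lop_post, lop_end last with a word count) can be checked with a short walk over such a word stream.  This is a hypothetical C sketch, assuming host-order words and handling only lopcodes whose operand length is known up front; it is not how BFD reads mmo files.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structural check of an mmo word stream (host-order words).
   Walks from lop_pre to lop_stab, skipping the operand words of each
   lopcode, then checks that the last word is a lop_end whose YZ field
   equals the number of symbol-table words in between.  Returns the number
   of contents words seen before lop_stab, or -1 on error.  lop_spec is not
   handled, so files with special data are rejected. */
long mmo_check_layout(const uint32_t *w, size_t n)
{
  long contents = 0;
  size_t i = 0;

  if (n < 2 || (w[0] >> 16) != 0x9809)   /* must start with lop_pre */
    return -1;
  while (i < n) {
    uint32_t x = w[i];
    unsigned op = (x >> 16) & 0xff, z = x & 0xff;

    if ((x >> 24) != 0x98) {             /* plain contents word */
      contents++;
      i++;
      continue;
    }
    switch (op) {
    case 0x00: contents++; i += 2; break;              /* lop_quote + word */
    case 0x01: case 0x03:                              /* lop_loc, lop_fixo */
    case 0x06: case 0x09: i += 1 + z; break;           /* lop_file, lop_pre */
    case 0x02: case 0x04: case 0x07: i++; break;       /* skip, fixr, line */
    case 0x05: i += 2; break;                          /* lop_fixrx + word */
    case 0x0a: i += 1 + 2 * (size_t) (256 - z); break; /* lop_post + octas */
    case 0x0b: {                                       /* lop_stab */
      uint32_t end = w[n - 1];
      if ((end >> 16) != 0x980c || (end & 0xffff) != n - i - 2)
        return -1;
      return contents;
    }
    default:
      return -1;                         /* lop_spec, early lop_end, ... */
    }
  }
  return -1;                             /* ran off the end without lop_stab */
}
```

Applied to the twenty words of the example above (with any timestamp value), it finds a single contents word, the TRAP instruction, and accepts the lop_end count of five.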
@node Symbol-table, mmo section mapping, File layout, mmo
@subsection Symbol table format
From mmixal.w (or really, the generated mmixal.tex) in the
MMIXware package, which also contains the @command{mmix} simulator:
``Symbols are stored and retrieved by means of a @samp{ternary
search trie}, following ideas of Bentley and Sedgewick. (See
ACM--SIAM Symp.@: on Discrete Algorithms @samp{8} (1997), 360--369;
R.@: Sedgewick, @samp{Algorithms in C} (Reading, Mass.@:
Addison--Wesley, 1998), @samp{15.4}.)  Each trie node stores a
character, and there are branches to subtries for the cases where
a given character is less than, equal to, or greater than the
character in the trie.  There also is a pointer to a symbol table
entry if a symbol ends at the current node.''

So it's a tree encoded as a stream of bytes.  The stream of bytes
acts on a single virtual global symbol, adding and removing
characters and signalling complete symbol points.  Here, we read
the stream and create symbols at the completion points.

First, there's a control byte @code{m}.  For each listed bit mask
that has any of its bits set in @code{m}, we execute what stands at
its right, in the listed order:

@example
 (MMO3_LEFT)
 0x40 - Traverse left trie.
        (Read a new command byte and recurse.)

 (MMO3_SYMBITS)
 0x2f - Read the next byte as a character and store it in the
        current character position; increment character position.
        Test the bits of @code{m}:

        (MMO3_WCHAR)
        0x80 - The character is 16-bit (so read another byte and
               merge it into the current character).

        (MMO3_TYPEBITS)
        0xf  - We have a complete symbol; parse the type, value
               and serial number and do what should be done
               with a symbol.  The type and length information
               is in j = (m & 0xf).

               (MMO3_REGQUAL_BITS)
               j == 0xf: A register variable.  The following
                         byte tells which register.
               j <= 8:   An absolute symbol.  Read j bytes as the
                         big-endian number the symbol equals.
                         A j = 2 with two zero bytes denotes an
                         unknown symbol.
               j > 8:    As with j <= 8, but add (0x20 << 56)
                         to the value in the following j - 8
                         bytes.

               Then comes the serial number, as a variant of
               uleb128, but better named ubeb128:
               Read bytes and shift the previous value left 7
               (multiply by 128).  Add in the new byte, repeat
               until a byte has bit 7 set.  The serial number
               is the computed value minus 128.

        (MMO3_MIDDLE)
        0x20 - Traverse middle trie.  (Read a new command byte
               and recurse.)  Decrement character position.

 (MMO3_RIGHT)
 0x10 - Traverse right trie.  (Read a new command byte and
        recurse.)
@end example
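
The control-byte rules above translate into a small recursive decoder.  The C sketch below is a hypothetical illustration, not BFD's own reader: it assumes the symbol-table bytes following lop_stab are available as a plain byte buffer, rejects 16-bit characters, and does not single out the @math{j = 2} ``unknown symbol'' case.  It advances the character position only while descending a middle link.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lop_stab decoder following the control-byte description
   above; 16-bit characters (MMO3_WCHAR) are rejected. */
#define MMO3_WCHAR    0x80
#define MMO3_LEFT     0x40
#define MMO3_MIDDLE   0x20
#define MMO3_RIGHT    0x10
#define MMO3_TYPEBITS 0x0f
#define MMO3_SYMBITS  0x2f

struct mmo_sym {
  char name[256];
  uint64_t value;          /* register number if is_reg */
  int is_reg;
  unsigned long serial;
};

struct stab_reader {
  const uint8_t *p, *end;
  char name[256];          /* the single "virtual global symbol" */
  size_t pos;              /* current character position */
  struct mmo_sym *out;
  size_t max, count;
  int error;
};

static unsigned get_byte(struct stab_reader *r)
{
  if (r->p == r->end) { r->error = 1; return 0; }
  return *r->p++;
}

/* A symbol ends here: J = m & 0xf selects register (0xf) or value size. */
static void read_symbol_data(struct stab_reader *r, unsigned j)
{
  uint64_t value = 0;
  unsigned long serial = 0;
  unsigned k, is_reg = (j == 0xf);

  if (is_reg)
    value = get_byte(r);                        /* register number */
  else {
    for (k = 0; k < (j <= 8 ? j : j - 8); k++)  /* big-endian value */
      value = (value << 8) | get_byte(r);
    if (j > 8)
      value += (uint64_t) 0x20 << 56;
  }
  do {                                          /* "ubeb128" serial */
    k = get_byte(r);
    serial = (serial << 7) + k;
  } while (k < 128 && !r->error);
  if (r->error || r->count == r->max) { r->error = 1; return; }
  strcpy(r->out[r->count].name, r->name);
  r->out[r->count].value = value;
  r->out[r->count].is_reg = (int) is_reg;
  r->out[r->count].serial = serial - 128;
  r->count++;
}

static void get_symbols(struct stab_reader *r)
{
  unsigned m = get_byte(r);

  if (!r->error && (m & MMO3_LEFT))
    get_symbols(r);
  if (!r->error && (m & MMO3_SYMBITS)) {
    unsigned c = get_byte(r);
    if (r->error || (m & MMO3_WCHAR) || r->pos + 1 >= sizeof r->name) {
      r->error = 1;
      return;
    }
    r->name[r->pos] = (char) c;     /* symbol so far: name[0..pos] */
    r->name[r->pos + 1] = '\0';
    if (m & MMO3_TYPEBITS)
      read_symbol_data(r, m & MMO3_TYPEBITS);
    if (!r->error && (m & MMO3_MIDDLE)) {
      r->pos++;                     /* next character position */
      get_symbols(r);
      r->pos--;
    }
  }
  if (!r->error && (m & MMO3_RIGHT))
    get_symbols(r);
}

/* Returns the number of symbols stored in OUT, or -1 on error.
   Zero padding after the trie is not read. */
long mmo_decode_stab(const uint8_t *buf, size_t len,
                     struct mmo_sym *out, size_t max)
{
  struct stab_reader r = { .p = buf, .end = buf + len, .out = out, .max = max };

  get_symbols(&r);
  return r.error ? -1 : (long) r.count;
}
```

Fed the twenty bytes of lop_stab data from the trivial example in @ref{File layout} (0x20, 0x3a, 0x40, @dots{}, 0x81 followed by three zero padding bytes), it reports one symbol, @samp{:Main}, with value 0 and serial number 1.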

Let's look again at the @code{lop_stab} for the trivial file
(@pxref{File layout}).

@example
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
@end example

This forms the trivial trie (note that the path between ``:'' and
``M'' is redundant):

@example
 203a     ":"
 40       /
 40      /
 10      \
 40      /
 40     /
 204d  "M"
 2061  "a"
 2069  "i"
 016e  "n" is the last character in a full symbol, and
       with a value represented in one byte.
 00    The value is 0.
 81    The serial number is 1.
@end example
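
Since the path between ``:'' and ``M'' is redundant, a single symbol can also be encoded as a plain chain of middle links.  The C function below is a hypothetical sketch of such a minimal encoding, not the encoder of @command{mmixal} or BFD: it handles one absolute symbol with 8-bit characters, stores the value in the fewest big-endian bytes (so @math{j <= 8}), and writes the serial number in the ubeb128 form described above.  It follows the control-byte rules but has not been checked against @command{mmixal} output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder for a trie holding a single absolute symbol:
   a chain of middle links (control byte 0x20) with no left/right
   branches.  The value is written in the fewest big-endian bytes (at
   least one), so j <= 8; register and j > 8 symbols are not handled.
   Returns the number of bytes written, or 0 on error. */
size_t mmo_encode_single_symbol(const char *name, uint64_t value,
                                unsigned long serial,
                                uint8_t *out, size_t cap)
{
  size_t len = strlen(name), n = 0, i, nser = 0;
  unsigned j = 1, k;
  uint8_t ser[10];

  if (len == 0)
    return 0;
  while (j < 8 && (value >> (8 * j)) != 0)
    j++;
  do {                      /* 7-bit groups, least significant first */
    ser[nser++] = serial & 0x7f;
    serial >>= 7;
  } while (serial != 0);
  if (2 * len + j + nser > cap)
    return 0;
  for (i = 0; i < len; i++) {
    /* all but the last character: middle link; last: type bits j */
    out[n++] = (i + 1 < len) ? 0x20 : (uint8_t) j;
    out[n++] = (uint8_t) name[i];
  }
  for (k = j; k-- > 0;)     /* big-endian value bytes */
    out[n++] = (uint8_t) (value >> (8 * k));
  while (nser-- > 1)        /* serial, most significant group first */
    out[n++] = ser[nser];
  out[n++] = ser[0] | 0x80; /* bit 7 marks the final byte */
  return n;
}
```

For @samp{:Main} with value 0 and serial number 1 it produces the twelve bytes 0x20 0x3a 0x20 0x4d 0x20 0x61 0x20 0x69 0x01 0x6e 0x00 0x81, i.e.@: the trie above without the five redundant path bytes.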

@node mmo section mapping, , Symbol-table, mmo
@subsection mmo section mapping
The implementation in BFD uses special data type 80 (decimal) to
encapsulate and describe named sections, containing e.g.@: debug
information.  If needed, any datum in the encapsulation will be
quoted using lop_quote.  First comes a 32-bit word holding the
number of 32-bit words that contain the section name, zero-padded to
a multiple of four bytes.  After the name there's a 32-bit word holding
flags describing the section type.  Then comes a 64-bit big-endian word
with the section length (in bytes), then another with the section
start address.  Depending on the type of section, the contents
might follow, zero-padded to a 32-bit boundary.  For a loadable
section (such as data or code), the contents might follow at some
later point, not necessarily immediately, as a lop_loc with the
same start address as in the section description, followed by the
contents.  This in effect forms a descriptor that must be emitted
before the actual contents.  Sections described this way must not
overlap.
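
The descriptor layout can be sketched in C as below.  This is a hypothetical illustration, not BFD's writer: it produces only the lop_spec 80 header words (no contents), takes the section flags as a raw number such as 0x33 from the first example below, and quotes every datum whose first byte is 0x98.  Using @math{(strlen + 3) / 4} words for the name is an assumption that agrees with the word counts in both examples below.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append W to OUT, preceded by lop_quote if its first byte is 0x98. */
static int put_word(uint32_t *out, size_t cap, size_t *n, uint32_t w)
{
  if ((w >> 24) == 0x98) {
    if (*n >= cap)
      return -1;
    out[(*n)++] = 0x98000001;          /* lop_quote */
  }
  if (*n >= cap)
    return -1;
  out[(*n)++] = w;
  return 0;
}

/* Hypothetical sketch: build the lop_spec 80 section descriptor as
   host-order 32-bit words.  The name occupies (strlen + 3) / 4 words,
   zero-padded.  Returns the number of words written, or 0 if OUT is
   too small. */
size_t mmo_section_descriptor(const char *name, uint32_t flags,
                              uint64_t size, uint64_t vma,
                              uint32_t *out, size_t cap)
{
  size_t len = strlen(name), nwords = (len + 3) / 4, n = 0, i, b;
  int err = 0;

  if (cap == 0)
    return 0;
  out[n++] = 0x98080050;               /* lop_spec 80 itself is not quoted */
  err |= put_word(out, cap, &n, (uint32_t) nwords);
  for (i = 0; i < nwords; i++) {
    uint32_t w = 0;
    for (b = 0; b < 4; b++) {          /* big-endian, zero-padded */
      size_t idx = 4 * i + b;
      w = (w << 8) | (idx < len ? (uint8_t) name[idx] : 0u);
    }
    err |= put_word(out, cap, &n, w);
  }
  err |= put_word(out, cap, &n, flags);
  err |= put_word(out, cap, &n, (uint32_t) (size >> 32));
  err |= put_word(out, cap, &n, (uint32_t) size);
  err |= put_word(out, cap, &n, (uint32_t) (vma >> 32));
  err |= put_word(out, cap, &n, (uint32_t) vma);
  return err ? 0 : n;
}
```

Called with the name @samp{secname}, flags 0x33, length 28 and address 4, it yields the nine words from 0x98080050 to 0x00000004 that open the first example below.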

For areas that don't have such descriptors, synthetic sections are
formed by BFD.  Consecutive contents in the two memory areas
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} and
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} are entered in
sections named @code{.text} and @code{.data} respectively.  If an area
is not otherwise described, but would, together with a neighboring
lower area, be less than @samp{0x40000000} bytes long, it is joined
with the lower area and the gap is zero-filled.  For other cases,
a new section is formed, named @code{.MMIX.sec.@var{n}}.  Here,
@var{n} is a number, a running count through the mmo file,
starting at 0.
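
One reading of these rules, sketched in C: classify an undescribed address into @code{.text} or @code{.data}, decide whether an area is joined with its lower neighbor, and format the @code{.MMIX.sec.@var{n}} names.  The function names are hypothetical, and BFD's actual bookkeeping (for instance, how the neighboring lower area is tracked) may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of the synthetic-section rules above, as we read
   them; BFD's actual logic may differ in details. */
#define MMO_JOIN_LIMIT 0x40000000ULL

/* Base section name for undescribed contents at ADDR: ".text" or ".data"
   for the two memory areas, NULL otherwise. */
const char *mmo_synthetic_base_name(uint64_t addr)
{
  if (addr <= 0x01ffffffffffffffULL)
    return ".text";
  if (addr >= 0x2000000000000000ULL && addr <= 0x20ffffffffffffffULL)
    return ".data";
  return NULL;
}

/* Nonzero if an undescribed area ending (exclusive) at NEW_END is joined
   with the lower neighboring area starting at LOWER_START, the gap being
   zero-filled: the combined length must be below 0x40000000 bytes. */
int mmo_should_join(uint64_t lower_start, uint64_t new_end)
{
  return new_end - lower_start < MMO_JOIN_LIMIT;
}

/* Name for the N:th section formed in "other cases", counting from 0. */
int mmo_extra_section_name(char *buf, size_t size, unsigned n)
{
  return snprintf(buf, size, ".MMIX.sec.%u", n);
}
```

For instance, an area ending at 0x2000000000001000 whose lower neighbor starts at 0x2000000000000000 spans only 0x1000 bytes in total and would be joined into @code{.data}.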

A loadable section specified as:

@example
 .section secname,"ax"
 TETRA 1,2,3,4,-1,-2009
 BYTE 80
@end example

and linked to address @samp{0x4}, is represented by the sequence:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x7365636e - "secn"
 0x616d6500 - "ame\0"
 0x00000033 - flags CODE, READONLY, LOAD, ALLOC
 0x00000000 - high 32 bits of section length
 0x0000001c - section length is 28 bytes; 6 * 4 + 1 + alignment to 32 bits
 0x00000000 - high 32 bits of section address
 0x00000004 - section address is 4
 0x98010002 - lop_loc, 64-bit address of the following data
 0x00000000 - high 32 bits of address
 0x00000004 - low 32 bits: data starts at address 4
 0x00000001 - 1
 0x00000002 - 2
 0x00000003 - 3
 0x00000004 - 4
 0xffffffff - -1
 0xfffff827 - -2009
 0x50000000 - 80 as a byte, padded with zeros.
@end example

Note that the lop_spec wrapping does not include the section
contents.  Compare this to a non-loaded section specified as:

@example
 .section thirdsec
 TETRA 200001,100002
 BYTE 38,40
@end example

This, when linked to address @samp{0x200000000000001c}, is
represented by:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x74686972 - "thir"
 0x64736563 - "dsec"
 0x00000010 - flag READONLY
 0x00000000 - high 32 bits of section length
 0x0000000c - section length is 12 bytes; 2 * 4 + 2 + alignment to 32 bits
 0x20000000 - high 32 bits of address
 0x0000001c - low 32 bits of address 0x200000000000001c
 0x00030d41 - 200001
 0x000186a2 - 100002
 0x26280000 - 38, 40 as bytes, padded with zeros
@end example

For the latter example, the section contents must not be
loaded in memory, and are therefore specified as part of the
special data.  The address is usually unimportant but might
provide information for e.g.@: the DWARF 2 debugging format.