@section mmo backend
The mmo object format is used exclusively together with Professor
Donald E.@: Knuth's educational 64-bit processor MMIX.  The simulator
@command{mmix}, which is available at
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz},
understands this format.  That package also includes a combined
assembler and linker called @command{mmixal}.  The mmo format has
no advantages feature-wise compared to e.g.@: ELF.  It is a simple
non-relocatable object format with no support for archives or
debugging information, except for symbol value information and
line numbers (which are not yet implemented in BFD).  See
@url{http://www-cs-faculty.stanford.edu/~knuth/mmix.html} for more
information about MMIX.  The ELF format is used for intermediate
object files in the BFD implementation.

@c We want to xref the symbol table node.  A feature in "chew"
@c requires that "commands" do not contain spaces in the
@c arguments.  Hence the hyphen in "Symbol-table".
@menu
* File layout::
* Symbol-table::
* mmo section mapping::
@end menu

@node File layout, Symbol-table, mmo, mmo
@subsection File layout
The contents of an mmo file are not partitioned into named sections
as with e.g.@: ELF.  Memory areas are formed by specifying the
location of the data that follows.  Only the memory area
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} is executable, so
it is used for code (and constants), and the area
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} is used for
writable data.  @xref{mmo section mapping}.

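As a small illustration of the two areas above, the following Python
sketch classifies a 64-bit address by its most significant byte.  The
function name @code{mmo_area} and the labels it returns are made up for
this example and do not appear in BFD.

```python
def mmo_area(addr):
    """Classify a 64-bit address by the memory areas described above."""
    top = (addr >> 56) & 0xFF      # most significant byte of the address
    if top in (0x00, 0x01):        # 0x0000...00 to 0x01ff...ff
        return "code"
    if top == 0x20:                # 0x2000...00 to 0x20ff...ff
        return "data"
    return "other"                 # neither of the two areas

print(mmo_area(0x0000000000000004))   # code
print(mmo_area(0x200000000000001C))   # data
```

Addresses outside both areas are reported as @code{other} here; how
BFD names sections for them is covered in the section mapping node.
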
There is provision for specifying ``special data'' of 65536
different types.  We use type 80 (decimal), arbitrarily chosen to be
the same as the ELF @code{e_machine} number for MMIX, filling it with
section information normally found in ELF objects.  @xref{mmo
section mapping}.

Contents are entered as 32-bit words, XORed over previous
contents, which are always zero-initialized.  A word that starts with
the byte @samp{0x98} forms a command called a @samp{lopcode}, where
the next byte distinguishes between the thirteen lopcodes.  The
two remaining bytes, called the @samp{Y} and @samp{Z} fields, or
the @samp{YZ} field (a 16-bit big-endian number), are used for
various purposes that differ for each lopcode.  As documented in
@url{http://www-cs-faculty.stanford.edu/~knuth/mmixal-intro.ps.gz},
the lopcodes are:

@table @code
@item lop_quote
0x98000001.  The next word is contents, regardless of whether it
starts with 0x98 or not.

@item lop_loc
0x9801YYZZ, where @samp{Z} is 1 or 2.  This is a location
directive, setting the location for the next data to the value of the
next 32-bit word (for @math{Z = 1}) or 64-bit word (for @math{Z = 2}),
plus @math{Y * 2^56}.  Normally @samp{Y} is 0 for the text segment
and 0x20 for the data segment.

@item lop_skip
0x9802YYZZ.  Increase the current location by @samp{YZ} bytes.

@item lop_fixo
0x9803YYZZ, where @samp{Z} is 1 or 2.  Store the current location
as 64 bits into the location pointed to by the next 32-bit
(@math{Z = 1}) or 64-bit (@math{Z = 2}) word, plus @math{Y *
2^56}.

@item lop_fixr
0x9804YYZZ.  @samp{YZ} is stored into the current location plus
@math{2 - 4 * YZ}.

@item lop_fixrx
0x980500ZZ.  @samp{Z} is 16 or 24.  A value @samp{L} derived from
the following 32-bit word is used in a manner similar to
@samp{YZ} in lop_fixr: it is XORed into the current location
minus @math{4 * L}.  The first byte of the word is 0 or 1.  If it
is 1, then @math{L = (@var{lowest 24 bits of word}) - 2^Z}; if 0,
then @math{L = (@var{lowest 24 bits of word})}.

@item lop_file
0x9806YYZZ.  @samp{Y} is the file number, @samp{Z} is the count of
32-bit words.  Set the file number to @samp{Y} and the line
counter to 0.  The next @math{Z * 4} bytes contain the file name,
padded with zeros if its length is not a multiple of four.  The
same @samp{Y} may occur multiple times, but @samp{Z} must be 0 for
all but the first occurrence.

@item lop_line
0x9807YYZZ.  @samp{YZ} is the line number.  Together with
lop_file, it forms the source location for the next 32-bit word.
Note that for each non-lopcode 32-bit word, line numbers are
assumed to be incremented by one.

@item lop_spec
0x9808YYZZ.  @samp{YZ} is the type number.  Data until the next
lopcode other than lop_quote forms special data of type @samp{YZ}.
@xref{mmo section mapping}.

Types other than 80 (or type 80 with contents that do not
parse) are stored in sections named @code{.MMIX.spec_data.@var{n}}
where @var{n} is the @samp{YZ}-type.  The flags for such a
section say not to allocate or load the data.  The vma is 0.
Contents of multiple occurrences of special data @var{n} are
concatenated to the data of the previous lop_spec @var{n}s.  The
location in data or code at which the lop_spec occurred is lost.

@item lop_pre
0x980901ZZ.  The first lopcode in a file.  The @samp{Z} field forms the
length of header information in 32-bit words, where the first word
tells the time in seconds since @samp{00:00:00 GMT Jan 1 1970}.

@item lop_post
0x980a00ZZ.  @math{Z > 32}.  This lopcode follows after all
content-generating lopcodes in a program.  The @samp{Z} field
denotes the value of @samp{rG} at the beginning of the program.
The following @math{256 - Z} big-endian 64-bit words are loaded
into global registers @samp{$G} @dots{} @samp{$255}.

@item lop_stab
0x980b0000.  The next-to-last lopcode in a program.  Must follow
immediately after the lop_post lopcode and its data.  After this
lopcode follow all symbols in a compressed format
(@pxref{Symbol-table}).

@item lop_end
0x980cYYZZ.  The last lopcode in a program.  It must follow the
lop_stab lopcode and its data.  The @samp{YZ} field contains the
number of 32-bit words of symbol table information after the
preceding lop_stab lopcode.
@end table

Note that the ``fixup'' lopcodes @code{lop_fixr}, @code{lop_fixrx} and
@code{lop_fixo} are not generated by BFD, but are handled.  They are
generated by @command{mmixal}.

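The field layout described in the table can be sketched in Python as
follows.  The names @code{LOPCODES}, @code{decode_word} and
@code{fixrx_offset} are illustrative only and are not taken from BFD;
the sketch assumes the caller already knows that the word is not data
following a @code{lop_quote}.

```python
# Lopcode names in numeric order, 0x00 (lop_quote) to 0x0c (lop_end).
LOPCODES = ("lop_quote", "lop_loc", "lop_skip", "lop_fixo", "lop_fixr",
            "lop_fixrx", "lop_file", "lop_line", "lop_spec", "lop_pre",
            "lop_post", "lop_stab", "lop_end")

def decode_word(word):
    """Return (name, Y, Z) for a lopcode word, or None for plain contents."""
    if (word >> 24) != 0x98:
        return None                       # ordinary 32-bit contents
    number = (word >> 16) & 0xFF          # second byte selects the lopcode
    if number >= len(LOPCODES):
        raise ValueError("unknown lopcode %d" % number)
    return LOPCODES[number], (word >> 8) & 0xFF, word & 0xFF

def fixrx_offset(z, word):
    """Compute L for lop_fixrx from Z (16 or 24) and the following word."""
    low = word & 0xFFFFFF                 # lowest 24 bits of the word
    if (word >> 24) == 1:                 # first byte 1: subtract 2^Z
        return low - (1 << z)
    return low                            # first byte 0: use the bits as-is

print(decode_word(0x98010002))            # ('lop_loc', 0, 2)
print(fixrx_offset(16, 0x01000001))       # -65535
```

A reader built along these lines would still have to track state, since
the word after @code{lop_quote} is contents even when it starts with
0x98.
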
This trivial one-label, one-instruction file:

@example
 :Main TRAP 1,2,3
@end example

can be represented this way in mmo:

@example
 0x98090101 - lop_pre, one 32-bit word with timestamp.
 <timestamp>
 0x98010002 - lop_loc, text segment, using a 64-bit address.
              Note that mmixal does not emit this for the file above.
 0x00000000 - Address, high 32 bits.
 0x00000000 - Address, low 32 bits.
 0x98060002 - lop_file, 2 32-bit words for file-name.
 0x74657374 - "test"
 0x2e730000 - ".s\0\0"
 0x98070001 - lop_line, line 1.
 0x00010203 - TRAP 1,2,3
 0x980a00ff - lop_post, setting $255 to 0.
 0x00000000
 0x00000000
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040   @xref{Symbol-table}.
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
 0x980c0005 - lop_end; symbol table contained five 32-bit words.
@end example
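
To check the bookkeeping in the listing above, the words can be packed
into a byte image and the trailer inspected.  This is only a sketch:
the timestamp is a placeholder of 0, the helper name
@code{symbol_table_words} is hypothetical, and the words are assumed
to be stored big-endian, in line with the big-endian fields described
earlier.

```python
import struct

WORDS = [
    0x98090101, 0x00000000,              # lop_pre, placeholder timestamp
    0x98010002, 0x00000000, 0x00000000,  # lop_loc with a 64-bit address
    0x98060002, 0x74657374, 0x2e730000,  # lop_file, "test.s"
    0x98070001,                          # lop_line, line 1
    0x00010203,                          # TRAP 1,2,3
    0x980a00ff, 0x00000000, 0x00000000,  # lop_post, one 64-bit register
    0x980b0000,                          # lop_stab
    0x203a4040, 0x10404020, 0x4d206120, 0x69016e00, 0x81000000,
    0x980c0005,                          # lop_end, five symbol words
]
image = struct.pack(">%dI" % len(WORDS), *WORDS)

def symbol_table_words(image):
    """Use lop_end's YZ count to locate the words after lop_stab."""
    count = len(image) // 4
    words = struct.unpack(">%dI" % count, image)
    if (words[-1] >> 16) != 0x980C:
        raise ValueError("file does not end with lop_end")
    yz = words[-1] & 0xFFFF              # number of symbol table words
    stab = count - 2 - yz                # index where lop_stab must sit
    if stab < 0 or words[stab] != 0x980B0000:
        raise ValueError("lop_stab not found where lop_end says")
    return words[stab + 1:count - 1]

print(len(symbol_table_words(image)))    # 5
```

The check works backwards from the end of the file, which is the only
place where the length of the symbol table is recorded.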

@node Symbol-table, mmo section mapping, File layout, mmo
@subsection Symbol table format
From mmixal.w (or really, the generated mmixal.tex) in
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz}:
``Symbols are stored and retrieved by means of a @samp{ternary
search trie}, following ideas of Bentley and Sedgewick. (See
ACM--SIAM Symp.@: on Discrete Algorithms @samp{8} (1997), 360--369;
R.@: Sedgewick, @samp{Algorithms in C} (Reading, Mass.@:
Addison--Wesley, 1998), @samp{15.4}.)  Each trie node stores a
character, and there are branches to subtries for the cases where
a given character is less than, equal to, or greater than the
character in the trie.  There also is a pointer to a symbol table
entry if a symbol ends at the current node.''

So it's a tree encoded as a stream of bytes.  The stream of bytes
acts on a single virtual global symbol, adding and removing
characters and signalling complete symbol points.  Here, we read
the stream and create symbols at the completion points.

First, there's a control byte @code{m}.  If any of the listed bits
in @code{m} is nonzero, we execute what stands at the right, in
the listed order:

@example
 (MMO3_LEFT)
 0x40 - Traverse left trie.
        (Read a new command byte and recurse.)

 (MMO3_SYMBITS)
 0x2f - Read the next byte as a character and store it in the
        current character position; increment character position.
        Test the bits of @code{m}:

        (MMO3_WCHAR)
        0x80 - The character is 16-bit (so read another byte,
               merge into current character).

        (MMO3_TYPEBITS)
        0xf  - We have a complete symbol; parse the type, value
               and serial number and do what should be done
               with a symbol.  The type and length information
               is in j = (m & 0xf).

               (MMO3_REGQUAL_BITS)
               j == 0xf: A register variable.  The following
                         byte tells which register.
               j <= 8:   An absolute symbol.  Read j bytes as the
                         big-endian number the symbol equals.
                         A j = 2 with two zero bytes denotes an
                         unknown symbol.
               j > 8:    As with j <= 8, but add (0x20 << 56)
                         to the value in the following j - 8
                         bytes.

               Then comes the serial number, as a variant of
               uleb128, but better named ubeb128:
               Read bytes and shift the previous value left 7
               (multiply by 128).  Add in the new byte, repeat
               until a byte has bit 7 set.  The serial number
               is the computed value minus 128.

        (MMO3_MIDDLE)
        0x20 - Traverse middle trie.  (Read a new command byte
               and recurse.)  Decrement character position.

 (MMO3_RIGHT)
 0x10 - Traverse right trie.  (Read a new command byte and
        recurse.)
@end example

Let's look again at the @code{lop_stab} for the trivial file
(@pxref{File layout}).

@example
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
@end example

This forms the trivial trie (note that the path between ``:'' and
``M'' is redundant):

@example
 203a     ":"
 40       /
 40      /
 10      \
 40      /
 40     /
 204d  "M"
 2061  "a"
 2069  "i"
 016e  "n" is the last character in a full symbol, and
       with a value represented in one byte.
 00    The value is 0.
 81    The serial number is 1.
@end example

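The decoding rules and the walk-through above can be reproduced with a
short recursive reader.  The following Python sketch is not the BFD
code: the function name @code{decode_symbols} and the kind labels are
invented for illustration, and the first byte of a 16-bit character is
assumed to be the high byte.

```python
def decode_symbols(data):
    """Return a list of (name, kind, value, serial) from lop_stab bytes."""
    symbols = []
    name = []                  # characters of the single virtual symbol
    pos = 0

    def next_byte():
        nonlocal pos
        byte = data[pos]
        pos += 1
        return byte

    def walk():
        m = next_byte()                          # control byte
        if m & 0x40:                             # MMO3_LEFT
            walk()
        if m & 0x2F:                             # MMO3_SYMBITS
            c = next_byte()
            if m & 0x80:                         # MMO3_WCHAR
                c = (c << 8) | next_byte()
            name.append(chr(c))                  # increment position
            j = m & 0x0F                         # MMO3_TYPEBITS
            if j:
                if j == 0x0F:                    # register variable
                    kind, value = "register", next_byte()
                else:
                    value = 0
                    for _ in range(j if j <= 8 else j - 8):
                        value = (value << 8) | next_byte()
                    if j > 8:
                        kind, value = "data", value + (0x20 << 56)
                    elif j == 2 and value == 0:
                        kind = "undefined"
                    else:
                        kind = "absolute"
                serial = 0
                while True:                      # ubeb128 serial number
                    byte = next_byte()
                    serial = (serial << 7) + byte
                    if byte & 0x80:
                        break
                symbols.append(("".join(name), kind, value, serial - 128))
            if m & 0x20:                         # MMO3_MIDDLE
                walk()
            name.pop()                           # decrement position
        if m & 0x10:                             # MMO3_RIGHT
            walk()

    walk()
    return symbols

stab = bytes.fromhex("203a4040" "10404020" "4d206120" "69016e00" "81")
print(decode_symbols(stab))    # [(':Main', 'absolute', 0, 1)]
```

Only the first 17 bytes of the five words are consumed; the three
trailing zero bytes are padding up to the 32-bit boundary.
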
@node mmo section mapping, , Symbol-table, mmo
@subsection mmo section mapping
The implementation in BFD uses special data type 80 (decimal) to
encapsulate and describe named sections, containing e.g.@: debug
information.  If needed, any datum in the encapsulation will be
quoted using lop_quote.  First comes a 32-bit word holding the
number of 32-bit words containing the zero-terminated zero-padded
section name.  After the name there's a 32-bit word holding flags
describing the section type.  Then comes a 64-bit big-endian word
with the section length (in bytes), then another with the section
start address.  Depending on the type of section, the contents
might follow, zero-padded to a 32-bit boundary.  For a loadable
section (such as data or code), the contents might follow at some
later point, not necessarily immediately, as a lop_loc with the
same start address as in the section description, followed by the
contents.  This in effect forms a descriptor that must be emitted
before the actual contents.  Sections described this way must not
overlap.

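The descriptor layout just described can be unpacked with a few lines
of Python.  This is a sketch under stated assumptions: the function
name @code{parse_descriptor} is hypothetical, the input is the list of
32-bit words that follow the @code{lop_spec} word, and any
@code{lop_quote} words are assumed to have been removed already.  The
sample words describe a section named @code{secname}.

```python
def parse_descriptor(words):
    """Return (name, flags, length, address, words_used)."""
    n = words[0]                               # name length in 32-bit words
    raw = b"".join(w.to_bytes(4, "big") for w in words[1:1 + n])
    name = raw.rstrip(b"\0").decode("ascii")   # drop the zero padding
    flags = words[1 + n]
    length = (words[2 + n] << 32) | words[3 + n]    # 64-bit, big-endian
    address = (words[4 + n] << 32) | words[5 + n]   # 64-bit, big-endian
    return name, flags, length, address, 6 + n

sample = [0x00000002, 0x7365636e, 0x616d6500,  # two words: "secname\0"
          0x00000033,                          # flags
          0x00000000, 0x0000001c,              # length: 28 bytes
          0x00000000, 0x00000004]              # start address: 4
print(parse_descriptor(sample))    # ('secname', 51, 28, 4, 8)
```

The last element of the result tells the caller how many words the
descriptor occupied, so that any contents following it can be located.
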
For areas that don't have such descriptors, synthetic sections are
formed by BFD.  Consecutive contents in the two memory areas
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} and
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} are entered in
sections named @code{.text} and @code{.data} respectively.  If an area
is not otherwise described, but would together with a neighboring
lower area be less than @samp{0x40000000} bytes long, it is joined
with the lower area and the gap is zero-filled.  For other cases,
a new section is formed, named @code{.MMIX.sec.@var{n}}.  Here,
@var{n} is a number, a running count through the mmo file,
starting at 0.

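One possible reading of the joining rule is sketched below in Python.
It is an interpretation of the prose rather than a copy of the BFD
logic: the names @code{MAX_JOINED_SIZE} and @code{joins_lower_area}
are hypothetical, and the joined length is assumed to be measured from
the start of the lower area to the end of the new one.

```python
MAX_JOINED_SIZE = 0x40000000   # limit quoted in the text above

def joins_lower_area(lower_start, area_start, area_length):
    """Return True if the new area would be joined with the lower one."""
    joined_length = area_start + area_length - lower_start
    return joined_length < MAX_JOINED_SIZE

print(joins_lower_area(0x0, 0x1000, 4))       # True
print(joins_lower_area(0x0, 0x40000000, 4))   # False
```

In the first call the gap between the areas would be zero-filled; in
the second a new section would be formed instead.
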
A loadable section specified as:

@example
 .section secname,"ax"
 TETRA 1,2,3,4,-1,-2009
 BYTE 80
@end example

and linked to address @samp{0x4}, is represented by the sequence:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x7365636e - "secn"
 0x616d6500 - "ame\0"
 0x00000033 - flags CODE, READONLY, LOAD, ALLOC
 0x00000000 - high 32 bits of section length
 0x0000001c - section length is 28 bytes; 6 * 4 + 1 + alignment to 32 bits
 0x00000000 - high 32 bits of section address
 0x00000004 - section address is 4
 0x98010002 - 64 bits with address of following data
 0x00000000 - high 32 bits of address
 0x00000004 - low 32 bits: data starts at address 4
 0x00000001 - 1
 0x00000002 - 2
 0x00000003 - 3
 0x00000004 - 4
 0xffffffff - -1
 0xfffff827 - -2009
 0x50000000 - 80 as a byte, padded with zeros.
@end example

Note that the lop_spec wrapping does not include the section
contents.  Compare this to a non-loaded section specified as:

@example
 .section thirdsec
 TETRA 200001,100002
 BYTE 38,40
@end example

This, when linked to address @samp{0x200000000000001c}, is
represented by:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x74686972 - "thir"
 0x64736563 - "dsec"
 0x00000010 - flag READONLY
 0x00000000 - high 32 bits of section length
 0x0000000c - section length is 12 bytes; 2 * 4 + 2 + alignment to 32 bits
 0x20000000 - high 32 bits of address
 0x0000001c - low 32 bits of address 0x200000000000001c
 0x00030d41 - 200001
 0x000186a2 - 100002
 0x26280000 - 38, 40 as bytes, padded with zeros
@end example

For the latter example, the section contents must not be
loaded in memory, and are therefore specified as part of the
special data.  The address is usually unimportant but might
provide information for e.g.@: the DWARF 2 debugging format.