ngle_manual.txt revision 1.2
$NetBSD: ngle_manual.txt,v 1.2 2025/12/28 07:53:20 macallan Exp $

The Unofficial NGLE Manual

Preface
This manual covers what I've been able to figure out about HP's NGLE family of
graphics devices commonly used in HP PA-RISC workstations, namely HCRX24 and
PCI Visualize EG. It doesn't explain basic concepts but anyone with some
graphics driver writing experience should be able to understand it.
Since there is no official documentation available I used the NGLE code found in
XFree86 3.3 as a starting point, with plenty of guesswork and experimentation.
The xf86 code is somewhat obfuscated ( register names are random numbers, values
written are almost all magic numbers ) and does not actually accelerate any
graphics operations. It does however use the blitter to clear the framebuffer
and attribute planes, and shows how to use a cursor sprite, colour LUTs and so
on.
None of this is endorsed, supported, or (likely) known to Hewlett-Packard.
All register definitions are from
https://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/ic/nglereg.h
kernel drivers for HCRX and PCI Visualize EG:
https://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/hppa/dev/hyperfb.c
https://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/hppa/dev/gftfb.c
Xorg driver:
https://cvsweb.netbsd.org/bsdweb.cgi/xsrc/external/mit/xf86-video-ngle/dist/src/

1. Now how does this thing work
All NGLE devices work in more or less the same way, with some differences in
details and additional features. Every device occupies a 32MB range, half of
which contains the STI ROM and registers, the other is for framebuffer access.
The framebuffer aperture can map exactly one chunk of video memory - things like
front or back buffers, overlay, attribute planes, and a few unusual things, like
colour maps and cursor sprite bitmaps. Read and write access can be controlled
independently, and all settings apply to both the drawing engine and CPU access
through the framebuffer aperture.
That means there is no such thing as direct framebuffer access, everything goes
through the graphics pipeline. If you set the engine to 32bit colour expansion
then whatever you write into the aperture will be expanded. Also, care must be
taken to not attempt to access video memory while updating the cursor image or
colour maps.
All framebuffer access applies a fixed pitch of 2048 pixels.
The chips support the usual selection of graphics primitives - rectangle fill,
copy, colour expansion, and indirect access. There's plenty more ( many have 3D
features ) but these are completely unknown.
All register addresses listed here are relative to STI region 2, and all
registers are 32bit big endian, even on PCI.
There is no available information on video mode programming other than
disassembling STI ROMs, and the details are very likely board specific ( HCRX
is fixed at 1280x1024 for example ). So in order to get going one would:
- set up STI access
- get the board type, hardware addresses, video mode etc. from STI's INIT_GRAPH
  and INQ_CONF calls
- map framebuffer and registers ( STI region 1 and 2 should be enough )
- do our own initialization - STI likes to set the planemask to only allow
  access to the planes used for text output, and leaves bitmap access modes at
  something suitable for rectangle fills and character drawing, not something
  useful to write into the framebuffer

2. Framebuffer access
#define NGLE_BAboth     0x018000    /* read and write mode */
#define NGLE_DBA        0x018004    /* Dest. Bitmap Access */
#define NGLE_SBA        0x018008    /* Source Bitmap Access */

#define BA(F,C,S,A,J,B,I) \
        (((F)<<31)|((C)<<27)|((S)<<24)|((A)<<21)|((J)<<16)|((B)<<12)|(I))
        /* FCCC CSSS AAAJ JJJJ BBBB IIII IIII IIII */

/* F */
#define IndexedDcd      0   /* Pixel data is indexed (pseudo) color */
#define FractDcd        1   /* Pixel data is Fractional 8-8-8 */
/* C */
#define Otc04           2   /* Pixels in each longword transfer (4) */
#define Otc32           5   /* Pixels in each longword transfer (32) */
#define Otc24           7   /* NGLE uses this for 24bit blits */
                            /* Should really be... */
#define Otc01           7   /* one pixel per longword */
/* S */
#define Ots08           3   /* Each pixel is size (8)d transfer (1) */
#define OtsIndirect     6   /* Each bit goes through FG/BG color(8) */
/* A */
#define AddrByte        3   /* byte access? Used by NGLE for direct fb */
#define AddrLong        5   /* FB address is Long aligned (pixel) */
#define Addr24          7   /* used for colour map access */
/* B */
#define BINapp0I        0x0 /* Application Buffer 0, Indexed */
#define BINapp1I        0x1 /* Application Buffer 1, Indexed */
#define BINovly         0x2 /* 8 bit overlay */
#define BINcursor       0x6 /* cursor bitmap on EG */
#define BINcmask        0x7 /* cursor mask on EG */
#define BINapp0F8       0xa /* Application Buffer 0, Fractional 8-8-8 */
/* next one is a guess, my HCRX24 doesn't seem to have it */
#define BINapp1F8       0xb /* Application Buffer 1, Fractional 8-8-8 */
#define BINattr         0xd /* Attribute Bitmap */
#define BINcmap         0xf /* colour map(s) */
/* I assume one of the undefined BIN* accesses the HCRX Z-buffer add-on. No clue
 * about bit depth or if any bits are used for stencil */

/* other buffers are unknown */
/* J - 'BA just point' - function unknown */
/* I - 'BA index base' - function unknown */

The BIN* values control which buffer we access, Addr* controls how memory is
presented to the CPU. With AddrLong all pixels are at 32bit boundaries, no
matter the actual colour depth.
Otc* controls how many pixels we write with a single 32bit access, so for 8bit
pixels we would use Otc04, for 24bit colour Otc01, and Otc32 is for mono to
colour expansion. OtsIndirect enables colour expansion, combined with Otc32
every set bit writes a foreground colour pixel, unset bits can be transparent
or background.
The *Dcd bit's exact function is a bit unclear - we set it for 24bit colour
access to both framebuffer and colour maps. I suspect enabling it on an 8bit
buffer will result in R3G3B2 output from rendering and blending operations,
which we know nothing about.
So, for normal access to the overlay on an HCRX we would use IndexedDcd, Otc04,
Ots08, AddrByte, BINovly, and set a suitable planemask and binary operation.

All writes to the framebuffer, by CPU or drawing engine, have binary operations
and a plane mask applied to them:

#define NGLE_PLANEMASK  0x018018    /* image planemask */

#define NGLE_IBO        0x01801c    /* image binary op */

#define IBOvals(R,M,X,S,D,L,B,F) \
        (((R)<<8)|((M)<<16)|((X)<<24)|((S)<<29)|((D)<<28)|((L)<<31)|((B)<<1)|(F))
        /* LSSD XXXX MMMM MMMM RRRR RRRR ???? ??BF */

/* R is a standard X11 ROP, no idea if the other bits are used for anything */
#define RopClr          0x0
#define RopSrc          0x3
#define RopInv          0xc
#define RopSet          0xf
/* M: 'mask addr offset' - function unknown */
/* X */
#define BitmapExtent08  3   /* Each write hits ( 8) bits in depth */
#define BitmapExtent32  5   /* Each write hits (32) bits in depth */
/* S: 'static reg' flag, NGLE sets it for blits, function is unknown but
   we get occasional garbage in 8bit blits without it */
/* D */
#define DataDynamic     0   /* Data register reloaded by direct access */
#define MaskDynamic     1   /* Mask register reloaded by direct access */
/* L */
I suspect this selects how many mask bits to use in Otc* less than 32.
#define MaskOtc         0   /* Mask contains Object Count valid bits */
/* B = 1 -> background transparency for masked fills */
/* F probably the same for foreground */

These bit definitions are from xf86, the S bit seems to control masking off
extra bits when the number of pixels written per Otc* exceeds the right border.
Not sure what exactly the *Dynamic and MaskOtc bits do.

For plain framebuffer memory access just use RopSrc, BitmapExtent* matching your
target buffer, and everything else zero.

Framebuffer geometry is always 2048 pixels ( with pixel size determined by
Addr* ) by whatever your hardware allows, areas outside the visible screen may
or may not be accessible, or backed by memory.
HCRX always runs in 1280x1024, there is always an overlay and at least one 8bit
image buffer, HCRX24 has a 24bit buffer that can be used as two 8bit buffers.
There is no usable off-screen memory, in fact there seem to be registers to the
right of the visible area.
On a PCI Visualize EG with 4MB we get an actual 2048x2048 buffer which we can
use any way we want.
Finally, the xf86 code writes a one ( as an 8bit access ) into
#define NGLE_CONTROL_FB 0x200005
before framebuffer access, function is unknown but I suspect it turns off
pipeline pacing, which is then re-enabled whenever we touch the blitter.

3. Drawing engine
Basically, you poke coordinates into registers and apply an opcode to the last
write's address to start an operation ( and specify which ), and there are
registers to control drawing mode, ROPs etc.
All register writes go through a pipeline which has 32 entries on HCRX.

#define NGLE_BUSY       0x200000    /* busy register */
The first byte will be non-zero if the drawing engine is busy.

#define NGLE_FIFO       0x200008    /* # of fifo slots */

X and width in the upper 16bit, Y / height in the lower.
#define NGLE_DST_XY             0x000800    /* destination XY */
#define NGLE_SIZE               0x000804    /* size WH */
#define NGLE_SRC_XY             0x000808    /* source XY */
#define NGLE_TRANSFER_DATA      0x000820    /* 'transfer data' - this is */
                                            /* a pixel mask on fills */
#define NGLE_RECT               0x000200    /* opcode to start a fill */
#define NGLE_BLIT               0x000300    /* opcode to start a blit */
#define NGLE_HCRX_FASTFILL      0x000140    /* opcode for HCRX fast rect */
#define NGLE_RECT_SIZE_START    (NGLE_SIZE | NGLE_RECT)
#define NGLE_BLT_DST_START      (NGLE_DST_XY | NGLE_BLIT)

So, in order to draw a rectangle you write coordinates into NGLE_DST_XY, set
NGLE_TRANSFER_DATA to all ones unless you want it stippled, then write the
width/height into NGLE_SIZE|NGLE_RECT. Rectangle fills move the destination
coordinates down by the rectangle's height.
NGLE_BLIT copies a rectangle from SRC_XY to DST_XY with ROP etc. applied. It is
possible to copy data between buffers, supported combinations of source and
destination access modes need to be investigated.
There are likely other opcodes for things like vectors, triangles and so on.
HCRX_FASTFILL is implied by the xf86 code, but not actually used. It seems to
work, more or less, but with strange side effects. More investigation is needed.

#define NGLE_CPR                0x01800c    /* control plane register */
This is used when drawing into BINattr, on EG we use 0x00000102, on HCRX
0x04000F00 for 24bit. There has to be some conversion, there is no way the
attribute plane is actually 32bit. No idea what the individual bits do, has to
be a combination of buffer selection ( front or back ), colour mode / LUT
selection, likely chip specific. Known values are from xf86.
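
The rectangle fill sequence described above can be sketched in C roughly like
this. It is only an illustration under a few assumptions: ngle_write32(),
ngle_pack_xy() and ngle_rectfill() are made-up names, the write helper merely
records what would be sent to the hardware, and DBA, IBO, planemask and colours
are assumed to be set up already. A real driver would do big endian 32bit
stores into STI region 2 and would presumably make sure there is room in the
pipeline first.

```c
#include <stdint.h>

/* Register offsets and opcode as listed above (relative to STI region 2). */
#define NGLE_DST_XY             0x000800
#define NGLE_SIZE               0x000804
#define NGLE_TRANSFER_DATA      0x000820
#define NGLE_RECT               0x000200
#define NGLE_RECT_SIZE_START    (NGLE_SIZE | NGLE_RECT)

/* Hypothetical stand-in for a register write: it only logs the access. */
struct reg_write {
    uint32_t reg;
    uint32_t val;
};
static struct reg_write ngle_log[8];
static int ngle_nlog;

static void
ngle_write32(uint32_t reg, uint32_t val)
{
    if (ngle_nlog < 8) {
        ngle_log[ngle_nlog].reg = reg;
        ngle_log[ngle_nlog].val = val;
        ngle_nlog++;
    }
}

/* X or width in the upper 16bit, Y or height in the lower. */
static uint32_t
ngle_pack_xy(uint32_t x, uint32_t y)
{
    return ((x & 0xffff) << 16) | (y & 0xffff);
}

/* Solid fill: destination, all-ones pixel mask, then size + opcode. */
static void
ngle_rectfill(uint32_t x, uint32_t y, uint32_t w, uint32_t h)
{
    ngle_write32(NGLE_DST_XY, ngle_pack_xy(x, y));
    ngle_write32(NGLE_TRANSFER_DATA, 0xffffffff);   /* no stipple */
    /* writing the size through the opcode'd address starts the fill */
    ngle_write32(NGLE_RECT_SIZE_START, ngle_pack_xy(w, h));
}
```

Calling ngle_rectfill(100, 150, 12, 20) would queue three writes, the last one
going to offset 0x000a04 ( NGLE_SIZE with the NGLE_RECT opcode applied ).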

#define NGLE_FG         0x018010    /* fg colour */
#define NGLE_BG         0x018014    /* bg colour */

For a plain rectangle fill into the overlay we would use
IBOvals(RopSrc, 0, BitmapExtent08, 1, DataDynamic, 0, 0, 0)
and
BA(IndexedDcd, Otc32, OtsIndirect, AddrLong, 0, BINovly, 0)
... which draws 32 pixels at a time, apparently rectangle fills are internally
implemented as 32-at-a-time colour expansion, and the S bit makes sure overflow
pixels on the right border are masked off automatically. Set FG for plain fills,
BG if using a mask ( in TRANSFER_DATA ), set the B bit to make the background
transparent. For writes into BINattr use the CPR register instead of FG.

For a simple copy we would use
BA(IndexedDcd, Otc04, Ots08, AddrLong, 0, BINovly, 0)
... to copy four pixels at a time, Addr* doesn't seem to matter, disable colour
expansion.
IBOvals(RopSrc, 0, BitmapExtent08, 1, DataDynamic, MaskOtc, 0, 0)
... to write 8bit deep, plain copy, mask off extra pixels if our width isn't a
multiple of 4.

To do the same operations on a 24bit buffer just use Otc01, FractDcd and
BitmapExtent32. No need to set the S bit on copies since all pixels are 32bit
anyway, and in order to copy between different buffers just set DBA and SBA
separately. Make sure they use the same depth or results may get weird.

4. Indirect framebuffer writes
HP calls the mechanism 'BINC', no idea what it stands for. Basically, you set a
target address and then write data into registers which trigger operations
programmed in DBA and IBO, with the target address being updated according to
which data register we write to. There is also a mechanism to copy blocks, used
for colour maps.
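
To make the 'target address' part a little more concrete, here is a minimal
sketch of how such an address could be computed. It assumes the fixed pitch of
2048 pixels from section 1 and that the access mode selected by Addr* decides
how many bytes each pixel occupies ( 4 for AddrLong ). ngle_linear_addr() and
NGLE_PITCH_PIXELS are made-up names, nothing like them exists in the drivers.

```c
#include <stdint.h>

#define NGLE_PITCH_PIXELS   2048    /* fixed pitch, see section 1 */

/*
 * Hypothetical helper: linear offset of pixel (x, y), given the number of
 * bytes each pixel occupies in the current Addr* mode (4 for AddrLong).
 */
static uint32_t
ngle_linear_addr(uint32_t x, uint32_t y, uint32_t bytes_per_pixel)
{
    return (x + y * NGLE_PITCH_PIXELS) * bytes_per_pixel;
}
```

With AddrLong, pixel (100,150) ends up at 100 * 4 + 150 * 8192.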

#define NGLE_BINC_SRC       0x000480    /* BINC src */
#define NGLE_BINC_DST       0x0004a0    /* BINC dst */
#define NGLE_BINC_MASK      0x0005a0    /* BINC pixel mask */
#define NGLE_BINC_DATA      0x0005c0    /* BINC data, inc X, some sort of blending */
#define NGLE_BINC_DATA_R    0x000600    /* BINC data, inc X */
#define NGLE_BINC_DATA_D    0x000620    /* BINC data, inc Y */
#define NGLE_BINC_DATA_U    0x000640    /* BINC data, dec Y */
#define NGLE_BINC_DATA_L    0x000660    /* BINC data, dec X */
#define NGLE_BINC_DATA_DR   0x000680    /* BINC data, inc X, inc Y */
#define NGLE_BINC_DATA_DL   0x0006a0    /* BINC data, dec X, inc Y */
#define NGLE_BINC_DATA_UR   0x0006c0    /* BINC data, inc X, dec Y */
#define NGLE_BINC_DATA_UL   0x0006e0    /* BINC data, dec X, dec Y */

SRC and DST are 'linear' addresses, depending on Addr* in DBA, pitch is the
pixel size selected by Addr* times 2048.
The BINC_DATA registers differ only in the way the destination address is
updated, up or down a line, left or right by Otc* pixels.
So, in order to draw a 12x20 pixel character to (100,150) we would use the same
DBA and IBO values we used for rectangles, write 0xfff00000 into NGLE_BINC_MASK
to make sure we only write 12 pixels per line, set FG and BG as needed, set
BINC_DST to (100 * 4 + 150 * 8192) - we're in AddrLong - then poke our character
bitmap into NGLE_BINC_DATA_D, one left aligned line at a time.
BINC operations by themselves are unlikely to overrun the pipeline but they may
if a lot of them happen while something more time consuming, like a full screen
scroll, is in progress.
Not sure what exactly NGLE_BINC_DATA does, the xf86 code uses it for colour map
updates.

5. Colour maps
LUTs are held in their own buffer ( BINcmap ), size is likely chip-specific.
HCRX has room for at least three 256 entry colour maps, EG probably has two or
three.
Basically, we BINC-write our colour map into BINcmap, then tell the hardware to
update the actual colour map(s) from that buffer.
We'd use:
BA(FractDcd, Otc01, Ots08, Addr24, 0, BINcmap, 0)
IBOvals(RopSrc, 0, BitmapExtent08, 0, DataDynamic, MaskOtc, 0, 0)
Not sure how 'Addr24' differs from AddrLong, but that's what the xf86 code uses.
Then set BINC_DST to 0 ( or whichever entry we want to update - 4 for the 2nd
entry etc. ) and poke our colour map into BINC_DATA_R, one entry at a time.
Sending it to the DAC works like this - set BINC_SRC to 0, then write a command
into the appropriate LUTBLT register:
#define NGLE_EG_LUTBLT      0x200118    /* EG LUT blt ctrl */
    /* EWRRRROO OOOOOOOO TTRRRRLL LLLLLLLL */
    #define LBC_ENABLE          0x80000000
    #define LBC_WAIT_BLANK      0x40000000
    #define LBS_OFFSET_SHIFT    16
    #define LBC_TYPE_MASK       0xc000
    #define LBC_TYPE_CMAP       0
    #define LBC_TYPE_CURSOR     0x8000
    #define LBC_TYPE_OVERLAY    0xc000
    #define LBC_LENGTH_SHIFT    0
In order to update the whole thing we would use
LBC_ENABLE | LBC_TYPE_CMAP | 0x100
Length and offset are in 32bit words.

HCRX uses a different register:
#define NGLE_HCRX_LUTBLT    0x210020    /* HCRX LUT blt ctrl */
... which otherwise works exactly the same way.

On HCRX we need:
- a linear ramp in the first 256 entries, 24bit output goes through this.
- the overlay's colour map starts at entry 512
- hardware sprite colours are controlled by two entries using LBC_TYPE_CURSOR
  and offset 0

On EG:
- the main colour map lives at offset 0, type LBC_TYPE_CMAP
- four entries at offset 0 with LBC_TYPE_CURSOR, the first two do nothing, the
  other two are cursor sprite colours

There seem to be at least 512 entries worth of buffer space on both HCRX and
EG, xf86 keeps the entire palette in there, updates entries as needed and always
LUTBLTs the whole thing.

6. Hardware cursor
Again, chip-specific.
Cursor position works the same on HCRX and PCI EG, but uses different registers.
Older chips use a different register layout.
Bitmap access is different on HCRX, both support a 64x64 sprite.

#define NGLE_EG_CURSOR      0x200100    /* cursor coordinates on EG */
    #define EG_ENABLE_CURSOR    0x80000000
#define NGLE_HCRX_CURSOR    0x210000    /* HCRX cursor coord & enable */
    #define HCRX_ENABLE_CURSOR  0x80000000
Coordinates are signed 12bit quantities, X in the upper halfword, Y in the
lower, enable bit at 0x80000000. There is no hotspot register, negative
coordinates will move the sprite partially off screen as expected.
On HCRX we need to write zero into
#define NGLE_HCRX_VBUS      0x000420    /* HCRX video bus access */
before writing NGLE_HCRX_CURSOR.

Cursor bitmap access on HCRX is simple:
#define NGLE_HCRX_CURSOR_ADDR   0x210004    /* HCRX cursor address */
#define NGLE_HCRX_CURSOR_DATA   0x210008    /* HCRX cursor data */
The mask is at offset 0, bitmap at 0x80. Subsequent writes to CURSOR_DATA update
the address as expected.

On EG we have to use BINC writes:
BA(IndexedDcd, Otc32, 0, AddrLong, 0, BINcmask, 0)
IBOvals(RopSrc, 0, 0, 0, DataDynamic, MaskOtc, 0, 0)
Set BINC_DST to 0, then poke the mask into NGLE_BINC_DATA_R and
NGLE_BINC_DATA_DL - write 32bit, move right, write the rest of the line, move
down/left to the next line etc.
No LUTBLT analog here, for the cursor bitmap use BINcursor.

7. Miscellaneous
#define NGLE_HCRX_PLANE_ENABLE  0x21003c    /* HCRX plane enable */
#define NGLE_HCRX_MISCVID       0x210040    /* HCRX misc video */
    #define HCRX_BOOST_ENABLE   0x80000000  /* extra high signal level */
    #define HCRX_VIDEO_ENABLE   0x0A000000
    #define HCRX_OUTPUT_ENABLE  0x01000000
xf86 uses HCRX_VIDEO_ENABLE, the other bits were found by experiment, functions
are guesswork. There are other bits with unknown function.

This is set by xf86, other values unknown.
#define NGLE_HCRX_HB_MODE2      0x210120    /* HCRX 'hyperbowl' mode 2 */
    #define HYPERBOWL_MODE2_8_24    15

This seems to be the HCRX's analogue to FX's force attribute register - we can
switch between overlay opacity and image plane display mode on the fly.
#define NGLE_HCRX_HB_MODE       0x210130    /* HCRX 'hyperbowl' */
    #define HYPERBOWL_MODE_FOR_8_OVER_88_LUT0_NO_TRANSPARENCIES     4
    #define HYPERBOWL_MODE01_8_24_LUT0_TRANSPARENT_LUT1_OPAQUE      8
    #define HYPERBOWL_MODE01_8_24_LUT0_OPAQUE_LUT1_OPAQUE           10

8. Visualize EG notes
All references to 'EG' and the like strictly refer to the PCI Visualize EG card
with 4MB video memory. There is a GSC variant which may have 2MB or 4MB, other
differences are unknown.
The xf86 code does not support the PCI EG at all, it seems to be somewhat
similar to the 'Artist' variant, the cursor register is at the same address but
works as on HCRX. I suspect the GSC variant to be more like Artist.
It is possible to put cards with enough memory into double buffer mode using
the firmware configuration menu - I need to figure out what exactly that does.
Same with grey scale mode, which may just select a different default palette.