$NetBSD: ngle_manual.txt,v 1.1 2025/12/25 11:41:09 macallan Exp $

The Unofficial NGLE Manual

Preface
This manual covers what I've been able to figure out about HP's NGLE family of
graphics devices commonly used in HP PA-RISC workstations, namely HCRX24 and
PCI Visualize EG.
Since there is no official documentation available I used the NGLE code found in
XFree86 3.3 as a starting point, with plenty of guesswork and experimentation.
The xf86 code is somewhat obfuscated ( register names are random numbers, and
few of the values written into them are explained ) and does not actually
accelerate any graphics operations. It does however use the blitter to clear the
framebuffer and attribute planes, and shows how to use a cursor sprite, colour
LUTs and so on.
None of this is endorsed, supported, or (likely) known to Hewlett-Packard.
All register definitions are from
https://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/ic/nglereg.h
Kernel drivers for HCRX and PCI Visualize EG:
https://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/hppa/dev/hyperfb.c
https://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/hppa/dev/gftfb.c
Xorg driver:
https://cvsweb.netbsd.org/bsdweb.cgi/xsrc/external/mit/xf86-video-ngle/dist/src/

1. Now how does this thing work
All NGLE devices work in more or less the same way, with some differences in
details and additional features. Every device occupies a 32MB range, half of
which contains the STI ROM and registers, the other half is for framebuffer
access.
The framebuffer aperture can map exactly one chunk of video memory - things like
front or back buffers, overlay, attribute planes, and a few unusual things, like
colour maps and cursor sprite bitmaps. Read and write access can be controlled
independently, and all settings apply to both the drawing engine and CPU access
through the framebuffer aperture.
That means there is no such thing as direct framebuffer access, everything goes
through the graphics pipeline. If you set the engine to 32bit colour expansion
then whatever you write into the aperture will be expanded. Also, care must be
taken not to attempt to access video memory while updating the cursor image or
colour maps.
All framebuffer access applies a fixed pitch of 2048 pixels.
The chips support the usual selection of graphics primitives - rectangle fill,
copy, colour expansion, and indirect access. There's plenty more ( many have 3D
features ) but these are completely unknown.
All register addresses are relative to STI region 2, and all registers are big
endian, even on PCI.

2. Framebuffer access
#define	NGLE_BAboth		0x018000	/* read and write mode */
#define	NGLE_DBA		0x018004	/* Dest. Bitmap Access */
#define	NGLE_SBA		0x018008	/* Source Bitmap Access */

#define BA(F,C,S,A,J,B,I)						\
	(((F)<<31)|((C)<<27)|((S)<<24)|((A)<<21)|((J)<<16)|((B)<<12)|(I))
	/* FCCC CSSS AAAJ JJJJ BBBB IIII IIII IIII */

/* F */
#define	    IndexedDcd	0	/* Pixel data is indexed (pseudo) color */
#define	    FractDcd	1	/* Pixel data is Fractional 8-8-8 */
/* C */
#define	    Otc04	2	/* Pixels in each longword transfer (4) */
#define	    Otc32	5	/* Pixels in each longword transfer (32) */
#define	    Otc24	7	/* NGLE uses this for 24bit blits */
				/* Should really be... */
#define	    Otc01	7	/* one pixel per longword */
/* S */
#define	    Ots08	3	/* Each pixel is size (8)d transfer (1) */
#define	    OtsIndirect	6	/* Each bit goes through FG/BG color(8) */
/* A */
#define	    AddrByte	3	/* byte access? Used by NGLE for direct fb */
#define	    AddrLong	5	/* FB address is Long aligned (pixel) */
#define     Addr24	7	/* used for colour map access */
/* B */
#define	    BINapp0I	0x0	/* Application Buffer 0, Indexed */
#define	    BINapp1I	0x1	/* Application Buffer 1, Indexed */
#define	    BINovly	0x2	/* 8 bit overlay */
#define	    BINcursor	0x6	/* cursor bitmap on EG */
#define	    BINcmask	0x7	/* cursor mask on EG */
#define	    BINapp0F8	0xa	/* Application Buffer 0, Fractional 8-8-8 */
/* next one is a guess, my HCRX24 doesn't seem to have it */
#define	    BINapp1F8	0xb	/* Application Buffer 1, Fractional 8-8-8 */
#define	    BINattr	0xd	/* Attribute Bitmap */
#define	    BINcmap	0xf	/* colour map(s) */
/* I assume one of the undefined BIN* accesses the HCRX Z-buffer add-on. No clue
 * about bit depth or if any bits are used for stencil */

/* other buffers are unknown */
/* J - 'BA just point' - function unknown */
/* I - 'BA index base' - function unknown */

The BIN* values control which buffer we access, Addr* controls how memory is
presented to the CPU. With AddrLong all pixels are at 32bit boundaries, no
matter the actual colour depth. Otc* controls how many pixels we write with a
single 32bit access, so for 8bit we would use Otc04, for 24bit colour Otc01,
and Otc32 is for mono to colour expansion. OtsIndirect enables colour
expansion, combined with Otc32 every set bit writes a foreground colour pixel,
unset bits can be transparent or background.
The *Dcd bit's exact function is a bit unclear - we set it for 24bit colour
access to both framebuffer and colour maps. I suspect enabling it on an 8bit
buffer will result in R3G3B2 output from rendering and blending operations,
which we know nothing about.
So, for normal access to the overlay on an HCRX we would use IndexedDcd, Otc04,
Ots08, AddrByte, BINovly.
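As a sanity check, the overlay setting just described can be packed into an
actual register value. This is only a sketch - ngle_ba() and
ngle_ba_overlay_direct() are made-up names, the field values are copied from
the defines above, and unsigned arithmetic is used so that shifting F into
bit 31 is well defined:

```c
#include <assert.h>
#include <stdint.h>

/* field values copied from the defines above */
enum { IndexedDcd = 0, FractDcd = 1 };
enum { Otc04 = 2, Otc32 = 5, Otc01 = 7 };
enum { Ots08 = 3, OtsIndirect = 6 };
enum { AddrByte = 3, AddrLong = 5, Addr24 = 7 };
enum { BINovly = 0x2, BINattr = 0xd };

/*
 * Same packing as the BA() macro, as a function. Hypothetical helper,
 * not taken from any driver.
 */
static uint32_t
ngle_ba(uint32_t f, uint32_t c, uint32_t s, uint32_t a, uint32_t j,
    uint32_t b, uint32_t i)
{
	/* each field has to fit the width shown in the bit layout comment */
	assert(f < 2 && c < 16 && s < 8 && a < 8 && j < 32 && b < 16 &&
	    i < 4096);
	return (f << 31) | (c << 27) | (s << 24) | (a << 21) | (j << 16) |
	    (b << 12) | i;
}

/* direct 8bit access to the HCRX overlay */
static uint32_t
ngle_ba_overlay_direct(void)
{
	return ngle_ba(IndexedDcd, Otc04, Ots08, AddrByte, 0, BINovly, 0);
}
```

With these values ngle_ba_overlay_direct() comes out as 0x13602000, which
would presumably go into NGLE_BAboth.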
Framebuffer geometry is always 2048 pixels wide ( with pixel size determined by
Addr* ) by whatever height your hardware allows, memory outside the visible
screen may or may not be accessible.
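The fixed pitch makes address calculation trivial. A minimal sketch, assuming
AddrByte presents one byte per pixel and AddrLong four - ngle_fb_offset() is a
hypothetical helper, not something found in the drivers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NGLE_PITCH_PIXELS	2048	/* fixed pitch in pixels */

/*
 * Byte offset of pixel (x, y) inside the framebuffer aperture.
 * bytes_per_pixel is how Addr* presents a pixel to the CPU, assumed to be
 * 1 for AddrByte and 4 for AddrLong.
 */
static size_t
ngle_fb_offset(unsigned x, unsigned y, unsigned bytes_per_pixel)
{
	assert(x < NGLE_PITCH_PIXELS);
	assert(bytes_per_pixel == 1 || bytes_per_pixel == 4);
	return ((size_t)y * NGLE_PITCH_PIXELS + x) * bytes_per_pixel;
}
```

The last visible pixel of a 1280x1024 screen in AddrByte would then be
ngle_fb_offset(1279, 1023, 1).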
HCRX always runs in 1280x1024, there is always an overlay and at least one 8bit
image buffer, HCRX24 has a 24bit buffer that can be used as two 8bit buffers.
There is no usable off-screen memory, in fact there seem to be registers to the
right of the visible area.
On a PCI Visualize EG we get an actual 2048x2048 buffer which we can use any
way we want.
Finally, the xf86 code writes a one into
#define	NGLE_CONTROL_FB		0x200005
before framebuffer access, function is unknown but I suspect it turns off
pipeline pacing, which is then re-enabled whenever we touch the blitter.

3. Drawing engine
Basically, you poke coordinates into registers and apply an opcode to the last
write's address to start an operation ( and specify which ), and there are
registers to control drawing mode, ROPs etc.
All register writes go through a pipeline which has 32 entries on HCRX.

#define	NGLE_BUSY		0x200000	/* busy register */
the first byte will be non-zero if the drawing engine is busy

#define	NGLE_FIFO		0x200008	/* # of fifo slots */
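A driver would typically poll these two registers before queueing work. The
sketch below assumes that NGLE_FIFO reports the number of free pipeline
slots, which is a guess based on the comment above. struct fake_ngle and the
fake_read*() functions are made up and only stand in for real register reads
so the loops can be tried without hardware:

```c
#include <assert.h>
#include <stdint.h>

#define	NGLE_BUSY	0x200000	/* busy register */
#define	NGLE_FIFO	0x200008	/* # of fifo slots */

/* stand-in for the hardware, a real driver would do MMIO reads */
struct fake_ngle {
	unsigned busy_polls;	/* BUSY reads that still report 'busy' */
	uint32_t fifo_free;	/* free slots reported by NGLE_FIFO */
	unsigned reads;		/* total register reads, for inspection */
};

static uint8_t
fake_read1(struct fake_ngle *hw, uint32_t reg)
{
	assert(reg == NGLE_BUSY);
	hw->reads++;
	if (hw->busy_polls > 0) {
		hw->busy_polls--;
		return 1;
	}
	return 0;
}

static uint32_t
fake_read4(struct fake_ngle *hw, uint32_t reg)
{
	assert(reg == NGLE_FIFO);
	hw->reads++;
	return hw->fifo_free++;	/* pretend one slot drains per poll */
}

/* spin until the first byte of NGLE_BUSY reads zero */
static void
ngle_wait_idle(struct fake_ngle *hw)
{
	while (fake_read1(hw, NGLE_BUSY) != 0)
		continue;
}

/* spin until at least 'slots' pipeline entries are free */
static void
ngle_wait_fifo(struct fake_ngle *hw, uint32_t slots)
{
	assert(slots <= 32);	/* pipeline depth on HCRX */
	while (fake_read4(hw, NGLE_FIFO) < slots)
		continue;
}
```

Something like ngle_wait_fifo(hw, 3) before a three-register operation and
ngle_wait_idle(hw) before touching the framebuffer aperture would be the
obvious use.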

X and width in the upper 16bit, Y / height in the lower.
#define	NGLE_DST_XY		0x000800	/* destination XY */
#define	NGLE_SIZE		0x000804	/* size WH */
#define	NGLE_SRC_XY		0x000808	/* source XY */
#define	NGLE_TRANSFER_DATA	0x000820	/* 'transfer data' - this is */
						/* a pixel mask on fills */
#define NGLE_RECT		0x000200	/* opcode to start a fill */
#define NGLE_BLIT		0x000300	/* opcode to start a blit */
#define NGLE_HCRX_FASTFILL	0x000140	/* opcode for HCRX fast rect */
#define	NGLE_RECT_SIZE_START	(NGLE_SIZE | NGLE_RECT)
#define	NGLE_BLT_DST_START	(NGLE_DST_XY | NGLE_BLIT)

So, in order to draw a rectangle you write coordinates into NGLE_DST_XY, set
NGLE_TRANSFER_DATA to all ones unless you want it stippled, then write the
width/height into NGLE_SIZE|NGLE_RECT. Rectangle fills move the destination
coordinates down by the rectangle's height.
NGLE_BLIT copies a rectangle from SRC_XY to DST_XY with ROP etc. applied. It is
possible to copy data between buffers, supported combinations of source and
destination access modes need to be investigated.
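The two sequences just described could look roughly like this. It is a
sketch, not driver code: struct reg_log and log_write4() record the writes
instead of touching hardware, ngle_xy(), ngle_rectfill() and ngle_bitblt() are
made-up names, and the write order for the blit is inferred from
NGLE_BLT_DST_START. Access mode, ROP and colours are assumed to be set up
already:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	NGLE_DST_XY		0x000800
#define	NGLE_SIZE		0x000804
#define	NGLE_SRC_XY		0x000808
#define	NGLE_TRANSFER_DATA	0x000820
#define	NGLE_RECT		0x000200
#define	NGLE_BLIT		0x000300

/* records register writes so the sequence can be inspected */
struct reg_log {
	uint32_t reg[16], val[16];
	size_t n;
};

static void
log_write4(struct reg_log *l, uint32_t reg, uint32_t val)
{
	assert(l->n < 16);
	l->reg[l->n] = reg;
	l->val[l->n] = val;
	l->n++;
}

/* X or width in the upper 16 bits, Y or height in the lower 16 bits */
static uint32_t
ngle_xy(unsigned x, unsigned y)
{
	assert(x <= 0xffff && y <= 0xffff);
	return ((uint32_t)x << 16) | y;
}

static void
ngle_rectfill(struct reg_log *l, unsigned x, unsigned y, unsigned w,
    unsigned h)
{
	log_write4(l, NGLE_DST_XY, ngle_xy(x, y));
	log_write4(l, NGLE_TRANSFER_DATA, 0xffffffff);	/* no stipple */
	/* the opcode in the address starts the fill */
	log_write4(l, NGLE_SIZE | NGLE_RECT, ngle_xy(w, h));
}

static void
ngle_bitblt(struct reg_log *l, unsigned sx, unsigned sy, unsigned dx,
    unsigned dy, unsigned w, unsigned h)
{
	log_write4(l, NGLE_SRC_XY, ngle_xy(sx, sy));
	log_write4(l, NGLE_SIZE, ngle_xy(w, h));
	/* the opcode in the address starts the copy */
	log_write4(l, NGLE_DST_XY | NGLE_BLIT, ngle_xy(dx, dy));
}
```

Since a fill leaves the destination Y advanced by the height, several
rectangles stacked on top of each other should only need the size write
repeated - untested.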
There are likely other opcodes for things like vectors, triangles and so on.
HCRX_FASTFILL is implied by the xf86 code, but not actually used. It seems to
work, more or less, but with strange side effects. More investigation is needed.

#define	NGLE_CPR		0x01800c	/* control plane register */
This is used when drawing into BINattr, on EG we use 0x00000102, on HCRX
0x04000F00 for 24bit. There has to be some conversion, there is no way the
attribute plane is actually 32bit. No idea what the individual bits do, has to
be a combination of buffer selection ( front or back ), colour mode / LUT
selection, likely chip specific. Known values are from xf86.

#define	NGLE_FG			0x018010	/* fg colour */
#define	NGLE_BG			0x018014	/* bg colour */
#define	NGLE_PLANEMASK		0x018018	/* image planemask */
#define	NGLE_IBO		0x01801c	/* image binary op */

#define IBOvals(R,M,X,S,D,L,B,F)					\
	(((R)<<8)|((M)<<16)|((X)<<24)|((S)<<29)|((D)<<28)|((L)<<31)|((B)<<1)|(F))
	/* LSSD XXXX MMMM MMMM RRRR RRRR ???? ??BF */

/* R is a standard X11 ROP, no idea if the other bits are used for anything  */
#define	    RopClr 	0x0
#define	    RopSrc 	0x3
#define	    RopInv 	0xc
#define	    RopSet 	0xf
/* M: 'mask addr offset' - function unknown */
/* X */
#define	    BitmapExtent08  3	/* Each write hits ( 8) bits in depth */
#define	    BitmapExtent32  5	/* Each write hits (32) bits in depth */
/* S: 'static reg' flag, NGLE sets it for blits, function is unknown but
      we get occasional garbage in 8bit blits without it  */
/* D */
#define	    DataDynamic	    0	/* Data register reloaded by direct access */
#define	    MaskDynamic	    1	/* Mask register reloaded by direct access */
/* L */
I suspect this selects how many mask bits to use in Otc* less than 32.
#define	    MaskOtc	    0	/* Mask contains Object Count valid bits */
/* B = 1 -> background transparency for masked fills */
/* F probably the same for foreground */

These bit definitions are from xf86, the S bit seems to control masking off
extra bits when the pixels written per access ( Otc* ) would extend past the
right border.
Not sure what exactly the *Dynamic and MaskOtc bits do.
For a plain rectangle fill into the overlay we would use
IBOvals(RopSrc, 0, BitmapExtent08, 1, DataDynamic, 0, 0, 0)
and
BA(IndexedDcd, Otc32, OtsIndirect, AddrLong, 0, BINovly, 0)
... which draws 32 pixels at a time, without the S bit we would get our width
expanded to the next multiple of 32. Enable colour expansion so we draw in
whatever colour is in NGLE_FG.

For a simple copy we would use
BA(IndexedDcd, Otc04, Ots08, AddrLong, 0, BINovly, 0)
... to copy four pixels at a time, Addr* doesn't seem to matter, disable colour
expansion.
IBOvals(RopSrc, 0, BitmapExtent08, 1, DataDynamic, MaskOtc, 0, 0)
... to write 8bit deep, plain copy, mask off extra pixels if our width isn't a
multiple of 4.

To do the same operations on a 24bit buffer just use Otc01, FractDcd and
BitmapExtent32. No need to set the S bit on copies since all pixels are 32bit
anyway.
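To make the combinations above a bit more concrete, here is a sketch that
packs them into register values. All names ( struct ngle_mode, ngle_ba(),
ngle_ibo(), ngle_mode_*() ) are invented for this example, and using BINapp0F8
as the 24bit target is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* field values copied from the defines above */
enum { IndexedDcd = 0, FractDcd = 1 };
enum { Otc04 = 2, Otc32 = 5, Otc01 = 7 };
enum { Ots08 = 3, OtsIndirect = 6 };
enum { AddrLong = 5 };
enum { BINovly = 0x2, BINapp0F8 = 0xa };
enum { RopSrc = 0x3 };
enum { BitmapExtent08 = 3, BitmapExtent32 = 5 };
enum { DataDynamic = 0, MaskOtc = 0 };

struct ngle_mode {
	uint32_t ba;	/* bitmap access value */
	uint32_t ibo;	/* image binary op value */
};

/* BA() and IBOvals() as functions, with unsigned shifts */
static uint32_t
ngle_ba(uint32_t f, uint32_t c, uint32_t s, uint32_t a, uint32_t j,
    uint32_t b, uint32_t i)
{
	assert(f < 2 && c < 16 && s < 8 && a < 8 && j < 32 && b < 16 &&
	    i < 4096);
	return (f << 31) | (c << 27) | (s << 24) | (a << 21) | (j << 16) |
	    (b << 12) | i;
}

static uint32_t
ngle_ibo(uint32_t r, uint32_t m, uint32_t x, uint32_t s, uint32_t d,
    uint32_t l, uint32_t b, uint32_t f)
{
	assert(r < 256 && m < 256 && x < 16 && s < 4 && d < 2 && l < 2 &&
	    b < 2 && f < 2);
	return (r << 8) | (m << 16) | (x << 24) | (s << 29) | (d << 28) |
	    (l << 31) | (b << 1) | f;
}

/* solid fill into the 8bit overlay via colour expansion */
static struct ngle_mode
ngle_mode_fill8(void)
{
	struct ngle_mode m;

	m.ba = ngle_ba(IndexedDcd, Otc32, OtsIndirect, AddrLong, 0, BINovly, 0);
	m.ibo = ngle_ibo(RopSrc, 0, BitmapExtent08, 1, DataDynamic, 0, 0, 0);
	return m;
}

/* plain copy within the 8bit overlay, S bit set */
static struct ngle_mode
ngle_mode_copy8(void)
{
	struct ngle_mode m;

	m.ba = ngle_ba(IndexedDcd, Otc04, Ots08, AddrLong, 0, BINovly, 0);
	m.ibo = ngle_ibo(RopSrc, 0, BitmapExtent08, 1, DataDynamic, MaskOtc,
	    0, 0);
	return m;
}

/* plain copy within a 24bit buffer, no S bit needed */
static struct ngle_mode
ngle_mode_copy24(void)
{
	struct ngle_mode m;

	m.ba = ngle_ba(FractDcd, Otc01, Ots08, AddrLong, 0, BINapp0F8, 0);
	m.ibo = ngle_ibo(RopSrc, 0, BitmapExtent32, 0, DataDynamic, MaskOtc,
	    0, 0);
	return m;
}
```

The ibo member is meant for NGLE_IBO, the ba member for whichever of the BA
registers applies to the operation.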

4. Indirect framebuffer writes
HP calls the mechanism 'BINC', no idea what it stands for. Basically, you set a
target address and then write data into registers which trigger operations
programmed in DBA and IBO, with the target address being updated according to
which data register we write to. There is also a mechanism to copy blocks, used
for colour map and cursor bitmaps.

#define	NGLE_BINC_SRC		0x000480	/* BINC src */
#define	NGLE_BINC_DST		0x0004a0	/* BINC dst */
#define	NGLE_BINC_MASK		0x0005a0	/* BINC pixel mask */
#define	NGLE_BINC_DATA		0x0005c0	/* BINC data, inc X, some sort of blending */
#define	NGLE_BINC_DATA_R	0x000600	/* BINC data, inc X */
#define	NGLE_BINC_DATA_D	0x000620	/* BINC data, inc Y */
#define	NGLE_BINC_DATA_U	0x000640	/* BINC data, dec Y */
#define	NGLE_BINC_DATA_L	0x000660	/* BINC data, dec X */
#define	NGLE_BINC_DATA_DR	0x000680	/* BINC data, inc X, inc Y */
#define	NGLE_BINC_DATA_DL	0x0006a0	/* BINC data, dec X, inc Y */
#define	NGLE_BINC_DATA_UR	0x0006c0	/* BINC data, inc X, dec Y */
#define	NGLE_BINC_DATA_UL	0x0006e0	/* BINC data, dec X, dec Y */

SRC and DST are 'linear' addresses, depending on Addr* in DBA, the pitch is
2048 times the pixel size selected by Addr*.
The BINC_DATA registers differ only in the way the destination address is
updated, up or down a line, left or right by Otc* pixels.
So, in order to draw a 12x20 pixel character to (100,150) we would use the same
DBA and IBO values we used for rectangles, write 0xfff00000 into NGLE_BINC_MASK
to make sure we only write 12 pixels per line, set FG and BG as needed, set
BINC_DST to (100 * 4 + 150 * 8192) - we're in AddrLong - then poke our character
bitmap into NGLE_BINC_DATA_D, one left aligned line at a time.
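The character example could be sketched like this. struct reg_log,
log_write4() and ngle_putglyph() are made up for illustration, writes are
recorded rather than sent to hardware, and DBA, IBO, FG and BG are assumed to
be programmed already with AddrLong selected:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	NGLE_BINC_DST		0x0004a0
#define	NGLE_BINC_MASK		0x0005a0
#define	NGLE_BINC_DATA_D	0x000620

/* records register writes so the sequence can be inspected */
struct reg_log {
	uint32_t reg[40], val[40];
	size_t n;
};

static void
log_write4(struct reg_log *l, uint32_t reg, uint32_t val)
{
	assert(l->n < 40);
	l->reg[l->n] = reg;
	l->val[l->n] = val;
	l->n++;
}

/*
 * Draw a glyph up to 32 pixels wide at (x, y). 'rows' holds one left
 * aligned 32bit word per scanline.
 */
static void
ngle_putglyph(struct reg_log *l, unsigned x, unsigned y, unsigned width,
    unsigned height, const uint32_t *rows)
{
	unsigned i;

	assert(width >= 1 && width <= 32);
	/* keep the leftmost 'width' bits, 0xfff00000 for 12 pixels */
	log_write4(l, NGLE_BINC_MASK, 0xffffffffu << (32 - width));
	/* AddrLong: 4 bytes per pixel, 2048 pixels ( 8192 bytes ) per line */
	log_write4(l, NGLE_BINC_DST, x * 4 + y * 8192);
	/* each write draws one line and moves the target down by one */
	for (i = 0; i < height; i++)
		log_write4(l, NGLE_BINC_DATA_D, rows[i]);
}
```

For a 12x20 glyph at (100,150) that is one mask write, one address write and
twenty data writes.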
BINC operations by themselves are unlikely to overrun the pipeline but they may
if a lot of them happen while something more time consuming, like a full screen
scroll, is in progress.
Not sure what exactly NGLE_BINC_DATA does, the xf86 code uses it for colour map
updates.

5. Colour maps
LUTs are held in their own buffer ( BINcmap ), size is likely chip-specific.
HCRX has room for at least three 256 entry colour maps, EG probably has two or
three.
Basically, we BINC-write our colour map into BINcmap, then tell the hardware to
update the actual colour map(s) from that buffer.
We'd use:
BA(FractDcd, Otc01, Ots08, Addr24, 0, BINcmap, 0)
IBOvals(RopSrc, 0, BitmapExtent08, 0, DataDynamic, MaskOtc, 0, 0)
Not sure how 'Addr24' differs from AddrLong, but that's what the xf86 code uses.
Then set BINC_DST to 0 ( or whichever entry we want to update - 4 for the 2nd
entry etc. ) and poke our colour map into BINC_DATA_R, one entry at a time.
Sending it to the DAC works like this - set BINC_SRC to 0, then write a command
into the appropriate LUTBLT register:
#define	NGLE_EG_LUTBLT		0x200118	/* EG LUT blt ctrl */
	/* EWRRRROO OOOOOOOO TTRRRRLL LLLLLLLL */
	#define LBC_ENABLE	0x80000000
	#define LBC_WAIT_BLANK	0x40000000
	#define LBS_OFFSET_SHIFT	16
	#define LBC_TYPE_MASK		0xc000
	#define LBC_TYPE_CMAP		0
	#define LBC_TYPE_CURSOR		0x8000
	#define LBC_TYPE_OVERLAY	0xc000
	#define LBC_LENGTH_SHIFT	0
In order to update the whole thing we would use
LBC_ENABLE | LBC_TYPE_CMAP | 0x100
Length and offset are in 32bit words.
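Put together, a colour map upload on EG might look like the sketch below.
ngle_lutblt_cmd(), ngle_eg_load_cmap() and the logging helpers are
hypothetical names, the 10 bit limit on offset and length is read off the
layout comment above, and DBA / IBO are assumed to hold the BINcmap values
given earlier:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	NGLE_BINC_SRC		0x000480
#define	NGLE_BINC_DST		0x0004a0
#define	NGLE_BINC_DATA_R	0x000600
#define	NGLE_EG_LUTBLT		0x200118
#define	LBC_ENABLE		0x80000000
#define	LBS_OFFSET_SHIFT	16
#define	LBC_TYPE_CMAP		0
#define	LBC_LENGTH_SHIFT	0

/* records register writes so the sequence can be inspected */
struct reg_log {
	uint32_t reg[260], val[260];
	size_t n;
};

static void
log_write4(struct reg_log *l, uint32_t reg, uint32_t val)
{
	assert(l->n < 260);
	l->reg[l->n] = reg;
	l->val[l->n] = val;
	l->n++;
}

/* build a LUTBLT command, offset and length in 32bit words */
static uint32_t
ngle_lutblt_cmd(uint32_t type, unsigned offset, unsigned length)
{
	assert(offset < 1024 && length < 1024);
	return LBC_ENABLE | type | ((uint32_t)offset << LBS_OFFSET_SHIFT) |
	    ((uint32_t)length << LBC_LENGTH_SHIFT);
}

/* write 'count' entries starting at entry 0, then hand them to the DAC */
static void
ngle_eg_load_cmap(struct reg_log *l, const uint32_t *cmap, unsigned count)
{
	unsigned i;

	assert(count >= 1 && count <= 256);
	log_write4(l, NGLE_BINC_DST, 0);
	for (i = 0; i < count; i++)
		log_write4(l, NGLE_BINC_DATA_R, cmap[i]);
	log_write4(l, NGLE_BINC_SRC, 0);
	log_write4(l, NGLE_EG_LUTBLT,
	    ngle_lutblt_cmd(LBC_TYPE_CMAP, 0, count));
}
```

For a full 256 entry map the final command is the same
LBC_ENABLE | LBC_TYPE_CMAP | 0x100 as above. Whether LBC_WAIT_BLANK should be
added to avoid visible glitches has not been tested here.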

HCRX uses a different register:
#define	NGLE_HCRX_LUTBLT	0x210020	/* HCRX LUT blt ctrl */
... which otherwise works exactly the same way.

On HCRX we need:
- a linear ramp in the first 256 entries, 24bit output goes through this.
- the overlay's colour map starts at entry 512
- hardware sprite colours are controlled by two entries using LBC_TYPE_CURSOR
  and offset 0

On EG:
- the main colour map lives at offset 0, type LBC_TYPE_CMAP
- four entries at offset 0 with LBC_TYPE_CURSOR, the first two do nothing, the
  other two are cursor sprite colours

6. Hardware cursor
Again, chip-specific. Cursor position works the same on HCRX and PCI EG, but
uses different registers. Older chips use a different register layout.
Bitmap access is different on HCRX, both support a 64x64 sprite.

#define	NGLE_EG_CURSOR		0x200100	/* cursor coordinates on EG */
	#define EG_ENABLE_CURSOR	0x80000000
#define	NGLE_HCRX_CURSOR	0x210000	/* HCRX cursor coord & enable */
	#define HCRX_ENABLE_CURSOR	0x80000000
Coordinates are signed 12bit quantities, X in the upper halfword, Y in the
lower, enable bit at 0x80000000. There is no hotspot register, negative
coordinates will move the sprite partially off screen as expected.
On HCRX we need to write zero into
#define	NGLE_HCRX_VBUS		0x000420	/* HCRX video bus access */
before writing NGLE_HCRX_CURSOR.
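A sketch of moving the sprite on HCRX follows. It only handles non-negative
positions, since how negative values have to be encoded in the 12bit fields
is not spelled out here. ngle_hcrx_move_cursor() and the logging helpers are
made-up names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	NGLE_HCRX_VBUS		0x000420
#define	NGLE_HCRX_CURSOR	0x210000
#define	HCRX_ENABLE_CURSOR	0x80000000

/* records register writes so the sequence can be inspected */
struct reg_log {
	uint32_t reg[8], val[8];
	size_t n;
};

static void
log_write4(struct reg_log *l, uint32_t reg, uint32_t val)
{
	assert(l->n < 8);
	l->reg[l->n] = reg;
	l->val[l->n] = val;
	l->n++;
}

static void
ngle_hcrx_move_cursor(struct reg_log *l, unsigned x, unsigned y, int enable)
{
	uint32_t pos;

	/* positive half of a signed 12bit range only */
	assert(x < 2048 && y < 2048);
	pos = ((uint32_t)x << 16) | y;	/* X upper halfword, Y lower */
	if (enable)
		pos |= HCRX_ENABLE_CURSOR;
	log_write4(l, NGLE_HCRX_VBUS, 0);	/* required before the write */
	log_write4(l, NGLE_HCRX_CURSOR, pos);
}
```

Since there is no hotspot register the caller has to subtract the hotspot
from the pointer position before calling this.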

Cursor bitmap access on HCRX is simple:
#define	NGLE_HCRX_CURSOR_ADDR	0x210004	/* HCRX cursor address */
#define	NGLE_HCRX_CURSOR_DATA	0x210008	/* HCRX cursor data */
The mask is at offset 0, bitmap at 0x80. Subsequent writes to CURSOR_DATA update
the address as expected.
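In code this could look as follows - a sketch which assumes one bit per
pixel, an address that counts 32bit words ( 64 * 64 / 32 is 0x80, which
matches the bitmap offset ) and an increment of one per data write.
ngle_hcrx_load_cursor() and the logging helpers are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	NGLE_HCRX_CURSOR_ADDR	0x210004
#define	NGLE_HCRX_CURSOR_DATA	0x210008
#define	CURSOR_WORDS		(64 * 64 / 32)	/* words per 1bpp plane */

/* records register writes so the sequence can be inspected */
struct reg_log {
	uint32_t reg[260], val[260];
	size_t n;
};

static void
log_write4(struct reg_log *l, uint32_t reg, uint32_t val)
{
	assert(l->n < 260);
	l->reg[l->n] = reg;
	l->val[l->n] = val;
	l->n++;
}

static void
ngle_hcrx_load_cursor(struct reg_log *l, const uint32_t *mask,
    const uint32_t *image)
{
	unsigned i;

	assert(mask != NULL && image != NULL);
	log_write4(l, NGLE_HCRX_CURSOR_ADDR, 0);	/* mask plane */
	for (i = 0; i < CURSOR_WORDS; i++)
		log_write4(l, NGLE_HCRX_CURSOR_DATA, mask[i]);
	/* probably redundant after 0x80 data writes, but explicit */
	log_write4(l, NGLE_HCRX_CURSOR_ADDR, 0x80);	/* bitmap plane */
	for (i = 0; i < CURSOR_WORDS; i++)
		log_write4(l, NGLE_HCRX_CURSOR_DATA, image[i]);
}
```

Both arrays hold 128 words, two per sprite line.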

On EG we have to use BINC writes:
BA(IndexedDcd, Otc32, 0, AddrLong, 0, BINcmask, 0)
IBOvals(RopSrc, 0, 0, 0, DataDynamic, MaskOtc, 0, 0)
set BINC_DST to 0, then poke the mask into NGLE_BINC_DATA_R and
NGLE_BINC_DATA_DL - write 32bit, move right, write the rest of the line, move
down/left to the next line etc.
No LUTBLT analog here, for the cursor bitmap use BINcursor.
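The zig-zag write pattern for one 64x64 plane might be sketched like this.
ngle_eg_load_cursor_plane() and the logging helpers are invented names, and
DBA / IBO are assumed to be set to the values above, with BINcmask or
BINcursor selecting the plane:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	NGLE_BINC_DST		0x0004a0
#define	NGLE_BINC_DATA_R	0x000600
#define	NGLE_BINC_DATA_DL	0x0006a0

/* records register writes so the sequence can be inspected */
struct reg_log {
	uint32_t reg[160], val[160];
	size_t n;
};

static void
log_write4(struct reg_log *l, uint32_t reg, uint32_t val)
{
	assert(l->n < 160);
	l->reg[l->n] = reg;
	l->val[l->n] = val;
	l->n++;
}

/* 'bits' holds 64 lines of two 32bit words each */
static void
ngle_eg_load_cursor_plane(struct reg_log *l, const uint32_t *bits)
{
	unsigned line;

	assert(bits != NULL);
	log_write4(l, NGLE_BINC_DST, 0);
	for (line = 0; line < 64; line++) {
		/* left 32 pixels, then step right */
		log_write4(l, NGLE_BINC_DATA_R, bits[line * 2]);
		/* right 32 pixels, then step down and back left */
		log_write4(l, NGLE_BINC_DATA_DL, bits[line * 2 + 1]);
	}
}
```

The same function would be called twice, once per plane, with the BIN* field
in DBA changed in between.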

7. Miscellaneous
#define	NGLE_HCRX_PLANE_ENABLE	0x21003c	/* HCRX plane enable */
#define	NGLE_HCRX_MISCVID	0x210040	/* HCRX misc video */
	#define HCRX_BOOST_ENABLE	0x80000000 /* extra high signal level */
	#define HCRX_VIDEO_ENABLE	0x0A000000
	#define HCRX_OUTPUT_ENABLE	0x01000000
xf86 uses HCRX_VIDEO_ENABLE, the other bits were found by experiment, functions
are guesswork. There are other bits with unknown function.

This is set by xf86, other values unknown.
#define	NGLE_HCRX_HB_MODE2	0x210120	/* HCRX 'hyperbowl' mode 2 */
	#define HYPERBOWL_MODE2_8_24					15

This seems to be the HCRX's analogue to FX's force attribute register - we can
switch between overlay opacity and image plane display mode on the fly
#define	NGLE_HCRX_HB_MODE	0x210130	/* HCRX 'hyperbowl' */
	#define HYPERBOWL_MODE_FOR_8_OVER_88_LUT0_NO_TRANSPARENCIES	4
	#define HYPERBOWL_MODE01_8_24_LUT0_TRANSPARENT_LUT1_OPAQUE	8
	#define HYPERBOWL_MODE01_8_24_LUT0_OPAQUE_LUT1_OPAQUE		10

8. Visualize EG notes
All references to 'EG' and the like strictly refer to the PCI Visualize EG card
with 4MB video memory. There is a GSC variant which may have 2MB or 4MB, other
differences are unknown.
The xf86 code does not support the PCI EG at all, it seems to be somewhat
similar to the 'Artist' variant, the cursor register is at the same address but
works as on HCRX. I suspect the GSC variant to be more like Artist.
It is possible to put cards with enough memory into double buffer mode using
the firmware configuration menu - I need to figure out what exactly that does.
Same with grey scale mode, which may just select a different default palette.