This file is NOT up to date for the New Design!


============== old (pre-ND) contents below ==============

"I just thought it would be useful if we had some kind of TODO and BUGS
files in the distribution, as it would make it easier to see what needs to
be done and what could be done better, instead of browsing through the
source code. And we would be able to see the progress literally by the
ever-decreasing TODO file :-)"


## BUGS:

All Tseng cards:

* We definitely NEED to fix that color-expansion problem. See Appendix A
below for a detailed explanation.

* There are still some problems with the HW cursor. The error message about
"wrong color selected" is disabled, and the limitation is documented. It
would be better to switch dynamically to software-cursor mode if the color
cannot be produced. The HW cursor doesn't work in DoubleScan modes yet (only
half of the cursor is displayed).

* The text font is sometimes corrupted when going back to text mode. This
may be related to the order in which registers are restored: the ARK driver
restores the extended registers before the standard registers for exactly
this reason.

* The code needs to be heavily reworked to fix all sorts of data type
problems. The current code will certainly not run on an Alpha. The first
step is to replace all hardware-related variables by CARD8/CARD16/CARD32
types.
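
A minimal sketch of that first step, assuming C99 <stdint.h> types as
stand-ins for the X server's CARD8/CARD16/CARD32 typedefs. The helper names
are hypothetical and not taken from the driver. The point is that a value
held in a fixed-width type wraps and masks the same way on every platform,
whereas a plain "long" is 64 bits wide on an Alpha.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the X server's CARD8/CARD16/CARD32 (assumption: the real
 * typedefs come from the X headers; these only mirror their widths). */
typedef uint8_t  CARD8;
typedef uint16_t CARD16;
typedef uint32_t CARD32;

/* Sanity check that the widths match what the hardware registers expect. */
static void check_register_widths(void)
{
    assert(sizeof(CARD8) == 1 && sizeof(CARD16) == 2 && sizeof(CARD32) == 4);
}

/* Hypothetical helper: combine two 8-bit register halves into 16 bits. */
static CARD16 combine_reg16(CARD8 lo, CARD8 hi)
{
    return (CARD16)((CARD16)lo | ((CARD16)hi << 8));
}

/* Hypothetical helper: advance a 32-bit address register value.
 * The result wraps at 2^32 regardless of the width of "long". */
static CARD32 advance_addr32(CARD32 addr, CARD32 delta)
{
    return (CARD32)(addr + delta);
}
```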


ET6000:

* The trapezoid code is disabled because it doesn't comply with the way the
non-accelerated ("cfb") code does things. This needs to be fixed.


ET4000 (W32):

* Hardware cursor support for the W32 still lacks color support. We need to
reserve color cells #0 and #255 to make this work. From discussions on the
development list, it seems the best solution is to allocate these cells
read-write, and then use them for the HW cursor. We MUST however document
that this will break some clients which depend on a fixed color in cell #0,
and some others that rely on the presence of 256 color cells. It will also
cause cursor color problems when someone uses a local color map.


## TODO:

All cards:

* The accelerator on the Tseng devices is capable of much more. In
particular, pattern support goes mostly unused: the accelerator can apply a
pattern in just about every accelerated operation. This means patterned
lines, bitblts, screen copies, etc. are possible. However, operations like
these are very uncommon in normal server use, so the speed benefit would go
largely unnoticed.


ET4000:

* Support needs to be added for several clockchips and RAMDACs:
        - 8-bit RAMDAC support for >8bpp modes: Sierra DACs and possibly others
        - AT&T 20C49x RAMDAC support is not correct.

* SuperProbe could use an update. It doesn't detect some of the RAMDACs that
are detected by the driver.

* Several of the color-expansion-related accelerations are still 8bpp only.
It should be easy to use the same trick on those as in the standard color
expansion code (use an intermediate buffer, expand the data before blitting).

* Many of the operations that the W32 family can't support natively (e.g.
FillRectSolid for 24bpp) can be performed using CPU-to-screen operations,
feeding the correct (color) information through the ACL aperture.
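
For the color-expansion item above, here is a rough sketch of what
"expand the data before blitting" could look like. It assumes the
accelerator consumes one MIX bit per destination byte, so every monochrome
source bit has to be repeated once per byte of a pixel (2x at 16bpp, 3x at
24bpp, 4x at 32bpp). That assumption, the function name and the LSB-first
bit order are ours for illustration; check them against the data book
before relying on any of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Repeat every bit of src bytes_per_pixel times into dst (the intermediate
 * buffer). dst must hold nbytes * bytes_per_pixel bytes.
 * Bits are taken LSB first (assumption). */
static void expand_mix_bits(const uint8_t *src, size_t nbytes,
                            int bytes_per_pixel, uint8_t *dst)
{
    size_t out_bit = 0;

    assert(bytes_per_pixel >= 1 && bytes_per_pixel <= 4);
    memset(dst, 0, nbytes * (size_t)bytes_per_pixel);

    for (size_t i = 0; i < nbytes * 8; i++) {
        int bit = (src[i / 8] >> (i % 8)) & 1;

        /* Emit the same bit once for each byte of the destination pixel. */
        for (int r = 0; r < bytes_per_pixel; r++, out_bit++) {
            if (bit)
                dst[out_bit / 8] |= (uint8_t)(1u << (out_bit % 8));
        }
    }
}
```

A real driver would more likely use a lookup table per source byte than a
bit loop; the loop is only meant to show the data layout.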


ET6000:

* Someone might want to look at how the bitBLT engine of the ET6000 is
constructed, and come up with some fancy ways of abusing it. We're still
only using a small part of it (I'm thinking of the compare map and the
extensions to the MIX hardware compared to the ET4000).

* MClk support is still lacking (it would also allow an MClk-dependent
maximum bandwidth).

* Apart from the things mentioned above, I think the ET6000 server is
pretty complete. Some optimisations could still be added, for example some
assembler code for calculating a framebuffer address from X/Y coordinates.
That would help to speed up small blits.
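
For reference, the calculation in question is small. A plain C sketch,
assuming a linear framebuffer; the function and parameter names are
hypothetical, and "line_pitch" is taken to mean the length of one scanline
in bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of pixel (x, y) from the start of the framebuffer.
 * line_pitch is the length of one scanline in bytes (assumed name). */
static uint32_t fb_offset(uint32_t x, uint32_t y,
                          uint32_t line_pitch, uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel >= 1 && bytes_per_pixel <= 4);
    return y * line_pitch + x * bytes_per_pixel;
}
```

For example, fb_offset(10, 2, 1024, 1) gives 2058. Any assembler version
would have to beat one or two multiplies and an add, so it is worth
measuring before writing it.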


=======================================================================
APPENDIX A: the color expansion problem
----------------------------------------

As suggested in the data book, we're doing font rendering using the
color-expansion (MIX map) capabilities of the Tseng accelerator.

We're using a ping-pong buffer scheme (triple buffering actually) in
off-screen memory to store one scanline worth of font data at a time. Each
of these scanlines is "blitted" to on-screen memory using the accelerator.
The scanline is the MIX map, and there's also a 4x1 solid foreground color
(SRC map), and a 4x1 solid background color (PAT map).

Basically, the flow is as follows:

	- set up the accelerator for font expansion

	- store scanline 1 in off-screen memory buffer 1

	- start operation

	- store scanline 2 in off-screen memory buffer 2

	- start operation

	- store scanline 3 in off-screen memory buffer 3

	- start operation

	- store scanline 4 in off-screen memory buffer 1

	- start operation

	... etc., until the whole line of text is drawn.
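
The flow above can be sketched as a loop that rotates through the three
off-screen buffers. This is only an illustration: the callback types and
the trace helpers are hypothetical stand-ins for the real driver
operations, and scanlines and buffers are numbered from 0 here, so
"scanline 4 in buffer 1" becomes line 3 in buffer 0.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SCANLINE_BUFFERS 3   /* triple buffering, as described above */

/* Hypothetical hooks standing in for the real driver operations. */
typedef void (*store_scanline_fn)(int buffer, int line, void *ctx);
typedef void (*start_operation_fn)(int buffer, int line, void *ctx);

/* Which off-screen buffer a (0-based) scanline goes into. */
static int buffer_for_scanline(int line)
{
    return line % NUM_SCANLINE_BUFFERS;
}

/* Draw `height` scanlines of text. There is no explicit wait between
 * operations: the hardware is assumed to add wait states when its queue
 * is full. */
static void draw_text_line(int height, store_scanline_fn store,
                           start_operation_fn start, void *ctx)
{
    assert(height >= 0 && store != NULL && start != NULL);
    for (int line = 0; line < height; line++) {
        int buffer = buffer_for_scanline(line);
        store(buffer, line, ctx);   /* CPU writes font data off-screen */
        start(buffer, line, ctx);   /* accelerator expands it on-screen */
    }
}

/* Example hooks that only record what happened. */
struct trace { int stores; int starts; int last_buffer; };

static void trace_store(int buffer, int line, void *ctx)
{
    struct trace *t = ctx;
    (void)line;
    t->stores++;
    t->last_buffer = buffer;
}

static void trace_start(int buffer, int line, void *ctx)
{
    struct trace *t = ctx;
    (void)buffer;
    (void)line;
    t->starts++;
}
```

With three buffers, a buffer is reused every third scanline, which is why
stale data in a buffer always comes from three lines earlier.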

There is no explicit "waiting" for the accelerator to finish an operation
before starting a new one, because it has been set up to add "wait-states"
when the queue is full. We're aiming to use concurrency between the
accelerator and the storing of scanlines in the buffers. In any case,
waiting after each operation doesn't help.

Now, in 99% of all cases, text is rendered OK. But in some cases, we're
seeing severe font corruption.

What we're seeing is this: sometimes, exactly 32 pixels of a scanline are
rendered with the scanline data that was there BEFORE, instead of the data
that was just written into the scanline buffer. In other words, 32 pixels of
line 2 (for example) are rendered at line 5. The rest of the scanline can be
OK (i.e. data from scanline 5 is actually written there).

Here's an attempt at showing you what _should_ have been rendered:

1
2   #####################################################################
3
4
5
6   #####################################################################
7
8
9
10  #####################################################################
11
12
13
14  #####################################################################
15


and what _is_ rendered sometimes (only an example):

1
2   #####################################################################
3
4
5
6   ########################                                #############
7
8
9
10  #####################################################################
11
12
13  ########################
14  #####################################################################
15

At line 6, 32 pixels of the "black" scanline data from line 3 are rendered
instead of the full-white data that should be there. At line 13, the
opposite happened (data from line 10 rendered at line 13). This 32-pixel
width of the "bug" is independent of the color depth: we're seeing this at
8bpp as well as at 16bpp, 24bpp and 32bpp. 32 pixels each time.

Remember, we're talking triple-buffering here, so the "wrongly" rendered
data is in fact the data that was in the scanline buffer from the PREVIOUS
operation that used that buffer.

In fact, my best explanation is that sometimes a whole DWORD (32 bits) of
data isn't in the video memory yet by the time the accelerator starts
rendering with it.

But the data _is_ being written there by the driver software, because if
you restart the scanline operation without writing any more data to the
scanline buffers (only the MIX address and the destination address are
reprogrammed to restart the scanline color expansion operation -- see code
in tseng_acl.c), data _is_ rendered correctly.


I have investigated this as far as I possibly can. I checked whether the
data was actually written to video memory. It was. I checked all kinds of
PCI-related things, like write-gathering or write-reordering in the PCI
chipset, etc. I disabled all possible enhanced features, on the PCI chipset,
inside the CPU, and on the ET6000.

What strikes me is that the exact same problems are seen on the ET4000W32p
as on the ET6000. This immediately rules out any special features that were
only added with the ET6000, like problems with the MDRAM cache buffers, etc.
It seems to be a problem common to all Tseng accelerators.

The exact same higher-level code is being used for other chipsets as well
(i.e. the system of writing scanlines of data to off-screen memory and
making the accelerator expand it into on-screen memory), and there are no
problems on these other chipsets. The acceleration architecture we're using
is completely device-independent up to the point where each chip needs to
provide a

	SetupForScanlineScreenToScreenColorExpand()

and a

	SubsequentScanlineScreenToScreenColorExpand()

function.

Since the higher-level code is being used by other chip drivers as well, it
seems to be OK.

So the problem is either in those device-dependent functions, or in the
hardware itself.


I have found one kludge to work around this problem, and it should (?) tell
you a lot about the problem: if I start each scanline color-expansion
operation TWICE, rendering is suddenly perfect (at least there are so few
rendering errors that I haven't seen any yet).
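
To make the kludge concrete, here is a small sketch of it. The hook type
and all names are hypothetical; the only thing taken from the description
above is that restarting an operation means reprogramming the MIX address
and the destination address, and that each operation gets started twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: reprogram the MIX and destination addresses, which
 * (re)starts the scanline color expansion operation. */
typedef void (*start_colexp_fn)(unsigned mix_addr, unsigned dst_addr,
                                void *ctx);

#define COLEXP_START_REPEATS 2   /* the kludge: start every operation twice */

static void start_colexp_with_kludge(start_colexp_fn start,
                                     unsigned mix_addr, unsigned dst_addr,
                                     void *ctx)
{
    assert(start != NULL);
    /* Same addresses both times; no new data is written in between. */
    for (int i = 0; i < COLEXP_START_REPEATS; i++)
        start(mix_addr, dst_addr, ctx);
}

/* Example hook that only counts how often an operation was started. */
static void count_start(unsigned mix_addr, unsigned dst_addr, void *ctx)
{
    (void)mix_addr;
    (void)dst_addr;
    ++*(int *)ctx;
}
```

The obvious cost is that the accelerator renders every scanline twice, so
this is a workaround and not a fix.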


I am including the two device-dependent functions so that you may be able to
follow what I'm saying here:



One entire line of text is drawn by calling the Setup() function ONCE. All
scanlines of text (16 of them in the case of an 8x16 font) are drawn by
filling the off-screen scanline buffers and calling the Subsequent()
function.



$XFree86: xc/programs/Xserver/hw/xfree86/drivers/tseng/README,v 1.12 2000/08/08 08:58:06 eich Exp $