1 1.8 riastrad /* $NetBSD: intel_lrc.c,v 1.8 2021/12/19 12:32:15 riastradh Exp $ */ 2 1.1 riastrad 3 1.1 riastrad /* 4 1.1 riastrad * Copyright 2014 Intel Corporation 5 1.1 riastrad * 6 1.1 riastrad * Permission is hereby granted, free of charge, to any person obtaining a 7 1.1 riastrad * copy of this software and associated documentation files (the "Software"), 8 1.1 riastrad * to deal in the Software without restriction, including without limitation 9 1.1 riastrad * the rights to use, copy, modify, merge, publish, distribute, sublicense, 10 1.1 riastrad * and/or sell copies of the Software, and to permit persons to whom the 11 1.1 riastrad * Software is furnished to do so, subject to the following conditions: 12 1.1 riastrad * 13 1.1 riastrad * The above copyright notice and this permission notice (including the next 14 1.1 riastrad * paragraph) shall be included in all copies or substantial portions of the 15 1.1 riastrad * Software. 16 1.1 riastrad * 17 1.1 riastrad * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 18 1.1 riastrad * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 19 1.1 riastrad * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 20 1.1 riastrad * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 21 1.1 riastrad * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 22 1.1 riastrad * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS 23 1.1 riastrad * IN THE SOFTWARE. 24 1.1 riastrad * 25 1.1 riastrad * Authors: 26 1.1 riastrad * Ben Widawsky <ben (at) bwidawsk.net> 27 1.1 riastrad * Michel Thierry <michel.thierry (at) intel.com> 28 1.1 riastrad * Thomas Daniel <thomas.daniel (at) intel.com> 29 1.1 riastrad * Oscar Mateo <oscar.mateo (at) intel.com> 30 1.1 riastrad * 31 1.1 riastrad */ 32 1.1 riastrad 33 1.1 riastrad /** 34 1.1 riastrad * DOC: Logical Rings, Logical Ring Contexts and Execlists 35 1.1 riastrad * 36 1.1 riastrad * Motivation: 37 1.1 riastrad * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts". 38 1.1 riastrad * These expanded contexts enable a number of new abilities, especially 39 1.1 riastrad * "Execlists" (also implemented in this file). 40 1.1 riastrad * 41 1.1 riastrad * One of the main differences with the legacy HW contexts is that logical 42 1.1 riastrad * ring contexts incorporate many more things to the context's state, like 43 1.1 riastrad * PDPs or ringbuffer control registers: 44 1.1 riastrad * 45 1.1 riastrad * The reason why PDPs are included in the context is straightforward: as 46 1.1 riastrad * PPGTTs (per-process GTTs) are actually per-context, having the PDPs 47 1.1 riastrad * contained there mean you don't need to do a ppgtt->switch_mm yourself, 48 1.1 riastrad * instead, the GPU will do it for you on the context switch. 49 1.1 riastrad * 50 1.1 riastrad * But, what about the ringbuffer control registers (head, tail, etc..)? 51 1.1 riastrad * shouldn't we just need a set of those per engine command streamer? This is 52 1.1 riastrad * where the name "Logical Rings" starts to make sense: by virtualizing the 53 1.1 riastrad * rings, the engine cs shifts to a new "ring buffer" with every context 54 1.1 riastrad * switch. When you want to submit a workload to the GPU you: A) choose your 55 1.1 riastrad * context, B) find its appropriate virtualized ring, C) write commands to it 56 1.1 riastrad * and then, finally, D) tell the GPU to switch to that context. 57 1.1 riastrad * 58 1.1 riastrad * Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch 59 1.1 riastrad * to a contexts is via a context execution list, ergo "Execlists". 60 1.1 riastrad * 61 1.1 riastrad * LRC implementation: 62 1.1 riastrad * Regarding the creation of contexts, we have: 63 1.1 riastrad * 64 1.1 riastrad * - One global default context. 65 1.1 riastrad * - One local default context for each opened fd. 66 1.1 riastrad * - One local extra context for each context create ioctl call. 67 1.1 riastrad * 68 1.1 riastrad * Now that ringbuffers belong per-context (and not per-engine, like before) 69 1.1 riastrad * and that contexts are uniquely tied to a given engine (and not reusable, 70 1.1 riastrad * like before) we need: 71 1.1 riastrad * 72 1.1 riastrad * - One ringbuffer per-engine inside each context. 73 1.1 riastrad * - One backing object per-engine inside each context. 74 1.1 riastrad * 75 1.1 riastrad * The global default context starts its life with these new objects fully 76 1.1 riastrad * allocated and populated. The local default context for each opened fd is 77 1.1 riastrad * more complex, because we don't know at creation time which engine is going 78 1.1 riastrad * to use them. To handle this, we have implemented a deferred creation of LR 79 1.1 riastrad * contexts: 80 1.1 riastrad * 81 1.1 riastrad * The local context starts its life as a hollow or blank holder, that only 82 1.1 riastrad * gets populated for a given engine once we receive an execbuffer. If later 83 1.1 riastrad * on we receive another execbuffer ioctl for the same context but a different 84 1.1 riastrad * engine, we allocate/populate a new ringbuffer and context backing object and 85 1.1 riastrad * so on. 86 1.1 riastrad * 87 1.1 riastrad * Finally, regarding local contexts created using the ioctl call: as they are 88 1.1 riastrad * only allowed with the render ring, we can allocate & populate them right 89 1.1 riastrad * away (no need to defer anything, at least for now). 90 1.1 riastrad * 91 1.1 riastrad * Execlists implementation: 92 1.1 riastrad * Execlists are the new method by which, on gen8+ hardware, workloads are 93 1.1 riastrad * submitted for execution (as opposed to the legacy, ringbuffer-based, method). 94 1.1 riastrad * This method works as follows: 95 1.1 riastrad * 96 1.1 riastrad * When a request is committed, its commands (the BB start and any leading or 97 1.1 riastrad * trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer 98 1.1 riastrad * for the appropriate context. The tail pointer in the hardware context is not 99 1.1 riastrad * updated at this time, but instead, kept by the driver in the ringbuffer 100 1.1 riastrad * structure. A structure representing this request is added to a request queue 101 1.1 riastrad * for the appropriate engine: this structure contains a copy of the context's 102 1.1 riastrad * tail after the request was written to the ring buffer and a pointer to the 103 1.1 riastrad * context itself. 104 1.1 riastrad * 105 1.1 riastrad * If the engine's request queue was empty before the request was added, the 106 1.1 riastrad * queue is processed immediately. Otherwise the queue will be processed during 107 1.1 riastrad * a context switch interrupt. In any case, elements on the queue will get sent 108 1.1 riastrad * (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a 109 1.1 riastrad * globally unique 20-bits submission ID. 110 1.1 riastrad * 111 1.1 riastrad * When execution of a request completes, the GPU updates the context status 112 1.1 riastrad * buffer with a context complete event and generates a context switch interrupt. 113 1.1 riastrad * During the interrupt handling, the driver examines the events in the buffer: 114 1.1 riastrad * for each context complete event, if the announced ID matches that on the head 115 1.1 riastrad * of the request queue, then that request is retired and removed from the queue. 116 1.1 riastrad * 117 1.1 riastrad * After processing, if any requests were retired and the queue is not empty 118 1.1 riastrad * then a new execution list can be submitted. The two requests at the front of 119 1.1 riastrad * the queue are next to be submitted but since a context may not occur twice in 120 1.1 riastrad * an execution list, if subsequent requests have the same ID as the first then 121 1.1 riastrad * the two requests must be combined. This is done simply by discarding requests 122 1.1 riastrad * at the head of the queue until either only one requests is left (in which case 123 1.1 riastrad * we use a NULL second context) or the first two requests have unique IDs. 124 1.1 riastrad * 125 1.1 riastrad * By always executing the first two requests in the queue the driver ensures 126 1.1 riastrad * that the GPU is kept as busy as possible. In the case where a single context 127 1.1 riastrad * completes but a second context is still executing, the request for this second 128 1.1 riastrad * context will be at the head of the queue when we remove the first one. This 129 1.1 riastrad * request will then be resubmitted along with a new request for a different context, 130 1.1 riastrad * which will cause the hardware to continue executing the second request and queue 131 1.1 riastrad * the new request (the GPU detects the condition of a context getting preempted 132 1.1 riastrad * with the same context and optimizes the context switch flow by not doing 133 1.1 riastrad * preemption, but just sampling the new tail pointer). 134 1.1 riastrad * 135 1.1 riastrad */ 136 1.1 riastrad #include <sys/cdefs.h> 137 1.8 riastrad __KERNEL_RCSID(0, "$NetBSD: intel_lrc.c,v 1.8 2021/12/19 12:32:15 riastradh Exp $"); 138 1.1 riastrad 139 1.1 riastrad #include <linux/interrupt.h> 140 1.1 riastrad 141 1.1 riastrad #include "i915_drv.h" 142 1.1 riastrad #include "i915_perf.h" 143 1.1 riastrad #include "i915_trace.h" 144 1.1 riastrad #include "i915_vgpu.h" 145 1.1 riastrad #include "intel_context.h" 146 1.1 riastrad #include "intel_engine_pm.h" 147 1.1 riastrad #include "intel_gt.h" 148 1.1 riastrad #include "intel_gt_pm.h" 149 1.1 riastrad #include "intel_gt_requests.h" 150 1.1 riastrad #include "intel_lrc_reg.h" 151 1.1 riastrad #include "intel_mocs.h" 152 1.1 riastrad #include "intel_reset.h" 153 1.1 riastrad #include "intel_ring.h" 154 1.1 riastrad #include "intel_workarounds.h" 155 1.1 riastrad 156 1.5 riastrad #include <linux/nbsd-namespace.h> 157 1.5 riastrad 158 1.1 riastrad #define RING_EXECLIST_QFULL (1 << 0x2) 159 1.1 riastrad #define RING_EXECLIST1_VALID (1 << 0x3) 160 1.1 riastrad #define RING_EXECLIST0_VALID (1 << 0x4) 161 1.1 riastrad #define RING_EXECLIST_ACTIVE_STATUS (3 << 0xE) 162 1.1 riastrad #define RING_EXECLIST1_ACTIVE (1 << 0x11) 163 1.1 riastrad #define RING_EXECLIST0_ACTIVE (1 << 0x12) 164 1.1 riastrad 165 1.1 riastrad #define GEN8_CTX_STATUS_IDLE_ACTIVE (1 << 0) 166 1.1 riastrad #define GEN8_CTX_STATUS_PREEMPTED (1 << 1) 167 1.1 riastrad #define GEN8_CTX_STATUS_ELEMENT_SWITCH (1 << 2) 168 1.1 riastrad #define GEN8_CTX_STATUS_ACTIVE_IDLE (1 << 3) 169 1.1 riastrad #define GEN8_CTX_STATUS_COMPLETE (1 << 4) 170 1.1 riastrad #define GEN8_CTX_STATUS_LITE_RESTORE (1 << 15) 171 1.1 riastrad 172 1.1 riastrad #define GEN8_CTX_STATUS_COMPLETED_MASK \ 173 1.1 riastrad (GEN8_CTX_STATUS_COMPLETE | GEN8_CTX_STATUS_PREEMPTED) 174 1.1 riastrad 175 1.1 riastrad #define CTX_DESC_FORCE_RESTORE BIT_ULL(2) 176 1.1 riastrad 177 1.1 riastrad #define GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE (0x1) /* lower csb dword */ 178 1.1 riastrad #define GEN12_CTX_SWITCH_DETAIL(csb_dw) ((csb_dw) & 0xF) /* upper csb dword */ 179 1.1 riastrad #define GEN12_CSB_SW_CTX_ID_MASK GENMASK(25, 15) 180 1.1 riastrad #define GEN12_IDLE_CTX_ID 0x7FF 181 1.1 riastrad #define GEN12_CSB_CTX_VALID(csb_dw) \ 182 1.1 riastrad (FIELD_GET(GEN12_CSB_SW_CTX_ID_MASK, csb_dw) != GEN12_IDLE_CTX_ID) 183 1.1 riastrad 184 1.1 riastrad /* Typical size of the average request (2 pipecontrols and a MI_BB) */ 185 1.1 riastrad #define EXECLISTS_REQUEST_SIZE 64 /* bytes */ 186 1.1 riastrad #define WA_TAIL_DWORDS 2 187 1.1 riastrad #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS) 188 1.1 riastrad 189 1.1 riastrad struct virtual_engine { 190 1.1 riastrad struct intel_engine_cs base; 191 1.1 riastrad struct intel_context context; 192 1.1 riastrad 193 1.1 riastrad /* 194 1.1 riastrad * We allow only a single request through the virtual engine at a time 195 1.1 riastrad * (each request in the timeline waits for the completion fence of 196 1.1 riastrad * the previous before being submitted). By restricting ourselves to 197 1.1 riastrad * only submitting a single request, each request is placed on to a 198 1.1 riastrad * physical to maximise load spreading (by virtue of the late greedy 199 1.1 riastrad * scheduling -- each real engine takes the next available request 200 1.1 riastrad * upon idling). 201 1.1 riastrad */ 202 1.1 riastrad struct i915_request *request; 203 1.1 riastrad 204 1.1 riastrad /* 205 1.1 riastrad * We keep a rbtree of available virtual engines inside each physical 206 1.1 riastrad * engine, sorted by priority. Here we preallocate the nodes we need 207 1.1 riastrad * for the virtual engine, indexed by physical_engine->id. 208 1.1 riastrad */ 209 1.1 riastrad struct ve_node { 210 1.1 riastrad struct rb_node rb; 211 1.1 riastrad int prio; 212 1.7 riastrad uint64_t order; 213 1.7 riastrad bool inserted; 214 1.1 riastrad } nodes[I915_NUM_ENGINES]; 215 1.7 riastrad uint64_t order; 216 1.1 riastrad 217 1.1 riastrad /* 218 1.1 riastrad * Keep track of bonded pairs -- restrictions upon on our selection 219 1.1 riastrad * of physical engines any particular request may be submitted to. 220 1.1 riastrad * If we receive a submit-fence from a master engine, we will only 221 1.1 riastrad * use one of sibling_mask physical engines. 222 1.1 riastrad */ 223 1.1 riastrad struct ve_bond { 224 1.1 riastrad const struct intel_engine_cs *master; 225 1.1 riastrad intel_engine_mask_t sibling_mask; 226 1.1 riastrad } *bonds; 227 1.1 riastrad unsigned int num_bonds; 228 1.1 riastrad 229 1.1 riastrad /* And finally, which physical engines this virtual engine maps onto. */ 230 1.1 riastrad unsigned int num_siblings; 231 1.1 riastrad struct intel_engine_cs *siblings[0]; 232 1.1 riastrad }; 233 1.1 riastrad 234 1.7 riastrad #ifdef __NetBSD__ 235 1.7 riastrad static int 236 1.7 riastrad compare_ve_nodes(void *cookie, const void *va, const void *vb) 237 1.7 riastrad { 238 1.7 riastrad const struct ve_node *na = va; 239 1.7 riastrad const struct ve_node *nb = vb; 240 1.7 riastrad 241 1.7 riastrad if (na->prio < nb->prio) 242 1.7 riastrad return -1; 243 1.7 riastrad if (na->prio > nb->prio) 244 1.7 riastrad return +1; 245 1.7 riastrad if (na->order < nb->order) 246 1.7 riastrad return -1; 247 1.7 riastrad if (na->order > nb->order) 248 1.7 riastrad return +1; 249 1.7 riastrad return 0; 250 1.7 riastrad } 251 1.7 riastrad 252 1.7 riastrad static int 253 1.7 riastrad compare_ve_node_key(void *cookie, const void *vn, const void *vk) 254 1.7 riastrad { 255 1.7 riastrad const struct ve_node *n = vn; 256 1.7 riastrad const int *k = vk; 257 1.7 riastrad 258 1.7 riastrad if (n->prio < *k) 259 1.7 riastrad return -1; 260 1.7 riastrad if (n->prio > *k) 261 1.7 riastrad return +1; 262 1.7 riastrad return 0; 263 1.7 riastrad } 264 1.7 riastrad 265 1.7 riastrad static const rb_tree_ops_t ve_tree_ops = { 266 1.7 riastrad .rbto_compare_nodes = compare_ve_nodes, 267 1.7 riastrad .rbto_compare_key = compare_ve_node_key, 268 1.7 riastrad .rbto_node_offset = offsetof(struct ve_node, rb), 269 1.7 riastrad }; 270 1.7 riastrad #endif 271 1.7 riastrad 272 1.1 riastrad static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine) 273 1.1 riastrad { 274 1.1 riastrad GEM_BUG_ON(!intel_engine_is_virtual(engine)); 275 1.1 riastrad return container_of(engine, struct virtual_engine, base); 276 1.1 riastrad } 277 1.1 riastrad 278 1.1 riastrad static int __execlists_context_alloc(struct intel_context *ce, 279 1.1 riastrad struct intel_engine_cs *engine); 280 1.1 riastrad 281 1.1 riastrad static void execlists_init_reg_state(u32 *reg_state, 282 1.1 riastrad const struct intel_context *ce, 283 1.1 riastrad const struct intel_engine_cs *engine, 284 1.1 riastrad const struct intel_ring *ring, 285 1.1 riastrad bool close); 286 1.1 riastrad static void 287 1.1 riastrad __execlists_update_reg_state(const struct intel_context *ce, 288 1.1 riastrad const struct intel_engine_cs *engine, 289 1.1 riastrad u32 head); 290 1.1 riastrad 291 1.1 riastrad static void mark_eio(struct i915_request *rq) 292 1.1 riastrad { 293 1.1 riastrad if (i915_request_completed(rq)) 294 1.1 riastrad return; 295 1.1 riastrad 296 1.1 riastrad GEM_BUG_ON(i915_request_signaled(rq)); 297 1.1 riastrad 298 1.1 riastrad dma_fence_set_error(&rq->fence, -EIO); 299 1.1 riastrad i915_request_mark_complete(rq); 300 1.1 riastrad } 301 1.1 riastrad 302 1.1 riastrad static struct i915_request * 303 1.1 riastrad active_request(const struct intel_timeline * const tl, struct i915_request *rq) 304 1.1 riastrad { 305 1.1 riastrad struct i915_request *active = rq; 306 1.1 riastrad 307 1.1 riastrad rcu_read_lock(); 308 1.1 riastrad list_for_each_entry_continue_reverse(rq, &tl->requests, link) { 309 1.1 riastrad if (i915_request_completed(rq)) 310 1.1 riastrad break; 311 1.1 riastrad 312 1.1 riastrad active = rq; 313 1.1 riastrad } 314 1.1 riastrad rcu_read_unlock(); 315 1.1 riastrad 316 1.1 riastrad return active; 317 1.1 riastrad } 318 1.1 riastrad 319 1.1 riastrad static inline u32 intel_hws_preempt_address(struct intel_engine_cs *engine) 320 1.1 riastrad { 321 1.1 riastrad return (i915_ggtt_offset(engine->status_page.vma) + 322 1.1 riastrad I915_GEM_HWS_PREEMPT_ADDR); 323 1.1 riastrad } 324 1.1 riastrad 325 1.1 riastrad static inline void 326 1.1 riastrad ring_set_paused(const struct intel_engine_cs *engine, int state) 327 1.1 riastrad { 328 1.1 riastrad /* 329 1.1 riastrad * We inspect HWS_PREEMPT with a semaphore inside 330 1.1 riastrad * engine->emit_fini_breadcrumb. If the dword is true, 331 1.1 riastrad * the ring is paused as the semaphore will busywait 332 1.1 riastrad * until the dword is false. 333 1.1 riastrad */ 334 1.1 riastrad engine->status_page.addr[I915_GEM_HWS_PREEMPT] = state; 335 1.1 riastrad if (state) 336 1.1 riastrad wmb(); 337 1.1 riastrad } 338 1.1 riastrad 339 1.1 riastrad static inline struct i915_priolist *to_priolist(struct rb_node *rb) 340 1.1 riastrad { 341 1.1 riastrad return rb_entry(rb, struct i915_priolist, node); 342 1.1 riastrad } 343 1.1 riastrad 344 1.1 riastrad static inline int rq_prio(const struct i915_request *rq) 345 1.1 riastrad { 346 1.1 riastrad return rq->sched.attr.priority; 347 1.1 riastrad } 348 1.1 riastrad 349 1.1 riastrad static int effective_prio(const struct i915_request *rq) 350 1.1 riastrad { 351 1.1 riastrad int prio = rq_prio(rq); 352 1.1 riastrad 353 1.1 riastrad /* 354 1.1 riastrad * If this request is special and must not be interrupted at any 355 1.1 riastrad * cost, so be it. Note we are only checking the most recent request 356 1.1 riastrad * in the context and so may be masking an earlier vip request. It 357 1.1 riastrad * is hoped that under the conditions where nopreempt is used, this 358 1.1 riastrad * will not matter (i.e. all requests to that context will be 359 1.1 riastrad * nopreempt for as long as desired). 360 1.1 riastrad */ 361 1.1 riastrad if (i915_request_has_nopreempt(rq)) 362 1.1 riastrad prio = I915_PRIORITY_UNPREEMPTABLE; 363 1.1 riastrad 364 1.1 riastrad /* 365 1.1 riastrad * On unwinding the active request, we give it a priority bump 366 1.1 riastrad * if it has completed waiting on any semaphore. If we know that 367 1.1 riastrad * the request has already started, we can prevent an unwanted 368 1.1 riastrad * preempt-to-idle cycle by taking that into account now. 369 1.1 riastrad */ 370 1.1 riastrad if (__i915_request_has_started(rq)) 371 1.1 riastrad prio |= I915_PRIORITY_NOSEMAPHORE; 372 1.1 riastrad 373 1.1 riastrad /* Restrict mere WAIT boosts from triggering preemption */ 374 1.1 riastrad BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */ 375 1.1 riastrad return prio | __NO_PREEMPTION; 376 1.1 riastrad } 377 1.1 riastrad 378 1.1 riastrad static int queue_prio(const struct intel_engine_execlists *execlists) 379 1.1 riastrad { 380 1.1 riastrad struct i915_priolist *p; 381 1.1 riastrad struct rb_node *rb; 382 1.1 riastrad 383 1.1 riastrad rb = rb_first_cached(&execlists->queue); 384 1.1 riastrad if (!rb) 385 1.1 riastrad return INT_MIN; 386 1.1 riastrad 387 1.1 riastrad /* 388 1.1 riastrad * As the priolist[] are inverted, with the highest priority in [0], 389 1.1 riastrad * we have to flip the index value to become priority. 390 1.1 riastrad */ 391 1.1 riastrad p = to_priolist(rb); 392 1.1 riastrad return ((p->priority + 1) << I915_USER_PRIORITY_SHIFT) - ffs(p->used); 393 1.1 riastrad } 394 1.1 riastrad 395 1.1 riastrad static inline bool need_preempt(const struct intel_engine_cs *engine, 396 1.1 riastrad const struct i915_request *rq, 397 1.1 riastrad struct rb_node *rb) 398 1.1 riastrad { 399 1.1 riastrad int last_prio; 400 1.1 riastrad 401 1.1 riastrad if (!intel_engine_has_semaphores(engine)) 402 1.1 riastrad return false; 403 1.1 riastrad 404 1.1 riastrad /* 405 1.1 riastrad * Check if the current priority hint merits a preemption attempt. 406 1.1 riastrad * 407 1.1 riastrad * We record the highest value priority we saw during rescheduling 408 1.1 riastrad * prior to this dequeue, therefore we know that if it is strictly 409 1.1 riastrad * less than the current tail of ESLP[0], we do not need to force 410 1.1 riastrad * a preempt-to-idle cycle. 411 1.1 riastrad * 412 1.1 riastrad * However, the priority hint is a mere hint that we may need to 413 1.1 riastrad * preempt. If that hint is stale or we may be trying to preempt 414 1.1 riastrad * ourselves, ignore the request. 415 1.1 riastrad * 416 1.1 riastrad * More naturally we would write 417 1.1 riastrad * prio >= max(0, last); 418 1.1 riastrad * except that we wish to prevent triggering preemption at the same 419 1.1 riastrad * priority level: the task that is running should remain running 420 1.1 riastrad * to preserve FIFO ordering of dependencies. 421 1.1 riastrad */ 422 1.1 riastrad last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1); 423 1.1 riastrad if (engine->execlists.queue_priority_hint <= last_prio) 424 1.1 riastrad return false; 425 1.1 riastrad 426 1.1 riastrad /* 427 1.1 riastrad * Check against the first request in ELSP[1], it will, thanks to the 428 1.1 riastrad * power of PI, be the highest priority of that context. 429 1.1 riastrad */ 430 1.1 riastrad if (!list_is_last(&rq->sched.link, &engine->active.requests) && 431 1.1 riastrad rq_prio(list_next_entry(rq, sched.link)) > last_prio) 432 1.1 riastrad return true; 433 1.1 riastrad 434 1.1 riastrad if (rb) { 435 1.1 riastrad struct virtual_engine *ve = 436 1.1 riastrad rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 437 1.1 riastrad bool preempt = false; 438 1.1 riastrad 439 1.1 riastrad if (engine == ve->siblings[0]) { /* only preempt one sibling */ 440 1.1 riastrad struct i915_request *next; 441 1.1 riastrad 442 1.1 riastrad rcu_read_lock(); 443 1.1 riastrad next = READ_ONCE(ve->request); 444 1.1 riastrad if (next) 445 1.1 riastrad preempt = rq_prio(next) > last_prio; 446 1.1 riastrad rcu_read_unlock(); 447 1.1 riastrad } 448 1.1 riastrad 449 1.1 riastrad if (preempt) 450 1.1 riastrad return preempt; 451 1.1 riastrad } 452 1.1 riastrad 453 1.1 riastrad /* 454 1.1 riastrad * If the inflight context did not trigger the preemption, then maybe 455 1.1 riastrad * it was the set of queued requests? Pick the highest priority in 456 1.1 riastrad * the queue (the first active priolist) and see if it deserves to be 457 1.1 riastrad * running instead of ELSP[0]. 458 1.1 riastrad * 459 1.1 riastrad * The highest priority request in the queue can not be either 460 1.1 riastrad * ELSP[0] or ELSP[1] as, thanks again to PI, if it was the same 461 1.1 riastrad * context, it's priority would not exceed ELSP[0] aka last_prio. 462 1.1 riastrad */ 463 1.1 riastrad return queue_prio(&engine->execlists) > last_prio; 464 1.1 riastrad } 465 1.1 riastrad 466 1.1 riastrad __maybe_unused static inline bool 467 1.1 riastrad assert_priority_queue(const struct i915_request *prev, 468 1.1 riastrad const struct i915_request *next) 469 1.1 riastrad { 470 1.1 riastrad /* 471 1.1 riastrad * Without preemption, the prev may refer to the still active element 472 1.1 riastrad * which we refuse to let go. 473 1.1 riastrad * 474 1.1 riastrad * Even with preemption, there are times when we think it is better not 475 1.1 riastrad * to preempt and leave an ostensibly lower priority request in flight. 476 1.1 riastrad */ 477 1.1 riastrad if (i915_request_is_active(prev)) 478 1.1 riastrad return true; 479 1.1 riastrad 480 1.1 riastrad return rq_prio(prev) >= rq_prio(next); 481 1.1 riastrad } 482 1.1 riastrad 483 1.1 riastrad /* 484 1.1 riastrad * The context descriptor encodes various attributes of a context, 485 1.1 riastrad * including its GTT address and some flags. Because it's fairly 486 1.1 riastrad * expensive to calculate, we'll just do it once and cache the result, 487 1.1 riastrad * which remains valid until the context is unpinned. 488 1.1 riastrad * 489 1.1 riastrad * This is what a descriptor looks like, from LSB to MSB:: 490 1.1 riastrad * 491 1.1 riastrad * bits 0-11: flags, GEN8_CTX_* (cached in ctx->desc_template) 492 1.1 riastrad * bits 12-31: LRCA, GTT address of (the HWSP of) this context 493 1.1 riastrad * bits 32-52: ctx ID, a globally unique tag (highest bit used by GuC) 494 1.1 riastrad * bits 53-54: mbz, reserved for use by hardware 495 1.1 riastrad * bits 55-63: group ID, currently unused and set to 0 496 1.1 riastrad * 497 1.1 riastrad * Starting from Gen11, the upper dword of the descriptor has a new format: 498 1.1 riastrad * 499 1.1 riastrad * bits 32-36: reserved 500 1.1 riastrad * bits 37-47: SW context ID 501 1.1 riastrad * bits 48:53: engine instance 502 1.1 riastrad * bit 54: mbz, reserved for use by hardware 503 1.1 riastrad * bits 55-60: SW counter 504 1.1 riastrad * bits 61-63: engine class 505 1.1 riastrad * 506 1.1 riastrad * engine info, SW context ID and SW counter need to form a unique number 507 1.1 riastrad * (Context ID) per lrc. 508 1.1 riastrad */ 509 1.1 riastrad static u64 510 1.1 riastrad lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine) 511 1.1 riastrad { 512 1.1 riastrad u64 desc; 513 1.1 riastrad 514 1.1 riastrad desc = INTEL_LEGACY_32B_CONTEXT; 515 1.1 riastrad if (i915_vm_is_4lvl(ce->vm)) 516 1.1 riastrad desc = INTEL_LEGACY_64B_CONTEXT; 517 1.1 riastrad desc <<= GEN8_CTX_ADDRESSING_MODE_SHIFT; 518 1.1 riastrad 519 1.1 riastrad desc |= GEN8_CTX_VALID | GEN8_CTX_PRIVILEGE; 520 1.1 riastrad if (IS_GEN(engine->i915, 8)) 521 1.1 riastrad desc |= GEN8_CTX_L3LLC_COHERENT; 522 1.1 riastrad 523 1.1 riastrad desc |= i915_ggtt_offset(ce->state); /* bits 12-31 */ 524 1.1 riastrad /* 525 1.1 riastrad * The following 32bits are copied into the OA reports (dword 2). 526 1.1 riastrad * Consider updating oa_get_render_ctx_id in i915_perf.c when changing 527 1.1 riastrad * anything below. 528 1.1 riastrad */ 529 1.1 riastrad if (INTEL_GEN(engine->i915) >= 11) { 530 1.1 riastrad desc |= (u64)engine->instance << GEN11_ENGINE_INSTANCE_SHIFT; 531 1.1 riastrad /* bits 48-53 */ 532 1.1 riastrad 533 1.1 riastrad desc |= (u64)engine->class << GEN11_ENGINE_CLASS_SHIFT; 534 1.1 riastrad /* bits 61-63 */ 535 1.1 riastrad } 536 1.1 riastrad 537 1.1 riastrad return desc; 538 1.1 riastrad } 539 1.1 riastrad 540 1.1 riastrad static inline unsigned int dword_in_page(void *addr) 541 1.1 riastrad { 542 1.1 riastrad return offset_in_page(addr) / sizeof(u32); 543 1.1 riastrad } 544 1.1 riastrad 545 1.1 riastrad static void set_offsets(u32 *regs, 546 1.1 riastrad const u8 *data, 547 1.1 riastrad const struct intel_engine_cs *engine, 548 1.1 riastrad bool clear) 549 1.1 riastrad #define NOP(x) (BIT(7) | (x)) 550 1.1 riastrad #define LRI(count, flags) ((flags) << 6 | (count) | BUILD_BUG_ON_ZERO(count >= BIT(6))) 551 1.1 riastrad #define POSTED BIT(0) 552 1.1 riastrad #define REG(x) (((x) >> 2) | BUILD_BUG_ON_ZERO(x >= 0x200)) 553 1.1 riastrad #define REG16(x) \ 554 1.1 riastrad (((x) >> 9) | BIT(7) | BUILD_BUG_ON_ZERO(x >= 0x10000)), \ 555 1.1 riastrad (((x) >> 2) & 0x7f) 556 1.1 riastrad #define END(x) 0, (x) 557 1.1 riastrad { 558 1.1 riastrad const u32 base = engine->mmio_base; 559 1.1 riastrad 560 1.1 riastrad while (*data) { 561 1.1 riastrad u8 count, flags; 562 1.1 riastrad 563 1.1 riastrad if (*data & BIT(7)) { /* skip */ 564 1.1 riastrad count = *data++ & ~BIT(7); 565 1.1 riastrad if (clear) 566 1.1 riastrad memset32(regs, MI_NOOP, count); 567 1.1 riastrad regs += count; 568 1.1 riastrad continue; 569 1.1 riastrad } 570 1.1 riastrad 571 1.1 riastrad count = *data & 0x3f; 572 1.1 riastrad flags = *data >> 6; 573 1.1 riastrad data++; 574 1.1 riastrad 575 1.1 riastrad *regs = MI_LOAD_REGISTER_IMM(count); 576 1.1 riastrad if (flags & POSTED) 577 1.1 riastrad *regs |= MI_LRI_FORCE_POSTED; 578 1.1 riastrad if (INTEL_GEN(engine->i915) >= 11) 579 1.1 riastrad *regs |= MI_LRI_CS_MMIO; 580 1.1 riastrad regs++; 581 1.1 riastrad 582 1.1 riastrad GEM_BUG_ON(!count); 583 1.1 riastrad do { 584 1.1 riastrad u32 offset = 0; 585 1.1 riastrad u8 v; 586 1.1 riastrad 587 1.1 riastrad do { 588 1.1 riastrad v = *data++; 589 1.1 riastrad offset <<= 7; 590 1.1 riastrad offset |= v & ~BIT(7); 591 1.1 riastrad } while (v & BIT(7)); 592 1.1 riastrad 593 1.1 riastrad regs[0] = base + (offset << 2); 594 1.1 riastrad if (clear) 595 1.1 riastrad regs[1] = 0; 596 1.1 riastrad regs += 2; 597 1.1 riastrad } while (--count); 598 1.1 riastrad } 599 1.1 riastrad 600 1.1 riastrad if (clear) { 601 1.1 riastrad u8 count = *++data; 602 1.1 riastrad 603 1.1 riastrad /* Clear past the tail for HW access */ 604 1.1 riastrad GEM_BUG_ON(dword_in_page(regs) > count); 605 1.1 riastrad memset32(regs, MI_NOOP, count - dword_in_page(regs)); 606 1.1 riastrad 607 1.1 riastrad /* Close the batch; used mainly by live_lrc_layout() */ 608 1.1 riastrad *regs = MI_BATCH_BUFFER_END; 609 1.1 riastrad if (INTEL_GEN(engine->i915) >= 10) 610 1.1 riastrad *regs |= BIT(0); 611 1.1 riastrad } 612 1.1 riastrad } 613 1.1 riastrad 614 1.1 riastrad static const u8 gen8_xcs_offsets[] = { 615 1.1 riastrad NOP(1), 616 1.1 riastrad LRI(11, 0), 617 1.1 riastrad REG16(0x244), 618 1.1 riastrad REG(0x034), 619 1.1 riastrad REG(0x030), 620 1.1 riastrad REG(0x038), 621 1.1 riastrad REG(0x03c), 622 1.1 riastrad REG(0x168), 623 1.1 riastrad REG(0x140), 624 1.1 riastrad REG(0x110), 625 1.1 riastrad REG(0x11c), 626 1.1 riastrad REG(0x114), 627 1.1 riastrad REG(0x118), 628 1.1 riastrad 629 1.1 riastrad NOP(9), 630 1.1 riastrad LRI(9, 0), 631 1.1 riastrad REG16(0x3a8), 632 1.1 riastrad REG16(0x28c), 633 1.1 riastrad REG16(0x288), 634 1.1 riastrad REG16(0x284), 635 1.1 riastrad REG16(0x280), 636 1.1 riastrad REG16(0x27c), 637 1.1 riastrad REG16(0x278), 638 1.1 riastrad REG16(0x274), 639 1.1 riastrad REG16(0x270), 640 1.1 riastrad 641 1.1 riastrad NOP(13), 642 1.1 riastrad LRI(2, 0), 643 1.1 riastrad REG16(0x200), 644 1.1 riastrad REG(0x028), 645 1.1 riastrad 646 1.1 riastrad END(80) 647 1.1 riastrad }; 648 1.1 riastrad 649 1.1 riastrad static const u8 gen9_xcs_offsets[] = { 650 1.1 riastrad NOP(1), 651 1.1 riastrad LRI(14, POSTED), 652 1.1 riastrad REG16(0x244), 653 1.1 riastrad REG(0x034), 654 1.1 riastrad REG(0x030), 655 1.1 riastrad REG(0x038), 656 1.1 riastrad REG(0x03c), 657 1.1 riastrad REG(0x168), 658 1.1 riastrad REG(0x140), 659 1.1 riastrad REG(0x110), 660 1.1 riastrad REG(0x11c), 661 1.1 riastrad REG(0x114), 662 1.1 riastrad REG(0x118), 663 1.1 riastrad REG(0x1c0), 664 1.1 riastrad REG(0x1c4), 665 1.1 riastrad REG(0x1c8), 666 1.1 riastrad 667 1.1 riastrad NOP(3), 668 1.1 riastrad LRI(9, POSTED), 669 1.1 riastrad REG16(0x3a8), 670 1.1 riastrad REG16(0x28c), 671 1.1 riastrad REG16(0x288), 672 1.1 riastrad REG16(0x284), 673 1.1 riastrad REG16(0x280), 674 1.1 riastrad REG16(0x27c), 675 1.1 riastrad REG16(0x278), 676 1.1 riastrad REG16(0x274), 677 1.1 riastrad REG16(0x270), 678 1.1 riastrad 679 1.1 riastrad NOP(13), 680 1.1 riastrad LRI(1, POSTED), 681 1.1 riastrad REG16(0x200), 682 1.1 riastrad 683 1.1 riastrad NOP(13), 684 1.1 riastrad LRI(44, POSTED), 685 1.1 riastrad REG(0x028), 686 1.1 riastrad REG(0x09c), 687 1.1 riastrad REG(0x0c0), 688 1.1 riastrad REG(0x178), 689 1.1 riastrad REG(0x17c), 690 1.1 riastrad REG16(0x358), 691 1.1 riastrad REG(0x170), 692 1.1 riastrad REG(0x150), 693 1.1 riastrad REG(0x154), 694 1.1 riastrad REG(0x158), 695 1.1 riastrad REG16(0x41c), 696 1.1 riastrad REG16(0x600), 697 1.1 riastrad REG16(0x604), 698 1.1 riastrad REG16(0x608), 699 1.1 riastrad REG16(0x60c), 700 1.1 riastrad REG16(0x610), 701 1.1 riastrad REG16(0x614), 702 1.1 riastrad REG16(0x618), 703 1.1 riastrad REG16(0x61c), 704 1.1 riastrad REG16(0x620), 705 1.1 riastrad REG16(0x624), 706 1.1 riastrad REG16(0x628), 707 1.1 riastrad REG16(0x62c), 708 1.1 riastrad REG16(0x630), 709 1.1 riastrad REG16(0x634), 710 1.1 riastrad REG16(0x638), 711 1.1 riastrad REG16(0x63c), 712 1.1 riastrad REG16(0x640), 713 1.1 riastrad REG16(0x644), 714 1.1 riastrad REG16(0x648), 715 1.1 riastrad REG16(0x64c), 716 1.1 riastrad REG16(0x650), 717 1.1 riastrad REG16(0x654), 718 1.1 riastrad REG16(0x658), 719 1.1 riastrad REG16(0x65c), 720 1.1 riastrad REG16(0x660), 721 1.1 riastrad REG16(0x664), 722 1.1 riastrad REG16(0x668), 723 1.1 riastrad REG16(0x66c), 724 1.1 riastrad REG16(0x670), 725 1.1 riastrad REG16(0x674), 726 1.1 riastrad REG16(0x678), 727 1.1 riastrad REG16(0x67c), 728 1.1 riastrad REG(0x068), 729 1.1 riastrad 730 1.1 riastrad END(176) 731 1.1 riastrad }; 732 1.1 riastrad 733 1.1 riastrad static const u8 gen12_xcs_offsets[] = { 734 1.1 riastrad NOP(1), 735 1.1 riastrad LRI(13, POSTED), 736 1.1 riastrad REG16(0x244), 737 1.1 riastrad REG(0x034), 738 1.1 riastrad REG(0x030), 739 1.1 riastrad REG(0x038), 740 1.1 riastrad REG(0x03c), 741 1.1 riastrad REG(0x168), 742 1.1 riastrad REG(0x140), 743 1.1 riastrad REG(0x110), 744 1.1 riastrad REG(0x1c0), 745 1.1 riastrad REG(0x1c4), 746 1.1 riastrad REG(0x1c8), 747 1.1 riastrad REG(0x180), 748 1.1 riastrad REG16(0x2b4), 749 1.1 riastrad 750 1.1 riastrad NOP(5), 751 1.1 riastrad LRI(9, POSTED), 752 1.1 riastrad REG16(0x3a8), 753 1.1 riastrad REG16(0x28c), 754 1.1 riastrad REG16(0x288), 755 1.1 riastrad REG16(0x284), 756 1.1 riastrad REG16(0x280), 757 1.1 riastrad REG16(0x27c), 758 1.1 riastrad REG16(0x278), 759 1.1 riastrad REG16(0x274), 760 1.1 riastrad REG16(0x270), 761 1.1 riastrad 762 1.1 riastrad END(80) 763 1.1 riastrad }; 764 1.1 riastrad 765 1.1 riastrad static const u8 gen8_rcs_offsets[] = { 766 1.1 riastrad NOP(1), 767 1.1 riastrad LRI(14, POSTED), 768 1.1 riastrad REG16(0x244), 769 1.1 riastrad REG(0x034), 770 1.1 riastrad REG(0x030), 771 1.1 riastrad REG(0x038), 772 1.1 riastrad REG(0x03c), 773 1.1 riastrad REG(0x168), 774 1.1 riastrad REG(0x140), 775 1.1 riastrad REG(0x110), 776 1.1 riastrad REG(0x11c), 777 1.1 riastrad REG(0x114), 778 1.1 riastrad REG(0x118), 779 1.1 riastrad REG(0x1c0), 780 1.1 riastrad REG(0x1c4), 781 1.1 riastrad REG(0x1c8), 782 1.1 riastrad 783 1.1 riastrad NOP(3), 784 1.1 riastrad LRI(9, POSTED), 785 1.1 riastrad REG16(0x3a8), 786 1.1 riastrad REG16(0x28c), 787 1.1 riastrad REG16(0x288), 788 1.1 riastrad REG16(0x284), 789 1.1 riastrad REG16(0x280), 790 1.1 riastrad REG16(0x27c), 791 1.1 riastrad REG16(0x278), 792 1.1 riastrad REG16(0x274), 793 1.1 riastrad REG16(0x270), 794 1.1 riastrad 795 1.1 riastrad NOP(13), 796 1.1 riastrad LRI(1, 0), 797 1.1 riastrad REG(0x0c8), 798 1.1 riastrad 799 1.1 riastrad END(80) 800 1.1 riastrad }; 801 1.1 riastrad 802 1.1 riastrad static const u8 gen9_rcs_offsets[] = { 803 1.1 riastrad NOP(1), 804 1.1 riastrad LRI(14, POSTED), 805 1.1 riastrad REG16(0x244), 806 1.1 riastrad REG(0x34), 807 1.1 riastrad REG(0x30), 808 1.1 riastrad REG(0x38), 809 1.1 riastrad REG(0x3c), 810 1.1 riastrad REG(0x168), 811 1.1 riastrad REG(0x140), 812 1.1 riastrad REG(0x110), 813 1.1 riastrad REG(0x11c), 814 1.1 riastrad REG(0x114), 815 1.1 riastrad REG(0x118), 816 1.1 riastrad REG(0x1c0), 817 1.1 riastrad REG(0x1c4), 818 1.1 riastrad REG(0x1c8), 819 1.1 riastrad 820 1.1 riastrad NOP(3), 821 1.1 riastrad LRI(9, POSTED), 822 1.1 riastrad REG16(0x3a8), 823 1.1 riastrad REG16(0x28c), 824 1.1 riastrad REG16(0x288), 825 1.1 riastrad REG16(0x284), 826 1.1 riastrad REG16(0x280), 827 1.1 riastrad REG16(0x27c), 828 1.1 riastrad REG16(0x278), 829 1.1 riastrad REG16(0x274), 830 1.1 riastrad REG16(0x270), 831 1.1 riastrad 832 1.1 riastrad NOP(13), 833 1.1 riastrad LRI(1, 0), 834 1.1 riastrad REG(0xc8), 835 1.1 riastrad 836 1.1 riastrad NOP(13), 837 1.1 riastrad LRI(44, POSTED), 838 1.1 riastrad REG(0x28), 839 1.1 riastrad REG(0x9c), 840 1.1 riastrad REG(0xc0), 841 1.1 riastrad REG(0x178), 842 1.1 riastrad REG(0x17c), 843 1.1 riastrad REG16(0x358), 844 1.1 riastrad REG(0x170), 845 1.1 riastrad REG(0x150), 846 1.1 riastrad REG(0x154), 847 1.1 riastrad REG(0x158), 848 1.1 riastrad REG16(0x41c), 849 1.1 riastrad REG16(0x600), 850 1.1 riastrad REG16(0x604), 851 1.1 riastrad REG16(0x608), 852 1.1 riastrad REG16(0x60c), 853 1.1 riastrad REG16(0x610), 854 1.1 riastrad REG16(0x614), 855 1.1 riastrad REG16(0x618), 856 1.1 riastrad REG16(0x61c), 857 1.1 riastrad REG16(0x620), 858 1.1 riastrad REG16(0x624), 859 1.1 riastrad REG16(0x628), 860 1.1 riastrad REG16(0x62c), 861 1.1 riastrad REG16(0x630), 862 1.1 riastrad REG16(0x634), 863 1.1 riastrad REG16(0x638), 864 1.1 riastrad REG16(0x63c), 865 1.1 riastrad REG16(0x640), 866 1.1 riastrad REG16(0x644), 867 1.1 riastrad REG16(0x648), 868 1.1 riastrad REG16(0x64c), 869 1.1 riastrad REG16(0x650), 870 1.1 riastrad REG16(0x654), 871 1.1 riastrad REG16(0x658), 872 1.1 riastrad REG16(0x65c), 873 1.1 riastrad REG16(0x660), 874 1.1 riastrad REG16(0x664), 875 1.1 riastrad REG16(0x668), 876 1.1 riastrad REG16(0x66c), 877 1.1 riastrad REG16(0x670), 878 1.1 riastrad REG16(0x674), 879 1.1 riastrad REG16(0x678), 880 1.1 riastrad REG16(0x67c), 881 1.1 riastrad REG(0x68), 882 1.1 riastrad 883 1.1 riastrad END(176) 884 1.1 riastrad }; 885 1.1 riastrad 886 1.1 riastrad static const u8 gen11_rcs_offsets[] = { 887 1.1 riastrad NOP(1), 888 1.1 riastrad LRI(15, POSTED), 889 1.1 riastrad REG16(0x244), 890 1.1 riastrad REG(0x034), 891 1.1 riastrad REG(0x030), 892 1.1 riastrad REG(0x038), 893 1.1 riastrad REG(0x03c), 894 1.1 riastrad REG(0x168), 895 1.1 riastrad REG(0x140), 896 1.1 riastrad REG(0x110), 897 1.1 riastrad REG(0x11c), 898 1.1 riastrad REG(0x114), 899 1.1 riastrad REG(0x118), 900 1.1 riastrad REG(0x1c0), 901 1.1 riastrad REG(0x1c4), 902 1.1 riastrad REG(0x1c8), 903 1.1 riastrad REG(0x180), 904 1.1 riastrad 905 1.1 riastrad NOP(1), 906 1.1 riastrad LRI(9, POSTED), 907 1.1 riastrad REG16(0x3a8), 908 1.1 riastrad REG16(0x28c), 909 1.1 riastrad REG16(0x288), 910 1.1 riastrad REG16(0x284), 911 1.1 riastrad REG16(0x280), 912 1.1 riastrad REG16(0x27c), 913 1.1 riastrad REG16(0x278), 914 1.1 riastrad REG16(0x274), 915 1.1 riastrad REG16(0x270), 916 1.1 riastrad 917 1.1 riastrad LRI(1, POSTED), 918 1.1 riastrad REG(0x1b0), 919 1.1 riastrad 920 1.1 riastrad NOP(10), 921 1.1 riastrad LRI(1, 0), 922 1.1 riastrad REG(0x0c8), 923 1.1 riastrad 924 1.1 riastrad END(80) 925 1.1 riastrad }; 926 1.1 riastrad 927 1.1 riastrad static const u8 gen12_rcs_offsets[] = { 928 1.1 riastrad NOP(1), 929 1.1 riastrad LRI(13, POSTED), 930 1.1 riastrad REG16(0x244), 931 1.1 riastrad REG(0x034), 932 1.1 riastrad REG(0x030), 933 1.1 riastrad REG(0x038), 934 1.1 riastrad REG(0x03c), 935 1.1 riastrad REG(0x168), 936 1.1 riastrad REG(0x140), 937 1.1 riastrad REG(0x110), 938 1.1 riastrad REG(0x1c0), 939 1.1 riastrad REG(0x1c4), 940 1.1 riastrad REG(0x1c8), 941 1.1 riastrad REG(0x180), 942 1.1 riastrad REG16(0x2b4), 943 1.1 riastrad 944 1.1 riastrad NOP(5), 945 1.1 riastrad LRI(9, POSTED), 946 1.1 riastrad REG16(0x3a8), 947 1.1 riastrad REG16(0x28c), 948 1.1 riastrad REG16(0x288), 949 1.1 riastrad REG16(0x284), 950 1.1 riastrad REG16(0x280), 951 1.1 riastrad REG16(0x27c), 952 1.1 riastrad REG16(0x278), 953 1.1 riastrad REG16(0x274), 954 1.1 riastrad REG16(0x270), 955 1.1 riastrad 956 1.1 riastrad LRI(3, POSTED), 957 1.1 riastrad REG(0x1b0), 958 1.1 riastrad REG16(0x5a8), 959 1.1 riastrad REG16(0x5ac), 960 1.1 riastrad 961 1.1 riastrad NOP(6), 962 1.1 riastrad LRI(1, 0), 963 1.1 riastrad REG(0x0c8), 964 1.1 riastrad 965 1.1 riastrad END(80) 966 1.1 riastrad }; 967 1.1 riastrad 968 1.1 riastrad #undef END 969 1.1 riastrad #undef REG16 970 1.1 riastrad #undef REG 971 1.1 riastrad #undef LRI 972 1.1 riastrad #undef NOP 973 1.1 riastrad 974 1.1 riastrad static const u8 *reg_offsets(const struct intel_engine_cs *engine) 975 1.1 riastrad { 976 1.1 riastrad /* 977 1.1 riastrad * The gen12+ lists only have the registers we program in the basic 978 1.1 riastrad * default state. We rely on the context image using relative 979 1.1 riastrad * addressing to automatic fixup the register state between the 980 1.1 riastrad * physical engines for virtual engine. 981 1.1 riastrad */ 982 1.1 riastrad GEM_BUG_ON(INTEL_GEN(engine->i915) >= 12 && 983 1.1 riastrad !intel_engine_has_relative_mmio(engine)); 984 1.1 riastrad 985 1.1 riastrad if (engine->class == RENDER_CLASS) { 986 1.1 riastrad if (INTEL_GEN(engine->i915) >= 12) 987 1.1 riastrad return gen12_rcs_offsets; 988 1.1 riastrad else if (INTEL_GEN(engine->i915) >= 11) 989 1.1 riastrad return gen11_rcs_offsets; 990 1.1 riastrad else if (INTEL_GEN(engine->i915) >= 9) 991 1.1 riastrad return gen9_rcs_offsets; 992 1.1 riastrad else 993 1.1 riastrad return gen8_rcs_offsets; 994 1.1 riastrad } else { 995 1.1 riastrad if (INTEL_GEN(engine->i915) >= 12) 996 1.1 riastrad return gen12_xcs_offsets; 997 1.1 riastrad else if (INTEL_GEN(engine->i915) >= 9) 998 1.1 riastrad return gen9_xcs_offsets; 999 1.1 riastrad else 1000 1.1 riastrad return gen8_xcs_offsets; 1001 1.1 riastrad } 1002 1.1 riastrad } 1003 1.1 riastrad 1004 1.1 riastrad static struct i915_request * 1005 1.1 riastrad __unwind_incomplete_requests(struct intel_engine_cs *engine) 1006 1.1 riastrad { 1007 1.1 riastrad struct i915_request *rq, *rn, *active = NULL; 1008 1.1 riastrad struct list_head *uninitialized_var(pl); 1009 1.1 riastrad int prio = I915_PRIORITY_INVALID; 1010 1.1 riastrad 1011 1.1 riastrad lockdep_assert_held(&engine->active.lock); 1012 1.1 riastrad 1013 1.1 riastrad list_for_each_entry_safe_reverse(rq, rn, 1014 1.1 riastrad &engine->active.requests, 1015 1.1 riastrad sched.link) { 1016 1.1 riastrad if (i915_request_completed(rq)) 1017 1.1 riastrad continue; /* XXX */ 1018 1.1 riastrad 1019 1.1 riastrad __i915_request_unsubmit(rq); 1020 1.1 riastrad 1021 1.1 riastrad /* 1022 1.1 riastrad * Push the request back into the queue for later resubmission. 1023 1.1 riastrad * If this request is not native to this physical engine (i.e. 1024 1.1 riastrad * it came from a virtual source), push it back onto the virtual 1025 1.1 riastrad * engine so that it can be moved across onto another physical 1026 1.1 riastrad * engine as load dictates. 1027 1.1 riastrad */ 1028 1.1 riastrad if (likely(rq->execution_mask == engine->mask)) { 1029 1.1 riastrad GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); 1030 1.1 riastrad if (rq_prio(rq) != prio) { 1031 1.1 riastrad prio = rq_prio(rq); 1032 1.1 riastrad pl = i915_sched_lookup_priolist(engine, prio); 1033 1.1 riastrad } 1034 1.1 riastrad GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); 1035 1.1 riastrad 1036 1.1 riastrad list_move(&rq->sched.link, pl); 1037 1.1 riastrad set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 1038 1.1 riastrad 1039 1.1 riastrad active = rq; 1040 1.1 riastrad } else { 1041 1.1 riastrad struct intel_engine_cs *owner = rq->context->engine; 1042 1.1 riastrad 1043 1.1 riastrad /* 1044 1.1 riastrad * Decouple the virtual breadcrumb before moving it 1045 1.1 riastrad * back to the virtual engine -- we don't want the 1046 1.1 riastrad * request to complete in the background and try 1047 1.1 riastrad * and cancel the breadcrumb on the virtual engine 1048 1.1 riastrad * (instead of the old engine where it is linked)! 1049 1.1 riastrad */ 1050 1.1 riastrad if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, 1051 1.1 riastrad &rq->fence.flags)) { 1052 1.1 riastrad spin_lock_nested(&rq->lock, 1053 1.1 riastrad SINGLE_DEPTH_NESTING); 1054 1.1 riastrad i915_request_cancel_breadcrumb(rq); 1055 1.1 riastrad spin_unlock(&rq->lock); 1056 1.1 riastrad } 1057 1.1 riastrad rq->engine = owner; 1058 1.1 riastrad owner->submit_request(rq); 1059 1.1 riastrad active = NULL; 1060 1.1 riastrad } 1061 1.1 riastrad } 1062 1.1 riastrad 1063 1.1 riastrad return active; 1064 1.1 riastrad } 1065 1.1 riastrad 1066 1.1 riastrad struct i915_request * 1067 1.1 riastrad execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists) 1068 1.1 riastrad { 1069 1.1 riastrad struct intel_engine_cs *engine = 1070 1.1 riastrad container_of(execlists, typeof(*engine), execlists); 1071 1.1 riastrad 1072 1.1 riastrad return __unwind_incomplete_requests(engine); 1073 1.1 riastrad } 1074 1.1 riastrad 1075 1.1 riastrad static inline void 1076 1.1 riastrad execlists_context_status_change(struct i915_request *rq, unsigned long status) 1077 1.1 riastrad { 1078 1.1 riastrad /* 1079 1.1 riastrad * Only used when GVT-g is enabled now. When GVT-g is disabled, 1080 1.1 riastrad * The compiler should eliminate this function as dead-code. 1081 1.1 riastrad */ 1082 1.1 riastrad if (!IS_ENABLED(CONFIG_DRM_I915_GVT)) 1083 1.1 riastrad return; 1084 1.1 riastrad 1085 1.1 riastrad atomic_notifier_call_chain(&rq->engine->context_status_notifier, 1086 1.1 riastrad status, rq); 1087 1.1 riastrad } 1088 1.1 riastrad 1089 1.1 riastrad static void intel_engine_context_in(struct intel_engine_cs *engine) 1090 1.1 riastrad { 1091 1.1 riastrad unsigned long flags; 1092 1.1 riastrad 1093 1.1 riastrad if (READ_ONCE(engine->stats.enabled) == 0) 1094 1.1 riastrad return; 1095 1.1 riastrad 1096 1.1 riastrad write_seqlock_irqsave(&engine->stats.lock, flags); 1097 1.1 riastrad 1098 1.1 riastrad if (engine->stats.enabled > 0) { 1099 1.1 riastrad if (engine->stats.active++ == 0) 1100 1.1 riastrad engine->stats.start = ktime_get(); 1101 1.1 riastrad GEM_BUG_ON(engine->stats.active == 0); 1102 1.1 riastrad } 1103 1.1 riastrad 1104 1.1 riastrad write_sequnlock_irqrestore(&engine->stats.lock, flags); 1105 1.1 riastrad } 1106 1.1 riastrad 1107 1.1 riastrad static void intel_engine_context_out(struct intel_engine_cs *engine) 1108 1.1 riastrad { 1109 1.1 riastrad unsigned long flags; 1110 1.1 riastrad 1111 1.1 riastrad if (READ_ONCE(engine->stats.enabled) == 0) 1112 1.1 riastrad return; 1113 1.1 riastrad 1114 1.1 riastrad write_seqlock_irqsave(&engine->stats.lock, flags); 1115 1.1 riastrad 1116 1.1 riastrad if (engine->stats.enabled > 0) { 1117 1.1 riastrad ktime_t last; 1118 1.1 riastrad 1119 1.1 riastrad if (engine->stats.active && --engine->stats.active == 0) { 1120 1.1 riastrad /* 1121 1.1 riastrad * Decrement the active context count and in case GPU 1122 1.1 riastrad * is now idle add up to the running total. 1123 1.1 riastrad */ 1124 1.1 riastrad last = ktime_sub(ktime_get(), engine->stats.start); 1125 1.1 riastrad 1126 1.1 riastrad engine->stats.total = ktime_add(engine->stats.total, 1127 1.1 riastrad last); 1128 1.1 riastrad } else if (engine->stats.active == 0) { 1129 1.1 riastrad /* 1130 1.1 riastrad * After turning on engine stats, context out might be 1131 1.1 riastrad * the first event in which case we account from the 1132 1.1 riastrad * time stats gathering was turned on. 1133 1.1 riastrad */ 1134 1.1 riastrad last = ktime_sub(ktime_get(), engine->stats.enabled_at); 1135 1.1 riastrad 1136 1.1 riastrad engine->stats.total = ktime_add(engine->stats.total, 1137 1.1 riastrad last); 1138 1.1 riastrad } 1139 1.1 riastrad } 1140 1.1 riastrad 1141 1.1 riastrad write_sequnlock_irqrestore(&engine->stats.lock, flags); 1142 1.1 riastrad } 1143 1.1 riastrad 1144 1.1 riastrad static int lrc_ring_mi_mode(const struct intel_engine_cs *engine) 1145 1.1 riastrad { 1146 1.1 riastrad if (INTEL_GEN(engine->i915) >= 12) 1147 1.1 riastrad return 0x60; 1148 1.1 riastrad else if (INTEL_GEN(engine->i915) >= 9) 1149 1.1 riastrad return 0x54; 1150 1.1 riastrad else if (engine->class == RENDER_CLASS) 1151 1.1 riastrad return 0x58; 1152 1.1 riastrad else 1153 1.1 riastrad return -1; 1154 1.1 riastrad } 1155 1.1 riastrad 1156 1.1 riastrad static void 1157 1.1 riastrad execlists_check_context(const struct intel_context *ce, 1158 1.1 riastrad const struct intel_engine_cs *engine) 1159 1.1 riastrad { 1160 1.1 riastrad const struct intel_ring *ring = ce->ring; 1161 1.1 riastrad u32 *regs = ce->lrc_reg_state; 1162 1.1 riastrad bool valid = true; 1163 1.1 riastrad int x; 1164 1.1 riastrad 1165 1.1 riastrad if (regs[CTX_RING_START] != i915_ggtt_offset(ring->vma)) { 1166 1.1 riastrad pr_err("%s: context submitted with incorrect RING_START [%08x], expected %08x\n", 1167 1.1 riastrad engine->name, 1168 1.1 riastrad regs[CTX_RING_START], 1169 1.1 riastrad i915_ggtt_offset(ring->vma)); 1170 1.1 riastrad regs[CTX_RING_START] = i915_ggtt_offset(ring->vma); 1171 1.1 riastrad valid = false; 1172 1.1 riastrad } 1173 1.1 riastrad 1174 1.1 riastrad if ((regs[CTX_RING_CTL] & ~(RING_WAIT | RING_WAIT_SEMAPHORE)) != 1175 1.1 riastrad (RING_CTL_SIZE(ring->size) | RING_VALID)) { 1176 1.1 riastrad pr_err("%s: context submitted with incorrect RING_CTL [%08x], expected %08x\n", 1177 1.1 riastrad engine->name, 1178 1.1 riastrad regs[CTX_RING_CTL], 1179 1.1 riastrad (u32)(RING_CTL_SIZE(ring->size) | RING_VALID)); 1180 1.1 riastrad regs[CTX_RING_CTL] = RING_CTL_SIZE(ring->size) | RING_VALID; 1181 1.1 riastrad valid = false; 1182 1.1 riastrad } 1183 1.1 riastrad 1184 1.1 riastrad x = lrc_ring_mi_mode(engine); 1185 1.1 riastrad if (x != -1 && regs[x + 1] & (regs[x + 1] >> 16) & STOP_RING) { 1186 1.1 riastrad pr_err("%s: context submitted with STOP_RING [%08x] in RING_MI_MODE\n", 1187 1.1 riastrad engine->name, regs[x + 1]); 1188 1.1 riastrad regs[x + 1] &= ~STOP_RING; 1189 1.1 riastrad regs[x + 1] |= STOP_RING << 16; 1190 1.1 riastrad valid = false; 1191 1.1 riastrad } 1192 1.1 riastrad 1193 1.1 riastrad WARN_ONCE(!valid, "Invalid lrc state found before submission\n"); 1194 1.1 riastrad } 1195 1.1 riastrad 1196 1.1 riastrad static void restore_default_state(struct intel_context *ce, 1197 1.1 riastrad struct intel_engine_cs *engine) 1198 1.1 riastrad { 1199 1.1 riastrad u32 *regs = ce->lrc_reg_state; 1200 1.1 riastrad 1201 1.1 riastrad if (engine->pinned_default_state) 1202 1.1 riastrad memcpy(regs, /* skip restoring the vanilla PPHWSP */ 1203 1.1 riastrad engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE, 1204 1.1 riastrad engine->context_size - PAGE_SIZE); 1205 1.1 riastrad 1206 1.1 riastrad execlists_init_reg_state(regs, ce, engine, ce->ring, false); 1207 1.1 riastrad } 1208 1.1 riastrad 1209 1.1 riastrad static void reset_active(struct i915_request *rq, 1210 1.1 riastrad struct intel_engine_cs *engine) 1211 1.1 riastrad { 1212 1.1 riastrad struct intel_context * const ce = rq->context; 1213 1.1 riastrad u32 head; 1214 1.1 riastrad 1215 1.1 riastrad /* 1216 1.1 riastrad * The executing context has been cancelled. We want to prevent 1217 1.1 riastrad * further execution along this context and propagate the error on 1218 1.1 riastrad * to anything depending on its results. 1219 1.1 riastrad * 1220 1.1 riastrad * In __i915_request_submit(), we apply the -EIO and remove the 1221 1.1 riastrad * requests' payloads for any banned requests. But first, we must 1222 1.1 riastrad * rewind the context back to the start of the incomplete request so 1223 1.1 riastrad * that we do not jump back into the middle of the batch. 1224 1.1 riastrad * 1225 1.1 riastrad * We preserve the breadcrumbs and semaphores of the incomplete 1226 1.1 riastrad * requests so that inter-timeline dependencies (i.e other timelines) 1227 1.1 riastrad * remain correctly ordered. And we defer to __i915_request_submit() 1228 1.1 riastrad * so that all asynchronous waits are correctly handled. 1229 1.1 riastrad */ 1230 1.1 riastrad ENGINE_TRACE(engine, "{ rq=%llx:%lld }\n", 1231 1.1 riastrad rq->fence.context, rq->fence.seqno); 1232 1.1 riastrad 1233 1.1 riastrad /* On resubmission of the active request, payload will be scrubbed */ 1234 1.1 riastrad if (i915_request_completed(rq)) 1235 1.1 riastrad head = rq->tail; 1236 1.1 riastrad else 1237 1.1 riastrad head = active_request(ce->timeline, rq)->head; 1238 1.1 riastrad head = intel_ring_wrap(ce->ring, head); 1239 1.1 riastrad 1240 1.1 riastrad /* Scrub the context image to prevent replaying the previous batch */ 1241 1.1 riastrad restore_default_state(ce, engine); 1242 1.1 riastrad __execlists_update_reg_state(ce, engine, head); 1243 1.1 riastrad 1244 1.1 riastrad /* We've switched away, so this should be a no-op, but intent matters */ 1245 1.1 riastrad ce->lrc_desc |= CTX_DESC_FORCE_RESTORE; 1246 1.1 riastrad } 1247 1.1 riastrad 1248 1.1 riastrad static inline struct intel_engine_cs * 1249 1.1 riastrad __execlists_schedule_in(struct i915_request *rq) 1250 1.1 riastrad { 1251 1.1 riastrad struct intel_engine_cs * const engine = rq->engine; 1252 1.1 riastrad struct intel_context * const ce = rq->context; 1253 1.1 riastrad 1254 1.1 riastrad intel_context_get(ce); 1255 1.1 riastrad 1256 1.1 riastrad if (unlikely(intel_context_is_banned(ce))) 1257 1.1 riastrad reset_active(rq, engine); 1258 1.1 riastrad 1259 1.1 riastrad if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 1260 1.1 riastrad execlists_check_context(ce, engine); 1261 1.1 riastrad 1262 1.1 riastrad if (ce->tag) { 1263 1.1 riastrad /* Use a fixed tag for OA and friends */ 1264 1.1 riastrad ce->lrc_desc |= (u64)ce->tag << 32; 1265 1.1 riastrad } else { 1266 1.1 riastrad /* We don't need a strict matching tag, just different values */ 1267 1.1 riastrad ce->lrc_desc &= ~GENMASK_ULL(47, 37); 1268 1.1 riastrad ce->lrc_desc |= 1269 1.1 riastrad (u64)(++engine->context_tag % NUM_CONTEXT_TAG) << 1270 1.1 riastrad GEN11_SW_CTX_ID_SHIFT; 1271 1.1 riastrad BUILD_BUG_ON(NUM_CONTEXT_TAG > GEN12_MAX_CONTEXT_HW_ID); 1272 1.1 riastrad } 1273 1.1 riastrad 1274 1.1 riastrad __intel_gt_pm_get(engine->gt); 1275 1.1 riastrad execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN); 1276 1.1 riastrad intel_engine_context_in(engine); 1277 1.1 riastrad 1278 1.1 riastrad return engine; 1279 1.1 riastrad } 1280 1.1 riastrad 1281 1.1 riastrad static inline struct i915_request * 1282 1.1 riastrad execlists_schedule_in(struct i915_request *rq, int idx) 1283 1.1 riastrad { 1284 1.1 riastrad struct intel_context * const ce = rq->context; 1285 1.1 riastrad struct intel_engine_cs *old; 1286 1.1 riastrad 1287 1.1 riastrad GEM_BUG_ON(!intel_engine_pm_is_awake(rq->engine)); 1288 1.1 riastrad trace_i915_request_in(rq, idx); 1289 1.1 riastrad 1290 1.1 riastrad old = READ_ONCE(ce->inflight); 1291 1.1 riastrad do { 1292 1.1 riastrad if (!old) { 1293 1.1 riastrad WRITE_ONCE(ce->inflight, __execlists_schedule_in(rq)); 1294 1.1 riastrad break; 1295 1.1 riastrad } 1296 1.1 riastrad } while (!try_cmpxchg(&ce->inflight, &old, ptr_inc(old))); 1297 1.1 riastrad 1298 1.1 riastrad GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); 1299 1.1 riastrad return i915_request_get(rq); 1300 1.1 riastrad } 1301 1.1 riastrad 1302 1.1 riastrad static void kick_siblings(struct i915_request *rq, struct intel_context *ce) 1303 1.1 riastrad { 1304 1.1 riastrad struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 1305 1.1 riastrad struct i915_request *next = READ_ONCE(ve->request); 1306 1.1 riastrad 1307 1.1 riastrad if (next && next->execution_mask & ~rq->execution_mask) 1308 1.1 riastrad tasklet_schedule(&ve->base.execlists.tasklet); 1309 1.1 riastrad } 1310 1.1 riastrad 1311 1.1 riastrad static inline void 1312 1.1 riastrad __execlists_schedule_out(struct i915_request *rq, 1313 1.1 riastrad struct intel_engine_cs * const engine) 1314 1.1 riastrad { 1315 1.1 riastrad struct intel_context * const ce = rq->context; 1316 1.1 riastrad 1317 1.1 riastrad /* 1318 1.1 riastrad * NB process_csb() is not under the engine->active.lock and hence 1319 1.1 riastrad * schedule_out can race with schedule_in meaning that we should 1320 1.1 riastrad * refrain from doing non-trivial work here. 1321 1.1 riastrad */ 1322 1.1 riastrad 1323 1.1 riastrad /* 1324 1.1 riastrad * If we have just completed this context, the engine may now be 1325 1.1 riastrad * idle and we want to re-enter powersaving. 1326 1.1 riastrad */ 1327 1.1 riastrad if (list_is_last(&rq->link, &ce->timeline->requests) && 1328 1.1 riastrad i915_request_completed(rq)) 1329 1.1 riastrad intel_engine_add_retire(engine, ce->timeline); 1330 1.1 riastrad 1331 1.1 riastrad intel_engine_context_out(engine); 1332 1.1 riastrad execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT); 1333 1.1 riastrad intel_gt_pm_put_async(engine->gt); 1334 1.1 riastrad 1335 1.1 riastrad /* 1336 1.1 riastrad * If this is part of a virtual engine, its next request may 1337 1.1 riastrad * have been blocked waiting for access to the active context. 1338 1.1 riastrad * We have to kick all the siblings again in case we need to 1339 1.1 riastrad * switch (e.g. the next request is not runnable on this 1340 1.1 riastrad * engine). Hopefully, we will already have submitted the next 1341 1.1 riastrad * request before the tasklet runs and do not need to rebuild 1342 1.1 riastrad * each virtual tree and kick everyone again. 1343 1.1 riastrad */ 1344 1.1 riastrad if (ce->engine != engine) 1345 1.1 riastrad kick_siblings(rq, ce); 1346 1.1 riastrad 1347 1.1 riastrad intel_context_put(ce); 1348 1.1 riastrad } 1349 1.1 riastrad 1350 1.1 riastrad static inline void 1351 1.1 riastrad execlists_schedule_out(struct i915_request *rq) 1352 1.1 riastrad { 1353 1.1 riastrad struct intel_context * const ce = rq->context; 1354 1.1 riastrad struct intel_engine_cs *cur, *old; 1355 1.1 riastrad 1356 1.1 riastrad trace_i915_request_out(rq); 1357 1.1 riastrad 1358 1.1 riastrad old = READ_ONCE(ce->inflight); 1359 1.1 riastrad do 1360 1.1 riastrad cur = ptr_unmask_bits(old, 2) ? ptr_dec(old) : NULL; 1361 1.1 riastrad while (!try_cmpxchg(&ce->inflight, &old, cur)); 1362 1.1 riastrad if (!cur) 1363 1.1 riastrad __execlists_schedule_out(rq, old); 1364 1.1 riastrad 1365 1.1 riastrad i915_request_put(rq); 1366 1.1 riastrad } 1367 1.1 riastrad 1368 1.1 riastrad static u64 execlists_update_context(struct i915_request *rq) 1369 1.1 riastrad { 1370 1.1 riastrad struct intel_context *ce = rq->context; 1371 1.1 riastrad u64 desc = ce->lrc_desc; 1372 1.1 riastrad u32 tail, prev; 1373 1.1 riastrad 1374 1.1 riastrad /* 1375 1.1 riastrad * WaIdleLiteRestore:bdw,skl 1376 1.1 riastrad * 1377 1.1 riastrad * We should never submit the context with the same RING_TAIL twice 1378 1.1 riastrad * just in case we submit an empty ring, which confuses the HW. 1379 1.1 riastrad * 1380 1.1 riastrad * We append a couple of NOOPs (gen8_emit_wa_tail) after the end of 1381 1.1 riastrad * the normal request to be able to always advance the RING_TAIL on 1382 1.1 riastrad * subsequent resubmissions (for lite restore). Should that fail us, 1383 1.1 riastrad * and we try and submit the same tail again, force the context 1384 1.1 riastrad * reload. 1385 1.1 riastrad * 1386 1.1 riastrad * If we need to return to a preempted context, we need to skip the 1387 1.1 riastrad * lite-restore and force it to reload the RING_TAIL. Otherwise, the 1388 1.1 riastrad * HW has a tendency to ignore us rewinding the TAIL to the end of 1389 1.1 riastrad * an earlier request. 1390 1.1 riastrad */ 1391 1.1 riastrad tail = intel_ring_set_tail(rq->ring, rq->tail); 1392 1.1 riastrad prev = ce->lrc_reg_state[CTX_RING_TAIL]; 1393 1.1 riastrad if (unlikely(intel_ring_direction(rq->ring, tail, prev) <= 0)) 1394 1.1 riastrad desc |= CTX_DESC_FORCE_RESTORE; 1395 1.1 riastrad ce->lrc_reg_state[CTX_RING_TAIL] = tail; 1396 1.1 riastrad rq->tail = rq->wa_tail; 1397 1.1 riastrad 1398 1.1 riastrad /* 1399 1.1 riastrad * Make sure the context image is complete before we submit it to HW. 1400 1.1 riastrad * 1401 1.1 riastrad * Ostensibly, writes (including the WCB) should be flushed prior to 1402 1.1 riastrad * an uncached write such as our mmio register access, the empirical 1403 1.1 riastrad * evidence (esp. on Braswell) suggests that the WC write into memory 1404 1.1 riastrad * may not be visible to the HW prior to the completion of the UC 1405 1.1 riastrad * register write and that we may begin execution from the context 1406 1.1 riastrad * before its image is complete leading to invalid PD chasing. 1407 1.1 riastrad */ 1408 1.1 riastrad wmb(); 1409 1.1 riastrad 1410 1.1 riastrad ce->lrc_desc &= ~CTX_DESC_FORCE_RESTORE; 1411 1.1 riastrad return desc; 1412 1.1 riastrad } 1413 1.1 riastrad 1414 1.1 riastrad static inline void write_desc(struct intel_engine_execlists *execlists, u64 desc, u32 port) 1415 1.1 riastrad { 1416 1.4 riastrad #ifdef __NetBSD__ 1417 1.4 riastrad if (execlists->ctrl_reg) { 1418 1.4 riastrad bus_space_write_4(execlists->bst, execlists->bsh, execlists->submit_reg + port * 2, lower_32_bits(desc)); 1419 1.4 riastrad bus_space_write_4(execlists->bst, execlists->bsh, execlists->submit_reg + port * 2 + 1, upper_32_bits(desc)); 1420 1.4 riastrad } else { 1421 1.4 riastrad bus_space_write_4(execlists->bst, execlists->bsh, execlists->submit_reg, upper_32_bits(desc)); 1422 1.4 riastrad bus_space_write_4(execlists->bst, execlists->bsh, execlists->submit_reg, lower_32_bits(desc)); 1423 1.4 riastrad } 1424 1.4 riastrad #else 1425 1.1 riastrad if (execlists->ctrl_reg) { 1426 1.1 riastrad writel(lower_32_bits(desc), execlists->submit_reg + port * 2); 1427 1.1 riastrad writel(upper_32_bits(desc), execlists->submit_reg + port * 2 + 1); 1428 1.1 riastrad } else { 1429 1.1 riastrad writel(upper_32_bits(desc), execlists->submit_reg); 1430 1.1 riastrad writel(lower_32_bits(desc), execlists->submit_reg); 1431 1.1 riastrad } 1432 1.4 riastrad #endif 1433 1.1 riastrad } 1434 1.1 riastrad 1435 1.1 riastrad static __maybe_unused void 1436 1.1 riastrad trace_ports(const struct intel_engine_execlists *execlists, 1437 1.1 riastrad const char *msg, 1438 1.1 riastrad struct i915_request * const *ports) 1439 1.1 riastrad { 1440 1.1 riastrad const struct intel_engine_cs *engine = 1441 1.5 riastrad const_container_of(execlists, typeof(*engine), execlists); 1442 1.1 riastrad 1443 1.1 riastrad if (!ports[0]) 1444 1.1 riastrad return; 1445 1.1 riastrad 1446 1.1 riastrad ENGINE_TRACE(engine, "%s { %llx:%lld%s, %llx:%lld }\n", msg, 1447 1.1 riastrad ports[0]->fence.context, 1448 1.1 riastrad ports[0]->fence.seqno, 1449 1.1 riastrad i915_request_completed(ports[0]) ? "!" : 1450 1.1 riastrad i915_request_started(ports[0]) ? "*" : 1451 1.1 riastrad "", 1452 1.1 riastrad ports[1] ? ports[1]->fence.context : 0, 1453 1.1 riastrad ports[1] ? ports[1]->fence.seqno : 0); 1454 1.1 riastrad } 1455 1.1 riastrad 1456 1.1 riastrad static __maybe_unused bool 1457 1.1 riastrad assert_pending_valid(const struct intel_engine_execlists *execlists, 1458 1.1 riastrad const char *msg) 1459 1.1 riastrad { 1460 1.1 riastrad struct i915_request * const *port, *rq; 1461 1.1 riastrad struct intel_context *ce = NULL; 1462 1.1 riastrad 1463 1.1 riastrad trace_ports(execlists, msg, execlists->pending); 1464 1.1 riastrad 1465 1.1 riastrad if (!execlists->pending[0]) { 1466 1.1 riastrad GEM_TRACE_ERR("Nothing pending for promotion!\n"); 1467 1.1 riastrad return false; 1468 1.1 riastrad } 1469 1.1 riastrad 1470 1.1 riastrad if (execlists->pending[execlists_num_ports(execlists)]) { 1471 1.1 riastrad GEM_TRACE_ERR("Excess pending[%d] for promotion!\n", 1472 1.1 riastrad execlists_num_ports(execlists)); 1473 1.1 riastrad return false; 1474 1.1 riastrad } 1475 1.1 riastrad 1476 1.1 riastrad for (port = execlists->pending; (rq = *port); port++) { 1477 1.1 riastrad unsigned long flags; 1478 1.1 riastrad bool ok = true; 1479 1.1 riastrad 1480 1.1 riastrad GEM_BUG_ON(!kref_read(&rq->fence.refcount)); 1481 1.1 riastrad GEM_BUG_ON(!i915_request_is_active(rq)); 1482 1.1 riastrad 1483 1.1 riastrad if (ce == rq->context) { 1484 1.1 riastrad GEM_TRACE_ERR("Dup context:%llx in pending[%zd]\n", 1485 1.1 riastrad ce->timeline->fence_context, 1486 1.1 riastrad port - execlists->pending); 1487 1.1 riastrad return false; 1488 1.1 riastrad } 1489 1.1 riastrad ce = rq->context; 1490 1.1 riastrad 1491 1.1 riastrad /* Hold tightly onto the lock to prevent concurrent retires! */ 1492 1.1 riastrad if (!spin_trylock_irqsave(&rq->lock, flags)) 1493 1.1 riastrad continue; 1494 1.1 riastrad 1495 1.1 riastrad if (i915_request_completed(rq)) 1496 1.1 riastrad goto unlock; 1497 1.1 riastrad 1498 1.1 riastrad if (i915_active_is_idle(&ce->active) && 1499 1.1 riastrad !intel_context_is_barrier(ce)) { 1500 1.1 riastrad GEM_TRACE_ERR("Inactive context:%llx in pending[%zd]\n", 1501 1.1 riastrad ce->timeline->fence_context, 1502 1.1 riastrad port - execlists->pending); 1503 1.1 riastrad ok = false; 1504 1.1 riastrad goto unlock; 1505 1.1 riastrad } 1506 1.1 riastrad 1507 1.1 riastrad if (!i915_vma_is_pinned(ce->state)) { 1508 1.1 riastrad GEM_TRACE_ERR("Unpinned context:%llx in pending[%zd]\n", 1509 1.1 riastrad ce->timeline->fence_context, 1510 1.1 riastrad port - execlists->pending); 1511 1.1 riastrad ok = false; 1512 1.1 riastrad goto unlock; 1513 1.1 riastrad } 1514 1.1 riastrad 1515 1.1 riastrad if (!i915_vma_is_pinned(ce->ring->vma)) { 1516 1.1 riastrad GEM_TRACE_ERR("Unpinned ring:%llx in pending[%zd]\n", 1517 1.1 riastrad ce->timeline->fence_context, 1518 1.1 riastrad port - execlists->pending); 1519 1.1 riastrad ok = false; 1520 1.1 riastrad goto unlock; 1521 1.1 riastrad } 1522 1.1 riastrad 1523 1.1 riastrad unlock: 1524 1.1 riastrad spin_unlock_irqrestore(&rq->lock, flags); 1525 1.1 riastrad if (!ok) 1526 1.1 riastrad return false; 1527 1.1 riastrad } 1528 1.1 riastrad 1529 1.1 riastrad return ce; 1530 1.1 riastrad } 1531 1.1 riastrad 1532 1.1 riastrad static void execlists_submit_ports(struct intel_engine_cs *engine) 1533 1.1 riastrad { 1534 1.1 riastrad struct intel_engine_execlists *execlists = &engine->execlists; 1535 1.1 riastrad unsigned int n; 1536 1.1 riastrad 1537 1.1 riastrad GEM_BUG_ON(!assert_pending_valid(execlists, "submit")); 1538 1.1 riastrad 1539 1.1 riastrad /* 1540 1.1 riastrad * We can skip acquiring intel_runtime_pm_get() here as it was taken 1541 1.1 riastrad * on our behalf by the request (see i915_gem_mark_busy()) and it will 1542 1.1 riastrad * not be relinquished until the device is idle (see 1543 1.1 riastrad * i915_gem_idle_work_handler()). As a precaution, we make sure 1544 1.1 riastrad * that all ELSP are drained i.e. we have processed the CSB, 1545 1.1 riastrad * before allowing ourselves to idle and calling intel_runtime_pm_put(). 1546 1.1 riastrad */ 1547 1.1 riastrad GEM_BUG_ON(!intel_engine_pm_is_awake(engine)); 1548 1.1 riastrad 1549 1.1 riastrad /* 1550 1.1 riastrad * ELSQ note: the submit queue is not cleared after being submitted 1551 1.1 riastrad * to the HW so we need to make sure we always clean it up. This is 1552 1.1 riastrad * currently ensured by the fact that we always write the same number 1553 1.1 riastrad * of elsq entries, keep this in mind before changing the loop below. 1554 1.1 riastrad */ 1555 1.1 riastrad for (n = execlists_num_ports(execlists); n--; ) { 1556 1.1 riastrad struct i915_request *rq = execlists->pending[n]; 1557 1.1 riastrad 1558 1.1 riastrad write_desc(execlists, 1559 1.1 riastrad rq ? execlists_update_context(rq) : 0, 1560 1.1 riastrad n); 1561 1.1 riastrad } 1562 1.1 riastrad 1563 1.1 riastrad /* we need to manually load the submit queue */ 1564 1.1 riastrad if (execlists->ctrl_reg) 1565 1.6 riastrad #ifdef __NetBSD__ 1566 1.6 riastrad bus_space_write_4(execlists->bst, execlists->bsh, execlists->ctrl_reg, EL_CTRL_LOAD); 1567 1.6 riastrad #else 1568 1.1 riastrad writel(EL_CTRL_LOAD, execlists->ctrl_reg); 1569 1.6 riastrad #endif 1570 1.1 riastrad } 1571 1.1 riastrad 1572 1.1 riastrad static bool ctx_single_port_submission(const struct intel_context *ce) 1573 1.1 riastrad { 1574 1.1 riastrad return (IS_ENABLED(CONFIG_DRM_I915_GVT) && 1575 1.1 riastrad intel_context_force_single_submission(ce)); 1576 1.1 riastrad } 1577 1.1 riastrad 1578 1.1 riastrad static bool can_merge_ctx(const struct intel_context *prev, 1579 1.1 riastrad const struct intel_context *next) 1580 1.1 riastrad { 1581 1.1 riastrad if (prev != next) 1582 1.1 riastrad return false; 1583 1.1 riastrad 1584 1.1 riastrad if (ctx_single_port_submission(prev)) 1585 1.1 riastrad return false; 1586 1.1 riastrad 1587 1.1 riastrad return true; 1588 1.1 riastrad } 1589 1.1 riastrad 1590 1.1 riastrad static bool can_merge_rq(const struct i915_request *prev, 1591 1.1 riastrad const struct i915_request *next) 1592 1.1 riastrad { 1593 1.1 riastrad GEM_BUG_ON(prev == next); 1594 1.1 riastrad GEM_BUG_ON(!assert_priority_queue(prev, next)); 1595 1.1 riastrad 1596 1.1 riastrad /* 1597 1.1 riastrad * We do not submit known completed requests. Therefore if the next 1598 1.1 riastrad * request is already completed, we can pretend to merge it in 1599 1.1 riastrad * with the previous context (and we will skip updating the ELSP 1600 1.1 riastrad * and tracking). Thus hopefully keeping the ELSP full with active 1601 1.1 riastrad * contexts, despite the best efforts of preempt-to-busy to confuse 1602 1.1 riastrad * us. 1603 1.1 riastrad */ 1604 1.1 riastrad if (i915_request_completed(next)) 1605 1.1 riastrad return true; 1606 1.1 riastrad 1607 1.1 riastrad if (unlikely((prev->fence.flags ^ next->fence.flags) & 1608 1.1 riastrad (BIT(I915_FENCE_FLAG_NOPREEMPT) | 1609 1.1 riastrad BIT(I915_FENCE_FLAG_SENTINEL)))) 1610 1.1 riastrad return false; 1611 1.1 riastrad 1612 1.1 riastrad if (!can_merge_ctx(prev->context, next->context)) 1613 1.1 riastrad return false; 1614 1.1 riastrad 1615 1.1 riastrad return true; 1616 1.1 riastrad } 1617 1.1 riastrad 1618 1.1 riastrad static void virtual_update_register_offsets(u32 *regs, 1619 1.1 riastrad struct intel_engine_cs *engine) 1620 1.1 riastrad { 1621 1.1 riastrad set_offsets(regs, reg_offsets(engine), engine, false); 1622 1.1 riastrad } 1623 1.1 riastrad 1624 1.1 riastrad static bool virtual_matches(const struct virtual_engine *ve, 1625 1.1 riastrad const struct i915_request *rq, 1626 1.1 riastrad const struct intel_engine_cs *engine) 1627 1.1 riastrad { 1628 1.1 riastrad const struct intel_engine_cs *inflight; 1629 1.1 riastrad 1630 1.1 riastrad if (!(rq->execution_mask & engine->mask)) /* We peeked too soon! */ 1631 1.1 riastrad return false; 1632 1.1 riastrad 1633 1.1 riastrad /* 1634 1.1 riastrad * We track when the HW has completed saving the context image 1635 1.1 riastrad * (i.e. when we have seen the final CS event switching out of 1636 1.1 riastrad * the context) and must not overwrite the context image before 1637 1.1 riastrad * then. This restricts us to only using the active engine 1638 1.1 riastrad * while the previous virtualized request is inflight (so 1639 1.1 riastrad * we reuse the register offsets). This is a very small 1640 1.1 riastrad * hystersis on the greedy seelction algorithm. 1641 1.1 riastrad */ 1642 1.1 riastrad inflight = intel_context_inflight(&ve->context); 1643 1.1 riastrad if (inflight && inflight != engine) 1644 1.1 riastrad return false; 1645 1.1 riastrad 1646 1.1 riastrad return true; 1647 1.1 riastrad } 1648 1.1 riastrad 1649 1.1 riastrad static void virtual_xfer_breadcrumbs(struct virtual_engine *ve, 1650 1.1 riastrad struct intel_engine_cs *engine) 1651 1.1 riastrad { 1652 1.1 riastrad struct intel_engine_cs *old = ve->siblings[0]; 1653 1.1 riastrad 1654 1.1 riastrad /* All unattached (rq->engine == old) must already be completed */ 1655 1.1 riastrad 1656 1.1 riastrad spin_lock(&old->breadcrumbs.irq_lock); 1657 1.1 riastrad if (!list_empty(&ve->context.signal_link)) { 1658 1.1 riastrad list_move_tail(&ve->context.signal_link, 1659 1.1 riastrad &engine->breadcrumbs.signalers); 1660 1.1 riastrad intel_engine_signal_breadcrumbs(engine); 1661 1.1 riastrad } 1662 1.1 riastrad spin_unlock(&old->breadcrumbs.irq_lock); 1663 1.1 riastrad } 1664 1.1 riastrad 1665 1.1 riastrad static struct i915_request * 1666 1.1 riastrad last_active(const struct intel_engine_execlists *execlists) 1667 1.1 riastrad { 1668 1.1 riastrad struct i915_request * const *last = READ_ONCE(execlists->active); 1669 1.1 riastrad 1670 1.1 riastrad while (*last && i915_request_completed(*last)) 1671 1.1 riastrad last++; 1672 1.1 riastrad 1673 1.1 riastrad return *last; 1674 1.1 riastrad } 1675 1.1 riastrad 1676 1.1 riastrad #define for_each_waiter(p__, rq__) \ 1677 1.1 riastrad list_for_each_entry_lockless(p__, \ 1678 1.1 riastrad &(rq__)->sched.waiters_list, \ 1679 1.1 riastrad wait_link) 1680 1.1 riastrad 1681 1.1 riastrad static void defer_request(struct i915_request *rq, struct list_head * const pl) 1682 1.1 riastrad { 1683 1.1 riastrad LIST_HEAD(list); 1684 1.1 riastrad 1685 1.1 riastrad /* 1686 1.1 riastrad * We want to move the interrupted request to the back of 1687 1.1 riastrad * the round-robin list (i.e. its priority level), but 1688 1.1 riastrad * in doing so, we must then move all requests that were in 1689 1.1 riastrad * flight and were waiting for the interrupted request to 1690 1.1 riastrad * be run after it again. 1691 1.1 riastrad */ 1692 1.1 riastrad do { 1693 1.1 riastrad struct i915_dependency *p; 1694 1.1 riastrad 1695 1.1 riastrad GEM_BUG_ON(i915_request_is_active(rq)); 1696 1.1 riastrad list_move_tail(&rq->sched.link, pl); 1697 1.1 riastrad 1698 1.1 riastrad for_each_waiter(p, rq) { 1699 1.1 riastrad struct i915_request *w = 1700 1.1 riastrad container_of(p->waiter, typeof(*w), sched); 1701 1.1 riastrad 1702 1.1 riastrad /* Leave semaphores spinning on the other engines */ 1703 1.1 riastrad if (w->engine != rq->engine) 1704 1.1 riastrad continue; 1705 1.1 riastrad 1706 1.1 riastrad /* No waiter should start before its signaler */ 1707 1.1 riastrad GEM_BUG_ON(i915_request_started(w) && 1708 1.1 riastrad !i915_request_completed(rq)); 1709 1.1 riastrad 1710 1.1 riastrad GEM_BUG_ON(i915_request_is_active(w)); 1711 1.1 riastrad if (!i915_request_is_ready(w)) 1712 1.1 riastrad continue; 1713 1.1 riastrad 1714 1.1 riastrad if (rq_prio(w) < rq_prio(rq)) 1715 1.1 riastrad continue; 1716 1.1 riastrad 1717 1.1 riastrad GEM_BUG_ON(rq_prio(w) > rq_prio(rq)); 1718 1.1 riastrad list_move_tail(&w->sched.link, &list); 1719 1.1 riastrad } 1720 1.1 riastrad 1721 1.1 riastrad rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 1722 1.1 riastrad } while (rq); 1723 1.1 riastrad } 1724 1.1 riastrad 1725 1.1 riastrad static void defer_active(struct intel_engine_cs *engine) 1726 1.1 riastrad { 1727 1.1 riastrad struct i915_request *rq; 1728 1.1 riastrad 1729 1.1 riastrad rq = __unwind_incomplete_requests(engine); 1730 1.1 riastrad if (!rq) 1731 1.1 riastrad return; 1732 1.1 riastrad 1733 1.1 riastrad defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq))); 1734 1.1 riastrad } 1735 1.1 riastrad 1736 1.1 riastrad static bool 1737 1.1 riastrad need_timeslice(struct intel_engine_cs *engine, const struct i915_request *rq) 1738 1.1 riastrad { 1739 1.1 riastrad int hint; 1740 1.1 riastrad 1741 1.1 riastrad if (!intel_engine_has_timeslices(engine)) 1742 1.1 riastrad return false; 1743 1.1 riastrad 1744 1.1 riastrad if (list_is_last(&rq->sched.link, &engine->active.requests)) 1745 1.1 riastrad return false; 1746 1.1 riastrad 1747 1.1 riastrad hint = max(rq_prio(list_next_entry(rq, sched.link)), 1748 1.1 riastrad engine->execlists.queue_priority_hint); 1749 1.1 riastrad 1750 1.1 riastrad return hint >= effective_prio(rq); 1751 1.1 riastrad } 1752 1.1 riastrad 1753 1.1 riastrad static int 1754 1.1 riastrad switch_prio(struct intel_engine_cs *engine, const struct i915_request *rq) 1755 1.1 riastrad { 1756 1.1 riastrad if (list_is_last(&rq->sched.link, &engine->active.requests)) 1757 1.1 riastrad return INT_MIN; 1758 1.1 riastrad 1759 1.1 riastrad return rq_prio(list_next_entry(rq, sched.link)); 1760 1.1 riastrad } 1761 1.1 riastrad 1762 1.1 riastrad static inline unsigned long 1763 1.1 riastrad timeslice(const struct intel_engine_cs *engine) 1764 1.1 riastrad { 1765 1.1 riastrad return READ_ONCE(engine->props.timeslice_duration_ms); 1766 1.1 riastrad } 1767 1.1 riastrad 1768 1.1 riastrad static unsigned long 1769 1.1 riastrad active_timeslice(const struct intel_engine_cs *engine) 1770 1.1 riastrad { 1771 1.1 riastrad const struct i915_request *rq = *engine->execlists.active; 1772 1.1 riastrad 1773 1.1 riastrad if (!rq || i915_request_completed(rq)) 1774 1.1 riastrad return 0; 1775 1.1 riastrad 1776 1.1 riastrad if (engine->execlists.switch_priority_hint < effective_prio(rq)) 1777 1.1 riastrad return 0; 1778 1.1 riastrad 1779 1.1 riastrad return timeslice(engine); 1780 1.1 riastrad } 1781 1.1 riastrad 1782 1.1 riastrad static void set_timeslice(struct intel_engine_cs *engine) 1783 1.1 riastrad { 1784 1.1 riastrad if (!intel_engine_has_timeslices(engine)) 1785 1.1 riastrad return; 1786 1.1 riastrad 1787 1.1 riastrad set_timer_ms(&engine->execlists.timer, active_timeslice(engine)); 1788 1.1 riastrad } 1789 1.1 riastrad 1790 1.1 riastrad static void record_preemption(struct intel_engine_execlists *execlists) 1791 1.1 riastrad { 1792 1.1 riastrad (void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++); 1793 1.1 riastrad } 1794 1.1 riastrad 1795 1.1 riastrad static unsigned long active_preempt_timeout(struct intel_engine_cs *engine) 1796 1.1 riastrad { 1797 1.1 riastrad struct i915_request *rq; 1798 1.1 riastrad 1799 1.1 riastrad rq = last_active(&engine->execlists); 1800 1.1 riastrad if (!rq) 1801 1.1 riastrad return 0; 1802 1.1 riastrad 1803 1.1 riastrad /* Force a fast reset for terminated contexts (ignoring sysfs!) */ 1804 1.1 riastrad if (unlikely(intel_context_is_banned(rq->context))) 1805 1.1 riastrad return 1; 1806 1.1 riastrad 1807 1.1 riastrad return READ_ONCE(engine->props.preempt_timeout_ms); 1808 1.1 riastrad } 1809 1.1 riastrad 1810 1.1 riastrad static void set_preempt_timeout(struct intel_engine_cs *engine) 1811 1.1 riastrad { 1812 1.1 riastrad if (!intel_engine_has_preempt_reset(engine)) 1813 1.1 riastrad return; 1814 1.1 riastrad 1815 1.1 riastrad set_timer_ms(&engine->execlists.preempt, 1816 1.1 riastrad active_preempt_timeout(engine)); 1817 1.1 riastrad } 1818 1.1 riastrad 1819 1.1 riastrad static inline void clear_ports(struct i915_request **ports, int count) 1820 1.1 riastrad { 1821 1.1 riastrad memset_p((void **)ports, NULL, count); 1822 1.1 riastrad } 1823 1.1 riastrad 1824 1.1 riastrad static void execlists_dequeue(struct intel_engine_cs *engine) 1825 1.1 riastrad { 1826 1.1 riastrad struct intel_engine_execlists * const execlists = &engine->execlists; 1827 1.1 riastrad struct i915_request **port = execlists->pending; 1828 1.1 riastrad struct i915_request ** const last_port = port + execlists->port_mask; 1829 1.1 riastrad struct i915_request *last; 1830 1.1 riastrad struct rb_node *rb; 1831 1.1 riastrad bool submit = false; 1832 1.1 riastrad 1833 1.1 riastrad /* 1834 1.1 riastrad * Hardware submission is through 2 ports. Conceptually each port 1835 1.1 riastrad * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is 1836 1.1 riastrad * static for a context, and unique to each, so we only execute 1837 1.1 riastrad * requests belonging to a single context from each ring. RING_HEAD 1838 1.1 riastrad * is maintained by the CS in the context image, it marks the place 1839 1.1 riastrad * where it got up to last time, and through RING_TAIL we tell the CS 1840 1.1 riastrad * where we want to execute up to this time. 1841 1.1 riastrad * 1842 1.1 riastrad * In this list the requests are in order of execution. Consecutive 1843 1.1 riastrad * requests from the same context are adjacent in the ringbuffer. We 1844 1.1 riastrad * can combine these requests into a single RING_TAIL update: 1845 1.1 riastrad * 1846 1.1 riastrad * RING_HEAD...req1...req2 1847 1.1 riastrad * ^- RING_TAIL 1848 1.1 riastrad * since to execute req2 the CS must first execute req1. 1849 1.1 riastrad * 1850 1.1 riastrad * Our goal then is to point each port to the end of a consecutive 1851 1.1 riastrad * sequence of requests as being the most optimal (fewest wake ups 1852 1.1 riastrad * and context switches) submission. 1853 1.1 riastrad */ 1854 1.1 riastrad 1855 1.1 riastrad for (rb = rb_first_cached(&execlists->virtual); rb; ) { 1856 1.1 riastrad struct virtual_engine *ve = 1857 1.1 riastrad rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 1858 1.1 riastrad struct i915_request *rq = READ_ONCE(ve->request); 1859 1.1 riastrad 1860 1.1 riastrad if (!rq) { /* lazily cleanup after another engine handled rq */ 1861 1.1 riastrad rb_erase_cached(rb, &execlists->virtual); 1862 1.7 riastrad container_of(rb, struct ve_node, rb)->inserted = 1863 1.7 riastrad false; 1864 1.1 riastrad rb = rb_first_cached(&execlists->virtual); 1865 1.1 riastrad continue; 1866 1.1 riastrad } 1867 1.1 riastrad 1868 1.1 riastrad if (!virtual_matches(ve, rq, engine)) { 1869 1.7 riastrad rb = rb_next2(&execlists->virtual.rb_root, rb); 1870 1.1 riastrad continue; 1871 1.1 riastrad } 1872 1.1 riastrad 1873 1.1 riastrad break; 1874 1.1 riastrad } 1875 1.1 riastrad 1876 1.1 riastrad /* 1877 1.1 riastrad * If the queue is higher priority than the last 1878 1.1 riastrad * request in the currently active context, submit afresh. 1879 1.1 riastrad * We will resubmit again afterwards in case we need to split 1880 1.1 riastrad * the active context to interject the preemption request, 1881 1.1 riastrad * i.e. we will retrigger preemption following the ack in case 1882 1.1 riastrad * of trouble. 1883 1.1 riastrad */ 1884 1.1 riastrad last = last_active(execlists); 1885 1.1 riastrad if (last) { 1886 1.1 riastrad if (need_preempt(engine, last, rb)) { 1887 1.1 riastrad ENGINE_TRACE(engine, 1888 1.1 riastrad "preempting last=%llx:%lld, prio=%d, hint=%d\n", 1889 1.1 riastrad last->fence.context, 1890 1.1 riastrad last->fence.seqno, 1891 1.1 riastrad last->sched.attr.priority, 1892 1.1 riastrad execlists->queue_priority_hint); 1893 1.1 riastrad record_preemption(execlists); 1894 1.1 riastrad 1895 1.1 riastrad /* 1896 1.1 riastrad * Don't let the RING_HEAD advance past the breadcrumb 1897 1.1 riastrad * as we unwind (and until we resubmit) so that we do 1898 1.1 riastrad * not accidentally tell it to go backwards. 1899 1.1 riastrad */ 1900 1.1 riastrad ring_set_paused(engine, 1); 1901 1.1 riastrad 1902 1.1 riastrad /* 1903 1.1 riastrad * Note that we have not stopped the GPU at this point, 1904 1.1 riastrad * so we are unwinding the incomplete requests as they 1905 1.1 riastrad * remain inflight and so by the time we do complete 1906 1.1 riastrad * the preemption, some of the unwound requests may 1907 1.1 riastrad * complete! 1908 1.1 riastrad */ 1909 1.1 riastrad __unwind_incomplete_requests(engine); 1910 1.1 riastrad 1911 1.1 riastrad last = NULL; 1912 1.1 riastrad } else if (need_timeslice(engine, last) && 1913 1.1 riastrad timer_expired(&engine->execlists.timer)) { 1914 1.1 riastrad ENGINE_TRACE(engine, 1915 1.1 riastrad "expired last=%llx:%lld, prio=%d, hint=%d\n", 1916 1.1 riastrad last->fence.context, 1917 1.1 riastrad last->fence.seqno, 1918 1.1 riastrad last->sched.attr.priority, 1919 1.1 riastrad execlists->queue_priority_hint); 1920 1.1 riastrad 1921 1.1 riastrad ring_set_paused(engine, 1); 1922 1.1 riastrad defer_active(engine); 1923 1.1 riastrad 1924 1.1 riastrad /* 1925 1.1 riastrad * Unlike for preemption, if we rewind and continue 1926 1.1 riastrad * executing the same context as previously active, 1927 1.1 riastrad * the order of execution will remain the same and 1928 1.1 riastrad * the tail will only advance. We do not need to 1929 1.1 riastrad * force a full context restore, as a lite-restore 1930 1.1 riastrad * is sufficient to resample the monotonic TAIL. 1931 1.1 riastrad * 1932 1.1 riastrad * If we switch to any other context, similarly we 1933 1.1 riastrad * will not rewind TAIL of current context, and 1934 1.1 riastrad * normal save/restore will preserve state and allow 1935 1.1 riastrad * us to later continue executing the same request. 1936 1.1 riastrad */ 1937 1.1 riastrad last = NULL; 1938 1.1 riastrad } else { 1939 1.1 riastrad /* 1940 1.1 riastrad * Otherwise if we already have a request pending 1941 1.1 riastrad * for execution after the current one, we can 1942 1.1 riastrad * just wait until the next CS event before 1943 1.1 riastrad * queuing more. In either case we will force a 1944 1.1 riastrad * lite-restore preemption event, but if we wait 1945 1.1 riastrad * we hopefully coalesce several updates into a single 1946 1.1 riastrad * submission. 1947 1.1 riastrad */ 1948 1.1 riastrad if (!list_is_last(&last->sched.link, 1949 1.1 riastrad &engine->active.requests)) { 1950 1.1 riastrad /* 1951 1.1 riastrad * Even if ELSP[1] is occupied and not worthy 1952 1.1 riastrad * of timeslices, our queue might be. 1953 1.1 riastrad */ 1954 1.7 riastrad if (!timer_pending(&execlists->timer) && 1955 1.1 riastrad need_timeslice(engine, last)) 1956 1.1 riastrad set_timer_ms(&execlists->timer, 1957 1.1 riastrad timeslice(engine)); 1958 1.1 riastrad 1959 1.1 riastrad return; 1960 1.1 riastrad } 1961 1.1 riastrad } 1962 1.1 riastrad } 1963 1.1 riastrad 1964 1.1 riastrad while (rb) { /* XXX virtual is always taking precedence */ 1965 1.1 riastrad struct virtual_engine *ve = 1966 1.1 riastrad rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 1967 1.1 riastrad struct i915_request *rq; 1968 1.1 riastrad 1969 1.1 riastrad spin_lock(&ve->base.active.lock); 1970 1.1 riastrad 1971 1.1 riastrad rq = ve->request; 1972 1.1 riastrad if (unlikely(!rq)) { /* lost the race to a sibling */ 1973 1.1 riastrad spin_unlock(&ve->base.active.lock); 1974 1.1 riastrad rb_erase_cached(rb, &execlists->virtual); 1975 1.7 riastrad container_of(rb, struct ve_node, rb)->inserted = 1976 1.7 riastrad false; 1977 1.1 riastrad rb = rb_first_cached(&execlists->virtual); 1978 1.1 riastrad continue; 1979 1.1 riastrad } 1980 1.1 riastrad 1981 1.1 riastrad GEM_BUG_ON(rq != ve->request); 1982 1.1 riastrad GEM_BUG_ON(rq->engine != &ve->base); 1983 1.1 riastrad GEM_BUG_ON(rq->context != &ve->context); 1984 1.1 riastrad 1985 1.1 riastrad if (rq_prio(rq) >= queue_prio(execlists)) { 1986 1.1 riastrad if (!virtual_matches(ve, rq, engine)) { 1987 1.1 riastrad spin_unlock(&ve->base.active.lock); 1988 1.7 riastrad rb = rb_next2(&execlists->virtual.rb_root, 1989 1.7 riastrad rb); 1990 1.1 riastrad continue; 1991 1.1 riastrad } 1992 1.1 riastrad 1993 1.1 riastrad if (last && !can_merge_rq(last, rq)) { 1994 1.1 riastrad spin_unlock(&ve->base.active.lock); 1995 1.1 riastrad return; /* leave this for another */ 1996 1.1 riastrad } 1997 1.1 riastrad 1998 1.1 riastrad ENGINE_TRACE(engine, 1999 1.1 riastrad "virtual rq=%llx:%lld%s, new engine? %s\n", 2000 1.1 riastrad rq->fence.context, 2001 1.1 riastrad rq->fence.seqno, 2002 1.1 riastrad i915_request_completed(rq) ? "!" : 2003 1.1 riastrad i915_request_started(rq) ? "*" : 2004 1.1 riastrad "", 2005 1.1 riastrad yesno(engine != ve->siblings[0])); 2006 1.1 riastrad 2007 1.1 riastrad ve->request = NULL; 2008 1.1 riastrad ve->base.execlists.queue_priority_hint = INT_MIN; 2009 1.1 riastrad rb_erase_cached(rb, &execlists->virtual); 2010 1.7 riastrad container_of(rb, struct ve_node, rb)->inserted = 2011 1.7 riastrad false; 2012 1.1 riastrad 2013 1.1 riastrad GEM_BUG_ON(!(rq->execution_mask & engine->mask)); 2014 1.1 riastrad rq->engine = engine; 2015 1.1 riastrad 2016 1.1 riastrad if (engine != ve->siblings[0]) { 2017 1.1 riastrad u32 *regs = ve->context.lrc_reg_state; 2018 1.1 riastrad unsigned int n; 2019 1.1 riastrad 2020 1.1 riastrad GEM_BUG_ON(READ_ONCE(ve->context.inflight)); 2021 1.1 riastrad 2022 1.1 riastrad if (!intel_engine_has_relative_mmio(engine)) 2023 1.1 riastrad virtual_update_register_offsets(regs, 2024 1.1 riastrad engine); 2025 1.1 riastrad 2026 1.1 riastrad if (!list_empty(&ve->context.signals)) 2027 1.1 riastrad virtual_xfer_breadcrumbs(ve, engine); 2028 1.1 riastrad 2029 1.1 riastrad /* 2030 1.1 riastrad * Move the bound engine to the top of the list 2031 1.1 riastrad * for future execution. We then kick this 2032 1.1 riastrad * tasklet first before checking others, so that 2033 1.1 riastrad * we preferentially reuse this set of bound 2034 1.1 riastrad * registers. 2035 1.1 riastrad */ 2036 1.1 riastrad for (n = 1; n < ve->num_siblings; n++) { 2037 1.1 riastrad if (ve->siblings[n] == engine) { 2038 1.1 riastrad swap(ve->siblings[n], 2039 1.1 riastrad ve->siblings[0]); 2040 1.1 riastrad break; 2041 1.1 riastrad } 2042 1.1 riastrad } 2043 1.1 riastrad 2044 1.1 riastrad GEM_BUG_ON(ve->siblings[0] != engine); 2045 1.1 riastrad } 2046 1.1 riastrad 2047 1.1 riastrad if (__i915_request_submit(rq)) { 2048 1.1 riastrad submit = true; 2049 1.1 riastrad last = rq; 2050 1.1 riastrad } 2051 1.1 riastrad i915_request_put(rq); 2052 1.1 riastrad 2053 1.1 riastrad /* 2054 1.1 riastrad * Hmm, we have a bunch of virtual engine requests, 2055 1.1 riastrad * but the first one was already completed (thanks 2056 1.1 riastrad * preempt-to-busy!). Keep looking at the veng queue 2057 1.1 riastrad * until we have no more relevant requests (i.e. 2058 1.1 riastrad * the normal submit queue has higher priority). 2059 1.1 riastrad */ 2060 1.1 riastrad if (!submit) { 2061 1.1 riastrad spin_unlock(&ve->base.active.lock); 2062 1.1 riastrad rb = rb_first_cached(&execlists->virtual); 2063 1.1 riastrad continue; 2064 1.1 riastrad } 2065 1.1 riastrad } 2066 1.1 riastrad 2067 1.1 riastrad spin_unlock(&ve->base.active.lock); 2068 1.1 riastrad break; 2069 1.1 riastrad } 2070 1.1 riastrad 2071 1.1 riastrad while ((rb = rb_first_cached(&execlists->queue))) { 2072 1.1 riastrad struct i915_priolist *p = to_priolist(rb); 2073 1.1 riastrad struct i915_request *rq, *rn; 2074 1.1 riastrad int i; 2075 1.1 riastrad 2076 1.1 riastrad priolist_for_each_request_consume(rq, rn, p, i) { 2077 1.1 riastrad bool merge = true; 2078 1.1 riastrad 2079 1.1 riastrad /* 2080 1.1 riastrad * Can we combine this request with the current port? 2081 1.1 riastrad * It has to be the same context/ringbuffer and not 2082 1.1 riastrad * have any exceptions (e.g. GVT saying never to 2083 1.1 riastrad * combine contexts). 2084 1.1 riastrad * 2085 1.1 riastrad * If we can combine the requests, we can execute both 2086 1.1 riastrad * by updating the RING_TAIL to point to the end of the 2087 1.1 riastrad * second request, and so we never need to tell the 2088 1.1 riastrad * hardware about the first. 2089 1.1 riastrad */ 2090 1.1 riastrad if (last && !can_merge_rq(last, rq)) { 2091 1.1 riastrad /* 2092 1.1 riastrad * If we are on the second port and cannot 2093 1.1 riastrad * combine this request with the last, then we 2094 1.1 riastrad * are done. 2095 1.1 riastrad */ 2096 1.1 riastrad if (port == last_port) 2097 1.1 riastrad goto done; 2098 1.1 riastrad 2099 1.1 riastrad /* 2100 1.1 riastrad * We must not populate both ELSP[] with the 2101 1.1 riastrad * same LRCA, i.e. we must submit 2 different 2102 1.1 riastrad * contexts if we submit 2 ELSP. 2103 1.1 riastrad */ 2104 1.1 riastrad if (last->context == rq->context) 2105 1.1 riastrad goto done; 2106 1.1 riastrad 2107 1.1 riastrad if (i915_request_has_sentinel(last)) 2108 1.1 riastrad goto done; 2109 1.1 riastrad 2110 1.1 riastrad /* 2111 1.1 riastrad * If GVT overrides us we only ever submit 2112 1.1 riastrad * port[0], leaving port[1] empty. Note that we 2113 1.1 riastrad * also have to be careful that we don't queue 2114 1.1 riastrad * the same context (even though a different 2115 1.1 riastrad * request) to the second port. 2116 1.1 riastrad */ 2117 1.1 riastrad if (ctx_single_port_submission(last->context) || 2118 1.1 riastrad ctx_single_port_submission(rq->context)) 2119 1.1 riastrad goto done; 2120 1.1 riastrad 2121 1.1 riastrad merge = false; 2122 1.1 riastrad } 2123 1.1 riastrad 2124 1.1 riastrad if (__i915_request_submit(rq)) { 2125 1.1 riastrad if (!merge) { 2126 1.1 riastrad *port = execlists_schedule_in(last, port - execlists->pending); 2127 1.1 riastrad port++; 2128 1.1 riastrad last = NULL; 2129 1.1 riastrad } 2130 1.1 riastrad 2131 1.1 riastrad GEM_BUG_ON(last && 2132 1.1 riastrad !can_merge_ctx(last->context, 2133 1.1 riastrad rq->context)); 2134 1.1 riastrad 2135 1.1 riastrad submit = true; 2136 1.1 riastrad last = rq; 2137 1.1 riastrad } 2138 1.1 riastrad } 2139 1.1 riastrad 2140 1.1 riastrad rb_erase_cached(&p->node, &execlists->queue); 2141 1.1 riastrad i915_priolist_free(p); 2142 1.1 riastrad } 2143 1.1 riastrad 2144 1.1 riastrad done: 2145 1.1 riastrad /* 2146 1.1 riastrad * Here be a bit of magic! Or sleight-of-hand, whichever you prefer. 2147 1.1 riastrad * 2148 1.1 riastrad * We choose the priority hint such that if we add a request of greater 2149 1.1 riastrad * priority than this, we kick the submission tasklet to decide on 2150 1.1 riastrad * the right order of submitting the requests to hardware. We must 2151 1.1 riastrad * also be prepared to reorder requests as they are in-flight on the 2152 1.1 riastrad * HW. We derive the priority hint then as the first "hole" in 2153 1.1 riastrad * the HW submission ports and if there are no available slots, 2154 1.1 riastrad * the priority of the lowest executing request, i.e. last. 2155 1.1 riastrad * 2156 1.1 riastrad * When we do receive a higher priority request ready to run from the 2157 1.1 riastrad * user, see queue_request(), the priority hint is bumped to that 2158 1.1 riastrad * request triggering preemption on the next dequeue (or subsequent 2159 1.1 riastrad * interrupt for secondary ports). 2160 1.1 riastrad */ 2161 1.1 riastrad execlists->queue_priority_hint = queue_prio(execlists); 2162 1.1 riastrad 2163 1.1 riastrad if (submit) { 2164 1.1 riastrad *port = execlists_schedule_in(last, port - execlists->pending); 2165 1.1 riastrad execlists->switch_priority_hint = 2166 1.1 riastrad switch_prio(engine, *execlists->pending); 2167 1.1 riastrad 2168 1.1 riastrad /* 2169 1.1 riastrad * Skip if we ended up with exactly the same set of requests, 2170 1.1 riastrad * e.g. trying to timeslice a pair of ordered contexts 2171 1.1 riastrad */ 2172 1.1 riastrad if (!memcmp(execlists->active, execlists->pending, 2173 1.1 riastrad (port - execlists->pending + 1) * sizeof(*port))) { 2174 1.1 riastrad do 2175 1.1 riastrad execlists_schedule_out(fetch_and_zero(port)); 2176 1.1 riastrad while (port-- != execlists->pending); 2177 1.1 riastrad 2178 1.1 riastrad goto skip_submit; 2179 1.1 riastrad } 2180 1.1 riastrad clear_ports(port + 1, last_port - port); 2181 1.1 riastrad 2182 1.1 riastrad execlists_submit_ports(engine); 2183 1.1 riastrad set_preempt_timeout(engine); 2184 1.1 riastrad } else { 2185 1.1 riastrad skip_submit: 2186 1.1 riastrad ring_set_paused(engine, 0); 2187 1.1 riastrad } 2188 1.1 riastrad } 2189 1.1 riastrad 2190 1.1 riastrad static void 2191 1.1 riastrad cancel_port_requests(struct intel_engine_execlists * const execlists) 2192 1.1 riastrad { 2193 1.1 riastrad struct i915_request * const *port; 2194 1.1 riastrad 2195 1.1 riastrad for (port = execlists->pending; *port; port++) 2196 1.1 riastrad execlists_schedule_out(*port); 2197 1.1 riastrad clear_ports(execlists->pending, ARRAY_SIZE(execlists->pending)); 2198 1.1 riastrad 2199 1.1 riastrad /* Mark the end of active before we overwrite *active */ 2200 1.1 riastrad for (port = xchg(&execlists->active, execlists->pending); *port; port++) 2201 1.1 riastrad execlists_schedule_out(*port); 2202 1.1 riastrad clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight)); 2203 1.1 riastrad 2204 1.1 riastrad WRITE_ONCE(execlists->active, execlists->inflight); 2205 1.1 riastrad } 2206 1.1 riastrad 2207 1.1 riastrad static inline void 2208 1.1 riastrad invalidate_csb_entries(const u32 *first, const u32 *last) 2209 1.1 riastrad { 2210 1.7 riastrad clflush(__UNCONST(first)); 2211 1.7 riastrad clflush(__UNCONST(last)); 2212 1.1 riastrad } 2213 1.1 riastrad 2214 1.1 riastrad static inline bool 2215 1.1 riastrad reset_in_progress(const struct intel_engine_execlists *execlists) 2216 1.1 riastrad { 2217 1.1 riastrad return unlikely(!__tasklet_is_enabled(&execlists->tasklet)); 2218 1.1 riastrad } 2219 1.1 riastrad 2220 1.1 riastrad /* 2221 1.1 riastrad * Starting with Gen12, the status has a new format: 2222 1.1 riastrad * 2223 1.1 riastrad * bit 0: switched to new queue 2224 1.1 riastrad * bit 1: reserved 2225 1.1 riastrad * bit 2: semaphore wait mode (poll or signal), only valid when 2226 1.1 riastrad * switch detail is set to "wait on semaphore" 2227 1.1 riastrad * bits 3-5: engine class 2228 1.1 riastrad * bits 6-11: engine instance 2229 1.1 riastrad * bits 12-14: reserved 2230 1.1 riastrad * bits 15-25: sw context id of the lrc the GT switched to 2231 1.1 riastrad * bits 26-31: sw counter of the lrc the GT switched to 2232 1.1 riastrad * bits 32-35: context switch detail 2233 1.1 riastrad * - 0: ctx complete 2234 1.1 riastrad * - 1: wait on sync flip 2235 1.1 riastrad * - 2: wait on vblank 2236 1.1 riastrad * - 3: wait on scanline 2237 1.1 riastrad * - 4: wait on semaphore 2238 1.1 riastrad * - 5: context preempted (not on SEMAPHORE_WAIT or 2239 1.1 riastrad * WAIT_FOR_EVENT) 2240 1.1 riastrad * bit 36: reserved 2241 1.1 riastrad * bits 37-43: wait detail (for switch detail 1 to 4) 2242 1.1 riastrad * bits 44-46: reserved 2243 1.1 riastrad * bits 47-57: sw context id of the lrc the GT switched away from 2244 1.1 riastrad * bits 58-63: sw counter of the lrc the GT switched away from 2245 1.1 riastrad */ 2246 1.1 riastrad static inline bool 2247 1.1 riastrad gen12_csb_parse(const struct intel_engine_execlists *execlists, const u32 *csb) 2248 1.1 riastrad { 2249 1.1 riastrad u32 lower_dw = csb[0]; 2250 1.1 riastrad u32 upper_dw = csb[1]; 2251 1.1 riastrad bool ctx_to_valid = GEN12_CSB_CTX_VALID(lower_dw); 2252 1.1 riastrad bool ctx_away_valid = GEN12_CSB_CTX_VALID(upper_dw); 2253 1.1 riastrad bool new_queue = lower_dw & GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE; 2254 1.1 riastrad 2255 1.1 riastrad /* 2256 1.1 riastrad * The context switch detail is not guaranteed to be 5 when a preemption 2257 1.1 riastrad * occurs, so we can't just check for that. The check below works for 2258 1.1 riastrad * all the cases we care about, including preemptions of WAIT 2259 1.1 riastrad * instructions and lite-restore. Preempt-to-idle via the CTRL register 2260 1.1 riastrad * would require some extra handling, but we don't support that. 2261 1.1 riastrad */ 2262 1.1 riastrad if (!ctx_away_valid || new_queue) { 2263 1.1 riastrad GEM_BUG_ON(!ctx_to_valid); 2264 1.1 riastrad return true; 2265 1.1 riastrad } 2266 1.1 riastrad 2267 1.1 riastrad /* 2268 1.1 riastrad * switch detail = 5 is covered by the case above and we do not expect a 2269 1.1 riastrad * context switch on an unsuccessful wait instruction since we always 2270 1.1 riastrad * use polling mode. 2271 1.1 riastrad */ 2272 1.1 riastrad GEM_BUG_ON(GEN12_CTX_SWITCH_DETAIL(upper_dw)); 2273 1.1 riastrad return false; 2274 1.1 riastrad } 2275 1.1 riastrad 2276 1.1 riastrad static inline bool 2277 1.1 riastrad gen8_csb_parse(const struct intel_engine_execlists *execlists, const u32 *csb) 2278 1.1 riastrad { 2279 1.1 riastrad return *csb & (GEN8_CTX_STATUS_IDLE_ACTIVE | GEN8_CTX_STATUS_PREEMPTED); 2280 1.1 riastrad } 2281 1.1 riastrad 2282 1.1 riastrad static void process_csb(struct intel_engine_cs *engine) 2283 1.1 riastrad { 2284 1.1 riastrad struct intel_engine_execlists * const execlists = &engine->execlists; 2285 1.1 riastrad const u32 * const buf = execlists->csb_status; 2286 1.1 riastrad const u8 num_entries = execlists->csb_size; 2287 1.1 riastrad u8 head, tail; 2288 1.1 riastrad 2289 1.1 riastrad /* 2290 1.1 riastrad * As we modify our execlists state tracking we require exclusive 2291 1.1 riastrad * access. Either we are inside the tasklet, or the tasklet is disabled 2292 1.1 riastrad * and we assume that is only inside the reset paths and so serialised. 2293 1.1 riastrad */ 2294 1.1 riastrad GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) && 2295 1.1 riastrad !reset_in_progress(execlists)); 2296 1.1 riastrad GEM_BUG_ON(!intel_engine_in_execlists_submission_mode(engine)); 2297 1.1 riastrad 2298 1.1 riastrad /* 2299 1.1 riastrad * Note that csb_write, csb_status may be either in HWSP or mmio. 2300 1.1 riastrad * When reading from the csb_write mmio register, we have to be 2301 1.1 riastrad * careful to only use the GEN8_CSB_WRITE_PTR portion, which is 2302 1.1 riastrad * the low 4bits. As it happens we know the next 4bits are always 2303 1.1 riastrad * zero and so we can simply masked off the low u8 of the register 2304 1.1 riastrad * and treat it identically to reading from the HWSP (without having 2305 1.1 riastrad * to use explicit shifting and masking, and probably bifurcating 2306 1.1 riastrad * the code to handle the legacy mmio read). 2307 1.1 riastrad */ 2308 1.1 riastrad head = execlists->csb_head; 2309 1.1 riastrad tail = READ_ONCE(*execlists->csb_write); 2310 1.1 riastrad ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail); 2311 1.1 riastrad if (unlikely(head == tail)) 2312 1.1 riastrad return; 2313 1.1 riastrad 2314 1.1 riastrad /* 2315 1.1 riastrad * Hopefully paired with a wmb() in HW! 2316 1.1 riastrad * 2317 1.1 riastrad * We must complete the read of the write pointer before any reads 2318 1.1 riastrad * from the CSB, so that we do not see stale values. Without an rmb 2319 1.1 riastrad * (lfence) the HW may speculatively perform the CSB[] reads *before* 2320 1.1 riastrad * we perform the READ_ONCE(*csb_write). 2321 1.1 riastrad */ 2322 1.1 riastrad rmb(); 2323 1.1 riastrad 2324 1.1 riastrad do { 2325 1.1 riastrad bool promote; 2326 1.1 riastrad 2327 1.1 riastrad if (++head == num_entries) 2328 1.1 riastrad head = 0; 2329 1.1 riastrad 2330 1.1 riastrad /* 2331 1.1 riastrad * We are flying near dragons again. 2332 1.1 riastrad * 2333 1.1 riastrad * We hold a reference to the request in execlist_port[] 2334 1.1 riastrad * but no more than that. We are operating in softirq 2335 1.1 riastrad * context and so cannot hold any mutex or sleep. That 2336 1.1 riastrad * prevents us stopping the requests we are processing 2337 1.1 riastrad * in port[] from being retired simultaneously (the 2338 1.1 riastrad * breadcrumb will be complete before we see the 2339 1.1 riastrad * context-switch). As we only hold the reference to the 2340 1.1 riastrad * request, any pointer chasing underneath the request 2341 1.1 riastrad * is subject to a potential use-after-free. Thus we 2342 1.1 riastrad * store all of the bookkeeping within port[] as 2343 1.1 riastrad * required, and avoid using unguarded pointers beneath 2344 1.1 riastrad * request itself. The same applies to the atomic 2345 1.1 riastrad * status notifier. 2346 1.1 riastrad */ 2347 1.1 riastrad 2348 1.1 riastrad ENGINE_TRACE(engine, "csb[%d]: status=0x%08x:0x%08x\n", 2349 1.1 riastrad head, buf[2 * head + 0], buf[2 * head + 1]); 2350 1.1 riastrad 2351 1.1 riastrad if (INTEL_GEN(engine->i915) >= 12) 2352 1.1 riastrad promote = gen12_csb_parse(execlists, buf + 2 * head); 2353 1.1 riastrad else 2354 1.1 riastrad promote = gen8_csb_parse(execlists, buf + 2 * head); 2355 1.1 riastrad if (promote) { 2356 1.1 riastrad struct i915_request * const *old = execlists->active; 2357 1.1 riastrad 2358 1.1 riastrad /* Point active to the new ELSP; prevent overwriting */ 2359 1.1 riastrad WRITE_ONCE(execlists->active, execlists->pending); 2360 1.1 riastrad 2361 1.1 riastrad if (!inject_preempt_hang(execlists)) 2362 1.1 riastrad ring_set_paused(engine, 0); 2363 1.1 riastrad 2364 1.1 riastrad /* cancel old inflight, prepare for switch */ 2365 1.1 riastrad trace_ports(execlists, "preempted", old); 2366 1.1 riastrad while (*old) 2367 1.1 riastrad execlists_schedule_out(*old++); 2368 1.1 riastrad 2369 1.1 riastrad /* switch pending to inflight */ 2370 1.1 riastrad GEM_BUG_ON(!assert_pending_valid(execlists, "promote")); 2371 1.1 riastrad WRITE_ONCE(execlists->active, 2372 1.1 riastrad memcpy(execlists->inflight, 2373 1.1 riastrad execlists->pending, 2374 1.1 riastrad execlists_num_ports(execlists) * 2375 1.1 riastrad sizeof(*execlists->pending))); 2376 1.1 riastrad 2377 1.1 riastrad WRITE_ONCE(execlists->pending[0], NULL); 2378 1.1 riastrad } else { 2379 1.1 riastrad GEM_BUG_ON(!*execlists->active); 2380 1.1 riastrad 2381 1.1 riastrad /* port0 completed, advanced to port1 */ 2382 1.1 riastrad trace_ports(execlists, "completed", execlists->active); 2383 1.1 riastrad 2384 1.1 riastrad /* 2385 1.1 riastrad * We rely on the hardware being strongly 2386 1.1 riastrad * ordered, that the breadcrumb write is 2387 1.1 riastrad * coherent (visible from the CPU) before the 2388 1.1 riastrad * user interrupt and CSB is processed. 2389 1.1 riastrad */ 2390 1.1 riastrad GEM_BUG_ON(!i915_request_completed(*execlists->active) && 2391 1.1 riastrad !reset_in_progress(execlists)); 2392 1.1 riastrad execlists_schedule_out(*execlists->active++); 2393 1.1 riastrad 2394 1.1 riastrad GEM_BUG_ON(execlists->active - execlists->inflight > 2395 1.1 riastrad execlists_num_ports(execlists)); 2396 1.1 riastrad } 2397 1.1 riastrad } while (head != tail); 2398 1.1 riastrad 2399 1.1 riastrad execlists->csb_head = head; 2400 1.1 riastrad set_timeslice(engine); 2401 1.1 riastrad 2402 1.1 riastrad /* 2403 1.1 riastrad * Gen11 has proven to fail wrt global observation point between 2404 1.1 riastrad * entry and tail update, failing on the ordering and thus 2405 1.1 riastrad * we see an old entry in the context status buffer. 2406 1.1 riastrad * 2407 1.1 riastrad * Forcibly evict out entries for the next gpu csb update, 2408 1.1 riastrad * to increase the odds that we get a fresh entries with non 2409 1.1 riastrad * working hardware. The cost for doing so comes out mostly with 2410 1.1 riastrad * the wash as hardware, working or not, will need to do the 2411 1.1 riastrad * invalidation before. 2412 1.1 riastrad */ 2413 1.1 riastrad invalidate_csb_entries(&buf[0], &buf[num_entries - 1]); 2414 1.1 riastrad } 2415 1.1 riastrad 2416 1.1 riastrad static void __execlists_submission_tasklet(struct intel_engine_cs *const engine) 2417 1.1 riastrad { 2418 1.1 riastrad lockdep_assert_held(&engine->active.lock); 2419 1.1 riastrad if (!engine->execlists.pending[0]) { 2420 1.1 riastrad rcu_read_lock(); /* protect peeking at execlists->active */ 2421 1.1 riastrad execlists_dequeue(engine); 2422 1.1 riastrad rcu_read_unlock(); 2423 1.1 riastrad } 2424 1.1 riastrad } 2425 1.1 riastrad 2426 1.1 riastrad static void __execlists_hold(struct i915_request *rq) 2427 1.1 riastrad { 2428 1.1 riastrad LIST_HEAD(list); 2429 1.1 riastrad 2430 1.1 riastrad do { 2431 1.1 riastrad struct i915_dependency *p; 2432 1.1 riastrad 2433 1.1 riastrad if (i915_request_is_active(rq)) 2434 1.1 riastrad __i915_request_unsubmit(rq); 2435 1.1 riastrad 2436 1.1 riastrad RQ_TRACE(rq, "on hold\n"); 2437 1.1 riastrad clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 2438 1.1 riastrad list_move_tail(&rq->sched.link, &rq->engine->active.hold); 2439 1.1 riastrad i915_request_set_hold(rq); 2440 1.1 riastrad 2441 1.1 riastrad list_for_each_entry(p, &rq->sched.waiters_list, wait_link) { 2442 1.1 riastrad struct i915_request *w = 2443 1.1 riastrad container_of(p->waiter, typeof(*w), sched); 2444 1.1 riastrad 2445 1.1 riastrad /* Leave semaphores spinning on the other engines */ 2446 1.1 riastrad if (w->engine != rq->engine) 2447 1.1 riastrad continue; 2448 1.1 riastrad 2449 1.1 riastrad if (!i915_request_is_ready(w)) 2450 1.1 riastrad continue; 2451 1.1 riastrad 2452 1.1 riastrad if (i915_request_completed(w)) 2453 1.1 riastrad continue; 2454 1.1 riastrad 2455 1.1 riastrad if (i915_request_on_hold(rq)) 2456 1.1 riastrad continue; 2457 1.1 riastrad 2458 1.1 riastrad list_move_tail(&w->sched.link, &list); 2459 1.1 riastrad } 2460 1.1 riastrad 2461 1.1 riastrad rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 2462 1.1 riastrad } while (rq); 2463 1.1 riastrad } 2464 1.1 riastrad 2465 1.1 riastrad static bool execlists_hold(struct intel_engine_cs *engine, 2466 1.1 riastrad struct i915_request *rq) 2467 1.1 riastrad { 2468 1.1 riastrad spin_lock_irq(&engine->active.lock); 2469 1.1 riastrad 2470 1.1 riastrad if (i915_request_completed(rq)) { /* too late! */ 2471 1.1 riastrad rq = NULL; 2472 1.1 riastrad goto unlock; 2473 1.1 riastrad } 2474 1.1 riastrad 2475 1.1 riastrad if (rq->engine != engine) { /* preempted virtual engine */ 2476 1.1 riastrad struct virtual_engine *ve = to_virtual_engine(rq->engine); 2477 1.1 riastrad 2478 1.1 riastrad /* 2479 1.1 riastrad * intel_context_inflight() is only protected by virtue 2480 1.1 riastrad * of process_csb() being called only by the tasklet (or 2481 1.1 riastrad * directly from inside reset while the tasklet is suspended). 2482 1.1 riastrad * Assert that neither of those are allowed to run while we 2483 1.1 riastrad * poke at the request queues. 2484 1.1 riastrad */ 2485 1.1 riastrad GEM_BUG_ON(!reset_in_progress(&engine->execlists)); 2486 1.1 riastrad 2487 1.1 riastrad /* 2488 1.1 riastrad * An unsubmitted request along a virtual engine will 2489 1.1 riastrad * remain on the active (this) engine until we are able 2490 1.1 riastrad * to process the context switch away (and so mark the 2491 1.1 riastrad * context as no longer in flight). That cannot have happened 2492 1.1 riastrad * yet, otherwise we would not be hanging! 2493 1.1 riastrad */ 2494 1.1 riastrad spin_lock(&ve->base.active.lock); 2495 1.1 riastrad GEM_BUG_ON(intel_context_inflight(rq->context) != engine); 2496 1.1 riastrad GEM_BUG_ON(ve->request != rq); 2497 1.1 riastrad ve->request = NULL; 2498 1.1 riastrad spin_unlock(&ve->base.active.lock); 2499 1.1 riastrad i915_request_put(rq); 2500 1.1 riastrad 2501 1.1 riastrad rq->engine = engine; 2502 1.1 riastrad } 2503 1.1 riastrad 2504 1.1 riastrad /* 2505 1.1 riastrad * Transfer this request onto the hold queue to prevent it 2506 1.1 riastrad * being resumbitted to HW (and potentially completed) before we have 2507 1.1 riastrad * released it. Since we may have already submitted following 2508 1.1 riastrad * requests, we need to remove those as well. 2509 1.1 riastrad */ 2510 1.1 riastrad GEM_BUG_ON(i915_request_on_hold(rq)); 2511 1.1 riastrad GEM_BUG_ON(rq->engine != engine); 2512 1.1 riastrad __execlists_hold(rq); 2513 1.1 riastrad 2514 1.1 riastrad unlock: 2515 1.1 riastrad spin_unlock_irq(&engine->active.lock); 2516 1.1 riastrad return rq; 2517 1.1 riastrad } 2518 1.1 riastrad 2519 1.1 riastrad static bool hold_request(const struct i915_request *rq) 2520 1.1 riastrad { 2521 1.1 riastrad struct i915_dependency *p; 2522 1.1 riastrad 2523 1.1 riastrad /* 2524 1.1 riastrad * If one of our ancestors is on hold, we must also be on hold, 2525 1.1 riastrad * otherwise we will bypass it and execute before it. 2526 1.1 riastrad */ 2527 1.1 riastrad list_for_each_entry(p, &rq->sched.signalers_list, signal_link) { 2528 1.1 riastrad const struct i915_request *s = 2529 1.1 riastrad container_of(p->signaler, typeof(*s), sched); 2530 1.1 riastrad 2531 1.1 riastrad if (s->engine != rq->engine) 2532 1.1 riastrad continue; 2533 1.1 riastrad 2534 1.1 riastrad if (i915_request_on_hold(s)) 2535 1.1 riastrad return true; 2536 1.1 riastrad } 2537 1.1 riastrad 2538 1.1 riastrad return false; 2539 1.1 riastrad } 2540 1.1 riastrad 2541 1.1 riastrad static void __execlists_unhold(struct i915_request *rq) 2542 1.1 riastrad { 2543 1.1 riastrad LIST_HEAD(list); 2544 1.1 riastrad 2545 1.1 riastrad do { 2546 1.1 riastrad struct i915_dependency *p; 2547 1.1 riastrad 2548 1.1 riastrad GEM_BUG_ON(!i915_request_on_hold(rq)); 2549 1.1 riastrad GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit)); 2550 1.1 riastrad 2551 1.1 riastrad i915_request_clear_hold(rq); 2552 1.1 riastrad list_move_tail(&rq->sched.link, 2553 1.1 riastrad i915_sched_lookup_priolist(rq->engine, 2554 1.1 riastrad rq_prio(rq))); 2555 1.1 riastrad set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 2556 1.1 riastrad RQ_TRACE(rq, "hold release\n"); 2557 1.1 riastrad 2558 1.1 riastrad /* Also release any children on this engine that are ready */ 2559 1.1 riastrad list_for_each_entry(p, &rq->sched.waiters_list, wait_link) { 2560 1.1 riastrad struct i915_request *w = 2561 1.1 riastrad container_of(p->waiter, typeof(*w), sched); 2562 1.1 riastrad 2563 1.1 riastrad if (w->engine != rq->engine) 2564 1.1 riastrad continue; 2565 1.1 riastrad 2566 1.1 riastrad if (!i915_request_on_hold(rq)) 2567 1.1 riastrad continue; 2568 1.1 riastrad 2569 1.1 riastrad /* Check that no other parents are also on hold */ 2570 1.1 riastrad if (hold_request(rq)) 2571 1.1 riastrad continue; 2572 1.1 riastrad 2573 1.1 riastrad list_move_tail(&w->sched.link, &list); 2574 1.1 riastrad } 2575 1.1 riastrad 2576 1.1 riastrad rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 2577 1.1 riastrad } while (rq); 2578 1.1 riastrad } 2579 1.1 riastrad 2580 1.1 riastrad static void execlists_unhold(struct intel_engine_cs *engine, 2581 1.1 riastrad struct i915_request *rq) 2582 1.1 riastrad { 2583 1.1 riastrad spin_lock_irq(&engine->active.lock); 2584 1.1 riastrad 2585 1.1 riastrad /* 2586 1.1 riastrad * Move this request back to the priority queue, and all of its 2587 1.1 riastrad * children and grandchildren that were suspended along with it. 2588 1.1 riastrad */ 2589 1.1 riastrad __execlists_unhold(rq); 2590 1.1 riastrad 2591 1.1 riastrad if (rq_prio(rq) > engine->execlists.queue_priority_hint) { 2592 1.1 riastrad engine->execlists.queue_priority_hint = rq_prio(rq); 2593 1.1 riastrad tasklet_hi_schedule(&engine->execlists.tasklet); 2594 1.1 riastrad } 2595 1.1 riastrad 2596 1.1 riastrad spin_unlock_irq(&engine->active.lock); 2597 1.1 riastrad } 2598 1.1 riastrad 2599 1.1 riastrad struct execlists_capture { 2600 1.1 riastrad struct work_struct work; 2601 1.1 riastrad struct i915_request *rq; 2602 1.1 riastrad struct i915_gpu_coredump *error; 2603 1.1 riastrad }; 2604 1.1 riastrad 2605 1.1 riastrad static void execlists_capture_work(struct work_struct *work) 2606 1.1 riastrad { 2607 1.1 riastrad struct execlists_capture *cap = container_of(work, typeof(*cap), work); 2608 1.1 riastrad const gfp_t gfp = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN; 2609 1.1 riastrad struct intel_engine_cs *engine = cap->rq->engine; 2610 1.1 riastrad struct intel_gt_coredump *gt = cap->error->gt; 2611 1.1 riastrad struct intel_engine_capture_vma *vma; 2612 1.1 riastrad 2613 1.1 riastrad /* Compress all the objects attached to the request, slow! */ 2614 1.1 riastrad vma = intel_engine_coredump_add_request(gt->engine, cap->rq, gfp); 2615 1.1 riastrad if (vma) { 2616 1.1 riastrad struct i915_vma_compress *compress = 2617 1.1 riastrad i915_vma_capture_prepare(gt); 2618 1.1 riastrad 2619 1.1 riastrad intel_engine_coredump_add_vma(gt->engine, vma, compress); 2620 1.1 riastrad i915_vma_capture_finish(gt, compress); 2621 1.1 riastrad } 2622 1.1 riastrad 2623 1.1 riastrad gt->simulated = gt->engine->simulated; 2624 1.1 riastrad cap->error->simulated = gt->simulated; 2625 1.1 riastrad 2626 1.1 riastrad /* Publish the error state, and announce it to the world */ 2627 1.1 riastrad i915_error_state_store(cap->error); 2628 1.1 riastrad i915_gpu_coredump_put(cap->error); 2629 1.1 riastrad 2630 1.1 riastrad /* Return this request and all that depend upon it for signaling */ 2631 1.1 riastrad execlists_unhold(engine, cap->rq); 2632 1.1 riastrad i915_request_put(cap->rq); 2633 1.1 riastrad 2634 1.1 riastrad kfree(cap); 2635 1.1 riastrad } 2636 1.1 riastrad 2637 1.1 riastrad static struct execlists_capture *capture_regs(struct intel_engine_cs *engine) 2638 1.1 riastrad { 2639 1.1 riastrad const gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN; 2640 1.1 riastrad struct execlists_capture *cap; 2641 1.1 riastrad 2642 1.1 riastrad cap = kmalloc(sizeof(*cap), gfp); 2643 1.1 riastrad if (!cap) 2644 1.1 riastrad return NULL; 2645 1.1 riastrad 2646 1.1 riastrad cap->error = i915_gpu_coredump_alloc(engine->i915, gfp); 2647 1.1 riastrad if (!cap->error) 2648 1.1 riastrad goto err_cap; 2649 1.1 riastrad 2650 1.1 riastrad cap->error->gt = intel_gt_coredump_alloc(engine->gt, gfp); 2651 1.1 riastrad if (!cap->error->gt) 2652 1.1 riastrad goto err_gpu; 2653 1.1 riastrad 2654 1.1 riastrad cap->error->gt->engine = intel_engine_coredump_alloc(engine, gfp); 2655 1.1 riastrad if (!cap->error->gt->engine) 2656 1.1 riastrad goto err_gt; 2657 1.1 riastrad 2658 1.1 riastrad return cap; 2659 1.1 riastrad 2660 1.1 riastrad err_gt: 2661 1.1 riastrad kfree(cap->error->gt); 2662 1.1 riastrad err_gpu: 2663 1.1 riastrad kfree(cap->error); 2664 1.1 riastrad err_cap: 2665 1.1 riastrad kfree(cap); 2666 1.1 riastrad return NULL; 2667 1.1 riastrad } 2668 1.1 riastrad 2669 1.1 riastrad static bool execlists_capture(struct intel_engine_cs *engine) 2670 1.1 riastrad { 2671 1.1 riastrad struct execlists_capture *cap; 2672 1.1 riastrad 2673 1.1 riastrad if (!IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)) 2674 1.1 riastrad return true; 2675 1.1 riastrad 2676 1.1 riastrad /* 2677 1.1 riastrad * We need to _quickly_ capture the engine state before we reset. 2678 1.1 riastrad * We are inside an atomic section (softirq) here and we are delaying 2679 1.1 riastrad * the forced preemption event. 2680 1.1 riastrad */ 2681 1.1 riastrad cap = capture_regs(engine); 2682 1.1 riastrad if (!cap) 2683 1.1 riastrad return true; 2684 1.1 riastrad 2685 1.1 riastrad cap->rq = execlists_active(&engine->execlists); 2686 1.1 riastrad GEM_BUG_ON(!cap->rq); 2687 1.1 riastrad 2688 1.1 riastrad rcu_read_lock(); 2689 1.1 riastrad cap->rq = active_request(cap->rq->context->timeline, cap->rq); 2690 1.1 riastrad cap->rq = i915_request_get_rcu(cap->rq); 2691 1.1 riastrad rcu_read_unlock(); 2692 1.1 riastrad if (!cap->rq) 2693 1.1 riastrad goto err_free; 2694 1.1 riastrad 2695 1.1 riastrad /* 2696 1.1 riastrad * Remove the request from the execlists queue, and take ownership 2697 1.1 riastrad * of the request. We pass it to our worker who will _slowly_ compress 2698 1.1 riastrad * all the pages the _user_ requested for debugging their batch, after 2699 1.1 riastrad * which we return it to the queue for signaling. 2700 1.1 riastrad * 2701 1.1 riastrad * By removing them from the execlists queue, we also remove the 2702 1.1 riastrad * requests from being processed by __unwind_incomplete_requests() 2703 1.1 riastrad * during the intel_engine_reset(), and so they will *not* be replayed 2704 1.1 riastrad * afterwards. 2705 1.1 riastrad * 2706 1.1 riastrad * Note that because we have not yet reset the engine at this point, 2707 1.1 riastrad * it is possible for the request that we have identified as being 2708 1.1 riastrad * guilty, did in fact complete and we will then hit an arbitration 2709 1.1 riastrad * point allowing the outstanding preemption to succeed. The likelihood 2710 1.1 riastrad * of that is very low (as capturing of the engine registers should be 2711 1.1 riastrad * fast enough to run inside an irq-off atomic section!), so we will 2712 1.1 riastrad * simply hold that request accountable for being non-preemptible 2713 1.1 riastrad * long enough to force the reset. 2714 1.1 riastrad */ 2715 1.1 riastrad if (!execlists_hold(engine, cap->rq)) 2716 1.1 riastrad goto err_rq; 2717 1.1 riastrad 2718 1.1 riastrad INIT_WORK(&cap->work, execlists_capture_work); 2719 1.1 riastrad schedule_work(&cap->work); 2720 1.1 riastrad return true; 2721 1.1 riastrad 2722 1.1 riastrad err_rq: 2723 1.1 riastrad i915_request_put(cap->rq); 2724 1.1 riastrad err_free: 2725 1.1 riastrad i915_gpu_coredump_put(cap->error); 2726 1.1 riastrad kfree(cap); 2727 1.1 riastrad return false; 2728 1.1 riastrad } 2729 1.1 riastrad 2730 1.1 riastrad static noinline void preempt_reset(struct intel_engine_cs *engine) 2731 1.1 riastrad { 2732 1.1 riastrad const unsigned int bit = I915_RESET_ENGINE + engine->id; 2733 1.1 riastrad unsigned long *lock = &engine->gt->reset.flags; 2734 1.1 riastrad 2735 1.1 riastrad if (i915_modparams.reset < 3) 2736 1.1 riastrad return; 2737 1.1 riastrad 2738 1.1 riastrad if (test_and_set_bit(bit, lock)) 2739 1.1 riastrad return; 2740 1.1 riastrad 2741 1.1 riastrad /* Mark this tasklet as disabled to avoid waiting for it to complete */ 2742 1.1 riastrad tasklet_disable_nosync(&engine->execlists.tasklet); 2743 1.1 riastrad 2744 1.1 riastrad ENGINE_TRACE(engine, "preempt timeout %lu+%ums\n", 2745 1.1 riastrad READ_ONCE(engine->props.preempt_timeout_ms), 2746 1.1 riastrad jiffies_to_msecs(jiffies - engine->execlists.preempt.expires)); 2747 1.1 riastrad 2748 1.1 riastrad ring_set_paused(engine, 1); /* Freeze the current request in place */ 2749 1.1 riastrad if (execlists_capture(engine)) 2750 1.1 riastrad intel_engine_reset(engine, "preemption time out"); 2751 1.1 riastrad else 2752 1.1 riastrad ring_set_paused(engine, 0); 2753 1.1 riastrad 2754 1.1 riastrad tasklet_enable(&engine->execlists.tasklet); 2755 1.1 riastrad clear_and_wake_up_bit(bit, lock); 2756 1.1 riastrad } 2757 1.1 riastrad 2758 1.1 riastrad static bool preempt_timeout(const struct intel_engine_cs *const engine) 2759 1.1 riastrad { 2760 1.1 riastrad const struct timer_list *t = &engine->execlists.preempt; 2761 1.1 riastrad 2762 1.1 riastrad if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT) 2763 1.1 riastrad return false; 2764 1.1 riastrad 2765 1.1 riastrad if (!timer_expired(t)) 2766 1.1 riastrad return false; 2767 1.1 riastrad 2768 1.1 riastrad return READ_ONCE(engine->execlists.pending[0]); 2769 1.1 riastrad } 2770 1.1 riastrad 2771 1.1 riastrad /* 2772 1.1 riastrad * Check the unread Context Status Buffers and manage the submission of new 2773 1.1 riastrad * contexts to the ELSP accordingly. 2774 1.1 riastrad */ 2775 1.1 riastrad static void execlists_submission_tasklet(unsigned long data) 2776 1.1 riastrad { 2777 1.1 riastrad struct intel_engine_cs * const engine = (struct intel_engine_cs *)data; 2778 1.1 riastrad bool timeout = preempt_timeout(engine); 2779 1.1 riastrad 2780 1.1 riastrad process_csb(engine); 2781 1.1 riastrad if (!READ_ONCE(engine->execlists.pending[0]) || timeout) { 2782 1.1 riastrad unsigned long flags; 2783 1.1 riastrad 2784 1.1 riastrad spin_lock_irqsave(&engine->active.lock, flags); 2785 1.1 riastrad __execlists_submission_tasklet(engine); 2786 1.1 riastrad spin_unlock_irqrestore(&engine->active.lock, flags); 2787 1.1 riastrad 2788 1.1 riastrad /* Recheck after serialising with direct-submission */ 2789 1.1 riastrad if (timeout && preempt_timeout(engine)) 2790 1.1 riastrad preempt_reset(engine); 2791 1.1 riastrad } 2792 1.1 riastrad } 2793 1.1 riastrad 2794 1.1 riastrad static void __execlists_kick(struct intel_engine_execlists *execlists) 2795 1.1 riastrad { 2796 1.1 riastrad /* Kick the tasklet for some interrupt coalescing and reset handling */ 2797 1.1 riastrad tasklet_hi_schedule(&execlists->tasklet); 2798 1.1 riastrad } 2799 1.1 riastrad 2800 1.1 riastrad #define execlists_kick(t, member) \ 2801 1.1 riastrad __execlists_kick(container_of(t, struct intel_engine_execlists, member)) 2802 1.1 riastrad 2803 1.1 riastrad static void execlists_timeslice(struct timer_list *timer) 2804 1.1 riastrad { 2805 1.1 riastrad execlists_kick(timer, timer); 2806 1.1 riastrad } 2807 1.1 riastrad 2808 1.1 riastrad static void execlists_preempt(struct timer_list *timer) 2809 1.1 riastrad { 2810 1.1 riastrad execlists_kick(timer, preempt); 2811 1.1 riastrad } 2812 1.1 riastrad 2813 1.1 riastrad static void queue_request(struct intel_engine_cs *engine, 2814 1.1 riastrad struct i915_request *rq) 2815 1.1 riastrad { 2816 1.1 riastrad GEM_BUG_ON(!list_empty(&rq->sched.link)); 2817 1.1 riastrad list_add_tail(&rq->sched.link, 2818 1.1 riastrad i915_sched_lookup_priolist(engine, rq_prio(rq))); 2819 1.1 riastrad set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 2820 1.1 riastrad } 2821 1.1 riastrad 2822 1.1 riastrad static void __submit_queue_imm(struct intel_engine_cs *engine) 2823 1.1 riastrad { 2824 1.1 riastrad struct intel_engine_execlists * const execlists = &engine->execlists; 2825 1.1 riastrad 2826 1.1 riastrad if (reset_in_progress(execlists)) 2827 1.1 riastrad return; /* defer until we restart the engine following reset */ 2828 1.1 riastrad 2829 1.1 riastrad if (execlists->tasklet.func == execlists_submission_tasklet) 2830 1.1 riastrad __execlists_submission_tasklet(engine); 2831 1.1 riastrad else 2832 1.1 riastrad tasklet_hi_schedule(&execlists->tasklet); 2833 1.1 riastrad } 2834 1.1 riastrad 2835 1.1 riastrad static void submit_queue(struct intel_engine_cs *engine, 2836 1.1 riastrad const struct i915_request *rq) 2837 1.1 riastrad { 2838 1.1 riastrad struct intel_engine_execlists *execlists = &engine->execlists; 2839 1.1 riastrad 2840 1.1 riastrad if (rq_prio(rq) <= execlists->queue_priority_hint) 2841 1.1 riastrad return; 2842 1.1 riastrad 2843 1.1 riastrad execlists->queue_priority_hint = rq_prio(rq); 2844 1.1 riastrad __submit_queue_imm(engine); 2845 1.1 riastrad } 2846 1.1 riastrad 2847 1.1 riastrad static bool ancestor_on_hold(const struct intel_engine_cs *engine, 2848 1.1 riastrad const struct i915_request *rq) 2849 1.1 riastrad { 2850 1.1 riastrad GEM_BUG_ON(i915_request_on_hold(rq)); 2851 1.1 riastrad return !list_empty(&engine->active.hold) && hold_request(rq); 2852 1.1 riastrad } 2853 1.1 riastrad 2854 1.1 riastrad static void execlists_submit_request(struct i915_request *request) 2855 1.1 riastrad { 2856 1.1 riastrad struct intel_engine_cs *engine = request->engine; 2857 1.1 riastrad unsigned long flags; 2858 1.1 riastrad 2859 1.1 riastrad /* Will be called from irq-context when using foreign fences. */ 2860 1.1 riastrad spin_lock_irqsave(&engine->active.lock, flags); 2861 1.1 riastrad 2862 1.1 riastrad if (unlikely(ancestor_on_hold(engine, request))) { 2863 1.1 riastrad list_add_tail(&request->sched.link, &engine->active.hold); 2864 1.1 riastrad i915_request_set_hold(request); 2865 1.1 riastrad } else { 2866 1.1 riastrad queue_request(engine, request); 2867 1.1 riastrad 2868 1.1 riastrad GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); 2869 1.1 riastrad GEM_BUG_ON(list_empty(&request->sched.link)); 2870 1.1 riastrad 2871 1.1 riastrad submit_queue(engine, request); 2872 1.1 riastrad } 2873 1.1 riastrad 2874 1.1 riastrad spin_unlock_irqrestore(&engine->active.lock, flags); 2875 1.1 riastrad } 2876 1.1 riastrad 2877 1.1 riastrad static void __execlists_context_fini(struct intel_context *ce) 2878 1.1 riastrad { 2879 1.1 riastrad intel_ring_put(ce->ring); 2880 1.1 riastrad i915_vma_put(ce->state); 2881 1.1 riastrad } 2882 1.1 riastrad 2883 1.1 riastrad static void execlists_context_destroy(struct kref *kref) 2884 1.1 riastrad { 2885 1.1 riastrad struct intel_context *ce = container_of(kref, typeof(*ce), ref); 2886 1.1 riastrad 2887 1.1 riastrad GEM_BUG_ON(!i915_active_is_idle(&ce->active)); 2888 1.1 riastrad GEM_BUG_ON(intel_context_is_pinned(ce)); 2889 1.1 riastrad 2890 1.1 riastrad if (ce->state) 2891 1.1 riastrad __execlists_context_fini(ce); 2892 1.1 riastrad 2893 1.1 riastrad intel_context_fini(ce); 2894 1.1 riastrad intel_context_free(ce); 2895 1.1 riastrad } 2896 1.1 riastrad 2897 1.1 riastrad static void 2898 1.1 riastrad set_redzone(void *vaddr, const struct intel_engine_cs *engine) 2899 1.1 riastrad { 2900 1.1 riastrad if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 2901 1.1 riastrad return; 2902 1.1 riastrad 2903 1.1 riastrad vaddr += engine->context_size; 2904 1.1 riastrad 2905 1.1 riastrad memset(vaddr, CONTEXT_REDZONE, I915_GTT_PAGE_SIZE); 2906 1.1 riastrad } 2907 1.1 riastrad 2908 1.1 riastrad static void 2909 1.1 riastrad check_redzone(const void *vaddr, const struct intel_engine_cs *engine) 2910 1.1 riastrad { 2911 1.1 riastrad if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 2912 1.1 riastrad return; 2913 1.1 riastrad 2914 1.1 riastrad vaddr += engine->context_size; 2915 1.1 riastrad 2916 1.1 riastrad if (memchr_inv(vaddr, CONTEXT_REDZONE, I915_GTT_PAGE_SIZE)) 2917 1.1 riastrad dev_err_once(engine->i915->drm.dev, 2918 1.1 riastrad "%s context redzone overwritten!\n", 2919 1.1 riastrad engine->name); 2920 1.1 riastrad } 2921 1.1 riastrad 2922 1.1 riastrad static void execlists_context_unpin(struct intel_context *ce) 2923 1.1 riastrad { 2924 1.1 riastrad check_redzone((void *)ce->lrc_reg_state - LRC_STATE_PN * PAGE_SIZE, 2925 1.1 riastrad ce->engine); 2926 1.1 riastrad 2927 1.1 riastrad i915_gem_object_unpin_map(ce->state->obj); 2928 1.1 riastrad } 2929 1.1 riastrad 2930 1.1 riastrad static void 2931 1.1 riastrad __execlists_update_reg_state(const struct intel_context *ce, 2932 1.1 riastrad const struct intel_engine_cs *engine, 2933 1.1 riastrad u32 head) 2934 1.1 riastrad { 2935 1.1 riastrad struct intel_ring *ring = ce->ring; 2936 1.1 riastrad u32 *regs = ce->lrc_reg_state; 2937 1.1 riastrad 2938 1.1 riastrad GEM_BUG_ON(!intel_ring_offset_valid(ring, head)); 2939 1.1 riastrad GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->tail)); 2940 1.1 riastrad 2941 1.1 riastrad regs[CTX_RING_START] = i915_ggtt_offset(ring->vma); 2942 1.1 riastrad regs[CTX_RING_HEAD] = head; 2943 1.1 riastrad regs[CTX_RING_TAIL] = ring->tail; 2944 1.1 riastrad 2945 1.1 riastrad /* RPCS */ 2946 1.1 riastrad if (engine->class == RENDER_CLASS) { 2947 1.1 riastrad regs[CTX_R_PWR_CLK_STATE] = 2948 1.1 riastrad intel_sseu_make_rpcs(engine->i915, &ce->sseu); 2949 1.1 riastrad 2950 1.1 riastrad i915_oa_init_reg_state(ce, engine); 2951 1.1 riastrad } 2952 1.1 riastrad } 2953 1.1 riastrad 2954 1.1 riastrad static int 2955 1.1 riastrad __execlists_context_pin(struct intel_context *ce, 2956 1.1 riastrad struct intel_engine_cs *engine) 2957 1.1 riastrad { 2958 1.1 riastrad void *vaddr; 2959 1.1 riastrad 2960 1.1 riastrad GEM_BUG_ON(!ce->state); 2961 1.1 riastrad GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); 2962 1.1 riastrad 2963 1.1 riastrad vaddr = i915_gem_object_pin_map(ce->state->obj, 2964 1.1 riastrad i915_coherent_map_type(engine->i915) | 2965 1.1 riastrad I915_MAP_OVERRIDE); 2966 1.1 riastrad if (IS_ERR(vaddr)) 2967 1.1 riastrad return PTR_ERR(vaddr); 2968 1.1 riastrad 2969 1.1 riastrad ce->lrc_desc = lrc_descriptor(ce, engine) | CTX_DESC_FORCE_RESTORE; 2970 1.1 riastrad ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE; 2971 1.1 riastrad __execlists_update_reg_state(ce, engine, ce->ring->tail); 2972 1.1 riastrad 2973 1.1 riastrad return 0; 2974 1.1 riastrad } 2975 1.1 riastrad 2976 1.1 riastrad static int execlists_context_pin(struct intel_context *ce) 2977 1.1 riastrad { 2978 1.1 riastrad return __execlists_context_pin(ce, ce->engine); 2979 1.1 riastrad } 2980 1.1 riastrad 2981 1.1 riastrad static int execlists_context_alloc(struct intel_context *ce) 2982 1.1 riastrad { 2983 1.1 riastrad return __execlists_context_alloc(ce, ce->engine); 2984 1.1 riastrad } 2985 1.1 riastrad 2986 1.1 riastrad static void execlists_context_reset(struct intel_context *ce) 2987 1.1 riastrad { 2988 1.1 riastrad CE_TRACE(ce, "reset\n"); 2989 1.1 riastrad GEM_BUG_ON(!intel_context_is_pinned(ce)); 2990 1.1 riastrad 2991 1.1 riastrad /* 2992 1.1 riastrad * Because we emit WA_TAIL_DWORDS there may be a disparity 2993 1.1 riastrad * between our bookkeeping in ce->ring->head and ce->ring->tail and 2994 1.1 riastrad * that stored in context. As we only write new commands from 2995 1.1 riastrad * ce->ring->tail onwards, everything before that is junk. If the GPU 2996 1.1 riastrad * starts reading from its RING_HEAD from the context, it may try to 2997 1.1 riastrad * execute that junk and die. 2998 1.1 riastrad * 2999 1.1 riastrad * The contexts that are stilled pinned on resume belong to the 3000 1.1 riastrad * kernel, and are local to each engine. All other contexts will 3001 1.1 riastrad * have their head/tail sanitized upon pinning before use, so they 3002 1.1 riastrad * will never see garbage, 3003 1.1 riastrad * 3004 1.1 riastrad * So to avoid that we reset the context images upon resume. For 3005 1.1 riastrad * simplicity, we just zero everything out. 3006 1.1 riastrad */ 3007 1.1 riastrad intel_ring_reset(ce->ring, ce->ring->emit); 3008 1.1 riastrad 3009 1.1 riastrad /* Scrub away the garbage */ 3010 1.1 riastrad execlists_init_reg_state(ce->lrc_reg_state, 3011 1.1 riastrad ce, ce->engine, ce->ring, true); 3012 1.1 riastrad __execlists_update_reg_state(ce, ce->engine, ce->ring->tail); 3013 1.1 riastrad 3014 1.1 riastrad ce->lrc_desc |= CTX_DESC_FORCE_RESTORE; 3015 1.1 riastrad } 3016 1.1 riastrad 3017 1.1 riastrad static const struct intel_context_ops execlists_context_ops = { 3018 1.1 riastrad .alloc = execlists_context_alloc, 3019 1.1 riastrad 3020 1.1 riastrad .pin = execlists_context_pin, 3021 1.1 riastrad .unpin = execlists_context_unpin, 3022 1.1 riastrad 3023 1.1 riastrad .enter = intel_context_enter_engine, 3024 1.1 riastrad .exit = intel_context_exit_engine, 3025 1.1 riastrad 3026 1.1 riastrad .reset = execlists_context_reset, 3027 1.1 riastrad .destroy = execlists_context_destroy, 3028 1.1 riastrad }; 3029 1.1 riastrad 3030 1.1 riastrad static int gen8_emit_init_breadcrumb(struct i915_request *rq) 3031 1.1 riastrad { 3032 1.1 riastrad u32 *cs; 3033 1.1 riastrad 3034 1.1 riastrad GEM_BUG_ON(!i915_request_timeline(rq)->has_initial_breadcrumb); 3035 1.1 riastrad 3036 1.1 riastrad cs = intel_ring_begin(rq, 6); 3037 1.1 riastrad if (IS_ERR(cs)) 3038 1.1 riastrad return PTR_ERR(cs); 3039 1.1 riastrad 3040 1.1 riastrad /* 3041 1.1 riastrad * Check if we have been preempted before we even get started. 3042 1.1 riastrad * 3043 1.1 riastrad * After this point i915_request_started() reports true, even if 3044 1.1 riastrad * we get preempted and so are no longer running. 3045 1.1 riastrad */ 3046 1.1 riastrad *cs++ = MI_ARB_CHECK; 3047 1.1 riastrad *cs++ = MI_NOOP; 3048 1.1 riastrad 3049 1.1 riastrad *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT; 3050 1.1 riastrad *cs++ = i915_request_timeline(rq)->hwsp_offset; 3051 1.1 riastrad *cs++ = 0; 3052 1.1 riastrad *cs++ = rq->fence.seqno - 1; 3053 1.1 riastrad 3054 1.1 riastrad intel_ring_advance(rq, cs); 3055 1.1 riastrad 3056 1.1 riastrad /* Record the updated position of the request's payload */ 3057 1.1 riastrad rq->infix = intel_ring_offset(rq, cs); 3058 1.1 riastrad 3059 1.1 riastrad return 0; 3060 1.1 riastrad } 3061 1.1 riastrad 3062 1.1 riastrad static int execlists_request_alloc(struct i915_request *request) 3063 1.1 riastrad { 3064 1.1 riastrad int ret; 3065 1.1 riastrad 3066 1.1 riastrad GEM_BUG_ON(!intel_context_is_pinned(request->context)); 3067 1.1 riastrad 3068 1.1 riastrad /* 3069 1.1 riastrad * Flush enough space to reduce the likelihood of waiting after 3070 1.1 riastrad * we start building the request - in which case we will just 3071 1.1 riastrad * have to repeat work. 3072 1.1 riastrad */ 3073 1.1 riastrad request->reserved_space += EXECLISTS_REQUEST_SIZE; 3074 1.1 riastrad 3075 1.1 riastrad /* 3076 1.1 riastrad * Note that after this point, we have committed to using 3077 1.1 riastrad * this request as it is being used to both track the 3078 1.1 riastrad * state of engine initialisation and liveness of the 3079 1.1 riastrad * golden renderstate above. Think twice before you try 3080 1.1 riastrad * to cancel/unwind this request now. 3081 1.1 riastrad */ 3082 1.1 riastrad 3083 1.1 riastrad /* Unconditionally invalidate GPU caches and TLBs. */ 3084 1.1 riastrad ret = request->engine->emit_flush(request, EMIT_INVALIDATE); 3085 1.1 riastrad if (ret) 3086 1.1 riastrad return ret; 3087 1.1 riastrad 3088 1.1 riastrad request->reserved_space -= EXECLISTS_REQUEST_SIZE; 3089 1.1 riastrad return 0; 3090 1.1 riastrad } 3091 1.1 riastrad 3092 1.1 riastrad /* 3093 1.1 riastrad * In this WA we need to set GEN8_L3SQCREG4[21:21] and reset it after 3094 1.1 riastrad * PIPE_CONTROL instruction. This is required for the flush to happen correctly 3095 1.1 riastrad * but there is a slight complication as this is applied in WA batch where the 3096 1.1 riastrad * values are only initialized once so we cannot take register value at the 3097 1.1 riastrad * beginning and reuse it further; hence we save its value to memory, upload a 3098 1.1 riastrad * constant value with bit21 set and then we restore it back with the saved value. 3099 1.1 riastrad * To simplify the WA, a constant value is formed by using the default value 3100 1.1 riastrad * of this register. This shouldn't be a problem because we are only modifying 3101 1.1 riastrad * it for a short period and this batch in non-premptible. We can ofcourse 3102 1.1 riastrad * use additional instructions that read the actual value of the register 3103 1.1 riastrad * at that time and set our bit of interest but it makes the WA complicated. 3104 1.1 riastrad * 3105 1.1 riastrad * This WA is also required for Gen9 so extracting as a function avoids 3106 1.1 riastrad * code duplication. 3107 1.1 riastrad */ 3108 1.1 riastrad static u32 * 3109 1.1 riastrad gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine, u32 *batch) 3110 1.1 riastrad { 3111 1.1 riastrad /* NB no one else is allowed to scribble over scratch + 256! */ 3112 1.1 riastrad *batch++ = MI_STORE_REGISTER_MEM_GEN8 | MI_SRM_LRM_GLOBAL_GTT; 3113 1.1 riastrad *batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4); 3114 1.1 riastrad *batch++ = intel_gt_scratch_offset(engine->gt, 3115 1.1 riastrad INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA); 3116 1.1 riastrad *batch++ = 0; 3117 1.1 riastrad 3118 1.1 riastrad *batch++ = MI_LOAD_REGISTER_IMM(1); 3119 1.1 riastrad *batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4); 3120 1.1 riastrad *batch++ = 0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES; 3121 1.1 riastrad 3122 1.1 riastrad batch = gen8_emit_pipe_control(batch, 3123 1.1 riastrad PIPE_CONTROL_CS_STALL | 3124 1.1 riastrad PIPE_CONTROL_DC_FLUSH_ENABLE, 3125 1.1 riastrad 0); 3126 1.1 riastrad 3127 1.1 riastrad *batch++ = MI_LOAD_REGISTER_MEM_GEN8 | MI_SRM_LRM_GLOBAL_GTT; 3128 1.1 riastrad *batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4); 3129 1.1 riastrad *batch++ = intel_gt_scratch_offset(engine->gt, 3130 1.1 riastrad INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA); 3131 1.1 riastrad *batch++ = 0; 3132 1.1 riastrad 3133 1.1 riastrad return batch; 3134 1.1 riastrad } 3135 1.1 riastrad 3136 1.1 riastrad /* 3137 1.1 riastrad * Typically we only have one indirect_ctx and per_ctx batch buffer which are 3138 1.1 riastrad * initialized at the beginning and shared across all contexts but this field 3139 1.1 riastrad * helps us to have multiple batches at different offsets and select them based 3140 1.1 riastrad * on a criteria. At the moment this batch always start at the beginning of the page 3141 1.1 riastrad * and at this point we don't have multiple wa_ctx batch buffers. 3142 1.1 riastrad * 3143 1.1 riastrad * The number of WA applied are not known at the beginning; we use this field 3144 1.1 riastrad * to return the no of DWORDS written. 3145 1.1 riastrad * 3146 1.1 riastrad * It is to be noted that this batch does not contain MI_BATCH_BUFFER_END 3147 1.1 riastrad * so it adds NOOPs as padding to make it cacheline aligned. 3148 1.1 riastrad * MI_BATCH_BUFFER_END will be added to perctx batch and both of them together 3149 1.1 riastrad * makes a complete batch buffer. 3150 1.1 riastrad */ 3151 1.1 riastrad static u32 *gen8_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) 3152 1.1 riastrad { 3153 1.1 riastrad /* WaDisableCtxRestoreArbitration:bdw,chv */ 3154 1.1 riastrad *batch++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; 3155 1.1 riastrad 3156 1.1 riastrad /* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw */ 3157 1.1 riastrad if (IS_BROADWELL(engine->i915)) 3158 1.1 riastrad batch = gen8_emit_flush_coherentl3_wa(engine, batch); 3159 1.1 riastrad 3160 1.1 riastrad /* WaClearSlmSpaceAtContextSwitch:bdw,chv */ 3161 1.1 riastrad /* Actual scratch location is at 128 bytes offset */ 3162 1.1 riastrad batch = gen8_emit_pipe_control(batch, 3163 1.1 riastrad PIPE_CONTROL_FLUSH_L3 | 3164 1.1 riastrad PIPE_CONTROL_STORE_DATA_INDEX | 3165 1.1 riastrad PIPE_CONTROL_CS_STALL | 3166 1.1 riastrad PIPE_CONTROL_QW_WRITE, 3167 1.1 riastrad LRC_PPHWSP_SCRATCH_ADDR); 3168 1.1 riastrad 3169 1.1 riastrad *batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; 3170 1.1 riastrad 3171 1.1 riastrad /* Pad to end of cacheline */ 3172 1.1 riastrad while ((unsigned long)batch % CACHELINE_BYTES) 3173 1.1 riastrad *batch++ = MI_NOOP; 3174 1.1 riastrad 3175 1.1 riastrad /* 3176 1.1 riastrad * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because 3177 1.1 riastrad * execution depends on the length specified in terms of cache lines 3178 1.1 riastrad * in the register CTX_RCS_INDIRECT_CTX 3179 1.1 riastrad */ 3180 1.1 riastrad 3181 1.1 riastrad return batch; 3182 1.1 riastrad } 3183 1.1 riastrad 3184 1.1 riastrad struct lri { 3185 1.1 riastrad i915_reg_t reg; 3186 1.1 riastrad u32 value; 3187 1.1 riastrad }; 3188 1.1 riastrad 3189 1.1 riastrad static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int count) 3190 1.1 riastrad { 3191 1.1 riastrad GEM_BUG_ON(!count || count > 63); 3192 1.1 riastrad 3193 1.1 riastrad *batch++ = MI_LOAD_REGISTER_IMM(count); 3194 1.1 riastrad do { 3195 1.1 riastrad *batch++ = i915_mmio_reg_offset(lri->reg); 3196 1.1 riastrad *batch++ = lri->value; 3197 1.1 riastrad } while (lri++, --count); 3198 1.1 riastrad *batch++ = MI_NOOP; 3199 1.1 riastrad 3200 1.1 riastrad return batch; 3201 1.1 riastrad } 3202 1.1 riastrad 3203 1.1 riastrad static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) 3204 1.1 riastrad { 3205 1.1 riastrad static const struct lri lri[] = { 3206 1.1 riastrad /* WaDisableGatherAtSetShaderCommonSlice:skl,bxt,kbl,glk */ 3207 1.1 riastrad { 3208 1.1 riastrad COMMON_SLICE_CHICKEN2, 3209 1.1 riastrad __MASKED_FIELD(GEN9_DISABLE_GATHER_AT_SET_SHADER_COMMON_SLICE, 3210 1.1 riastrad 0), 3211 1.1 riastrad }, 3212 1.1 riastrad 3213 1.1 riastrad /* BSpec: 11391 */ 3214 1.1 riastrad { 3215 1.1 riastrad FF_SLICE_CHICKEN, 3216 1.1 riastrad __MASKED_FIELD(FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX, 3217 1.1 riastrad FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX), 3218 1.1 riastrad }, 3219 1.1 riastrad 3220 1.1 riastrad /* BSpec: 11299 */ 3221 1.1 riastrad { 3222 1.1 riastrad _3D_CHICKEN3, 3223 1.1 riastrad __MASKED_FIELD(_3D_CHICKEN_SF_PROVOKING_VERTEX_FIX, 3224 1.1 riastrad _3D_CHICKEN_SF_PROVOKING_VERTEX_FIX), 3225 1.1 riastrad } 3226 1.1 riastrad }; 3227 1.1 riastrad 3228 1.1 riastrad *batch++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; 3229 1.1 riastrad 3230 1.1 riastrad /* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */ 3231 1.1 riastrad batch = gen8_emit_flush_coherentl3_wa(engine, batch); 3232 1.1 riastrad 3233 1.1 riastrad /* WaClearSlmSpaceAtContextSwitch:skl,bxt,kbl,glk,cfl */ 3234 1.1 riastrad batch = gen8_emit_pipe_control(batch, 3235 1.1 riastrad PIPE_CONTROL_FLUSH_L3 | 3236 1.1 riastrad PIPE_CONTROL_STORE_DATA_INDEX | 3237 1.1 riastrad PIPE_CONTROL_CS_STALL | 3238 1.1 riastrad PIPE_CONTROL_QW_WRITE, 3239 1.1 riastrad LRC_PPHWSP_SCRATCH_ADDR); 3240 1.1 riastrad 3241 1.1 riastrad batch = emit_lri(batch, lri, ARRAY_SIZE(lri)); 3242 1.1 riastrad 3243 1.1 riastrad /* WaMediaPoolStateCmdInWABB:bxt,glk */ 3244 1.1 riastrad if (HAS_POOLED_EU(engine->i915)) { 3245 1.1 riastrad /* 3246 1.1 riastrad * EU pool configuration is setup along with golden context 3247 1.1 riastrad * during context initialization. This value depends on 3248 1.1 riastrad * device type (2x6 or 3x6) and needs to be updated based 3249 1.1 riastrad * on which subslice is disabled especially for 2x6 3250 1.1 riastrad * devices, however it is safe to load default 3251 1.1 riastrad * configuration of 3x6 device instead of masking off 3252 1.1 riastrad * corresponding bits because HW ignores bits of a disabled 3253 1.1 riastrad * subslice and drops down to appropriate config. Please 3254 1.1 riastrad * see render_state_setup() in i915_gem_render_state.c for 3255 1.1 riastrad * possible configurations, to avoid duplication they are 3256 1.1 riastrad * not shown here again. 3257 1.1 riastrad */ 3258 1.1 riastrad *batch++ = GEN9_MEDIA_POOL_STATE; 3259 1.1 riastrad *batch++ = GEN9_MEDIA_POOL_ENABLE; 3260 1.1 riastrad *batch++ = 0x00777000; 3261 1.1 riastrad *batch++ = 0; 3262 1.1 riastrad *batch++ = 0; 3263 1.1 riastrad *batch++ = 0; 3264 1.1 riastrad } 3265 1.1 riastrad 3266 1.1 riastrad *batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; 3267 1.1 riastrad 3268 1.1 riastrad /* Pad to end of cacheline */ 3269 1.1 riastrad while ((unsigned long)batch % CACHELINE_BYTES) 3270 1.1 riastrad *batch++ = MI_NOOP; 3271 1.1 riastrad 3272 1.1 riastrad return batch; 3273 1.1 riastrad } 3274 1.1 riastrad 3275 1.1 riastrad static u32 * 3276 1.1 riastrad gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) 3277 1.1 riastrad { 3278 1.1 riastrad int i; 3279 1.1 riastrad 3280 1.1 riastrad /* 3281 1.1 riastrad * WaPipeControlBefore3DStateSamplePattern: cnl 3282 1.1 riastrad * 3283 1.1 riastrad * Ensure the engine is idle prior to programming a 3284 1.1 riastrad * 3DSTATE_SAMPLE_PATTERN during a context restore. 3285 1.1 riastrad */ 3286 1.1 riastrad batch = gen8_emit_pipe_control(batch, 3287 1.1 riastrad PIPE_CONTROL_CS_STALL, 3288 1.1 riastrad 0); 3289 1.1 riastrad /* 3290 1.1 riastrad * WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for 3291 1.1 riastrad * the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in 3292 1.1 riastrad * total. However, a PIPE_CONTROL is 6 dwords long, not 4, which is 3293 1.1 riastrad * confusing. Since gen8_emit_pipe_control() already advances the 3294 1.1 riastrad * batch by 6 dwords, we advance the other 10 here, completing a 3295 1.1 riastrad * cacheline. It's not clear if the workaround requires this padding 3296 1.1 riastrad * before other commands, or if it's just the regular padding we would 3297 1.1 riastrad * already have for the workaround bb, so leave it here for now. 3298 1.1 riastrad */ 3299 1.1 riastrad for (i = 0; i < 10; i++) 3300 1.1 riastrad *batch++ = MI_NOOP; 3301 1.1 riastrad 3302 1.1 riastrad /* Pad to end of cacheline */ 3303 1.1 riastrad while ((unsigned long)batch % CACHELINE_BYTES) 3304 1.1 riastrad *batch++ = MI_NOOP; 3305 1.1 riastrad 3306 1.1 riastrad return batch; 3307 1.1 riastrad } 3308 1.1 riastrad 3309 1.1 riastrad #define CTX_WA_BB_OBJ_SIZE (PAGE_SIZE) 3310 1.1 riastrad 3311 1.1 riastrad static int lrc_setup_wa_ctx(struct intel_engine_cs *engine) 3312 1.1 riastrad { 3313 1.1 riastrad struct drm_i915_gem_object *obj; 3314 1.1 riastrad struct i915_vma *vma; 3315 1.1 riastrad int err; 3316 1.1 riastrad 3317 1.1 riastrad obj = i915_gem_object_create_shmem(engine->i915, CTX_WA_BB_OBJ_SIZE); 3318 1.1 riastrad if (IS_ERR(obj)) 3319 1.1 riastrad return PTR_ERR(obj); 3320 1.1 riastrad 3321 1.1 riastrad vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL); 3322 1.1 riastrad if (IS_ERR(vma)) { 3323 1.1 riastrad err = PTR_ERR(vma); 3324 1.1 riastrad goto err; 3325 1.1 riastrad } 3326 1.1 riastrad 3327 1.1 riastrad err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH); 3328 1.1 riastrad if (err) 3329 1.1 riastrad goto err; 3330 1.1 riastrad 3331 1.1 riastrad engine->wa_ctx.vma = vma; 3332 1.1 riastrad return 0; 3333 1.1 riastrad 3334 1.1 riastrad err: 3335 1.1 riastrad i915_gem_object_put(obj); 3336 1.1 riastrad return err; 3337 1.1 riastrad } 3338 1.1 riastrad 3339 1.1 riastrad static void lrc_destroy_wa_ctx(struct intel_engine_cs *engine) 3340 1.1 riastrad { 3341 1.1 riastrad i915_vma_unpin_and_release(&engine->wa_ctx.vma, 0); 3342 1.1 riastrad } 3343 1.1 riastrad 3344 1.1 riastrad typedef u32 *(*wa_bb_func_t)(struct intel_engine_cs *engine, u32 *batch); 3345 1.1 riastrad 3346 1.1 riastrad static int intel_init_workaround_bb(struct intel_engine_cs *engine) 3347 1.1 riastrad { 3348 1.1 riastrad struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx; 3349 1.1 riastrad struct i915_wa_ctx_bb *wa_bb[2] = { &wa_ctx->indirect_ctx, 3350 1.1 riastrad &wa_ctx->per_ctx }; 3351 1.1 riastrad wa_bb_func_t wa_bb_fn[2]; 3352 1.1 riastrad struct page *page; 3353 1.1 riastrad void *batch, *batch_ptr; 3354 1.1 riastrad unsigned int i; 3355 1.1 riastrad int ret; 3356 1.1 riastrad 3357 1.1 riastrad if (engine->class != RENDER_CLASS) 3358 1.1 riastrad return 0; 3359 1.1 riastrad 3360 1.1 riastrad switch (INTEL_GEN(engine->i915)) { 3361 1.1 riastrad case 12: 3362 1.1 riastrad case 11: 3363 1.1 riastrad return 0; 3364 1.1 riastrad case 10: 3365 1.1 riastrad wa_bb_fn[0] = gen10_init_indirectctx_bb; 3366 1.1 riastrad wa_bb_fn[1] = NULL; 3367 1.1 riastrad break; 3368 1.1 riastrad case 9: 3369 1.1 riastrad wa_bb_fn[0] = gen9_init_indirectctx_bb; 3370 1.1 riastrad wa_bb_fn[1] = NULL; 3371 1.1 riastrad break; 3372 1.1 riastrad case 8: 3373 1.1 riastrad wa_bb_fn[0] = gen8_init_indirectctx_bb; 3374 1.1 riastrad wa_bb_fn[1] = NULL; 3375 1.1 riastrad break; 3376 1.1 riastrad default: 3377 1.1 riastrad MISSING_CASE(INTEL_GEN(engine->i915)); 3378 1.1 riastrad return 0; 3379 1.1 riastrad } 3380 1.1 riastrad 3381 1.1 riastrad ret = lrc_setup_wa_ctx(engine); 3382 1.1 riastrad if (ret) { 3383 1.1 riastrad DRM_DEBUG_DRIVER("Failed to setup context WA page: %d\n", ret); 3384 1.1 riastrad return ret; 3385 1.1 riastrad } 3386 1.1 riastrad 3387 1.1 riastrad page = i915_gem_object_get_dirty_page(wa_ctx->vma->obj, 0); 3388 1.1 riastrad batch = batch_ptr = kmap_atomic(page); 3389 1.1 riastrad 3390 1.1 riastrad /* 3391 1.1 riastrad * Emit the two workaround batch buffers, recording the offset from the 3392 1.1 riastrad * start of the workaround batch buffer object for each and their 3393 1.1 riastrad * respective sizes. 3394 1.1 riastrad */ 3395 1.1 riastrad for (i = 0; i < ARRAY_SIZE(wa_bb_fn); i++) { 3396 1.1 riastrad wa_bb[i]->offset = batch_ptr - batch; 3397 1.1 riastrad if (GEM_DEBUG_WARN_ON(!IS_ALIGNED(wa_bb[i]->offset, 3398 1.1 riastrad CACHELINE_BYTES))) { 3399 1.1 riastrad ret = -EINVAL; 3400 1.1 riastrad break; 3401 1.1 riastrad } 3402 1.1 riastrad if (wa_bb_fn[i]) 3403 1.1 riastrad batch_ptr = wa_bb_fn[i](engine, batch_ptr); 3404 1.1 riastrad wa_bb[i]->size = batch_ptr - (batch + wa_bb[i]->offset); 3405 1.1 riastrad } 3406 1.1 riastrad 3407 1.1 riastrad BUG_ON(batch_ptr - batch > CTX_WA_BB_OBJ_SIZE); 3408 1.1 riastrad 3409 1.1 riastrad kunmap_atomic(batch); 3410 1.1 riastrad if (ret) 3411 1.1 riastrad lrc_destroy_wa_ctx(engine); 3412 1.1 riastrad 3413 1.1 riastrad return ret; 3414 1.1 riastrad } 3415 1.1 riastrad 3416 1.1 riastrad static void enable_execlists(struct intel_engine_cs *engine) 3417 1.1 riastrad { 3418 1.1 riastrad u32 mode; 3419 1.1 riastrad 3420 1.1 riastrad assert_forcewakes_active(engine->uncore, FORCEWAKE_ALL); 3421 1.1 riastrad 3422 1.1 riastrad intel_engine_set_hwsp_writemask(engine, ~0u); /* HWSTAM */ 3423 1.1 riastrad 3424 1.1 riastrad if (INTEL_GEN(engine->i915) >= 11) 3425 1.1 riastrad mode = _MASKED_BIT_ENABLE(GEN11_GFX_DISABLE_LEGACY_MODE); 3426 1.1 riastrad else 3427 1.1 riastrad mode = _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE); 3428 1.1 riastrad ENGINE_WRITE_FW(engine, RING_MODE_GEN7, mode); 3429 1.1 riastrad 3430 1.1 riastrad ENGINE_WRITE_FW(engine, RING_MI_MODE, _MASKED_BIT_DISABLE(STOP_RING)); 3431 1.1 riastrad 3432 1.1 riastrad ENGINE_WRITE_FW(engine, 3433 1.1 riastrad RING_HWS_PGA, 3434 1.1 riastrad i915_ggtt_offset(engine->status_page.vma)); 3435 1.1 riastrad ENGINE_POSTING_READ(engine, RING_HWS_PGA); 3436 1.1 riastrad 3437 1.1 riastrad engine->context_tag = 0; 3438 1.1 riastrad } 3439 1.1 riastrad 3440 1.1 riastrad static bool unexpected_starting_state(struct intel_engine_cs *engine) 3441 1.1 riastrad { 3442 1.1 riastrad bool unexpected = false; 3443 1.1 riastrad 3444 1.1 riastrad if (ENGINE_READ_FW(engine, RING_MI_MODE) & STOP_RING) { 3445 1.1 riastrad DRM_DEBUG_DRIVER("STOP_RING still set in RING_MI_MODE\n"); 3446 1.1 riastrad unexpected = true; 3447 1.1 riastrad } 3448 1.1 riastrad 3449 1.1 riastrad return unexpected; 3450 1.1 riastrad } 3451 1.1 riastrad 3452 1.1 riastrad static int execlists_resume(struct intel_engine_cs *engine) 3453 1.1 riastrad { 3454 1.1 riastrad intel_engine_apply_workarounds(engine); 3455 1.1 riastrad intel_engine_apply_whitelist(engine); 3456 1.1 riastrad 3457 1.1 riastrad intel_mocs_init_engine(engine); 3458 1.1 riastrad 3459 1.1 riastrad intel_engine_reset_breadcrumbs(engine); 3460 1.1 riastrad 3461 1.1 riastrad if (GEM_SHOW_DEBUG() && unexpected_starting_state(engine)) { 3462 1.1 riastrad struct drm_printer p = drm_debug_printer(__func__); 3463 1.1 riastrad 3464 1.1 riastrad intel_engine_dump(engine, &p, NULL); 3465 1.1 riastrad } 3466 1.1 riastrad 3467 1.1 riastrad enable_execlists(engine); 3468 1.1 riastrad 3469 1.1 riastrad return 0; 3470 1.1 riastrad } 3471 1.1 riastrad 3472 1.1 riastrad static void execlists_reset_prepare(struct intel_engine_cs *engine) 3473 1.1 riastrad { 3474 1.1 riastrad struct intel_engine_execlists * const execlists = &engine->execlists; 3475 1.1 riastrad unsigned long flags; 3476 1.1 riastrad 3477 1.1 riastrad ENGINE_TRACE(engine, "depth<-%d\n", 3478 1.1 riastrad atomic_read(&execlists->tasklet.count)); 3479 1.1 riastrad 3480 1.1 riastrad /* 3481 1.1 riastrad * Prevent request submission to the hardware until we have 3482 1.1 riastrad * completed the reset in i915_gem_reset_finish(). If a request 3483 1.1 riastrad * is completed by one engine, it may then queue a request 3484 1.1 riastrad * to a second via its execlists->tasklet *just* as we are 3485 1.1 riastrad * calling engine->resume() and also writing the ELSP. 3486 1.1 riastrad * Turning off the execlists->tasklet until the reset is over 3487 1.1 riastrad * prevents the race. 3488 1.1 riastrad */ 3489 1.1 riastrad __tasklet_disable_sync_once(&execlists->tasklet); 3490 1.1 riastrad GEM_BUG_ON(!reset_in_progress(execlists)); 3491 1.1 riastrad 3492 1.1 riastrad /* And flush any current direct submission. */ 3493 1.1 riastrad spin_lock_irqsave(&engine->active.lock, flags); 3494 1.1 riastrad spin_unlock_irqrestore(&engine->active.lock, flags); 3495 1.1 riastrad 3496 1.1 riastrad /* 3497 1.1 riastrad * We stop engines, otherwise we might get failed reset and a 3498 1.1 riastrad * dead gpu (on elk). Also as modern gpu as kbl can suffer 3499 1.1 riastrad * from system hang if batchbuffer is progressing when 3500 1.1 riastrad * the reset is issued, regardless of READY_TO_RESET ack. 3501 1.1 riastrad * Thus assume it is best to stop engines on all gens 3502 1.1 riastrad * where we have a gpu reset. 3503 1.1 riastrad * 3504 1.1 riastrad * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES) 3505 1.1 riastrad * 3506 1.1 riastrad * FIXME: Wa for more modern gens needs to be validated 3507 1.1 riastrad */ 3508 1.1 riastrad intel_engine_stop_cs(engine); 3509 1.1 riastrad } 3510 1.1 riastrad 3511 1.1 riastrad static void reset_csb_pointers(struct intel_engine_cs *engine) 3512 1.1 riastrad { 3513 1.1 riastrad struct intel_engine_execlists * const execlists = &engine->execlists; 3514 1.1 riastrad const unsigned int reset_value = execlists->csb_size - 1; 3515 1.1 riastrad 3516 1.1 riastrad ring_set_paused(engine, 0); 3517 1.1 riastrad 3518 1.1 riastrad /* 3519 1.1 riastrad * After a reset, the HW starts writing into CSB entry [0]. We 3520 1.1 riastrad * therefore have to set our HEAD pointer back one entry so that 3521 1.1 riastrad * the *first* entry we check is entry 0. To complicate this further, 3522 1.1 riastrad * as we don't wait for the first interrupt after reset, we have to 3523 1.1 riastrad * fake the HW write to point back to the last entry so that our 3524 1.1 riastrad * inline comparison of our cached head position against the last HW 3525 1.1 riastrad * write works even before the first interrupt. 3526 1.1 riastrad */ 3527 1.1 riastrad execlists->csb_head = reset_value; 3528 1.1 riastrad WRITE_ONCE(*execlists->csb_write, reset_value); 3529 1.1 riastrad wmb(); /* Make sure this is visible to HW (paranoia?) */ 3530 1.1 riastrad 3531 1.1 riastrad /* 3532 1.1 riastrad * Sometimes Icelake forgets to reset its pointers on a GPU reset. 3533 1.1 riastrad * Bludgeon them with a mmio update to be sure. 3534 1.1 riastrad */ 3535 1.1 riastrad ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR, 3536 1.1 riastrad reset_value << 8 | reset_value); 3537 1.1 riastrad ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR); 3538 1.1 riastrad 3539 1.1 riastrad invalidate_csb_entries(&execlists->csb_status[0], 3540 1.1 riastrad &execlists->csb_status[reset_value]); 3541 1.1 riastrad } 3542 1.1 riastrad 3543 1.1 riastrad static void __reset_stop_ring(u32 *regs, const struct intel_engine_cs *engine) 3544 1.1 riastrad { 3545 1.1 riastrad int x; 3546 1.1 riastrad 3547 1.1 riastrad x = lrc_ring_mi_mode(engine); 3548 1.1 riastrad if (x != -1) { 3549 1.1 riastrad regs[x + 1] &= ~STOP_RING; 3550 1.1 riastrad regs[x + 1] |= STOP_RING << 16; 3551 1.1 riastrad } 3552 1.1 riastrad } 3553 1.1 riastrad 3554 1.1 riastrad static void __execlists_reset_reg_state(const struct intel_context *ce, 3555 1.1 riastrad const struct intel_engine_cs *engine) 3556 1.1 riastrad { 3557 1.1 riastrad u32 *regs = ce->lrc_reg_state; 3558 1.1 riastrad 3559 1.1 riastrad __reset_stop_ring(regs, engine); 3560 1.1 riastrad } 3561 1.1 riastrad 3562 1.1 riastrad static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) 3563 1.1 riastrad { 3564 1.1 riastrad struct intel_engine_execlists * const execlists = &engine->execlists; 3565 1.1 riastrad struct intel_context *ce; 3566 1.1 riastrad struct i915_request *rq; 3567 1.1 riastrad u32 head; 3568 1.1 riastrad 3569 1.1 riastrad mb(); /* paranoia: read the CSB pointers from after the reset */ 3570 1.1 riastrad clflush(execlists->csb_write); 3571 1.1 riastrad mb(); 3572 1.1 riastrad 3573 1.1 riastrad process_csb(engine); /* drain preemption events */ 3574 1.1 riastrad 3575 1.1 riastrad /* Following the reset, we need to reload the CSB read/write pointers */ 3576 1.1 riastrad reset_csb_pointers(engine); 3577 1.1 riastrad 3578 1.1 riastrad /* 3579 1.1 riastrad * Save the currently executing context, even if we completed 3580 1.1 riastrad * its request, it was still running at the time of the 3581 1.1 riastrad * reset and will have been clobbered. 3582 1.1 riastrad */ 3583 1.1 riastrad rq = execlists_active(execlists); 3584 1.1 riastrad if (!rq) 3585 1.1 riastrad goto unwind; 3586 1.1 riastrad 3587 1.1 riastrad /* We still have requests in-flight; the engine should be active */ 3588 1.1 riastrad GEM_BUG_ON(!intel_engine_pm_is_awake(engine)); 3589 1.1 riastrad 3590 1.1 riastrad ce = rq->context; 3591 1.1 riastrad GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); 3592 1.1 riastrad 3593 1.1 riastrad if (i915_request_completed(rq)) { 3594 1.1 riastrad /* Idle context; tidy up the ring so we can restart afresh */ 3595 1.1 riastrad head = intel_ring_wrap(ce->ring, rq->tail); 3596 1.1 riastrad goto out_replay; 3597 1.1 riastrad } 3598 1.1 riastrad 3599 1.1 riastrad /* Context has requests still in-flight; it should not be idle! */ 3600 1.1 riastrad GEM_BUG_ON(i915_active_is_idle(&ce->active)); 3601 1.1 riastrad rq = active_request(ce->timeline, rq); 3602 1.1 riastrad head = intel_ring_wrap(ce->ring, rq->head); 3603 1.1 riastrad GEM_BUG_ON(head == ce->ring->tail); 3604 1.1 riastrad 3605 1.1 riastrad /* 3606 1.1 riastrad * If this request hasn't started yet, e.g. it is waiting on a 3607 1.1 riastrad * semaphore, we need to avoid skipping the request or else we 3608 1.1 riastrad * break the signaling chain. However, if the context is corrupt 3609 1.1 riastrad * the request will not restart and we will be stuck with a wedged 3610 1.1 riastrad * device. It is quite often the case that if we issue a reset 3611 1.1 riastrad * while the GPU is loading the context image, that the context 3612 1.1 riastrad * image becomes corrupt. 3613 1.1 riastrad * 3614 1.1 riastrad * Otherwise, if we have not started yet, the request should replay 3615 1.1 riastrad * perfectly and we do not need to flag the result as being erroneous. 3616 1.1 riastrad */ 3617 1.1 riastrad if (!i915_request_started(rq)) 3618 1.1 riastrad goto out_replay; 3619 1.1 riastrad 3620 1.1 riastrad /* 3621 1.1 riastrad * If the request was innocent, we leave the request in the ELSP 3622 1.1 riastrad * and will try to replay it on restarting. The context image may 3623 1.1 riastrad * have been corrupted by the reset, in which case we may have 3624 1.1 riastrad * to service a new GPU hang, but more likely we can continue on 3625 1.1 riastrad * without impact. 3626 1.1 riastrad * 3627 1.1 riastrad * If the request was guilty, we presume the context is corrupt 3628 1.1 riastrad * and have to at least restore the RING register in the context 3629 1.1 riastrad * image back to the expected values to skip over the guilty request. 3630 1.1 riastrad */ 3631 1.1 riastrad __i915_request_reset(rq, stalled); 3632 1.1 riastrad if (!stalled) 3633 1.1 riastrad goto out_replay; 3634 1.1 riastrad 3635 1.1 riastrad /* 3636 1.1 riastrad * We want a simple context + ring to execute the breadcrumb update. 3637 1.1 riastrad * We cannot rely on the context being intact across the GPU hang, 3638 1.1 riastrad * so clear it and rebuild just what we need for the breadcrumb. 3639 1.1 riastrad * All pending requests for this context will be zapped, and any 3640 1.1 riastrad * future request will be after userspace has had the opportunity 3641 1.1 riastrad * to recreate its own state. 3642 1.1 riastrad */ 3643 1.1 riastrad GEM_BUG_ON(!intel_context_is_pinned(ce)); 3644 1.1 riastrad restore_default_state(ce, engine); 3645 1.1 riastrad 3646 1.1 riastrad out_replay: 3647 1.1 riastrad ENGINE_TRACE(engine, "replay {head:%04x, tail:%04x}\n", 3648 1.1 riastrad head, ce->ring->tail); 3649 1.1 riastrad __execlists_reset_reg_state(ce, engine); 3650 1.1 riastrad __execlists_update_reg_state(ce, engine, head); 3651 1.1 riastrad ce->lrc_desc |= CTX_DESC_FORCE_RESTORE; /* paranoid: GPU was reset! */ 3652 1.1 riastrad 3653 1.1 riastrad unwind: 3654 1.1 riastrad /* Push back any incomplete requests for replay after the reset. */ 3655 1.1 riastrad cancel_port_requests(execlists); 3656 1.1 riastrad __unwind_incomplete_requests(engine); 3657 1.1 riastrad } 3658 1.1 riastrad 3659 1.1 riastrad static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) 3660 1.1 riastrad { 3661 1.1 riastrad unsigned long flags; 3662 1.1 riastrad 3663 1.1 riastrad ENGINE_TRACE(engine, "\n"); 3664 1.1 riastrad 3665 1.1 riastrad spin_lock_irqsave(&engine->active.lock, flags); 3666 1.1 riastrad 3667 1.1 riastrad __execlists_reset(engine, stalled); 3668 1.1 riastrad 3669 1.1 riastrad spin_unlock_irqrestore(&engine->active.lock, flags); 3670 1.1 riastrad } 3671 1.1 riastrad 3672 1.1 riastrad static void nop_submission_tasklet(unsigned long data) 3673 1.1 riastrad { 3674 1.1 riastrad /* The driver is wedged; don't process any more events. */ 3675 1.1 riastrad } 3676 1.1 riastrad 3677 1.1 riastrad static void execlists_reset_cancel(struct intel_engine_cs *engine) 3678 1.1 riastrad { 3679 1.1 riastrad struct intel_engine_execlists * const execlists = &engine->execlists; 3680 1.1 riastrad struct i915_request *rq, *rn; 3681 1.1 riastrad struct rb_node *rb; 3682 1.1 riastrad unsigned long flags; 3683 1.1 riastrad 3684 1.1 riastrad ENGINE_TRACE(engine, "\n"); 3685 1.1 riastrad 3686 1.1 riastrad /* 3687 1.1 riastrad * Before we call engine->cancel_requests(), we should have exclusive 3688 1.1 riastrad * access to the submission state. This is arranged for us by the 3689 1.1 riastrad * caller disabling the interrupt generation, the tasklet and other 3690 1.1 riastrad * threads that may then access the same state, giving us a free hand 3691 1.1 riastrad * to reset state. However, we still need to let lockdep be aware that 3692 1.1 riastrad * we know this state may be accessed in hardirq context, so we 3693 1.1 riastrad * disable the irq around this manipulation and we want to keep 3694 1.1 riastrad * the spinlock focused on its duties and not accidentally conflate 3695 1.1 riastrad * coverage to the submission's irq state. (Similarly, although we 3696 1.1 riastrad * shouldn't need to disable irq around the manipulation of the 3697 1.1 riastrad * submission's irq state, we also wish to remind ourselves that 3698 1.1 riastrad * it is irq state.) 3699 1.1 riastrad */ 3700 1.1 riastrad spin_lock_irqsave(&engine->active.lock, flags); 3701 1.1 riastrad 3702 1.1 riastrad __execlists_reset(engine, true); 3703 1.1 riastrad 3704 1.1 riastrad /* Mark all executing requests as skipped. */ 3705 1.1 riastrad list_for_each_entry(rq, &engine->active.requests, sched.link) 3706 1.1 riastrad mark_eio(rq); 3707 1.1 riastrad 3708 1.1 riastrad /* Flush the queued requests to the timeline list (for retiring). */ 3709 1.1 riastrad while ((rb = rb_first_cached(&execlists->queue))) { 3710 1.1 riastrad struct i915_priolist *p = to_priolist(rb); 3711 1.1 riastrad int i; 3712 1.1 riastrad 3713 1.1 riastrad priolist_for_each_request_consume(rq, rn, p, i) { 3714 1.1 riastrad mark_eio(rq); 3715 1.1 riastrad __i915_request_submit(rq); 3716 1.1 riastrad } 3717 1.1 riastrad 3718 1.1 riastrad rb_erase_cached(&p->node, &execlists->queue); 3719 1.1 riastrad i915_priolist_free(p); 3720 1.1 riastrad } 3721 1.1 riastrad 3722 1.1 riastrad /* On-hold requests will be flushed to timeline upon their release */ 3723 1.1 riastrad list_for_each_entry(rq, &engine->active.hold, sched.link) 3724 1.1 riastrad mark_eio(rq); 3725 1.1 riastrad 3726 1.1 riastrad /* Cancel all attached virtual engines */ 3727 1.1 riastrad while ((rb = rb_first_cached(&execlists->virtual))) { 3728 1.1 riastrad struct virtual_engine *ve = 3729 1.1 riastrad rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 3730 1.1 riastrad 3731 1.1 riastrad rb_erase_cached(rb, &execlists->virtual); 3732 1.7 riastrad container_of(rb, struct ve_node, rb)->inserted = false; 3733 1.1 riastrad 3734 1.1 riastrad spin_lock(&ve->base.active.lock); 3735 1.1 riastrad rq = fetch_and_zero(&ve->request); 3736 1.1 riastrad if (rq) { 3737 1.1 riastrad mark_eio(rq); 3738 1.1 riastrad 3739 1.1 riastrad rq->engine = engine; 3740 1.1 riastrad __i915_request_submit(rq); 3741 1.1 riastrad i915_request_put(rq); 3742 1.1 riastrad 3743 1.1 riastrad ve->base.execlists.queue_priority_hint = INT_MIN; 3744 1.1 riastrad } 3745 1.1 riastrad spin_unlock(&ve->base.active.lock); 3746 1.1 riastrad } 3747 1.1 riastrad 3748 1.1 riastrad /* Remaining _unready_ requests will be nop'ed when submitted */ 3749 1.1 riastrad 3750 1.1 riastrad execlists->queue_priority_hint = INT_MIN; 3751 1.7 riastrad #ifdef __NetBSD__ 3752 1.7 riastrad i915_sched_init(execlists); 3753 1.7 riastrad rb_tree_init(&execlists->virtual.rb_root.rbr_tree, &ve_tree_ops); 3754 1.7 riastrad #else 3755 1.1 riastrad execlists->queue = RB_ROOT_CACHED; 3756 1.7 riastrad #endif 3757 1.1 riastrad 3758 1.1 riastrad GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet)); 3759 1.1 riastrad execlists->tasklet.func = nop_submission_tasklet; 3760 1.1 riastrad 3761 1.1 riastrad spin_unlock_irqrestore(&engine->active.lock, flags); 3762 1.1 riastrad } 3763 1.1 riastrad 3764 1.1 riastrad static void execlists_reset_finish(struct intel_engine_cs *engine) 3765 1.1 riastrad { 3766 1.1 riastrad struct intel_engine_execlists * const execlists = &engine->execlists; 3767 1.1 riastrad 3768 1.1 riastrad /* 3769 1.1 riastrad * After a GPU reset, we may have requests to replay. Do so now while 3770 1.1 riastrad * we still have the forcewake to be sure that the GPU is not allowed 3771 1.1 riastrad * to sleep before we restart and reload a context. 3772 1.1 riastrad */ 3773 1.1 riastrad GEM_BUG_ON(!reset_in_progress(execlists)); 3774 1.1 riastrad if (!RB_EMPTY_ROOT(&execlists->queue.rb_root)) 3775 1.1 riastrad execlists->tasklet.func(execlists->tasklet.data); 3776 1.1 riastrad 3777 1.1 riastrad if (__tasklet_enable(&execlists->tasklet)) 3778 1.1 riastrad /* And kick in case we missed a new request submission. */ 3779 1.1 riastrad tasklet_hi_schedule(&execlists->tasklet); 3780 1.1 riastrad ENGINE_TRACE(engine, "depth->%d\n", 3781 1.1 riastrad atomic_read(&execlists->tasklet.count)); 3782 1.1 riastrad } 3783 1.1 riastrad 3784 1.1 riastrad static int gen8_emit_bb_start_noarb(struct i915_request *rq, 3785 1.1 riastrad u64 offset, u32 len, 3786 1.1 riastrad const unsigned int flags) 3787 1.1 riastrad { 3788 1.1 riastrad u32 *cs; 3789 1.1 riastrad 3790 1.1 riastrad cs = intel_ring_begin(rq, 4); 3791 1.1 riastrad if (IS_ERR(cs)) 3792 1.1 riastrad return PTR_ERR(cs); 3793 1.1 riastrad 3794 1.1 riastrad /* 3795 1.1 riastrad * WaDisableCtxRestoreArbitration:bdw,chv 3796 1.1 riastrad * 3797 1.1 riastrad * We don't need to perform MI_ARB_ENABLE as often as we do (in 3798 1.1 riastrad * particular all the gen that do not need the w/a at all!), if we 3799 1.1 riastrad * took care to make sure that on every switch into this context 3800 1.1 riastrad * (both ordinary and for preemption) that arbitrartion was enabled 3801 1.1 riastrad * we would be fine. However, for gen8 there is another w/a that 3802 1.1 riastrad * requires us to not preempt inside GPGPU execution, so we keep 3803 1.1 riastrad * arbitration disabled for gen8 batches. Arbitration will be 3804 1.1 riastrad * re-enabled before we close the request 3805 1.1 riastrad * (engine->emit_fini_breadcrumb). 3806 1.1 riastrad */ 3807 1.1 riastrad *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; 3808 1.1 riastrad 3809 1.1 riastrad /* FIXME(BDW+): Address space and security selectors. */ 3810 1.1 riastrad *cs++ = MI_BATCH_BUFFER_START_GEN8 | 3811 1.1 riastrad (flags & I915_DISPATCH_SECURE ? 0 : BIT(8)); 3812 1.1 riastrad *cs++ = lower_32_bits(offset); 3813 1.1 riastrad *cs++ = upper_32_bits(offset); 3814 1.1 riastrad 3815 1.1 riastrad intel_ring_advance(rq, cs); 3816 1.1 riastrad 3817 1.1 riastrad return 0; 3818 1.1 riastrad } 3819 1.1 riastrad 3820 1.1 riastrad static int gen8_emit_bb_start(struct i915_request *rq, 3821 1.1 riastrad u64 offset, u32 len, 3822 1.1 riastrad const unsigned int flags) 3823 1.1 riastrad { 3824 1.1 riastrad u32 *cs; 3825 1.1 riastrad 3826 1.1 riastrad cs = intel_ring_begin(rq, 6); 3827 1.1 riastrad if (IS_ERR(cs)) 3828 1.1 riastrad return PTR_ERR(cs); 3829 1.1 riastrad 3830 1.1 riastrad *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; 3831 1.1 riastrad 3832 1.1 riastrad *cs++ = MI_BATCH_BUFFER_START_GEN8 | 3833 1.1 riastrad (flags & I915_DISPATCH_SECURE ? 0 : BIT(8)); 3834 1.1 riastrad *cs++ = lower_32_bits(offset); 3835 1.1 riastrad *cs++ = upper_32_bits(offset); 3836 1.1 riastrad 3837 1.1 riastrad *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; 3838 1.1 riastrad *cs++ = MI_NOOP; 3839 1.1 riastrad 3840 1.1 riastrad intel_ring_advance(rq, cs); 3841 1.1 riastrad 3842 1.1 riastrad return 0; 3843 1.1 riastrad } 3844 1.1 riastrad 3845 1.1 riastrad static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine) 3846 1.1 riastrad { 3847 1.1 riastrad ENGINE_WRITE(engine, RING_IMR, 3848 1.1 riastrad ~(engine->irq_enable_mask | engine->irq_keep_mask)); 3849 1.1 riastrad ENGINE_POSTING_READ(engine, RING_IMR); 3850 1.1 riastrad } 3851 1.1 riastrad 3852 1.1 riastrad static void gen8_logical_ring_disable_irq(struct intel_engine_cs *engine) 3853 1.1 riastrad { 3854 1.1 riastrad ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask); 3855 1.1 riastrad } 3856 1.1 riastrad 3857 1.1 riastrad static int gen8_emit_flush(struct i915_request *request, u32 mode) 3858 1.1 riastrad { 3859 1.1 riastrad u32 cmd, *cs; 3860 1.1 riastrad 3861 1.1 riastrad cs = intel_ring_begin(request, 4); 3862 1.1 riastrad if (IS_ERR(cs)) 3863 1.1 riastrad return PTR_ERR(cs); 3864 1.1 riastrad 3865 1.1 riastrad cmd = MI_FLUSH_DW + 1; 3866 1.1 riastrad 3867 1.1 riastrad /* We always require a command barrier so that subsequent 3868 1.1 riastrad * commands, such as breadcrumb interrupts, are strictly ordered 3869 1.1 riastrad * wrt the contents of the write cache being flushed to memory 3870 1.1 riastrad * (and thus being coherent from the CPU). 3871 1.1 riastrad */ 3872 1.1 riastrad cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; 3873 1.1 riastrad 3874 1.1 riastrad if (mode & EMIT_INVALIDATE) { 3875 1.1 riastrad cmd |= MI_INVALIDATE_TLB; 3876 1.1 riastrad if (request->engine->class == VIDEO_DECODE_CLASS) 3877 1.1 riastrad cmd |= MI_INVALIDATE_BSD; 3878 1.1 riastrad } 3879 1.1 riastrad 3880 1.1 riastrad *cs++ = cmd; 3881 1.1 riastrad *cs++ = LRC_PPHWSP_SCRATCH_ADDR; 3882 1.1 riastrad *cs++ = 0; /* upper addr */ 3883 1.1 riastrad *cs++ = 0; /* value */ 3884 1.1 riastrad intel_ring_advance(request, cs); 3885 1.1 riastrad 3886 1.1 riastrad return 0; 3887 1.1 riastrad } 3888 1.1 riastrad 3889 1.1 riastrad static int gen8_emit_flush_render(struct i915_request *request, 3890 1.1 riastrad u32 mode) 3891 1.1 riastrad { 3892 1.1 riastrad bool vf_flush_wa = false, dc_flush_wa = false; 3893 1.1 riastrad u32 *cs, flags = 0; 3894 1.1 riastrad int len; 3895 1.1 riastrad 3896 1.1 riastrad flags |= PIPE_CONTROL_CS_STALL; 3897 1.1 riastrad 3898 1.1 riastrad if (mode & EMIT_FLUSH) { 3899 1.1 riastrad flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; 3900 1.1 riastrad flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH; 3901 1.1 riastrad flags |= PIPE_CONTROL_DC_FLUSH_ENABLE; 3902 1.1 riastrad flags |= PIPE_CONTROL_FLUSH_ENABLE; 3903 1.1 riastrad } 3904 1.1 riastrad 3905 1.1 riastrad if (mode & EMIT_INVALIDATE) { 3906 1.1 riastrad flags |= PIPE_CONTROL_TLB_INVALIDATE; 3907 1.1 riastrad flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE; 3908 1.1 riastrad flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE; 3909 1.1 riastrad flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE; 3910 1.1 riastrad flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE; 3911 1.1 riastrad flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE; 3912 1.1 riastrad flags |= PIPE_CONTROL_QW_WRITE; 3913 1.1 riastrad flags |= PIPE_CONTROL_STORE_DATA_INDEX; 3914 1.1 riastrad 3915 1.1 riastrad /* 3916 1.1 riastrad * On GEN9: before VF_CACHE_INVALIDATE we need to emit a NULL 3917 1.1 riastrad * pipe control. 3918 1.1 riastrad */ 3919 1.1 riastrad if (IS_GEN(request->i915, 9)) 3920 1.1 riastrad vf_flush_wa = true; 3921 1.1 riastrad 3922 1.1 riastrad /* WaForGAMHang:kbl */ 3923 1.1 riastrad if (IS_KBL_REVID(request->i915, 0, KBL_REVID_B0)) 3924 1.1 riastrad dc_flush_wa = true; 3925 1.1 riastrad } 3926 1.1 riastrad 3927 1.1 riastrad len = 6; 3928 1.1 riastrad 3929 1.1 riastrad if (vf_flush_wa) 3930 1.1 riastrad len += 6; 3931 1.1 riastrad 3932 1.1 riastrad if (dc_flush_wa) 3933 1.1 riastrad len += 12; 3934 1.1 riastrad 3935 1.1 riastrad cs = intel_ring_begin(request, len); 3936 1.1 riastrad if (IS_ERR(cs)) 3937 1.1 riastrad return PTR_ERR(cs); 3938 1.1 riastrad 3939 1.1 riastrad if (vf_flush_wa) 3940 1.1 riastrad cs = gen8_emit_pipe_control(cs, 0, 0); 3941 1.1 riastrad 3942 1.1 riastrad if (dc_flush_wa) 3943 1.1 riastrad cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_DC_FLUSH_ENABLE, 3944 1.1 riastrad 0); 3945 1.1 riastrad 3946 1.1 riastrad cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR); 3947 1.1 riastrad 3948 1.1 riastrad if (dc_flush_wa) 3949 1.1 riastrad cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_CS_STALL, 0); 3950 1.1 riastrad 3951 1.1 riastrad intel_ring_advance(request, cs); 3952 1.1 riastrad 3953 1.1 riastrad return 0; 3954 1.1 riastrad } 3955 1.1 riastrad 3956 1.1 riastrad static int gen11_emit_flush_render(struct i915_request *request, 3957 1.1 riastrad u32 mode) 3958 1.1 riastrad { 3959 1.1 riastrad if (mode & EMIT_FLUSH) { 3960 1.1 riastrad u32 *cs; 3961 1.1 riastrad u32 flags = 0; 3962 1.1 riastrad 3963 1.1 riastrad flags |= PIPE_CONTROL_CS_STALL; 3964 1.1 riastrad 3965 1.1 riastrad flags |= PIPE_CONTROL_TILE_CACHE_FLUSH; 3966 1.1 riastrad flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; 3967 1.1 riastrad flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH; 3968 1.1 riastrad flags |= PIPE_CONTROL_DC_FLUSH_ENABLE; 3969 1.1 riastrad flags |= PIPE_CONTROL_FLUSH_ENABLE; 3970 1.1 riastrad flags |= PIPE_CONTROL_QW_WRITE; 3971 1.1 riastrad flags |= PIPE_CONTROL_STORE_DATA_INDEX; 3972 1.1 riastrad 3973 1.1 riastrad cs = intel_ring_begin(request, 6); 3974 1.1 riastrad if (IS_ERR(cs)) 3975 1.1 riastrad return PTR_ERR(cs); 3976 1.1 riastrad 3977 1.1 riastrad cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR); 3978 1.1 riastrad intel_ring_advance(request, cs); 3979 1.1 riastrad } 3980 1.1 riastrad 3981 1.1 riastrad if (mode & EMIT_INVALIDATE) { 3982 1.1 riastrad u32 *cs; 3983 1.1 riastrad u32 flags = 0; 3984 1.1 riastrad 3985 1.1 riastrad flags |= PIPE_CONTROL_CS_STALL; 3986 1.1 riastrad 3987 1.1 riastrad flags |= PIPE_CONTROL_COMMAND_CACHE_INVALIDATE; 3988 1.1 riastrad flags |= PIPE_CONTROL_TLB_INVALIDATE; 3989 1.1 riastrad flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE; 3990 1.1 riastrad flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE; 3991 1.1 riastrad flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE; 3992 1.1 riastrad flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE; 3993 1.1 riastrad flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE; 3994 1.1 riastrad flags |= PIPE_CONTROL_QW_WRITE; 3995 1.1 riastrad flags |= PIPE_CONTROL_STORE_DATA_INDEX; 3996 1.1 riastrad 3997 1.1 riastrad cs = intel_ring_begin(request, 6); 3998 1.1 riastrad if (IS_ERR(cs)) 3999 1.1 riastrad return PTR_ERR(cs); 4000 1.1 riastrad 4001 1.1 riastrad cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR); 4002 1.1 riastrad intel_ring_advance(request, cs); 4003 1.1 riastrad } 4004 1.1 riastrad 4005 1.1 riastrad return 0; 4006 1.1 riastrad } 4007 1.1 riastrad 4008 1.1 riastrad static u32 preparser_disable(bool state) 4009 1.1 riastrad { 4010 1.1 riastrad return MI_ARB_CHECK | 1 << 8 | state; 4011 1.1 riastrad } 4012 1.1 riastrad 4013 1.1 riastrad static int gen12_emit_flush_render(struct i915_request *request, 4014 1.1 riastrad u32 mode) 4015 1.1 riastrad { 4016 1.1 riastrad if (mode & EMIT_FLUSH) { 4017 1.1 riastrad u32 flags = 0; 4018 1.1 riastrad u32 *cs; 4019 1.1 riastrad 4020 1.1 riastrad flags |= PIPE_CONTROL_TILE_CACHE_FLUSH; 4021 1.1 riastrad flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; 4022 1.1 riastrad flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH; 4023 1.1 riastrad /* Wa_1409600907:tgl */ 4024 1.1 riastrad flags |= PIPE_CONTROL_DEPTH_STALL; 4025 1.1 riastrad flags |= PIPE_CONTROL_DC_FLUSH_ENABLE; 4026 1.1 riastrad flags |= PIPE_CONTROL_FLUSH_ENABLE; 4027 1.1 riastrad flags |= PIPE_CONTROL_HDC_PIPELINE_FLUSH; 4028 1.1 riastrad 4029 1.1 riastrad flags |= PIPE_CONTROL_STORE_DATA_INDEX; 4030 1.1 riastrad flags |= PIPE_CONTROL_QW_WRITE; 4031 1.1 riastrad 4032 1.1 riastrad flags |= PIPE_CONTROL_CS_STALL; 4033 1.1 riastrad 4034 1.1 riastrad cs = intel_ring_begin(request, 6); 4035 1.1 riastrad if (IS_ERR(cs)) 4036 1.1 riastrad return PTR_ERR(cs); 4037 1.1 riastrad 4038 1.1 riastrad cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR); 4039 1.1 riastrad intel_ring_advance(request, cs); 4040 1.1 riastrad } 4041 1.1 riastrad 4042 1.1 riastrad if (mode & EMIT_INVALIDATE) { 4043 1.1 riastrad u32 flags = 0; 4044 1.1 riastrad u32 *cs; 4045 1.1 riastrad 4046 1.1 riastrad flags |= PIPE_CONTROL_COMMAND_CACHE_INVALIDATE; 4047 1.1 riastrad flags |= PIPE_CONTROL_TLB_INVALIDATE; 4048 1.1 riastrad flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE; 4049 1.1 riastrad flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE; 4050 1.1 riastrad flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE; 4051 1.1 riastrad flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE; 4052 1.1 riastrad flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE; 4053 1.1 riastrad flags |= PIPE_CONTROL_L3_RO_CACHE_INVALIDATE; 4054 1.1 riastrad 4055 1.1 riastrad flags |= PIPE_CONTROL_STORE_DATA_INDEX; 4056 1.1 riastrad flags |= PIPE_CONTROL_QW_WRITE; 4057 1.1 riastrad 4058 1.1 riastrad flags |= PIPE_CONTROL_CS_STALL; 4059 1.1 riastrad 4060 1.1 riastrad cs = intel_ring_begin(request, 8); 4061 1.1 riastrad if (IS_ERR(cs)) 4062 1.1 riastrad return PTR_ERR(cs); 4063 1.1 riastrad 4064 1.1 riastrad /* 4065 1.1 riastrad * Prevent the pre-parser from skipping past the TLB 4066 1.1 riastrad * invalidate and loading a stale page for the batch 4067 1.1 riastrad * buffer / request payload. 4068 1.1 riastrad */ 4069 1.1 riastrad *cs++ = preparser_disable(true); 4070 1.1 riastrad 4071 1.1 riastrad cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR); 4072 1.1 riastrad 4073 1.1 riastrad *cs++ = preparser_disable(false); 4074 1.1 riastrad intel_ring_advance(request, cs); 4075 1.1 riastrad 4076 1.1 riastrad /* 4077 1.1 riastrad * Wa_1604544889:tgl 4078 1.1 riastrad */ 4079 1.1 riastrad if (IS_TGL_REVID(request->i915, TGL_REVID_A0, TGL_REVID_A0)) { 4080 1.1 riastrad flags = 0; 4081 1.1 riastrad flags |= PIPE_CONTROL_CS_STALL; 4082 1.1 riastrad flags |= PIPE_CONTROL_HDC_PIPELINE_FLUSH; 4083 1.1 riastrad 4084 1.1 riastrad flags |= PIPE_CONTROL_STORE_DATA_INDEX; 4085 1.1 riastrad flags |= PIPE_CONTROL_QW_WRITE; 4086 1.1 riastrad 4087 1.1 riastrad cs = intel_ring_begin(request, 6); 4088 1.1 riastrad if (IS_ERR(cs)) 4089 1.1 riastrad return PTR_ERR(cs); 4090 1.1 riastrad 4091 1.1 riastrad cs = gen8_emit_pipe_control(cs, flags, 4092 1.1 riastrad LRC_PPHWSP_SCRATCH_ADDR); 4093 1.1 riastrad intel_ring_advance(request, cs); 4094 1.1 riastrad } 4095 1.1 riastrad } 4096 1.1 riastrad 4097 1.1 riastrad return 0; 4098 1.1 riastrad } 4099 1.1 riastrad 4100 1.1 riastrad /* 4101 1.1 riastrad * Reserve space for 2 NOOPs at the end of each request to be 4102 1.1 riastrad * used as a workaround for not being allowed to do lite 4103 1.1 riastrad * restore with HEAD==TAIL (WaIdleLiteRestore). 4104 1.1 riastrad */ 4105 1.1 riastrad static u32 *gen8_emit_wa_tail(struct i915_request *request, u32 *cs) 4106 1.1 riastrad { 4107 1.1 riastrad /* Ensure there's always at least one preemption point per-request. */ 4108 1.1 riastrad *cs++ = MI_ARB_CHECK; 4109 1.1 riastrad *cs++ = MI_NOOP; 4110 1.1 riastrad request->wa_tail = intel_ring_offset(request, cs); 4111 1.1 riastrad 4112 1.1 riastrad return cs; 4113 1.1 riastrad } 4114 1.1 riastrad 4115 1.1 riastrad static u32 *emit_preempt_busywait(struct i915_request *request, u32 *cs) 4116 1.1 riastrad { 4117 1.1 riastrad *cs++ = MI_SEMAPHORE_WAIT | 4118 1.1 riastrad MI_SEMAPHORE_GLOBAL_GTT | 4119 1.1 riastrad MI_SEMAPHORE_POLL | 4120 1.1 riastrad MI_SEMAPHORE_SAD_EQ_SDD; 4121 1.1 riastrad *cs++ = 0; 4122 1.1 riastrad *cs++ = intel_hws_preempt_address(request->engine); 4123 1.1 riastrad *cs++ = 0; 4124 1.1 riastrad 4125 1.1 riastrad return cs; 4126 1.1 riastrad } 4127 1.1 riastrad 4128 1.1 riastrad static __always_inline u32* 4129 1.1 riastrad gen8_emit_fini_breadcrumb_footer(struct i915_request *request, 4130 1.1 riastrad u32 *cs) 4131 1.1 riastrad { 4132 1.1 riastrad *cs++ = MI_USER_INTERRUPT; 4133 1.1 riastrad 4134 1.1 riastrad *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; 4135 1.1 riastrad if (intel_engine_has_semaphores(request->engine)) 4136 1.1 riastrad cs = emit_preempt_busywait(request, cs); 4137 1.1 riastrad 4138 1.1 riastrad request->tail = intel_ring_offset(request, cs); 4139 1.1 riastrad assert_ring_tail_valid(request->ring, request->tail); 4140 1.1 riastrad 4141 1.1 riastrad return gen8_emit_wa_tail(request, cs); 4142 1.1 riastrad } 4143 1.1 riastrad 4144 1.1 riastrad static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs) 4145 1.1 riastrad { 4146 1.1 riastrad cs = gen8_emit_ggtt_write(cs, 4147 1.1 riastrad request->fence.seqno, 4148 1.1 riastrad i915_request_active_timeline(request)->hwsp_offset, 4149 1.1 riastrad 0); 4150 1.1 riastrad 4151 1.1 riastrad return gen8_emit_fini_breadcrumb_footer(request, cs); 4152 1.1 riastrad } 4153 1.1 riastrad 4154 1.1 riastrad static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs) 4155 1.1 riastrad { 4156 1.1 riastrad cs = gen8_emit_pipe_control(cs, 4157 1.1 riastrad PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | 4158 1.1 riastrad PIPE_CONTROL_DEPTH_CACHE_FLUSH | 4159 1.1 riastrad PIPE_CONTROL_DC_FLUSH_ENABLE, 4160 1.1 riastrad 0); 4161 1.1 riastrad 4162 1.1 riastrad /* XXX flush+write+CS_STALL all in one upsets gem_concurrent_blt:kbl */ 4163 1.1 riastrad cs = gen8_emit_ggtt_write_rcs(cs, 4164 1.1 riastrad request->fence.seqno, 4165 1.1 riastrad i915_request_active_timeline(request)->hwsp_offset, 4166 1.1 riastrad PIPE_CONTROL_FLUSH_ENABLE | 4167 1.1 riastrad PIPE_CONTROL_CS_STALL); 4168 1.1 riastrad 4169 1.1 riastrad return gen8_emit_fini_breadcrumb_footer(request, cs); 4170 1.1 riastrad } 4171 1.1 riastrad 4172 1.1 riastrad static u32 * 4173 1.1 riastrad gen11_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs) 4174 1.1 riastrad { 4175 1.1 riastrad cs = gen8_emit_ggtt_write_rcs(cs, 4176 1.1 riastrad request->fence.seqno, 4177 1.1 riastrad i915_request_active_timeline(request)->hwsp_offset, 4178 1.1 riastrad PIPE_CONTROL_CS_STALL | 4179 1.1 riastrad PIPE_CONTROL_TILE_CACHE_FLUSH | 4180 1.1 riastrad PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | 4181 1.1 riastrad PIPE_CONTROL_DEPTH_CACHE_FLUSH | 4182 1.1 riastrad PIPE_CONTROL_DC_FLUSH_ENABLE | 4183 1.1 riastrad PIPE_CONTROL_FLUSH_ENABLE); 4184 1.1 riastrad 4185 1.1 riastrad return gen8_emit_fini_breadcrumb_footer(request, cs); 4186 1.1 riastrad } 4187 1.1 riastrad 4188 1.1 riastrad /* 4189 1.1 riastrad * Note that the CS instruction pre-parser will not stall on the breadcrumb 4190 1.1 riastrad * flush and will continue pre-fetching the instructions after it before the 4191 1.1 riastrad * memory sync is completed. On pre-gen12 HW, the pre-parser will stop at 4192 1.1 riastrad * BB_START/END instructions, so, even though we might pre-fetch the pre-amble 4193 1.1 riastrad * of the next request before the memory has been flushed, we're guaranteed that 4194 1.1 riastrad * we won't access the batch itself too early. 4195 1.1 riastrad * However, on gen12+ the parser can pre-fetch across the BB_START/END commands, 4196 1.1 riastrad * so, if the current request is modifying an instruction in the next request on 4197 1.1 riastrad * the same intel_context, we might pre-fetch and then execute the pre-update 4198 1.1 riastrad * instruction. To avoid this, the users of self-modifying code should either 4199 1.1 riastrad * disable the parser around the code emitting the memory writes, via a new flag 4200 1.1 riastrad * added to MI_ARB_CHECK, or emit the writes from a different intel_context. For 4201 1.1 riastrad * the in-kernel use-cases we've opted to use a separate context, see 4202 1.1 riastrad * reloc_gpu() as an example. 4203 1.1 riastrad * All the above applies only to the instructions themselves. Non-inline data 4204 1.1 riastrad * used by the instructions is not pre-fetched. 4205 1.1 riastrad */ 4206 1.1 riastrad 4207 1.1 riastrad static u32 *gen12_emit_preempt_busywait(struct i915_request *request, u32 *cs) 4208 1.1 riastrad { 4209 1.1 riastrad *cs++ = MI_SEMAPHORE_WAIT_TOKEN | 4210 1.1 riastrad MI_SEMAPHORE_GLOBAL_GTT | 4211 1.1 riastrad MI_SEMAPHORE_POLL | 4212 1.1 riastrad MI_SEMAPHORE_SAD_EQ_SDD; 4213 1.1 riastrad *cs++ = 0; 4214 1.1 riastrad *cs++ = intel_hws_preempt_address(request->engine); 4215 1.1 riastrad *cs++ = 0; 4216 1.1 riastrad *cs++ = 0; 4217 1.1 riastrad *cs++ = MI_NOOP; 4218 1.1 riastrad 4219 1.1 riastrad return cs; 4220 1.1 riastrad } 4221 1.1 riastrad 4222 1.1 riastrad static __always_inline u32* 4223 1.1 riastrad gen12_emit_fini_breadcrumb_footer(struct i915_request *request, u32 *cs) 4224 1.1 riastrad { 4225 1.1 riastrad *cs++ = MI_USER_INTERRUPT; 4226 1.1 riastrad 4227 1.1 riastrad *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; 4228 1.1 riastrad if (intel_engine_has_semaphores(request->engine)) 4229 1.1 riastrad cs = gen12_emit_preempt_busywait(request, cs); 4230 1.1 riastrad 4231 1.1 riastrad request->tail = intel_ring_offset(request, cs); 4232 1.1 riastrad assert_ring_tail_valid(request->ring, request->tail); 4233 1.1 riastrad 4234 1.1 riastrad return gen8_emit_wa_tail(request, cs); 4235 1.1 riastrad } 4236 1.1 riastrad 4237 1.1 riastrad static u32 *gen12_emit_fini_breadcrumb(struct i915_request *request, u32 *cs) 4238 1.1 riastrad { 4239 1.1 riastrad cs = gen8_emit_ggtt_write(cs, 4240 1.1 riastrad request->fence.seqno, 4241 1.1 riastrad i915_request_active_timeline(request)->hwsp_offset, 4242 1.1 riastrad 0); 4243 1.1 riastrad 4244 1.1 riastrad return gen12_emit_fini_breadcrumb_footer(request, cs); 4245 1.1 riastrad } 4246 1.1 riastrad 4247 1.1 riastrad static u32 * 4248 1.1 riastrad gen12_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs) 4249 1.1 riastrad { 4250 1.1 riastrad cs = gen8_emit_ggtt_write_rcs(cs, 4251 1.1 riastrad request->fence.seqno, 4252 1.1 riastrad i915_request_active_timeline(request)->hwsp_offset, 4253 1.1 riastrad PIPE_CONTROL_CS_STALL | 4254 1.1 riastrad PIPE_CONTROL_TILE_CACHE_FLUSH | 4255 1.1 riastrad PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | 4256 1.1 riastrad PIPE_CONTROL_DEPTH_CACHE_FLUSH | 4257 1.1 riastrad /* Wa_1409600907:tgl */ 4258 1.1 riastrad PIPE_CONTROL_DEPTH_STALL | 4259 1.1 riastrad PIPE_CONTROL_DC_FLUSH_ENABLE | 4260 1.1 riastrad PIPE_CONTROL_FLUSH_ENABLE | 4261 1.1 riastrad PIPE_CONTROL_HDC_PIPELINE_FLUSH); 4262 1.1 riastrad 4263 1.1 riastrad return gen12_emit_fini_breadcrumb_footer(request, cs); 4264 1.1 riastrad } 4265 1.1 riastrad 4266 1.1 riastrad static void execlists_park(struct intel_engine_cs *engine) 4267 1.1 riastrad { 4268 1.1 riastrad cancel_timer(&engine->execlists.timer); 4269 1.1 riastrad cancel_timer(&engine->execlists.preempt); 4270 1.1 riastrad } 4271 1.1 riastrad 4272 1.1 riastrad void intel_execlists_set_default_submission(struct intel_engine_cs *engine) 4273 1.1 riastrad { 4274 1.1 riastrad engine->submit_request = execlists_submit_request; 4275 1.1 riastrad engine->schedule = i915_schedule; 4276 1.1 riastrad engine->execlists.tasklet.func = execlists_submission_tasklet; 4277 1.1 riastrad 4278 1.1 riastrad engine->reset.prepare = execlists_reset_prepare; 4279 1.1 riastrad engine->reset.rewind = execlists_reset_rewind; 4280 1.1 riastrad engine->reset.cancel = execlists_reset_cancel; 4281 1.1 riastrad engine->reset.finish = execlists_reset_finish; 4282 1.1 riastrad 4283 1.1 riastrad engine->park = execlists_park; 4284 1.1 riastrad engine->unpark = NULL; 4285 1.1 riastrad 4286 1.1 riastrad engine->flags |= I915_ENGINE_SUPPORTS_STATS; 4287 1.1 riastrad if (!intel_vgpu_active(engine->i915)) { 4288 1.1 riastrad engine->flags |= I915_ENGINE_HAS_SEMAPHORES; 4289 1.1 riastrad if (HAS_LOGICAL_RING_PREEMPTION(engine->i915)) 4290 1.1 riastrad engine->flags |= I915_ENGINE_HAS_PREEMPTION; 4291 1.1 riastrad } 4292 1.1 riastrad 4293 1.1 riastrad if (INTEL_GEN(engine->i915) >= 12) 4294 1.1 riastrad engine->flags |= I915_ENGINE_HAS_RELATIVE_MMIO; 4295 1.1 riastrad 4296 1.1 riastrad if (intel_engine_has_preemption(engine)) 4297 1.1 riastrad engine->emit_bb_start = gen8_emit_bb_start; 4298 1.1 riastrad else 4299 1.1 riastrad engine->emit_bb_start = gen8_emit_bb_start_noarb; 4300 1.1 riastrad } 4301 1.1 riastrad 4302 1.1 riastrad static void execlists_shutdown(struct intel_engine_cs *engine) 4303 1.1 riastrad { 4304 1.1 riastrad /* Synchronise with residual timers and any softirq they raise */ 4305 1.1 riastrad del_timer_sync(&engine->execlists.timer); 4306 1.1 riastrad del_timer_sync(&engine->execlists.preempt); 4307 1.1 riastrad tasklet_kill(&engine->execlists.tasklet); 4308 1.1 riastrad } 4309 1.1 riastrad 4310 1.1 riastrad static void execlists_release(struct intel_engine_cs *engine) 4311 1.1 riastrad { 4312 1.1 riastrad execlists_shutdown(engine); 4313 1.1 riastrad 4314 1.1 riastrad intel_engine_cleanup_common(engine); 4315 1.1 riastrad lrc_destroy_wa_ctx(engine); 4316 1.1 riastrad } 4317 1.1 riastrad 4318 1.1 riastrad static void 4319 1.1 riastrad logical_ring_default_vfuncs(struct intel_engine_cs *engine) 4320 1.1 riastrad { 4321 1.1 riastrad /* Default vfuncs which can be overriden by each engine. */ 4322 1.1 riastrad 4323 1.1 riastrad engine->resume = execlists_resume; 4324 1.1 riastrad 4325 1.1 riastrad engine->cops = &execlists_context_ops; 4326 1.1 riastrad engine->request_alloc = execlists_request_alloc; 4327 1.1 riastrad 4328 1.1 riastrad engine->emit_flush = gen8_emit_flush; 4329 1.1 riastrad engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb; 4330 1.1 riastrad engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb; 4331 1.1 riastrad if (INTEL_GEN(engine->i915) >= 12) 4332 1.1 riastrad engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb; 4333 1.1 riastrad 4334 1.1 riastrad engine->set_default_submission = intel_execlists_set_default_submission; 4335 1.1 riastrad 4336 1.1 riastrad if (INTEL_GEN(engine->i915) < 11) { 4337 1.1 riastrad engine->irq_enable = gen8_logical_ring_enable_irq; 4338 1.1 riastrad engine->irq_disable = gen8_logical_ring_disable_irq; 4339 1.1 riastrad } else { 4340 1.1 riastrad /* 4341 1.1 riastrad * TODO: On Gen11 interrupt masks need to be clear 4342 1.1 riastrad * to allow C6 entry. Keep interrupts enabled at 4343 1.1 riastrad * and take the hit of generating extra interrupts 4344 1.1 riastrad * until a more refined solution exists. 4345 1.1 riastrad */ 4346 1.1 riastrad } 4347 1.1 riastrad } 4348 1.1 riastrad 4349 1.1 riastrad static inline void 4350 1.1 riastrad logical_ring_default_irqs(struct intel_engine_cs *engine) 4351 1.1 riastrad { 4352 1.1 riastrad unsigned int shift = 0; 4353 1.1 riastrad 4354 1.1 riastrad if (INTEL_GEN(engine->i915) < 11) { 4355 1.1 riastrad const u8 irq_shifts[] = { 4356 1.1 riastrad [RCS0] = GEN8_RCS_IRQ_SHIFT, 4357 1.1 riastrad [BCS0] = GEN8_BCS_IRQ_SHIFT, 4358 1.1 riastrad [VCS0] = GEN8_VCS0_IRQ_SHIFT, 4359 1.1 riastrad [VCS1] = GEN8_VCS1_IRQ_SHIFT, 4360 1.1 riastrad [VECS0] = GEN8_VECS_IRQ_SHIFT, 4361 1.1 riastrad }; 4362 1.1 riastrad 4363 1.1 riastrad shift = irq_shifts[engine->id]; 4364 1.1 riastrad } 4365 1.1 riastrad 4366 1.1 riastrad engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift; 4367 1.1 riastrad engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift; 4368 1.1 riastrad } 4369 1.1 riastrad 4370 1.1 riastrad static void rcs_submission_override(struct intel_engine_cs *engine) 4371 1.1 riastrad { 4372 1.1 riastrad switch (INTEL_GEN(engine->i915)) { 4373 1.1 riastrad case 12: 4374 1.1 riastrad engine->emit_flush = gen12_emit_flush_render; 4375 1.1 riastrad engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_rcs; 4376 1.1 riastrad break; 4377 1.1 riastrad case 11: 4378 1.1 riastrad engine->emit_flush = gen11_emit_flush_render; 4379 1.1 riastrad engine->emit_fini_breadcrumb = gen11_emit_fini_breadcrumb_rcs; 4380 1.1 riastrad break; 4381 1.1 riastrad default: 4382 1.1 riastrad engine->emit_flush = gen8_emit_flush_render; 4383 1.1 riastrad engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs; 4384 1.1 riastrad break; 4385 1.1 riastrad } 4386 1.1 riastrad } 4387 1.1 riastrad 4388 1.1 riastrad int intel_execlists_submission_setup(struct intel_engine_cs *engine) 4389 1.1 riastrad { 4390 1.1 riastrad struct intel_engine_execlists * const execlists = &engine->execlists; 4391 1.1 riastrad struct drm_i915_private *i915 = engine->i915; 4392 1.1 riastrad struct intel_uncore *uncore = engine->uncore; 4393 1.1 riastrad u32 base = engine->mmio_base; 4394 1.1 riastrad 4395 1.3 riastrad i915_sched_init(&engine->execlists); 4396 1.3 riastrad 4397 1.1 riastrad tasklet_init(&engine->execlists.tasklet, 4398 1.1 riastrad execlists_submission_tasklet, (unsigned long)engine); 4399 1.1 riastrad timer_setup(&engine->execlists.timer, execlists_timeslice, 0); 4400 1.1 riastrad timer_setup(&engine->execlists.preempt, execlists_preempt, 0); 4401 1.1 riastrad 4402 1.1 riastrad logical_ring_default_vfuncs(engine); 4403 1.1 riastrad logical_ring_default_irqs(engine); 4404 1.1 riastrad 4405 1.1 riastrad if (engine->class == RENDER_CLASS) 4406 1.1 riastrad rcs_submission_override(engine); 4407 1.1 riastrad 4408 1.1 riastrad if (intel_init_workaround_bb(engine)) 4409 1.1 riastrad /* 4410 1.1 riastrad * We continue even if we fail to initialize WA batch 4411 1.1 riastrad * because we only expect rare glitches but nothing 4412 1.1 riastrad * critical to prevent us from using GPU 4413 1.1 riastrad */ 4414 1.1 riastrad DRM_ERROR("WA batch buffer initialization failed\n"); 4415 1.1 riastrad 4416 1.1 riastrad if (HAS_LOGICAL_RING_ELSQ(i915)) { 4417 1.4 riastrad #ifdef __NetBSD__ 4418 1.4 riastrad execlists->submit_reg = i915_mmio_reg_offset(RING_EXECLIST_SQ_CONTENTS(base)); 4419 1.4 riastrad execlists->ctrl_reg = i915_mmio_reg_offset(RING_EXECLIST_CONTROL(base)); 4420 1.4 riastrad execlists->bsh = uncore->regs_bsh; 4421 1.4 riastrad execlists->bst = uncore->regs_bst; 4422 1.4 riastrad #else 4423 1.1 riastrad execlists->submit_reg = uncore->regs + 4424 1.1 riastrad i915_mmio_reg_offset(RING_EXECLIST_SQ_CONTENTS(base)); 4425 1.1 riastrad execlists->ctrl_reg = uncore->regs + 4426 1.1 riastrad i915_mmio_reg_offset(RING_EXECLIST_CONTROL(base)); 4427 1.4 riastrad #endif 4428 1.1 riastrad } else { 4429 1.4 riastrad #ifdef __NetBSD__ 4430 1.4 riastrad execlists->submit_reg = i915_mmio_reg_offset(RING_ELSP(base)); 4431 1.4 riastrad execlists->bsh = uncore->regs_bsh; 4432 1.4 riastrad execlists->bst = uncore->regs_bst; 4433 1.4 riastrad #else 4434 1.1 riastrad execlists->submit_reg = uncore->regs + 4435 1.1 riastrad i915_mmio_reg_offset(RING_ELSP(base)); 4436 1.4 riastrad #endif 4437 1.1 riastrad } 4438 1.1 riastrad 4439 1.1 riastrad execlists->csb_status = 4440 1.1 riastrad &engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX]; 4441 1.1 riastrad 4442 1.1 riastrad execlists->csb_write = 4443 1.1 riastrad &engine->status_page.addr[intel_hws_csb_write_index(i915)]; 4444 1.1 riastrad 4445 1.1 riastrad if (INTEL_GEN(i915) < 11) 4446 1.1 riastrad execlists->csb_size = GEN8_CSB_ENTRIES; 4447 1.1 riastrad else 4448 1.1 riastrad execlists->csb_size = GEN11_CSB_ENTRIES; 4449 1.1 riastrad 4450 1.1 riastrad reset_csb_pointers(engine); 4451 1.1 riastrad 4452 1.1 riastrad /* Finally, take ownership and responsibility for cleanup! */ 4453 1.1 riastrad engine->release = execlists_release; 4454 1.1 riastrad 4455 1.1 riastrad return 0; 4456 1.1 riastrad } 4457 1.1 riastrad 4458 1.1 riastrad static u32 intel_lr_indirect_ctx_offset(const struct intel_engine_cs *engine) 4459 1.1 riastrad { 4460 1.1 riastrad u32 indirect_ctx_offset; 4461 1.1 riastrad 4462 1.1 riastrad switch (INTEL_GEN(engine->i915)) { 4463 1.1 riastrad default: 4464 1.1 riastrad MISSING_CASE(INTEL_GEN(engine->i915)); 4465 1.1 riastrad /* fall through */ 4466 1.1 riastrad case 12: 4467 1.1 riastrad indirect_ctx_offset = 4468 1.1 riastrad GEN12_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT; 4469 1.1 riastrad break; 4470 1.1 riastrad case 11: 4471 1.1 riastrad indirect_ctx_offset = 4472 1.1 riastrad GEN11_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT; 4473 1.1 riastrad break; 4474 1.1 riastrad case 10: 4475 1.1 riastrad indirect_ctx_offset = 4476 1.1 riastrad GEN10_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT; 4477 1.1 riastrad break; 4478 1.1 riastrad case 9: 4479 1.1 riastrad indirect_ctx_offset = 4480 1.1 riastrad GEN9_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT; 4481 1.1 riastrad break; 4482 1.1 riastrad case 8: 4483 1.1 riastrad indirect_ctx_offset = 4484 1.1 riastrad GEN8_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT; 4485 1.1 riastrad break; 4486 1.1 riastrad } 4487 1.1 riastrad 4488 1.1 riastrad return indirect_ctx_offset; 4489 1.1 riastrad } 4490 1.1 riastrad 4491 1.1 riastrad 4492 1.1 riastrad static void init_common_reg_state(u32 * const regs, 4493 1.1 riastrad const struct intel_engine_cs *engine, 4494 1.1 riastrad const struct intel_ring *ring, 4495 1.1 riastrad bool inhibit) 4496 1.1 riastrad { 4497 1.1 riastrad u32 ctl; 4498 1.1 riastrad 4499 1.1 riastrad ctl = _MASKED_BIT_ENABLE(CTX_CTRL_INHIBIT_SYN_CTX_SWITCH); 4500 1.1 riastrad ctl |= _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT); 4501 1.1 riastrad if (inhibit) 4502 1.1 riastrad ctl |= CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT; 4503 1.1 riastrad if (INTEL_GEN(engine->i915) < 11) 4504 1.1 riastrad ctl |= _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT | 4505 1.1 riastrad CTX_CTRL_RS_CTX_ENABLE); 4506 1.1 riastrad regs[CTX_CONTEXT_CONTROL] = ctl; 4507 1.1 riastrad 4508 1.1 riastrad regs[CTX_RING_CTL] = RING_CTL_SIZE(ring->size) | RING_VALID; 4509 1.1 riastrad } 4510 1.1 riastrad 4511 1.1 riastrad static void init_wa_bb_reg_state(u32 * const regs, 4512 1.1 riastrad const struct intel_engine_cs *engine, 4513 1.1 riastrad u32 pos_bb_per_ctx) 4514 1.1 riastrad { 4515 1.1 riastrad const struct i915_ctx_workarounds * const wa_ctx = &engine->wa_ctx; 4516 1.1 riastrad 4517 1.1 riastrad if (wa_ctx->per_ctx.size) { 4518 1.1 riastrad const u32 ggtt_offset = i915_ggtt_offset(wa_ctx->vma); 4519 1.1 riastrad 4520 1.1 riastrad regs[pos_bb_per_ctx] = 4521 1.1 riastrad (ggtt_offset + wa_ctx->per_ctx.offset) | 0x01; 4522 1.1 riastrad } 4523 1.1 riastrad 4524 1.1 riastrad if (wa_ctx->indirect_ctx.size) { 4525 1.1 riastrad const u32 ggtt_offset = i915_ggtt_offset(wa_ctx->vma); 4526 1.1 riastrad 4527 1.1 riastrad regs[pos_bb_per_ctx + 2] = 4528 1.1 riastrad (ggtt_offset + wa_ctx->indirect_ctx.offset) | 4529 1.1 riastrad (wa_ctx->indirect_ctx.size / CACHELINE_BYTES); 4530 1.1 riastrad 4531 1.1 riastrad regs[pos_bb_per_ctx + 4] = 4532 1.1 riastrad intel_lr_indirect_ctx_offset(engine) << 6; 4533 1.1 riastrad } 4534 1.1 riastrad } 4535 1.1 riastrad 4536 1.1 riastrad static void init_ppgtt_reg_state(u32 *regs, const struct i915_ppgtt *ppgtt) 4537 1.1 riastrad { 4538 1.1 riastrad if (i915_vm_is_4lvl(&ppgtt->vm)) { 4539 1.1 riastrad /* 64b PPGTT (48bit canonical) 4540 1.1 riastrad * PDP0_DESCRIPTOR contains the base address to PML4 and 4541 1.1 riastrad * other PDP Descriptors are ignored. 4542 1.1 riastrad */ 4543 1.1 riastrad ASSIGN_CTX_PML4(ppgtt, regs); 4544 1.1 riastrad } else { 4545 1.1 riastrad ASSIGN_CTX_PDP(ppgtt, regs, 3); 4546 1.1 riastrad ASSIGN_CTX_PDP(ppgtt, regs, 2); 4547 1.1 riastrad ASSIGN_CTX_PDP(ppgtt, regs, 1); 4548 1.1 riastrad ASSIGN_CTX_PDP(ppgtt, regs, 0); 4549 1.1 riastrad } 4550 1.1 riastrad } 4551 1.1 riastrad 4552 1.1 riastrad static struct i915_ppgtt *vm_alias(struct i915_address_space *vm) 4553 1.1 riastrad { 4554 1.1 riastrad if (i915_is_ggtt(vm)) 4555 1.1 riastrad return i915_vm_to_ggtt(vm)->alias; 4556 1.1 riastrad else 4557 1.1 riastrad return i915_vm_to_ppgtt(vm); 4558 1.1 riastrad } 4559 1.1 riastrad 4560 1.1 riastrad static void execlists_init_reg_state(u32 *regs, 4561 1.1 riastrad const struct intel_context *ce, 4562 1.1 riastrad const struct intel_engine_cs *engine, 4563 1.1 riastrad const struct intel_ring *ring, 4564 1.1 riastrad bool inhibit) 4565 1.1 riastrad { 4566 1.1 riastrad /* 4567 1.1 riastrad * A context is actually a big batch buffer with several 4568 1.1 riastrad * MI_LOAD_REGISTER_IMM commands followed by (reg, value) pairs. The 4569 1.1 riastrad * values we are setting here are only for the first context restore: 4570 1.1 riastrad * on a subsequent save, the GPU will recreate this batchbuffer with new 4571 1.1 riastrad * values (including all the missing MI_LOAD_REGISTER_IMM commands that 4572 1.1 riastrad * we are not initializing here). 4573 1.1 riastrad * 4574 1.1 riastrad * Must keep consistent with virtual_update_register_offsets(). 4575 1.1 riastrad */ 4576 1.1 riastrad set_offsets(regs, reg_offsets(engine), engine, inhibit); 4577 1.1 riastrad 4578 1.1 riastrad init_common_reg_state(regs, engine, ring, inhibit); 4579 1.1 riastrad init_ppgtt_reg_state(regs, vm_alias(ce->vm)); 4580 1.1 riastrad 4581 1.1 riastrad init_wa_bb_reg_state(regs, engine, 4582 1.1 riastrad INTEL_GEN(engine->i915) >= 12 ? 4583 1.1 riastrad GEN12_CTX_BB_PER_CTX_PTR : 4584 1.1 riastrad CTX_BB_PER_CTX_PTR); 4585 1.1 riastrad 4586 1.1 riastrad __reset_stop_ring(regs, engine); 4587 1.1 riastrad } 4588 1.1 riastrad 4589 1.1 riastrad static int 4590 1.1 riastrad populate_lr_context(struct intel_context *ce, 4591 1.1 riastrad struct drm_i915_gem_object *ctx_obj, 4592 1.1 riastrad struct intel_engine_cs *engine, 4593 1.1 riastrad struct intel_ring *ring) 4594 1.1 riastrad { 4595 1.1 riastrad bool inhibit = true; 4596 1.1 riastrad void *vaddr; 4597 1.1 riastrad int ret; 4598 1.1 riastrad 4599 1.1 riastrad vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB); 4600 1.1 riastrad if (IS_ERR(vaddr)) { 4601 1.1 riastrad ret = PTR_ERR(vaddr); 4602 1.1 riastrad DRM_DEBUG_DRIVER("Could not map object pages! (%d)\n", ret); 4603 1.1 riastrad return ret; 4604 1.1 riastrad } 4605 1.1 riastrad 4606 1.1 riastrad set_redzone(vaddr, engine); 4607 1.1 riastrad 4608 1.1 riastrad if (engine->default_state) { 4609 1.1 riastrad void *defaults; 4610 1.1 riastrad 4611 1.1 riastrad defaults = i915_gem_object_pin_map(engine->default_state, 4612 1.1 riastrad I915_MAP_WB); 4613 1.1 riastrad if (IS_ERR(defaults)) { 4614 1.1 riastrad ret = PTR_ERR(defaults); 4615 1.1 riastrad goto err_unpin_ctx; 4616 1.1 riastrad } 4617 1.1 riastrad 4618 1.1 riastrad memcpy(vaddr, defaults, engine->context_size); 4619 1.1 riastrad i915_gem_object_unpin_map(engine->default_state); 4620 1.1 riastrad __set_bit(CONTEXT_VALID_BIT, &ce->flags); 4621 1.1 riastrad inhibit = false; 4622 1.1 riastrad } 4623 1.1 riastrad 4624 1.1 riastrad /* The second page of the context object contains some fields which must 4625 1.1 riastrad * be set up prior to the first execution. */ 4626 1.1 riastrad execlists_init_reg_state(vaddr + LRC_STATE_PN * PAGE_SIZE, 4627 1.1 riastrad ce, engine, ring, inhibit); 4628 1.1 riastrad 4629 1.1 riastrad ret = 0; 4630 1.1 riastrad err_unpin_ctx: 4631 1.1 riastrad __i915_gem_object_flush_map(ctx_obj, 0, engine->context_size); 4632 1.1 riastrad i915_gem_object_unpin_map(ctx_obj); 4633 1.1 riastrad return ret; 4634 1.1 riastrad } 4635 1.1 riastrad 4636 1.1 riastrad static int __execlists_context_alloc(struct intel_context *ce, 4637 1.1 riastrad struct intel_engine_cs *engine) 4638 1.1 riastrad { 4639 1.1 riastrad struct drm_i915_gem_object *ctx_obj; 4640 1.1 riastrad struct intel_ring *ring; 4641 1.1 riastrad struct i915_vma *vma; 4642 1.1 riastrad u32 context_size; 4643 1.1 riastrad int ret; 4644 1.1 riastrad 4645 1.1 riastrad GEM_BUG_ON(ce->state); 4646 1.1 riastrad context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE); 4647 1.1 riastrad 4648 1.1 riastrad if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 4649 1.1 riastrad context_size += I915_GTT_PAGE_SIZE; /* for redzone */ 4650 1.1 riastrad 4651 1.1 riastrad ctx_obj = i915_gem_object_create_shmem(engine->i915, context_size); 4652 1.1 riastrad if (IS_ERR(ctx_obj)) 4653 1.1 riastrad return PTR_ERR(ctx_obj); 4654 1.1 riastrad 4655 1.1 riastrad vma = i915_vma_instance(ctx_obj, &engine->gt->ggtt->vm, NULL); 4656 1.1 riastrad if (IS_ERR(vma)) { 4657 1.1 riastrad ret = PTR_ERR(vma); 4658 1.1 riastrad goto error_deref_obj; 4659 1.1 riastrad } 4660 1.1 riastrad 4661 1.1 riastrad if (!ce->timeline) { 4662 1.1 riastrad struct intel_timeline *tl; 4663 1.1 riastrad 4664 1.1 riastrad tl = intel_timeline_create(engine->gt, NULL); 4665 1.1 riastrad if (IS_ERR(tl)) { 4666 1.1 riastrad ret = PTR_ERR(tl); 4667 1.1 riastrad goto error_deref_obj; 4668 1.1 riastrad } 4669 1.1 riastrad 4670 1.1 riastrad ce->timeline = tl; 4671 1.1 riastrad } 4672 1.1 riastrad 4673 1.1 riastrad ring = intel_engine_create_ring(engine, (unsigned long)ce->ring); 4674 1.1 riastrad if (IS_ERR(ring)) { 4675 1.1 riastrad ret = PTR_ERR(ring); 4676 1.1 riastrad goto error_deref_obj; 4677 1.1 riastrad } 4678 1.1 riastrad 4679 1.1 riastrad ret = populate_lr_context(ce, ctx_obj, engine, ring); 4680 1.1 riastrad if (ret) { 4681 1.1 riastrad DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret); 4682 1.1 riastrad goto error_ring_free; 4683 1.1 riastrad } 4684 1.1 riastrad 4685 1.1 riastrad ce->ring = ring; 4686 1.1 riastrad ce->state = vma; 4687 1.1 riastrad 4688 1.1 riastrad return 0; 4689 1.1 riastrad 4690 1.1 riastrad error_ring_free: 4691 1.1 riastrad intel_ring_put(ring); 4692 1.1 riastrad error_deref_obj: 4693 1.1 riastrad i915_gem_object_put(ctx_obj); 4694 1.1 riastrad return ret; 4695 1.1 riastrad } 4696 1.1 riastrad 4697 1.1 riastrad static struct list_head *virtual_queue(struct virtual_engine *ve) 4698 1.1 riastrad { 4699 1.1 riastrad return &ve->base.execlists.default_priolist.requests[0]; 4700 1.1 riastrad } 4701 1.1 riastrad 4702 1.1 riastrad static void virtual_context_destroy(struct kref *kref) 4703 1.1 riastrad { 4704 1.1 riastrad struct virtual_engine *ve = 4705 1.1 riastrad container_of(kref, typeof(*ve), context.ref); 4706 1.1 riastrad unsigned int n; 4707 1.1 riastrad 4708 1.1 riastrad GEM_BUG_ON(!list_empty(virtual_queue(ve))); 4709 1.1 riastrad GEM_BUG_ON(ve->request); 4710 1.1 riastrad GEM_BUG_ON(ve->context.inflight); 4711 1.1 riastrad 4712 1.1 riastrad for (n = 0; n < ve->num_siblings; n++) { 4713 1.1 riastrad struct intel_engine_cs *sibling = ve->siblings[n]; 4714 1.1 riastrad struct rb_node *node = &ve->nodes[sibling->id].rb; 4715 1.1 riastrad unsigned long flags; 4716 1.1 riastrad 4717 1.7 riastrad if (!ve->nodes[sibling->id].inserted) 4718 1.1 riastrad continue; 4719 1.1 riastrad 4720 1.1 riastrad spin_lock_irqsave(&sibling->active.lock, flags); 4721 1.1 riastrad 4722 1.1 riastrad /* Detachment is lazily performed in the execlists tasklet */ 4723 1.7 riastrad if (ve->nodes[sibling->id].inserted) { 4724 1.1 riastrad rb_erase_cached(node, &sibling->execlists.virtual); 4725 1.7 riastrad ve->nodes[sibling->id].inserted = false; 4726 1.7 riastrad } 4727 1.1 riastrad 4728 1.1 riastrad spin_unlock_irqrestore(&sibling->active.lock, flags); 4729 1.1 riastrad } 4730 1.1 riastrad GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet)); 4731 1.1 riastrad 4732 1.1 riastrad if (ve->context.state) 4733 1.1 riastrad __execlists_context_fini(&ve->context); 4734 1.1 riastrad intel_context_fini(&ve->context); 4735 1.1 riastrad 4736 1.8 riastrad intel_engine_fini_breadcrumbs(&ve->base); 4737 1.8 riastrad spin_lock_destroy(&ve->base.active.lock); 4738 1.8 riastrad 4739 1.1 riastrad kfree(ve->bonds); 4740 1.1 riastrad kfree(ve); 4741 1.1 riastrad } 4742 1.1 riastrad 4743 1.1 riastrad static void virtual_engine_initial_hint(struct virtual_engine *ve) 4744 1.1 riastrad { 4745 1.1 riastrad int swp; 4746 1.1 riastrad 4747 1.1 riastrad /* 4748 1.1 riastrad * Pick a random sibling on starting to help spread the load around. 4749 1.1 riastrad * 4750 1.1 riastrad * New contexts are typically created with exactly the same order 4751 1.1 riastrad * of siblings, and often started in batches. Due to the way we iterate 4752 1.1 riastrad * the array of sibling when submitting requests, sibling[0] is 4753 1.1 riastrad * prioritised for dequeuing. If we make sure that sibling[0] is fairly 4754 1.1 riastrad * randomised across the system, we also help spread the load by the 4755 1.1 riastrad * first engine we inspect being different each time. 4756 1.1 riastrad * 4757 1.1 riastrad * NB This does not force us to execute on this engine, it will just 4758 1.1 riastrad * typically be the first we inspect for submission. 4759 1.1 riastrad */ 4760 1.1 riastrad swp = prandom_u32_max(ve->num_siblings); 4761 1.1 riastrad if (!swp) 4762 1.1 riastrad return; 4763 1.1 riastrad 4764 1.1 riastrad swap(ve->siblings[swp], ve->siblings[0]); 4765 1.1 riastrad if (!intel_engine_has_relative_mmio(ve->siblings[0])) 4766 1.1 riastrad virtual_update_register_offsets(ve->context.lrc_reg_state, 4767 1.1 riastrad ve->siblings[0]); 4768 1.1 riastrad } 4769 1.1 riastrad 4770 1.1 riastrad static int virtual_context_alloc(struct intel_context *ce) 4771 1.1 riastrad { 4772 1.1 riastrad struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 4773 1.1 riastrad 4774 1.1 riastrad return __execlists_context_alloc(ce, ve->siblings[0]); 4775 1.1 riastrad } 4776 1.1 riastrad 4777 1.1 riastrad static int virtual_context_pin(struct intel_context *ce) 4778 1.1 riastrad { 4779 1.1 riastrad struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 4780 1.1 riastrad int err; 4781 1.1 riastrad 4782 1.1 riastrad /* Note: we must use a real engine class for setting up reg state */ 4783 1.1 riastrad err = __execlists_context_pin(ce, ve->siblings[0]); 4784 1.1 riastrad if (err) 4785 1.1 riastrad return err; 4786 1.1 riastrad 4787 1.1 riastrad virtual_engine_initial_hint(ve); 4788 1.1 riastrad return 0; 4789 1.1 riastrad } 4790 1.1 riastrad 4791 1.1 riastrad static void virtual_context_enter(struct intel_context *ce) 4792 1.1 riastrad { 4793 1.1 riastrad struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 4794 1.1 riastrad unsigned int n; 4795 1.1 riastrad 4796 1.1 riastrad for (n = 0; n < ve->num_siblings; n++) 4797 1.1 riastrad intel_engine_pm_get(ve->siblings[n]); 4798 1.1 riastrad 4799 1.1 riastrad intel_timeline_enter(ce->timeline); 4800 1.1 riastrad } 4801 1.1 riastrad 4802 1.1 riastrad static void virtual_context_exit(struct intel_context *ce) 4803 1.1 riastrad { 4804 1.1 riastrad struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 4805 1.1 riastrad unsigned int n; 4806 1.1 riastrad 4807 1.1 riastrad intel_timeline_exit(ce->timeline); 4808 1.1 riastrad 4809 1.1 riastrad for (n = 0; n < ve->num_siblings; n++) 4810 1.1 riastrad intel_engine_pm_put(ve->siblings[n]); 4811 1.1 riastrad } 4812 1.1 riastrad 4813 1.1 riastrad static const struct intel_context_ops virtual_context_ops = { 4814 1.1 riastrad .alloc = virtual_context_alloc, 4815 1.1 riastrad 4816 1.1 riastrad .pin = virtual_context_pin, 4817 1.1 riastrad .unpin = execlists_context_unpin, 4818 1.1 riastrad 4819 1.1 riastrad .enter = virtual_context_enter, 4820 1.1 riastrad .exit = virtual_context_exit, 4821 1.1 riastrad 4822 1.1 riastrad .destroy = virtual_context_destroy, 4823 1.1 riastrad }; 4824 1.1 riastrad 4825 1.1 riastrad static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) 4826 1.1 riastrad { 4827 1.1 riastrad struct i915_request *rq; 4828 1.1 riastrad intel_engine_mask_t mask; 4829 1.1 riastrad 4830 1.1 riastrad rq = READ_ONCE(ve->request); 4831 1.1 riastrad if (!rq) 4832 1.1 riastrad return 0; 4833 1.1 riastrad 4834 1.1 riastrad /* The rq is ready for submission; rq->execution_mask is now stable. */ 4835 1.1 riastrad mask = rq->execution_mask; 4836 1.1 riastrad if (unlikely(!mask)) { 4837 1.1 riastrad /* Invalid selection, submit to a random engine in error */ 4838 1.1 riastrad i915_request_skip(rq, -ENODEV); 4839 1.1 riastrad mask = ve->siblings[0]->mask; 4840 1.1 riastrad } 4841 1.1 riastrad 4842 1.1 riastrad ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n", 4843 1.1 riastrad rq->fence.context, rq->fence.seqno, 4844 1.1 riastrad mask, ve->base.execlists.queue_priority_hint); 4845 1.1 riastrad 4846 1.1 riastrad return mask; 4847 1.1 riastrad } 4848 1.1 riastrad 4849 1.1 riastrad static void virtual_submission_tasklet(unsigned long data) 4850 1.1 riastrad { 4851 1.1 riastrad struct virtual_engine * const ve = (struct virtual_engine *)data; 4852 1.1 riastrad const int prio = ve->base.execlists.queue_priority_hint; 4853 1.1 riastrad intel_engine_mask_t mask; 4854 1.1 riastrad unsigned int n; 4855 1.1 riastrad 4856 1.1 riastrad rcu_read_lock(); 4857 1.1 riastrad mask = virtual_submission_mask(ve); 4858 1.1 riastrad rcu_read_unlock(); 4859 1.1 riastrad if (unlikely(!mask)) 4860 1.1 riastrad return; 4861 1.1 riastrad 4862 1.7 riastrad #ifdef __NetBSD__ 4863 1.7 riastrad int s = splsoftserial(); /* block tasklets=softints */ 4864 1.7 riastrad #else 4865 1.1 riastrad local_irq_disable(); 4866 1.7 riastrad #endif 4867 1.1 riastrad for (n = 0; READ_ONCE(ve->request) && n < ve->num_siblings; n++) { 4868 1.1 riastrad struct intel_engine_cs *sibling = ve->siblings[n]; 4869 1.1 riastrad struct ve_node * const node = &ve->nodes[sibling->id]; 4870 1.1 riastrad struct rb_node **parent, *rb; 4871 1.1 riastrad bool first; 4872 1.1 riastrad 4873 1.1 riastrad if (unlikely(!(mask & sibling->mask))) { 4874 1.7 riastrad if (node->inserted) { 4875 1.1 riastrad spin_lock(&sibling->active.lock); 4876 1.1 riastrad rb_erase_cached(&node->rb, 4877 1.1 riastrad &sibling->execlists.virtual); 4878 1.7 riastrad node->inserted = false; 4879 1.1 riastrad spin_unlock(&sibling->active.lock); 4880 1.1 riastrad } 4881 1.1 riastrad continue; 4882 1.1 riastrad } 4883 1.1 riastrad 4884 1.1 riastrad spin_lock(&sibling->active.lock); 4885 1.1 riastrad 4886 1.7 riastrad if (node->inserted) { 4887 1.1 riastrad /* 4888 1.1 riastrad * Cheat and avoid rebalancing the tree if we can 4889 1.1 riastrad * reuse this node in situ. 4890 1.1 riastrad */ 4891 1.1 riastrad first = rb_first_cached(&sibling->execlists.virtual) == 4892 1.1 riastrad &node->rb; 4893 1.1 riastrad if (prio == node->prio || (prio > node->prio && first)) 4894 1.1 riastrad goto submit_engine; 4895 1.1 riastrad 4896 1.1 riastrad rb_erase_cached(&node->rb, &sibling->execlists.virtual); 4897 1.7 riastrad node->inserted = false; 4898 1.1 riastrad } 4899 1.1 riastrad 4900 1.7 riastrad #ifdef __NetBSD__ 4901 1.7 riastrad __USE(parent); 4902 1.7 riastrad __USE(rb); 4903 1.7 riastrad struct ve_node *collision __diagused; 4904 1.7 riastrad /* XXX kludge to get insertion order */ 4905 1.7 riastrad node->order = ve->order++; 4906 1.7 riastrad collision = rb_tree_insert_node( 4907 1.7 riastrad &sibling->execlists.virtual.rb_root.rbr_tree, 4908 1.7 riastrad node); 4909 1.7 riastrad KASSERT(collision == node); 4910 1.7 riastrad node->inserted = true; 4911 1.7 riastrad first = rb_tree_find_node_geq( 4912 1.7 riastrad &sibling->execlists.virtual.rb_root.rbr_tree, 4913 1.7 riastrad &node->prio) == node; 4914 1.7 riastrad #else 4915 1.1 riastrad rb = NULL; 4916 1.1 riastrad first = true; 4917 1.1 riastrad parent = &sibling->execlists.virtual.rb_root.rb_node; 4918 1.1 riastrad while (*parent) { 4919 1.1 riastrad struct ve_node *other; 4920 1.1 riastrad 4921 1.1 riastrad rb = *parent; 4922 1.1 riastrad other = rb_entry(rb, typeof(*other), rb); 4923 1.1 riastrad if (prio > other->prio) { 4924 1.1 riastrad parent = &rb->rb_left; 4925 1.1 riastrad } else { 4926 1.1 riastrad parent = &rb->rb_right; 4927 1.1 riastrad first = false; 4928 1.1 riastrad } 4929 1.1 riastrad } 4930 1.1 riastrad 4931 1.1 riastrad rb_link_node(&node->rb, rb, parent); 4932 1.1 riastrad rb_insert_color_cached(&node->rb, 4933 1.1 riastrad &sibling->execlists.virtual, 4934 1.1 riastrad first); 4935 1.7 riastrad #endif 4936 1.1 riastrad 4937 1.1 riastrad submit_engine: 4938 1.7 riastrad GEM_BUG_ON(!node->inserted); 4939 1.1 riastrad node->prio = prio; 4940 1.1 riastrad if (first && prio > sibling->execlists.queue_priority_hint) { 4941 1.1 riastrad sibling->execlists.queue_priority_hint = prio; 4942 1.1 riastrad tasklet_hi_schedule(&sibling->execlists.tasklet); 4943 1.1 riastrad } 4944 1.1 riastrad 4945 1.1 riastrad spin_unlock(&sibling->active.lock); 4946 1.1 riastrad } 4947 1.7 riastrad #ifdef __NetBSD__ 4948 1.7 riastrad splx(s); 4949 1.7 riastrad #else 4950 1.1 riastrad local_irq_enable(); 4951 1.7 riastrad #endif 4952 1.1 riastrad } 4953 1.1 riastrad 4954 1.1 riastrad static void virtual_submit_request(struct i915_request *rq) 4955 1.1 riastrad { 4956 1.1 riastrad struct virtual_engine *ve = to_virtual_engine(rq->engine); 4957 1.1 riastrad struct i915_request *old; 4958 1.1 riastrad unsigned long flags; 4959 1.1 riastrad 4960 1.1 riastrad ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n", 4961 1.1 riastrad rq->fence.context, 4962 1.1 riastrad rq->fence.seqno); 4963 1.1 riastrad 4964 1.1 riastrad GEM_BUG_ON(ve->base.submit_request != virtual_submit_request); 4965 1.1 riastrad 4966 1.1 riastrad spin_lock_irqsave(&ve->base.active.lock, flags); 4967 1.1 riastrad 4968 1.1 riastrad old = ve->request; 4969 1.1 riastrad if (old) { /* background completion event from preempt-to-busy */ 4970 1.1 riastrad GEM_BUG_ON(!i915_request_completed(old)); 4971 1.1 riastrad __i915_request_submit(old); 4972 1.1 riastrad i915_request_put(old); 4973 1.1 riastrad } 4974 1.1 riastrad 4975 1.1 riastrad if (i915_request_completed(rq)) { 4976 1.1 riastrad __i915_request_submit(rq); 4977 1.1 riastrad 4978 1.1 riastrad ve->base.execlists.queue_priority_hint = INT_MIN; 4979 1.1 riastrad ve->request = NULL; 4980 1.1 riastrad } else { 4981 1.1 riastrad ve->base.execlists.queue_priority_hint = rq_prio(rq); 4982 1.1 riastrad ve->request = i915_request_get(rq); 4983 1.1 riastrad 4984 1.1 riastrad GEM_BUG_ON(!list_empty(virtual_queue(ve))); 4985 1.1 riastrad list_move_tail(&rq->sched.link, virtual_queue(ve)); 4986 1.1 riastrad 4987 1.1 riastrad tasklet_schedule(&ve->base.execlists.tasklet); 4988 1.1 riastrad } 4989 1.1 riastrad 4990 1.1 riastrad spin_unlock_irqrestore(&ve->base.active.lock, flags); 4991 1.1 riastrad } 4992 1.1 riastrad 4993 1.1 riastrad static struct ve_bond * 4994 1.1 riastrad virtual_find_bond(struct virtual_engine *ve, 4995 1.1 riastrad const struct intel_engine_cs *master) 4996 1.1 riastrad { 4997 1.1 riastrad int i; 4998 1.1 riastrad 4999 1.1 riastrad for (i = 0; i < ve->num_bonds; i++) { 5000 1.1 riastrad if (ve->bonds[i].master == master) 5001 1.1 riastrad return &ve->bonds[i]; 5002 1.1 riastrad } 5003 1.1 riastrad 5004 1.1 riastrad return NULL; 5005 1.1 riastrad } 5006 1.1 riastrad 5007 1.1 riastrad static void 5008 1.1 riastrad virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) 5009 1.1 riastrad { 5010 1.1 riastrad struct virtual_engine *ve = to_virtual_engine(rq->engine); 5011 1.1 riastrad intel_engine_mask_t allowed, exec; 5012 1.1 riastrad struct ve_bond *bond; 5013 1.1 riastrad 5014 1.1 riastrad allowed = ~to_request(signal)->engine->mask; 5015 1.1 riastrad 5016 1.1 riastrad bond = virtual_find_bond(ve, to_request(signal)->engine); 5017 1.1 riastrad if (bond) 5018 1.1 riastrad allowed &= bond->sibling_mask; 5019 1.1 riastrad 5020 1.1 riastrad /* Restrict the bonded request to run on only the available engines */ 5021 1.1 riastrad exec = READ_ONCE(rq->execution_mask); 5022 1.1 riastrad while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed)) 5023 1.1 riastrad ; 5024 1.1 riastrad 5025 1.1 riastrad /* Prevent the master from being re-run on the bonded engines */ 5026 1.1 riastrad to_request(signal)->execution_mask &= ~allowed; 5027 1.1 riastrad } 5028 1.1 riastrad 5029 1.1 riastrad struct intel_context * 5030 1.1 riastrad intel_execlists_create_virtual(struct intel_engine_cs **siblings, 5031 1.1 riastrad unsigned int count) 5032 1.1 riastrad { 5033 1.1 riastrad struct virtual_engine *ve; 5034 1.1 riastrad unsigned int n; 5035 1.1 riastrad int err; 5036 1.1 riastrad 5037 1.1 riastrad if (count == 0) 5038 1.1 riastrad return ERR_PTR(-EINVAL); 5039 1.1 riastrad 5040 1.1 riastrad if (count == 1) 5041 1.1 riastrad return intel_context_create(siblings[0]); 5042 1.1 riastrad 5043 1.1 riastrad ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL); 5044 1.1 riastrad if (!ve) 5045 1.1 riastrad return ERR_PTR(-ENOMEM); 5046 1.1 riastrad 5047 1.1 riastrad ve->base.i915 = siblings[0]->i915; 5048 1.1 riastrad ve->base.gt = siblings[0]->gt; 5049 1.1 riastrad ve->base.uncore = siblings[0]->uncore; 5050 1.1 riastrad ve->base.id = -1; 5051 1.1 riastrad 5052 1.1 riastrad ve->base.class = OTHER_CLASS; 5053 1.1 riastrad ve->base.uabi_class = I915_ENGINE_CLASS_INVALID; 5054 1.1 riastrad ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; 5055 1.1 riastrad ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; 5056 1.1 riastrad 5057 1.1 riastrad /* 5058 1.1 riastrad * The decision on whether to submit a request using semaphores 5059 1.1 riastrad * depends on the saturated state of the engine. We only compute 5060 1.1 riastrad * this during HW submission of the request, and we need for this 5061 1.1 riastrad * state to be globally applied to all requests being submitted 5062 1.1 riastrad * to this engine. Virtual engines encompass more than one physical 5063 1.1 riastrad * engine and so we cannot accurately tell in advance if one of those 5064 1.1 riastrad * engines is already saturated and so cannot afford to use a semaphore 5065 1.1 riastrad * and be pessimized in priority for doing so -- if we are the only 5066 1.1 riastrad * context using semaphores after all other clients have stopped, we 5067 1.1 riastrad * will be starved on the saturated system. Such a global switch for 5068 1.1 riastrad * semaphores is less than ideal, but alas is the current compromise. 5069 1.1 riastrad */ 5070 1.1 riastrad ve->base.saturated = ALL_ENGINES; 5071 1.1 riastrad 5072 1.1 riastrad snprintf(ve->base.name, sizeof(ve->base.name), "virtual"); 5073 1.1 riastrad 5074 1.1 riastrad intel_engine_init_active(&ve->base, ENGINE_VIRTUAL); 5075 1.1 riastrad intel_engine_init_breadcrumbs(&ve->base); 5076 1.1 riastrad intel_engine_init_execlists(&ve->base); 5077 1.1 riastrad 5078 1.1 riastrad ve->base.cops = &virtual_context_ops; 5079 1.1 riastrad ve->base.request_alloc = execlists_request_alloc; 5080 1.1 riastrad 5081 1.1 riastrad ve->base.schedule = i915_schedule; 5082 1.1 riastrad ve->base.submit_request = virtual_submit_request; 5083 1.1 riastrad ve->base.bond_execute = virtual_bond_execute; 5084 1.1 riastrad 5085 1.1 riastrad INIT_LIST_HEAD(virtual_queue(ve)); 5086 1.1 riastrad ve->base.execlists.queue_priority_hint = INT_MIN; 5087 1.1 riastrad tasklet_init(&ve->base.execlists.tasklet, 5088 1.1 riastrad virtual_submission_tasklet, 5089 1.1 riastrad (unsigned long)ve); 5090 1.1 riastrad 5091 1.1 riastrad intel_context_init(&ve->context, &ve->base); 5092 1.1 riastrad 5093 1.1 riastrad for (n = 0; n < count; n++) { 5094 1.1 riastrad struct intel_engine_cs *sibling = siblings[n]; 5095 1.1 riastrad 5096 1.1 riastrad GEM_BUG_ON(!is_power_of_2(sibling->mask)); 5097 1.1 riastrad if (sibling->mask & ve->base.mask) { 5098 1.1 riastrad DRM_DEBUG("duplicate %s entry in load balancer\n", 5099 1.1 riastrad sibling->name); 5100 1.1 riastrad err = -EINVAL; 5101 1.1 riastrad goto err_put; 5102 1.1 riastrad } 5103 1.1 riastrad 5104 1.1 riastrad /* 5105 1.1 riastrad * The virtual engine implementation is tightly coupled to 5106 1.1 riastrad * the execlists backend -- we push out request directly 5107 1.1 riastrad * into a tree inside each physical engine. We could support 5108 1.1 riastrad * layering if we handle cloning of the requests and 5109 1.1 riastrad * submitting a copy into each backend. 5110 1.1 riastrad */ 5111 1.1 riastrad if (sibling->execlists.tasklet.func != 5112 1.1 riastrad execlists_submission_tasklet) { 5113 1.1 riastrad err = -ENODEV; 5114 1.1 riastrad goto err_put; 5115 1.1 riastrad } 5116 1.1 riastrad 5117 1.7 riastrad GEM_BUG_ON(!ve->nodes[sibling->id].inserted); 5118 1.7 riastrad ve->nodes[sibling->id].inserted = false; 5119 1.1 riastrad 5120 1.1 riastrad ve->siblings[ve->num_siblings++] = sibling; 5121 1.1 riastrad ve->base.mask |= sibling->mask; 5122 1.1 riastrad 5123 1.1 riastrad /* 5124 1.1 riastrad * All physical engines must be compatible for their emission 5125 1.1 riastrad * functions (as we build the instructions during request 5126 1.1 riastrad * construction and do not alter them before submission 5127 1.1 riastrad * on the physical engine). We use the engine class as a guide 5128 1.1 riastrad * here, although that could be refined. 5129 1.1 riastrad */ 5130 1.1 riastrad if (ve->base.class != OTHER_CLASS) { 5131 1.1 riastrad if (ve->base.class != sibling->class) { 5132 1.1 riastrad DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n", 5133 1.1 riastrad sibling->class, ve->base.class); 5134 1.1 riastrad err = -EINVAL; 5135 1.1 riastrad goto err_put; 5136 1.1 riastrad } 5137 1.1 riastrad continue; 5138 1.1 riastrad } 5139 1.1 riastrad 5140 1.1 riastrad ve->base.class = sibling->class; 5141 1.1 riastrad ve->base.uabi_class = sibling->uabi_class; 5142 1.1 riastrad snprintf(ve->base.name, sizeof(ve->base.name), 5143 1.1 riastrad "v%dx%d", ve->base.class, count); 5144 1.1 riastrad ve->base.context_size = sibling->context_size; 5145 1.1 riastrad 5146 1.1 riastrad ve->base.emit_bb_start = sibling->emit_bb_start; 5147 1.1 riastrad ve->base.emit_flush = sibling->emit_flush; 5148 1.1 riastrad ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb; 5149 1.1 riastrad ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb; 5150 1.1 riastrad ve->base.emit_fini_breadcrumb_dw = 5151 1.1 riastrad sibling->emit_fini_breadcrumb_dw; 5152 1.1 riastrad 5153 1.1 riastrad ve->base.flags = sibling->flags; 5154 1.1 riastrad } 5155 1.1 riastrad 5156 1.1 riastrad ve->base.flags |= I915_ENGINE_IS_VIRTUAL; 5157 1.1 riastrad 5158 1.1 riastrad return &ve->context; 5159 1.1 riastrad 5160 1.1 riastrad err_put: 5161 1.1 riastrad intel_context_put(&ve->context); 5162 1.1 riastrad return ERR_PTR(err); 5163 1.1 riastrad } 5164 1.1 riastrad 5165 1.1 riastrad struct intel_context * 5166 1.1 riastrad intel_execlists_clone_virtual(struct intel_engine_cs *src) 5167 1.1 riastrad { 5168 1.1 riastrad struct virtual_engine *se = to_virtual_engine(src); 5169 1.1 riastrad struct intel_context *dst; 5170 1.1 riastrad 5171 1.1 riastrad dst = intel_execlists_create_virtual(se->siblings, 5172 1.1 riastrad se->num_siblings); 5173 1.1 riastrad if (IS_ERR(dst)) 5174 1.1 riastrad return dst; 5175 1.1 riastrad 5176 1.1 riastrad if (se->num_bonds) { 5177 1.1 riastrad struct virtual_engine *de = to_virtual_engine(dst->engine); 5178 1.1 riastrad 5179 1.1 riastrad de->bonds = kmemdup(se->bonds, 5180 1.1 riastrad sizeof(*se->bonds) * se->num_bonds, 5181 1.1 riastrad GFP_KERNEL); 5182 1.1 riastrad if (!de->bonds) { 5183 1.1 riastrad intel_context_put(dst); 5184 1.1 riastrad return ERR_PTR(-ENOMEM); 5185 1.1 riastrad } 5186 1.1 riastrad 5187 1.1 riastrad de->num_bonds = se->num_bonds; 5188 1.1 riastrad } 5189 1.1 riastrad 5190 1.1 riastrad return dst; 5191 1.1 riastrad } 5192 1.1 riastrad 5193 1.1 riastrad int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine, 5194 1.1 riastrad const struct intel_engine_cs *master, 5195 1.1 riastrad const struct intel_engine_cs *sibling) 5196 1.1 riastrad { 5197 1.1 riastrad struct virtual_engine *ve = to_virtual_engine(engine); 5198 1.1 riastrad struct ve_bond *bond; 5199 1.1 riastrad int n; 5200 1.1 riastrad 5201 1.1 riastrad /* Sanity check the sibling is part of the virtual engine */ 5202 1.1 riastrad for (n = 0; n < ve->num_siblings; n++) 5203 1.1 riastrad if (sibling == ve->siblings[n]) 5204 1.1 riastrad break; 5205 1.1 riastrad if (n == ve->num_siblings) 5206 1.1 riastrad return -EINVAL; 5207 1.1 riastrad 5208 1.1 riastrad bond = virtual_find_bond(ve, master); 5209 1.1 riastrad if (bond) { 5210 1.1 riastrad bond->sibling_mask |= sibling->mask; 5211 1.1 riastrad return 0; 5212 1.1 riastrad } 5213 1.1 riastrad 5214 1.1 riastrad bond = krealloc(ve->bonds, 5215 1.1 riastrad sizeof(*bond) * (ve->num_bonds + 1), 5216 1.1 riastrad GFP_KERNEL); 5217 1.1 riastrad if (!bond) 5218 1.1 riastrad return -ENOMEM; 5219 1.1 riastrad 5220 1.1 riastrad bond[ve->num_bonds].master = master; 5221 1.1 riastrad bond[ve->num_bonds].sibling_mask = sibling->mask; 5222 1.1 riastrad 5223 1.1 riastrad ve->bonds = bond; 5224 1.1 riastrad ve->num_bonds++; 5225 1.1 riastrad 5226 1.1 riastrad return 0; 5227 1.1 riastrad } 5228 1.1 riastrad 5229 1.1 riastrad struct intel_engine_cs * 5230 1.1 riastrad intel_virtual_engine_get_sibling(struct intel_engine_cs *engine, 5231 1.1 riastrad unsigned int sibling) 5232 1.1 riastrad { 5233 1.1 riastrad struct virtual_engine *ve = to_virtual_engine(engine); 5234 1.1 riastrad 5235 1.1 riastrad if (sibling >= ve->num_siblings) 5236 1.1 riastrad return NULL; 5237 1.1 riastrad 5238 1.1 riastrad return ve->siblings[sibling]; 5239 1.1 riastrad } 5240 1.1 riastrad 5241 1.1 riastrad void intel_execlists_show_requests(struct intel_engine_cs *engine, 5242 1.1 riastrad struct drm_printer *m, 5243 1.1 riastrad void (*show_request)(struct drm_printer *m, 5244 1.1 riastrad struct i915_request *rq, 5245 1.1 riastrad const char *prefix), 5246 1.1 riastrad unsigned int max) 5247 1.1 riastrad { 5248 1.1 riastrad const struct intel_engine_execlists *execlists = &engine->execlists; 5249 1.1 riastrad struct i915_request *rq, *last; 5250 1.1 riastrad unsigned long flags; 5251 1.1 riastrad unsigned int count; 5252 1.1 riastrad struct rb_node *rb; 5253 1.1 riastrad 5254 1.1 riastrad spin_lock_irqsave(&engine->active.lock, flags); 5255 1.1 riastrad 5256 1.1 riastrad last = NULL; 5257 1.1 riastrad count = 0; 5258 1.1 riastrad list_for_each_entry(rq, &engine->active.requests, sched.link) { 5259 1.1 riastrad if (count++ < max - 1) 5260 1.1 riastrad show_request(m, rq, "\t\tE "); 5261 1.1 riastrad else 5262 1.1 riastrad last = rq; 5263 1.1 riastrad } 5264 1.1 riastrad if (last) { 5265 1.1 riastrad if (count > max) { 5266 1.1 riastrad drm_printf(m, 5267 1.1 riastrad "\t\t...skipping %d executing requests...\n", 5268 1.1 riastrad count - max); 5269 1.1 riastrad } 5270 1.1 riastrad show_request(m, last, "\t\tE "); 5271 1.1 riastrad } 5272 1.1 riastrad 5273 1.1 riastrad last = NULL; 5274 1.1 riastrad count = 0; 5275 1.1 riastrad if (execlists->queue_priority_hint != INT_MIN) 5276 1.1 riastrad drm_printf(m, "\t\tQueue priority hint: %d\n", 5277 1.1 riastrad execlists->queue_priority_hint); 5278 1.7 riastrad for (rb = rb_first_cached(&execlists->queue); 5279 1.7 riastrad rb; 5280 1.7 riastrad rb = rb_next2(&execlists->queue.rb_root, rb)) { 5281 1.1 riastrad struct i915_priolist *p = rb_entry(rb, typeof(*p), node); 5282 1.1 riastrad int i; 5283 1.1 riastrad 5284 1.1 riastrad priolist_for_each_request(rq, p, i) { 5285 1.1 riastrad if (count++ < max - 1) 5286 1.1 riastrad show_request(m, rq, "\t\tQ "); 5287 1.1 riastrad else 5288 1.1 riastrad last = rq; 5289 1.1 riastrad } 5290 1.1 riastrad } 5291 1.1 riastrad if (last) { 5292 1.1 riastrad if (count > max) { 5293 1.1 riastrad drm_printf(m, 5294 1.1 riastrad "\t\t...skipping %d queued requests...\n", 5295 1.1 riastrad count - max); 5296 1.1 riastrad } 5297 1.1 riastrad show_request(m, last, "\t\tQ "); 5298 1.1 riastrad } 5299 1.1 riastrad 5300 1.1 riastrad last = NULL; 5301 1.1 riastrad count = 0; 5302 1.7 riastrad for (rb = rb_first_cached(&execlists->virtual); 5303 1.7 riastrad rb; 5304 1.7 riastrad rb = rb_next2(&execlists->virtual.rb_root, rb)) { 5305 1.1 riastrad struct virtual_engine *ve = 5306 1.1 riastrad rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 5307 1.1 riastrad struct i915_request *rq = READ_ONCE(ve->request); 5308 1.1 riastrad 5309 1.1 riastrad if (rq) { 5310 1.1 riastrad if (count++ < max - 1) 5311 1.1 riastrad show_request(m, rq, "\t\tV "); 5312 1.1 riastrad else 5313 1.1 riastrad last = rq; 5314 1.1 riastrad } 5315 1.1 riastrad } 5316 1.1 riastrad if (last) { 5317 1.1 riastrad if (count > max) { 5318 1.1 riastrad drm_printf(m, 5319 1.1 riastrad "\t\t...skipping %d virtual requests...\n", 5320 1.1 riastrad count - max); 5321 1.1 riastrad } 5322 1.1 riastrad show_request(m, last, "\t\tV "); 5323 1.1 riastrad } 5324 1.1 riastrad 5325 1.1 riastrad spin_unlock_irqrestore(&engine->active.lock, flags); 5326 1.1 riastrad } 5327 1.1 riastrad 5328 1.1 riastrad void intel_lr_context_reset(struct intel_engine_cs *engine, 5329 1.1 riastrad struct intel_context *ce, 5330 1.1 riastrad u32 head, 5331 1.1 riastrad bool scrub) 5332 1.1 riastrad { 5333 1.1 riastrad GEM_BUG_ON(!intel_context_is_pinned(ce)); 5334 1.1 riastrad 5335 1.1 riastrad /* 5336 1.1 riastrad * We want a simple context + ring to execute the breadcrumb update. 5337 1.1 riastrad * We cannot rely on the context being intact across the GPU hang, 5338 1.1 riastrad * so clear it and rebuild just what we need for the breadcrumb. 5339 1.1 riastrad * All pending requests for this context will be zapped, and any 5340 1.1 riastrad * future request will be after userspace has had the opportunity 5341 1.1 riastrad * to recreate its own state. 5342 1.1 riastrad */ 5343 1.1 riastrad if (scrub) 5344 1.1 riastrad restore_default_state(ce, engine); 5345 1.1 riastrad 5346 1.1 riastrad /* Rerun the request; its payload has been neutered (if guilty). */ 5347 1.1 riastrad __execlists_update_reg_state(ce, engine, head); 5348 1.1 riastrad } 5349 1.1 riastrad 5350 1.1 riastrad bool 5351 1.1 riastrad intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine) 5352 1.1 riastrad { 5353 1.1 riastrad return engine->set_default_submission == 5354 1.1 riastrad intel_execlists_set_default_submission; 5355 1.1 riastrad } 5356 1.1 riastrad 5357 1.1 riastrad #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) 5358 1.1 riastrad #include "selftest_lrc.c" 5359 1.1 riastrad #endif 5360