
Radeon IRC Logs For 2008-11-23


ssieb: I was told that the software renderer also identifies as direct rendering
fwc: hmm well ET is getting 40-80fps with maxxed settings/1024x768
fwc: agh, crashed hard.. great :P
rmh3093: airlied, ping
rmh3093: im getting that no modes/screens issue when i start X
rmh3093: i cant remember what the fix was for that
airlied: rmh3093: hmm not sure....
rmh3093: is rawhide drm still newer than modesetting-gem?
airlied: rmh3093: the APIs have diverged.
airlied: so modesetting-gem lost interest to me, I need to port over some of the newer changes I made to rawhide
rmh3093: that could be it
rmh3093: my libdrm is from modesetting-gem
rmh3093: but all my other stuff is from rawhide
rmh3093: are you gonna combine the dri2 and kms branches :)
airlied: well I need to just get all the userspace working and stable on glisse new tree
airlied: shouldn't be too hard the current r300 isn't that stable :)
airlied: yeah backout the api changes from modesetting-gem
airlied: I must make a new libdrm branch with the fedora compat
janboe: Hi, what is TTM?
glisse: janboe: a short name for one of the memory managers
janboe: Is GTT the same thing?
janboe: Is GEM introduced to replace TTM?
glisse: janboe: bit complex
glisse: there is the API
glisse: what userspace see
glisse: so for radeon we are aiming at something close to the gem api
glisse: but we will use things that were done for TTM
kqr: i have a radeon IGP 340M graphics chip, i use the 'radeon' driver, and i have direct rendering enabled... and even the simplest game runs at 5fps. glxgears runs at ~300 fps... what went wrong?
janboe: Ok. So GEM is intel style management, and TTM is general one, right?
glisse: janboe: it's bit more complex
glisse: GEM is also the API intel memory manager use
janboe: Is there some high level docs for TTM and GEM? I only find GEM description which is sent in the patchset mails.
glisse: TTM was described a few times on mesa3d-devel or on dri-devel
glisse: it's all about memory management
glisse: so it's pretty much card agnostic
glisse: except that gem internals offer no help for managing vram
glisse: while ttm
glisse: does
glisse: meanwhile the ttm api is way too complex, especially the fencing stuff
janboe: So there are two memory management designs for DRM? Is there a third one?
glisse: janboe: well maybe there will be a third one
glisse: and more to come ;)
janboe: Oh, my god, it's too complex:-(
glisse: janboe: just don't pay attention to memory manager
glisse: today it's just about moving memory around and trying to avoid memcpy as much as possible
glisse: tomorrow memory management for the gpu might become as complex as for the cpu
otaylor: glisse: In some ways it's more complex today...
glisse: otaylor: i guess we can say that :)
glisse: memory faulting is something nice
otaylor: glisse: multiple domains, multiple entities accessing the memory, limited hardware support for page tables on the gpu side, etc.
glisse: otaylor: well i think GPU people are a bit concerned about how pagefaults affect performance
glisse: when you are drawing billions of pixels you don't like to pagefault
glisse: nor do you like a tlb miss
glisse: bandwidth of GPU memory is something like 10 times bigger than the cpu's for a reason :)
glisse: GPU is greedy
otaylor: yeah. But if the software can't make use of it because it's too complex to do all the memory allocations ahead of time
otaylor: airlied's blog posts this week were a little horrifying
glisse: well memory fragmentation is the biggest concern
glisse: of course once in a game you should not fragment often
otaylor: it's uncomfortable to assume that the person creating the gpu command stream knows everything about the available memory and where to put stuff
otaylor: you think fragmentation is a big problem? compaction in the kernel seems fairly tractable, though you do have to "stop the world"
glisse: otaylor: well cs is about giving hints as to where a bo should be, then it's up to the kernel to do things, but in this case i think airlied is too optimistic on the memory limits it reports to userspace
glisse: we should report way less memory i think
kqr: i have a radeon IGP 340M graphics chip, i use the 'radeon' driver, and i have direct rendering enabled... and even the simplest game runs at 5fps. glxgears runs at ~300 fps... what went wrong?
glisse: but all this needs testing
glisse: kqr: maybe nothings is wrong
glisse: glxgears is not a benchmark
kqr: glisse: but the gfx card was able to do these things earlier
glisse: things like quake3 for your card might be
otaylor: glisse: Seems hard to figure out the amount of available memory ahead of time
kqr: glisse: uh, 300 fps in glxgears is a very low result
kqr: glisse: so my fps is overall very low
glisse: otaylor: i think the clue here is that it should change with each cs: we should return how much memory the next cs may use at most
otaylor: *If* you can assume that everything other than the frontbuffer (and cursor, and a few other fixed allocation) can be moved out at need, then you can make a pretty good guess at what is available to a single client
otaylor: it doesn't seem like we have any recovery mechanism if that number was too big
otaylor: "Oops, I can't execute that command stream. Sorry bud"
glisse: otaylor: i fully agree with that, having something like "oh sorry, can you try again" to tell the X server would rock
glisse: mesa/gallium are a bit easier to fix in this respect (well, that's the feeling i have)
glisse: i think we can somehow do this ddx side
otaylor: glisse: one random idea I had was that you could submit along with the command buffer a "state mask" which was basically a bit mask of which commands in the buffer a) affected state b) actually did drawing
glisse: but this needs some kind of clever cs infrastructure
otaylor: Then you could replay only the tail portion of a command buffer by replaying the state part of the head, and all of the tail
otaylor: But I somehow think that would be hard to make work in practice....
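[editor's note: otaylor's state-mask replay idea can be sketched roughly as below. The command tagging, names, and split policy are illustrative assumptions for this log, not any real DRM interface.]

```c
#include <assert.h>
#include <stddef.h>

/* Each command in a buffer is tagged as state-setting or drawing.
 * To replay only the tail of a buffer, re-emit every state command
 * from the head (so the tail sees correct state), then the tail
 * verbatim. */

enum cmd_kind { CMD_STATE, CMD_DRAW };

/* Build the replay buffer for cmds[split..n-1]; returns its length. */
static size_t build_replay(const enum cmd_kind *cmds, size_t n,
                           size_t split, enum cmd_kind *out)
{
    size_t m = 0;
    for (size_t i = 0; i < split; i++)   /* head: state only */
        if (cmds[i] == CMD_STATE)
            out[m++] = cmds[i];
    for (size_t i = split; i < n; i++)   /* tail: everything */
        out[m++] = cmds[i];
    return m;
}
```

As otaylor notes, the hard part in practice would be tagging commands correctly, not the replay itself.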
glisse: also maybe we should avoid always copying things into vram, having some kind of threshold on the size
glisse: accessing small texture through gart shouldn't hurt too much
glisse: while for big texture we want them in vram
glisse: also trying to put big things in vram should help with fragmentation
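[editor's note: the size-threshold placement policy glisse describes can be sketched as below. The names and the 1 MiB cutoff are assumptions for illustration, not the real radeon driver interface.]

```c
#include <assert.h>
#include <stddef.h>

/* Small read-only buffers stay in GART (cheap to sample, and they
 * would only fragment VRAM); large ones go to VRAM for bandwidth. */

enum bo_domain { DOMAIN_GART, DOMAIN_VRAM };

#define VRAM_THRESHOLD ((size_t)1 << 20)   /* assumed 1 MiB cutoff */

static enum bo_domain pick_read_domain(size_t bo_size)
{
    return bo_size >= VRAM_THRESHOLD ? DOMAIN_VRAM : DOMAIN_GART;
}
```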
otaylor: airlied implied that writing to things in gart isn't possible
glisse: i am thinking about read object here
glisse: because there are only a few write buffers
glisse: at most the colorbuffer and zbuffer
otaylor: What I really want to be able to get into gart space on my memory limited laptop is actually the big read-only textures ... composited windows
glisse: for r6xx hw there will also be vertex buffers
glisse: well if big read only texture fit in vram you rather want them in vram
glisse: otherwise having them in gart should hurt beside performance
otaylor: putting small textures into gart space for reads I don't think actually helps anything other than, as you say, fragmentation
glisse: shouldn't
otaylor: glisse: It's a lot less hurt than swapping them out of vram into system ram!
glisse: all this needs testing :)
glisse: we are driven by intuition and intuition can be damn wrong from time to time :)
otaylor: Indeed. Well, and some prototypes we can test with
otaylor: it's inherently fun stuff to think about, but rather intimidating to work on, especially with everything in so much flux.
glisse: yup
glisse: also the awaited ttm cleanups might help make testing different policies easier
glisse: but in the end i think we will likely want to be able to change policy dynamically depending on what the gpu is doing
glisse: a fullscreen game should have the right to evict all other apps from vram
kqr: i have a radeon IGP 340M graphics chip, i use the 'radeon' driver, and i have direct rendering enabled... and even the simplest game runs at 5fps (bad result). glxgears runs at ~300 fps (also a bad result)... what went wrong?
chithead: kqr: running "glxinfo | grep direct" as the user logged into X should report that direct rendering is enabled
kqr: chithead: it does
chithead: are you using compiz or some other compositing desktop?
kqr: nope
marcheu: the 340m is the IGP equivalent of a radeon 7000
kqr: it seems the computer was out of vram... if i decreased my resolution to 800×600 it ran smoothly
marcheu: it is indeed a slow card
kqr: but i don't want to run 800×600, looks ugly ^^
chithead: kqr: you could try to use 16 bit color instead of 24 bit
kqr: is there a way to increase vram share on the physical ram, or something like that?
marcheu: kqr: enable hyperz, enable tiling. that's all you can do
kqr: chithead: hmm
kqr: chithead: where do i change that in debian/gnome? xD
chithead: kqr: in xorg.conf, set DefaultDepth 16 in the screen section
kqr: ^^
kqr: forgot xD
kqr: chithead: is that an Option "DefaultDepth" "16"?
chithead: kqr: also don't set the virtual size to a larger value than necessary
kqr: virtual size?
chithead: kqr: no, see "man xorg.conf"
kqr: okey ^^
chithead: the current virtual size will be displayed if you run "xrandr"
chithead: (the "maximum" value)
kqr: maximum is 1024×1024
kqr: uh brb restart x
kqr: hmm
kqr: works now
kqr: but
kqr: with
kqr: issues
kqr: :S
kqr: the background is smeared out
kqr: some graphics isn't shown without some ugly square behind them
chithead: kqr: you can try with the accelmethod, colortiling and enablepageflip options
chithead: maybe some combination will work better
kqr: hmm
kqr: a more advanced game just crashed xorg ^^
kqr: some graphic elements still glitchy though
kqr: but the background in the lowtech game is no more smeared
airlied: glisse: I can't think of a nice way to get mesa or DDX to resubmit on CS fail.
airlied: it really is too late in the game at that point unless you are going to flush after every operation
airlied: which won't really do much for speed.
airlied: we could probably make defrag smarter alright.
airlied: and we should probably flush when we hit a limit like 3/4 of VRAM.
airlied: but I don't want to limit someone to 3/4 of VRAM if they have one single texture that could fit and that is all they need.
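[editor's note: airlied's watermark idea can be sketched as below. The function name, the single-buffer special case, and the exact numbers are assumptions for illustration only.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flush the command stream once accumulated references pass 3/4 of
 * VRAM, but let a lone buffer use the full pool if it fits alone. */

static bool should_flush(size_t used, size_t next_bo,
                         size_t vram, size_t nrefs)
{
    size_t watermark = vram / 4 * 3;
    if (nrefs == 0)                 /* single buffer: full VRAM is ok */
        return next_bo > vram;
    return used + next_bo > watermark;
}
```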
airlied: otaylor: we currently use GART for reads from pixmap/texture, I just need to make the GART bigger.
airlied: I could probably enable write to GART on a case by case basis.
otaylor: airlied: Do you do VRAM=>gart pixmap migration?
airlied: yes.
airlied: well eviction does it.
airlied: reads from pixmaps are GART|VRAM, writes are VRAM
otaylor: sounds great, once we have dri2 :-)
glisse: airlied: well i was thinking of storing state and draw cmd in separate queue (userspace)
glisse: state doesn't consume memory besides the colorbuffer & possibly zbuffer
glisse: while draw cmds consume memory
airlied: texture state?
glisse: so if cs fail
glisse: you resubmit state and you split draw
glisse: texture state will go along draw cmd :)
airlied: it would be easier to add split points to the stream I would suspect.
glisse: well i mean all this in userspace
glisse: i don't really like split point because i see no easy way to handle that in kernel
airlied: so if it fails, you redo the stream with a smaller limit?
glisse: i mean this needs quite a bit of code
glisse: well you reemit state and append less draw cmd
glisse: i guess cs should return a limit
airlied: it might be an optimisation I suppose if we do hit stalls when the kernel defragments.
glisse: so userspace can try to meet that limit
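[editor's note: glisse's resubmit scheme can be sketched as below. Submission is simulated here: all names are assumptions, and a real CS ioctl would report the limit back rather than take it as an argument.]

```c
#include <assert.h>
#include <stddef.h>

/* On CS failure, re-emit state and append fewer draw commands,
 * halving the batch until it fits the kernel-reported limit. */

struct cs_result { int ok; size_t next_limit; };

static struct cs_result cs_submit(size_t bytes, size_t limit)
{
    struct cs_result r;
    r.ok = bytes <= limit;      /* "fails" over the memory limit */
    r.next_limit = limit;       /* a real kernel could scale this */
    return r;
}

/* Submit ndraws draws of draw_bytes each, splitting on failure.
 * Returns the number of submissions used.  (A single draw that
 * still exceeds the limit would need a sw fallback instead.) */
static int submit_all(size_t ndraws, size_t draw_bytes, size_t limit)
{
    int submissions = 0;
    size_t done = 0;
    while (done < ndraws) {
        size_t batch = ndraws - done;
        while (batch > 1 && !cs_submit(batch * draw_bytes, limit).ok)
            batch /= 2;         /* fewer draws; state is re-emitted */
        done += batch;
        submissions++;
    }
    return submissions;
}
```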
airlied: we should probably see how bad defrag time is first.
glisse: defrag is puzzling me, i see no clever way, i have the feeling that once we need defrag we have already lost :)
glisse: for defrag i think we should really have some kind of threshold on the bo size
glisse: so small bo are (almost) never in vram
glisse: of course if we have to defrag big bo we lose big time
airlied: its the big BO that are the issue.
airlied: you have to evict them if they end up in the wrong place.
glisse: maybe we should use some kind of clever memory filling pattern
airlied: I don't think evicting all the bos is the perfect answer, it's just the most likely to always work.
glisse: like using memory from both end
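[editor's note: the "use memory from both ends" pattern glisse mentions can be sketched as a two-ended bump allocator. The struct and interface are assumptions for illustration.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Big BOs bump-allocate from the bottom of VRAM, small ones from the
 * top, so the two size classes never interleave and fragmentation
 * stays at the boundary between them. */

struct heap { size_t size, bottom, top; };   /* top counts down */

static bool heap_alloc(struct heap *h, size_t n, bool big, size_t *off)
{
    if (n > h->top - h->bottom)     /* no room between the two ends */
        return false;
    if (big) { *off = h->bottom; h->bottom += n; }
    else     { h->top -= n; *off = h->top; }
    return true;
}
```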
airlied: glisse: cleverness probably won't help the corner cases are always going to exist.
glisse: true
airlied: if you tell userspace it can use 100MB of space, you better not be lying.
glisse: it's just about making the corner smaller :)
marcheu: I wonder if just having only static buffers would make things faster
glisse: :)
marcheu: no relocs + no migration + no defrag + simple code with less bugs
marcheu: I mean, you're building something that's already too big to imagine properly, let alone implement
airlied: marcheu: its all pretty logical
marcheu: given how many layers of stuff I see you're adding, I think it's not
airlied: what layers?
airlied: its all in the kernel.
marcheu: yup
airlied: you give the kernel a big load of stuff to draw, and the goal is that it doesn't throw up.
marcheu: that doesn't require migration. that requires dynamic AGP remapping at most
airlied: marcheu: you can still submit more buffers than can fit in AGP
airlied: marcheu: Intel are already hitting defrag problem
airlied: and are now implementing it.
marcheu: yeah because you migrate a lot, then you have to defrag...
airlied: they don't have migrate
airlied: they have no VRAM, fully dynamic AGP
airlied: they still hit the same problem.
marcheu: how come we never hit it before ?
airlied: because we just did everything in sw
airlied: if we had too big a buffer.
glisse: and we never used all the ram ;)
marcheu: with the dri drivers this never happened
airlied: marcheu: because we had a hardcoded limit in userspace
airlied: and the userspace driver put the buffers into the memory
airlied: but at the same time evicted all the other users buffers.
airlied: the userspace bufmgr faces the same problem.
marcheu: no, because user space was able to do tricks to avoid this
airlied: maybe some sort of big state structure shared between userspace and kernelspace is the answer
marcheu: if you have to defrag vram you're creating yet another graphics slowdown
airlied: it's not something I would foresee us hitting every day.
marcheu: user space never needed sarea to implement that
otaylor: airlied: doesn't that come down to a big lock
marcheu: so you're re-adding sarea now :)
airlied: marcheu: the thing is userspace memory management sucked.
otaylor: airlied: that is, the inherent problem is that the userspace is assembling buffers asynchronously without knowing what the memory situation will be when you execute them
marcheu: airlied: I agree, but you're telling about something sarea-like here...
airlied: marcheu: no that was a joke :)
marcheu: airlied: so FYI the sarea was never needed for DRI to do tricks to avoid saturating VRAM
airlied: I have no intention of using a shared memory area.
airlied: marcheu: userspace just had a fixed VRAM limit.
airlied: marcheu: here is 32MB you will always have it, if you want more you lose.
marcheu: airlied: on radeon with AGP + VRAM you never hit it. plus we now have dynamic remapping
otaylor: marcheu: and you lost most of the time
marcheu: otaylor: nope you didn't
airlied: marcheu: we did hit the texture limits a lot. EXA vs textures on radeon sucks.
otaylor: marcheu: I've looked at profiles, I've looked at my desktop crawling because textures were thrashing
marcheu: airlied: again, the split memory pool is another problem
marcheu: otaylor: I'm talking radeon not r200
otaylor: marcheu: and this isn't even considering the case of texture_for_pixmap where the *same* buffer is a texture and a pixmap, how do you account for that?
airlied: marcheu: so you are saying we should allocate VRAM on a first come first served basis and use AGP for everything else?
marcheu: airlied: yeah, I'm saying we should 1. place all pixmaps in GART 2. use VRAM for textures and otherwise AGP
otaylor: marcheu: you mean, you could play an 8 year old game on an 8 year old card, and no problems?
marcheu: otaylor: you should look at radeon, it's better than r200, doesn't work the same.
airlied: marcheu: where do FBOs go?
marcheu: airlied: VRAM is for FBOs
airlied: see I don't see pixmaps as much different than textures.
airlied: in fact I see no difference.
airlied: you have a bunch of data the GPU wants to do stuff do.
otaylor: marcheu: radeon would still be pretty much constrained by dri1 and its limitations
otaylor: marcheu: I'm not really sure what it could do that was fundamentally different
airlied: radeon just used GART textures as a fallover
marcheu: airlied: there is a big usage difference. the pixmaps are accessed a lot less
airlied: you could still run out of space
marcheu: otaylor: I wrote radeon code, so I guess I know...
airlied: marcheu: my pixmaps get accessed a lot more.
airlied: since I don't have a 3d game running
otaylor: marcheu: sure, I'm not doubting that
marcheu: airlied: nope you could still have a usable desktop without pixmaps in VRAM
airlied: marcheu: I'd still have to defrag AGP space
otaylor: marcheu: but you put my pixmaps (that is, the way I'm rendering my desktop!) always into gart over my dead body!
airlied: marcheu: as I said Intel are already hitting the issue
airlied: defrag isn't as much overhead granted
marcheu: airlied: but memcpy it is, so it's much smoother
airlied: but you still have to do all the codepaths.
airlied: well it doesn't need memcpy
marcheu: airlied: plus you just remap the pages and get more free GART :)
airlied: the codepaths are pretty much all the same though, same number of layers
marcheu: otaylor: why is that a problem ? you didn't try it
marcheu: airlied: gart is like an infinite resource. we can put all src pixmaps there and it's not a problem
airlied: marcheu: it's not infinite per cmdbuf.
airlied: its quite fixed and suffers the same issue, if the kernel says it can fit something it better fit it.
otaylor: marcheu: I don't understand why you want my rendering *target*, a pixmap, to be in gart (because it's 2d and 2d is a second class citizen?)
airlied: marcheu: we currently render most pixmaps on radeon in hw so VRAM is a lot faster.
marcheu: airlied: yeah so we have to report proper limits. and ?
marcheu: otaylor: because they fallback often, and then fallbacks are slow from VRAM and fast in GART
airlied: marcheu: so you end up with the same problem, the same layers.
otaylor: marcheu: Clearly *anything* is better than memcopying from vram, so if you avoid that you can get away with a lot and it's "better"
marcheu: otaylor: don't tell me there are no fallbacks, because every driver has some
airlied: marcheu: we don't fallback that often on radeon on Fedora GNOME desktop
otaylor: marcheu: they aren't very fast in gart unless you change the memory to cacheable
otaylor: marcheu: I did some measurements and basically nothing was falling back
airlied: yeah cacheable makes the big diff at which point memcpy is possibly faster.
marcheu: slowness in GART is because you fail at using prefetching in sw code...
marcheu: really there are solutions, I think you're overengineering it with defrag
marcheu: defrag is sure lose
airlied: marcheu: you have to defrag GART, the code is all the same really, just buffer moves.
marcheu: airlied: no you don't, you just grab more GART
airlied: marcheu: GART is statically sized at startup
airlied: you remove buffers from it
airlied: but that is a buffer move to CPU.
marcheu: on radeon you can choose the pages...
airlied: same as defrag VRAM is a buffer move to GART
marcheu: on nvidia you can choose the pages with a PCIGART
marcheu: so there are ways to extend the gart as you like
airlied: marcheu: you still have a fixed GART limit
marcheu: yes but you can _dynamically_ change its pages
airlied: if you get one big object close to the limit, and two small objects in the middle you will need to move them.
airlied: marcheu: dynamic is still moving buffers around.
otaylor: marcheu: I'd tend to agree with what I think is an overall point you are making - which is that in the short term at least - reporting a "safe size" to userspace for your buffer - is an easy approach. But if you take that approach, it doesn't matter what's a texture, what's a pixmap, what's an fbo. You have things you read from, and things you write to.
marcheu: airlied: nope swapping pages isn't
airlied: marcheu: but you can't do that midoperation, if the mask/src/dst fill gart
marcheu: airlied: I'm not contesting that
airlied: you need to fit all the buffers in the command stream into GART or VRAM at the same time
marcheu: I'm saying there are ways to implement a workable, mostly migration-free desktop
airlied: if they don't fit because of fragmentation, you need to defrag.
airlied: GART defrag is a lot easier yes, but the ideas are the same
marcheu: nope GART "defrag" is just a page table rewrite
airlied: you kick out buffers until space exists, and you put them all back in so they fit nice.
marcheu: it's free
airlied: marcheu: its cheap.
marcheu: yeah
marcheu: I see us adding bandaid over bandaid here to the point it's depressing really
airlied: 3D games will still suffer the same issue.... no matter what limit you give
airlied: it can always get screwed by fragmentation
marcheu: well currently they don't, esp not on r100
glisse: in newer game you have to move texture out & in texture in the same frame
glisse: oups one too many texture word :)
airlied: marcheu: newer games would hit those limits easily
marcheu: airlied: no, because you place extra stuff in AGP ?
airlied: marcheu: I'm placing extra stuff in AGP now
marcheu: airlied: with my approach, those games will be faster than with your approach
airlied: if a command gets submitted with VRAM and GART limits reached
airlied: you need to execute it.
airlied: maybe its not a common operation but it needs to be workable.
marcheu: there is no way you can execute such a command, with any scheme :)
airlied: yes I can thats the point
marcheu: if a single command requires more than VRAM + AGP, you're dead, that's it
marcheu: no you can't :)
airlied: marcheu: it doesn't require more than VRAM + AGP
airlied: it requires the limits
airlied: the stuff can fit in the limits, however the VRAM is fragmented.
airlied: I can't spill over to GART because we already have the limit for it
marcheu: when you start doing that, you're super slow already
airlied: yes but you can't die.
marcheu: you can sw render
airlied: this is the problem slowness is bad, crashing X is worse.
airlied: you can't sw render as you are too late
airlied: you could have sw rendered.
marcheu: you're never too late
airlied: if you knew the kernel limits were wrong
airlied: marcheu: how can you backtrack?
airlied: the command submit is way later than the operations.
marcheu: IMO you're complexifying everything for a very small point: games with to-the-limit textures
airlied: marcheu: no the point is crashing X or game is never the answer
glisse: at least not if we want to go forward :)
marcheu: we're not talking about crashing stuff.. we're just saying it's slow
airlied: if the kernel gives me limits the kernel needs to handle those limits.
airlied: if I have more data than the limits I need to sw render
airlied: however it's too late for the kernel to tell me I need to sw render when I submit the command stream
airlied: decreasing the limits to less than actual limits works, but fragmentation can always happen at some point.
marcheu: yeah, you know, on nvidia there's a limit of texturing source that's the vram size. you can't texture more than the whole vram size, it's a hw limit
marcheu: that never bothered nvidia you know
airlied: marcheu: explains the compiz goes white bugs
marcheu: but we soooo need to be handling that case very well ? I don't see why
marcheu: oh that's all about compiz, the misdesigned window manager
airlied: marcheu: we don't need to handle it well, hence it can be slow
airlied: but we do need to handle it.
marcheu: I see the point now
marcheu: bling has won
airlied: well composited desktops are what the people seem to want.
marcheu: well you'll be handling that specific case. and we'll keep on sucking at everything else
airlied: marcheu: having 512MB of VRAM sit idle burning power, unless I run a GL app seems stupid
otaylor: marcheu: what percentage of linux machines (or personal computers, for that matter) are sold to principally play 3d games on?
marcheu: airlied: it might seem stupid, but it'll provide fast fallbacks
airlied: marcheu: how about we just avoid fallbacks.
marcheu: airlied: and it'll avoid the fragmentation issues mostly
airlied: marcheu: you still need to handle the frag issue, might as well get it tested.
marcheu: airlied: you can't, you'll always find the odd desktop that wants $WEIRD_FEATURE$
airlied: marcheu: but those people deserve what they get.
glisse: airlied: btw back to what i was saying: i think we should not assume the best solution is to always report the highest limit, i.e. each time an app does rendering it can assume it has all memory. in normal desktop usage i am wondering if we should instead move the per-cmd-stream memory limit along with the memory pressure we have, so we don't evict this 10MB pixmap we use every frame
marcheu: airlied: ATM it's more like "we don't have a single desktop env that is properly accelerated everywhere"
airlied: marcheu: GNOME is pretty good here, I don't think we hit any fallbacks on r300 upwards that I would consider slow.
otaylor: glisse: I'm actually curious in what cases we are submitting single buffers that require too much memory
airlied: yes the fallbacks could be faster, but they aren't crippling users.
marcheu: airlied: that depends on your gtk theme, icon size, font setting, rotation...
airlied: glisse: yeah scaling the limit is probably a good idea.
glisse: otaylor: well here on 1280x1024 desktop a fullscreen window with compiz on 32M Vram & 16M gart hit the limit :)
airlied: glisse: I think bridgman mentioned how they had to do that, it was better to be unfair everywhere than try to be fair.
glisse: airlied: yup that's what i heard too
otaylor: glisse: Oh, yeah, compiz - because it's just "draw gigantic texture" "draw enough gigantic texture" ...
otaylor: another gigantic texture
airlied: compiz hits the limits really easy, as it just has short cmds references big textures
airlied: esp without TFP
airlied: or DRI2
otaylor: glisse: makes sense.... you have a relatively small command stream referencing lots of memory
airlied: sorry without zero-copy TFP
airlied: hitting alt-tab in compiz while at the limit always pisses it off.
marcheu: I suppose you could solve your issues with a couple of compiz patches instead
otaylor: glisse: So, yes, for that you'd rather it didn't evict every single last 2d pixmap before breaking its buffer up into two pieces
airlied: as it renders the icons and falls over quickly.
glisse: well what i think is that we should not set policy in stone, and should be able to switch it at any time, so later we can have some clever algorithm which decides what kind of policy is best given the circumstances
marcheu: I'm not too interested in designing a compiz-specific graphics stack...
airlied: marcheu: its just a GL app
marcheu: airlied: with a real weird use of resources
airlied: marcheu: lots of other GL apps do equally large things, virtual forbidden city.
otaylor: glisse: Maybe it's best to leave that decision to the dri driver though
airlied: I think keithp has seen it referencing over 100MB with one command stream.
marcheu: airlied: nope that one uses lots of textures but doesn't hit the limit
marcheu: airlied: i.e. you can split stuff there
glisse: the easiest solution is to buy everyone a 4G gpu card =)
airlied: glisse: I can always open more windows.
glisse: i will have a word with bad santa about that
otaylor: glisse: You have N megs of read space and M megs of write space that you *can* use. Then leave it up to it to decide how much it actually uses before deciding to flush buffers
airlied: thats what we have now pretty much.
airlied: the flush occurs around 80-90% in the DDX
glisse: otaylor: oh i agree with that, i am just saying that we should report the new limit as an output of the cs ioctl
glisse: so to keep today's scheme we always report 3/4 of memory as the limit
airlied: the limit also determines when we hit sw fallbacks
glisse: and at a later point we can change how and what limit the ioctl reports
airlied: so we need to be careful to not use the limit for a single operation I think
otaylor: glisse: it seems easier to tune that from user space
glisse: otaylor: i wish too but kernel is in charge here
otaylor: glisse: And if you leave it up to the client, then the client can know "this command stream really takes 95% of all available memory to execute, and I have to swrender to break it up, so let me cause the evictions and take the full 95%"
otaylor: glisse: Even if in most cases it is saying "let me stop at 75% (or 50%)"
marcheu: technically, you'll be migrating back & forth until the radeon card crashes
marcheu: I think we're going into the wall here
otaylor: marcheu: well, you have to have anti-migration heuristics
glisse: otaylor: maybe reporting a score might be better, like something in [-1,1], so -1 means you caused a lot of evictions
glisse: and 1 means you're totally fine
otaylor: marcheu: the kernel shouldn't put a buffer into vram from gart if it has reason to believe that it's going to be immediately migrated back
glisse: because the app could use 75% of memory, but if there is another app which uses 75% too i think we would rather want them to use 50% each
glisse: of course this is trying to be fair
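[editor's note: the fairness trade-off glisse describes can be sketched as below. The function name and the exact formula are assumptions; the 3/4 single-client figure follows the scheme mentioned earlier in the log.]

```c
#include <assert.h>
#include <stddef.h>

/* With one client, the per-CS limit is 3/4 of VRAM; with more
 * clients, split the pool evenly rather than letting each client
 * assume it can use 3/4 on its own. */

static size_t cs_limit(size_t vram, size_t nclients)
{
    if (nclients <= 1)
        return vram / 4 * 3;
    return vram / nclients;
}
```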
otaylor: marcheu: and a reasonable heuristic might be *never* to migrate a buffer from vram to gart for a read, at least in vram starved system
otaylor: marcheu: err, from gart to vram
otaylor: glisse: yeah, fair is hard :-(
glisse: fair is perfection we will never reach :)
marcheu: otaylor: then you're saying that once vram is filled, you have to basically start using GART...
marcheu: otaylor: which is what I said, half an hour ago
airlied: at the moment we use GART for reads.
airlied: unless we have written to the buffer previously in VRAM
otaylor: marcheu: irc isn't the best mechanism for productive discussion always :-)
airlied: granted I could switch all pixmaps to GART and see what the results are for fun.
glisse: memory management is the devil
otaylor: airlied: sounds right. (Seems like a common theme this evening. "otaylor: the driver should do X. airlied: it does X. otaylor: oh.")
marcheu: otaylor: which is also how r100 works, and why it outperforms r200 with the highest q3 texture settings
otaylor: marcheu: cool!
glisse: marcheu: also because r200 always tries to make textures fit in vram and causes a lot of evictions per frame
marcheu: glisse: yes exactly what you're willing to implement now
glisse: if r200 started using gart once vram is exhausted i think it would outperform r100 easily
marcheu: sure, I'm not saying r100 or r200 is better, I'm using those as use cases :)
glisse: marcheu: we are talking desktop usage here
glisse: with new app opening anytime
glisse: for games i too want to fill ram and then fill gart
glisse: and avoid at any cost eviction
glisse: this is why i was saying we should be able to change policy dynamically
glisse: so once we go fullscreen we can adapt a policy that will do the best for this case
glisse: but in desktop mode we have to be unfair to everyone
airlied: well the userspace driver decides where to put things, it could do it all itself
glisse: we can allow each app to think it has the whole vram for itself
airlied: never let the kernel choose.