|
Post by reapersms on Jan 29, 2023 19:20:03 GMT -5
Symptoms: The UI is very unresponsive, and if you are able to get into the game, you run around at warp speed, jumps are instant, camping is sped up, etc.

People who would run into this: Nominally anyone with a CPU faster than about 4.2 GHz, with some variation between AMD and Intel.

Root cause: eqgame.exe, as part of initializing its timers, asks EQGraphicsDX9.dll for the CPU speed. EQGraphics cut some corners and used entirely 32-bit math when calculating the speed over a 1 second period, so it returns drastically wrong values past 4.294 GHz or so. At 4.7 GHz, it means eqgame thinks time is progressing about 11x faster than it should.

The fix: By patching 60ish bytes of EQGraphicsDX9.dll, the math can be corrected to handle things correctly until CPUs hit 17 GHz. An IDA DIF file is attached with the particulars; applying it is a bit of an exercise for the reader right now, with a hex editor or some tool that can apply those. A lengthy explanation and line-by-line description of what exactly the patch does follows.

Attachments: EQGraphicsDX9.dll.dif (1.54 KB)
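For the curious, the 11x figure falls straight out of the 32-bit wrap. A quick back-of-the-envelope sketch (just an illustration, not code from either binary):

    #include <stdint.h>
    #include <stdio.h>

    /* One second of timestamp ticks at 4.7 GHz no longer fits in 32 bits, so the
       "ticks per millisecond" value handed back to eqgame wraps and comes out far
       too small -- which makes the game think time is passing much faster. */
    int main(void)
    {
        uint64_t real_ticks_per_sec = 4700000000ULL;           /* 4.7 GHz part */
        uint32_t wrapped = (uint32_t)real_ticks_per_sec;       /* mod 2^32 = 405032704 */

        uint32_t reported_per_ms = wrapped / 1000;             /* what eqgame is told: 405032 */
        uint64_t real_per_ms = real_ticks_per_sec / 1000;      /* what it should be: 4700000 */

        printf("reported %u ticks/ms vs real %llu ticks/ms -> time runs ~%.1fx fast\n",
               reported_per_ms, (unsigned long long)real_per_ms,
               (double)real_per_ms / (double)reported_per_ms); /* about 11.6x */
        return 0;
    }

Anything at 4.294 GHz or below still fits in 32 bits over that one second window, which is why slower machines never see it.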
|
|
|
Post by reapersms on Jan 29, 2023 19:57:13 GMT -5
During startup, eqgame.exe loads EQGraphicsDX9.dll into memory and asks it for a "timestamp ticks per millisecond" value via the exports EQG_GetCpuSpeed2 and EQG_GetCpuSpeed3. These two functions work nearly identically to each other: they poll one of the Windows time functions until they see it change value, grab a baseline timestamp counter value, poll the time function again until a second has passed, grab a new timestamp value, and return (new - old) / 1000. They also have some disabled debug prints in there, but those are leftover cruft.

The difference between the two is which Windows time function they use -- GetCpuSpeed2 uses the Windows multimedia timer (timeGetTime), which has millisecond precision but not necessarily millisecond accuracy, while GetCpuSpeed3 uses the _time32 function, which has second precision and a somewhat variable accuracy.

A rough C reconstruction of EQG_GetCpuSpeed2 looks something like this:

    #include <stdint.h>
    #include <stdio.h>
    #include <intrin.h>    // __rdtsc
    #include <windows.h>   // timeGetTime (winmm)

    uint32_t EQG_GetCpuSpeed2()
    {
        uint32_t time_begin, time_base, timestamp_base, timestamp_end;
        uint32_t rate;
        char Buffer[64];

        // spin until the millisecond counter moves, so we start near a transition
        time_begin = timeGetTime();
        do {
            time_base = timeGetTime();
        } while ((time_base - time_begin) <= 1);

        // only the low 32 bits of the timestamp counter are kept
        timestamp_base = (uint32_t)__rdtsc();

        // spin for 1 second
        while (timeGetTime() - time_base <= 1000)
            ;
        timestamp_end = (uint32_t)__rdtsc();

        // 32-bit subtract and divide -- this is where fast CPUs wrap
        rate = timestamp_end - timestamp_base;
        rate /= 1000;

        // disabled debug print, leftover cruft
        sprintf(Buffer, "TimeGetTime-cpuSpeed: %d\n", rate);

        return rate;
    }
The structure is a little messy, as it's back-constructed from the raw assembly. The first loop waits until the millisecond counter has advanced by at least 2, so it knows it's relatively close to a transition. The second loop waits for 1 second. Where the problem comes in is that it only saves the lower 32 bits of the timestamp counter, and the division code only deals with 32-bit values.

The fix effectively changes timestamp_base and timestamp_end into 64-bit values, and splits the math up into rate = (timestamp_end - timestamp_base) / 4; rate /= 250. I was able to do this without shifting any of the other code around, moving any of the function calls, or requiring any more stack space. The assembly view of the EQG_GetCpuSpeed2 part of the patch looks like this, original on the left, patched instructions to the right of the '|', minus a couple of NOPs where some things got shorter:

    .text:10011CD6 054 83 F8 01        cmp  eax, 1
    .text:10011CD9 054 7E F5           jle  short loc_10011CD0      ; spin until we see it tick from 1->2
    .text:10011CDB 054 33 DB           xor  ebx, ebx
    .text:10011CDD 054 89 5C 24 10     mov  [esp+54h+var_44], ebx   | rdtsc
    .text:10011CE1 054 0F 31           rdtsc ; grab the TSC         | mov  [esp+54h+var_40], edx
    .text:10011CE3 054 89 44 24 10     mov  [esp+54h+var_44], eax
    .text:10011CE7
    .text:10011CE7     loc_10011CE7:                                ; CODE XREF: EQG_GetCpuSpeed2+30↓j
    .text:10011CE7 054 FF D7           call edi                     ; timeGetTime
    .text:10011CE9 054 2B C6           sub  eax, esi
    .text:10011CEB 054 3D E8 03 00 00  cmp  eax, 1000               ; spin for 1 second
    .text:10011CF0 054 7E F5           jle  short loc_10011CE7
    .text:10011CF2 054 89 5C 24 0C     mov  [esp+54h+var_48], ebx   | rdtsc
    .text:10011CF6 054 0F 31           rdtsc                        | sub  eax, [esp+54h+var_44]
    .text:10011CF8 054 89 44 24 0C     mov  [esp+54h+var_48], eax   | sbb  edx, [esp+54h+var_40]
    .text:10011CFC 054 8B 4C 24 0C     mov  ecx, [esp+54h+var_48]   | shrd eax, edx, 2
    .text:10011D00 054 2B 4C 24 10     sub  ecx, [esp+54h+var_44]   | mov  ecx, eax
    .text:10011D04 054 B8 D3 4D 62 10  mov  eax, 10624DD3h          ; / 1000
    .text:10011D09 054 F7 E1           mul  ecx
    .text:10011D0B 054 8B F2           mov  esi, edx
    .text:10011D0D 054 C1 EE 06        shr  esi, 6                  | shr  esi, 4
    .text:10011D10 054 56              push esi
    .text:10011D11 058 8D 54 24 18     lea  edx, [esp+58h+Buffer]
    .text:10011D15 058 68 14 2F 13 10  push offset aTimegettimeCpu  ; "TimeGetTime-cpuSpeed: %d\n"
RDTSC leaves the 64-bit timestamp result spread across EDX and EAX. The C compiler explicitly wrote some zeros into the stack that would get completely obliterated anyway, which left room in the code to save the upper half of the timestamp, and I used the first dword of the debug text buffer to store it. The first change just swaps the zero store and the timestamp read, and changes the zero store into a store of the upper half.

The second patch takes advantage of the compiler wasting some time and space stuffing the later timestamp value through the stack. The RDTSC is moved up an instruction, the real 64-bit difference is calculated into EDX:EAX by subtracting the baseline value, that difference is shifted right 2 bits for a divide by 4, and the lower half is put into ECX to flow right into the divide-by-1000 section.

That is that strange multiply-and-shift chunk. The compiler takes advantage of knowing the divisor at compile time, and turns d / 1000 into a multiply by 10624DD3h (roughly 2^38 / 1000), keeping the upper 32 bits of the 64-bit product and shifting them right another 6 bits. The nasty details of how it comes up with those numbers can be found in chapter 8 of the Athlon Optimization Guide. As it so happens, the only difference between a divide-by-1000 and a divide-by-250 with that constant is the shift amount (6 bits versus 4). By doing the divide-by-4 with the 64-bit shift first, the intermediate value only overflows 32 bits for the later divide once it clears 2^34, a bit over 17 billion ticks per second.

Those are the patch entries for 110DD through 1110F; the 11151 through 1118F range is the same thing, but for EQG_GetCpuSpeed3. The salient differences between them are a slightly different stack offset within the function, and that the _time32 version cleared the existing space with immediate 0 writes, which were much larger instructions, so there are several more NOPs dropped in.
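For anyone who would rather read C than assembly, here is roughly what the patched math works out to, with the reciprocal-multiply trick spelled out. This is just a sketch of the technique -- the function names are made up for this post, and the actual fix is the byte patch above:

    #include <stdint.h>
    #include <stdio.h>

    /* The compiler's divide-by-constant: take the high 32 bits of d * 10624DD3h,
       then shift -- 6 more bits for a divide by 1000, 4 for a divide by 250. */
    static uint32_t div1000_magic(uint32_t d)
    {
        return (uint32_t)(((uint64_t)d * 0x10624DD3u) >> 32) >> 6;
    }

    static uint32_t div250_magic(uint32_t d)
    {
        return (uint32_t)(((uint64_t)d * 0x10624DD3u) >> 32) >> 4;
    }

    /* What the patched sequence computes: keep all 64 bits of both timestamps,
       divide the difference by 4 up front (the shrd eax, edx, 2), then feed the
       32-bit result through the existing divide with the shift changed to /250.
       (diff / 4) / 250 == diff / 1000, but nothing wraps until diff clears 2^34. */
    static uint32_t patched_rate(uint64_t timestamp_base, uint64_t timestamp_end)
    {
        uint32_t quarter_diff = (uint32_t)((timestamp_end - timestamp_base) >> 2);
        return div250_magic(quarter_diff);
    }

    int main(void)
    {
        /* 4.7 GHz worth of ticks over one second: the old code wrapped, this doesn't */
        printf("patched rate: %u ticks/ms\n", patched_rate(0, 4700000000ULL));

        /* spot-check that the 10624DD3h constant really divides for both shift amounts */
        for (uint32_t d = 0; d < 5000000; d += 997)
            if (div1000_magic(d) != d / 1000 || div250_magic(d) != d / 250)
                printf("mismatch at %u\n", d);
        return 0;
    }

That prints 4700000 ticks/ms for a 4.7 GHz part, which is what eqgame should have been seeing all along.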
|
|
|
Post by reapersms on Jan 29, 2023 20:25:36 GMT -5
Historical Timestamp Counter Trivia, and a hypothesis as to why this might not come up on Intel as often, and why it's not quite the same as the other historical AMD issues with it:
Back in the dark ages, when cellphones were beefy enough to kill a man, if you wanted to know how long a sequence of code took, you either had to look at some system timers and do some math to get an approximate number, or spend a long time manually tallying up each instruction the compiler generated, taking loops and the like into account. It was rather tedious and error-prone.
Once the Pentium showed up and started executing multiple instructions at once, with some arcane restrictions, Intel threw everyone a bone and provided the RDTSC instruction. It would return a 64-bit count of cycles since reset. There was much rejoicing, as it was generally far more accurate and lower overhead than most of the other approaches. There were issues with multi-processor setups, where the two counters wouldn't necessarily be in sync with each other, but those were generally ignorable at the consumer level until much later. Another issue was that since it only tracked cycles since reset, you had to work around what happened if the OS set you aside and ran some other long, slow process between your timestamp samples.
Windows provided a somewhat abstracted interface to it, via QueryPerformanceCounter, but while it works, it has a slight bit of overhead above and beyond raw RDTSC instructions -- so naturally game developers had a tendency to bypass it.
Later, after there were enough Pentium and Pentium-compatible chips out there for developers to actually start relying on the feature, that particular quirk came to a head when dual-core processors started showing up in volume. The original AMD dual-core issue reared its head here: if the OS bounced a thread between cores, software could see drastic shifts in the perceived rate of time. Both Intel and AMD had some mechanisms to let the OS smooth that over a bit (generally by letting it reset the count value whenever it moved a task around, so software could treat the TSC as a process-relative time), but there were teething issues with getting Windows updated to take advantage of that.
Things were fine for a while, until the thermal issues started popping up and the CPU fellows had the grand idea to start downclocking or overclocking the cores to stretch that heat budget further. Suddenly, that reliable TSC tick rate went right out the window -- hence the second round of AMD issues, and the "turn off Cool & Quiet" and "have other things running so the core is woken up at startup" style fixes that came around. Around that time, motherboards started providing their own high-quality timer, usually something closer to the bus frequency than the CPU frequency, and Windows would shift QPC over to that on appropriate systems. It wouldn't be as precise as the timestamp counter, but still plenty for things like making sure your game doesn't suddenly jump to 1000 fps and turn inside out.
Somewhat more recently, AMD & Intel have provided indications as to whether the timestamp counter rate changes during execution or not. I believe what happened is Intel chose something along the lines of a small multiplier of the bus frequency, and AMD decided to just have it always track either the normal clock or the boost clock, regardless of whether the core was running at full blast. My Intel machine got hit by lightning though, so I don't have anything on hand to test that theory, but it would somewhat explain the apparent AMD-specificity of the issue.
|
|