Borked RAM
lolfighter
Snark, Dire Join Date: 2003-04-20 Member: 15693 Members
in Off-Topic
<div class="IPBDescription">Now what?</div>So here's the deal: My computer usually has two RAM sticks (we'll call 'em stick A and B) in two RAM slots (slot 1 and 2). Suddenly my computer was crashing a lot: individual programs, drivers, bluescreens, the whole shebang. So I run memtest, and lo and behold, ######loads of errors. So I yank stick B (in slot 2) out and test again. No errors found. I'm now going to run the computer for a while to see if it's stable.
Assuming it remains stable, I take this to mean that slot 1 and stick A are both good, which means either slot 2 or stick B is bad, or both. This means I have two options going forward:<ul><li>Remove stick A from slot 1 and put stick B in slot 1. If errors, stick B = bad. If not, slot 2 = bad.</li><li>Move stick A from slot 1 to slot 2. If errors, slot 2 = bad. If not, stick B = bad.</li></ul>Any comments? Does it matter which one of the methods I use? Is one of them inadvisable? And I should probably try both to be sure, shouldn't I?
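The elimination reasoning above can be sketched as a small brute-force check. This is only a rough illustration, not a diagnostic tool: the part names, `predicts_errors` and `consistent_hypotheses` are made up for the example, and it assumes a bad stick or slot fails every test it takes part in (no intermittent faults).

```python
from itertools import product

# Hypothetical labels for the four suspects in the first post.
PARTS = ("stick A", "stick B", "slot 1", "slot 2")


def predicts_errors(bad_parts, config):
    """Assumption: a test shows errors if any installed stick or used slot is bad."""
    return any(stick in bad_parts or slot in bad_parts for stick, slot in config)


def consistent_hypotheses(observations):
    """Return every set of bad parts that agrees with all (config, saw_errors) results."""
    survivors = []
    for flags in product([False, True], repeat=len(PARTS)):
        bad = frozenset(part for part, is_bad in zip(PARTS, flags) if is_bad)
        if all(predicts_errors(bad, config) == saw for config, saw in observations):
            survivors.append(bad)
    return survivors


# The two memtest runs described so far.
BASE = [
    ((("stick A", "slot 1"), ("stick B", "slot 2")), True),  # both sticks in: errors
    ((("stick A", "slot 1"),), False),                        # stick A alone: clean
]

# Option 1 from the list: stick B alone in slot 1, supposing it shows errors.
OPTION_1_ERRORS = BASE + [((("stick B", "slot 1"),), True)]

print([sorted(h) for h in consistent_hypotheses(BASE)])
print([sorted(h) for h in consistent_hypotheses(OPTION_1_ERRORS)])
```

Under those assumptions, the first two runs leave three possibilities: stick B bad, slot 2 bad, or both. If option 1 shows errors, stick B is bad but slot 2 is still not cleared, so running both swaps is what narrows it down to a single answer.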
Comments
<a href="http://www.downloadmoreram.com/" target="_blank">http://www.downloadmoreram.com/</a>
But otherwise, sounds good.
Today: Computer dies again, I test RAM again, RAM is bork. On a whim, I move RAM stick A from slot 1 to slot 2 (I'm switching numbering system now, stick B was in slot 3 until I removed it, 'kay?) and test, test runs well. In other words, no matter where the problem lies, it isn't consistent. This is going to be fun...
The options I am looking at right now are:<ul><li>One or both sticks are kaput.</li><li>One or both of slot 1 and 3 are kaput (both would be somewhat plausible since slot 1 and 3 (and 2 and 4 for that matter) can be used for dual-channel shenanigans, thus that "channel" could be out of order).</li><li>The entire memory controller is kaput.</li></ul>And on top of that, whatever the error is it's intermittent. This is going to be fun, huh?
The slot order (1 and 3, or 2 and 4) only matters if you want to use the RAM in dual-channel mode. That mode only works when you have two paired sticks, and it allows faster memory access.
Intermittent errors = pain in the arse. Typically these are heat related, though, i.e. when cool, things work OK, but eventually it heats up and causes the error.
If you want to resolve this once and for all, I'd just re-build. However, it sounds like your sticks are OK, but the memory accessing system on the mobo is borken. Have you tried placing them in slots 2 and 4 and using the 2nd channel instead? Or, more correctly, will the mobo let you? Typically they don't actually care what order; just make sure, if you want to use the dual-channel support, that the paired sticks go in paired slots.
That's what she said!
Yeah. The other option is go through their annoying help lines and then get them to "fix" it for you by shipping it back to them. Sometimes though they just do some basic tests and ship it back. Or replace some component.
Heh, I had an HP laptop (the one with a rotating touch screen) that didn't want to boot up (not even powering up, no lights, no nothing). I sent it back for repairs, they tested it and sent it back.
Their verdict?
Nothing wrong.
The computer still wasn't working of course. After much warranty seal-breaking, unscrewing and other taking apart-ing, I (note that I != HP 'service') found out that the screen was short-circuiting something or other, which basically meant that I could only work on my laptop after removing the screen. I guess their testing method is automated: they don't even look at the PC and just toss it in the "test machine".
I've known more than a few people who owned HP pcs and laptops that suddenly went dead-dead as you describe. Coincidentally it usually happened just after warranty expiration.
This is why I refuse to buy recertified products.
They're previously used. The seller claims they work, but since they've been used, their lifespan has already been chopped by a few months or years.
Also try what Rob said if you can find another PC with mobo/OS that supports the type of RAM you have.
Not trying to be patronising, but in case you didn't know, you can quickly check how much RAM your PC is recognising either in the BIOS or during the boot sequence (if you aren't set up to skip that screen).
Don't forget about parts people sent back claiming they don't work.
The tests say it does work? Put it in a box and resell it!
Which is mostly funny, because it was new.
So why do I say that? Well, since my memory woes began I have replaced memory, processor AND mainboard. And still the problems persisted. I had sort of given up, sort of learned to live with, and sort of gotten really angry at computer hardware in general. And then I sort of discovered I had more money than I thought I had and could thus budget in a new graphics card.
So why'd I buy a new graphics card when I had a GeForce 8800GTS, a card that is still far from poor by today's standards? Because the temperature sensor was busted, and it really, really annoyed me. Imagine this: You boot up the computer and the fan runs at full blast. And it keeps running at full blast. Until you start doing something graphically demanding, at which point the fan goes down to 60% (the lowest setting by default) and stays there while the GPU gets hotter.
So I installed Rivatuner and started controlling the fan manually. Launch a game, fan to full power. Quit game, fan to low power. Except sometimes when the computer booted up (defaulting to my low power setting) apparently something in the card would try to override the manual settings and set the fan to full, resulting in a sort of tug-of-war between the two, with the fan constantly spooling up and down. Very annoying, and the only way to get it to stop was to "prime" the GPU with something heat-generating. Add to that inconsistent temperature readings and the card was obviously in poor shape. But hey, it was a fan/temp sensor problem, right? Couldn't possibly be affecting my computer stability, right? So I let that problem slide for a while. For far, far too long as it eventually turned out.
Because guess what? The problem's gone now, after I yanked the old graphics card and put a new one in. It's been over a week with no program or system crashes, so I'm reasonably certain it's over now. So there you have it: A faulty graphics card causing memory errors. You heard it here first, kids. I suppose it's not that implausible really - the graphics card IS hooked up to the north bridge, and thus if it screws up it's possible that it could make the north bridge and/or any associated units go "ach, mein Leben!" It's still a weird error that was hard to diagnose, although in hindsight I should've spotted it long ago.
On a personal note, the USB-code BSOD crashes are NOT necessarily caused by any USB controllers.
I think what eventually made me begin to suspect what the actual problem might be (I DID suspect the GPU even before I swapped it out - it was the only unit LEFT that could be the problem!) was that the problem would almost inevitably disappear if I powered the system down entirely (switched off on the power bar) for 10+ minutes. Apparently a mere shutdown didn't entirely re-initialise all the relevant components. And all of this was odd enough to make me begin to suspect that the problem was actually showing up in a different device than the one that was causing it through some sort of failure propagation.
Also, sudden flash of insight. I may actually have some leftover memory chips that I thought to be faulty but which might still be good. I'll have to see if they fit with my new ones. :D
So is your memory still faulty or was it indeed the GPU? If it was really the GPU doing something funky with the memory, then your memory should be fine after it gets cleared.
Specifically, your mobo/CPU maps peripherals into the memory address space in fun ways so it can talk to them (memory-mapped I/O). So, yes, peripherals screwing with things can and will influence the rest of your computer.
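For anyone wondering what "mapping" means here, a toy sketch of the idea: one address space, with some ranges going to RAM and others going to devices. The layout, the region names and the `route` helper are all made up for illustration, real maps differ per chipset and BIOS, and this does not model how a faulty card would actually cause errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One contiguous address range and what answers for it."""
    name: str
    start: int
    size: int

    def contains(self, addr: int) -> bool:
        return self.start <= addr < self.start + self.size


# Hypothetical 32-bit layout, invented for this example only.
MEMORY_MAP = [
    Region("system RAM", 0x0000_0000, 0xC000_0000),
    Region("GPU aperture", 0xC000_0000, 0x1000_0000),
    Region("other devices", 0xD000_0000, 0x3000_0000),
]


def route(addr: int) -> str:
    """Return the name of whatever would answer an access to this address."""
    for region in MEMORY_MAP:
        if region.contains(addr):
            return region.name
    return "unmapped"


for addr in (0x0000_1000, 0xC000_0000, 0xFFFF_FFFF):
    print(hex(addr), "->", route(addr))
```

The point is only that RAM and peripherals sit behind the same addressing machinery, which is why trouble in one can plausibly show up looking like trouble in the other.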
And so yes, apparently replacing the GPU fixed it all, in theory. Assuming that was the only problem.
I broke one memory module and my gfx card >.>