About a month ago the new Ubuntu 8.04 was released and I wanted a clean install. I downloaded an image and burned it to a CD. Just before installing, I tried “check CD for defects” and found a few. Turns out (*) this was because of bad memory – and memtest confirmed it.
So I went to the shop, replaced the bad memory, and also bought two new sticks. I went home to install the new Ubuntu, and after the installation, Firefox crashed. After rebooting into memtest, I saw errors again.
Back at the computer shop, they asked me to reproduce the errors. Just firing up the computer and booting directly into memtest didn’t seem to do the trick, so I suspected that I had to overwork my computer a bit to reproduce this.
Since I was at the lab, I didn’t want to muck around too much.
So I thought, “what’s the quickest way to give your CPU a run around the block?”
That’s right – a tight loop:
    while True:
        pass
However, this snippet doesn’t really play with the memory.
The next simplest thing to do, one that also jiggles some RAM, is the following (and it is one of my favorites):
    In [1]: x = 4**(4**4)
    In [2]: y = 4**x
I will talk about this peculiar piece of code in a later post.
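Until then, here is a back-of-the-envelope sketch of the sizes involved. It is only arithmetic: it never computes y, and the variable names other than x are mine, made up for illustration.

```python
# Sizes involved in y = 4**x, worked out without ever computing y itself.
x = 4 ** (4 ** 4)            # 4**256, which equals 2**512
bits_in_x = x.bit_length()   # x itself is small: a few hundred bits

# 4**x equals 2**(2*x): a one followed by 2*x zero bits.
bits_in_y = 2 * x + 1
bytes_in_y = bits_in_y // 8

print("bits in x:", bits_in_x)
print("decimal digits in the byte count of y:", len(str(bytes_in_y)))
```

The byte count alone is a number with well over a hundred digits, so the interpreter has no hope of storing y and gives up with a MemoryError.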
In any case, this snippet also didn’t reproduce the error. It is also quite unwieldy, as it raises a MemoryError after some time. Later at home I tried two more scripts.
The first is a variation on the one above:
    x = 4**(4**4)
    while True:
        try:
            y = 4**x
        except MemoryError:
            pass
I ran a few of those in parallel. However, my Ubuntu machine killed the processes running this script, one by one.
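For the record, I just started the copies by hand. If you would rather script it, something like the following could work. It is a minimal sketch: launch_workers, stop_workers and the file name stress.py are names I made up for illustration.

```python
import subprocess
import sys


def launch_workers(script_path, count):
    """Start `count` independent Python processes running the same script."""
    return [subprocess.Popen([sys.executable, script_path]) for _ in range(count)]


def stop_workers(workers):
    """Terminate every worker that is still running and wait for it to exit."""
    for worker in workers:
        if worker.poll() is None:
            worker.terminate()
        worker.wait()
```

You would call launch_workers("stress.py", 4), let it run for a while, and then call stop_workers on the result. On Linux, a negative returncode on a worker means it was ended by a signal, which is one way to notice that the system killed it.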
The second is smarter. It allocates some memory and then just copies it around:
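    import sys
    import copy

    # Rough amount of data to allocate, in megabytes, from the command line.
    megabytes = int(sys.argv[1])
    # Each inner list holds 1000 strings of about 1000 characters: roughly 1 MB.
    x1 = [["a"*1000 + str(i) for i in range(1000)] for j in range(megabytes)]
    # Deep-copy the nested lists over and over, forever.
    while True:
        x2 = copy.deepcopy(x1)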
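One caveat: copy.deepcopy treats strings as atomic and reuses them, so that loop mostly copies the list structure rather than the character data. A variation that copies the raw bytes and also checks each copy against the original might look like the sketch below. I did not run this at the time, and churn_and_verify is a name I made up for illustration.

```python
import os


def churn_and_verify(megabytes, rounds):
    """Copy a buffer `rounds` times and count copies that differ from it."""
    # One megabyte of random bytes, repeated to reach the requested size.
    source = bytearray(os.urandom(1024 * 1024)) * megabytes
    mismatches = 0
    for _ in range(rounds):
        duplicate = bytearray(source)   # a real byte-for-byte copy
        if duplicate != source:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print(churn_and_verify(megabytes=16, rounds=10))
```

On healthy memory this should always print 0. It is no substitute for memtest, since it can only touch whatever memory the operating system hands to the process.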
Neither of these scripts reproduced the problem, which kept showing up at random, so I returned the computer to the lab. Turns out that the two replacement sticks and the two new sticks weren’t exactly identical, and that was the cause of the problem. So now my memory is well again.
As for the scripts above, I once wrote a similar script at work. I was asked to help stress-test some software, and the goal was to simulate a heavily used computer. A few lines of Python later and the testing environment was ready.
Footnotes:
(*) – Finding out that it was a memory issue wasn’t as easy as it sounds. I didn’t think of running memtest. I checked the image on my HD with md5, and the hash didn’t match. I downloaded a second image, and again the hash didn’t match. I checked twice.
At this point I was really surprised: not only did the second check not match the published md5, it also didn’t match the first check. Some hours and plenty of voodoo later, a friend suggested running memtest, and the culprit was found.
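The same double check is easy to do from Python. This is a minimal sketch using the standard hashlib module; md5_of_file and hashes_are_stable are names I made up for illustration, and at the time I simply used the command-line md5 tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_are_stable(path, runs=2):
    """Hash the same file several times and report whether all runs agree."""
    return len({md5_of_file(path) for _ in range(runs)}) == 1
```

Two runs over the same unchanged file should always agree. If they don’t, the file is not the problem, and the hardware deserves a closer look.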
Next time (well, hope never ;o), maybe try fooling around, first:
http://www.linuxjournal.com/article/4489
http://badmem.sourceforge.net/
http://rick.vanrein.org/linux/badram/
http://www.ibm.com/developerworks/library/l-hw1/
Just for the fun of it, of course.
;o)