CPUs are so sophisticated that they make mistakes more and more often

In data centers, incorrect calculations are more and more common. (Image: Facebook)

Google and Facebook are encountering unpredictable calculation errors more and more often, and these can be traced back to the CPU. This can have serious consequences.

Until now, CPUs have been considered largely reliable in their calculations, despite occasional errors. That seems to be changing, as Google, among others, reports. Increasingly sophisticated CPUs apparently miscalculate more and more often, which is particularly evident in large data centers. Facebook also recently noted an increase in so-called "silent data corruption".

Google and Facebook: CPUs corrupt data

Google engineer Peter Hochschild reported last week at the Hot Topics in Operating Systems (HotOS) 2021 conference that production teams at the search engine company were increasingly complaining about machines that corrupted data. The machines reportedly damaged various stable, otherwise error-free applications. Conventional investigations, however, turned up no errors, according to a corresponding report.

The Google engineers then turned their attention to the hardware. The result: hardware errors occurred more frequently than expected. In addition, the problems appeared sporadically, often long after installation, and typically affected individual CPU cores. Google describes the phenomenon as silent corrupt execution errors (CEEs) and the misbehaving cores as "mercurial", that is, unpredictable.

Google blames CPU designs

Back in February, Facebook published a report in which the social media group described silent data corruption as a phenomenon that now occurs more often in data centers than would be expected. Facebook did not give a reason for this. For Google, meanwhile, it is clear that ever faster computation and ever smaller CPU designs are responsible, as The Register writes.

The problem: the calculation errors can have serious consequences. A CPU in a Google data center is said to have carried out what amounted to an accidental ransomware attack: the machine encrypted data incorrectly, in such a way that only it could decrypt the data again. The experts also see crashes and data loss as growing challenges. Google and Facebook now want to expand their tests to find solutions to the problem.
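
To give a rough idea of what such a test can look like, here is a minimal Python sketch. It is not the method used by Google or Facebook, whose test procedures are not described here. The function names `workload` and `screen` are hypothetical, chosen only for illustration. The idea: run a deterministic calculation several times and check whether every run produces the same result.

```python
import hashlib


def workload(seed: int, rounds: int = 1000) -> str:
    """Hypothetical deterministic workload: a chain of SHA-256 hashes.

    On correctly working hardware, the same seed always yields the same digest.
    """
    data = seed.to_bytes(8, "little")
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()


def screen(seed: int, repeats: int = 5) -> bool:
    """Run the workload several times; return True only if all results agree."""
    results = {workload(seed) for _ in range(repeats)}
    return len(results) == 1


if __name__ == "__main__":
    # A disagreement between runs would hint at a sporadic calculation error.
    print("consistent" if screen(42) else "MISMATCH: possible silent corruption")
```

A check like this can only catch errors that occur sporadically. A core that always miscalculates in the same way would agree with itself, so a realistic test would also have to compare the result against a reference value computed on a different machine.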
