A Primer on Data Scrubbing, and Why It’s Essential to Keeping Small Businesses Fresh

By: Hewitt Lee, Director of Synology Product Management Group

Many people assume that digital data resists deterioration far better than physical objects: think of crinkled and torn documents, yellowed pages, and faded ink and photographs. Yet over time, data can fall victim to slow degradation that undermines its integrity. Worse still, this can happen silently, without any warning.

This matters especially to entrepreneurs. Small business owners rely on data records to run their organisations and to identify both potential problems and business opportunities. Corrupted data can have serious implications, such as flagging issues or opportunities that don’t exist. One of the primary ways to deal with the problem proactively is data scrubbing: using software to inspect data volumes and correct any inconsistencies it detects.

There are two main types of data scrubbing: RAID scrubbing and Btrfs data scrubbing. In businesses with smaller teams, it is useful for leaders to familiarise themselves with the advantages and limitations of each.

RAID scrubbing

RAID stands for redundant array of independent disks. Simply put, it combines multiple drives into a single storage pool, offering fault tolerance and data redundancy, and is a standard method of data storage management. There are different RAID levels or configurations depending on the needs of users, offering different levels of reliability, availability, performance, and capacity.

For example, RAID 5 stripes data across at least three drives; if one of the drives fails, RAID 5 rebuilds the missing data from the content on the remaining drives. It does this by storing “parity blocks” across the drives, calculated from the data with a mathematical function (typically XOR), so that any single missing block can be recalculated from the others.
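
To make the parity idea concrete, here is a minimal sketch in Python of one stripe with two data blocks and one parity block. It assumes XOR parity, the usual choice for RAID 5; the block contents and the helper name xor_blocks are invented for illustration, and a real array works on fixed-size chunks at the driver level rather than on Python byte strings.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe spread over three drives: two data blocks plus one parity block.
data_a = b"INVOICE1"
data_b = b"PAYROLL2"
parity = xor_blocks([data_a, data_b])

# Simulate losing the drive that held data_b, then rebuild it from the
# surviving data block and the parity block.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b)  # b'PAYROLL2'
```

Because XOR is its own inverse, the same function that produces the parity also recovers whichever single block is missing.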

Failing to recover your data is a serious matter, so it’s vital to keep data consistent. RAID scrubbing scans all the contents of an array, making sure the parity chunk in each stripe satisfies the parity function; if a stripe fails the check, its parity is recalculated until all values are consistent.

Unfortunately, data cannot be guaranteed to remain intact even with a regular RAID scrubbing schedule. The problem is that RAID scrubbing can only ensure data consistency: it can tell that a stripe does not add up, but not which block is incorrect. If a data block is corrupted, the parity recalculated from it will match the corrupt data, leaving the stripe “consistently corrupt”. As such, relying solely on RAID scrubbing poses a potential risk for businesses.
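
The limitation can be illustrated with a simplified model of a parity-only scrub. This sketch assumes XOR parity and a scrub that trusts the data blocks and rewrites the parity on a mismatch; the function names and sample values are hypothetical, and real RAID implementations differ in detail.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_is_consistent(data_blocks, parity):
    """A stripe is consistent when the stored parity matches the data."""
    return xor_blocks(data_blocks) == parity

def parity_scrub(data_blocks, parity):
    """Return parity that matches the data, trusting the data as it is."""
    if stripe_is_consistent(data_blocks, parity):
        return parity
    return xor_blocks(data_blocks)  # recalculated from possibly bad data

good = [b"SALES=10", b"COSTS=04"]
parity = xor_blocks(good)

# Silent corruption changes one byte in the first data block.
damaged = [b"SALES=90", good[1]]

mismatch_detected = not stripe_is_consistent(damaged, parity)
new_parity = parity_scrub(damaged, parity)
consistent_after = stripe_is_consistent(damaged, new_parity)
still_corrupt = damaged[0] != good[0]

print(mismatch_detected, consistent_after, still_corrupt)  # True True True
```

The scrub notices the mismatch and restores consistency, but the wrong sales figure is still on disk, now backed by parity that agrees with it.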

Btrfs data scrubbing

File system data scrubbing employs a checksum mechanism to check the volumes in the Btrfs file system. If it detects data that is inconsistent with its checksum, the system will try to repair the data from the redundant copy. Once you enable data checksum when creating a shared folder, the Btrfs file system will calculate a checksum (data checksum) for every written file, and further protect that data checksum with another checksum (metadata checksum).

Every time data scrubbing is conducted, the file system will recalculate the checksum and compare it with the previously stored data checksum. Meanwhile, the data checksum will cross-check its corresponding metadata checksum to make sure the data checksum itself is intact. Once data corruption is detected, the system will try to repair the corrupt data by retrieving the redundant copy.
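
The checksum flow described above can be sketched roughly as follows. This is an illustrative model only: it uses Python's zlib.crc32 as a stand-in checksum, keeps two in-memory copies as the "redundant copy", and the StoredBlock class and scrub function are hypothetical names, not part of Btrfs or any vendor's software.

```python
import zlib

def checksum(data: bytes) -> int:
    """Stand-in checksum; real file systems use their own algorithms."""
    return zlib.crc32(data)

class StoredBlock:
    """Hypothetical record: two copies of the data plus both checksums."""
    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.data_checksum = checksum(data)
        # The metadata checksum protects the data checksum itself.
        self.metadata_checksum = checksum(self.data_checksum.to_bytes(4, "big"))

def scrub(block: StoredBlock) -> str:
    # First make sure the stored data checksum is itself intact.
    if checksum(block.data_checksum.to_bytes(4, "big")) != block.metadata_checksum:
        return "checksum record damaged"
    good = [c for c in block.copies if checksum(bytes(c)) == block.data_checksum]
    if not good:
        return "unrecoverable"
    if len(good) == len(block.copies):
        return "clean"
    # Overwrite every bad copy with a copy that passed verification.
    for i, copy in enumerate(block.copies):
        if checksum(bytes(copy)) != block.data_checksum:
            block.copies[i] = bytearray(good[0])
    return "repaired"

block = StoredBlock(b"Q3 revenue: 120000")
block.copies[0][3] ^= 0xFF      # simulate silent corruption in one copy
first_pass = scrub(block)       # "repaired"
second_pass = scrub(block)      # "clean"
```

Unlike the parity check, the checksum identifies which copy is wrong, so the repair can be made from the copy that is known to be good.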

One thing to note, though, is that the Btrfs data checksum may take a toll on system performance. Enabling it is not recommended for shared folders that store databases, virtual machines, or surveillance video recordings. Rest easy if you only store documents or photos in shared folders, or use them for file access and sharing, as the checksum has only a modest effect on performance in those cases.

Keeping data integrity risk at bay

Given all this, the two data scrubbing methods work best when combined. Some data management solutions offer both: when data scrubbing runs on a Btrfs volume, file system data scrubbing is performed first to make sure the data is accurate, and RAID scrubbing follows to bring the array back to a consistent state. Together they mitigate the risk of silent data corruption and help businesses maintain a healthy storage system.
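
As a rough sketch of why the order matters, the following Python example runs a checksum pass before a parity pass on a single stripe. It rests on several assumptions made for illustration: XOR parity, zlib.crc32 as the checksum, and a redundant copy that is rebuilt from the parity. The scrub_volume function is hypothetical and does not describe how any particular product implements its scrubbing.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def scrub_volume(data_blocks, checksums, parity):
    """Hypothetical two-phase scrub over one stripe."""
    blocks = list(data_blocks)
    # Phase 1 (file system scrub): find blocks that fail their checksum and
    # rebuild each one from the other blocks plus the parity.
    for i, block in enumerate(blocks):
        if zlib.crc32(block) != checksums[i]:
            others = blocks[:i] + blocks[i + 1:]
            candidate = xor_blocks(others + [parity])
            if zlib.crc32(candidate) != checksums[i]:
                raise ValueError(f"block {i} cannot be repaired")
            blocks[i] = candidate
    # Phase 2 (RAID scrub): with the data verified, refresh parity if stale.
    if xor_blocks(blocks) != parity:
        parity = xor_blocks(blocks)
    return blocks, parity

good = [b"ORDER-001", b"ORDER-002"]
checksums = [zlib.crc32(b) for b in good]
parity = xor_blocks(good)

# Silent corruption in the first data block.
damaged = [b"ORDER-0X1", good[1]]
repaired, new_parity = scrub_volume(damaged, checksums, parity)
print(repaired == good, new_parity == parity)  # True True
```

Running the checksum pass first means the parity pass only ever recalculates from data that has already been verified, which avoids the "consistently corrupt" outcome.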