Partition Data instead of Archiving (PNA)

Having history on line and available while improving response times and optimizing backup times.

A modern way of storing past data without the need of tedious archiving and recovery processes.

One of the biggest problem that IT/CIO managers have always faced is the growth in terms of “volume” of data of their system after years of activity.

The side effects created by this problem are a continuative decrease of the system performance, a continuative increase of the needed time for data administration and backup, additional cost for the hardware (disk space and disk performance).

Data archiving is the old school solution, but in Datatex we believe that the “table partitioning by data” methodology is a better and more effective alternative considering the possibilities provided by the latest technology. Data archiving require an archiving process onto additional infrastructure and a tedious complex recovery programs when this data is required to be accessed.

The application of “table partitioning” showed in many cases the following results:

Improve response times

Improve back up times

Optimize usage of fast expensive disks

This article aims to introduce a technology which with the possibility of partitioning the tables we can avoid the use of data archiving. It is an extremely useful technology as it improves the efficiency of the database and consequently the performance of the system. The partitioning of tables is automatically and transparently managed by the database itself, without creating additional work for the application or for the EDP staff that manages the system. Basically, it consists of dividing a table into several parts that can be managed separately. The division is made according to logical criteria that we will explain below. Normally the most used criterion is the division by date: you can divide a table keeping in one part the most recent data and in the others the most historical ones, which, being consulted much less frequently, have a lower impact on the system.

In addition to defining the partitioning logic, we will also define the positioning on the system disks, keeping for example the most consulted partitions on the most performing disks and the partitions with historical data on the less efficient disks.

The backup phase can also benefit from this partitioning System because it can be done on single partitions in a quicker way. The technology of table partitioning is very useful because when a database is very big, the management of archived objects can be very expensive and problems can arise with data retention and recovery.

This solution solves these problems, providing fast response times and allowing to organize data effectively, in a customizable way, avoiding relevant high costs and without moving older data to an external  archive.

There are multiple methods of partitioning a table:

This way, thanks to an efficient organization of the disks,  recent and most used data can be located in the most modern disks. With more memory costs will be  reduced  since the most performing disks are used for most recent data and the less expensive disks for the less used data storage. This leads to faster response speed and lower costs. In addition, all administrative database tasks (such as backups) are optimized because they can be performed on individual partitions.

A further strength of this solution, compared to external storage, is that data always remains online and is not moved to any external device; this is very important to ensure uniformity and efficient retrieval of data. All data stays online, only changes its organization as it is moved to different disks and partitions that offer different performances depending on how frequently it us used.

This solution offers the best balance solving performance and time issues mentioned at the beginning. It is also an operating system independent solution and transparent with respect to the management system used: it can be  used with any database with no need to change anything of its pre-existing settings, this solution does not impact all the activities carried out and decisions made previously.

We can then summarize the benefits of table partitioning as follows:

Improved response times and query performance: The DB2 optimizer is aware of the partitions and if the query needs to scan many rows to get a result set and the predicate (where clause) uses ranges that have been defined, then only partitions that have rows that satisfy the query will be searched instead of searching the entire table.  This is called “partition elimination” and can greatly reduce query elapsed times.

Improved backup times: Various maintenance activities like reorganization can be done on individual partitions which will be very quick compared to reorganize the entire table. Working with separate partitions is easier and faster as certain operations can be chosen on individual spaces, and this will reduce time. Reduced backup times means that system resources are available for user transactions other than saving them on backup storage.

Optimization of the disks’ usage: Data remains online and there is no need to move it to external devices. It allows an efficient use of storage through a different organization: table partitioning helps us to better use storage, in fact old partitions which are accessed less frequently, can be placed on old/slow storage while in the newer and more performing storage there are the most used and recent info.

High degree of flexibility and customization: Tables can be customized and partitioned to fit specific tasks. It is very easy to adjust partitioning criteria to the specific needs, which can change over time. In fact, it allows optimized roll-in or roll-out processing of ranges: new partitions can very quickly be attached, and old partitions can quickly be detached as needed.  In this process loading or inserting data into a new partition before making it part of the partitioned table.  This load process has no more effect on the use of our subject table than the load of any other independent table in the database.  Once loading is complete to the new partition the ATTACH command is used to integrate the new partition into the primary Range Partition Table.

The best partition criteria will be selected According to each table and data distribution: g. on StockTransaction the most suitable criteria is the TRANSACTIONDATE or on other table it can be the CREATIONDATETIME and on ADSTORAGE can be the UNIQUEID but also based on real data by ENTITYNAME.

Datatex has conducted tests on real customer environment in order to measure the potential benefits of the solution in terms of end-user performance and results show that benefits can change according to different data and different way of using the ERP system. All factors influence the result and the most important thing is to work all together to understand which are the best partitions for the performance.

Written by Roberto Mazzola – Datatex CTO