Hugepage Library

Transparent Hugepage Support for Linux

Preliminaries:

Since kernel version 2.6, Linux can handle several page sizes concurrently. In addition to the standard 4 KB pages, processes can use so-called hugepages of 2 MB, 4 MB or 16 MB, depending on the architecture. These pages are accessed via HugeTLBfs, a pseudo filesystem. Many applications that use large amounts of memory (especially HPC applications) can benefit from this for the following reasons:
  • the number of TLB misses may decrease noticeably, and
  • data has better physical locality in RAM.
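To see why larger pages can reduce TLB misses, it helps to compare how much memory a TLB with a fixed number of entries can cover for each page size. The following shell sketch does this arithmetic. The entry count of 64 is a hypothetical value chosen for illustration, not a figure for any particular CPU, and tlb_reach_kb is a helper name made up for this example.

```shell
# tlb_reach_kb ENTRIES PAGE_KB
# Prints the amount of memory (in KB) that ENTRIES TLB entries can map
# when each entry covers one page of PAGE_KB kilobytes.
tlb_reach_kb() {
    echo $(( $1 * $2 ))
}

# Hypothetical TLB with 64 entries (assumption for illustration only).
echo "4 KB pages: $(tlb_reach_kb 64 4) KB"      # 256 KB
echo "2 MB pages: $(tlb_reach_kb 64 2048) KB"   # 131072 KB = 128 MB
```

With the same number of entries, 2 MB pages cover 512 times as much memory as 4 KB pages, so an application with a large working set is less likely to touch an address that is not covered by the TLB.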
To avoid rewriting these applications, transparent usage is desirable. This can be achieved with a preload library that is loaded into an application at start time and overrides allocation and deallocation functions such as malloc() and free(). Our so-called "hugepage library" does this and can be downloaded below.

System requirements:

Architecture: x86, x86_64, PPC and PPC64 are currently supported
Kernel version: 2.6; 2.6.16 or later is recommended, since it is needed for MAP_PRIVATE mappings of hugepages
Kernel compile options: CONFIG_HUGETLBFS=y, CONFIG_HUGETLB_PAGE=y
Compiler: GCC
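
The kernel requirements can be checked from the shell before building. The sketch below assumes that the running kernel's version is reported by "uname -r" and that a kernel with HugeTLBfs support lists hugetlbfs in /proc/filesystems. The function names kernel_at_least and has_hugetlbfs are hypothetical helpers and not part of the library.

```shell
# kernel_at_least VERSION MAJOR MINOR PATCH
# Succeeds if VERSION (e.g. "2.6.18-8.el5") is at least MAJOR.MINOR.PATCH.
kernel_at_least() {
    echo "$1" | awk -F'[.-]' -v a="$2" -v b="$3" -v c="$4" '{
        if ($1 + 0 != a + 0) exit !($1 + 0 > a + 0)
        if ($2 + 0 != b + 0) exit !($2 + 0 > b + 0)
        exit !($3 + 0 >= c + 0)
    }'
}

# has_hugetlbfs: reads a /proc/filesystems listing on stdin and succeeds
# if hugetlbfs is among the supported filesystems.
has_hugetlbfs() {
    grep -qw hugetlbfs
}

if kernel_at_least "$(uname -r)" 2 6 16; then
    echo "kernel is 2.6.16 or later (MAP_PRIVATE hugepage mappings)"
fi
if [ -r /proc/filesystems ] && has_hugetlbfs < /proc/filesystems; then
    echo "HugeTLBfs is available"
fi
```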

Download:

You can download the hugepage library here as free software under a BSD license.

Usage:

Extract and compile the library by issuing
	tar xzf hugepage_library.tar.gz
	cd hugepage_library
	make
After everything has compiled cleanly (see the README file for special compile options), you have a 32bit/ subdirectory containing the 32-bit version and a 64bit/ subdirectory containing the 64-bit version. The following runtime prerequisites must be met:
  • Hugepages have been allocated (e.g. by "echo 20 > /proc/sys/vm/nr_hugepages")
  • A HugeTLBfs filesystem has been mounted (e.g. by "mount -t hugetlbfs none /mnt/hugetlb")
  • Permissions of mounted HugeTLBfs have been set correctly
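
The first two prerequisites can be checked with a few lines of shell. This is a rough sketch under the assumption that the kernel exposes the HugePages_Total and Hugepagesize fields in /proc/meminfo and lists mounted filesystems in /proc/mounts. The names pages_needed, meminfo_field and hugetlbfs_mounts are hypothetical helpers and not part of the library.

```shell
# pages_needed MEM_MB PAGE_KB
# Prints how many hugepages of PAGE_KB kilobytes are needed to back
# MEM_MB megabytes of memory (rounded up).
pages_needed() {
    echo $(( ($1 * 1024 + $2 - 1) / $2 ))
}

# meminfo_field NAME: reads /proc/meminfo-style text on stdin and prints
# the numeric value of field NAME (e.g. HugePages_Total, Hugepagesize).
meminfo_field() {
    awk -v name="$1:" '$1 == name { print $2 }'
}

# hugetlbfs_mounts: reads /proc/mounts-style text on stdin and prints
# the mount point of every mounted HugeTLBfs instance.
hugetlbfs_mounts() {
    awk '$3 == "hugetlbfs" { print $2 }'
}

if [ -r /proc/meminfo ] && [ -r /proc/mounts ]; then
    echo "hugepages allocated:  $(meminfo_field HugePages_Total < /proc/meminfo)"
    echo "hugepage size (KB):   $(meminfo_field Hugepagesize < /proc/meminfo)"
    echo "HugeTLBfs mounted at: $(hugetlbfs_mounts < /proc/mounts)"
fi
```

For example, "pages_needed 40 2048" prints 20: an application that needs 40 MB backed by 2 MB hugepages requires 20 pages, which is the value written to /proc/sys/vm/nr_hugepages in the example above.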
You can now test the library by preloading it into a process:
	LD_PRELOAD=32bit/libhugetlbfs.so [command]
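
One way to confirm that the library was actually loaded is to inspect the memory map of the preloaded process. The sketch below assumes a Linux /proc filesystem and uses the library path from the command above. The name maps_has_library is a hypothetical helper and not part of the library.

```shell
# maps_has_library NAME: reads /proc/<pid>/maps-style text on stdin and
# succeeds if some line mentions NAME.
maps_has_library() {
    grep -qF -- "$1"
}

# cat runs with the library preloaded and prints its own memory map.
if [ -e 32bit/libhugetlbfs.so ] &&
   LD_PRELOAD=32bit/libhugetlbfs.so cat /proc/self/maps | maps_has_library libhugetlbfs.so
then
    echo "library is mapped into the process"
fi
```

The library must match the word size of the program it is preloaded into. If cat is a 64-bit binary, the dynamic loader ignores a 32-bit library and the check fails.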

References:

Robert Rex, Frank Mietke, Christoph Raisch, Hoang-Nam Nguyen and Wolfgang Rehm:
Improving Communication Performance on InfiniBand by Using Efficient Data Placement Strategies
Chemnitz University of Technology and IBM Deutschland Entwicklung GmbH, Accepted for publication at the International Conference on Cluster Computing (Cluster 2006), Barcelona, September 2006

Robert Rex:
Enhancing an InfiniBand driver by utilizing an efficient malloc/free library supporting multiple page sizes
Diploma Thesis, Chemnitz University of Technology, September 14, 2006
See also under publications.