 
 
The Ancient Handwritten Characters Database 
(CASIA-AHCDB) is designed for character recognition research. The database 
contains more than 2.2 million annotated character samples in 10,658 classes. 
The samples are drawn from more than 12,000 pages of annotated ancient Chinese 
handwritten documents. Based on the source of the documents, the database is 
divided into two sub-databases: Complete Library in Four Sections (style1) and 
Ancient Buddhist Scriptures (style2). Each sub-database is divided into three 
parts according to intended use: a basic category set, an enhanced category set, 
and a reserved category set. The basic category sets of style1 and style2 share 
the same 2,365 classes, while the enhanced category sets of style1 and style2 
have no classes in common. The reserved category sets are not split into 
training and test sets because they contain too few samples.
Style1 contains 25 books, numbered “book_01” to 
“book_25”. Books 01 and 02 were written by the same person, as were books 03 
and 04, books 05 and 06, and books 07 and 08; the remaining books were written 
by different people. Books 01-20 form the training set and books 21-25 form 
the test set.
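The Style1 split rule can be sketched in a few lines of Python. This is only an illustration: the names `book_number`, `style1_split`, and `same_writer` are hypothetical helpers, and the sketch assumes book identifiers follow the “book_NN” pattern quoted above. The downloadable archives may already be organized by split, in which case no such helper is needed.

```python
import re

# Writer pairs stated in the description: each pair was written by one person.
SAME_WRITER_PAIRS = {(1, 2), (3, 4), (5, 6), (7, 8)}


def book_number(book_name: str) -> int:
    """Extract the number from an identifier such as 'book_07'."""
    match = re.fullmatch(r"book_(\d{2})", book_name)
    if match is None:
        raise ValueError(f"unexpected book name: {book_name!r}")
    number = int(match.group(1))
    if not 1 <= number <= 25:
        raise ValueError(f"Style1 has books 01-25, got {number:02d}")
    return number


def style1_split(book_name: str) -> str:
    """Books 01-20 are training data, books 21-25 are test data."""
    return "train" if book_number(book_name) <= 20 else "test"


def same_writer(book_a: str, book_b: str) -> bool:
    """True if the two books are stated to share a writer."""
    low, high = sorted((book_number(book_a), book_number(book_b)))
    return low == high or (low, high) in SAME_WRITER_PAIRS
```

The writer pairs matter mainly when carving a validation set out of the training books: keeping both books of a pair on the same side avoids leaking a writer's style across the boundary.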
Style2 contains Buddhist scripture documents 
from 10 different periods. The writer of each volume can no longer be 
identified. Volumes are numbered by period and volume; for example, volume 001 
of period 01 is numbered “period_01/volume_001”. Scriptures from periods 09-10 
form the training set and scriptures from periods 01-08 form the test set.
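As a minimal sketch of the Style2 rule, the function below maps a volume identifier to its split. The name `style2_split` is hypothetical, and the code assumes identifiers follow the “period_NN/volume_NNN” pattern quoted above; the actual directory layout of the download is not described here and may differ.

```python
import re


def style2_split(volume_id: str) -> str:
    """Periods 09-10 are training data, periods 01-08 are test data."""
    match = re.fullmatch(r"period_(\d{2})/volume_(\d{3})", volume_id)
    if match is None:
        raise ValueError(f"unexpected volume identifier: {volume_id!r}")
    period = int(match.group(1))
    if period in (9, 10):
        return "train"
    if 1 <= period <= 8:
        return "test"
    raise ValueError(f"Style2 has periods 01-10, got {period:02d}")
```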
Table I. Structure and Statistics of CASIA-AHCDB
| Database | Style | Category set | Split | Classes | Characters |
| --- | --- | --- | --- | --- | --- |
| CASIA-AHCDB | Style1 | Basic Category | Train | 2,365 | 832,939 |
| | | | Test | 2,365 | 254,162 |
| | | Enhanced Category | Train | 3,227 | 89,204 |
| | | | Test | 3,227 | 36,258 |
| | | Reserved Category | Not split | 3,819 | 19,763 |
| | Style2 | Basic Category | Train | 2,365 | 728,423 |
| | | | Test | 2,365 | 204,547 |
| | | Enhanced Category | Train | 783 | 71,179 |
| | | | Test | 783 | 19,597 |
| | | Reserved Category | Not split | 2,450 | 8,213 |
| Summation | | | | 12,229 | 2,264,285 |
Table II. GNTX Format
| Item | Length | Comment |
| --- | --- | --- |
| Sample size | 4 bytes | Number of bytes for one sample |
| Unicode | 4 bytes | Unicode code of the character |
| Width | 2 bytes | Number of pixels in a row |
| Height | 2 bytes | Number of rows |
| Bitmap | width * height bytes | Stored row by row |
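The layout in Table II can be read with a short Python sketch like the one below. Several points are assumptions rather than facts stated by the table: the integer fields are taken to be little-endian and unsigned, and the 4-byte Unicode field is returned as raw bytes because its encoding is not specified. The names `GntxSample` and `read_gntx` are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple

# Header per Table II: sample size (4), Unicode (4), width (2), height (2).
# Assumption: little-endian unsigned integers; the table gives no byte order.
HEADER = struct.Struct("<I4sHH")


class GntxSample(NamedTuple):
    sample_size: int    # value of the "Sample size" field, as stored
    label_bytes: bytes  # raw 4-byte Unicode field, left undecoded
    width: int          # number of pixels in a row
    height: int         # number of rows
    bitmap: bytes       # width * height bytes, stored row by row


def read_gntx(stream: BinaryIO) -> Iterator[GntxSample]:
    """Yield samples from a binary stream until it is exhausted."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise ValueError("truncated sample header")
        sample_size, label_bytes, width, height = HEADER.unpack(header)
        bitmap = stream.read(width * height)
        if len(bitmap) != width * height:
            raise ValueError("truncated bitmap")
        yield GntxSample(sample_size, label_bytes, width, height, bitmap)
```

A file would be read with `with open(path, "rb") as f: samples = list(read_gntx(f))`, where `path` points to one of the downloaded GNTX files. Pixel `(row, col)` of a sample is `bitmap[row * width + col]`. Before relying on the labels, it is worth inspecting `label_bytes` for a few known characters to confirm how the Unicode field is encoded.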
Data Download
    style1_basic_test
    style1_basic_train_part1
    style1_basic_train_part2
    style1_basic_train_part3
    style1_enhanced
    style2
Reference
Yue Xu, Fei Yin, Da-Han Wang, Xu-Yao Zhang, Zhaoxiang Zhang, Cheng-Lin Liu, CASIA-AHCDB: A large-scale Chinese ancient handwritten characters database, Proc. 15th ICDAR, Sydney, Australia, September 20-25, 2019, pp.793-798.
Haidian | Beijing | China
Phone : (+86-10)8254-4797
Fax : (+86-10) 8254-4594
Email:liucl@nlpr.ia.ac.cn
Website:www.nlpr.ia.ac.cn/pal/