Dual-modality learning and transformer-based approach for high-quality vector font generation

Bibliographic Details
Main Authors: Liu, Yu, Khalid, Fatimah binti, Mustaffa, Mas Rina binti, Azman, Azreen bin
Format: Article
Published: Elsevier 2024
Online Access: http://psasir.upm.edu.my/id/eprint/105601/
building UPM Institutional Repository
description Vector fonts, the fundamental format of fonts, play a significant role in modern media. Because they are described by a set of mathematical equations, vector fonts allow style modifications by adjusting drawing parameters, which makes them favored by font artists and designers. Owing to the non-structural nature of vector font data, vector font generation resembles sequence generation. Existing methods, limited in their handling of long sequences, can synthesize only simple character vector fonts. In this paper, we propose a dual-modal learning strategy that converts raster glyph images to vector glyphs in an end-to-end manner. Specifically, by employing vector quantization, we comprehensively exploit the dual-modal information of vector fonts: image-modal and sequence-modal features are quantized with a shared codebook, mapping them into the same discrete space and aligning them. From the aligned features, we reconstruct both raster glyph images and vector glyphs. For the transformation of vector glyph data, we redesign the Transformer module, stacking multiple sliding-window attention layers to model local and global information. By integrating reversible residuals into the attention and feedforward layers of the Transformer module, we improve the model's capability and stability on long sequences without sacrificing accuracy. Finally, we perform cross-modal model distillation to obtain the model's backbone network, which we further refine with a differentiable rasterizer to reduce error accumulation during sequence generation. Qualitative and quantitative results demonstrate that our method achieves high-quality synthesis of complex Chinese character glyphs. The synthesized vector fonts can easily be converted into TrueType fonts for practical use, and hold value for anyone interested in personalized vector font styles.
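The shared-codebook quantization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the codebook size, feature dimension, and function names below are invented for illustration. It shows the core VQ-style step: features from either modality are mapped to their nearest entry in one shared codebook, so aligned content from the image and sequence branches lands on the same discrete codes.

```python
# Hypothetical sketch of shared-codebook vector quantization (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    features: (n, d) continuous features from one modality (image or sequence)
    codebook: (k, d) shared discrete code vectors
    Returns (indices, quantized), where quantized[i] = codebook[indices[i]].
    """
    # Squared Euclidean distance from every feature to every code vector.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

# One codebook shared by both modalities: 8 codes of dimension 4.
codebook = rng.normal(size=(8, 4))

# Simulated encoder outputs: features near codes 1, 3, 5, lightly perturbed.
img_feats = codebook[[1, 3, 5]] + 0.01 * rng.normal(size=(3, 4))
seq_feats = codebook[[1, 3, 5]] + 0.01 * rng.normal(size=(3, 4))

img_idx, _ = quantize(img_feats, codebook)
seq_idx, _ = quantize(seq_feats, codebook)
print(img_idx, seq_idx)  # both modalities map to the same code indices
```

Because both branches share one codebook, matching content quantizes to identical indices, which is what allows the two modalities to be aligned in a common discrete space before reconstruction.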
id upm-105601
institution Universiti Putra Malaysia
institution_category Local University
Citation: Liu, Yu and Khalid, Fatimah binti and Mustaffa, Mas Rina binti and Azman, Azreen bin (2024) Dual-modality learning and transformer-based approach for high-quality vector font generation. Expert Systems with Applications, 240, art. no. 122405, pp. 1-19. ISSN 0957-4174. DOI: 10.1016/j.eswa.2023.122405. Published by Elsevier, April 2024. Peer reviewed. https://www.sciencedirect.com/science/article/pii/S095741742302907X