Publications
Open Access Article
Integrating Speech-to-Text for Image Generation Using Generative Adversarial Networks
Computer Modeling in Engineering & Sciences 2025, 143(2): 2001-2026
Published: 30 May 2025
PDF (2.9 MB)

The development of generative architectures has resulted in numerous novel deep-learning models that generate images from text inputs. However, humans naturally use speech for visualization prompts. Therefore, this paper proposes an architecture that integrates speech prompts as input to an image-generation Generative Adversarial Network (GAN) model, leveraging Speech-to-Text translation along with the CLIP + VQGAN model. The proposed method translates speech prompts into text, which the Contrastive Language-Image Pretraining (CLIP) + Vector Quantized Generative Adversarial Network (VQGAN) model then uses to generate images. This paper outlines the steps required to implement such a model and describes in detail the methods used to evaluate it. The GAN model successfully generates artwork from descriptions given as speech and text prompts. Experimental outcomes on synthesized images demonstrate that the proposed methodology can produce striking abstract visuals containing elements from the input prompts. The model achieved a Fréchet Inception Distance (FID) score of 28.75, showcasing its capability to produce high-quality and diverse images. The proposed model can find numerous applications in educational, artistic, and design spaces due to its ability to generate images from speech and the distinct abstract artistry of the output images. This capability is demonstrated by giving the model out-of-distribution prompts to generate never-before-seen images with plausible realistic qualities.
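The abstract reports an FID score of 28.75. As a rough sketch of how that metric is computed, the Fréchet Inception Distance is the Fréchet (2-Wasserstein) distance between two Gaussians fitted to feature statistics of real and generated images; the function below assumes such feature means and covariances are already available (in practice they come from an Inception network's pooling layer, which is not shown here):

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Uses the symmetric form
        ||mu1 - mu2||^2 + Tr(S1) + Tr(S2) - 2 Tr((S1^{1/2} S2 S1^{1/2})^{1/2}),
    which only needs eigendecompositions of symmetric PSD matrices.
    """
    diff = mu1 - mu2
    # Symmetric square root of sigma1 via eigendecomposition.
    w1, v1 = np.linalg.eigh(sigma1)
    root1 = v1 @ np.diag(np.sqrt(np.clip(w1, 0.0, None))) @ v1.T
    # S1^{1/2} S2 S1^{1/2} is symmetric PSD, so eigvalsh applies.
    m = root1 @ sigma2 @ root1
    w_m = np.linalg.eigvalsh(m)
    trace_covmean = np.sum(np.sqrt(np.clip(w_m, 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * trace_covmean)
```

For identical distributions the distance is zero, and shifting one mean by a unit vector under identity covariances gives a distance of exactly 1, which makes the function easy to sanity-check.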

Open Access Article
Optimizing CNN Architectures for Face Liveness Detection: Performance, Efficiency, and Generalization across Datasets
Computer Modeling in Engineering & Sciences 2025, 143(3): 3677-3707
Published: 30 June 2025
PDF (2.3 MB)

Face liveness detection is essential for securing biometric authentication systems against spoofing attacks, including printed photos, replay videos, and 3D masks. This study systematically evaluates pre-trained CNN models (DenseNet201, VGG16, InceptionV3, ResNet50, VGG19, MobileNetV2, Xception, and InceptionResNetV2), leveraging transfer learning and fine-tuning to enhance liveness detection performance. The models were trained and tested on the NUAA and Replay-Attack datasets, with cross-dataset generalization validated on SiW-MV2 to assess real-world adaptability. Performance was evaluated using accuracy, precision, recall, FAR, FRR, HTER, and specialized spoof detection metrics (APCER, NPCER, ACER). Fine-tuning significantly improved detection accuracy, with DenseNet201 achieving the highest performance (98.5% on NUAA, 97.71% on Replay-Attack), while MobileNetV2 proved the most efficient model for real-time applications (latency: 15 ms, memory usage: 45 MB, energy consumption: 30 mJ). A statistical significance analysis (paired t-tests, confidence intervals) validated these improvements. Cross-dataset experiments identified DenseNet201 and MobileNetV2 as the most generalizable architectures, with DenseNet201 achieving 86.4% accuracy on Replay-Attack when trained on NUAA, demonstrating robust feature extraction and adaptability. In contrast, ResNet50 showed lower generalization capability, struggling with dataset variability and complex spoofing attacks. These findings suggest that MobileNetV2 is well suited for low-power applications, while DenseNet201 is ideal for high-security environments requiring superior accuracy. This research provides a framework for improving real-time face liveness detection, enhancing biometric security, and guiding future advancements in AI-driven anti-spoofing techniques.
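The abstract evaluates models with the standard presentation-attack metrics APCER, NPCER, ACER, and HTER. As a minimal sketch of what those quantities measure (the function and label convention below are illustrative, not taken from the paper): APCER is the rate at which attack samples are accepted as live, NPCER the rate at which live samples are rejected as attacks, ACER their average, and HTER the average of FAR and FRR:

```python
def spoof_metrics(labels, preds):
    """Compute presentation-attack detection metrics.

    labels/preds: sequences of 1 = live (bona fide), 0 = spoof (attack).
    """
    attack_preds = [p for l, p in zip(labels, preds) if l == 0]
    live_preds = [p for l, p in zip(labels, preds) if l == 1]
    # APCER: attack samples wrongly accepted as live.
    apcer = sum(p == 1 for p in attack_preds) / len(attack_preds)
    # NPCER: live samples wrongly rejected as attacks.
    npcer = sum(p == 0 for p in live_preds) / len(live_preds)
    # ACER: average classification error rate.
    acer = (apcer + npcer) / 2
    # HTER: half total error rate, (FAR + FRR) / 2; with this binary
    # setup FAR coincides with APCER and FRR with NPCER.
    hter = acer
    return {"APCER": apcer, "NPCER": npcer, "ACER": acer, "HTER": hter}
```

For example, with four attack and four live samples where one of each is misclassified, APCER = NPCER = ACER = 0.25.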
