Zedge Releases New AI Dataset for Computer Vision and Generative AI Training

Monday, Jun 9, 2025 6:36 am ET

Zedge has released a new image dataset, the DataSeeds.AI Sample Dataset (DSD), in collaboration with Perle.ai and Émet Research. The DSD comprises more than 7,800 high-quality photos sourced from Zedge's photography game GuruShots and annotated by expert reviewers. Zedge says the dataset establishes new AI training benchmarks and sets a new standard for high-quality, human-aligned AI training data.

Zedge, Inc. has announced the release of the DataSeeds.AI Sample Dataset (DSD), a foundational image dataset designed to set new benchmarks in AI training. The dataset, developed in collaboration with Perle.ai and Émet Research, comprises over 7,800 high-quality photos sourced from Zedge's photography game, GuruShots, and annotated by expert reviewers. The DSD aims to establish a new standard for high-quality, human-aligned AI training data.

The DSD was created to address the growing need for high-fidelity, rights-cleared images in AI model training. By drawing on the global community of photographers on GuruShots, Zedge was able to assemble a diverse dataset. Each image in the DSD was first ranked by players of the game and then annotated by expert reviewers, who provided detailed descriptions and pixel-level segmentation. This multi-tiered annotation process is intended to make the dataset both high-quality and contextually rich, so that AI models trained on it learn in a way that is closer to human understanding.

The DSD is accompanied by a research paper titled "Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery." According to Zedge, the paper demonstrates that training AI models with the DSD yields results 70% better than those obtained with typical benchmark datasets. The dataset, model weights, and paper are now publicly available on Hugging Face, which Zedge hopes will encourage further innovation and independent validation.

Key findings from the research paper include improved semantic precision and image-text alignment when models are fine-tuned on the DSD. For instance, fine-tuning the LLaVA-NeXT model on the DSD led to a 24.09% increase in BLEU-4 score. Additionally, the DSD outperformed AWS Rekognition on F1 score, highlighting the limitations of automated commercial tagging systems for high-quality dataset curation.
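
For readers unfamiliar with the two metrics cited above, the following is a minimal Python sketch of how BLEU-4 and F1 are commonly computed. It is not the evaluation code from the paper: the function names, the whitespace tokenization, the single-reference unsmoothed BLEU variant, and the example captions and tags are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing.

    Illustrative only: real evaluations usually score a whole corpus,
    often with several references and a smoothing method.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates no longer than the reference are scaled down.
    if len(cand) > len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the 1- to 4-gram precisions, times the penalty.
    return brevity * math.exp(sum(log_precisions) / 4)


def tag_f1(predicted, expected):
    """F1 score between a predicted tag set and a human-annotated tag set."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical caption pair and tag sets, not taken from the DSD.
    print(bleu4("a dog runs on the beach at sunset",
                "a dog runs along the beach at sunset"))
    print(tag_f1({"dog", "beach", "sunset"},
                 {"dog", "beach", "ocean", "sky"}))
```

In this sketch a caption identical to its reference scores 1.0 on BLEU-4, and any caption that shares no four-word sequence with the reference scores 0, which is why unsmoothed BLEU-4 values for free-form image descriptions tend to be small. F1 balances precision (how many predicted tags are correct) against recall (how many of the human-annotated tags were found).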

The DSD and DataSeeds.AI platform offer several unique advantages. They support a data-centric approach to AI development, prioritizing quality, context, and diversity in training inputs. The platform can rapidly build domain-specific datasets by launching on-demand GuruShots photo challenges or by drawing from Zedge Premium's and GuruShots' massive catalog of over 30 million images. The human-aligned annotation process provides interpretability, nuance, and grounding that support vision-language understanding and generalization to real-world use cases.

By providing open availability of all data, models, and benchmarking results, Zedge positions DataSeeds.AI as a differentiated supplier of high-fidelity, human-reviewed datasets tailored to the evolving needs of the generative AI ecosystem.

References:
[1] https://www.accessnewswire.com/newsroom/en/computers-technology-and-internet/zedges-dataseeds.ai-releases-foundational-dataset-for-computer-vi-1036758
[2] https://arxiv.org/html/2506.05673v1
