Zedge Releases New AI Dataset for Computer Vision and Generative AI Training
By Ainvest
Monday, June 9, 2025, 6:36 am ET
Zedge has released a new image dataset, the DataSeeds.AI Sample Dataset (DSD), in collaboration with Perle.ai and Émet Research. The DSD comprises more than 7,800 high-quality photos sourced from Zedge's photography game GuruShots and annotated by expert reviewers. Zedge presents the dataset as a new benchmark and a new standard for high-quality, human-aligned AI training data.
Zedge, Inc. has announced the release of the DataSeeds.AI Sample Dataset (DSD), a foundational image dataset designed to set new benchmarks in AI training. The dataset, developed in collaboration with Perle.ai and Émet Research, comprises more than 7,800 high-quality photos sourced from Zedge's photography game, GuruShots, and annotated by expert reviewers. The DSD aims to establish a new standard for high-quality, human-aligned AI training data.
The DSD was created to address the growing need for high-fidelity, rights-cleared images in AI model training. By drawing on the global community of photographers on GuruShots, Zedge was able to assemble a diverse dataset. Each image in the DSD was first ranked by players of the game and then annotated by expert reviewers, who supplied detailed descriptions and pixel-level segmentation. This multi-tiered annotation process is intended to make the dataset both high-quality and contextually rich, so that AI models learn in a way that is closer to human understanding.
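To make the two annotation tiers concrete, the following is a minimal sketch of how one peer-ranked, expert-annotated image might be represented. The class name, field names, rank convention (1 is best), and mask layout are all assumptions for illustration; the article does not describe the dataset's actual schema.

```python
from dataclasses import dataclass, field

# A mask is a grid of booleans, one per pixel (hypothetical layout).
Mask = list[list[bool]]


@dataclass
class AnnotatedImage:
    """Hypothetical record for one image; not the DSD's real schema."""
    image_id: str
    peer_rank: int       # tier 1: rank assigned by players (assumed: 1 = best)
    description: str     # tier 2: expert-written description
    masks: dict[str, Mask] = field(default_factory=dict)  # tier 2: label -> pixel mask

    def coverage(self, label: str) -> float:
        """Fraction of the image's pixels assigned to `label`."""
        mask = self.masks[label]
        total = sum(len(row) for row in mask)
        return sum(sum(row) for row in mask) / total if total else 0.0


def top_ranked(records: list[AnnotatedImage], k: int) -> list[AnnotatedImage]:
    """Return the k records with the best (lowest) peer rank."""
    return sorted(records, key=lambda r: r.peer_rank)[:k]


# Illustrative data only.
kite = AnnotatedImage(
    image_id="img-001",
    peer_rank=2,
    description="A red kite flying over a beach",
    masks={"kite": [[False, True], [False, False]]},
)
dog = AnnotatedImage(image_id="img-002", peer_rank=1, description="A dog on wet sand")

kite_coverage = kite.coverage("kite")
best = top_ranked([kite, dog], k=1)
```

Keeping the peer rank separate from the expert annotations reflects the ordering described above: ranking happens first, and annotation is layered on afterwards.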
The DSD is accompanied by a research paper titled "Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery." The paper reports that training AI models with the DSD yields 70% better results than typical benchmark datasets. The dataset, model weights, and paper are publicly available on Hugging Face to encourage further innovation and validation.
Key findings from the research paper include improvements in semantic precision and image-text alignment when models are fine-tuned on the DSD. For instance, fine-tuning the LLaVA-NeXT model on the DSD led to a 24.09% increase in BLEU-4 score. Additionally, the DSD outperformed AWS Rekognition in terms of F1 score, highlighting the limitations of automated commercial tagging systems for high-quality dataset curation.
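For readers unfamiliar with the metrics above, here is a minimal sketch of how an F1 score over image tags and a percentage increase in a score can be computed. It assumes tags are compared as exact-match sets and that the reported gain is a relative increase over the baseline; the paper's actual evaluation protocol may differ. The function names and example values are illustrative and are not taken from the paper.

```python
def f1_score(predicted: set[str], reference: set[str]) -> float:
    """F1 score: harmonic mean of precision and recall over two tag sets."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)   # share of predicted tags that are correct
    recall = true_positives / len(reference)      # share of reference tags that were found
    return 2 * precision * recall / (precision + recall)


def relative_increase(baseline: float, tuned: float) -> float:
    """Percentage change of `tuned` relative to `baseline`."""
    return (tuned - baseline) / baseline * 100.0


# Illustrative values only, not results from the paper.
reference_tags = {"dog", "beach", "sunset", "ball"}
predicted_tags = {"dog", "beach", "sky"}

tag_f1 = f1_score(predicted_tags, reference_tags)   # precision 2/3, recall 1/2
gain = relative_increase(baseline=0.20, tuned=0.25)
```

Under this reading, a relative increase is measured against the baseline score, so the same absolute gain looks larger when the baseline is low.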
According to Zedge, the DSD and the DataSeeds.AI platform offer several advantages. They support a data-centric approach to AI development that prioritizes quality, context, and diversity in training inputs. The platform can rapidly build domain-specific datasets by launching on-demand GuruShots photo challenges or by drawing from the combined Zedge Premium and GuruShots catalog of over 30 million images. The human-aligned annotation process provides interpretability, nuance, and grounding that support vision-language understanding and generalization to real-world use cases.
By providing open availability of all data, models, and benchmarking results, Zedge positions DataSeeds.AI as a differentiated supplier of high-fidelity, human-reviewed datasets tailored to the evolving needs of the generative AI ecosystem.
References:
[1] https://www.accessnewswire.com/newsroom/en/computers-technology-and-internet/zedges-dataseeds.ai-releases-foundational-dataset-for-computer-vi-1036758
[2] https://arxiv.org/html/2506.05673v1
