Vedrix — Enterprise plagiarism detection at 90% accuracy.
Academic institutions lacked an affordable, accurate, enterprise-grade plagiarism detection system that could handle large documents, match against live web content, and deliver detailed similarity reports in real time. Existing tools were expensive or inaccurate, or could not process complex document formats at scale.
Built the full system using Django REST Framework as the API backbone. Custom document parsing algorithms extract clean text from PDF and Word files up to 50MB, handling complex formatting, tables, and multi-column layouts. TF-IDF vectorization with cosine similarity handles fast intra-corpus matching. LLM APIs add semantic similarity detection beyond keyword matching. Google Search API powers real-time web content matching to detect online sources. An automated report generator highlights matching segments with source attribution and similarity percentages.
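The TF-IDF and cosine similarity step can be illustrated with a minimal, dependency-free sketch. This is not the Vedrix implementation: the function names (tokenize, tfidf_vectors, cosine_similarity, rank_matches), the tokenizer, and the smoothed IDF formula are assumptions chosen for illustration, and a production system would more likely rely on a vectorization library and sparse matrices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            # Smoothed IDF (an assumption) keeps every weight strictly positive.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_matches(submission, corpus):
    """Score a submission against each corpus document, best match first.

    Returns a list of (corpus_index, similarity) pairs.
    """
    vectors = tfidf_vectors([submission] + list(corpus))
    query, candidates = vectors[0], vectors[1:]
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling rank_matches(submission_text, corpus_texts) returns corpus indices ordered by similarity. In a pipeline like the one described, the top-scoring documents would presumably be passed on to the semantic and web-matching stages for closer comparison.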
90% plagiarism detection accuracy, validated across academic datasets. Real-time processing of PDF/Word documents up to 50MB. Live web matching via Google Search API. Automated, detailed reports with segment-level highlighting and source attribution.


