Project Showcase

Distinguishing
Human vs. Machine

An end-to-end machine learning system built to detect LLM-generated text with 99.44% accuracy. From raw data to cloud deployment.

Kaggle
01

The Challenge

Why do we need to detect AI text?

With the rise of GPT-4 and other LLMs, the line between human and machine creativity has blurred. This creates critical challenges:

  • Misinformation

    Rapid spread of false information.

  • Plagiarism

    Integrity in research and education.

  • Ethics

    Transparency in sensitive news and legal docs.

Human vs. Machine

GENERATED BY AI

"The intricate dance of algorithms allows for the synthesis of coherent narratives..."

HUMAN WRITTEN

"I actually wrote this part myself, with all my typos and weird phrasing."

02

The Experiment

Data exploration, cleaning, and model training in the Notebook.

DAIGT V2 Dataset
Combining datasets for robust training
Human Essays · AI-Generated

import pandas as pd

train = pd.read_csv('/kaggle/input/daigt-v2-train-dataset/train_v2_drcat_02.csv')
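After loading, a natural first step is checking the class balance between human and AI-generated essays. A minimal sketch with a miniature stand-in frame (the real DAIGT file's column names `text` and `label`, and the 0 = human / 1 = AI encoding, are assumptions here):

```python
import pandas as pd

# Hypothetical miniature stand-in for the DAIGT frame;
# column names and label encoding are assumptions.
train = pd.DataFrame({
    'text': ['an essay...', 'another essay...', 'llm output...'],
    'label': [0, 0, 1],  # 0 = human, 1 = AI-generated (assumed)
})

# Check class balance before training
counts = train['label'].value_counts()
print(counts.to_dict())
```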

The Model Architecture

A "Simple" Neural Network that achieved extraordinary results.

Input Layer (TF-IDF Vector): 5000 features
Dense Layer (ReLU): 128 neurons
Dropout: 0.3
Output Layer (Sigmoid): binary class
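The 5000-feature input corresponds to TF-IDF vectorization of the essay text. A minimal sketch with scikit-learn, where `max_features=5000` matches the input layer size above and the other settings are illustrative assumptions, not the project's exact configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# max_features=5000 matches the input layer size in the diagram;
# stop_words is an illustrative assumption.
vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')

texts = ['I wrote this essay myself.', 'The model generated this passage.']
X = vectorizer.fit_transform(texts)
print(X.shape)  # (n_documents, n_features), capped at 5000 features
```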

Final Accuracy

99.44%

ROC-AUC Score: 0.9998

Training Snippet
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

model = Sequential([
  Dense(128, activation='relu', input_shape=(5000,)),
  Dropout(0.3),
  Dense(64, activation='relu'),
  Dropout(0.3),
  Dense(1, activation='sigmoid')
])

model.compile(
  optimizer=Adam(learning_rate=0.001),
  loss='binary_crossentropy',
  metrics=['accuracy']
)
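The reported accuracy and ROC-AUC come from held-out predictions. A hedged sketch of how such metrics are computed with scikit-learn (the labels and probabilities below are toy values for illustration, not the project's test set):

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy true labels and predicted probabilities, for illustration only
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.9]

# Threshold the sigmoid output at 0.5 for a hard class prediction
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print(accuracy_score(y_true, y_pred))  # fraction of correct predictions
print(roc_auc_score(y_true, y_prob))   # threshold-free ranking quality
```

ROC-AUC is computed on the raw probabilities rather than the thresholded predictions, which is why it can stay near 1.0 even when a few borderline cases fall on the wrong side of 0.5.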
03

The Pipeline

Transforming a notebook experiment into a production-ready ML pipeline.

01

STAGE 01

Ingestion

Download & Extract Data

02

STAGE 02

Validation

Schema & Type Checks

03

STAGE 03

Transform

TF-IDF Vectorization

04

STAGE 04

Training

Model Fitting

05

STAGE 05

Evaluation

Metrics & Reporting
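The five stages above can be sketched as a simple sequential runner. The stage names come from the diagram; the function bodies here are placeholders standing in for the project's actual implementation:

```python
def ingest(state):
    # Stage 01: download & extract data (placeholder)
    state['raw'] = ['essay one', 'essay two']
    return state

def validate(state):
    # Stage 02: schema & type checks (placeholder)
    assert all(isinstance(t, str) for t in state['raw'])
    return state

def transform(state):
    # Stage 03: TF-IDF vectorization would happen here (placeholder)
    state['features'] = [len(t.split()) for t in state['raw']]
    return state

def train(state):
    # Stage 04: model fitting (placeholder)
    state['model'] = 'fitted-model'
    return state

def evaluate(state):
    # Stage 05: metrics & reporting (placeholder)
    state['metrics'] = {'accuracy': None}
    return state

PIPELINE = [ingest, validate, transform, train, evaluate]

state = {}
for stage in PIPELINE:
    state = stage(state)
print(sorted(state.keys()))
```

Each stage takes and returns a shared state dict, so a failure in validation stops the run before any training cost is incurred.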

04

The Launch

Deploying to AWS EC2 with a CI/CD-ready setup.

AWS EC2 Instance

Provisioned a dedicated Ubuntu server and configured security groups to open port 8501 (Streamlit's default).

Deployment Branch

Created a lightweight `Deployment` branch containing only `app.py` and essential artifacts to minimize overhead.

Systemd Service

Configured as a background service to ensure 24/7 availability and auto-restart on failure.
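A unit file for such a service might look like the following. The service name, user, and paths are assumptions for illustration, not the project's actual file:

```ini
# /etc/systemd/system/llm-detector.service  (hypothetical name and paths)
[Unit]
Description=LLM Text Detection Streamlit app
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/LLM_Text_Detection
ExecStart=/usr/bin/python3 -m streamlit run app.py --server.port 8501
Restart=always

[Install]
WantedBy=multi-user.target
```

`Restart=always` gives the auto-restart-on-failure behavior described above, and enabling the unit with `systemctl enable` makes it start on boot.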

ubuntu@ip-172-31-0-0:~/LLM_Text_Detection
$ git clone -b Deployment https://github.com/Harshithvarma007/LLM.git
Cloning into 'LLM_Text_Detection'...
$ ./deploy.sh
Updating package lists...
Installing Python dependencies...
Setup systemd service...
Streamlit service started on port 8501
05

Live Demo

Interact with the model in real-time.

Live Demo In Progress

We are currently fine-tuning the serving infrastructure. The live interface will be available soon.
