Distinguishing Human vs. Machine
An end-to-end machine learning system built to detect LLM-generated text with 99.44% accuracy. From raw data to cloud deployment.
The Challenge
Why do we need to detect AI text?
With the rise of GPT-4 and other LLMs, the line between human and machine creativity has blurred. This creates critical challenges:
Misinformation
False information can be produced and spread rapidly.
Plagiarism
Undisclosed AI text undermines integrity in research and education.
Ethics
Sensitive news and legal documents call for transparency about authorship.
Human vs. Machine
GENERATED BY AI
"The intricate dance of algorithms allows for the synthesis of coherent narratives..."
HUMAN WRITTEN
"I actually wrote this part myself, with all my typos and weird phrasing."
The Experiment
Data exploration, cleaning, and model training in the Notebook.
import pandas as pd
train = pd.read_csv('/kaggle/input/daigt-v2-train-dataset/train_v2_drcat_02.csv')
The Model Architecture
A "simple" feed-forward neural network (two dense hidden layers with dropout and a sigmoid output) that still achieved near-perfect scores.
Final Accuracy: 99.44%
ROC-AUC Score: 0.9998
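The two metrics above measure different things: accuracy depends on a decision threshold, while ROC-AUC only depends on how well the scores rank AI text above human text. The sketch below computes both in plain Python on made-up scores. The label convention (1 = AI-generated, 0 = human), the 0.5 threshold, and the toy data are assumptions for illustration, not the project's evaluation code.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of ROC-AUC).
    """
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, assumed for illustration: 1 = AI-generated, 0 = human-written.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.95, 0.60]

print(f"accuracy = {accuracy(y_true, y_prob):.4f}")
print(f"roc_auc  = {roc_auc(y_true, y_prob):.4f}")
```

On this toy data, four of six thresholded predictions are correct and seven of the nine positive/negative pairs are ranked correctly, so the two numbers differ even though they come from the same scores.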
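from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

# Small feed-forward binary classifier
model = Sequential([
    Dense(128, activation='relu'),
    Dropout(0.3),  # dropout to limit overfitting
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid'),  # probability of the positive class
])
model.compile(
    optimizer=Adam(learning_rate=0.001),  # `lr` is deprecated or removed in recent Keras releases
    loss='binary_crossentropy'
)

The Pipeline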
Transforming a notebook experiment into a production-ready ML pipeline.
STAGE 01
Ingestion
Download & Extract Data
STAGE 02
Validation
Schema & Type Checks
STAGE 03
Transform
TF-IDF Vectorization
STAGE 04
Training
Model Fitting
STAGE 05
Evaluation
Metrics & Reporting
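As a rough illustration of Stages 02 and 03, here is a minimal, dependency-free sketch of a schema/type check followed by TF-IDF vectorization. The column names (`text`, `label`), the function names, and the whitespace tokenizer are assumptions made for this example. A production pipeline would more likely rely on a library vectorizer such as scikit-learn's `TfidfVectorizer`; the smoothed IDF formula used here mirrors that library's default.

```python
import math
from collections import Counter

# Assumed schema for illustration: one text column and one binary label column.
REQUIRED_COLUMNS = {"text": str, "label": int}


def validate(rows):
    """Stage 02 sketch: raise if a row is missing a column or has a wrong type."""
    for i, row in enumerate(rows):
        for col, typ in REQUIRED_COLUMNS.items():
            if col not in row:
                raise ValueError(f"row {i}: missing column {col!r}")
            if not isinstance(row[col], typ):
                raise TypeError(f"row {i}: {col!r} should be {typ.__name__}")
        if row["label"] not in (0, 1):
            raise ValueError(f"row {i}: label must be 0 or 1")
    return rows


def tfidf_fit_transform(texts):
    """Stage 03 sketch: map each text to a sparse {term: weight} vector."""
    docs = [Counter(t.lower().split()) for t in texts]
    n_docs = len(docs)
    # Document frequency: number of texts in which each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }
    vectors = []
    for doc in docs:
        weights = {term: count * idf[term] for term, count in doc.items()}
        # L2-normalize so long and short texts are comparable.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors, idf


# Toy rows standing in for the ingested dataset.
rows = [
    {"text": "the model writes fluent text", "label": 1},
    {"text": "i wrote the essay myself", "label": 0},
]
vectors, idf = tfidf_fit_transform([r["text"] for r in validate(rows)])
print(sorted(vectors[1].items(), key=lambda kv: -kv[1])[:3])
```

Terms that occur in every text (here, "the") receive the lowest IDF, so rarer and more distinctive words dominate each vector that is passed on to the training stage.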
The Launch
Deploying to AWS EC2 with a CI/CD-ready setup.
AWS EC2 Instance
Provisioned a dedicated Ubuntu server and configured its security group to allow inbound traffic on port 8501 (Streamlit's default port).
Deployment Branch
Created a lightweight `Deployment` branch containing only `app.py` and essential artifacts to minimize overhead.
Systemd Service
Runs the app as a background service for 24/7 availability, with automatic restart on failure.
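To make the service setup concrete, the sketch below renders a systemd unit file for the Streamlit app from a small Python helper. The user name, directory, virtualenv path, and the `render_unit` helper itself are hypothetical placeholders, not values taken from the actual deployment. The directives are standard systemd options, and `--server.port` and `--server.address` are standard Streamlit flags.

```python
from textwrap import dedent


def render_unit(user, workdir, streamlit_bin, port=8501):
    """Return the text of a systemd unit that runs app.py under Streamlit.

    `streamlit_bin` must be an absolute path, as systemd requires for ExecStart.
    """
    return dedent(f"""\
        [Unit]
        Description=Streamlit app for the AI-text detector
        After=network.target

        [Service]
        User={user}
        WorkingDirectory={workdir}
        ExecStart={streamlit_bin} run app.py --server.port {port} --server.address 0.0.0.0
        Restart=on-failure
        RestartSec=5

        [Install]
        WantedBy=multi-user.target
        """)


# Hypothetical values for illustration only.
unit = render_unit(
    user="ubuntu",
    workdir="/home/ubuntu/app",
    streamlit_bin="/home/ubuntu/app/venv/bin/streamlit",
)
print(unit)
```

Saved under `/etc/systemd/system/` with a `.service` name of your choosing, a unit like this is typically activated with `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now` and the service name. `Restart=on-failure` provides the auto-restart behavior, and the `[Install]` section lets the service start at boot.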
Live Demo
Interact with the model in real time.
Live Demo In Progress
We are currently fine-tuning the serving infrastructure. The live interface will be available soon.