Project Showcase

Distinguishing
Human vs. Machine

An end-to-end machine learning system built to detect LLM-generated text with 99.44% accuracy. From raw data to cloud deployment.

Kaggle
01

The Challenge

Why do we need to detect AI text?

With the rise of GPT-4 and other LLMs, the line between human and machine creativity has blurred. This creates critical challenges:

  • Misinformation

    Rapid spread of false information.

  • Plagiarism

    Integrity in research and education.

  • Ethics

    Transparency in sensitive news and legal docs.

Human vs. Machine

GENERATED BY AI

"The intricate dance of algorithms allows for the synthesis of coherent narratives..."

HUMAN WRITTEN

"I actually wrote this part myself, with all my typos and weird phrasing."

02

The Experiment

Data exploration, cleaning, and model training in the Notebook.

DAIGT V2 Dataset
Combining datasets for robust training
Human Essays · AI-Generated

import pandas as pd

train = pd.read_csv('/kaggle/input/daigt-v2-train-dataset/train_v2_drcat_02.csv')
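After loading, a natural first step is checking the class balance between human and AI-generated essays. A minimal sketch with a miniature stand-in frame (the real DAIGT file's column names `text` and `label`, and the 0 = human / 1 = AI encoding, are assumptions here):

```python
import pandas as pd

# Hypothetical miniature stand-in for the DAIGT frame;
# column names and label encoding are assumptions.
train = pd.DataFrame({
    'text': ['an essay...', 'another essay...', 'llm output...'],
    'label': [0, 0, 1],  # 0 = human, 1 = AI-generated (assumed)
})

# Check class balance before training
counts = train['label'].value_counts()
print(counts.to_dict())
```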

The Model Architecture

A "Simple" Neural Network that achieved extraordinary results.

Input Layer (TF-IDF Vector): 5000 features
Dense Layer (ReLU): 128 neurons
Dropout: 0.3
Output Layer (Sigmoid): binary class
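The 5000-feature input corresponds to TF-IDF vectorization of the essay text. A minimal sketch with scikit-learn, where `max_features=5000` matches the input layer size above and the other settings are illustrative assumptions, not the project's exact configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# max_features=5000 matches the input layer size in the diagram;
# stop_words is an illustrative assumption.
vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')

texts = ['I wrote this essay myself.', 'The model generated this passage.']
X = vectorizer.fit_transform(texts)
print(X.shape)  # (n_documents, n_features), capped at 5000 features
```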

Final Accuracy

99.44%

ROC-AUC Score: 0.9998

Training Snippet
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

model = Sequential([
  Dense(128, activation='relu', input_shape=(5000,)),
  Dropout(0.3),
  Dense(64, activation='relu'),
  Dropout(0.3),
  Dense(1, activation='sigmoid')
])

model.compile(
  optimizer=Adam(learning_rate=0.001),
  loss='binary_crossentropy',
  metrics=['accuracy']
)
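The reported accuracy and ROC-AUC come from held-out predictions. A hedged sketch of how such metrics are computed with scikit-learn (the labels and probabilities below are toy values for illustration, not the project's test set):

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy true labels and predicted probabilities, for illustration only
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.9]

# Threshold the sigmoid output at 0.5 for a hard class prediction
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print(accuracy_score(y_true, y_pred))  # fraction of correct predictions
print(roc_auc_score(y_true, y_prob))   # threshold-free ranking quality
```

ROC-AUC is computed on the raw probabilities rather than the thresholded predictions, which is why it can stay near 1.0 even when a few borderline cases fall on the wrong side of 0.5.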
03

The Pipeline

Transforming a notebook experiment into a production-ready ML pipeline.

01

STAGE 01

Ingestion

Download & Extract Data

02

STAGE 02

Validation

Schema & Type Checks

03

STAGE 03

Transform

TF-IDF Vectorization

04

STAGE 04

Training

Model Fitting

05

STAGE 05

Evaluation

Metrics & Reporting
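The five stages above can be sketched as a simple sequential runner. The stage names come from the diagram; the function bodies here are placeholders standing in for the project's actual implementation:

```python
def ingest(state):
    # Stage 01: download & extract data (placeholder)
    state['raw'] = ['essay one', 'essay two']
    return state

def validate(state):
    # Stage 02: schema & type checks (placeholder)
    assert all(isinstance(t, str) for t in state['raw'])
    return state

def transform(state):
    # Stage 03: TF-IDF vectorization would happen here (placeholder)
    state['features'] = [len(t.split()) for t in state['raw']]
    return state

def train(state):
    # Stage 04: model fitting (placeholder)
    state['model'] = 'fitted-model'
    return state

def evaluate(state):
    # Stage 05: metrics & reporting (placeholder)
    state['metrics'] = {'accuracy': None}
    return state

PIPELINE = [ingest, validate, transform, train, evaluate]

state = {}
for stage in PIPELINE:
    state = stage(state)
print(sorted(state.keys()))
```

Each stage takes and returns a shared state dict, so a failure in validation stops the run before any training cost is incurred.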

04

The Launch

Deploying to AWS EC2 with a CI/CD-ready setup.

AWS EC2 Instance

Provisioned a dedicated Ubuntu server and configured security groups to open port 8501 (Streamlit's default).

Deployment Branch

Created a lightweight `Deployment` branch containing only `app.py` and essential artifacts to minimize overhead.

Systemd Service

Configured as a background service to ensure 24/7 availability and auto-restart on failure.
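A unit file for such a service might look like the following. The service name, user, and paths are assumptions for illustration, not the project's actual file:

```ini
# /etc/systemd/system/llm-detector.service  (hypothetical name and paths)
[Unit]
Description=LLM Text Detection Streamlit app
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/LLM_Text_Detection
ExecStart=/usr/bin/python3 -m streamlit run app.py --server.port 8501
Restart=always

[Install]
WantedBy=multi-user.target
```

`Restart=always` gives the auto-restart-on-failure behavior described above, and enabling the unit with `systemctl enable` makes it start on boot.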

ubuntu@ip-172-31-0-0:~/LLM_Text_Detection
$ git clone -b Deployment https://github.com/Harshithvarma007/LLM.git
Cloning into 'LLM_Text_Detection'...
$ ./deploy.sh
Updating package lists...
Installing Python dependencies...
Setup systemd service...
Streamlit service started on port 8501
05

Live Demo

Interact with the model in real-time.

Live Demo In Progress

We are currently fine-tuning the serving infrastructure. The live interface will be available soon.
