Building Summify: AI Article Summarization with GPT-4o and Flask

Summify started from a real frustration: I'd find a long article I wanted to read but didn't have time for. I wanted a tool that could give me a faithful summary in seconds, not a watered-down paraphrase.

The result is a full-stack AI app — React frontend, Python Flask backend, GPT-4o for summarization. Live at summify-by-hbich-aymane.vercel.app and open source on GitHub.

How It Works

You paste a URL or raw text. The Flask API scrapes the article content using Extractus, cleans it, and sends it to GPT-4o with a prompt that asks for a structured summary: key points, main argument, and conclusion. The summary comes back as Markdown and the frontend renders it with React Markdown.

Frontend: React 19 + Redux Toolkit + RTK Query

This project was my first real use of RTK Query for data fetching, and it changed how I think about API calls in React.

RTK Query auto-generates hooks from your API definition. Instead of writing useEffect + fetch + loading/error state manually for every endpoint, you define the endpoint once and call a hook:

const { data, isLoading, isError } = useSummarizeQuery(inputUrl)

The caching behavior is built in. If you summarize the same URL twice in the same session, the second call is instant — it returns the cached result without hitting the API again. For an AI endpoint that costs money per call, this matters.

Redux Toolkit managed the UI state: whether the user is in URL mode or text mode, the input value, history of past summaries. Keeping this in Redux rather than local component state meant the history panel and the main input stayed in sync without prop drilling.

Backend: Flask + Extractus + GPT-4o

The Python side is a small Flask API with one main endpoint: POST /summarize.

Extractus handles article scraping. It extracts the main content from a URL — title, body text, author — while stripping ads, navigation, and other noise. It's more reliable than writing a custom scraper for every news site.

The extracted text goes to GPT-4o via the OpenAI API with a system prompt that instructs it to return structured Markdown. Using GPT-4o over GPT-3.5 was a deliberate choice — the summaries are noticeably more faithful to the original argument, not just shorter versions of the first few paragraphs.

The Flask API is deployed separately from the frontend (Vercel for the React app, a Python host for the API). The two talk over HTTP with CORS configured to only allow the production frontend domain.

The Hardest Part: Prompt Engineering

Getting the summary format consistent took more iteration than I expected. Early versions would sometimes return bullet points, sometimes paragraphs, sometimes just one sentence. The fix was being very explicit in the system prompt about structure:

Return a JSON object with:
- "keyPoints": array of 3-5 bullet strings
- "mainArgument": one paragraph
- "conclusion": one sentence

Asking for JSON instead of free-form Markdown made the response predictable. The frontend then formats it into the rendered Markdown the user sees.

What I'd Add Next

Streaming responses — right now the user waits for the full summary before anything appears. GPT-4o supports streaming and React can handle it with a readable stream. The experience would feel much faster even if the total time is the same.

Summary length control — a slider letting the user choose between a quick 3-bullet summary and a longer detailed breakdown.

Key Takeaways

RTK Query eliminates a surprising amount of boilerplate for data fetching — the caching and loading state management alone is worth adopting it
Prompt engineering is real work. Vague prompts produce inconsistent output; structured prompts with explicit format requirements produce reliable output
Separating the AI backend from the frontend gives you flexibility — you can swap the model, change the scraper, or scale the API independently