Architecture

GPT (Generative Pre-trained Transformer)

A series of large language models by OpenAI that generate text by predicting the next word, powering ChatGPT and many AI applications.

What is GPT?

GPT stands for Generative Pre-trained Transformer. It's a family of large language models developed by OpenAI that have driven much of the recent AI revolution.

Breaking down the name:

Generative: Produces new content (text) rather than just classifying or analyzing.

Pre-trained: First trained on massive text data to learn language patterns, then can be adapted for specific tasks.

Transformer: Uses the transformer architecture, which processes a whole sequence in parallel and relates tokens to one another through self-attention.

How it works: GPT models are trained to predict the next token (roughly, the next word or word piece) in a sequence. Given "The cat sat on the," they predict "mat" (or something similar). Longer text is generated by appending the predicted token and predicting again. This simple objective, at scale, produces remarkably capable models.
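To make this concrete, here is a minimal sketch of next-token prediction using the openly released GPT-2 weights via the Hugging Face transformers library (a stand-in for the proprietary GPT-3/4 models, but the same architecture and training objective):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The cat sat on the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # a score for every vocabulary token, at every position

next_token_id = int(logits[0, -1].argmax())  # highest-scoring token after the prompt
print(tokenizer.decode(next_token_id))       # the model's single most likely continuation

Generation is just this step in a loop: append the chosen token to the input and predict the next one.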

GPT evolution

GPT-1 (2018):

  • 117 million parameters
  • Proved transformer pre-training works for language
  • Could do basic text completion

GPT-2 (2019):

  • 1.5 billion parameters
  • Much more coherent text generation
  • OpenAI initially withheld release due to misuse concerns

GPT-3 (2020):

  • 175 billion parameters
  • Few-shot learning: could learn tasks from examples in prompts
  • Sparked widespread AI interest

GPT-3.5 / ChatGPT (2022):

  • Fine-tuned with RLHF for conversations
  • Made AI accessible to everyone
  • 100M users in 2 months

GPT-4 (2023):

  • Multimodal: text and images
  • Significantly improved reasoning
  • Powers ChatGPT Plus, Microsoft Copilot

GPT-4o (2024):

  • Faster, cheaper
  • Native multimodal (text, vision, audio)
  • Real-time voice conversations

GPT capabilities

Text generation: Write articles, stories, emails, code, poetry—any text format.

Conversation: Engage in natural dialogue, keep track of earlier turns, and stay coherent across a conversation.

Question answering: Answer questions drawing on training knowledge.

Summarization: Condense long documents into key points.

Translation: Convert between languages (though not its primary strength).

Code generation: Write, explain, and debug code in many languages.

Reasoning: Solve math problems, logic puzzles, analyze arguments (with limitations).

Following instructions: Execute complex multi-step instructions.

Creative tasks: Brainstorm, roleplay, write in specific styles.

In OpenAI's published evaluations, GPT-4 scores at or above the level of typical human test-takers on many professional and academic exams (bar exam, SAT, GRE).

Using GPT models

ChatGPT: Free web interface for conversations. ChatGPT Plus ($20/month) for GPT-4 access.

OpenAI API: Programmatic access for building applications. A minimal example with the official openai Python package (v1+), assuming an API key is available in the OPENAI_API_KEY environment variable:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Key parameters (all three appear together in the sketch below):

  • temperature: Controls randomness of sampling (0 gives near-deterministic output; higher values give more varied, creative output)
  • max_tokens: Caps the length of the generated response, measured in tokens
  • system message: A message with role "system" that sets the assistant's behavior and constraints
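A sketch combining all three (the model name and the values are illustrative, not recommendations):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    temperature=0.2,   # low randomness for predictable answers
    max_tokens=200,    # cap the reply at roughly 200 tokens
    messages=[
        {"role": "system", "content": "You are a concise assistant. Answer in one short paragraph."},
        {"role": "user", "content": "Explain what a transformer is."},
    ],
)
print(response.choices[0].message.content)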

Best practices:

  • Be specific in instructions
  • Provide examples when possible (few-shot prompting; see the sketch after this list)
  • Break complex tasks into steps
  • Verify factual claims
  • Iterate on prompts
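A sketch of these practices applied to a single request, with examples supplied as prior messages (few-shot prompting); the task and example tickets here are invented for illustration:

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    temperature=0,
    messages=[
        # Specific instructions up front
        {"role": "system", "content": "Classify each support ticket as billing, bug, or other. Reply with the label only."},
        # Examples that demonstrate the expected output format
        {"role": "user", "content": "I was charged twice this month."},
        {"role": "assistant", "content": "billing"},
        {"role": "user", "content": "The export button crashes the app."},
        {"role": "assistant", "content": "bug"},
        # The ticket we actually want classified
        {"role": "user", "content": "How do I change my email address?"},
    ],
)
print(response.choices[0].message.content)  # expect "other"; still verify outputs before relying on them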

GPT limitations

Knowledge cutoff: GPT doesn't know about events after its training data ends. The cutoff varies by model version; the original GPT-4 release was trained on data up to September 2021, and later versions have more recent cutoffs.

Hallucination: Generates plausible but false information. Always verify facts.

Context limits: Can only process a limited amount of text at once, though limits keep growing (GPT-4 Turbo handles 128K tokens). A simple guard is to count tokens before sending, as sketched below.
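A sketch of checking prompt size with OpenAI's tiktoken tokenizer before making a request (the limit shown is illustrative; check your model's actual context window):

import tiktoken

MAX_CONTEXT_TOKENS = 128_000  # e.g. GPT-4 Turbo; other models differ

enc = tiktoken.encoding_for_model("gpt-4")
prompt = "The cat sat on the mat. " * 10_000
n_tokens = len(enc.encode(prompt))

print(f"Prompt uses {n_tokens} tokens")
if n_tokens > MAX_CONTEXT_TOKENS:
    print("Too long: split the input or summarize it first")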

No true understanding: Pattern matching, not genuine comprehension. Can fail on simple reasoning.

Inconsistency: Same prompt can give different results. May contradict itself.

Bias: Reflects biases in training data.

Cost: API usage adds up for high-volume applications.

No real-time information: Without tools, can't access current information.

Understanding limitations is crucial for building reliable applications on GPT.