atropos/environments/community/poker_holdem/DATASET.md

12 KiB

PokerGPT Target Dataset Format

This document outlines the input and output format structure used for training the PokerGPT language model. The dataset is structured as prompt-response pairs, where the prompt provides poker game state information and the response contains the action taken by winning players.

Dataset Schema

The final dataset exported to HuggingFace contains the following fields:

  1. id - Unique identifier for each record
  2. hand_id - Reference to the original hand history
  3. winner - Player ID of the winning player
  4. bb_won - Big blinds won in this hand
  5. game_type - Type of poker game (e.g., "Hold'em No Limit")
  6. big_blind - Value of the big blind
  7. game_stage - The furthest stage reached in the hand (PREFLOP, FLOP, TURN, or RIVER)
  8. evaluator_rank - Hand rank calculated by the poker_hand_evaluator
  9. pokerstars_description - Hand description from PokerStars summary
  10. pokergpt_prompt - The formatted prompt as shown below
  11. winning_action - The action taken by the winning player

Input Prompt Format

The input prompts follow a structured format that provides comprehensive information about the poker game state:

You are an experienced gambler. Now you need to assist me to make decisions in Texas Hold'em games. You have been provided with a series of observable information:

    Player amount: [6], Currency: USD, Blind value: [$0.50/$1.00], Order: [1, 2, 3, 4, 5, 6], Seat 3 is the button.

    My cards: ['Th', 'Ah'], the characteristics of my cards: ["suit", "high", "close"], My seat: [Seat 5]

    Stage: "FLOP", Public cards: ['Kh', '7d', '2s', '**', '**']
    My rank: ["Pair"], Money: [97.50], Action: ["post BB 1.00"]

    Seat 1: ['**', '**'], Money: [100.00], Action: ["fold"], Discard: [True]
    Seat 2: ['**', '**'], Money: [98.50], Action: ["call 2.00"], Discard: [False]
    Seat 3: ['**', '**'], Money: [99.00], Action: ["fold"], Discard: [True]
    Seat 4: ['**', '**'], Money: [95.00], Action: ["post SB 0.50", "fold"], Discard: [True]
    Seat 6: ['**', '**'], Money: [102.00], Action: ["raise 2.00"], Discard: [False]

The pot value is [10.50]

The actions can be: ["fold", "call", "re-raise", "all-in"]. What should I do? If I choose to "re-raise", then how much? Choose a number from [4.00, 6.00, 10.00, 20.00, 50.00].

Structure Breakdown:

  1. Introduction: Frames the context for the language model.

  2. Game Configuration:

    • Player amount: Total number of players at the table
    • Currency: Type of currency used
    • Blind value: Small and big blind amounts
    • Order: Order of players around the table
    • Button position: Which seat has the dealer button
  3. Player's Hand:

    • My cards: The player's private cards in standard poker notation
    • Card characteristics: Properties of the cards:
      • suit: If the cards are of the same suit
      • high: If any card is 9 or higher
      • close: If the card values are less than 5 apart
    • My seat: The player's position at the table
  4. Game State:

    • Stage: Current betting round (PREFLOP, FLOP, TURN, or RIVER)
    • Public cards: Community cards showing ('**' for unrevealed cards)
    • My rank: Hand rank based on current cards (High, Pair, Two Pair, etc.)
    • Money: Player's current stack size
    • Action: Player's previous actions in this hand
  5. Other Players' Information:

    • Cards: Always shown as ['', ''] (hidden from the player)
    • Money: Current stack size
    • Action: Previous actions taken by this player
    • Discard: Whether the player has folded (True/False)
  6. Decision Context:

    • Pot value: Current size of the pot
    • Available actions: List of possible actions to take (contextually determined)
    • Bet/raise sizing options: Available sizes if betting or raising, dynamically generated based on game state (all-in amount not included in this list)

Context-Dependent Action Types and Bet Sizing

  1. Action Types: The available actions change depending on the betting context:

    • If no one has bet in the current round: ["fold", "check", "bet", "all-in"]
    • If someone has bet but no raises: ["fold", "call", "raise", "all-in"]
    • If someone has already raised: ["fold", "call", "re-raise", "all-in"]
  2. Betting Terminology:

    • Use "bet" only when you're the first to put money in a betting round
    • Use "raise" when increasing someone else's bet for the first time
    • Use "re-raise" when raising after someone has already raised
  3. Bet Sizing Options:

    • The "Choose a number from [...]" section never includes the all-in amount
    • Bet/raise options are presented as multiples of the big blind or pot-related sizes
    • Options are determined by the game state and betting round

Expected Output Format

The output should be concise and follow poker terminology:

call

OR

raise 5.00

OR

re-raise 10.00

OR

bet 4.00

OR

check

OR

fold

OR

all-in

Output Structure Breakdown:

  1. Basic Actions (without amounts):

    • check: Pass the action without betting (only when no one has bet)
    • fold: Discard your hand and exit the hand
    • all-in: Commit your entire stack
  2. Actions with Amounts:

    • call X: Match the current bet of X (only when someone has bet)
    • bet X: Make a new bet of X (only when no one has bet)
    • raise X: Increase the current bet to X (when someone has bet, but no one has raised)
    • re-raise X: Increase after someone has already raised (when there's been at least one raise)

Example Prompt-Response Pairs

Example 1 - Preflop with One Raise Already

Input:

You are an experienced gambler. Now you need to assist me to make decisions in Texas Hold'em games. You have been provided with a series of observable information:

    Player amount: [6], Currency: USD, Blind value: [$0.50/$1.00], Order: [1, 2, 3, 4, 5, 6], Seat 3 is the button.

    My cards: ['As', 'Ac'], the characteristics of my cards: ["high", "close"], My seat: [Seat 5]

    Stage: "PREFLOP", Public cards: ['**', '**', '**', '**', '**']
    My rank: ["High"], Money: [100.00], Action: ["post BB 1.00"]

    Seat 1: ['**', '**'], Money: [99.00], Action: ["fold"], Discard: [True]
    Seat 2: ['**', '**'], Money: [98.00], Action: ["fold"], Discard: [True]
    Seat 3: ['**', '**'], Money: [100.00], Action: ["call"], Discard: [False]
    Seat 4: ['**', '**'], Money: [94.50], Action: ["post SB 0.50", "fold"], Discard: [True]
    Seat 6: ['**', '**'], Money: [97.00], Action: ["raise 2.00"], Discard: [False]

The pot value is [1.50]

The actions can be: ["fold", "call", "re-raise", "all-in"]. What should I do? If I choose to "re-raise", then how much? Choose a number from [3.00, 4.00, 7.00, 10.00, 20.00, 50.00].

Expected Output:

re-raise 3.00

Example 2 - Facing an Initial Bet

Input:

You are an experienced gambler. Now you need to assist me to make decisions in Texas Hold'em games. You have been provided with a series of observable information:

    Player amount: [4], Currency: USD, Blind value: [$1.00/$2.00], Order: [1, 2, 3, 4], Seat 4 is the button.

    My cards: ['Jh', 'Qh'], the characteristics of my cards: ["suit", "high", "close"], My seat: [Seat 2]

    Stage: "FLOP", Public cards: ['2h', '7h', 'Kd', '**', '**']
    My rank: ["Flush Draw"], Money: [196.00], Action: ["post BB 2.00"]

    Seat 1: ['**', '**'], Money: [199.00], Action: ["post SB 1.00", "call"], Discard: [False]
    Seat 3: ['**', '**'], Money: [176.00], Action: ["bet 4.00"], Discard: [False]
    Seat 4: ['**', '**'], Money: [198.00], Action: ["fold"], Discard: [True]

The pot value is [14.00]

The actions can be: ["fold", "call", "raise", "all-in"]. What should I do? If I choose to "raise", then how much? Choose a number from [8.00, 12.00, 20.00, 40.00, 80.00].

Expected Output:

call

Example 3 - Facing an All-In (Limited Options)

Input:

You are an experienced gambler. Now you need to assist me to make decisions in Texas Hold'em games. You have been provided with a series of observable information:

    Player amount: [3], Currency: USD, Blind value: [$2.00/$5.00], Order: [1, 2, 3], Seat 2 is the button.

    My cards: ['Ks', 'Kd'], the characteristics of my cards: ["high", "close"], My seat: [Seat 3]

    Stage: "PREFLOP", Public cards: ['**', '**', '**', '**', '**']
    My rank: ["High"], Money: [85.00], Action: ["post SB 2.00"]

    Seat 1: ['**', '**'], Money: [83.00], Action: ["post BB 2.00", "fold"], Discard: [True]
    Seat 2: ['**', '**'], Money: [0.00], Action: ["all-in 150.00"], Discard: [False]

The pot value is [157.00]

The actions can be: ["fold", "call"]. What should I do?

Expected Output:

call

(Note: Here we're implicitly calling the full amount we have, effectively going all-in to call)

Example 4 - Being First to Act (Check/Bet)

Input:

You are an experienced gambler. Now you need to assist me to make decisions in Texas Hold'em games. You have been provided with a series of observable information:

    Player amount: [5], Currency: USD, Blind value: [$1.00/$2.00], Order: [1, 2, 3, 4, 5], Seat 1 is the button.

    My cards: ['9c', 'Tc'], the characteristics of my cards: ["suit", "close"], My seat: [Seat 4]

    Stage: "FLOP", Public cards: ['3c', '8c', 'Jh', '**', '**']
    My rank: ["Flush Draw"], Money: [198.00], Action: ["call"]

    Seat 1: ['**', '**'], Money: [200.00], Action: ["fold"], Discard: [True]
    Seat 2: ['**', '**'], Money: [199.00], Action: ["post SB 1.00", "call", "check"], Discard: [False]
    Seat 3: ['**', '**'], Money: [200.00], Action: ["post BB 2.00", "check"], Discard: [False]
    Seat 5: ['**', '**'], Money: [200.00], Action: ["fold"], Discard: [True]

The pot value is [6.00]

The actions can be: ["fold", "check", "bet", "all-in"]. What should I do? If I choose to "bet", then how much? Choose a number from [2.00, 5.00, 10.00, 15.00, 30.00, 60.00, 120.00].

Expected Output:

bet 5.00

Example 5 - Facing a Raise (Re-raising opportunity)

Input:

You are an experienced gambler. Now you need to assist me to make decisions in Texas Hold'em games. You have been provided with a series of observable information:

    Player amount: [4], Currency: USD, Blind value: [$2.00/$5.00], Order: [1, 2, 3, 4], Seat 1 is the button.

    My cards: ['Ad', 'Kd'], the characteristics of my cards: ["suit", "high", "close"], My seat: [Seat 3]

    Stage: "TURN", Public cards: ['7d', 'Td', '2c', 'Qh', '**']
    My rank: ["Flush Draw"], Money: [180.00], Action: ["post BB 5.00", "bet 15.00"]

    Seat 1: ['**', '**'], Money: [200.00], Action: ["fold"], Discard: [True]
    Seat 2: ['**', '**'], Money: [135.00], Action: ["post SB 2.00", "call", "raise 45.00"], Discard: [False]
    Seat 4: ['**', '**'], Money: [195.00], Action: ["fold"], Discard: [True]

The pot value is [85.00]

The actions can be: ["fold", "call", "re-raise", "all-in"]. What should I do? If I choose to "re-raise", then how much? Choose a number from [90.00, 135.00].

Expected Output:

call

Reward Function Basis

The reward function used for reinforcement learning evaluates model outputs based on how closely they match the winning player's action. Two primary components are considered:

  1. Action Match Reward: Scores based on matching the action type (fold, check, call, bet, raise)

    • Exact match: 1.0
    • Action type match (e.g., bet vs. raise): 0.7
    • Related action match (e.g., aggressive vs. passive): 0.5
  2. Bet Sizing Reward: For bet/raise actions, scores based on how closely the bet amount matches

    • Perfect match: 1.0
    • Scores decrease linearly as deviation increases
    • Zero score beyond max deviation threshold (default 50%)

These components are combined in the CombinedPokerReward with configurable weights (default: 60% action, 40% sizing) to produce the final reward signal for training.