
candidate_count > 1: candidates[0].content.parts contains every candidate's text; response.text concatenates them all #2507

@Python-Is-Long

Description

When calling generate_content with candidate_count > 1, response.candidates[0].content.parts contains every candidate's text (one entry per candidate, in candidate order), not just candidate 0's own response. As a consequence, response.text — which joins all parts of candidates[0] — returns the concatenation of every candidate's response instead of a single candidate's text.
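To make the shape concrete, here is a minimal offline sketch built from stand-in objects (plain SimpleNamespace values, not the SDK's own types). The greeting strings and helper names are hypothetical; only the packing layout is taken from this report. The helper approximates response.text by joining the non-thought text parts of the first candidate.

```python
from types import SimpleNamespace


def make_part(text, thought=False):
    # Stand-in for a response part; NOT the SDK's Part class.
    return SimpleNamespace(text=text, thought=thought)


def make_candidate(*parts):
    # Stand-in for a candidate exposing .content.parts.
    return SimpleNamespace(content=SimpleNamespace(parts=list(parts)))


def first_candidate_text(candidates):
    # Approximates the behavior described above: join every
    # non-thought text part of candidates[0].
    parts = candidates[0].content.parts
    return ''.join(p.text for p in parts if p.text and not p.thought)


# Hypothetical payload shaped like the reported candidate_count=3 response:
# candidates[0] carries its own text plus a copy of each sibling's text.
packed = [
    make_candidate(make_part('Hi!'), make_part('Hi there!'), make_part('Yo!')),
    make_candidate(make_part('Hi there!')),
    make_candidate(make_part('Yo!')),
]

joined = first_candidate_text(packed)
print(joined)  # Hi!Hi there!Yo!
```

With this layout the joined string contains all three greetings back to back, which is the symptom the report describes.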

This is structurally surprising, undocumented (as far as I can find in the API reference and SDK docs), and easy to mishandle in downstream code that assumes parts of one candidate belong only to that candidate.

Environment

  • Programming language: Python 3.10
  • Package version: reproduced on google-genai 1.69.0 and 2.6.0
  • Model: gemini-2.5-flash (also reproduces on gemini-2.5-flash-lite)
  • API: Gemini Developer API (key-based, not Vertex)

Reproduction

import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])

def non_thought_texts(parts):
    return [p.text for p in (parts or [])
            if getattr(p, 'text', None) and not getattr(p, 'thought', False)
            and p.text.strip()]

for cc in (1, 2, 3):
    print(f"\n=== candidate_count = {cc} ===")
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=[types.Content(role='user',
                                parts=[types.Part(text='Write one short greeting.')])],
        config=types.GenerateContentConfig(
            max_output_tokens=4096,
            temperature=1.0,
            top_p=0.95,
            candidate_count=cc if cc > 1 else None,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )
    candidates = response.candidates
    print(f"len(response.candidates) = {len(candidates)}")
    c0_texts = non_thought_texts(candidates[0].content.parts)
    print(f"candidates[0].non_thought_texts: count={len(c0_texts)}, "
          f"lens={[len(t) for t in c0_texts]}")
    for i in range(1, len(candidates)):
        ci_texts = non_thought_texts(candidates[i].content.parts)
        print(f"candidates[{i}].non_thought_texts: count={len(ci_texts)}, "
              f"lens={[len(t) for t in ci_texts]}")
        if i < len(c0_texts) and ci_texts:
            match = c0_texts[i] == ci_texts[0]
            print(f"  candidates[0].non_thought_texts[{i}] == "
                  f"candidates[{i}].non_thought_texts[0]?  {match}")
    rt = response.text or ''
    total_c0 = sum(len(t) for t in c0_texts)
    print(f"response.text len = {len(rt)}; sum(c[0].non_thought_lens) = {total_c0}")
    print(f"response.text equals concat of c[0].non_thought_texts? "
          f"{rt == ''.join(c0_texts)}")

Observed output

=== candidate_count = 1 ===
len(response.candidates) = 1
candidates[0].non_thought_texts: count=1, lens=[3]
response.text len = 3; sum(c[0].non_thought_lens) = 3
response.text equals concat of c[0].non_thought_texts? True

=== candidate_count = 2 ===
len(response.candidates) = 2
candidates[0].non_thought_texts: count=2, lens=[3, 3]
candidates[1].non_thought_texts: count=1, lens=[3]
  candidates[0].non_thought_texts[1] == candidates[1].non_thought_texts[0]?  True
response.text len = 6; sum(c[0].non_thought_lens) = 6
response.text equals concat of c[0].non_thought_texts? True

=== candidate_count = 3 ===
len(response.candidates) = 3
candidates[0].non_thought_texts: count=3, lens=[3, 9, 3]
candidates[1].non_thought_texts: count=1, lens=[9]
  candidates[0].non_thought_texts[1] == candidates[1].non_thought_texts[0]?  True
candidates[2].non_thought_texts: count=1, lens=[3]
  candidates[0].non_thought_texts[2] == candidates[2].non_thought_texts[0]?  True
response.text len = 15; sum(c[0].non_thought_lens) = 15
response.text equals concat of c[0].non_thought_texts? True

Note the equality checks: for every i >= 1, candidates[0].non_thought_texts[i] is exactly equal to candidates[i].non_thought_texts[0]. So candidates[0].content.parts literally contains a copy of every sibling candidate's text.

With thinking enabled

The pattern extends: candidates[0].content.parts becomes [thought_0, text_0, thought_1, text_1, ..., thought_{N-1}, text_{N-1}], and each candidates[i].content.parts for i >= 1 is [thought_i, text_i]. The response.text property drops thought parts, but the underlying packing is the same, so it still ends up joining the sibling candidates' bodies.
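The thinking-enabled layout can be sketched offline as well. The objects below are stand-ins (SimpleNamespace, not SDK types), the strings are made up, and group_by_thought is a hypothetical helper that assumes exactly one thought part precedes each text part, as in the layout reported here. It would mis-group a candidate that emits several thought parts or none.

```python
from types import SimpleNamespace


def part(text, thought=False):
    # Stand-in for a response part; NOT the SDK's Part class.
    return SimpleNamespace(text=text, thought=thought)


# Hypothetical candidates[0].content.parts for candidate_count=3, thinking on.
c0_parts = [
    part('t0', thought=True), part('answer 0'),
    part('t1', thought=True), part('answer 1'),
    part('t2', thought=True), part('answer 2'),
]


def group_by_thought(parts):
    """Start a new group at each thought part (assumed layout, see lead-in)."""
    groups = []
    for p in parts:
        if p.thought or not groups:
            groups.append([])
        groups[-1].append(p)
    return groups


groups = group_by_thought(c0_parts)

# Dropping thoughts alone still leaves every sibling's body in the join.
visible = ''.join(p.text for p in c0_parts if not p.thought)

# Only the first group belongs to candidate 0 under the assumed layout.
own = ''.join(p.text for p in groups[0] if not p.thought)

print(len(groups))  # 3
print(visible)      # answer 0answer 1answer 2
print(own)          # answer 0
```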

Verified across

Combination                 Lengths observed
flash, cc=2, temp=0.7       c[0]=[30,31], c[1]=[31]
flash, cc=3, temp=0.7       c[0]=[20,29,27], c[1]=[29], c[2]=[27]
flash, cc=3, temp=1.0       c[0]=[179,347,201], c[1]=[347], c[2]=[201]
flash, cc=3, long output    c[0]=[2404,2347,1709], c[1]=[2347], c[2]=[1709]
flash, cc=3, thinking on    c[0]=[t,47,t,43,t,75], c[1]=[t,43], c[2]=[t,75]
flash-lite, cc=3            same pattern

In every case candidates[0].non_thought_texts[i] == candidates[i].non_thought_texts[0] (byte equality). Reproduced on both SDK 1.69.0 and 2.6.0.

Expected behavior

One of the following:

  1. candidates[0].content.parts should contain only candidate 0's own content. Each sibling candidate's content lives in candidates[i] already, so duplicating it into candidates[0].parts is redundant and surprising.
  2. If the current packing is intentional, it should be documented prominently in the candidate_count reference (both API docs and SDK docstrings), and the response.text property should either (a) materialize only candidates[0]'s own portion or (b) raise / warn more clearly than the current "returning text result from the first candidate" message, which doesn't hint that the result concatenates every sibling.

Actual behavior

response.text returns the joined text of every candidate when candidate_count > 1. Downstream code that uses response.text (a natural default) silently ships N candidates concatenated as one reply. Code that iterates response.candidates[i].content.parts and expects per-candidate isolation also breaks unless it knows to ignore the sibling parts packed into candidates[0] (parts[1:] with thinking off, parts[2:] with thinking on).

Suggested fix

Either:

  • Stop populating candidates[0].content.parts with sibling text — let each candidate hold only its own content. This is the least-surprising shape and matches what the documentation implies.
  • Or, if the underlying API legitimately returns the data this way, have the SDK normalize it before exposing candidates[0] to the user, and make response.text raise on candidate_count > 1 rather than silently returning a concatenation.
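As a rough illustration of the second option, the sketch below trims a trailing copy of the sibling parts from candidates[0] and offers a text accessor that refuses multi-candidate input. It is not the SDK's implementation: own_parts_of_first and strict_text are hypothetical names, the objects are SimpleNamespace stand-ins, and parts are compared only by their thought flag and text.

```python
from types import SimpleNamespace


def _part_key(part):
    # Compare parts by (is_thought, text); other part fields are ignored.
    return (bool(getattr(part, 'thought', False)), getattr(part, 'text', None))


def own_parts_of_first(candidates):
    """Return candidates[0]'s parts minus a trailing copy of sibling parts."""
    first = list(candidates[0].content.parts or [])
    siblings = [p for c in candidates[1:] for p in (c.content.parts or [])]
    n = len(siblings)
    # Trim only on an exact suffix match, and only if something is left over.
    if 0 < n < len(first):
        tail = [_part_key(p) for p in first[-n:]]
        if tail == [_part_key(p) for p in siblings]:
            return first[:-n]
    return first


def strict_text(candidates):
    """Join the only candidate's non-thought text; raise on multiple candidates."""
    if len(candidates) != 1:
        raise ValueError(
            f'expected exactly 1 candidate, got {len(candidates)}; '
            'read each candidate explicitly instead'
        )
    parts = candidates[0].content.parts or []
    return ''.join(p.text for p in parts
                   if p.text and not getattr(p, 'thought', False))


def _cand(*texts):
    # Stand-in candidate built from plain text parts (not SDK types).
    parts = [SimpleNamespace(text=t, thought=False) for t in texts]
    return SimpleNamespace(content=SimpleNamespace(parts=parts))


# Hypothetical packed payload (as reported) and an already-clean payload.
packed = [_cand('Hi!', 'Hi there!', 'Yo!'), _cand('Hi there!'), _cand('Yo!')]
clean = [_cand('Hi!'), _cand('Hi there!'), _cand('Yo!')]

print([p.text for p in own_parts_of_first(packed)])  # ['Hi!']
print([p.text for p in own_parts_of_first(clean)])   # ['Hi!']
```

Requiring an exact suffix match keeps the trimming conservative: a response that is already per-candidate isolated passes through unchanged.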

Workaround

For each candidate i, read the first non-thought, non-empty text part rather than relying on response.text or joining candidates[0].content.parts:

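def candidate_own_text(candidate):
    """Return the first non-thought, non-empty text part, or None."""
    # candidate.content may be missing (e.g. on a blocked response), so guard it.
    content = getattr(candidate, 'content', None)
    for part in (getattr(content, 'parts', None) or []):
        if getattr(part, 'thought', False):
            continue  # skip thinking parts
        text = (getattr(part, 'text', '') or '').strip()
        if text:
            return text
    return None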

per_candidate_texts = [candidate_own_text(c) for c in response.candidates]

Metadata

Labels

priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
