Embeddings & Citation Pipeline

Semantic search in election programmes (Wahlprogramme), citation reconstruction, and PDF highlighting.

Retrieval

`app.embeddings.find_relevant_chunks(query, parteien=None, typ=None, bundesland=None, top_k=3, min_similarity=0.5)`

Find most relevant chunks for a query.

Parameters:

Name	Type	Description	Default
`bundesland`	`str`	If set, only chunks from this federal state OR global chunks (bundesland IS NULL, e.g. Grundsatzprogramme) are considered. If None, no filter is applied.	`None`
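The filter and ranking parameters can be illustrated with a minimal sketch. It assumes cosine similarity over in-memory chunk dicts with hypothetical `embedding` and `bundesland` keys and an inclusive `min_similarity` threshold; the real function queries the embeddings database and may differ in all of these details.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_chunks(query_vec, chunks, bundesland=None, top_k=3, min_similarity=0.5):
    """Hypothetical stand-in for the ranking step of find_relevant_chunks."""
    candidates = []
    for chunk in chunks:
        # State filter: keep chunks of the requested state and global chunks
        # (bundesland is None). With bundesland=None nothing is filtered out.
        if bundesland is not None and chunk.get("bundesland") not in (bundesland, None):
            continue
        score = cosine(query_vec, chunk["embedding"])
        if score >= min_similarity:  # assumption: threshold is inclusive
            candidates.append((score, chunk))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in candidates[:top_k]]


# Toy data with two-dimensional embeddings; real embeddings are much larger.
chunks = [
    {"id": "a", "bundesland": "NRW", "embedding": [1.0, 0.0]},
    {"id": "b", "bundesland": "BY", "embedding": [1.0, 0.0]},
    {"id": "c", "bundesland": None, "embedding": [0.8, 0.6]},
    {"id": "d", "bundesland": "NRW", "embedding": [0.0, 1.0]},
]
hits = select_chunks([1.0, 0.0], chunks, bundesland="NRW")
print([c["id"] for c in hits])  # ['a', 'c']
```

Chunk `b` is excluded by the state filter, chunk `d` by the similarity threshold, and the global chunk `c` passes the filter because its `bundesland` is `None`.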

`app.embeddings.get_relevant_quotes_for_antrag(antrag_text, fraktionen, bundesland, top_k_per_partei=2)`

Get relevant quotes from election programmes (Wahlprogramme) and party programmes (Parteiprogramme) for a motion (Antrag).

Parameters:

Name	Type	Description	Default
`bundesland`	`str`	Required. Determines which election programmes are searched and which governing parliamentary groups (Regierungsfraktionen) are included in addition to the motion's sponsors.	required
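How the sponsoring groups and the governing groups might be combined is sketched below. The helper names, the injected `retrieve` callable, and the returned dict shape are assumptions made for illustration; how the real function determines the governing groups for a `bundesland` is not documented here, so the sketch takes them as an argument.

```python
from typing import Callable, Iterable


def parties_to_search(fraktionen: Iterable[str], regierungsfraktionen: Iterable[str]) -> list[str]:
    """Union of sponsors and governing groups, de-duplicated, order preserved."""
    seen = set()
    ordered = []
    for partei in [*fraktionen, *regierungsfraktionen]:
        if partei not in seen:
            seen.add(partei)
            ordered.append(partei)
    return ordered


def collect_quotes(
    antrag_text: str,
    fraktionen: Iterable[str],
    regierungsfraktionen: Iterable[str],
    retrieve: Callable[[str, str, int], list],
    top_k_per_partei: int = 2,
) -> dict:
    """Run one retrieval per party, each capped at top_k_per_partei results."""
    return {
        partei: retrieve(antrag_text, partei, top_k_per_partei)
        for partei in parties_to_search(fraktionen, regierungsfraktionen)
    }


def fake_retrieve(text, partei, top_k):
    """Placeholder retrieval that fabricates chunk labels for the demo only."""
    return [f"{partei}-chunk-{i}" for i in range(1, top_k + 1)]


# Placeholder party names; "Partei B" is both sponsor and governing group.
quotes = collect_quotes(
    "Antrag text", ["Partei A", "Partei B"], ["Partei B", "Partei C"], fake_retrieve
)
print(list(quotes))  # ['Partei A', 'Partei B', 'Partei C']
```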

Prompt Formatting

`app.embeddings.format_quotes_for_prompt(quotes, searched_parties=None)`

Format quotes for inclusion in LLM prompt.

Each chunk gets a stable ENUM-ID ([Q1], [Q2], …) and the prompt instructs the LLM to anchor every citation in one of those IDs and to copy the snippet verbatim from the cited chunk. This is the structural fix for Issue #60: pre-#60 the LLM was free to invent snippets under real source labels because nothing in the prompt bound a citation to a specific retrieved chunk.

Each quote is annotated with the fully-qualified source (programme name + page) so the LLM cannot fall back on training-set defaults when constructing its citations.

Issue #63 extends this: if searched_parties is passed, parties for which no chunk was retrieved are explicitly marked in the prompt as "keine Quellen im Index" (no sources in the index). The LLM is instructed to set score: null for these parties instead of guessing from its training knowledge.
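A minimal sketch of the formatting described above, assuming quote dicts with hypothetical `partei`, `programm`, `seite`, and `text` keys. The exact prompt wording and layout of the real function are not documented here, so the strings below are illustrative only.

```python
def format_quotes_sketch(quotes, searched_parties=None):
    """Number each chunk as [Q1], [Q2], ... and flag parties without sources."""
    lines = []
    for number, quote in enumerate(quotes, start=1):
        # Stable ID plus fully qualified source (programme name + page).
        lines.append(f"[Q{number}] {quote['partei']} | {quote['programm']}, p. {quote['seite']}")
        lines.append(f'"{quote["text"]}"')
    if searched_parties:
        covered = {quote["partei"] for quote in quotes}
        for partei in searched_parties:
            if partei not in covered:
                # Issue #63 behaviour: name the gap so the LLM does not guess.
                lines.append(f"{partei}: keine Quellen im Index (score: null)")
    return "\n".join(lines)


quotes = [
    {
        "partei": "SPD",
        "programm": "SPD NRW Wahlprogramm 2022",
        "seite": 12,
        "text": "placeholder chunk text",  # not a real programme excerpt
    }
]
prompt = format_quotes_sketch(quotes, searched_parties=["SPD", "FDP"])
print(prompt)
```

Because the IDs are derived from the position in `quotes`, the same list has to be passed to post-processing so that a cited ID can be traced back to its chunk.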

Citation Post-Processing (Issue #60)

`app.embeddings.reconstruct_zitate(data, semantic_quotes)`

Replace LLM-emitted quelle/url with canonical chunk values; drop unbacked.

Walks over data['wahlprogrammScores'][i][kind]['zitate'] (the raw LLM-output dict, not the Pydantic model). For each Zitat:

1. Locate the chunk whose text contains the snippet (or a 5-word anchor from it). Search across all retrieved chunks regardless of party, so cross-mixes between Q-IDs become invisible to the persisted output.
2. If found: overwrite quelle and url with values derived from the matching chunk's programm_id + seite. The LLM is no longer trusted for these fields.
3. If not found: drop the Zitat entirely.

Returns the same data dict (mutated in place) for chaining.
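The walk over the raw LLM output could look roughly like the sketch below. The Zitat key `text`, the `quelle` label format, the URL query parameters, and the injected `find_chunk` matcher are assumptions; only the `wahlprogrammScores` / `zitate` / `quelle` / `url` names and the `/api/wahlprogramm-cite` endpoint come from this page.

```python
from urllib.parse import urlencode


def reconstruct_sketch(data, chunks, programme_names, find_chunk):
    """Overwrite quelle/url from the matched chunk; drop Zitate without a match."""
    for score in data.get("wahlprogrammScores", []):
        for kind_value in score.values():
            # Only the per-kind dicts that carry a 'zitate' list are processed.
            if not isinstance(kind_value, dict) or "zitate" not in kind_value:
                continue
            kept = []
            for zitat in kind_value["zitate"]:
                chunk = find_chunk(zitat.get("text", ""), chunks)
                if chunk is None:
                    continue  # unbacked snippet: dropped entirely
                name = programme_names[chunk["programm_id"]]
                zitat["quelle"] = f"{name}, p. {chunk['seite']}"  # hypothetical format
                zitat["url"] = "/api/wahlprogramm-cite?" + urlencode(
                    {"programm_id": chunk["programm_id"], "seite": chunk["seite"]}
                )
                kept.append(zitat)
            kind_value["zitate"] = kept
    return data  # same object, mutated in place


def substring_match(text, chunks):
    """Simplified matcher for the demo: strict substring only."""
    return next((c for c in chunks if text and text in c["text"]), None)


chunks = [{"programm_id": "spd-nrw-2022", "seite": 12, "text": "one two three four five six seven eight"}]
names = {"spd-nrw-2022": "SPD NRW Wahlprogramm 2022"}
data = {
    "wahlprogrammScores": [
        {
            "partei": "SPD",
            "wahlprogramm": {  # hypothetical kind key
                "zitate": [
                    {"text": "three four five six seven", "quelle": "invented label", "url": "http://example.invalid"},
                    {"text": "this snippet is in no chunk", "quelle": "x", "url": "y"},
                ]
            },
        }
    ]
}
result = reconstruct_sketch(data, chunks, names, substring_match)
zitate = result["wahlprogrammScores"][0]["wahlprogramm"]["zitate"]
print(zitate)
```

After the call, only the first Zitat remains, and its `quelle` and `url` no longer contain anything the LLM wrote.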

`app.embeddings.find_chunk_for_text(text, chunks)`

Locate the retrieved chunk that a Zitat snippet was copied from.

Two-stage match identical to Sub-D:

1. Strict substring: the full needle is a substring of any chunk.
2. 5-word anchor: any 5 consecutive words of the needle are a substring of any chunk.

Snippets shorter than 20 characters are rejected (too weak to bind). Returns the matching chunk dict, or None.
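The two stages and the length guard can be sketched as follows. Whitespace normalisation and the `text` key on chunk dicts are assumptions added for the example; the real function may normalise differently (for instance with respect to case or hyphenation).

```python
def find_chunk_sketch(text, chunks, min_length=20, anchor_words=5):
    """Return the first chunk containing the snippet or a 5-word anchor from it."""
    needle = " ".join(text.split())  # assumption: collapse whitespace runs
    if len(needle) < min_length:
        return None  # too short to bind a citation to a chunk

    normalized = [(" ".join(chunk["text"].split()), chunk) for chunk in chunks]

    # Stage 1: the whole needle must appear verbatim in a chunk.
    for haystack, chunk in normalized:
        if needle in haystack:
            return chunk

    # Stage 2: slide a 5-word window over the needle and test each anchor.
    words = needle.split()
    for start in range(len(words) - anchor_words + 1):
        anchor = " ".join(words[start:start + anchor_words])
        for haystack, chunk in normalized:
            if anchor in haystack:
                return chunk
    return None


# Neutral placeholder texts, not programme excerpts.
chunks = [
    {"id": 1, "text": "one two three four five six seven eight nine ten"},
    {"id": 2, "text": "red orange yellow green blue indigo violet black white grey"},
]

exact = find_chunk_sketch("three four five six seven", chunks)
anchored = find_chunk_sketch("XX orange yellow green blue indigo YY", chunks)
too_short = find_chunk_sketch("one two three", chunks)
missing = find_chunk_sketch("completely unrelated words appear right here", chunks)
print(exact["id"], anchored["id"], too_short, missing)  # 1 2 None None
```

The anchor stage tolerates small edits at the edges of a snippet (here the `XX` and `YY` tokens) while still requiring five consecutive words to match a chunk verbatim.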

PDF Highlighting (Issue #47)

`app.embeddings.render_highlighted_page(programm_id, seite, query)`

Render a single Wahlprogramm-page with yellow highlights for a query.

Used by the /api/wahlprogramm-cite endpoint to serve a one-page PDF where the cited snippet is visually highlighted via PyMuPDF add_highlight_annot.

Returns a tuple (pdf_bytes, found_page, highlighted) where found_page is the 1-indexed page number and highlighted is True if the text was found and annotated. Returns (None, 0, False) if the programme/page can't be resolved.

Parameters:

Name	Type	Description	Default
`programm_id`	`str`	Key into PROGRAMME registry — validated by caller.	required
`seite`	`int`	1-indexed page number within the programme PDF.	required
`query`	`str`	Snippet text to search and highlight on the page. Long queries are truncated to the first 200 characters before the search; PyMuPDF's `search_for` falls over on huge needles anyway and a short anchor is what we want for the visual hit.	required
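A caller has to handle both the truncated needle and the three-element return value. The sketch below shows one way an endpoint handler might do that without touching PyMuPDF; the helper names, the 404 mapping, and the `X-` response headers are hypothetical and not taken from the actual endpoint.

```python
def prepare_needle(query, max_chars=200):
    """Truncate the snippet to its first 200 characters, as described above."""
    return query[:max_chars]


def to_http_response(result):
    """Map (pdf_bytes, found_page, highlighted) to (status, headers, body)."""
    pdf_bytes, found_page, highlighted = result
    if pdf_bytes is None:
        # (None, 0, False): programme or page could not be resolved.
        return 404, {}, b""
    headers = {
        "Content-Type": "application/pdf",
        "X-Found-Page": str(found_page),  # hypothetical header, 1-indexed page
        "X-Highlighted": "true" if highlighted else "false",  # hypothetical header
    }
    return 200, headers, pdf_bytes


needle = prepare_needle("x" * 500)
not_found = to_http_response((None, 0, False))
ok = to_http_response((b"%PDF-1.7 placeholder", 12, True))
print(len(needle), not_found[0], ok[0], ok[1]["X-Found-Page"])  # 200 404 200 12
```

Note that `highlighted` can be False even when PDF bytes are returned: the page exists, but the snippet was not found on it.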

Indexing

`app.embeddings.index_programm(programm_id, pdf_dir)`

Index a single programme PDF into the embeddings database.

app.embeddings.PROGRAMME = {'spd-nrw-2022': {'name': 'SPD NRW Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'NRW', 'pdf': 'spd-nrw-2022.pdf'}, 'cdu-nrw-2022': {'name': 'CDU NRW Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'NRW', 'pdf': 'cdu-nrw-2022.pdf'}, 'gruene-nrw-2022': {'name': 'Grüne NRW Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'GRÜNE', 'bundesland': 'NRW', 'pdf': 'gruene-nrw-2022.pdf'}, 'fdp-nrw-2022': {'name': 'FDP NRW Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'FDP', 'bundesland': 'NRW', 'pdf': 'fdp-nrw-2022.pdf'}, 'afd-nrw-2022': {'name': 'AfD NRW Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'NRW', 'pdf': 'afd-nrw-2022.pdf'}, 'cdu-lsa-2021': {'name': 'CDU Sachsen-Anhalt Regierungsprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'LSA', 'pdf': 'cdu-lsa-2021.pdf'}, 'spd-lsa-2021': {'name': 'SPD Sachsen-Anhalt Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'LSA', 'pdf': 'spd-lsa-2021.pdf'}, 'gruene-lsa-2021': {'name': 'Grüne Sachsen-Anhalt Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'GRÜNE', 'bundesland': 'LSA', 'pdf': 'gruene-lsa-2021.pdf'}, 'fdp-lsa-2021': {'name': 'FDP Sachsen-Anhalt Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'FDP', 'bundesland': 'LSA', 'pdf': 'fdp-lsa-2021.pdf'}, 'afd-lsa-2021': {'name': 'AfD Sachsen-Anhalt Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'LSA', 'pdf': 'afd-lsa-2021.pdf'}, 'linke-lsa-2021': {'name': 'DIE LINKE Sachsen-Anhalt Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'LINKE', 'bundesland': 'LSA', 'pdf': 'linke-lsa-2021.pdf'}, 'cdu-mv-2021': {'name': 'CDU Mecklenburg-Vorpommern Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'MV', 'pdf': 'cdu-mv-2021.pdf'}, 'spd-mv-2021': {'name': 'SPD Mecklenburg-Vorpommern Regierungsprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 
'MV', 'pdf': 'spd-mv-2021.pdf'}, 'gruene-mv-2021': {'name': 'Grüne Mecklenburg-Vorpommern Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'GRÜNE', 'bundesland': 'MV', 'pdf': 'gruene-mv-2021.pdf'}, 'fdp-mv-2021': {'name': 'FDP Mecklenburg-Vorpommern Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'FDP', 'bundesland': 'MV', 'pdf': 'fdp-mv-2021.pdf'}, 'afd-mv-2021': {'name': 'AfD Mecklenburg-Vorpommern Landeswahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'MV', 'pdf': 'afd-mv-2021.pdf'}, 'linke-mv-2021': {'name': 'DIE LINKE Mecklenburg-Vorpommern Zukunftsprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'LINKE', 'bundesland': 'MV', 'pdf': 'linke-mv-2021.pdf'}, 'cdu-be-2023': {'name': 'CDU Berlin Berlin-Plan 2021', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'BE', 'pdf': 'cdu-be-2023.pdf'}, 'spd-be-2023': {'name': 'SPD Berlin Wahlprogramm AGH 2021', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'BE', 'pdf': 'spd-be-2023.pdf'}, 'gruene-be-2023': {'name': 'Grüne Berlin Landeswahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'GRÜNE', 'bundesland': 'BE', 'pdf': 'gruene-be-2023.pdf'}, 'linke-be-2023': {'name': 'DIE LINKE Berlin Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'LINKE', 'bundesland': 'BE', 'pdf': 'linke-be-2023.pdf'}, 'afd-be-2023': {'name': 'AfD Berlin Wahlprogramm AGH 2021', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'BE', 'pdf': 'afd-be-2023.pdf'}, 'cdu-th-2024': {'name': 'CDU Thüringen Wahlprogramm 2024', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'TH', 'pdf': 'cdu-th-2024.pdf'}, 'afd-th-2024': {'name': 'AfD Thüringen Wahlprogramm 2024', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'TH', 'pdf': 'afd-th-2024.pdf'}, 'linke-th-2024': {'name': 'DIE LINKE Thüringen Wahlprogramm 2024', 'typ': 'wahlprogramm', 'partei': 'LINKE', 'bundesland': 'TH', 'pdf': 'linke-th-2024.pdf'}, 'bsw-th-2024': {'name': 'BSW Thüringen Wahlprogramm 2024', 'typ': 'wahlprogramm', 'partei': 'BSW', 
'bundesland': 'TH', 'pdf': 'bsw-th-2024.pdf'}, 'spd-th-2024': {'name': 'SPD Thüringen Wahlprogramm 2024', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'TH', 'pdf': 'spd-th-2024.pdf'}, 'spd-bb-2024': {'name': 'SPD Brandenburg Wahlprogramm 2024', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'BB', 'pdf': 'spd-bb-2024.pdf'}, 'afd-bb-2024': {'name': 'AfD Brandenburg Wahlprogramm 2024', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'BB', 'pdf': 'afd-bb-2024.pdf'}, 'cdu-bb-2024': {'name': 'CDU Brandenburg Wahlprogramm 2024', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'BB', 'pdf': 'cdu-bb-2024.pdf'}, 'bsw-bb-2024': {'name': 'BSW Brandenburg Wahlprogramm 2024', 'typ': 'wahlprogramm', 'partei': 'BSW', 'bundesland': 'BB', 'pdf': 'bsw-bb-2024.pdf'}, 'spd-hh-2025': {'name': 'SPD Hamburg Wahlprogramm 2025', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'HH', 'pdf': 'spd-hh-2025.pdf'}, 'cdu-hh-2025': {'name': 'CDU Hamburg Wahlprogramm 2025', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'HH', 'pdf': 'cdu-hh-2025.pdf'}, 'gruene-hh-2025': {'name': 'Grüne Hamburg Regierungsprogramm 2025', 'typ': 'wahlprogramm', 'partei': 'GRÜNE', 'bundesland': 'HH', 'pdf': 'gruene-hh-2025.pdf'}, 'linke-hh-2025': {'name': 'DIE LINKE Hamburg Wahlprogramm 2025', 'typ': 'wahlprogramm', 'partei': 'LINKE', 'bundesland': 'HH', 'pdf': 'linke-hh-2025.pdf'}, 'afd-hh-2025': {'name': 'AfD Hamburg Wahlprogramm 2025', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'HH', 'pdf': 'afd-hh-2025.pdf'}, 'cdu-sh-2022': {'name': 'CDU Schleswig-Holstein Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'SH', 'pdf': 'cdu-sh-2022.pdf'}, 'spd-sh-2022': {'name': 'SPD Schleswig-Holstein Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'SH', 'pdf': 'spd-sh-2022.pdf'}, 'gruene-sh-2022': {'name': 'Grüne Schleswig-Holstein Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'GRÜNE', 'bundesland': 'SH', 'pdf': 
'gruene-sh-2022.pdf'}, 'fdp-sh-2022': {'name': 'FDP Schleswig-Holstein Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'FDP', 'bundesland': 'SH', 'pdf': 'fdp-sh-2022.pdf'}, 'ssw-sh-2022': {'name': 'SSW Schleswig-Holstein Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'SSW', 'bundesland': 'SH', 'pdf': 'ssw-sh-2022.pdf'}, 'gruene-bw-2021': {'name': 'Grüne Baden-Württemberg Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'GRÜNE', 'bundesland': 'BW', 'pdf': 'gruene-bw-2021.pdf'}, 'cdu-bw-2021': {'name': 'CDU Baden-Württemberg Regierungsprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'BW', 'pdf': 'cdu-bw-2021.pdf'}, 'afd-bw-2021': {'name': 'AfD Baden-Württemberg Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'BW', 'pdf': 'afd-bw-2021.pdf'}, 'spd-bw-2021': {'name': 'SPD Baden-Württemberg Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'BW', 'pdf': 'spd-bw-2021.pdf'}, 'fdp-bw-2021': {'name': 'FDP Baden-Württemberg Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'FDP', 'bundesland': 'BW', 'pdf': 'fdp-bw-2021.pdf'}, 'spd-rp-2021': {'name': 'SPD Rheinland-Pfalz Regierungsprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'RP', 'pdf': 'spd-rp-2021.pdf'}, 'cdu-rp-2021': {'name': 'CDU Rheinland-Pfalz Regierungsprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'RP', 'pdf': 'cdu-rp-2021.pdf'}, 'afd-rp-2021': {'name': 'AfD Rheinland-Pfalz Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'RP', 'pdf': 'afd-rp-2021.pdf'}, 'gruene-rp-2021': {'name': 'Grüne Rheinland-Pfalz Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'GRÜNE', 'bundesland': 'RP', 'pdf': 'gruene-rp-2021.pdf'}, 'fw-rp-2021': {'name': 'FREIE WÄHLER Rheinland-Pfalz Wahlprogramm 2021', 'typ': 'wahlprogramm', 'partei': 'FREIE WÄHLER', 'bundesland': 'RP', 'pdf': 'fw-rp-2021.pdf'}, 'fdp-rp-2021': {'name': 'FDP Rheinland-Pfalz Wahlprogramm 2021', 'typ': 
'wahlprogramm', 'partei': 'FDP', 'bundesland': 'RP', 'pdf': 'fdp-rp-2021.pdf'}, 'spd-grundsatz': {'name': 'SPD Grundsatzprogramm 2007', 'typ': 'parteiprogramm', 'partei': 'SPD', 'pdf': 'spd-grundsatzprogramm.pdf'}, 'cdu-grundsatz': {'name': 'CDU Grundsatzprogramm 2007', 'typ': 'parteiprogramm', 'partei': 'CDU', 'pdf': 'cdu-grundsatzprogramm.pdf'}, 'gruene-grundsatz': {'name': 'Grüne Grundsatzprogramm 2020', 'typ': 'parteiprogramm', 'partei': 'GRÜNE', 'pdf': 'gruene-grundsatzprogramm.pdf'}, 'fdp-grundsatz': {'name': 'FDP Grundsatzprogramm 2012', 'typ': 'parteiprogramm', 'partei': 'FDP', 'pdf': 'fdp-grundsatzprogramm.pdf'}, 'afd-grundsatz': {'name': 'AfD Grundsatzprogramm 2016', 'typ': 'parteiprogramm', 'partei': 'AfD', 'pdf': 'afd-grundsatzprogramm.pdf'}, 'linke-grundsatz': {'name': 'DIE LINKE Erfurter Programm 2011', 'typ': 'parteiprogramm', 'partei': 'LINKE', 'pdf': 'linke-grundsatzprogramm.pdf'}, 'csu-by-2023': {'name': 'CSU Bayernplan 2023', 'typ': 'wahlprogramm', 'partei': 'CSU', 'bundesland': 'BY', 'pdf': 'csu-by-2023.pdf'}, 'gruene-by-2023': {'name': 'Grüne Bayern Regierungsprogramm 2023', 'typ': 'wahlprogramm', 'partei': 'GRÜNE', 'bundesland': 'BY', 'pdf': 'gruene-by-2023.pdf'}, 'fw-by-2023': {'name': 'FREIE WÄHLER Bayern Wahlprogramm 2023', 'typ': 'wahlprogramm', 'partei': 'FW', 'bundesland': 'BY', 'pdf': 'fw-by-2023.pdf'}, 'afd-by-2023': {'name': 'AfD Bayern Wahlprogramm 2023', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'BY', 'pdf': 'afd-by-2023.pdf'}, 'spd-by-2023': {'name': 'SPD Bayern Zukunftsprogramm 2023', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'BY', 'pdf': 'spd-by-2023.pdf'}, 'spd-ni-2022': {'name': 'SPD Niedersachsen Regierungsprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'NI', 'pdf': 'spd-ni-2022.pdf'}, 'cdu-ni-2022': {'name': 'CDU Niedersachsen Regierungsprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'NI', 'pdf': 'cdu-ni-2022.pdf'}, 'gruene-ni-2022': {'name': 'Grüne 
Niedersachsen Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'GRÜNE', 'bundesland': 'NI', 'pdf': 'gruene-ni-2022.pdf'}, 'afd-ni-2022': {'name': 'AfD Niedersachsen Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'AfD', 'bundesland': 'NI', 'pdf': 'afd-ni-2022.pdf'}, 'spd-sl-2022': {'name': 'SPD Saarland Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'SPD', 'bundesland': 'SL', 'pdf': 'spd-sl-2022.pdf'}, 'cdu-sl-2022': {'name': 'CDU Saarland Wahlprogramm 2022', 'typ': 'wahlprogramm', 'partei': 'CDU', 'bundesland': 'SL', 'pdf': 'cdu-sl-2022.pdf'}} `module-attribute` ¶
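The registry entries can be queried like any dict. The sketch below copies three entries from the listing above and selects the programmes relevant for one state; the helper `programmes_for` is hypothetical. Note that the `parteiprogramm` entries carry no `bundesland` key, which the sketch treats as "global".

```python
# Excerpt of three entries copied from the PROGRAMME listing above.
PROGRAMME_EXCERPT = {
    "spd-nrw-2022": {
        "name": "SPD NRW Wahlprogramm 2022",
        "typ": "wahlprogramm",
        "partei": "SPD",
        "bundesland": "NRW",
        "pdf": "spd-nrw-2022.pdf",
    },
    "csu-by-2023": {
        "name": "CSU Bayernplan 2023",
        "typ": "wahlprogramm",
        "partei": "CSU",
        "bundesland": "BY",
        "pdf": "csu-by-2023.pdf",
    },
    "spd-grundsatz": {
        "name": "SPD Grundsatzprogramm 2007",
        "typ": "parteiprogramm",
        "partei": "SPD",
        "pdf": "spd-grundsatzprogramm.pdf",
    },
}


def programmes_for(registry, bundesland, partei=None):
    """Hypothetical helper: IDs of state-specific and global programmes."""
    selected = []
    for programm_id, entry in registry.items():
        # Entries without a 'bundesland' key are treated as global.
        if entry.get("bundesland") not in (bundesland, None):
            continue
        if partei is not None and entry["partei"] != partei:
            continue
        selected.append(programm_id)
    return selected


print(programmes_for(PROGRAMME_EXCERPT, "NRW", partei="SPD"))
# ['spd-nrw-2022', 'spd-grundsatz']
```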
