Get started with the yente-client Python SDK
A linear walk through the Python SDK: install, first match, matching in depth, fetch, search, async, errors, entities, and statements.
If you want shell access rather than Python, see the CLI overview instead. The FollowTheMoney (FtM) data model underlies every entity returned by these endpoints; on first contact with FtM, the tutorial on the FtM model in section 8 is a useful entry point.
1. Install and authenticate
Python 3.11+. Runtime dependencies are pydantic and httpx.
The OpenSanctions API needs an API key. Get one, then export it as the OPENSANCTIONS_API_KEY environment variable.
A bare Client() constructor picks up OPENSANCTIONS_API_KEY and
YENTE_BASE_URL from the environment. To target a yente instance, set
YENTE_BASE_URL=http://localhost:8000 (or pass base_url= directly).
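In a script or CI job it can help to fail early when neither variable is set. The following is a minimal sketch, not part of the SDK: resolve_settings is a hypothetical helper that only reads the two environment variables named above, and the rule that a self-hosted instance may run without a key is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the two variables a bare Client() is documented to pick up.

    Hypothetical helper, not part of yente_client.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENSANCTIONS_API_KEY")
    base_url = env.get("YENTE_BASE_URL")  # None means "use the SDK default"
    if not api_key and not base_url:
        # Assumption: a self-hosted yente instance may not need a key,
        # so only complain when neither variable is set.
        raise RuntimeError("Set OPENSANCTIONS_API_KEY or YENTE_BASE_URL")
    return {"api_key": api_key, "base_url": base_url}
```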
from yente_client import Client

with Client() as client:
    print(client.healthz())
    # StatusResponse(status='ok')
Client is a context manager. Use with Client() as client: for
deterministic cleanup of the underlying httpx.Client.
2. Your first match
The SDK's primary use case is matching: given a partial or complete entity description, find candidate matches in OpenSanctions data.
Construct an entity
Every FollowTheMoney (FtM) schema has a Python class. Construct one with the property fields you know:
from yente_client import Client, Person

query = Person(
    firstName="Vladimir",
    lastName="Putin",
    birthDate="1952-10-07",
)
Property names use FtM camelCase (firstName, birthDate,
nationality, passportNumber) and match the wire format exactly.
Use yente-cli ref schema Person (or the
Person API reference) to discover what properties
each schema accepts.
Run the match
The SDK issues one HTTP request per match() call. Bulk workflows
(matching N entities concurrently) get their own surface in a later
release; for now wrap match() in a loop or in asyncio.gather (see
section 6).
Read the response
match() returns a flat MatchResponse:
print(response.total.value)   # candidate count
print(response.top)           # highest-scoring result, or None
for hit in response.matches:  # results that crossed the threshold
    print(hit.score, hit.caption, hit.datasets)
Each result is a ScoredEntity carrying:
- score: float in [0.0, 1.0].
- match: True when score >= threshold (defaults to 0.70).
- caption: display name from the server.
- explanations: per-feature score breakdown (which features fired, what their weights were).
- All the usual FtM properties under entity.properties.
See MatchResponse and ScoredEntity
for the full shape.
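These fields are enough for a simple screening decision. The sketch below is illustrative only: top_confirmed_hit is a hypothetical helper, not an SDK function, and it relies solely on the top and match attributes described in this section. It accepts any object with a match() method, so it can be exercised with a stand-in client.

```python
from typing import Any, Optional


def top_confirmed_hit(client: Any, query: Any, **filters: Any) -> Optional[Any]:
    """Return the best result if it crossed the threshold, else None.

    Hypothetical helper: uses only response.top and ScoredEntity.match.
    """
    response = client.match(query, **filters)
    top = response.top  # highest-scoring result, or None
    if top is not None and top.match:
        return top
    return None
```

A None return covers both "no candidates at all" and "best candidate stayed below the threshold"; inspect the full response when the difference matters.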
3. Matching in depth
The threshold
By default, matches holds only the candidates scoring at or above 0.7,
but the full result set comes back regardless: results carries every
candidate the server considered, scored. Lower the threshold to inspect
near-misses.
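Near-misses can also be picked out client-side from results. A minimal sketch, assuming only that each result exposes a numeric score; near_misses and the Hit stand-in class are hypothetical and not part of the SDK:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    """Stand-in for ScoredEntity with just the fields used here."""
    caption: str
    score: float


def near_misses(results: Iterable[Hit], threshold: float = 0.7,
                floor: float = 0.5) -> List[Hit]:
    """Candidates scored in [floor, threshold), best first."""
    picked = [h for h in results if floor <= h.score < threshold]
    return sorted(picked, key=lambda h: h.score, reverse=True)


hits = [Hit("A", 0.92), Hit("B", 0.66), Hit("C", 0.31), Hit("D", 0.55)]
print([h.caption for h in near_misses(hits)])  # ['B', 'D']
```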
Choosing an algorithm
The server exposes several matching algorithms. BEST_ALGORITHM resolves
to whichever the server currently recommends. Pass it for
forward-compatibility:
Use client.algorithms() to see what's enabled on the target server.
Narrowing with MatchFilters
Filters constrain which candidates the server considers. Pass them as
kwargs or as a MatchFilters object:
from yente_client import MatchFilters
response = client.match(query, datasets=["sanctions"], topics=["sanction"])
# or
filters = MatchFilters(datasets=["sanctions"], topics=["sanction"])
response = client.match(query, filters=filters)
When both are supplied, the kwargs win on any field they specify.
MatchFilters lists every available field.
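The precedence rule can be pictured as a dictionary merge. This is a sketch of the described behaviour, not the SDK's implementation: merge_filters is a hypothetical name, plain dicts stand in for MatchFilters, and treating a None kwarg as "not specified" is an assumption.

```python
from typing import Any, Dict, Optional


def merge_filters(filters: Optional[Dict[str, Any]] = None,
                  **kwargs: Any) -> Dict[str, Any]:
    """Combine a filters mapping with kwargs; kwargs win where set."""
    merged = dict(filters or {})
    # Assumption: a kwarg left as None counts as "not specified".
    merged.update({k: v for k, v in kwargs.items() if v is not None})
    return merged


base = {"datasets": ["sanctions"], "topics": ["sanction"]}
print(merge_filters(base, topics=["role.pep"]))
# {'datasets': ['sanctions'], 'topics': ['role.pep']}
```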
Querying a parent schema matches descendants too
FtM schemas form an inheritance tree, and querying a parent schema matches
every matchable descendant in a single call. LegalEntity is the parent of
Person, Organization, Company, and PublicBody.
Use the most specific schema you can. A parent query disables the
schema-specific scoring features that make matching accurate: birthDate
and firstName comparisons only activate for Person, vessel-identifier
matching only for Vessel, and so on. A LegalEntity query also returns
more low-confidence near-misses. Reach for LegalEntity only when the input
is genuinely ambiguous between an individual and an organization — raw payee
strings, unlabeled list entries — not as a default that widens the net:
from yente_client import LegalEntity

# Only when you genuinely can't tell a person from an organization.
response = client.match(
    LegalEntity(name="Acme Industries"),
    datasets=["sanctions"],
)
Notes on the matchable flag
Schemas with matchable: false (e.g. Document) can't be queried;
client.match() raises ConfigurationError before the call. Use
yente-cli ref schemas --matchable to find valid targets.
The per-property matchable flag (shown by yente-cli ref schema NAME)
is a routing detail inside the matcher, not a usefulness indicator.
Send every property you have.
4. Fetch and adjacency
Given an entity ID (typically from a match result, search hit, or an external system), fetch the full record:
entity = client.fetch("NK-aU5ybkbRFJucf8YMwsJvDw")
print(entity.caption, entity.schema_)
print(entity.properties.get("topics"))
Nested vs flat
fetch() returns nested entities by default: when properties reference
other entities, those are expanded inline. For the flat shape, pass
nested=False, as in client.fetch(entity_id, nested=False). This is useful
in data pipelines where you don't want to recurse.
Adjacent entities
The adjacency endpoint exposes paged neighbors by property name:
# entity_id is any entity ID, such as the one passed to fetch() above.
# All adjacencies, grouped by property:
adj = client.adjacent(entity_id)
for prop, page in adj.adjacent.items():
    print(prop, page.total.value)

# One property at a time, with paging:
page = client.adjacent(entity_id, prop="familyRelative", limit=10, offset=0)
for ent in page.results:
    print(ent.caption)
See AdjacentResponse and
AdjacentPropertyResponse.
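To walk every neighbor of one property, the limit/offset call can be repeated until a page comes back short. A minimal sketch under stated assumptions: iter_pages is a hypothetical helper, and fetch_page is any callable that takes limit and offset and returns a list of results (for example, a small wrapper around the single-property client.adjacent call that returns page.results). A stand-in data source keeps the example runnable without a server.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(fetch_page: Callable[[int, int], List[T]],
               limit: int = 10) -> Iterator[T]:
    """Yield items page by page until a short or empty page appears."""
    offset = 0
    while True:
        batch = fetch_page(limit, offset)
        yield from batch
        if len(batch) < limit:
            break  # last page reached
        offset += limit


# Stand-in data source so the sketch runs without a server.
data = [f"entity-{i}" for i in range(23)]


def fake_fetch(limit: int, offset: int) -> List[str]:
    return data[offset:offset + limit]


print(len(list(iter_pages(fake_fetch, limit=10))))  # 23
```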
5. Search (for user-facing search UIs)
search() is a different use case from matching. Reach for it when
you're building an end-user search experience: autocomplete fields,
browse pages, search-this-database forms where a human is typing into
the input.
results = client.search("acme", datasets=["default"], schema="Company")
for entity in results.results:
    print(entity.caption, entity.id)
search() returns plain Entity objects (no score, no match flag). Use
SearchFilters for the full filter shape (countries,
schema, free-text filter: clauses, facets).
6. Async
AsyncClient mirrors Client method-for-method. Use it when running
many requests concurrently: the network is the bottleneck, and async
lets one event loop juggle hundreds of in-flight requests.
import asyncio

from yente_client import AsyncClient, Person


async def screen_all(queries: list[Person]) -> list:
    async with AsyncClient() as client:
        return await asyncio.gather(
            *(client.match(q, datasets=["sanctions"]) for q in queries)
        )


responses = asyncio.run(screen_all([
    Person(firstName="Alex", lastName="Smith"),
    Person(firstName="Maria", lastName="Garcia"),
]))
async with handles cleanup; await client.aclose() is the manual form.
See AsyncClient.
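With a large batch, an unbounded gather opens every request at once. One common way to cap that is a semaphore. The sketch below is illustrative: gather_bounded is a hypothetical helper, not an SDK function, and fake_match stands in for client.match so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

Q = TypeVar("Q")
R = TypeVar("R")


async def gather_bounded(fn: Callable[[Q], Awaitable[R]],
                         items: Iterable[Q], limit: int = 20) -> List[R]:
    """Run fn over items with at most `limit` calls in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(item: Q) -> R:
        async with sem:
            return await fn(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(i) for i in items))


async def fake_match(name: str) -> str:
    await asyncio.sleep(0)  # pretend to do network I/O
    return name.upper()


print(asyncio.run(gather_bounded(fake_match, ["alex", "maria"], limit=1)))
# ['ALEX', 'MARIA']
```

With the real client, fn could be a small coroutine that calls client.match(q, datasets=["sanctions"]) inside the async with AsyncClient() block.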
7. Errors
Every error raised by this client inherits from YenteError. The tree:
YenteError
├── ConfigurationError          # bad client config, non-matchable schema, …
├── TransportError              # network, timeout, TLS
└── APIError                    # non-2xx response
    ├── AuthenticationError     # 401, 403
    ├── BadRequestError         # 400
    ├── NotFoundError           # 404
    ├── RateLimitError          # 429 (carries .retry_after)
    └── ServerError             # 5xx
Catch by category when you want to handle a class of failure:
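import logging
from time import sleep

from yente_client.exceptions import APIError, RateLimitError, TransportError

log = logging.getLogger(__name__)

try:
    response = client.match(query)
except RateLimitError as exc:
    # 429: wait for the server-suggested delay, or 5 seconds if none was given.
    sleep(exc.retry_after or 5)
except APIError as exc:
    log.error("server said %s: %s", exc.status_code, exc.detail)
except TransportError:
    # The request never reached the server.
    raise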
Input-shape errors (typo in a property name, wrong value type) surface
as pydantic.ValidationError. The SDK does not wrap or alias it.
Retries are not built in. The client raises; callers handle backoff.
See exceptions for full per-class details.
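Since backoff is left to the caller, a small wrapper is often enough. A sketch under stated assumptions: with_backoff is a hypothetical helper, and the local RateLimitError class is a stand-in that mimics only the .retry_after attribute mentioned above. In real code, import the SDK's exception instead of defining one.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for the SDK's 429 error; carries .retry_after."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


def with_backoff(call: Callable[[], T], attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimitError, honouring retry_after if given."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except RateLimitError as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Server hint wins; otherwise back off exponentially.
            delay = exc.retry_after or base_delay * 2 ** attempt
            sleep(delay)
    raise RuntimeError("unreachable")
```

Usage would look like with_backoff(lambda: client.match(query)). The sleep parameter is injectable so the wrapper can be tested without real delays.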
8. Entities and the FtM model
The per-schema input classes (Person, Company, Vessel, …) are
generated from a bundled snapshot of the FtM model. Field names match
the wire format exactly (camelCase).
Discovering schemas at runtime
from yente_client.schemas import has_schema, iter_properties, matchable_schemata
has_schema("Person") # True
list(iter_properties("Person")) # ['abbreviation', 'address', …, 'weight']
matchable_schemata() # ['Address', 'Airplane', 'Company', …]
The same data is available via the CLI, through yente-cli ref schema Person and yente-cli ref schemas --matchable.
See schemas for the full lookup helper set.
Updating the bundled model
The model is a snapshot, pinned at SDK-release time. Properties added
upstream don't appear until the next SDK release. Maintainers run
make regen-model to refresh.
9. Statements: lineage and diagnostics (OpenSanctions API only)
The /statements endpoint exposes the atomic claims that compose into
entities: each row is one (entity_id, prop, value) triple plus the
dataset that asserted it, the language, the pre-cleaning original value,
and the first/last-seen timestamps. Reach for it for diagnostics:
tracking where a value came from, finding deduplication issues,
auditing data quality. See the
statement-based data model
for background.
for stmt in client.statements(canonical_id="NK-aU5ybkbRFJucf8YMwsJvDw").results:
    print(stmt.dataset, stmt.prop, stmt.value, stmt.first_seen)
Use canonical_id= for almost every call. Pass the ID returned by
match / search / fetch. It returns every source fragment that was
deduplicated into the canonical entity — usually what you want when
investigating a record. entity_id= returns only one source's
pre-deduplication fragment, which is useful for source-level audits but is
not a
substitute for canonical_id: the same person across five sanctions
lists has five distinct entity_id values and only one canonical_id.
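When investigating a canonical record, grouping its statements by source fragment makes the deduplication visible. A minimal sketch: group_by_source is a hypothetical helper, and the Stmt dataclass is a stand-in carrying only the entity_id, dataset, prop, and value fields described above. The sample IDs and dataset names are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Stmt:
    """Stand-in for a statement row (illustrative fields only)."""
    entity_id: str
    dataset: str
    prop: str
    value: str


def group_by_source(rows: Iterable[Stmt]) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """Map (dataset, entity_id) to the (prop, value) pairs it asserted."""
    groups: Dict[Tuple[str, str], List[Tuple[str, str]]] = defaultdict(list)
    for row in rows:
        groups[(row.dataset, row.entity_id)].append((row.prop, row.value))
    return dict(groups)


# Made-up rows: two source fragments behind one canonical entity.
rows = [
    Stmt("list-a-123", "list_a", "name", "Jane Doe"),
    Stmt("list-b-987", "list_b", "name", "DOE, Jane"),
    Stmt("list-a-123", "list_a", "birthDate", "1970-01-01"),
]
for source, claims in group_by_source(rows).items():
    print(source, claims)
```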
Only the OpenSanctions API serves this endpoint — it is backed by a
Postgres instance that yente does not ship. Calls against a yente
instance surface as NotFoundError.
Following entity references
Entity-typed properties (Sanction.entity, Ownership.owner,
Family.relative, …) encode references between entities. In a
statement row, each such reference carries:
- prop_type=entity
- value = the canonical_id of the referenced entity
- original_value = the entity_id of the referenced entity in its source system
The statements stream is therefore a graph that can be traversed in either direction.
Forward (what does this entity reference?). Pull all statements for the entity and pick out the rows whose property is entity-typed:
for stmt in client.statements(canonical_id="Q7747").results:
    if stmt.prop_type == "entity":
        print(stmt.prop, "→", stmt.value)
Reverse (what references this entity?). Filter on the schema,
the property name, and the target's canonical_id. To find every
Sanction that asserts Putin (Q7747) as its target:
for stmt in client.statements(
    schema="Sanction",
    prop="entity",
    value="Q7747",
).results:
    print(stmt.dataset, stmt.entity_id)
Each row is one source's assertion: a single canonical Sanction may
appear once per source list that recorded it. To find references
across all entity-typed properties of a schema (e.g. both
Ownership.owner and Ownership.asset), issue one query per
property.
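That per-property loop might look like the following. A minimal sketch: reverse_references is a hypothetical helper, and query is any callable accepting the schema, prop, and value keywords used above and returning an iterable of rows (for instance a wrapper that returns client.statements(...).results).

```python
from typing import Any, Callable, Dict, Iterable, List


def reverse_references(query: Callable[..., Iterable[Any]], schema: str,
                       props: Iterable[str], target: str) -> Dict[str, List[Any]]:
    """Collect rows referencing `target`, issuing one query per property."""
    found: Dict[str, List[Any]] = {}
    for prop in props:
        found[prop] = list(query(schema=schema, prop=prop, value=target))
    return found
```

Keeping the rows keyed by property preserves the direction of each reference, which matters when a schema such as Ownership points at entities through more than one property.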
Where to go next
- CLI overview: yente-cli, agent automations, shell pipelines.
- API reference: full signatures of every public symbol.
- OpenSanctions docs: domain context, including sanctions screening, the FtM data model, the API quickstart, and getting an API key.