The MKK API exposes a dedicated data quality endpoint that aggregates extraction issues across documents for one or more funds. You can use it to identify documents with too few line items, values that failed to parse as numbers, mappings that may need human review, and documents with no usable content at all. This is the starting point for any data validation or remediation workflow.
The data quality endpoint
The report is served by GET /data-quality.
Parameters
| Parameter | Type | Description |
|---|---|---|
| fund_id | integer | Filter to a single fund by internal ID. |
| fund_code | string | Filter to a single fund by code (e.g. OJB). |
| limit | integer | Maximum number of issue records to return. |
| low_line_item_threshold | integer | Documents with fewer line items than this value are flagged. Default varies by deployment. |
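As a minimal sketch of how these parameters combine into a request, the helper below builds the URL for GET /data-quality. The base URL `https://api.example.com` and the helper name `build_data_quality_url` are placeholders, not part of the MKK API, and authentication is left out because the source does not describe it.

```python
from urllib.parse import urlencode


def build_data_quality_url(base_url, fund_id=None, fund_code=None,
                           limit=None, low_line_item_threshold=None):
    """Build a GET /data-quality URL, skipping parameters left as None."""
    params = {
        "fund_id": fund_id,
        "fund_code": fund_code,
        "limit": limit,
        "low_line_item_threshold": low_line_item_threshold,
    }
    # Only send the filters the caller actually set.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + "/data-quality"
    return url + "?" + query if query else url


# Hypothetical base URL; replace it with your deployment's address.
print(build_data_quality_url("https://api.example.com", fund_code="OJB", limit=50))
```

Leaving out a parameter keeps it out of the query string entirely, so the deployment default (for example for low_line_item_threshold) still applies.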
Quality issue categories
The response groups issues into six named categories:
1. low_line_item_documents
Documents that contain fewer line items than the low_line_item_threshold. This usually indicates a parsing failure, an unusual document layout, or a document that was submitted without structured financial data.
2. portfolio_only_documents
Documents that contain portfolio (holdings) data but no structured line item values. These documents are partially usable (portfolio analysis is possible), but financial summary data (e.g. net asset value, expense ratios) is absent.
3. empty_documents
Documents with neither line items nor portfolio entries. These documents were processed but yielded no structured data at all. They may correspond to cover pages, amendments, or documents in an unsupported format.
4. review_mappings
Line item values where the mapping confidence score falls below the acceptable threshold. These values were extracted and mapped to a known line item slug, but the match is uncertain enough to warrant manual verification before use in analysis.
5. numeric_parse_failures
Values where the raw text was extracted but could not be parsed into a numeric value. Common causes include non-standard number formatting, footnote markers embedded in the value field, or text entries that are not numeric by nature.
6. missing_pdfs
Documents whose original PDF source file is not available via GET /documents/{docId}/pdf. The document record exists and may contain extracted data, but you cannot verify it against the source.
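When working through numeric_parse_failures, it can help to sort the raw values by likely cause before fixing anything. The sketch below is a client-side heuristic built around the three causes listed above; it is not the API's own parser, and the function name, the footnote patterns, and the returned labels are all assumptions chosen for illustration.

```python
import re

# Assumed footnote markers at the end of a value: asterisks, a dagger, or "(n)".
_FOOTNOTE = re.compile(r"\s*(\*+|†|\(\d+\))$")


def _parses(text):
    """Return True if Python's float() accepts the text."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def diagnose_raw_value(raw):
    """Guess why a raw extracted value might fail numeric parsing."""
    text = raw.strip()
    if _parses(text):
        return "parses"
    if not any(ch.isdigit() for ch in text):
        # No digits at all: the entry is probably not numeric by nature.
        return "non_numeric_text"
    if _parses(_FOOTNOTE.sub("", text)):
        # Removing a trailing marker was enough to make it parse.
        return "footnote_marker"
    return "non_standard_format"


for raw in ["1234.5", "1234.5*", "N/A", "1.234,56"]:
    print(repr(raw), "->", diagnose_raw_value(raw))
```

Values labelled non_standard_format still need a locale-aware parser; this sketch only separates them from the cases that a simple cleanup would fix.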
Response structure
A representative response for GET /data-quality?fund_code=OJB:
The summary section
The summary object provides aggregate counts for each issue category across all documents matching your filters. Use the summary to triage: if review_mappings is large, focus on improving mapping rules; if empty_documents is non-zero, investigate whether those document formats are supported.
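One way to triage is to rank the categories by count. The sketch below assumes the summary object is a flat mapping from the six category names to integer counts; the source does not show the exact field names, so treat the keys and the sample numbers as placeholders.

```python
def rank_issue_categories(summary):
    """Return (category, count) pairs with non-zero counts, largest first."""
    nonzero = [(name, count) for name, count in summary.items() if count > 0]
    # Sort by count descending, then by name so ties come out in a stable order.
    return sorted(nonzero, key=lambda pair: (-pair[1], pair[0]))


# Made-up counts for illustration only.
example_summary = {
    "low_line_item_documents": 4,
    "portfolio_only_documents": 0,
    "empty_documents": 2,
    "review_mappings": 37,
    "numeric_parse_failures": 4,
    "missing_pdfs": 0,
}

for name, count in rank_issue_categories(example_summary):
    print(f"{name}: {count}")
```

With these sample counts, review_mappings comes first, which matches the advice to look at mapping rules before individual documents.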
The mapping_methods breakdown
The mapping_methods array shows how values were mapped to line item slugs and the average confidence for each method. Methods with low average confidence across many values indicate a systematic extraction or mapping issue that may need a rule update rather than individual review.
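The rule of thumb above (low average confidence across many values) can be sketched as a simple filter. The entry fields `method`, `value_count`, and `avg_confidence`, the method names, and both cut-offs are assumptions for illustration; check them against the actual response before relying on this.

```python
def flag_weak_methods(mapping_methods, max_avg_confidence=0.7, min_values=100):
    """Return names of methods that map many values with low average confidence."""
    return [
        entry["method"]
        for entry in mapping_methods
        if entry["avg_confidence"] < max_avg_confidence
        and entry["value_count"] >= min_values
    ]


# Made-up entries; real method names and field names may differ.
example_methods = [
    {"method": "exact", "value_count": 1200, "avg_confidence": 0.98},
    {"method": "fuzzy", "value_count": 340, "avg_confidence": 0.61},
    {"method": "manual", "value_count": 12, "avg_confidence": 0.55},
]

print(flag_weak_methods(example_methods))
```

The `min_values` cut-off keeps a method with only a handful of low-confidence values from being treated as a systematic problem; those are better handled by individual review.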
Adjusting the low_line_item_threshold
The low_line_item_threshold parameter controls how many line items a document must contain before it is considered adequately populated. Adjust it to match the typical richness of your fund’s documents.
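To get a feel for a suitable value, you can model the rule locally: a document is flagged when its line item count is strictly below the threshold. The sketch below only illustrates that comparison; the API applies it server-side, and the document IDs and counts are made up.

```python
def flagged_documents(line_item_counts, low_line_item_threshold):
    """Return IDs of documents with fewer line items than the threshold."""
    return sorted(
        doc_id
        for doc_id, count in line_item_counts.items()
        if count < low_line_item_threshold
    )


# Hypothetical document IDs mapped to their line item counts.
counts = {101: 3, 102: 12, 103: 25, 104: 8}

for threshold in (5, 10, 20):
    print(threshold, flagged_documents(counts, threshold))
```

Raising the threshold from 5 to 20 grows the flagged set from one document to three in this sample, which is the trade-off to weigh when a fund's documents are normally sparse.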
The threshold only affects the low_line_item_documents category. The other five categories are always computed regardless of the threshold value.
Exporting the quality report as CSV
Use GET /exports/data-quality.csv to download the same report in CSV format for offline analysis or sharing with your data team.
The CSV endpoint accepts the same fund_id, fund_code, low_line_item_threshold, and export_limit parameters as the JSON endpoint.
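For offline analysis, the downloaded text can be read with Python's standard csv module. The sketch below counts rows per value of a chosen column; the column names `category` and `document_id` and the sample rows are invented for illustration, since the source does not list the export's columns.

```python
import csv
import io
from collections import Counter


def count_rows_by(csv_text, column):
    """Count CSV rows per distinct value of the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Made-up sample; the real export's column names may differ.
sample = (
    "category,document_id\n"
    "review_mappings,101\n"
    "review_mappings,102\n"
    "empty_documents,103\n"
)

print(count_rows_by(sample, "category"))
```

Passing the column name as an argument keeps the helper usable once you have checked the header row of a real export.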