Advanced Configuration Guide

Connection Management Settings

These settings control how the library manages database connections across service principals, which is crucial for performance and reliability.

| Setting | Description | Default | Impact of Changes & Default Reasoning |
| --- | --- | --- | --- |
| `advanced.connection.max_connections_per_sp` | Maximum concurrent connections per service principal. Dynamics 365 enforces a hard limit of 52 connections. | 50 | We default to 50 connections to leave headroom for system operations while maximizing throughput. Higher values enable more concurrent operations but risk hitting CRM's hard limit of 52; lower values reduce resource usage and throughput but provide more safety margin. |
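As an illustration only, and assuming a dictionary-based configuration keyed by the dotted setting names above (the library's actual configuration API may differ), keeping the connection cap safely below the Dynamics 365 hard limit might look like this:

```python
# Hypothetical configuration dict; the real library's config API may differ.
DYNAMICS_HARD_LIMIT = 52  # hard limit enforced by Dynamics 365

config = {
    # The default of 50 leaves 2 connections of headroom for system operations.
    "advanced.connection.max_connections_per_sp": 50,
}

headroom = DYNAMICS_HARD_LIMIT - config["advanced.connection.max_connections_per_sp"]
assert headroom > 0, "must stay under the Dynamics 365 hard limit"
```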

Performance Settings

These settings control data processing and resource utilization. They significantly impact throughput and reliability.

| Setting | Description | Default | Impact of Changes & Default Reasoning |
| --- | --- | --- | --- |
| `advanced.performance.num_workers` | Maximum size of the thread pool for parallel operations. | 32 | The default of 32 provides a good baseline for parallel processing without overwhelming system resources. Increase it based on service principal count (typically `num_sps * 50`) and available system resources. Higher values increase throughput but consume more system resources; lower values provide more predictable performance with less resource usage, at the cost of longer run times. |
| `advanced.performance.batch_size` | Number of records per batch. | 50,000 | The default of 50,000 balances several factors: memory usage (larger batches need more memory), processing efficiency (bigger batches mean less overhead), CRM's API pagination limit of 50,000 records, pushdown-filter limits (beyond a certain point it is faster to pull the entire target set than to filter with hundreds of thousands of predicates), and error recovery (smaller batches allow finer-grained recovery). Larger batches improve throughput but require more memory and lengthen recovery after a failure; smaller batches enable faster error recovery but add processing overhead. |
| `advanced.performance.id_based_strategy_enabled` | Whether to use the ID-based upsert strategy. | False | The default of False ensures compatibility with all data scenarios. When enabled and IDs are available, this provides significant performance improvements by bypassing business-key matching. Enable it when record IDs (the CRM-generated surrogate keys) are consistently available in the source data. Note: composite business keys (multiple columns needed to identify a row) should be avoided whenever possible to improve performance; this is particularly important when pushdown filtering is in use. |
| `advanced.performance.deferred_retry_enabled` | Whether to enable the deferred retry mechanism for failed operations. | False | The default of False provides immediate feedback. When enabled, failed operations (specifically rate-limited ones) are queued for later retry rather than failing immediately. This improves reliability in rate-limited scenarios but extends overall processing time. For more information, see Understanding Rate Limit Handling. |
| `advanced.performance.max_deferred_retries` | Maximum number of retry attempts for deferred operations. | 3 | The default of 3 provides reasonable retry attempts without excessive delays. Higher values improve the success rate for transient failures but can significantly extend processing time; lower values fail faster but may miss recovery opportunities. |
| `advanced.performance.pushdown_filter_enabled` | Whether to use key-based filtering when retrieving target data. | True | The default of True gives optimal performance. When enabled, target data is filtered at the database level using source keys, significantly reducing data transfer and memory usage. Disabling it retrieves all target data, which is rarely needed and hurts performance. |
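The `num_sps * 50` sizing guidance above can be sketched as a small helper. This is a hypothetical function (`recommended_num_workers` and the cap value are illustrative, not part of the library), shown alongside a sample performance configuration:

```python
# Hypothetical sizing helper based on the guidance above; names are illustrative.
def recommended_num_workers(num_sps: int, cap: int = 512) -> int:
    """Suggest a thread-pool size of roughly num_sps * 50, bounded by a cap
    so a large service-principal count cannot exhaust system resources."""
    return min(num_sps * 50, cap)

# Example configuration dict for a run with two service principals.
config = {
    "advanced.performance.num_workers": recommended_num_workers(num_sps=2),
    "advanced.performance.batch_size": 50_000,  # matches CRM's pagination limit
    "advanced.performance.pushdown_filter_enabled": True,
}
```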

Debug Settings

These settings control logging, troubleshooting capabilities, and development features.

| Setting | Description | Default | Impact of Changes & Default Reasoning |
| --- | --- | --- | --- |
| `advanced.debug.cache_queries` | Whether to store executed queries in memory for inspection. | False | The default of False minimizes memory usage in production. When enabled, all executed queries are kept in memory for inspection, which is useful for debugging but can significantly increase memory usage on large operations. Should only be enabled temporarily for troubleshooting. |
| `advanced.debug.disable_helper_text` | Whether to suppress helper text output during load operations. | False | The default of False provides helpful information during operations. When enabled, informational messages about configurations and processing status are suppressed, which keeps logs cleaner but may hide important contextual information. |
| `advanced.debug.log_queries` | Whether to log executed SQL queries to a table. | False | The default of False avoids storage overhead. When enabled, this provides a complete audit trail of all database operations, including query parameters and timing; essential for troubleshooting data changes but requires additional storage space and slightly impacts performance. |
| `advanced.debug.log_batch_mapping` | Whether to log batch IDs, run IDs, and key columns. | False | The default of False minimizes storage usage. When enabled, detailed batch-processing metadata and key relationships are recorded; valuable for debugging processing issues but requires additional storage space and write operations. |
| `advanced.debug.log_skips_missed` | Whether to log records that should have been filtered in pre-processing. | False | The default of False minimizes overhead. When enabled, records that were unnecessarily processed are tracked, helping identify optimization opportunities at the cost of additional storage and processing. |
| `advanced.debug.sampling_enabled` | Whether to process only a sample of records. | False | The default of False ensures complete data processing. When enabled, pushdown filtering is automatically enabled and only a subset of records is processed, significantly reducing processing time for testing but producing incomplete data changes. |
| `advanced.debug.sample_condition` | SQL WHERE clause condition for sampling. | None | The default of None produces random sampling. When set, it provides fine-grained control over which records are included in the sample; useful for testing specific scenarios but requires careful condition construction. |
| `advanced.debug.sample_size` | Number of records to include in the sample. | None | The default of None uses all matching records. When set with sampling enabled, it limits the number of processed records. If None but a condition is set, all records matching the condition are processed. |
| `advanced.debug.sample_seed` | Seed value for random sampling. | None | The default of None gives random sampling. When set, it ensures reproducible sampling results across runs; useful for consistent testing but may not represent the true data distribution. |
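For a quick test run, the sampling settings above are typically used together. A hedged sketch, again assuming a dictionary-based configuration (the keys mirror the table; the WHERE clause and field name are purely illustrative):

```python
# Hypothetical debug configuration for a small, reproducible test run.
debug_config = {
    # Enabling sampling also forces pushdown filtering on (per the table above).
    "advanced.debug.sampling_enabled": True,
    # Illustrative WHERE clause; "modifiedon" is an example field name only.
    "advanced.debug.sample_condition": "modifiedon >= '2024-01-01'",
    "advanced.debug.sample_size": 1_000,  # cap the sample at 1,000 records
    "advanced.debug.sample_seed": 42,     # same sample on every run
}
```

Remember to disable these settings again after troubleshooting, since sampling produces deliberately incomplete data changes.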

Column Handling Settings

These settings control how the library manages and processes columns in your data.

| Setting | Description | Default | Impact of Changes & Default Reasoning |
| --- | --- | --- | --- |
| `advanced.columns.exclude_columns` | Columns to exclude from processing. | [] | Empty by default so that all columns are processed. Adding columns prevents them from being processed, improving performance but potentially missing important data changes. Commonly used for computed or unnecessary columns. |
| `advanced.columns.include_only_columns` | Restrict processing to only the columns specified. | [] | Empty by default so that all columns are processed. When set, processing is restricted to exactly the listed columns. Commonly used to improve performance, or to guarantee early processing of priority data, by targeting specific columns and thereby reducing query overhead on both reads and writes. |
| `advanced.columns.compare_case_insensitive` | Whether to ignore case when comparing values. | False | The default of False gives exact matching and better performance. When enabled, matching is more flexible because case differences are ignored, but string comparisons incur extra processing overhead. |
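A hedged example of the column settings, assuming the same dictionary-based configuration as above. Since the document does not specify how `exclude_columns` and `include_only_columns` interact, it is safest to set only one of them; the column names here are hypothetical:

```python
# Hypothetical column configuration; set either exclude_columns or
# include_only_columns, not both, since their precedence is unspecified.
column_config = {
    # Skip an example computed column that never needs to be written back.
    "advanced.columns.exclude_columns": ["computed_total"],
    # Keep exact-case matching for speed.
    "advanced.columns.compare_case_insensitive": False,
}
```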

Retry Settings

These settings control how the library handles operation retries and failures.

| Setting | Description | Default | Impact of Changes & Default Reasoning |
| --- | --- | --- | --- |
| `advanced.retry.enabled` | Whether to automatically (immediately) retry operations that fail due to rate-limit errors. | False | The default of False provides explicit error handling. When enabled, rate-limited operations are retried automatically based on the configured parameters and the Retry-After headers (within the max_delay threshold), improving reliability but potentially extending processing time. Commonly combined with deferred retry (see Understanding Rate Limit Handling) for the most reliable processing possible. |
| `advanced.retry.max_attempts` | Maximum number of retry attempts. | 3 | The default of 3 balances reliability with timely failure. Higher values improve the success rate for transient failures but can significantly extend processing time for permanent failures; lower values fail faster but may miss recovery opportunities. |
| `advanced.retry.max_delay` | Maximum time (in seconds) to wait before retrying an API call. | 60 | The default of 60 seconds balances reliability with timely failure. Higher values can significantly extend processing when API rate-limit failures persist; lower values fail faster but may miss valid recovery opportunities. |
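One plausible reading of how `max_delay` bounds the Retry-After header can be sketched as follows. This is an interpretation, not the library's actual implementation: the server-supplied wait is honored only if it fits within the configured threshold, otherwise the operation fails fast (or is handed to deferred retry, if enabled):

```python
from typing import Optional

# Sketch of the described behavior; not the library's actual code.
def retry_wait_seconds(retry_after: float, max_delay: float = 60.0) -> Optional[float]:
    """Return how long to wait before retrying a rate-limited call,
    or None when the server's Retry-After exceeds max_delay and the
    immediate retry should be abandoned."""
    if retry_after > max_delay:
        return None  # waiting longer than max_delay is not allowed
    return retry_after
```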