# Advanced Configuration Guide
## Connection Management Settings
These settings control how the library manages database connections across service principals, which is crucial for performance and reliability.
Setting | Description | Default | Impact of Changes & Default Reasoning |
---|---|---|---|
advanced.connection.max_connections_per_sp | Maximum concurrent connections per service principal. Dynamics 365 enforces a hard limit of 52 connections. | 50 | We default to 50 connections to leave headroom for system operations while maximizing throughput. Higher values enable more concurrent operations but risk hitting CRM's hard limit of 52, while lower values reduce resource usage and throughput but provide more safety margin. |
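The headroom reasoning above can be made concrete. The snippet below is a hypothetical sketch; the exact mechanism for passing settings to the library is an assumption, but the key name mirrors the table:

```python
# Dynamics 365 hard connection limit per service principal (from the table above).
HARD_LIMIT = 52

# Hypothetical configuration dict; how the library consumes it is assumed.
config = {
    "advanced.connection.max_connections_per_sp": 50,  # default: headroom under 52
}

# Sanity check before submitting a job: always stay below the hard limit.
assert config["advanced.connection.max_connections_per_sp"] < HARD_LIMIT
```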
## Performance Settings
These settings control data processing and resource utilization. They significantly impact throughput and reliability.
Setting | Description | Default | Impact of Changes & Default Reasoning |
---|---|---|---|
advanced.performance.num_workers | Maximum size of the thread pool for parallel operations. | 32 | Default of 32 provides a good baseline for parallel processing without overwhelming system resources. Increase it based on service principal count (typically num_sps * 50) and available system resources. Higher values increase throughput but consume more system resources; lower values give more predictable performance with less resource usage, at the cost of longer run times. |
advanced.performance.batch_size | Number of records per batch. | 50,000 | Default of 50,000 balances several factors: memory usage (larger batches need more memory), processing efficiency (bigger batches mean less overhead), CRM's API pagination limit (50,000 records), pushdown filter limits (past a certain point it is faster to pull the entire target set than to filter with hundreds of thousands of predicates), and error recovery (smaller batches allow finer-grained recovery). Larger sizes improve throughput but require more memory and take longer to recover on failure. Smaller sizes enable faster error recovery but introduce more processing overhead. |
advanced.performance.id_based_strategy_enabled | Whether to use the ID-based upsert strategy. | False | Default is False to ensure compatibility with all data scenarios. When enabled and record IDs are available, provides significant performance improvements by bypassing business key matching. Should be enabled when record IDs are consistently available in source data. ID here means the CRM-generated surrogate key. NOTE: Composite business keys (i.e. multiple columns needed to identify a row) should be avoided whenever possible to improve performance; this is particularly important when using pushdown filtering. |
advanced.performance.deferred_retry_enabled | Whether to enable deferred retry mechanism for failed operations. | False | Default is False to provide immediate feedback. When enabled, failed operations (specifically rate-limited ones) are queued for later retry rather than failing immediately. Improves reliability for rate-limited scenarios but extends overall processing time. For more information see Understanding Rate Limit Handling. |
advanced.performance.max_deferred_retries | Maximum number of retry attempts for deferred operations. | 3 | Default of 3 provides reasonable retry attempts without excessive delays. Higher values improve success rate for transient failures but can significantly extend processing time. Lower values fail faster but may miss recovery opportunities. |
advanced.performance.pushdown_filter_enabled | Whether to use key-based filtering when retrieving target data. | True | Default is True for optimal performance. When enabled, filters target data at the database level using source keys, significantly reducing data transfer and memory usage. Disabling retrieves all target data, which is rarely needed and impacts performance. |
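The `num_workers` guidance above (roughly `num_sps * 50`) can be sketched as a small sizing helper. `suggested_num_workers` is a hypothetical name for illustration, not part of the library:

```python
def suggested_num_workers(num_sps: int, connections_per_sp: int = 50) -> int:
    """Rule of thumb from the table: roughly one worker per available
    connection, i.e. num_sps * 50 with the default connection budget."""
    return num_sps * connections_per_sp

# With three service principals the guideline suggests 150 workers,
# well above the default thread pool size of 32.
assert suggested_num_workers(3) == 150
```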
## Debug Settings
These settings control logging, troubleshooting capabilities, and development features.
Setting | Description | Default | Impact of Changes & Default Reasoning |
---|---|---|---|
advanced.debug.cache_queries | Whether to store executed queries in memory for inspection. | False | Default is False to minimize memory usage in production. When enabled, stores all executed queries in memory for inspection, useful for debugging but can significantly impact memory usage on large operations. Should only be enabled temporarily for troubleshooting. |
advanced.debug.disable_helper_text | Whether to suppress helper text output during load operations. | False | Default is False to provide helpful information during operations. When enabled, suppresses informational messages about configurations and processing status, useful for cleaner logs but may hide important contextual information. |
advanced.debug.log_queries | Whether to log executed SQL queries to a table. | False | Default is False to avoid storage overhead. When enabled, provides complete audit trail of all database operations including query parameters and timing, essential for troubleshooting data changes but requires additional storage space and slightly impacts performance. |
advanced.debug.log_batch_mapping | Whether to log batch IDs, run IDs, and key columns. | False | Default is False to minimize storage usage. When enabled, maintains detailed record of batch processing metadata and key relationships, valuable for debugging processing issues but requires additional storage space and write operations. |
advanced.debug.log_skips_missed | Whether to log records that should have been filtered in pre-processing. | False | Default is False to minimize overhead. When enabled, tracks records that were unnecessarily processed, helping identify optimization opportunities but requiring additional storage and processing. |
advanced.debug.sampling_enabled | Whether to process only a sample of records. | False | Default is False to ensure complete data processing. When enabled, automatically enables pushdown filtering and processes only a subset of records, significantly reducing processing time for testing but providing incomplete data changes. |
advanced.debug.sample_condition | SQL WHERE clause condition for sampling. | None | Default is None for random sampling. When set, provides fine-grained control over which records are included in the sample, useful for testing specific scenarios but requires careful condition construction. |
advanced.debug.sample_size | Number of records to include in sample. | None | Default is None to use all matching records. When set with sampling enabled, limits the number of processed records. If None but condition is set, processes all records matching the condition. |
advanced.debug.sample_seed | Seed value for random sampling. | None | Default is None for random sampling. When set, ensures reproducible sampling results across runs, useful for consistent testing but may not represent true data distribution. |
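Taken together, the sampling settings above might be combined like this for a reproducible test run. This is a sketch: the condition, size, and the way the dict is consumed by the library are illustrative assumptions.

```python
# Hypothetical debug configuration for a small, reproducible sample run.
debug_config = {
    "advanced.debug.sampling_enabled": True,             # also enables pushdown filtering
    "advanced.debug.sample_condition": "statecode = 0",  # illustrative WHERE clause
    "advanced.debug.sample_size": 1000,                  # cap the number of sampled records
    "advanced.debug.sample_seed": 42,                    # same sample on every run
}
```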
## Column Handling Settings
These settings control how the library manages and processes columns in your data.
Setting | Description | Default | Impact of Changes & Default Reasoning |
---|---|---|---|
advanced.columns.exclude_columns | Columns to exclude from processing. | [] | Empty by default to process all columns. Adding columns prevents them from being processed, improving performance but potentially missing important data changes. Commonly used for computed or unnecessary columns. |
advanced.columns.include_only_columns | Only the columns specified will be included in processing. | [] | Empty by default to process all columns. When populated, restricts processing to exactly the listed columns. Commonly used to improve performance or to guarantee early processing of priority data: targeting specific columns reduces query overhead for both reads and writes. |
advanced.columns.compare_case_insensitive | Whether to ignore case when comparing values. | False | Default is False for exact matching and better performance. When enabled, provides more flexible matching by ignoring case differences but adds processing overhead for string comparisons. |
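The include/exclude semantics described above can be sketched as follows. This is not the library's actual code, and how the two lists interact when both are set is an assumption here (`include_only_columns` applied first, then `exclude_columns`):

```python
def effective_columns(all_columns, include_only=(), exclude=()):
    # Sketch of the documented semantics (assumed, not the library's code):
    # a non-empty include_only list restricts processing to those columns,
    # then any excluded columns are dropped.
    kept = [c for c in all_columns if not include_only or c in include_only]
    return [c for c in kept if c not in exclude]

assert effective_columns(["id", "name", "revenue"], include_only=["id", "name"]) == ["id", "name"]
assert effective_columns(["id", "name", "revenue"], exclude=["revenue"]) == ["id", "name"]
```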
## Retry Settings
These settings control how the library handles operation retries and failures.
Setting | Description | Default | Impact of Changes & Default Reasoning |
---|---|---|---|
advanced.retry.enabled | Whether to automatically (immediately) retry operations that failed due to rate limit errors. | False | Default is False to provide explicit error handling. When enabled, automatically retries rate-limited operations based on the configured parameters and Retry-After headers (within the max_delay threshold), improving reliability but potentially extending processing time. Commonly used in combination with Deferred Retry (see Understanding Rate Limit Handling) for the most reliable processing possible. |
advanced.retry.max_attempts | Maximum number of retry attempts. | 3 | Default of 3 balances reliability with timely failure. Higher values improve success rate for transient failures but can significantly extend processing time for permanent failures. Lower values fail faster but may miss recovery opportunities. |
advanced.retry.max_delay | Maximum time to wait before retrying an API call (in seconds). | 60 | Default of 60 seconds balances reliability with timely failure. Higher values can significantly extend processing time when hitting consistent API rate limit failures. Lower values fail faster but may miss valid recovery opportunities. |
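The interplay of max_attempts, max_delay, and the Retry-After header can be sketched as a retry loop. This is an illustrative reconstruction of the documented behavior, not the library's implementation; the response objects with `status_code` and `headers` attributes are assumptions:

```python
import time

def retry_rate_limited(call, max_attempts=3, max_delay=60):
    """Sketch of the documented retry behavior (assumed, not the library's
    code): retry on HTTP 429, honoring Retry-After up to max_delay seconds,
    for at most max_attempts attempts."""
    for attempt in range(1, max_attempts + 1):
        response = call()
        if response.status_code != 429:
            return response  # success or a non-rate-limit error: hand it back
        wait = int(response.headers.get("Retry-After", "1"))
        if wait > max_delay or attempt == max_attempts:
            # Retry-After exceeds the budget, or attempts exhausted: fail fast.
            raise RuntimeError("rate limit retry budget exhausted")
        time.sleep(wait)
```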