Dynamics V2 Package Documentation

A fault-tolerant Python package for synchronizing data between Apache Spark and Dynamics 365 CRM. It provides configurable batch processing with parallel execution, multi-layer retries with rate limit handling, batch metrics and query logging for monitoring, and profiling tools for troubleshooting and performance tuning.

Table of Contents

  1. Getting Started
  2. Extended Configuration
  3. Troubleshooting

Key Vocabulary & Concepts

Operation Modes

  • Insert: Add new records
  • Upsert: Update existing records and insert new ones
  • Append: Add large numbers of records in bulk in a fault-tolerant manner, falling back to micro-batching when a bulk request fails (future release)
  • Update: Update existing records only. Useful for updating specific columns on records known to exist on the target (future release)
  • Delete: Soft-delete records without additional column updates (future release)
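
To make the difference between Insert and Upsert concrete, the following is a minimal sketch of how source records might be split into inserts and updates by key membership. It is not the package's implementation: the function classify_for_upsert, the business_key field, and the sample records are all hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Dict[str, object]


def classify_for_upsert(
    source_records: Iterable[Record],
    existing_keys: Set[str],
    key_field: str = "business_key",  # hypothetical key column name
) -> Tuple[List[Record], List[Record]]:
    """Split source records into (to_insert, to_update) by key membership."""
    to_insert: List[Record] = []
    to_update: List[Record] = []
    for record in source_records:
        # A key already present on the target means the record is an update.
        if record[key_field] in existing_keys:
            to_update.append(record)
        else:
            to_insert.append(record)
    return to_insert, to_update


# Illustrative data only: "A-1" is assumed to exist on the target already.
source = [
    {"business_key": "A-1", "name": "Contoso"},
    {"business_key": "B-2", "name": "Fabrikam"},
]
inserts, updates = classify_for_upsert(source, existing_keys={"A-1"})
```

Under this sketch, an Insert run would send only the first list, while an Upsert run would send both.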

Processing Pipeline

  1. Loading

    • Data retrieval from source
    • Key extraction
    • Target data retrieval (pushdown filtering optional)
  2. Pre-processing

    • Data validation
    • Record classification
    • Key verification
  3. Processing

    • Batch operation creation
    • Parallel execution
    • Error handling
  4. Post-processing

    • Metrics collection
    • Log writing
    • Resource cleanup
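
The Processing and Post-processing stages above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the package's code: make_batches, run_batches, and the send_batch callback are hypothetical names, and the metrics dictionary is a simplified stand-in for whatever the package actually records.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence

Record = Dict[str, object]


def make_batches(records: Sequence[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


def run_batches(
    records: Sequence[Record],
    send_batch: Callable[[List[Record]], None],  # hypothetical sender callback
    batch_size: int = 100,
    max_workers: int = 4,
) -> Dict[str, int]:
    """Send batches in parallel and count records by outcome."""
    metrics = {"batches": 0, "succeeded": 0, "failed": 0}
    batches = list(make_batches(records, batch_size))
    # The context manager shuts the pool down on exit (resource cleanup).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send_batch, batch) for batch in batches]
        for batch, future in zip(batches, futures):
            metrics["batches"] += 1
            try:
                future.result()  # re-raises any exception from the worker
                metrics["succeeded"] += len(batch)
            except Exception:
                # A failed batch is counted, not allowed to abort the run.
                metrics["failed"] += len(batch)
    return metrics
```

Counting failures per batch rather than raising on the first error keeps one bad batch from stopping the rest of the run, which matches the fault-tolerant intent described in this document.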

Features

  • High Performance

    • Multi-service principal support
    • Connection pooling
    • Configurable batch processing
    • Parallel execution with thread pooling
  • Smart Data Handling

    • ID-based or business key-based operations
    • Customizable data transformation rules
    • Efficient change detection
    • Column-level sanitization
  • Robust Error Handling

    • Multi-layer retry mechanism
    • Rate limit handling
    • Operation deferral
    • Comprehensive error logging
  • Monitoring & Debugging

    • Detailed batch metrics
    • Query logging
    • Error tracking
    • Performance profiling

Core Functionality

  • Upsert Operations: Insert and upsert operations with key-based matching (standalone update is listed under Operation Modes as a future release)
  • Connection Pooling: Advanced connection management with support for multiple service principals
  • State Management: Support for managing record state transitions (Active/Inactive)
  • Owner Management: Handling of record ownership changes and validation
  • Batch Processing: Efficient handling of large datasets through configurable batching

Advanced Features

  • Rate Limit Handling: Configurable rate limit detection and retry mechanisms
  • Data Transformation: Configurable data normalization and sanitization rules
  • Deferred Operations: Automatic retry of rate-limited operations with configurable backoff
  • Query Logging: Detailed logging of executed queries for debugging
  • Performance Profiling: Built-in profiling capabilities for performance optimization
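
As an illustration of retrying rate-limited operations with configurable backoff, here is a minimal sketch using exponential backoff. It does not reflect the package's actual API: RateLimitError, call_with_backoff, and every parameter name are hypothetical. The sketch assumes the service may suggest a wait time (as with an HTTP 429 response carrying a Retry-After header) and honors it when present.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the service reports a rate limit."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the service, if any


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> T:
    """Call operation, retrying on RateLimitError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RateLimitError as err:
            if attempt == max_attempts:
                raise  # retries exhausted; surface the error to the caller
            # Delay doubles on each attempt, capped at max_delay.
            backoff = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Prefer the service's suggested wait when one was provided.
            delay = err.retry_after if err.retry_after is not None else backoff
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing sleep as a parameter lets the retry schedule be checked in tests without actually waiting.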