Designing Accountable Integration

Reviewing Dynamics 365 integration mechanisms for over a decade now (yes, it was Dynamics CRM back then) has taught me some valuable lessons:

  • Integration mechanisms break down silently the second you look away. Usually metadata or data was changed without notification, an endpoint moved, or a port was closed for security reasons.
  • Sooner or later after an integration mechanism breaks, someone will claim the problem is on your side of the integration.
  • The performance of an integration mechanism may degrade over time. Sometimes the volume of data you started with on day one has grown rapidly, so queries and business logic operations now consume more time and resources. Other times, an external endpoint you are using now responds slowly.
  • The application user used for integration on the Dynamics 365 side almost always has more privileges than it actually requires, and is often granted the System Administrator role.

In this post I would like to share some design tips that can help you build an accountable integration. For me, an accountable integration implements a few key concepts, which I’ll explain next.
While these design concepts may sound trivial, it is surprising how rarely they are actually implemented.

  1. Failing gracefully
    When an integration mechanism fails, the system administrator must be able to get the full failure details. End users and the integrating party, however, should not be exposed to those details, as they usually mean nothing to them and they can’t do anything about them anyway. For them, it is enough to know that a failure occurred, with the minimum required details. Another concern here is security: unhandled exception details can expose sensitive information to attackers.

    That means that no exception should go unhandled. Hermetically seal your code with try/catch blocks, and if you do let an exception propagate, make sure it does not expose any unnecessary details.
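
To illustrate the idea, here is a minimal sketch in Python (a plug-in would do the same in C#). The names `run_safely`, `IntegrationError` and the `integration` logger are my own assumptions for this example, not part of any Dynamics 365 API. Full details go to the log for the administrator, while the caller only gets a short message and a reference.

```python
import logging
import traceback
import uuid

logger = logging.getLogger("integration")  # hypothetical logger name


class IntegrationError(Exception):
    """Sanitized error that is safe to show to end users and integrating parties."""


def run_safely(operation, *args, **kwargs):
    """Run `operation`; log full failure details, expose only a minimal message."""
    try:
        return operation(*args, **kwargs)
    except Exception:
        # A short reference lets the administrator find the full details in the log.
        reference = uuid.uuid4().hex[:8]
        logger.error("Integration failure %s\n%s", reference, traceback.format_exc())
        # `from None` keeps the original exception out of the traceback shown to the caller.
        raise IntegrationError(
            f"The integration operation failed. Reference: {reference}"
        ) from None
```

The reference is the only link between what the caller sees and what the administrator can look up, so nothing sensitive has to cross that boundary.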
  2. Diagnostic enabled
    The first thing I look for in a failure scenario is diagnostics, as it should point to the cause of the problem and sometimes even allows solving it without recompiling code.
    In many implementations, diagnostic elements are missing or turned off. In others, they lack critical information.
    Any diagnostics element should at least be able to confirm that an integration mechanism actually executed, even if it did not operate on any data or perform any action. Additionally, did it succeed or fail, and if it failed, what is the failure message? What is the time stamp of each execution, and what were the incoming and outgoing parameters?

    If you are using Custom Workflow Activities or Plug-ins, that means using the tracing service (ITracingService). Azure elements like Functions and Logic Apps also support tracing.
    Define your own convention, but make sure you trace the following key events:

    1. Execution start/end along with time stamps. This will give you an indication of every execution, even if it did nothing.
    2. Incoming and outgoing parameters. This will allow you to diagnose problems faster, as unexpected parameter values are the culprit most of the time.
    3. Any outgoing calls to external APIs. Trace before and after any such call, and trace the outgoing message as well. This will allow you to measure how long each call actually took and to inspect the outgoing request and the response.
    4. Any exception
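
One possible convention for these four events, sketched in Python. The `traced` decorator and the two sample functions are hypothetical. In a plug-in, the `trace` callable would wrap the tracing service; here it simply appends to a list.

```python
import time
from datetime import datetime, timezone
from functools import wraps


def _now():
    """UTC time stamp in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()


def traced(trace):
    """Decorator factory; `trace` is any callable that accepts one message string."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            # Events 1 and 2: execution start and incoming parameters.
            trace(f"{_now()} START {func.__name__} args={args!r} kwargs={kwargs!r}")
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # Event 4: any exception, then let it continue to the caller.
                trace(f"{_now()} ERROR {func.__name__} {type(exc).__name__}: {exc}")
                raise
            else:
                # Event 2: outgoing parameters.
                trace(f"{_now()} RESULT {func.__name__} {result!r}")
                return result
            finally:
                # Event 1: execution end, with the measured duration.
                elapsed_ms = (time.perf_counter() - started) * 1000
                trace(f"{_now()} END {func.__name__} elapsed_ms={elapsed_ms:.1f}")
        return wrapper
    return decorator


lines = []  # stand-in for a real trace destination


@traced(lines.append)
def call_external_api(payload):
    """Hypothetical stand-in for an outgoing call to an external API."""
    return {"status": "ok", "echo": payload}


@traced(lines.append)
def sync_account(account_id):
    """Hypothetical integration entry point."""
    return call_external_api({"id": account_id})
```

Decorating the outgoing call as well as the entry point covers event 3: the request, the response and the time the call took all end up in the trace.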
  3. Proactive
    Many implementations have logging/tracing enabled, but no one even knows how to access or analyze it. Sometimes a failure is detected days or weeks after it first appeared.
    When a failure occurs, the system administrator should know about it ASAP. I don’t expect system administrators to spend time scanning traces and logs in search of anomalies.
    An integration mechanism should be able to send an immediate notification of failure in the form of an email, SMS or push notification.

    If you are using Custom Workflow Activities, you can leverage the built-in emailing capabilities of the hosting process. Set one of the output parameters to indicate success or failure, so that the process can send an email to the system administrator every time a failure occurs.
    Azure elements can leverage the Application Insights alerts capability to send an SMS, email or push notification on failure.
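
The same pattern in generic form, as a minimal Python sketch. `run_with_alert`, `build_failure_email` and the addresses are assumptions made for this example. The `send` callable is injected, so it could be an SMTP client's `send_message` method or any other channel.

```python
from email.message import EmailMessage


def build_failure_email(mechanism, error, admin_address, sender_address):
    """Compose a short failure notification for the system administrator."""
    message = EmailMessage()
    message["Subject"] = f"[Integration failure] {mechanism}"
    message["From"] = sender_address
    message["To"] = admin_address
    message.set_content(f"{mechanism} failed: {type(error).__name__}: {error}")
    return message


def run_with_alert(mechanism, operation, send, admin_address,
                   sender_address="integration@example.com"):
    """Run `operation`; on failure, notify the administrator and re-raise.

    `send` is any callable that delivers an EmailMessage (hypothetical here).
    """
    try:
        return operation()
    except Exception as error:
        send(build_failure_email(mechanism, error, admin_address, sender_address))
        raise
```

The exception is re-raised after the notification, so the failure still reaches whatever error handling and tracing surrounds the call.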
  4. Least Exposed
    What’s wrong with granting a designated application user (used only for integration scenarios) the System Administrator role? Everything.
    If the credentials of this omnipotent user leak (and credentials do sometimes leak), all of your data is exposed and an attacker can do maximum damage.

    When setting up an application user to perform integration operations within Dynamics 365, make sure it has the least privileges possible to allow the required operations. It is best to create a custom security role for this purpose.
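
A simple way to keep yourself honest is to list the privileges the integration actually needs and compare them with what the user has been granted. The Python sketch below assumes you already have both lists at hand; the function names and the privilege names in the usage note are illustrative only.

```python
def excess_privileges(granted, required):
    """Privileges the integration user holds but does not need, sorted."""
    return sorted(set(granted) - set(required))


def missing_privileges(granted, required):
    """Privileges the integration needs but the user lacks, sorted."""
    return sorted(set(required) - set(granted))
```

For example, with `granted = ["prvReadAccount", "prvWriteAccount", "prvDeleteAccount"]` and `required = ["prvReadAccount", "prvWriteAccount"]`, the first function reports `prvDeleteAccount` as a candidate for removal from the custom role.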

In the past, I posted about Custom Code Tracing & Exception Logging in Dynamics. That solution implements most of the accountable integration concepts discussed here and may be a good starting point.