@edwardneal
Contributor

@edwardneal edwardneal commented Jan 25, 2026

Description

For reference, the general OpenTelemetry semantic conventions are here and the SQL Server specific ones are here. I'll refer to them in a few places.

I'm preparing a PR which introduces Activity support (and thus, OpenTelemetry spans.) That PR will need to change the public API surface and will likely be quite large, so I'm introducing this one so that we can review some of the underlying plumbing beforehand.

This can be reviewed commit by commit. Core changes are below.

Cleanup of diagnostic scope

This encompasses a minor cleanup of the DiagnosticScope type to enable nullability annotations and clean up a few comments. It also addresses a to-do item in SqlCommand.ExecuteReader: it ports the method to use DiagnosticScope in the same way that ExecuteXmlReader and other methods do.

Field plumbing

Some of this plumbing is simple enough: the semantic conventions call for the batch size to be made available, so a simple internal property has been added. The instance name is more complex though, and it introduces some caveats and one behaviour change.

At the moment, the instance name isn't always written out in the PRELOGIN message. If we use the native SNI, it's passed up and written as a byte array. If we use the managed SNI, it's not written at all. Both are permissible - the TDS spec indicates that this is optional:

If available, the client SHOULD send the server the name of the instance to which it is connecting as a NULL-terminated multi-byte character set (MBCS) string in the INSTOPT option. If the string is empty or is case-insensitively equal, by using the server's locale for comparison to either the server's instance name or "MSSQLServer", the server SHOULD<38> return an INSTOPT containing a byte with the value 0 to indicate that the client's INSTOPT matches the server's instance. Otherwise, the server SHOULD return an INSTOPT containing a byte with the value of 1. The client SHOULD use the INSTOPT value from the server's PRELOGIN response for verification purposes and SHOULD terminate the connection if the INSTOPT option has the value 1.

We need this information to be accessible as a string at the point of span creation. Since the field doesn't need to be written, I've opted to make the native SNI behave in the same way as the managed SNI: it won't write the expected instance name.

With that behaviour change made, I've simply opted to decode the native SNI's byte array into a string and pass it upwards, where it's persisted in SqlConnectionInternal.InstanceName.
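As a rough illustration of the decoding step described above, the following is a minimal sketch and not the PR's actual code. The type and method names are hypothetical, and the choice of UTF-8 as the default encoding is an assumption: the TDS text only says the value is a NULL-terminated MBCS string, so the real code may need a different encoding.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical helper; names are illustrative and not taken from the PR.
public static class InstanceNameSketch
{
    // Decodes a NULL-terminated byte buffer into a string.
    // Assumption: the bytes are UTF-8/ASCII compatible unless an encoding is supplied.
    public static string DecodeInstanceName(byte[] buffer, Encoding? encoding = null)
    {
        encoding ??= Encoding.UTF8;

        // Stop at the first NULL terminator; if none is present, use the whole buffer.
        int terminator = Array.IndexOf(buffer, (byte)0);
        int length = terminator < 0 ? buffer.Length : terminator;

        return encoding.GetString(buffer, 0, length);
    }
}
```

An empty buffer, or one that starts with a NULL byte, decodes to an empty string, which a caller could treat as "no instance name".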

Telemetry constants

The TelemetryConstants.cs class provides the friendly names and documentation for span attributes and values documented in the OTel database semantic conventions. Most of these aren't controversial, but I have a few thoughts on the convention for db.namespace. I've placed these below, and in a comment in the file.

The value for db.namespace is usually {database name}. If connecting to a named instance, it is {instance name}|{database name}. This is fine for simple cases, but rapidly becomes unwieldy in more complex environments. A few examples of such environments are:

  1. Connecting to an AlwaysOn Availability Group (formed of two named SQL Server instances) using its listener. Although we have non-default instances, the client doesn't see these instance names and thus won't include them in the span.
  2. Database mirroring between two named SQL Server instances. If a client uses the failover partner details supplied by the server, these details will contain a port number, not an instance name. Spans will contain an instance name prior to failover, no instance name after failover, and no instance name after a fail back (even though we're back on the original server.)
  3. On Windows, a client-side SQL alias. A connection string can specify an alias name (which wouldn't normally contain an instance name) and this alias might resolve to a named instance. Spans will contain this post-resolution instance name.
  4. Any configuration to a named instance which conditionally performs routing - such as read-only routing, or anything along these lines in Azure. The namespace will contain an instance name if no routing is performed (perhaps if the application intent is ReadWrite) but won't if the connection is routed (e.g. an application intent of ReadOnly.)

There are situations where a named SQL Server instance is involved but the instance name can't be reliably determined, and a few particularly rough edge cases where the same connection string yields spans with an instance name only some of the time.

We'll populate db.namespace as specified in the convention, but the only reliable piece of information in there is the database name. To make it simpler to extract this information, I'm proposing to add a sqlclient.db.database_name attribute. I'm also proposing that we should document db.namespace as a logical namespace and say that if a client wants to uniquely identify a specific connection then they need to look at the combination of server.address and server.port to determine the physical connection details (after routing/failover.)
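To make the proposal concrete, here is a minimal sketch of how the attributes discussed above might be assembled. The type and method names are hypothetical and not taken from the PR. The `{instance name}|{database name}` format and the `sqlclient.db.database_name` attribute come from the description above, and the latter is a proposal rather than part of the OTel conventions.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical helper; names are illustrative and not taken from the PR.
public static class DbNamespaceSketch
{
    // Builds db.namespace: "{instance name}|{database name}" when an instance
    // name is known to the client, otherwise just "{database name}".
    public static string BuildDbNamespace(string? instanceName, string databaseName) =>
        string.IsNullOrEmpty(instanceName)
            ? databaseName
            : $"{instanceName}|{databaseName}";

    // Collects the attributes a span might carry. server.address and server.port
    // describe the physical connection (after routing/failover).
    public static IReadOnlyDictionary<string, object> BuildAttributes(
        string? instanceName, string databaseName, string serverAddress, int serverPort) =>
        new Dictionary<string, object>
        {
            ["db.namespace"] = BuildDbNamespace(instanceName, databaseName),
            ["sqlclient.db.database_name"] = databaseName, // proposed attribute
            ["server.address"] = serverAddress,
            ["server.port"] = serverPort,
        };
}
```

With this shape, a consumer that only needs the database name reads `sqlclient.db.database_name` directly instead of splitting `db.namespace` on `|`, which matters in the routing and failover cases listed above where the instance name comes and goes.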

Issues

Contributes to #2210.

Testing

Automated tests continue to pass. Could someone kick CI please?

With specific regard to the change in PRELOGIN packet contents, I've seen that footnote 38 of the TDS specification suggests that SQL Server 2000, 2005, 2008 and 2008 R2 validate the client-specified instance name. Although none of these SQL Server versions are in support, I've manually tested the new behaviour against a named SQL Server 2008 R2 instance and confirmed that we can still connect: although these older versions validate the client-specified instance name, SqlClient doesn't provide an instance name for them to validate against.

@edwardneal edwardneal requested a review from a team as a code owner January 25, 2026 00:44
@roji
Member

roji commented Jan 25, 2026

To try to resolve this I've added an AppContext switch, SuppressNativeActivityTelemetry. This isn't set by default, so SqlClient will generate spans (which the client libraries may well duplicate.)

Note that OpenTelemetry isn't on by default - the consuming application has to opt into it by adding the library's source when building the TracerProvider (docs). Concretely, users of the 3rd-party instrumentation library would be calling .AddSqlClientInstrumentation(), whereas presumably the SqlClient native instrumentation would require users to call something like .AddSqlClient().

So I think SqlClient shouldn't need any sort of AppContext switch to turn tracing on or off.

@edwardneal
Contributor Author

Thanks roji. The existing OTel implementation names its ActivitySource OpenTelemetry.Instrumentation.SqlClient (from typeof(SqlTelemetryHelper).Assembly.GetName().Name) so I agree - the AppContext switch is unnecessary, and we can simply proceed on the basis of exposing delegates for span enrichment/naming.

This isn't necessary. The existing OpenTelemetry ActivitySource uses the assembly name of SqlTelemetryHelper, but this type isn't in the SqlClient assembly. We can thus use an ActivitySource name of Microsoft.Data.SqlClient without generating any conflicts.
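The opt-in behaviour discussed above can be sketched with the BCL types alone. This is a minimal illustration rather than the PR's implementation: the class and method names are hypothetical, and the source name `Microsoft.Data.SqlClient` is the one suggested in this thread, not a confirmed final value. An `ActivitySource` only produces an `Activity` when some listener has subscribed to its name, so nothing is emitted unless the application asks for it.

```csharp
#nullable enable
using System;
using System.Diagnostics;

// Hypothetical demo type; not part of SqlClient.
public static class ActivityOptInSketch
{
    // Assumed source name, as suggested in this thread.
    private const string SourceName = "Microsoft.Data.SqlClient";
    private static readonly ActivitySource Source = new(SourceName);

    // Returns true only if a listener is subscribed and sampled the activity in.
    public static bool StartsActivity()
    {
        using Activity? activity = Source.StartActivity("sketch-operation");
        return activity is not null;
    }

    // Subscribes to the source; dispose the result to unsubscribe.
    public static IDisposable Listen()
    {
        var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
        };
        ActivitySource.AddActivityListener(listener);
        return listener;
    }
}
```

In an OpenTelemetry application the subscription would normally come from calling `AddSource` with the source name on the `TracerProviderBuilder` rather than from a hand-written listener. Because the existing instrumentation library uses a different source name, subscribing to one does not enable the other.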
@benrr101
Contributor

I think we're getting into the realm of things that'll require a proper design review. So it might take us a bit to get to this (and some of your recent PRs). Still, we're really grateful for your contributions, and please don't take our delays as ingratitude. We just want to make sure these bigger changes fit the goals we have in mind and are well constructed.

Still, I'll kick off a build for this so we can see what happens.

@benrr101
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@edwardneal
Contributor Author

Thanks - and no problem, I've assumed that most of the bandwidth for preview4 is taken up by the Azure split and the changes to the authentication provider, and that preview5 will probably be limited to bugfixes from the Azure split and build/test cleanup.

The most important PR is #3719 (which is needed to address a bulk copy bug in an earlier preview) and I'd appreciate it if we could merge that prior to the 7.0 release. Everything else is miscellaneous cleanup or feature improvements.
