istio(1.28.1): Critical Stability Fixes and Gateway API Enhancements for Ambient Mesh

📋 Recommended Actions

⚠️ Action Required
Immediate upgrade is highly recommended for all users to benefit from critical stability fixes, especially concerning multi-revision deployments and Gateway API status reporting. Review new InferencePool capabilities to enhance AI/ML workloads.

📝 Summary

Istio 1.28.1 delivers essential stability fixes and powerful Gateway API enhancements. This patch release addresses critical issues in multi-revision environments, preventing status conflicts for Gateway API resources like HTTPRoutes. It also resolves a persistent SDS (Secret Discovery Service) WARMING state bug, crucial for secure certificate management. Ambient Mesh users will find significant improvements in service overlap resolution, ensuring Kubernetes Services take precedence over ServiceEntries, and more accurate endpoint discovery within scoped networks. A long-standing bug preventing HTTP servers from routing on the same port as an HTTPS server (but with different binds) has been fixed, enhancing gateway flexibility. Furthermore, the Gateway API Inference Extension now supports multiple targetPorts, a key feature for modern AI/ML workloads. Multiple dependency bumps and cleanup items are also included. Upgrading is a straightforward step to ensure a more robust and predictable Istio deployment.

✨ Gateway API Inference Extension: Multi-TargetPort Support

The Istio Gateway API Inference Extension now supports multiple targetPorts within a single InferencePool. This matters for AI/ML workloads, where different models or endpoints within the same logical service might listen on distinct ports. You can now define one InferencePool that routes to several backend ports, which simplifies traffic management for complex inference services.

Previously, InferencePools were limited to a single target port. With this update, the shadow service created by Istio to represent the InferencePool will now generate multiple internal ports (starting from 54321) that map to your specified targetPorts. This allows the Endpoint Picker (EPP) to load-balance and select endpoints across these different backend ports. For example, an inference workload that serves on ports 8000, 8001, and 8002 can list all three as targetPorts in a single InferencePool.
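The port mapping described above can be sketched in Go. This is a minimal illustration under stated assumptions, not Istio's implementation: the names shadowPortBase, portMapping, and shadowPorts are hypothetical, and assigning the internal ports sequentially is an assumption drawn from the "starting from 54321" wording.

```go
package main

import "fmt"

// shadowPortBase is the first internal port mentioned in the release notes.
const shadowPortBase = 54321

// portMapping pairs one shadow-service port with one backend targetPort.
// Hypothetical type, used only for this illustration.
type portMapping struct {
	ServicePort int32
	TargetPort  int32
}

// shadowPorts assigns one internal port per targetPort.
// Sequential assignment from shadowPortBase is an assumption.
func shadowPorts(targetPorts []int32) []portMapping {
	out := make([]portMapping, 0, len(targetPorts))
	for i, tp := range targetPorts {
		out = append(out, portMapping{
			ServicePort: shadowPortBase + int32(i),
			TargetPort:  tp,
		})
	}
	return out
}

func main() {
	for _, m := range shadowPorts([]int32{8000, 8001, 8002}) {
		fmt.Printf("service port %d -> targetPort %d\n", m.ServicePort, m.TargetPort)
	}
}
```

Under these assumptions, each backend port gets its own stable service port, so an endpoint picker can address a specific port on a pod without the pool being split into several resources.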

Source:

pilot/pkg/config/kube/gateway/inferencepool_collection.go (504-517)
pilot/pkg/config/kube/gateway/conversion.go (1115-1120)
releasenotes/notes/57638.yaml (1-7)

🌐 Ambient Mesh: Enhanced Service Resolution and Multi-Network Stability

Ambient Mesh deployments receive crucial stability and predictability boosts in this release. We’ve significantly improved service resolution logic to prevent conflicts between Kubernetes Services and Istio ServiceEntries sharing the same hostname. Now, Kubernetes Services reliably take precedence, ensuring your core services behave as expected. Additionally, a long-standing issue where ambient East-West gateways and waypoints were incorrectly configured with remote cluster endpoints that shouldn’t be accessible has been resolved, enhancing network segmentation and security. Connections across ambient multi-networks will also be more stable, with a fix for issues arising from custom trust domains.

The ambient index now includes sophisticated deduplication logic for ServiceEntry resources, giving precedence to existing Kubernetes Services. If multiple ServiceEntry resources conflict, the oldest one will take effect. The ServiceEndpointsByPort function for InferencePools now correctly returns all endpoints, regardless of the specific port, to support flexible load balancing by the EPP. Furthermore, the filterIstioEndpoint method ensures that for local-scoped ambient services, only endpoints within the same cluster are considered, preventing unintended cross-cluster traffic routing.

// DedupedServiceEntriesInfo ensures Kubernetes Services take precedence over ServiceEntries
DedupedServiceEntriesInfo := krt.NewCollection(
	serviceEntryByHostname.AsCollection(),
	func(ctx krt.HandlerContext, se krt.IndexObject[string, ServiceEntryInfo]) *model.ServiceInfo {
		// ... deduplication logic ...
		s := krt.FetchOne(ctx, ServicesInfo, krt.FilterIndex(serviceByHostname, se.Objects[0].Service.Hostname))
		if s != nil {
			// if we have a hostname conflict with a kubernetes service, we should eliminate all ServiceEntry ServiceInfos
			return nil
		}
		// ... resolve conflicts among ServiceEntries by creation time ...
		return oldest
	},
	// ... opts ...
)
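The precedence rules in that excerpt can be shown with a self-contained sketch that does not depend on krt. The serviceEntry type and dedupe function below are hypothetical simplifications, and creation time is represented as a plain Unix timestamp for brevity.

```go
package main

import "fmt"

// serviceEntry is a hypothetical, simplified stand-in for a ServiceEntry.
type serviceEntry struct {
	Name        string
	Hostname    string
	CreatedUnix int64 // creation time in Unix seconds
}

// dedupe returns, per hostname, the ServiceEntry that takes effect.
// Hostnames already owned by a Kubernetes Service are dropped entirely;
// among conflicting ServiceEntries, the oldest one wins.
func dedupe(k8sHostnames map[string]bool, entries []serviceEntry) map[string]serviceEntry {
	winners := make(map[string]serviceEntry)
	for _, se := range entries {
		if k8sHostnames[se.Hostname] {
			continue // the Kubernetes Service takes precedence
		}
		cur, seen := winners[se.Hostname]
		if !seen || se.CreatedUnix < cur.CreatedUnix {
			winners[se.Hostname] = se
		}
	}
	return winners
}

func main() {
	k8s := map[string]bool{"reviews.default.svc.cluster.local": true}
	entries := []serviceEntry{
		{Name: "se-reviews", Hostname: "reviews.default.svc.cluster.local", CreatedUnix: 100},
		{Name: "se-newer", Hostname: "api.example.com", CreatedUnix: 300},
		{Name: "se-older", Hostname: "api.example.com", CreatedUnix: 200},
	}
	for host, se := range dedupe(k8s, entries) {
		fmt.Printf("%s -> %s\n", host, se.Name)
	}
}
```

In this sketch, the entry for reviews.default.svc.cluster.local is discarded because a Kubernetes Service owns that hostname, and se-older wins the conflict on api.example.com.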

Source:

pilot/pkg/model/push_context.go (2526-2544)
pilot/pkg/serviceregistry/kube/controller/ambient/services.go (65-103)
pilot/pkg/serviceregistry/kube/controller/ambient/ambientindex_test.go (369-400)
pilot/pkg/xds/endpoints/endpoint_builder.go (528-533)
releasenotes/notes/57782.yaml (1-9)
releasenotes/notes/58139.yaml (1-8)
releasenotes/notes/58141.yaml (1-8)
releasenotes/notes/58427.yaml (1-7)

🛡️ Multi-Revision Resilience: Preventing Status Conflicts & SDS `WARMING` Issues

Deploying multiple Istio revisions side-by-side just got a whole lot smoother. This release includes critical fixes that prevent status conflicts on Gateway API Route resources (like HTTPRoute) when multiple Istio control planes are active. This means your route statuses will remain stable and accurate, regardless of which revision is managing them. Furthermore, we’ve squashed a bug that caused Envoy’s Secret Discovery Service (SDS) to get stuck in a WARMING state, particularly when Kubernetes Secrets were referenced using inconsistent naming conventions, ensuring robust certificate provisioning.

The RegisterStatus function now incorporates a tagWatcher that explicitly checks tagWatcher.IsMine() before attempting to write status updates. This ensures that only the Istio revision that ‘owns’ a particular resource’s status will update it, preventing race conditions and conflicts in multi-revision setups. For the SDS WARMING state issue, the SecretResource.Key() method has been updated to include the full ResourceName, differentiating between kubernetes://secret-name and kubernetes://namespace/secret-name references, thus resolving cache key collisions.

// In pilot/pkg/status/collections.go
func RegisterStatus[I controllers.Object, IS any](
	s *StatusCollections,
	statusCol krt.StatusCollection[I, IS],
	getStatus func(I) IS,
	tagWatcher revisions.TagWatcher,
) {
	// ... existing logic ...
	if !tagWatcher.IsMine(metav1.ObjectMeta{Namespace: l.Obj.GetNamespace(), Name: l.Obj.GetName(), Labels: l.Obj.GetLabels()}) {
		log.Debugf("suppress change for %v %v because it does not belong to my revision", l.ResourceName(), l.Obj.GetResourceVersion())
		return
	}
	// ... continue with status update if it's mine ...
}
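The SDS cache key collision can be sketched separately. The example below is an illustration only: secretRef, parseRef, collidingKey, and key are hypothetical names, and the shape of the pre-fix key is an assumption made to show how two reference styles could land on the same cache entry.

```go
package main

import (
	"fmt"
	"strings"
)

const kubernetesScheme = "kubernetes://"

// secretRef is a hypothetical, simplified stand-in for a parsed SDS secret reference.
type secretRef struct {
	ResourceName string // full name exactly as requested
	Namespace    string
	Name         string
}

// parseRef resolves a reference, falling back to the proxy's namespace
// when the reference does not carry one.
func parseRef(resourceName, proxyNamespace string) secretRef {
	rest := strings.TrimPrefix(resourceName, kubernetesScheme)
	if ns, name, ok := strings.Cut(rest, "/"); ok {
		return secretRef{ResourceName: resourceName, Namespace: ns, Name: name}
	}
	return secretRef{ResourceName: resourceName, Namespace: proxyNamespace, Name: rest}
}

// collidingKey ignores how the secret was referenced (assumed pre-fix shape).
func (r secretRef) collidingKey() string {
	return r.Namespace + "/" + r.Name
}

// key includes the full resource name, so each reference style gets its own entry.
func (r secretRef) key() string {
	return r.ResourceName + "/" + r.Namespace + "/" + r.Name
}

func main() {
	short := parseRef("kubernetes://my-cert", "istio-system")
	full := parseRef("kubernetes://istio-system/my-cert", "istio-system")
	fmt.Println("same colliding key:", short.collidingKey() == full.collidingKey())
	fmt.Println("same full key:", short.key() == full.key())
}
```

Both references resolve to the same Secret, yet Envoy requests them under different resource names; keeping the requested name in the key lets each request be answered under the name it asked for.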

Source:

pilot/pkg/config/kube/gateway/controller.go (238-360)
pilot/pkg/config/kube/ingress/controller.go (119-174)
pilot/pkg/model/credentials/resource.go (61-61)
pilot/pkg/xds/sds_test.go (370-394)
pilot/pkg/status/collections.go (72-93)
releasenotes/notes/57734.yaml (1-10)
releasenotes/notes/58146.yaml (1-9)

🐛 Gateway Configuration: HTTPS/HTTP Merging Fix & TLS Access

We’ve resolved a subtle but impactful bug in how Istio merges multiple Gateway resources. Previously, if an HTTPS server was configured on a specific port first, it could prevent an HTTP server from correctly creating routes on the same port but utilizing a different bind address. This fix ensures that both HTTP and HTTPS servers can coexist and function as expected under these conditions, offering greater flexibility in multi-protocol gateway deployments. Additionally, a fix addresses a bug where the experimental XListenerSet resources were unable to access necessary TLS secrets, restoring their intended functionality.

The mergeGateways function now correctly distinguishes and merges HTTP and HTTPS servers even when they share a port but have distinct bind addresses. This is facilitated by a refined logic that correctly identifies and adds plaintext servers. The XListenerSet resource logic has also been updated to ensure the service-account-name annotation is derived correctly from the parent Gateway, allowing proper secret access for TLS configurations.

// In pilot/pkg/model/gateway.go
func mergeGateways(gateways []gatewayWithInstances, proxy *Proxy, ps *PushContext) *MergedGateway {
	// ... existing logic ...
	// Handling servers with same port but different binds
	if current.Bind != serverPort.Bind {
		// Different bind, create new server
		addPlainTextServer(s, serverPort, routeName, plainTextServers, serversByRouteName, mergedServers, &serverPorts)
	} else {
		// Same bind, merge with existing server
		ms := mergedServers[current]
		ms.Servers = append(ms.Servers, s)
		serversByRouteName[routeName] = append(serversByRouteName[routeName], s)
	}
	// ... rest of the logic ...
}
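The idea behind the fix, treating port and bind together as the identity of a listener, can be sketched without Istio's types. The server and portBind types and the groupByPortAndBind function are hypothetical, and the sketch deliberately leaves out route naming and TLS handling.

```go
package main

import "fmt"

// server is a hypothetical, simplified Gateway server entry.
type server struct {
	Name     string
	Port     uint32
	Bind     string
	Protocol string
}

// portBind identifies a listener by both port and bind address.
type portBind struct {
	Port uint32
	Bind string
}

// groupByPortAndBind merges servers that share port and bind,
// and keeps servers with a different bind in their own group.
func groupByPortAndBind(servers []server) map[portBind][]server {
	groups := make(map[portBind][]server)
	for _, s := range servers {
		k := portBind{Port: s.Port, Bind: s.Bind}
		groups[k] = append(groups[k], s)
	}
	return groups
}

func main() {
	servers := []server{
		{Name: "secure", Port: 8443, Bind: "10.0.0.1", Protocol: "HTTPS"},
		{Name: "plain-a", Port: 8443, Bind: "10.0.0.2", Protocol: "HTTP"},
		{Name: "plain-b", Port: 8443, Bind: "10.0.0.2", Protocol: "HTTP"},
	}
	for k, group := range groupByPortAndBind(servers) {
		fmt.Printf("port %d bind %s: %d server(s)\n", k.Port, k.Bind, len(group))
	}
}
```

Keying on the pair rather than on the port alone means the HTTPS server appearing first no longer shadows plaintext servers bound to a different address.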

Source:

pilot/pkg/model/gateway.go (277-302)
pilot/pkg/model/gateway_test.go (367-463)
releasenotes/notes/gateway-merging-https-first-bug-fix.yaml (1-10)
pilot/pkg/config/kube/gateway/gateway_collection.go (166-170)
releasenotes/notes/xls-tls.yaml (1-6)

⚙️ Traffic Management: IPv6 Nftables & WorkloadEntry Fixes

This release includes targeted fixes for traffic management components, improving correctness and resource utilization. IPv6 nftables rules are now programmed only when IPv6 is explicitly enabled in ambient mode, preventing unnecessary rule generation in IPv4-only environments. Additionally, an issue where istio-init failed in TPROXY mode with an empty traffic.sidecar.istio.io/includeInboundPorts annotation has been resolved, making proxy initialization more robust. WorkloadEntry processing has also been refined to consistently use correct metadata, preventing invalid EDS caching issues.

The nftables configurator now conditionally calls AppendV6RuleIfSupported for IPv6-specific rules, ensuring they are only active in dual-stack or IPv6-only environments. This avoids spurious IPv6 rules in environments where IPv6 is not in use. WorkloadEntry conversion functions (convertWorkloadEntryToWorkloadInstance) have been updated to correctly retrieve namespace and name from the config.Meta object, preventing incorrect service account SPIFFE URI generation and ensuring accurate caching of EDS (Endpoint Discovery Service) endpoints.

// In cni/pkg/nftables/nftables.go
// IPv6 rule appended only if supported
cfg.ruleBuilder.AppendV6RuleIfSupported(IstioPreroutingChain, AmbientNatTable,
	"meta l4proto tcp",
	"ip6 saddr", cfg.cfg.HostProbeV6SNATAddress.String(), Counter,
	"accept",
)

// In pilot/pkg/serviceregistry/serviceentry/conversion.go
// Use meta.Namespace and meta.Name consistently
sa = spiffe.MustGenSpiffeURI(s.meshWatcher.Mesh(), meta.Namespace, we.ServiceAccount)

workloadInstance := &model.WorkloadInstance{
	// ... fields ...
	Namespace:           meta.Namespace,
	Name:                meta.Name,
	// ... other fields ...
}
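The conditional IPv6 behavior can be illustrated with a toy rule builder. This is a sketch under stated assumptions rather than the CNI code: the ruleBuilder type, its ipv6Enabled field, and the lowercase method names are hypothetical, and rules are stored as plain strings instead of being programmed into nftables.

```go
package main

import (
	"fmt"
	"strings"
)

// ruleBuilder is a hypothetical, simplified rule collector.
type ruleBuilder struct {
	ipv6Enabled bool
	rules       []string
}

// appendRule records a rule unconditionally.
func (b *ruleBuilder) appendRule(chain, table string, parts ...string) {
	b.rules = append(b.rules, table+" "+chain+" "+strings.Join(parts, " "))
}

// appendV6RuleIfSupported records an IPv6 rule only when IPv6 is enabled,
// so IPv4-only environments never see ip6 rules.
func (b *ruleBuilder) appendV6RuleIfSupported(chain, table string, parts ...string) {
	if !b.ipv6Enabled {
		return
	}
	b.appendRule(chain, table, parts...)
}

func main() {
	for _, enabled := range []bool{false, true} {
		b := &ruleBuilder{ipv6Enabled: enabled}
		b.appendRule("prerouting", "nat", "meta l4proto tcp", "ip saddr", "169.254.7.127", "accept")
		b.appendV6RuleIfSupported("prerouting", "nat", "meta l4proto tcp", "ip6 saddr", "fd16:9254:7127:1337:ffff:ffff:ffff:ffff", "accept")
		fmt.Printf("ipv6Enabled=%v -> %d rule(s)\n", enabled, len(b.rules))
	}
}
```

The addresses in main are placeholders for illustration. Putting the check inside the builder keeps call sites unconditional, which is one reasonable way to avoid scattering IPv6 checks across the configurator.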

Source:

cni/pkg/nftables/nftables.go (224-379)
cni/pkg/nftables/testdata/default.golden (7-19)
releasenotes/notes/58249.yaml (1-8)
releasenotes/notes/58135.yaml (1-8)
pilot/pkg/serviceregistry/serviceentry/conversion.go (462-495)

Minor Updates & Housekeeping

This release also includes various dependency updates, such as golang.org/x/net to v0.44.0, github.com/prometheus/prometheus to v0.306.0, and sigs.k8s.io/gateway-api-inference-extension to a newer commit. Image references in samples and addons have been updated to use fully qualified docker.io paths, and the proxyOverrides annotation is now stripped from Pod objects for improved resource efficiency.
