Skip to content

Retrieve TLS config from apiserver for OpenShift#1643

Draft
dkwon17 wants to merge 1 commit into
devfile:mainfrom
dkwon17:tls-update
Draft

Retrieve TLS config from apiserver for OpenShift#1643
dkwon17 wants to merge 1 commit into
devfile:mainfrom
dkwon17:tls-update

Conversation

@dkwon17

@dkwon17 dkwon17 commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Assisted-by: Claude Sonnet 4.6 noreply@anthropic.com

What does this PR do?

Retrieve TLS config from api server for Openshift (no changes for Kubernetes).

Example PR: https://github.com/openshift/cluster-machine-approver/pull/286/changes#diff-2873f79a86c0d8b3335cd7731b0ecf7dd4301eb19a82ef7a1cba7589b5252261

What issues does this PR fix or reference?

Is it tested? How?

  1. Install the DWO with changes from this PR by using this catalog source:
 oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: devworkspace-operator-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/dkwon17/devworkspace-operator-index:tls-update
  publisher: Red Hat
  displayName: DevWorkspace Operator Catalog
  updateStrategy:
    registryPoll:
      interval: 5m
EOF

When testing, I installed the operator in the openshift-operators namespace.

  1. Run the tls-scanner check to verify tls profile consistency. To do this, clone the scanner repo:
git clone https://github.com/shiftstack/tls-scanner && cd tls-scanner
  1. Run the scanner:
SCANNER_IMAGE=quay.io/dkwon17/tls-scanner:latest \
NAMESPACE=openshift-operators \
NAMESPACE_FILTER=openshift-operators \
./deploy.sh deploy
  1. View results:
 jq -r '.ip_results[] | .ip as $ip | .port_results[] | [$ip, (.port|tostring), (.tls_versions//[]|join("+")), .status] | @tsv' ./artifacts/results.json | column -t

The expected output is:

  10.100.0.53  6789                   PROBE_PORT
  10.100.0.53  8443  TLSv1.2+TLSv1.3  OK
  10.100.0.53  9443  TLSv1.2+TLSv1.3  OK
  10.100.0.54  6789                   PROBE_PORT
  10.100.0.54  8443  TLSv1.2+TLSv1.3  OK
  10.100.0.54  9443  TLSv1.2+TLSv1.3  OK
  10.100.0.55  6789                   PROBE_PORT
  10.100.0.55  8443  TLSv1.2+TLSv1.3  OK
  10.100.0.55  9443  TLSv1.2+TLSv1.3  OK

The pod IPs (first col) will likely be different on your cluster, but the important thing to verify is that we see PROBE_PORT column is OK for the three DWO pods for ports 8443 and 9443.

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

@openshift-ci

openshift-ci Bot commented Jun 10, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci

openshift-ci Bot commented Jun 10, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkwon17

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c10b5581-ad33-4ffe-b3fe-cb72925406b5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tolusha

tolusha commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Hi! I'm che-ai-assistant — I help with your pull requests.

Available commands:

  • /che-ai-assistant generate-che-doc — Generate a documentation PR based on this PR's changes
  • /che-ai-assistant ok-pr-review — Run a comprehensive PR review (summary, code review, deep review, impact analysis)
  • /che-ai-assistant help — Show this help message

@dkwon17

dkwon17 commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

/che-ai-assistant ok-pr-review

Review is complete. Please check the review comments below.

@tolusha tolusha left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@## Comprehensive PR Review Summary

I've completed a full review including summary, standard code review, deep design analysis, and system-level impact assessment. Here are the key findings:

🔴 Critical Concerns

Security & Reliability:

  1. Silent TLS degradation - When TLS profile fetch fails on OpenShift, the operator falls back to Go defaults without any metric/health check visibility
  2. No startup timeout - Bootstrap TLS fetch uses context.Background() and could hang indefinitely if API server is slow
  3. No test coverage - 128 lines of security-critical TLS code with zero tests

Code Quality:
4. Misleading error signature - Function claims it can fail but always returns nil error, creating dead error-handling code
5. No CLI flag validation - Invalid --tls-min-version or --tls-cipher-suites values are silently mishandled

⚠️ Design Improvements Needed

  1. Non-idiomatic nested error handling - 4-level deep nesting instead of Go's early-return pattern
  2. RBAC inconsistency - Controller RBAC doesn't restrict get by resourceNames like webhook RBAC does

✅ What's Good

  • Clean separation of TLS setup into dedicated package
  • Uses canonical OpenShift library instead of reimplementing
  • Proper context cancellation chain for restart mechanism
  • Follows established pattern from cluster-machine-approver

See inline comments below for specific recommendations.


"github.com/devfile/devworkspace-operator/pkg/infrastructure"
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing timeout on bootstrap TLS fetch could cause startup hang

The BuildServerTLSOptions function uses context.Background() with no timeout when fetching the cluster TLS profile. If the API server is slow or unavailable at operator startup, the bootstrap client calls will block indefinitely, preventing the operator from starting.

Consider adding a timeout:

Suggested change
func BuildServerTLSOptions(ctx context.Context, cfg *rest.Config, scheme *k8sruntime.Scheme, tlsMinVersion, tlsCipherSuites string, log logr.Logger) (ServerTLS, error) {
var result ServerTLS
result.CLITLSOverride = tlsMinVersion != "" || tlsCipherSuites != ""
if infrastructure.IsOpenShift() {
// Use a timeout context for bootstrap operations to prevent startup hang
fetchCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()
bootstrapClient, err := client.New(cfg, client.Options{Scheme: scheme})

// OpenShift APIServer. If false, the SecurityProfileWatcher is not registered.
openShiftTLSFetched bool
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent TLS degradation with no observability

When TLS profile fetch fails on OpenShift, the operator silently falls back to Go's default TLS settings. This is a security concern:

  • No metric or health check reflects the degraded TLS state
  • The SecurityProfileWatcher is not registered, so future fixes won't auto-apply
  • The operator could serve webhooks with TLS 1.2 on a cluster that mandates TLS 1.3

Consider either:

  1. Making this a hard failure (refuse to start if TLS profile can't be fetched on OpenShift)
  2. Exposing a Prometheus metric like devworkspace_tls_profile_source{source="cluster|cli|default"} and a health check condition

This is critical infrastructure - silent degradation should be visible to cluster admins.

}
result.TLSOpts = []func(*tls.Config){tlsConfigFunc}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function signature claims it can fail but never does

BuildServerTLSOptions has signature (ServerTLS, error) but always returns nil error. All error paths log and continue with fallback behavior. This creates misleading dead code in callers at main.go:129-132 and webhook/main.go:105-108.

Either:

  • Return real errors and let callers decide whether to fail or fallback
  • Change signature to (ServerTLS) and remove the dead error-handling code

Comment thread main.go
func main() {
var metricsAddr string
var enableLeaderElection bool
var tlsMinVersion string

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No validation of CLI flag values

The new --tls-min-version and --tls-cipher-suites flags accept any string value without validation at startup. Invalid values are silently handled by ostls.NewTLSConfigFromProfile:

  • Invalid tlsMinVersion likely results in Go's default minimum version
  • Invalid cipher suite names are logged in unsupportedCiphers but the cipher is silently dropped

Consider validating at startup - check that tlsMinVersion is one of: VersionTLS10, VersionTLS11, VersionTLS12, VersionTLS13, or empty. Silent configuration errors make troubleshooting hard.

@@ -0,0 +1,128 @@
//

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test coverage for new security-critical code

This file adds 128 lines of TLS configuration logic with multiple code paths (OpenShift vs non-OpenShift, fetch success vs failure, CLI override vs cluster profile, watcher registration guards) but includes no tests.

This is security infrastructure code (🔴 Critical level). Consider adding unit tests for:

  • Non-OpenShift returns empty ServerTLS
  • OpenShift with successful fetch populates TLSOpts from cluster profile
  • CLI flags override cluster profile
  • Failed bootstrap client/fetch falls back gracefully
  • RegisterSecurityProfileWatcher guards (non-OpenShift, CLI override, failed fetch)

The function takes explicit dependencies (cfg, scheme, log) making it straightforward to test with fake clients.

// +kubebuilder:rbac:groups=oauth.openshift.io,resources=oauthclients,verbs=get;list;watch;create;update;patch;delete;deletecollection
// +kubebuilder:rbac:groups=monitoring.coreos.com,resources=servicemonitors,verbs=get;create
// +kubebuilder:rbac:groups=config.openshift.io,resources=proxies,verbs=get,resourceNames=cluster
// +kubebuilder:rbac:groups=config.openshift.io,resources=apiservers,verbs=get;list;watch

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RBAC inconsistency between controller and webhook

The kubebuilder RBAC annotation grants get;list;watch on apiservers without restricting get by resourceNames. Compare with the webhook's RBAC at pkg/webhook/cluster_roles.go:138-155, which correctly splits:

  • list/watch (unrestricted - these verbs can't use resourceNames)
  • get (restricted to resourceNames: cluster)

Consider adding resourceNames=cluster to the controller's annotation for the get verb, or splitting into two annotations to match the webhook pattern for consistency and least privilege.

Comment on lines +40 to +66
InitialTLSAdherencePolicy configv1.TLSAdherencePolicy
// CLITLSOverride is true when tls-min-version or tls-cipher-suites flags were set.
CLITLSOverride bool
// openShiftTLSFetched is true when the TLS profile was successfully retrieved from the
// OpenShift APIServer. If false, the SecurityProfileWatcher is not registered.
openShiftTLSFetched bool
}

// BuildServerTLSOptions loads TLS settings for secure serving: on OpenShift, from the cluster
// APIServer resource unless CLI flags override; with CLI flags, those always win.
// If the APIServer TLS config is not accessible on OpenShift, a warning is logged and the
// function falls back to the original behaviour (no custom TLS options applied).
func BuildServerTLSOptions(ctx context.Context, cfg *rest.Config, scheme *k8sruntime.Scheme, tlsMinVersion, tlsCipherSuites string, log logr.Logger) (ServerTLS, error) {
var result ServerTLS
result.CLITLSOverride = tlsMinVersion != "" || tlsCipherSuites != ""

if infrastructure.IsOpenShift() {
bootstrapClient, err := client.New(cfg, client.Options{Scheme: scheme})
if err != nil {
log.Error(err, "Failed to create bootstrap client for TLS profile fetch; falling back to default TLS configuration")
} else {
profileSpec, err := ostls.FetchAPIServerTLSProfile(ctx, bootstrapClient)
if err != nil {
log.Error(err, "Failed to fetch TLS profile from APIServer; falling back to default TLS configuration")
} else {
adherencePolicy, err := ostls.FetchAPIServerTLSAdherencePolicy(ctx, bootstrapClient)
if err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deeply nested error handling is not idiomatic Go

The 4-level nested if err { log } else { ... } structure makes the success path hard to follow. Idiomatic Go uses early returns for cleaner code:

Suggested change
InitialTLSAdherencePolicy configv1.TLSAdherencePolicy
// CLITLSOverride is true when tls-min-version or tls-cipher-suites flags were set.
CLITLSOverride bool
// openShiftTLSFetched is true when the TLS profile was successfully retrieved from the
// OpenShift APIServer. If false, the SecurityProfileWatcher is not registered.
openShiftTLSFetched bool
}
// BuildServerTLSOptions loads TLS settings for secure serving: on OpenShift, from the cluster
// APIServer resource unless CLI flags override; with CLI flags, those always win.
// If the APIServer TLS config is not accessible on OpenShift, a warning is logged and the
// function falls back to the original behaviour (no custom TLS options applied).
func BuildServerTLSOptions(ctx context.Context, cfg *rest.Config, scheme *k8sruntime.Scheme, tlsMinVersion, tlsCipherSuites string, log logr.Logger) (ServerTLS, error) {
var result ServerTLS
result.CLITLSOverride = tlsMinVersion != "" || tlsCipherSuites != ""
if infrastructure.IsOpenShift() {
bootstrapClient, err := client.New(cfg, client.Options{Scheme: scheme})
if err != nil {
log.Error(err, "Failed to create bootstrap client for TLS profile fetch; falling back to default TLS configuration")
} else {
profileSpec, err := ostls.FetchAPIServerTLSProfile(ctx, bootstrapClient)
if err != nil {
log.Error(err, "Failed to fetch TLS profile from APIServer; falling back to default TLS configuration")
} else {
adherencePolicy, err := ostls.FetchAPIServerTLSAdherencePolicy(ctx, bootstrapClient)
if err != nil {
if infrastructure.IsOpenShift() {
bootstrapClient, err := client.New(cfg, client.Options{Scheme: scheme})
if err != nil {
log.Error(err, "Failed to create bootstrap client for TLS profile fetch; falling back to default TLS configuration")
return result, nil
}
profileSpec, err := ostls.FetchAPIServerTLSProfile(ctx, bootstrapClient)
if err != nil {
log.Error(err, "Failed to fetch TLS profile from APIServer; falling back to default TLS configuration")
return result, nil
}
adherencePolicy, err := ostls.FetchAPIServerTLSAdherencePolicy(ctx, bootstrapClient)
if err != nil {
log.Error(err, "Failed to fetch TLS adherence policy from APIServer; falling back to default TLS configuration")
return result, nil
}
result.InitialTLSProfileSpec = profileSpec
result.InitialTLSAdherencePolicy = adherencePolicy
result.openShiftTLSFetched = true
log.Info("Fetched TLS profile from APIServer",
"minTLSVersion", profileSpec.MinTLSVersion, "ciphers", profileSpec.Ciphers)
if !result.CLITLSOverride {
tlsConfigFunc, unsupportedCiphers := ostls.NewTLSConfigFromProfile(profileSpec)
if len(unsupportedCiphers) > 0 {
log.Info("Ignoring unsupported cipher suites from cluster TLS profile",
"ciphers", unsupportedCiphers)
}
result.TLSOpts = []func(*tls.Config){tlsConfigFunc}
}
}

InitialTLSProfileSpec configv1.TLSProfileSpec
InitialTLSAdherencePolicy configv1.TLSAdherencePolicy
// CLITLSOverride is true when tls-min-version or tls-cipher-suites flags were set.
CLITLSOverride bool

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log messages could be more actionable

When falling back to default TLS, the log says "falling back to default TLS configuration" but doesn't explain what the default is. Consider clarifying:

Suggested change
CLITLSOverride bool
log.Error(err, "Failed to create bootstrap client for TLS profile fetch; using Go's default TLS settings (TLS 1.2+)")

Same recommendation for the other fallback messages at lines 50-51 and 53-54.

@tolusha

tolusha commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Additional Review Notes

Dependency Management

Large version jump in openshift/api - Updated from v0.0.0-20200205133042-34f0ec8dab87 (2020) to v0.0.0-20260317165824-54a3998d81eb (2026) - a 6-year jump. This module defines types used throughout the operator (routev1, oauthv1, securityv1, configv1, templatev1). Confirm that this update doesn't introduce breaking changes in serialization or defaulting behavior for these types.

Pinned to commit hash - openshift/library-go at v0.0.0-20260213153706-03f1709971c5 is a pseudo-version. While common in OpenShift, consider documenting why this specific commit was chosen.

Operational Considerations

Webhook availability during TLS profile change restart - When the cluster TLS profile changes, the SecurityProfileWatcher triggers a restart by canceling the manager context. During the restart window, the webhook server is unavailable. Verify that:

  1. The webhooks have an appropriate failurePolicy setting for this scenario
  2. The restart behavior is documented
  3. There's protection against restart storms if the APIServer TLS profile is repeatedly modified

CRD controller-gen version bump - The annotation changed from v0.18.0 to v0.21.0 across 8 CRD files but isn't mentioned in the PR description. If this was a side effect of regenerating manifests, note it in the description.


Review methodology: This review included pre-flight summary, standard code review, deep design analysis, and system-level impact assessment using the ok-pr-review workflow.

Signed-off-by: David Kwon <dakwon@redhat.com>
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants