Retrieve TLS config from apiserver for OpenShift#1643
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dkwon17 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi! I'm che-ai-assistant — I help with your pull requests. Available commands:
|
|
/che-ai-assistant ok-pr-review Review is complete. Please check the review comments below. |
tolusha
left a comment
There was a problem hiding this comment.
@## Comprehensive PR Review Summary
I've completed a full review including summary, standard code review, deep design analysis, and system-level impact assessment. Here are the key findings:
🔴 Critical Concerns
Security & Reliability:
- Silent TLS degradation - When TLS profile fetch fails on OpenShift, the operator falls back to Go defaults without any metric/health check visibility
- No startup timeout - Bootstrap TLS fetch uses
context.Background()and could hang indefinitely if API server is slow - No test coverage - 128 lines of security-critical TLS code with zero tests
Code Quality:
4. Misleading error signature - Function claims it can fail but always returns nil error, creating dead error-handling code
5. No CLI flag validation - Invalid --tls-min-version or --tls-cipher-suites values are silently mishandled
⚠️ Design Improvements Needed
- Non-idiomatic nested error handling - 4-level deep nesting instead of Go's early-return pattern
- RBAC inconsistency - Controller RBAC doesn't restrict
getby resourceNames like webhook RBAC does
✅ What's Good
- Clean separation of TLS setup into dedicated package
- Uses canonical OpenShift library instead of reimplementing
- Proper context cancellation chain for restart mechanism
- Follows established pattern from cluster-machine-approver
See inline comments below for specific recommendations.
|
|
||
| "github.com/devfile/devworkspace-operator/pkg/infrastructure" | ||
| ) | ||
|
|
There was a problem hiding this comment.
Missing timeout on bootstrap TLS fetch could cause startup hang
The BuildServerTLSOptions function uses context.Background() with no timeout when fetching the cluster TLS profile. If the API server is slow or unavailable at operator startup, the bootstrap client calls will block indefinitely, preventing the operator from starting.
Consider adding a timeout:
| func BuildServerTLSOptions(ctx context.Context, cfg *rest.Config, scheme *k8sruntime.Scheme, tlsMinVersion, tlsCipherSuites string, log logr.Logger) (ServerTLS, error) { | |
| var result ServerTLS | |
| result.CLITLSOverride = tlsMinVersion != "" || tlsCipherSuites != "" | |
| if infrastructure.IsOpenShift() { | |
| // Use a timeout context for bootstrap operations to prevent startup hang | |
| fetchCtx, cancel := context.WithTimeout(ctx, 30*time.Second) | |
| defer cancel() | |
| bootstrapClient, err := client.New(cfg, client.Options{Scheme: scheme}) |
| // OpenShift APIServer. If false, the SecurityProfileWatcher is not registered. | ||
| openShiftTLSFetched bool | ||
| } | ||
|
|
There was a problem hiding this comment.
Silent TLS degradation with no observability
When TLS profile fetch fails on OpenShift, the operator silently falls back to Go's default TLS settings. This is a security concern:
- No metric or health check reflects the degraded TLS state
- The
SecurityProfileWatcheris not registered, so future fixes won't auto-apply - The operator could serve webhooks with TLS 1.2 on a cluster that mandates TLS 1.3
Consider either:
- Making this a hard failure (refuse to start if TLS profile can't be fetched on OpenShift)
- Exposing a Prometheus metric like
devworkspace_tls_profile_source{source="cluster|cli|default"}and a health check condition
This is critical infrastructure - silent degradation should be visible to cluster admins.
| } | ||
| result.TLSOpts = []func(*tls.Config){tlsConfigFunc} | ||
| } | ||
| } |
There was a problem hiding this comment.
Function signature claims it can fail but never does
BuildServerTLSOptions has signature (ServerTLS, error) but always returns nil error. All error paths log and continue with fallback behavior. This creates misleading dead code in callers at main.go:129-132 and webhook/main.go:105-108.
Either:
- Return real errors and let callers decide whether to fail or fallback
- Change signature to
(ServerTLS)and remove the dead error-handling code
| func main() { | ||
| var metricsAddr string | ||
| var enableLeaderElection bool | ||
| var tlsMinVersion string |
There was a problem hiding this comment.
No validation of CLI flag values
The new --tls-min-version and --tls-cipher-suites flags accept any string value without validation at startup. Invalid values are silently handled by ostls.NewTLSConfigFromProfile:
- Invalid
tlsMinVersionlikely results in Go's default minimum version - Invalid cipher suite names are logged in
unsupportedCiphersbut the cipher is silently dropped
Consider validating at startup - check that tlsMinVersion is one of: VersionTLS10, VersionTLS11, VersionTLS12, VersionTLS13, or empty. Silent configuration errors make troubleshooting hard.
| @@ -0,0 +1,128 @@ | |||
| // | |||
There was a problem hiding this comment.
No test coverage for new security-critical code
This file adds 128 lines of TLS configuration logic with multiple code paths (OpenShift vs non-OpenShift, fetch success vs failure, CLI override vs cluster profile, watcher registration guards) but includes no tests.
This is security infrastructure code (🔴 Critical level). Consider adding unit tests for:
- Non-OpenShift returns empty ServerTLS
- OpenShift with successful fetch populates TLSOpts from cluster profile
- CLI flags override cluster profile
- Failed bootstrap client/fetch falls back gracefully
- RegisterSecurityProfileWatcher guards (non-OpenShift, CLI override, failed fetch)
The function takes explicit dependencies (cfg, scheme, log) making it straightforward to test with fake clients.
| // +kubebuilder:rbac:groups=oauth.openshift.io,resources=oauthclients,verbs=get;list;watch;create;update;patch;delete;deletecollection | ||
| // +kubebuilder:rbac:groups=monitoring.coreos.com,resources=servicemonitors,verbs=get;create | ||
| // +kubebuilder:rbac:groups=config.openshift.io,resources=proxies,verbs=get,resourceNames=cluster | ||
| // +kubebuilder:rbac:groups=config.openshift.io,resources=apiservers,verbs=get;list;watch |
There was a problem hiding this comment.
RBAC inconsistency between controller and webhook
The kubebuilder RBAC annotation grants get;list;watch on apiservers without restricting get by resourceNames. Compare with the webhook's RBAC at pkg/webhook/cluster_roles.go:138-155, which correctly splits:
list/watch(unrestricted - these verbs can't use resourceNames)get(restricted toresourceNames: cluster)
Consider adding resourceNames=cluster to the controller's annotation for the get verb, or splitting into two annotations to match the webhook pattern for consistency and least privilege.
| InitialTLSAdherencePolicy configv1.TLSAdherencePolicy | ||
| // CLITLSOverride is true when tls-min-version or tls-cipher-suites flags were set. | ||
| CLITLSOverride bool | ||
| // openShiftTLSFetched is true when the TLS profile was successfully retrieved from the | ||
| // OpenShift APIServer. If false, the SecurityProfileWatcher is not registered. | ||
| openShiftTLSFetched bool | ||
| } | ||
|
|
||
| // BuildServerTLSOptions loads TLS settings for secure serving: on OpenShift, from the cluster | ||
| // APIServer resource unless CLI flags override; with CLI flags, those always win. | ||
| // If the APIServer TLS config is not accessible on OpenShift, a warning is logged and the | ||
| // function falls back to the original behaviour (no custom TLS options applied). | ||
| func BuildServerTLSOptions(ctx context.Context, cfg *rest.Config, scheme *k8sruntime.Scheme, tlsMinVersion, tlsCipherSuites string, log logr.Logger) (ServerTLS, error) { | ||
| var result ServerTLS | ||
| result.CLITLSOverride = tlsMinVersion != "" || tlsCipherSuites != "" | ||
|
|
||
| if infrastructure.IsOpenShift() { | ||
| bootstrapClient, err := client.New(cfg, client.Options{Scheme: scheme}) | ||
| if err != nil { | ||
| log.Error(err, "Failed to create bootstrap client for TLS profile fetch; falling back to default TLS configuration") | ||
| } else { | ||
| profileSpec, err := ostls.FetchAPIServerTLSProfile(ctx, bootstrapClient) | ||
| if err != nil { | ||
| log.Error(err, "Failed to fetch TLS profile from APIServer; falling back to default TLS configuration") | ||
| } else { | ||
| adherencePolicy, err := ostls.FetchAPIServerTLSAdherencePolicy(ctx, bootstrapClient) | ||
| if err != nil { |
There was a problem hiding this comment.
Deeply nested error handling is not idiomatic Go
The 4-level nested if err { log } else { ... } structure makes the success path hard to follow. Idiomatic Go uses early returns for cleaner code:
| InitialTLSAdherencePolicy configv1.TLSAdherencePolicy | |
| // CLITLSOverride is true when tls-min-version or tls-cipher-suites flags were set. | |
| CLITLSOverride bool | |
| // openShiftTLSFetched is true when the TLS profile was successfully retrieved from the | |
| // OpenShift APIServer. If false, the SecurityProfileWatcher is not registered. | |
| openShiftTLSFetched bool | |
| } | |
| // BuildServerTLSOptions loads TLS settings for secure serving: on OpenShift, from the cluster | |
| // APIServer resource unless CLI flags override; with CLI flags, those always win. | |
| // If the APIServer TLS config is not accessible on OpenShift, a warning is logged and the | |
| // function falls back to the original behaviour (no custom TLS options applied). | |
| func BuildServerTLSOptions(ctx context.Context, cfg *rest.Config, scheme *k8sruntime.Scheme, tlsMinVersion, tlsCipherSuites string, log logr.Logger) (ServerTLS, error) { | |
| var result ServerTLS | |
| result.CLITLSOverride = tlsMinVersion != "" || tlsCipherSuites != "" | |
| if infrastructure.IsOpenShift() { | |
| bootstrapClient, err := client.New(cfg, client.Options{Scheme: scheme}) | |
| if err != nil { | |
| log.Error(err, "Failed to create bootstrap client for TLS profile fetch; falling back to default TLS configuration") | |
| } else { | |
| profileSpec, err := ostls.FetchAPIServerTLSProfile(ctx, bootstrapClient) | |
| if err != nil { | |
| log.Error(err, "Failed to fetch TLS profile from APIServer; falling back to default TLS configuration") | |
| } else { | |
| adherencePolicy, err := ostls.FetchAPIServerTLSAdherencePolicy(ctx, bootstrapClient) | |
| if err != nil { | |
| if infrastructure.IsOpenShift() { | |
| bootstrapClient, err := client.New(cfg, client.Options{Scheme: scheme}) | |
| if err != nil { | |
| log.Error(err, "Failed to create bootstrap client for TLS profile fetch; falling back to default TLS configuration") | |
| return result, nil | |
| } | |
| profileSpec, err := ostls.FetchAPIServerTLSProfile(ctx, bootstrapClient) | |
| if err != nil { | |
| log.Error(err, "Failed to fetch TLS profile from APIServer; falling back to default TLS configuration") | |
| return result, nil | |
| } | |
| adherencePolicy, err := ostls.FetchAPIServerTLSAdherencePolicy(ctx, bootstrapClient) | |
| if err != nil { | |
| log.Error(err, "Failed to fetch TLS adherence policy from APIServer; falling back to default TLS configuration") | |
| return result, nil | |
| } | |
| result.InitialTLSProfileSpec = profileSpec | |
| result.InitialTLSAdherencePolicy = adherencePolicy | |
| result.openShiftTLSFetched = true | |
| log.Info("Fetched TLS profile from APIServer", | |
| "minTLSVersion", profileSpec.MinTLSVersion, "ciphers", profileSpec.Ciphers) | |
| if !result.CLITLSOverride { | |
| tlsConfigFunc, unsupportedCiphers := ostls.NewTLSConfigFromProfile(profileSpec) | |
| if len(unsupportedCiphers) > 0 { | |
| log.Info("Ignoring unsupported cipher suites from cluster TLS profile", | |
| "ciphers", unsupportedCiphers) | |
| } | |
| result.TLSOpts = []func(*tls.Config){tlsConfigFunc} | |
| } | |
| } |
| InitialTLSProfileSpec configv1.TLSProfileSpec | ||
| InitialTLSAdherencePolicy configv1.TLSAdherencePolicy | ||
| // CLITLSOverride is true when tls-min-version or tls-cipher-suites flags were set. | ||
| CLITLSOverride bool |
There was a problem hiding this comment.
Log messages could be more actionable
When falling back to default TLS, the log says "falling back to default TLS configuration" but doesn't explain what the default is. Consider clarifying:
| CLITLSOverride bool | |
| log.Error(err, "Failed to create bootstrap client for TLS profile fetch; using Go's default TLS settings (TLS 1.2+)") |
Same recommendation for the other fallback messages at lines 50-51 and 53-54.
Additional Review NotesDependency ManagementLarge version jump in openshift/api - Updated from Pinned to commit hash - Operational ConsiderationsWebhook availability during TLS profile change restart - When the cluster TLS profile changes, the
CRD controller-gen version bump - The annotation changed from Review methodology: This review included pre-flight summary, standard code review, deep design analysis, and system-level impact assessment using the ok-pr-review workflow. |
Signed-off-by: David Kwon <dakwon@redhat.com> Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Assisted-by: Claude Sonnet 4.6 noreply@anthropic.com
What does this PR do?
Retrieve TLS config from api server for Openshift (no changes for Kubernetes).
Example PR: https://github.com/openshift/cluster-machine-approver/pull/286/changes#diff-2873f79a86c0d8b3335cd7731b0ecf7dd4301eb19a82ef7a1cba7589b5252261
What issues does this PR fix or reference?
Is it tested? How?
When testing, I installed the operator in the
openshift-operatorsnamespace.The expected output is:
The pod IPs (first col) will likely be different on your cluster, but the important thing to verify is that we see
PROBE_PORTcolumn isOKfor the three DWO pods for ports8443and9443.PR Checklist
/test v8-devworkspace-operator-e2e, v8-che-happy-pathto trigger)v8-devworkspace-operator-e2e: DevWorkspace e2e testv8-che-happy-path: Happy path for verification integration with Che