Dev Tools · 2h ago
Tracing a Databricks BOOTSTRAP_TIMEOUT Through AWS Transit Gateway and Firewall
A Databricks cluster on AWS failed with BOOTSTRAP_TIMEOUT even though its EC2 nodes were healthy and port 443 was open. The failure was traced to egress traffic being lost at a Transit Gateway inspection firewall, so the nodes never reached the control plane. The fix was to route Databricks control plane traffic through VPC endpoints or to bypass the firewall for that traffic.
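An open port 443 only shows that a security group or route permits the traffic. It does not show that a session survives an inspection hop. As a rough illustration of the kind of check the summary implies, the sketch below reports the first stage at which an outbound HTTPS connection fails. The function name `probe_endpoint` and the stage labels are hypothetical and do not come from the original post; the hostname to test would be your own workspace or control plane endpoint.

```python
import socket
import ssl


def probe_endpoint(host, port=443, timeout=5.0):
    """Return the first failing stage ('dns', 'tcp', 'tls') or 'ok'.

    Hypothetical helper for illustration only, not part of any
    Databricks or AWS tooling.
    """
    # Stage 1: name resolution.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"

    family, socktype, proto, _, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Stage 2: TCP handshake. A refusal or timeout here points at
        # routing, security groups, or NACLs.
        try:
            sock.connect(sockaddr)
        except OSError:
            return "tcp"

        # Stage 3: TLS handshake. A timeout or reset here, after TCP has
        # succeeded, is consistent with a middlebox dropping the session.
        try:
            context = ssl.create_default_context()
            with context.wrap_socket(sock, server_hostname=host):
                pass
        except OSError:  # ssl.SSLError and timeouts are OSError subclasses
            return "tls"
        return "ok"
    finally:
        sock.close()
```

Run from a node in the cluster's subnet, a result of `"tls"` or `"tcp"` for the control plane hostname alongside `"ok"` for an unrelated public host would suggest looking at the path through the Transit Gateway rather than at the node itself. This is a sketch under those assumptions, not the diagnostic procedure used in the linked article.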
Meridian48 take
This deep-dive is essential for any team running Databricks in a locked-down AWS environment, but the niche audience means it won't move the needle for most readers.
Read the full reporting
[Databricks on AWS #4] The BOOTSTRAP_TIMEOUT Mystery: Tracing a Databricks Cluster from Data Plane to Control Plane (Transit Gateway + Firewall) →
DEV Community
databricks · aws-networking