Commercialization

Starliner's Clock Was Really Really Wrong

By Keith Cowing
NASA Watch
December 22, 2019

Keith’s note: 11 hours. Hmm. Apparently Starliner thought it just took off from Nepal – or the Philippines – depending on how you count an 11 hour time zone difference. That’s a little bit more than having a clock that’s a few minutes off. Just sayin’.

NASA Watch founder, Explorers Club Fellow, ex-NASA, Away Teams, Journalist, Space & Astrobiology, Lapsed climber.

74 responses to “Starliner's Clock Was Really Really Wrong”

  1. fcrary says:

    It sounds like that wasn’t clock drift. The spacecraft got an input time from the launch vehicle and the input was corrupt. I’m not sure how it ended up as 11 hours. If the clock was ticking off seconds, a single bit flip wouldn’t be 11 hours. But maybe it wasn’t seconds. Corrupt input data can easily be way off.

    • Terry Stetler says:

      I believe Atlas V powered up at T-11 hours, so perhaps that’s what Starliner read.

      SpaceFlightNow posted this quote back in November,

      https://spaceflightnow.com/

      “Electrically, one of the unique things about this mission is that the launch vehicle and spacecraft are going to be talking to each other,” Weiss said in a recent interview. “We normally don’t have that. They will be sharing data throughout (the) flight.”

      So Atlas V hadn’t talked to a spacecraft before, and its first attempt at it was borked?

      • richard_schumacher says:

        Every new thing is wrong the first time it is used. – Morris’s Law of Product Development

      • Bill Kristler says:

        Wow, good catch. I’m betting we have an interface definition issue. The Atlas team says, here is mission (power-on) time in this byte, and the Starliner team says, great, mission (launch) time, got it, perfect. The sad thing is I can easily see both sides fully testing this out 100% (even with hardware in the loop) with no issues; the only real problem is that one team assumes this is from the start of the “mission” (i.e. initial power-on / countdown) and the other team thinks, yep, this is from T-0 / start of the mission. I would be going over that interface specification with a fine-tooth comb (and any others) and doing an update with detailed definitions, then reviewing everything with both teams. This is almost a metric / imperial error.
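        A minimal sketch of how that kind of mismatch plays out, using entirely hypothetical field and function names (nothing below is the actual Atlas/Starliner interface): both sides compile against the same definition, so nothing flags the disagreement about what the value means.

```c
/* Hypothetical interface word, NOT the real Atlas/Starliner ICD. Both sides
 * compile against the same header, so everything "type checks", but they
 * disagree about what the value means. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t mission_time_s;   /* ICD says "mission time, seconds", measured from which epoch? */
} lv_status_word_t;

/* Launch vehicle side: fills the field with time since vehicle power-on. */
static lv_status_word_t lv_build_status(uint32_t seconds_since_power_on)
{
    lv_status_word_t w = { .mission_time_s = seconds_since_power_on };
    return w;
}

/* Spacecraft side: assumes the same field is time since liftoff (MET). */
static uint32_t sc_read_met(const lv_status_word_t *w)
{
    return w->mission_time_s;  /* ~11 hours off if power-on happened at T-11h */
}

int main(void)
{
    /* Vehicle powered on roughly 11 hours before liftoff, now 5 minutes after liftoff. */
    lv_status_word_t w = lv_build_status(11u * 3600u + 300u);
    printf("Spacecraft thinks MET = %u s (expected ~300 s)\n", (unsigned)sc_read_met(&w));
    return 0;
}
```

        A unit test that only checks “the field is copied through” passes on both sides; only a case where power-on and liftoff are far apart exposes the offset.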

        • snopercod says:

          On the Shuttle program we had a number of “Interface Control Documents” and a staff of people who were in charge of those. Examples would be the booster to orbiter interface or the ET to orbiter interface.

          • fcrary says:

            I suspect you had ICDs at much lower levels. I wouldn’t be surprised if there was one for the interface between the toilet and the water supply system. Those ICDs are critical, but they never end up being completely unambiguous. The people working on the two different systems may read it slightly differently and things may not quite work when the two are connected. That’s especially true when the people involved are in different places or don’t communicate frequently.

      • MAGA_Ken says:

        I guess they are lucky Starliner didn’t fire off its orbital insertion rockets immediately.

        It still boggles my mind that this error wasn’t found during the thousands of simulation tests of the program that surely had been run.

        • fcrary says:

          I hope firing rockets before staging is impossible. It’s fairly easy to put in safeties and hardware or software enable switches. But, yes, this does imply the simulations and tests were not flight-like.

          • Michael Spencer says:

            Wouldn’t a function like setting the clock be unique to each flight or simulation?

          • fcrary says:

            Yes and no. For all the talk about testing like you fly, that’s never completely possible. From flight to flight, the clock would be set to a different number. But it would be set at a certain point in the launch sequence. And the likely range of numbers should be predictable. There’s an art to writing good test plans. I’d personally slip in tests with extreme values. But that’s because I’d worry about the clock ticking up into an overflow error or something. Other people might say that’s overkill, and with a finite amount of time for testing, other things are more important.
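            For what it’s worth, the sort of boundary-value cases described above might look like the sketch below; met_init() and the plausibility limit are hypothetical, not anything from Starliner’s actual software.

```c
/* A sketch of boundary-value cases for clock initialization. met_init() and the
 * plausibility limit are hypothetical, not anything from Starliner's software. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MET_MAX_PLAUSIBLE_S (30u * 60u)  /* assume handoff occurs within 30 minutes of launch */

/* Hypothetical: accept an MET seed only if it is physically plausible. */
static bool met_init(uint32_t seed_s, uint32_t *met_out)
{
    if (seed_s > MET_MAX_PLAUSIBLE_S)
        return false;                    /* reject; fall back to an independent source */
    *met_out = seed_s;
    return true;
}

int main(void)
{
    uint32_t met = 0;
    assert(met_init(120u, &met) && met == 120u);  /* nominal: 2 minutes after liftoff */
    assert(!met_init(11u * 3600u, &met));         /* extreme: ~11 h, like a power-on epoch */
    assert(!met_init(UINT32_MAX, &met));          /* corner value near overflow */
    return 0;
}
```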

          • hikingmike says:

            So there was a save there by the design. It somehow checks for separation first (among other things).

      • DetailCurious says:

        Finally, something that makes sense. The noise to signal ratio on this has been astoundingly high.

        1) Note that while many commentators refer to “clock” and “time” as absolute (like a wristwatch), the actual language is “Mission ELAPSED Time”. That’s a relative time – like the countdown clock. Many things are planned in relative time – you don’t know when there will be a delay, and updating hundreds – or thousands – of absolute event times is unwieldy – and error prone. And some hardware events must be in relative time – open this valve, wait 2 seconds, turn on the igniter. Other computations require absolute time – where is this satellite (or star).

        2) Operating systems tend to have two timebases – ticks since boot for relative events, and ticks since an epoch for absolute time. (A “tick” might be 1/60 or 1/50 sec, 100, 10, or one msec, or even 100 nsec). This is so when some external source (you, GPS, NTP,…) updates the absolute time, events queued with a relative time aren’t affected.

        3) The exact language used in the press conference was “Starliner reached into the Atlas and looked in the wrong place”.

        4) There are probably multiple timebases: Atlas has one from when it powers up for relative time; Starliner’s computers probably power up later – resulting in a different relative timebase. Both may sync to (an) absolute time (e.g. from GPS). From that, they may need local time (welcome to time zones, not to mention leap seconds). But in any case they have to convert to the mission elapsed time. Which one would expect to be relative to T==0 (launch). But one might imagine Atlas and Starliner having different “missions” – at least for planning/simulation. (Let’s not talk about relativistic effects on time…they’re not needed to explain this.)

        5) It is not unthinkable that Starliner took the Atlas “Time since boot” (event relative timebase) to set its idea of MET, when it intended to take “Time since launch” – the usual definition of MET. In a simulation environment, these could easily be the same… Thus, a software bug that looked in the wrong place would get the “right” value in simulation, but not in real life.

        6) Catching this in simulation would require a checker that verifies that MET is the same in all systems. This is probably what was omitted. But would not necessarily be sufficient.

        7) Absent a checker, it is very likely that this would go undetected. Simulations often compress time – they omit/skip time when nothing of interest (to the simulation) happens. Even in systems integration testing, MET is the sort of thing that would be manually patched into the simulations. Even “do they agree on MET” wouldn’t necessarily catch this fault. For simulation purposes, the relative event timebase could be set to zero, or an arbitrary value. Perhaps even MET. And it would still “work” – in simulation.

        8) So how would you catch this? Perhaps a full-up simulation – if you had the compute resources to simulate an 8-day mission. (Note that simulators may run 100 – 1,000 times slower than real time, depending on fidelity.) Since this isn’t practical, you count on lots of assertion checkers and your test designers. Sometimes they miss. This would appear to be one of those times.

        After the failure, in simulation one might add an artificial check that MET is different from the event relative timebase. Adding one simulation artifact to cover for another isn’t uncommon.

        The point is that it’s not “one clock”. Or “two clocks”. Or even 3. Or “a few minutes off”. Under the covers, it’s really complicated. The people involved almost certainly aren’t dumb.

        This is an embarrassing mistake. But not particularly surprising if you think about it. After the fact.

        Engineering is hard. Armchair Monday-morning commentary based on simplified explanations is a lot easier.

        Oh – I’m an engineer. I’m not in Boeing or NASA – or even aerospace. I have built – and simulated – complex systems. I’m not an apologist for any of the parties involved.

        But I thought some context might be helpful.
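        To make points 2), 5) and 7) concrete, here is a rough sketch in C; the tick rate, structure layout and function names are all invented for illustration, not taken from either flight system.

```c
/* Two relative timebases (point 2), a wrong-source bug (point 5), and the kind
 * of cross-check from point 6). Tick rate, layout and names are all invented. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TICKS_PER_SEC 100u            /* assume a 10 ms tick */

typedef struct {
    uint64_t ticks_since_boot;        /* starts at vehicle power-on */
    uint64_t ticks_since_liftoff;     /* starts at T-0 */
} lv_timebases_t;

/* Buggy version: seeds MET from the boot-relative timebase. */
static uint64_t met_from_lv_buggy(const lv_timebases_t *lv)
{
    return lv->ticks_since_boot / TICKS_PER_SEC;
}

/* Intended version: seeds MET from the liftoff-relative timebase. */
static uint64_t met_from_lv(const lv_timebases_t *lv)
{
    return lv->ticks_since_liftoff / TICKS_PER_SEC;
}

/* Point-6 style checker: in an end-to-end sim, require the two notions of MET
 * to agree within a small tolerance. */
static void check_met_agreement(uint64_t met_a_s, uint64_t met_b_s)
{
    uint64_t diff = met_a_s > met_b_s ? met_a_s - met_b_s : met_b_s - met_a_s;
    if (diff > 2u) {
        fprintf(stderr, "MET mismatch: %llu s vs %llu s\n",
                (unsigned long long)met_a_s, (unsigned long long)met_b_s);
        exit(1);
    }
}

int main(void)
{
    /* If the sim powers everything up at T-0, boot time equals liftoff time and
     * the buggy path looks identical to the intended one (point 7). */
    lv_timebases_t sim = { .ticks_since_boot = 300u * TICKS_PER_SEC,
                           .ticks_since_liftoff = 300u * TICKS_PER_SEC };
    /* In flight, power-on was roughly 11 hours before liftoff. */
    lv_timebases_t flight = { .ticks_since_boot = (uint64_t)(11u * 3600u + 300u) * TICKS_PER_SEC,
                              .ticks_since_liftoff = 300u * TICKS_PER_SEC };

    check_met_agreement(met_from_lv(&sim), met_from_lv_buggy(&sim));       /* passes */
    check_met_agreement(met_from_lv(&flight), met_from_lv_buggy(&flight)); /* trips the checker */
    return 0;
}
```

        Run against the “sim” values the two paths agree, which is why a simulation that powers everything up at T-0 can pass cleanly while the flight case trips the checker.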

        • Dewey Vanderhoff says:

          Terry, it appears you cut through the word fog and found the 11-hour discrepancy gremlin. Makes sense that the capsule began the mission at the same time the booster was brought to life. Good call.

        • Steve Pemberton says:

          There is no reason the simulators would run hundreds of times slower; they can run in real time at full fidelity. I am not familiar with Boeing’s avionics lab but I am familiar with the Shuttle’s SAIL lab where the flight software was tested, and it could run simulated missions in real time, even complete missions. In fact they often ran simulations for days just so they could get a data snapshot at a certain point in the mission so that they could later do multiple tests from that same starting point.

          The Shuttle lab used the exact same computers as the Shuttle. These computers were in turn connected to simulation computers which provided data exactly as it would be provided by actual sensors. When I say connected I mean if in the real Shuttle a wire ran down the cargo bay to an engine temperature sensor, then in the lab they ran a wire the same distance but instead of the wire terminating at an engine sensor it terminated at an input from an engine simulation computer, which provided the exact same voltages that an engine sensor would. The reason they ran the wires the actual distance was to accurately model any voltage drop, so that the Shuttle computers were “seeing” the data exactly like they would on a real mission.

          Starliner’s lab computer would have been connected to another computer which would be simulating the Atlas computer. That computer should have been providing data exactly like Atlas, which means the Starliner computer would have to correctly pick out the right data for MET. It would have been a mistake in the simulation design if they decided that since they only need this one piece of data, that’s all the computer simulating the Atlas needs to provide. Again, that computer should have been providing data exactly like an Atlas does, which would include different timestamps for boot-up and liftoff, and the Starliner simulation computer would have to pick the right one. Shortcutting that in the simulation design would have eliminated the possibility of catching the programming error during simulation.

          • fcrary says:

            I would definitely not trust tests and simulations done at anything less than real time. One of the goals of these tests is to verify timing and synchronization and CPU usage. This is one of those “test as you fly” things. I don’t actually care how well the system works when it’s clocked down by a factor of ten. That’s also another reason why the cable lengths have to be flight-like. Light travels about one foot per nanosecond (well, 11.8 inches; I know someone who named that unit the “phoot”). Even with the relatively slow speed of flight microprocessors, it’s easy for a few phoots to cause or solve synchronization problems.

            But it is common for these tests to be shaved down. I don’t like it, but I’ve had and lost that argument before. People really want to do the whole thing in under eight hours. Since they insist on having someone present to babysit the whole run, that avoids having shifts or paying overtime. So idle time between events gets trimmed down. I don’t like that because I’ve seen things like bad garbage collection in flight software and an instrument crashing after operating perfectly well for a random number of days. I also don’t understand the need to have someone around all the time. With robotic, planetary missions, it’s going to be running on its own, often without contact with Earth for days at a stretch. If it can do that in space, why can’t the electronics run on its own in a lab in Colorado? But that is a common practice.

          • Ball Peen Hammer ✓ᵛᵉʳᶦᶠᶦᵉᵈ says:

            “Again that computer should have been providing data exactly like an Atlas does”

            This assumes the people who designed the computer to simulate the Atlas correctly interpreted address maps and data formats.
            Simulating the correct clock on the wrong register, or formatting the time in the right register incorrectly, in the computer that stood in for the Atlas could easily account for the software passing testing under simulation but failing in operation when communicating with the actual Atlas.

          • Steve Pemberton says:

            English is rather vague at times, isn’t it? “Should have been” can be taken two ways. One is referring to what is assumed to be true. The other use of the phrase is what ought to be true or is supposed to be true, but with an implication of doubt that it actually was. That’s the usage of the phrase that I meant.

            In other words, they sure ought to have exactly duplicated the Atlas computer, either by using an actual Atlas computer or by providing data exactly like an Atlas does. My implication was exactly what you are saying, if they didn’t exactly replicate the Atlas computer then it can lead to the situation that the Starliner software can pass simulation but fail in the actual launch.

            One reason they might not have used an actual Atlas computer is because then that computer also has to be fed all of the data that it expects, meaning even more simulated sensors, communications, etc. So as a shortcut they might have decided to program a regular off-the-shelf computer to put out data simulating an Atlas computer. Nice in theory, but risky in practice because, as you said, they’d better get it exactly right.

            I mentioned the Shuttle SAIL lab; the simulator there had an actual orbiter designation, OV-95, and that wasn’t just to be whimsical, it was treated for inventory purposes as an actual orbiter. So if for example a redesigned electronics box was put into service, an order would be placed for orbiters OV-103, OV-104, OV-105 and OV-95.
            This ensured that the avionics test lab was getting the same black boxes as the actual orbiters.

            Boeing from what we hear leans more towards software modeling of things, so it’s not inconceivable that they convinced themselves that they can just as accurately model an Atlas computer in software. Pure conjecture on my part, but I certainly won’t be surprised if it is found that the software error in Starliner was missed in simulation because of something like this.

        • Jack says:

          “8) So how would you catch this?”
          Integration testing where all systems and subsystems are tested together to make sure that each is exchanging the expected data with each other.
          I have done this many times myself on large distributed systems.

          One could easily imagine there was not adequate integration testing between the Starliner and the Atlas 5.
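          As a sketch of what such a cross-system check could look like (the function names and the stand-in implementations below are hypothetical):

```c
/* A cross-system check of the shared quantity, driven with a flight-like
 * timeline. Function names and stand-in bodies are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the two sides' real code. A correct pair reports and ingests
 * MET as seconds since liftoff, regardless of when the vehicle powered on. */
static uint32_t lv_report_met_s(uint32_t s_since_power_on, uint32_t s_since_liftoff)
{
    (void)s_since_power_on;   /* a buggy sender would return this value instead */
    return s_since_liftoff;
}

static uint32_t sc_ingest_met_s(uint32_t reported_met_s)
{
    return reported_met_s;
}

static void integration_case(uint32_t s_since_power_on, uint32_t s_since_liftoff)
{
    uint32_t met = sc_ingest_met_s(lv_report_met_s(s_since_power_on, s_since_liftoff));
    assert(met == s_since_liftoff);   /* both sides must mean "seconds since liftoff" */
}

int main(void)
{
    integration_case(300u, 300u);               /* convenient case: power-on at T-0 */
    integration_case(11u * 3600u + 300u, 300u); /* flight-like case: the one that would
                                                   expose a power-on/liftoff mix-up */
    return 0;
}
```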

          • fcrary says:

            Does anyone know, physically, where the Atlas 5 and CST-100 Starliner are produced? I know Boeing is a ULA partner, but that doesn’t mean their facilities are co-located or the same people are involved. Actually, I think it’s unlikely. That means thorough, full-up integration tests wouldn’t be easy. It’s hard enough when you have everything in one high bay. Were the Atlas and Starliner ever together before they got to the Cape?

          • Steve Pemberton says:

            Boeing builds Starliner in Discovery’s old hangar OPF-3 near the VAB. Atlas V is built in Decatur, Alabama not far from Huntsville (to your point, historically Atlas V is a Lockheed Martin product). This particular Atlas started vertical stacking at SLC-41 on November 4th, Starliner was placed on top on November 21st. Certainly the first time these two particular vehicles were joined. As to whether a test capsule was ever fitted to an Atlas in Alabama I don’t know, but I expect not because they would have had to build some expensive stands to facilitate that even if it was done horizontally. Hard to imagine they would do all that just for a one time fitment and integration test when they knew they would be integration testing for a month prior to launch at KSC.

      • rktsci says:

        Most uncrewed launch vehicles don’t really talk to the payload. One of the requirements for man-rating Atlas (or any other launch vehicle) was to add a set of computers to relay status information on the vehicle’s performance and systems state to the human element on top. This allows for the crew and computers to monitor things and fire the abort systems when things go pear shaped.

        • fcrary says:

          Yes, but CST-100 was designed to be launch vehicle neutral. One of the selling points was that it could fly on several different rockets. Making it dependent on information from the launch vehicle adds to the number of interfaces they have to get right. I have to wonder about that decision.

          • hikingmike says:

            Yeah, you would have to imagine you’d want the spacecraft to be largely independent, not relying on external factors to know its environment. It reduces the chance of error.

            The Starliner should keep the launch vehicle startup time as a saved critical parameter somewhere. Maybe the launch vehicle can provide an ignition time as well, but the spacecraft could identify liftoff on its own, with that being just one of several points of evidence.
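            A rough sketch of that kind of multi-source liftoff check, with made-up evidence sources and a simple two-of-three vote (nothing here reflects Starliner’s actual design):

```c
/* Hypothetical multi-source liftoff detection: the launch vehicle's report is
 * just one vote among independent indications. Not Starliner's actual design. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool lv_reports_liftoff;     /* message from the launch vehicle */
    bool sensed_axial_accel;     /* accelerometers see sustained axial acceleration */
    bool umbilical_disconnected; /* hard-wired discrete from the pad umbilical */
} liftoff_evidence_t;

/* Declare liftoff only when at least two independent sources agree. */
static bool liftoff_detected(const liftoff_evidence_t *e)
{
    int votes = (e->lv_reports_liftoff ? 1 : 0)
              + (e->sensed_axial_accel ? 1 : 0)
              + (e->umbilical_disconnected ? 1 : 0);
    return votes >= 2;
}

int main(void)
{
    liftoff_evidence_t lv_message_only = { true, false, false };
    liftoff_evidence_t real_liftoff    = { true, true,  true  };
    printf("LV message alone: %d, all sources: %d\n",
           liftoff_detected(&lv_message_only), liftoff_detected(&real_liftoff));
    return 0;
}
```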

          • fcrary says:

            I’m not sure I’d do it exactly that way, but I’d definitely want to keep the communications between the launch vehicle and payload to a minimum. And make sure that minimum is common to all the launch vehicles they plan to fly on. A complicated interface which is different for Atlas, Vulcan and Falcon? That just wouldn’t give me a warm, fuzzy feeling.

            On the other hand, I’ve never worked with a spacecraft which had to do a major maneuver just after having the flight software initialized, and where the details of the maneuver depended on exactly where the launch vehicle put it at separation.

          • Ball Peen Hammer ✓ᵛᵉʳᶦᶠᶦᵉᵈ says:

            “Yes, but CST-100 was designed to be launch vehicle neutral.”

            Anything launched outside of a fairing can’t be fully launcher agnostic. CST-100 schedule slipped a full 9 months when acoustic problems were found in the Atlas/Starliner integration study that had to be solved by the addition of the aeroskirt. That integration study, and any possible problems/solutions will need to be done again for integration with Vulcan/Centaur or any other launch vehicle.

          • fcrary says:

            Did I say trying to make CST-100 fly on just about any launch vehicle was a good idea? I’m not sure if that’s more trouble than it’s worth, or if it opens the door for a whole bunch of ways to shoot yourself in the foot. But that’s what Boeing decided to do.

  2. Bad Horse says:

    The issue (just like we predicted): the spacecraft took mission elapsed time from the moment it was powered up and put into a standby state.
    It was not the launch vehicle. This was a fundamental software error that w(s)ould have been discovered during testing if they had bothered to do anything off-nominal. No IV&V, no real safety review, no independent testing worth a hoot! I would send the flight and ground software to an independent organization with all the requirements/IDD/IRS/IPICL, etc., and have them test it before people step inside to fly.

    • 6sbportsidevital says:

      I agree…an error this fundamental strongly implies the SWE process Boeing used on Starliner has flaws. In other words…who knows what else is wrong?

  3. Al Vacado says:

    Maybe the accurate clock is on the upper stage?

  4. Michael Spencer says:

    11 hours or 11 seconds – what’s the difference? Aside from providing possible source evidence, both bork the flight.

    • Jeff2Space says:

      If it were off by only 11 seconds, Starliner would likely have made it into an orbit which was close enough to the originally intended orbit that it could have continued its mission to ISS. The problem might not have even become apparent to the public in that situation.

      • fcrary says:

        I’m pretty sure you’re right. They said an astronaut on board would have been able to fix the problem. 11 seconds probably isn’t enough time for an astronaut to notice, diagnose and act. Also a lot of the time criticality is tied to the fraction of an orbital period involved. 11 seconds out of 90 minutes isn’t a big deal. It might mean approach and docking would take hours longer, but probably not much more.

  5. MAGA_Ken says:

    Interesting that the Boeing CEO just resigned, but probably necessary considering issues across all divisions of the organization.

    • Jeff2Space says:

      That was inevitable given the 737 MAX fiasco. If the recent Starliner issue had any impact, it was merely the straw that broke the camel’s back.

  6. Richard H. Shores says:

    The Boeing CEO resigned immediately today. It is just a Band-Aid. The culture there is broken and is going to require a housecleaning.

  7. CommanderBill3 says:

    Software issues are the bane of the modern world. Having been a career naval officer and later moved into health care, I can say with assurance that software often doesn’t work right even years after it has been issued. The Navy refuses to move past its vintage ’80s Aegis system software with its millions of lines of code because they know any new system will take a decade or more to get the bugs out. If you have a conflict in the interim you will find your weapons don’t work right, with fatal consequences. In health care we just adopted the state-of-the-art enterprise software that is used industry-wide and there are hundreds if not thousands of bugs that require millions of dollars and thousands of man-hours to resolve.

    As more automation is put in space systems it is typical, if not expected, that there will be many software bugs to be discovered. Sometimes all you can do is run it over and over again and fix them when you find them.

    • fcrary says:

      With the Cassini instrument I worked on, I don’t believe we ever did a flight software load, new version or patch, that actually worked the first time. After a few years, we just decided to leave it alone. Occasionally someone new to the team would suggest a change (e.g. fixing data compression that ended up truncating data more often than we liked). That got vetoed.

      • cb450sc says:

        Spitzer’s software worked quite well, but we spent a _lot_ of time on it. Any possible change (including new command sequences) had to go through a really robust review as well as be tested on multiple simulators on the ground. That included both a software-based simulation of the spacecraft flight systems, as well as essentially flight spares of the actual hardware running the flight software, with resistor packs simulating various parts (like the detectors). Test as you fly, fly as you test. Spitzer was old enough that it’s basically all in assembly, JWST uses much more high-level software architectures. I dread to imagine what kind of interactions can go on.

        • fcrary says:

          Most of the Cassini software was in Ada. That didn’t help since very few people are really fluent in Ada and that complicated reviewing the code changes. But you can (and we did) run into problems with testing. Things like data compression take more or less CPU time, depending on the content of the data. And, for some reason we never figured out, the flight article for my instrument hit timeout errors when the engineering model in our lab showed a 20% margin. I don’t want to think about developing flight software in an object oriented language. I know it’s done but it strikes me as asking for trouble. I just don’t think it’s suited to real-time applications.

          • 6sbportsidevital says:

            I remember in my SWE grad school class, the professor was an Ada guy, and he would dismiss C as a ‘by-hackers-for hackers’ language. (lol)

  8. cb450sc says:

    Sounds to me like this is a SIS (software interface specification) failure, regarding the definition of “time” between the launch vehicle and payload? That really points to inadequate simulation for testing.

  9. Tritium3H says:

    Go Big, or Go Home. This first Starliner orbital flight test just went home a little earlier than originally planned.

    • fcrary says:

      Yes, but there’s the old story about a New England tombstone. The one reading, “I knew this would happen but not so soon.”

  10. Bonnie Triezenberg says:

    Mission elapsed time is by definition the relative time since some event. Typically, for a spacecraft this will be time since separation from the booster, OR time from boot. In this case, it does appear as if the spacecraft somehow got the Atlas’s time from boot (11 hours is a LONG time). No idea why anyone thought it was a good idea to get MET from the booster – so much simpler to just reset MET to zero at separation. Any time two systems share data, semantics like “time from when?” become problematic. Semantic interoperability has been the root cause of more than a few launch failures (see Ariane 5).
    However, there may be another reason MET was incorrect. If a processor resets post-separation, then MET gets reset to zero. There are a variety of ways to design this so the onboard processor knows what to do in these cases. We used to have a set of hardened switches that would indicate whether or not the injection burn had completed, and that would then determine what actions the spacecraft would take autonomously. I would not be surprised if someone took those out of the design to save a few ounces of weight. They may also have decided it was OK to take them out if there were always two processors powered up, under the assumption that BOTH would never reset, but that’s not always a great assumption.
    You catch these types of things in design reviews. Boeing used to have fault engineers who had been there and done that, and they would wirebrush the design. You can definitely find it in the System Integration Lab, but you need the imagination to hypothesize the test case. And if they had that imagination, they wouldn’t have made the design error.
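    A sketch of the kind of logic those hardened switches enable, with hypothetical names and phases (this illustrates the design pattern being described, not Starliner’s actual fault protection):

```c
/* Hypothetical illustration of using latched hardware indicators to recover
 * vehicle phase after a processor reset, when the software MET alone would read zero. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { PHASE_ON_BOOSTER, PHASE_ORBIT_PRE_BURN, PHASE_ORBIT_POST_BURN } phase_t;

typedef struct {
    bool separated_from_lv;   /* hardened, latched separation discrete */
    bool insertion_burn_done; /* hardened, latched burn-complete discrete */
} latched_indicators_t;

/* Called at boot, or after an in-flight reset that zeroed the software MET. */
static phase_t phase_after_reset(const latched_indicators_t *sw)
{
    if (sw->insertion_burn_done) return PHASE_ORBIT_POST_BURN;
    if (sw->separated_from_lv)   return PHASE_ORBIT_PRE_BURN;
    return PHASE_ON_BOOSTER;     /* still on the pad or riding the booster */
}

int main(void)
{
    /* Reset just after separation: MET reads zero, but the switches say the
     * insertion burn is still owed, so the autonomy knows what to resume. */
    latched_indicators_t sw = { .separated_from_lv = true, .insertion_burn_done = false };
    printf("phase after reset = %d\n", (int)phase_after_reset(&sw));
    return 0;
}
```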

  11. Homer Hickam says:

    It is 2020. We have five years (20-21-22-23-24) to land Americans on the moon. Simply put, the present plan is unlikely to work because things like this are inevitably going to happen. Time for a reality check or we’re going to hurt somebody.

  12. Joel McGinley says:

    Here are three observations regarding spacecraft tracking and timing during simulations: 1) I am an ex-JPL assembly/test engineer. When we launched the Galileo spacecraft to Jupiter in 1989 (out of the Shuttle Atlantis), there was a special effort made to avoid the TDRS gap in telemetry coverage: An Air Force cargo plane with the right antennas was prepared and dispatched to the Indian Ocean area. During the launch and up through the IUS booster burn that plane “flew orbits” over the ocean. I don’t recall too many details but I happened to meet some of the crew a couple of years later at an airshow at Patrick AFB, near KSC. I think it was a C-130 or a P-3. That IUS was built by Boeing in WA state but I think JPL “insisted” on the full coverage of their spacecraft.
    2) While at KSC, I ran the spin-balance operation for the Comet Nucleus Tour (CONTOUR) spacecraft with its embedded solid-rocket motor. It was launched in July 2002 and went into a parking orbit until the planned Aug. 15 solid-rocket burn to send it on its way to fly past three comets. However, the correct position/timing of that burn was – over the Indian Ocean! But the CONTOUR builder, Johns Hopkins U’s Applied Physics Lab, did not arrange for tracking in the gap – or the spacecraft attitude made it temporarily unworkable. The engine firing was controlled by a timed sequence and the spacecraft was supposed to check in with the ground after the burn. It was never heard from. A few days later amateur observers saw several items heading out past the moon with about the right velocity… The NASA Mishap Report (to which I contributed the records of my ground test) was very critical of the lack of telemetry during mission-critical events and even mentioned the non-consideration of airborne P-3 assets, USAF assets… It also came out that US military assets did supply some confirmation that the burn did happen, but only later and at NASA request. (My guess is the ICBM/nuclear bomb spacecraft/observers had been alerted to avoid a false alarm when an unexpected engine firing was observed… Think of the song “99 Red Balloons.”)
    3) I recall reading that one of the early Ariane rockets was lost because of a software timing (and testing impatience) error. There was supposed to be a 40-minute coast between the two burns of the rocket’s second stage – this would phase the burns to circularize the orbit. Apparently, during ground simulations, the test team got bored with sitting around for 40 minutes, so “for ground test only” the time was made… 40 seconds. Guess what the rocket did? Yep, 40 seconds of coast – wrong program, no orbit – wham!
    I hope others notice that all of these represent lessons previously learned, but then forgotten or ignored in this mission.
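    Item 3) is worth a concrete illustration: a timing constant forked “for ground test only” is exactly the kind of divergence between test and flight configurations that bites later. The numbers below just follow the story as recalled above; the build flag and program are invented for illustration.

```c
/* A timing constant forked "for ground test only" is a classic way for the test
 * configuration to diverge from flight. Values follow the anecdote above; the
 * build flag and program are invented for illustration. */
#include <stdio.h>

#ifdef GROUND_TEST
#define COAST_DURATION_S 40        /* shortened so the test team doesn't wait 40 minutes */
#else
#define COAST_DURATION_S (40 * 60) /* the flight value: a 40-minute coast */
#endif

int main(void)
{
    /* If the GROUND_TEST setting (or a patched parameter table) ever reaches the
     * vehicle, the coast is 40 s instead of 40 min and the orbit is wrong. */
    printf("coast duration: %d s\n", COAST_DURATION_S);
    return 0;
}
```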

  13. Krocket says:

    I am more curious as to why the vehicle apparently used too much propellant maintaining attitude to make re-phasing of the orbit transfer burn possible.

    It also seems to me that the timing of a transfer burn to the ISS should depend on the injected orbital parameters off the LV and absolute time, rather than MET, but I could be wrong about that.

    • Steve Pemberton says:

      I think I read somewhere that when they manually did the orbit insertion burn they used RCS instead of the larger OMAC thrusters even though that took a lot longer, because they weren’t sure yet what the problem was that had prevented the burn and they didn’t want to risk firing the main thrusters. This used up a whole bunch of RCS fuel, adding to what had been wasted already when the system was erroneously trying to maintain fine control. So when they later said they still had 75% of fuel remaining and we were all wondering why that wasn’t enough, that was likely 75% overall fuel but proportionally they were much lower on RCS. Maybe it’s the same fuel but presumably for redundancy it’s in separate tanks. Also they apparently overheated some of the RCS thrusters doing the orbit burns because of the length of the burns and the system shut them down. Whether it was safe to use the overheated thrusters later after they cooled off I don’t know, but I suspect they chose not to use those particular thrusters for the remainder of the mission.

      As for timing with ISS, yes real time would normally be involved, however they launch at the moment when SLC-41 rotates under the ISS orbital plane, and so I would think at least the initial orbit insertion burn will be at a fixed length of time after launch and it will be accurate enough to run off of MET. Subsequent burns that are designed to fine tune the approach speed to ISS would I imagine be based on real time.

      • Krocket says:

        Thanks. This makes sense. It doesn’t inspire confidence in the design and qualification status of the propulsion system, though. They should know what firing durations produce overheating at what duty cycles from ground tests.

        • Steve Pemberton says:

          I’m sure they do and they knew they were pushing RCS past its normal use and so I would think they weren’t surprised when the system shut a few of them off. Likely with no actual damage to the thrusters since the safeguards were in place. Hopefully details about that will be included in whatever report is made public.

          • fcrary says:

            If we ever see a publicly released version of the report. It would be full of Boeing proprietary material. I’d expect a press release without too many details and, at most, a highly redacted version of the report. This wasn’t a NASA launch, and it wasn’t even an accident. (As I understand it, if the vehicle isn’t irreparably damaged and no one was seriously injured, it’s an incident, not an accident.)

          • Steve Pemberton says:

            I’m just thinking of SpaceX giving quite a bit of detail about the causes of the capsule explosion, and Boeing providing details about the parachute pin. I’m not sure if either were required to provide that information publicly. I agree the formal report probably won’t be made public; I should have used a different word than report. I just meant when the results of the investigation are made public, like in previous incidents.

      • fcrary says:

        There’s a lot we don’t know about this. What if they fired the main engines for a manual burn (since the spacecraft thought it had happened 10.5 hours earlier), but the spacecraft had set tight attitude control dead bands (because that’s what they needed to be for something around +11 hours)? That would eat up RCS propellant very quickly. But it could have been any number of things. I’m not sure if we’ll ever find out, not to this level of detail.

        • Steve Pemberton says:

          I have considered the part about the spacecraft using tighter control than necessary as being common knowledge since the gist of that was in Jim Bridenstine’s opening statement in the press conference immediately following the launch when he said,

          “And because that timing was a little bit off what ended up happening is the spacecraft tried to maintain a very precise control that it normally wouldn’t have tried to maintain, and it burned a lot of prop in that part of the flight.”

          This was also briefly alluded to a few minutes later in the remarks by Jim Chilton of Boeing.

          What we have not heard that I know of is exactly what part of the flight the attitude control system thought the capsule was in. It would be interesting to know the exact details about that, no guarantee that we will find out but I am hoping we do.

    • Michael Spencer says:

      I have seen it explained this way: the capsule thought it was in a part of the flight profile that required very accurate relative positioning, compared to other parts of the profile in which a fair amount of wobble was acceptable.

      Being a mere interested non-professional, this explanation seems like it works. But we all know how that can go.

      • fcrary says:

        That probably can’t be the whole story. A timing problem and keeping tight control of pointing at the wrong point in the flight would eat up propellant. But not a huge amount more than normal. To get something which consumed the amount implied by the reports, I think there must have been more to it than that. But, all things considered, there probably was more to it. At this point, I don’t think we can really put together the whole story.

      • hikingmike says:

        They must find there should have been more checks in place so that the spacecraft would take a step back and not immediately burn all its propellant if something was way different than expected. There can be some “ask for human input” moments, maybe with a certain time window before action is taken without input.

        • Ball Peen Hammer ✓ᵛᵉʳᶦᶠᶦᵉᵈ says:

          “ask for human input”

          Unfortunately “ask for human input” doesn’t work when there are no humans available to ask – as was the case when the event happened in a dead zone between communication links.

          • hikingmike says:

            Does that mean if the spacecraft had waited there, it would never have had a communication possibility between humans and the spacecraft? I guess if that was the case, then any time window for human input would be hit and the spacecraft would take action on its own according to its programming, and yes, it would have given the same result in this situation.

          • Ball Peen Hammer ✓ᵛᵉʳᶦᶠᶦᵉᵈ says:

            I’m not sure what you mean by “waited there.” An object in orbit is moving quite rapidly, not sitting in one spot. The Starliner was firing engines in error when it was in a communication dead zone. By the time it was in communication again, too much propellant had been consumed to complete the mission with acceptable margins.

          • hikingmike says:

            By “waited there”, I mean not firing rockets. I didn’t expect that would be understood any other way.

            Also, by some frames of reference, I am also moving quite rapidly sitting here at my computer desk.

          • Ball Peen Hammer ✓ᵛᵉʳᶦᶠᶦᵉᵈ says:

            Staying in the same orbit (which it did) brought it out of the dead zone and back into communication. By then it was too late for human intervention to allow it to complete all of the mission goals, as too much propellant had been used in error.

          • hikingmike says:

            Ok, what if it had stayed in the same orbit without using all the propellant?

            You can go back to my first comment 6 or so back. Imagine… the spacecraft separates, something is found to be waaaay off from what was expected. Comm with humans not possible temporarily due to a dead zone. The spacecraft has an option to burn (almost) all the propellant fruitlessly trying to carry out its objective, and it has another option to wait a certain amount for human interaction to help with the very unexpected situation.

            The mission and spacecraft design didn’t allow for that second option of course. But it seems like it could be useful to have that. Right?
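            A rough sketch of that second option, with invented thresholds and names (this illustrates the suggestion being discussed, not anything Starliner actually implements): if the state is wildly off from what was expected, hold passively and give the ground a window before acting alone.

```c
/* Hypothetical "hold and ask" guard: if the estimated state is wildly off from
 * what was expected, stay passive and give the ground a window before acting
 * alone. Thresholds and names are invented; this is not Starliner's logic. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MET_DISCREPANCY_LIMIT_S 600u        /* assume >10 min off counts as "way different" */
#define HOLD_WINDOW_S           (45u * 60u) /* assume up to 45 min allowed for ground contact */

typedef enum { ACT_PROCEED, ACT_PASSIVE_HOLD, ACT_AUTONOMOUS_FALLBACK } action_t;

static action_t decide(uint32_t met_from_lv_s, uint32_t met_onboard_s,
                       bool ground_contact, uint32_t seconds_in_hold)
{
    uint32_t diff = met_from_lv_s > met_onboard_s ? met_from_lv_s - met_onboard_s
                                                  : met_onboard_s - met_from_lv_s;
    if (diff <= MET_DISCREPANCY_LIMIT_S) return ACT_PROCEED;         /* nominal */
    if (ground_contact)                  return ACT_PROCEED;         /* humans have weighed in */
    if (seconds_in_hold < HOLD_WINDOW_S) return ACT_PASSIVE_HOLD;    /* conserve propellant */
    return ACT_AUTONOMOUS_FALLBACK;                                  /* window expired: act alone */
}

int main(void)
{
    /* ~11 h discrepancy, no comm yet, five minutes into the hold: stay passive. */
    printf("action = %d\n", (int)decide(11u * 3600u, 300u, false, 300u));
    return 0;
}
```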

  14. Tom Mazowiesky says:

    Some of this may be due to the fact that this is the first time that Boeing has actually been in charge of a spacecraft. McDonnell, Rockwell and Grumman all had growing pains during Mercury, Gemini and Apollo. Replacing the CEO will have no effect on what’s going on lower down (IMHO).

    Hopefully Boeing will be able to master the problems of manned spaceflight quickly with no casualties.

    • kcowing says:

      FWIW I worked at Rockwell in the early 1980s and at NASA in the early 1990s. The Apollo guys were already retiring back then. I would wager that virtually no one from Apollo is still working at Boeing or NASA. If anyone is, please raise your hand.

    • fcrary says:

      I think someone at Boeing has operated spacecraft. They have a subdivision in El Segundo which makes communications satellites (formerly part of Hughes). As I understand it, the usual practice is for the manufacturer to operate the spacecraft from launch to the end of commissioning, then turn it over to the purchaser. If so, someone at Boeing has spacecraft operations experience. But I’ve noticed that, as often as not, the various subdivisions within a big aerospace company don’t talk to each other. I believe CST-100 is built by Boeing Launch Services Inc., and they may or may not ask Boeing Satellite Development Center about operations.

  15. Daniel Woodard says:

    Like the SpaceX loss of the Zuma, continuous communication could have mitigated the problem. Maybe next time they will have comm with the vehicle via Starlink.

    • Ball Peen Hammer ✓ᵛᵉʳᶦᶠᶦᵉᵈ says:

      Can you cite a source for the Zuma loss being due to a lack of communication?

  16. Todd Martin says:

    The contract signed between NASA & Boeing requires a successful demonstration of berthing at ISS before allowing crew to fly. It is widely expected that NASA will allow that contract provision to be waived, since, you know, it’s Boeing. Meanwhile, SpaceX is moving along toward conducting an in-flight abort test of their capsule while Boeing skips that step. Then, there’s the fixed-price contract that Boeing got for commercial crew which discreetly became quasi-cost-plus just for Boeing (getting paid almost twice as much for the same service wasn’t enough).

    I’m sure Boeing will fix this issue and Starliner is a fine spacecraft. I like the fact it can touch down on dry land. I just believe NASA is overly deferential to Boeing.

  17. Gregg says:

    The clock was running on India Standard Time, just as the software instructed it to do. It’s a joke, son, it’s just a joke!
    Seriously, I would not discount a software bug as the cause of the issue.