Operators meeting 18.12.2014
Agenda
1. Current issues:
2. Roster, any questions or comments
4. Other items
PD: Below is an email from Ed about recent Katherine issues:
Hi Jim, Jamie, Brett, Rich, Dan, and I have been looking at some recent problems Katherine had. Below is our diagnosis of what happened and some suggestions for handling them. Please share this with the operators if you think it would be helpful.

AUST67: The major problem was not a Mark 5 problem, but the Mark 5 was the tip-off, the "canary in the coal mine". The FS PC appears to have been loaded (as happened in CONT), slowing the system down. This snowballed into the other errors. What was visible was the Mark 5 rejecting commands ("error m5 -900 not while recording or playing"). When problems like this are encountered, we recommend that the operator check the load with "uptime" and enter the numbers in a log comment. Until this issue is resolved, you might consider having a window open running tload, xload, or top as part of normal operations, for a visual indication of the load. A FS PC reboot may fix this problem when it occurs. Ideally, the problem program should be identified and fixed, or at least the operator given instructions on what program to "kill", which would be less traumatic than a FS PC reboot.

CRDS74: Again the Mark 5 was not the issue. The DBBC had 1 PPS problems, visible as a jump in the /dbbc/pps_delay/ output. This snowballed into the other problems. The situation was visible as a Mark 5 time error ("ERROR sc -13 setcl: formatter to FS time difference 0.5 seconds or greater"). We suggest that you have the operator monitor the /dbbc/pps_delay/ value and look for jumps. The DBBC needs to be resync'd/rebooted in this situation, but not the FS or the Mark 5, as the operator discovered. There is a learning curve. Of course, after resyncing/rebooting the DBBC, the Mark 5 has to be resync'd and the time set. It seems like there are still DBBC PPS stability issues that need to be addressed.

BTW, we noticed the following text in the crds74ke end message:

Comments from the log:
Disk VSN: WSRT-049
Data volume at beginning: 1.159 GB
UT 17:10 Sched start. First scan 343-1719 (-13.4 GB) (JS)
UT 10:36 - Mark5B drifting wildly. Halted schedule. (vk)
UT 10:54 To fix clock drift, restarted field system, GPIB and Mark 5 with no luck, reset counters and fmset with no luck, had to restart DBBC, finally fixed the problem. (Imogen)
UT 11:50 - Schedule resumed. Missed scans: 344-1027 to 344-1141, corresponding to ~90 GB (vk)

These comments don't seem to actually appear in the log. Did we just not look in the right place? Maybe they are just the operators' handwritten notes entered for the end message. If so, it would be very helpful if they could be injected into the log file (a la the 'msg' program), since these comments contain valuable information that didn't otherwise appear.

Thanks,
Ed
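
Note on the AUST67 suggestion: the "uptime" check and the log comment could be prepared with a small helper along these lines. This is only a sketch, not an existing FS tool. The function names, the comment wording, and the per-CPU threshold of 1.0 are assumptions made for illustration. The operator would still enter the resulting text into the log by hand.

```python
import os


def format_load_comment(load1, load5, load15):
    """Build a one-line comment holding the 1, 5 and 15 minute load averages."""
    return f"load average: {load1:.2f} {load5:.2f} {load15:.2f}"


def is_loaded(load1, n_cpus, per_cpu_limit=1.0):
    """Return True when the 1-minute load exceeds per_cpu_limit per CPU.

    The limit of 1.0 is an assumed starting point, not a measured value.
    """
    return load1 > per_cpu_limit * n_cpus


if __name__ == "__main__":
    # os.getloadavg() returns the same three numbers that "uptime" prints.
    loads = os.getloadavg()
    n_cpus = os.cpu_count() or 1
    print(format_load_comment(*loads))
    if is_loaded(loads[0], n_cpus):
        print("warning: FS PC load is high, check with top")
```

Run on the FS PC when Mark 5 commands start being rejected, this prints the three load numbers ready to be copied into a log comment.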
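
Note on the CRDS74 suggestion: looking for jumps in the /dbbc/pps_delay/ value amounts to comparing each reading with the one before it. A minimal sketch follows. It assumes the readings have already been pulled out of the log as (time, value) pairs. The function name, the sample values, and the step threshold are hypothetical, and the real log format and units are not reproduced here.

```python
def find_pps_jumps(samples, max_step):
    """Return (time, previous, current) for every step larger than max_step.

    samples is a list of (time, value) pairs in time order; the values are
    assumed to be pps_delay readings in whatever unit the log reports.
    """
    jumps = []
    # Pair each reading with its successor and compare the two values.
    for (_, prev), (t, cur) in zip(samples, samples[1:]):
        if abs(cur - prev) > max_step:
            jumps.append((t, prev, cur))
    return jumps


# Made-up readings for illustration only.
readings = [("10:30", 40), ("10:31", 41), ("10:36", 500), ("10:37", 501)]
for t, prev, cur in find_pps_jumps(readings, max_step=10):
    print(f"{t}: pps_delay jumped from {prev} to {cur}")
```

A flagged jump would be the cue to resync or reboot the DBBC first, then resync the Mark 5 and set the time, rather than restarting the FS.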