Operators meeting 18.12.2014

Agenda

1. Current issues:

2. Roster, any questions or comments

3. Other items

PD: Below is an email from Ed about recent Katherine issues:

Hi Jim, Jamie, Brett,

Rich, Dan, and I have been looking at some recent problems Katherine
had.  Below is our diagnosis of what happened and some suggestions for
handling them. Please share this with the operators if you think it
would be helpful. 

AUST67: The major problem was not a Mark 5 problem, but it was the
tip-off "canary in the coal mine". The FS PC appears to have been loaded
(as happened in CONT) slowing the system down. This snowballed into the
other errors. What was visible was the Mark 5 rejecting commands ("error
m5 -900 not while recording or playing"). When problems like this are
encountered, we recommend that the operator check the load with "uptime"
and enter the numbers in a log comment. Until this issue is resolved,
you might consider keeping a window open running tload, xload, or top,
as part of normal operations for a visual indication of the load. A FS
PC reboot may fix this problem when it occurs. Ideally, the problem
program should be identified and fixed or at least the operator given
instructions on what program to "kill", which would be less traumatic
than a FS PC reboot.
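
As an illustration of the load check suggested above, here is a minimal Python sketch that reads the same three load averages "uptime" reports and formats them as a one-line comment. The leading double quote (the usual way to enter an FS log comment), the function name, and the threshold of 2.0 are assumptions for illustration, not values from Ed's email.

```python
import os


def format_load_comment(loads, threshold=2.0):
    """Format 1/5/15-minute load averages as a single comment line.

    The leading double quote and the threshold are assumptions.
    """
    one, five, fifteen = loads
    # Flag the line when the 1-minute average reaches the (hypothetical) threshold.
    flag = " HIGH" if one >= threshold else ""
    return f'"load average: {one:.2f} {five:.2f} {fifteen:.2f}{flag}'


if __name__ == "__main__":
    # os.getloadavg() returns the same three numbers that "uptime" prints.
    print(format_load_comment(os.getloadavg()))
```

The operator would still have to enter the printed line into the log; the sketch only produces the text.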

CRDS74: Again the Mark 5 was not the issue. The DBBC had 1 PPS problems,
visible as a jump in /dbbc/pps_delay/ output. This snowballed into the
other problems. The situation was visible as a Mark 5 time error
("ERROR sc  -13 setcl: formatter to FS time difference 0.5 seconds or
greater"). We suggest that you have the operator monitor
/dbbc/pps_delay/ value and look for jumps. The DBBC needs to be
resync'd/rebooted in this situation, but not the FS, or Mark 5, as the
operator discovered. There is a learning curve. Of course, after
resyncing/rebooting the DBBC, the Mark 5 has to be resync'd and the time
set. It seems like there are still DBBC PPS stability issues that need
to be addressed.
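
Looking for jumps in successive /dbbc/pps_delay/ values could be sketched as follows. This is a hedged example only: the sample readings are made up, the units and the max_step threshold are assumptions, and extracting the values from the log is left out because the email does not give the output format.

```python
def find_jumps(samples, max_step):
    """Return (index, previous, current) for each step larger than max_step."""
    jumps = []
    for i in range(1, len(samples)):
        # Compare each reading with the one just before it.
        if abs(samples[i] - samples[i - 1]) > max_step:
            jumps.append((i, samples[i - 1], samples[i]))
    return jumps


if __name__ == "__main__":
    # Made-up readings; real values would come from the pps_delay output.
    readings = [42, 43, 42, 44, 910, 911, 909]
    print(find_jumps(readings, max_step=10))  # prints [(4, 44, 910)]
```

Comparing neighbouring readings rather than testing against a fixed value means a steady offset is ignored and only a sudden change is reported.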

BTW, we noticed the following text in the crds74ke end message:

Comments from the log:
Disk VSN: WSRT-049
Data volume at beginning: 1.159 GB
UT 17:10 Sched start. First scan 343-1719 (-13.4 GB) (JS)
UT 10:36 - Mark5B drifting wildly. Halted schedule. (vk)
UT 10:54 To fix clock drift, restarted field system, GPIB and Mark 5 with no luck, reset counters and fmset with no luck, had to restart DBBC, finally fixed the problem. (Imogen)
UT 11:50 - Schedule resumed. Missed scans: 344-1027 to 344-1141, corresponding to ~90 GB (vk)

These comments don't seem to actually appear in the log. Did we just not
look in the right place? Maybe they are just the operators' handwritten
notes entered for the end message. If so, it would be very helpful if
they could be injected into the log file (a la the 'msg' program) since
these comments contain valuable information that didn't otherwise appear.

Thanks,  Ed