Operators meeting 18.12.2014

Agenda

1. Current issues:
  * Observing at Mt Pleasant: experience, suggestions.
    * Alarms at Mt Pleasant -- what is wrong? (seemingly nothing)
      * alarm is an .ogg file, maybe not playing -- convert to another format?
      * try another player?
      * use the system alarm at Mt Pl - mute!
  * Hb recording consistently less than schedule assumes for certain AUST experiments (e.g., [[handover:aust67|AUST67]], [[handover:aust68|AUST68]])
      * Check scan_checks (Jamie)
  * Ke time setting -- mark5 doesn't sync (recent [[handover:aust72|AUST72]] + generator issue)
  * Ke issues during [[handover:aust67|AUST67]] and [[handover:crds74|CRDS74]] //(see below)//
      * MONICA puts load on pcfs, but it may not be to blame
      * java?
      * keep monitoring and gather info
  * Ed's suggestion: add handover comments to the log rather than to the end message.


2. Roster, any questions or comments
  * Roster 2015

4. Other items
  * AuScope Observers Holiday drinks 5:30pm Monday 22nd of December in The Winston http://www.thewinstonbar.com/


PS: Below is an email from Ed about recent Katherine issues:

<code>
Hi Jim, Jamie, Brett,

Rich, Dan, and I have been looking at some recent problems Katherine
had.  Below is our diagnosis of what happened and some suggestions for
handling them. Please share this with the operators if you think it
would be helpful. 

AUST67: The major problem was not a Mark 5 problem, but it was the
tip-off "canary in the coal mine". The FS PC appears to have been loaded
(as happened in CONT) slowing the system down. This snowballed into the
other errors. What was visible was the Mark 5 rejecting commands ("error
m5 -900 not while recording or playing"). When problems like this are
encountered, we recommend that the operator check the load with "uptime"
and enter the numbers in a log comment. Until this issue is resolved,
you might consider having a window open running tload, xload or top,
as part of normal operations for a visual indication of the load. A FS
PC reboot may fix this problem when it occurs. Ideally, the problem
program should be identified and fixed or at least the operator given
instructions on what program to "kill", which would be less traumatic
than a FS PC reboot.

CRDS74: Again the Mark 5 was not the issue. The DBBC had 1 PPS problems,
visible as a jump in /dbbc/pps_delay/ output. This snowballed into the
other problems. The situation was visible  as a Mark 5 time error
("ERROR sc  -13 setcl: formatter to FS time difference 0.5 seconds or
greater"). We suggest that you have the operator monitor
/dbbc/pps_delay/ value and look for jumps. The DBBC needs to be
resync'd/rebooted in this situation, but not the FS, or Mark 5, as the
operator discovered. There is a learning curve. Of course, after
resyncing/rebooting the DBBC, the Mark 5 has to be resync'd and the time
set. It seems like there are still DBBC PPS stability issues that need
to be addressed.

BTW, we noticed the following text in crds74ke end message:

Comments from the log:
Disk VSN: WSRT-049
Data volume at beginning: 1.159 GB
UT 17:10 Sched start. First scan 343-1719 (-13.4 GB) (JS)
UT 10:36 - Mark5B drifting wildly. Halted schedule. (vk)
UT 10:54 To fix clock drift, restarted field system, GPIB and Mark 5 with no luck, reset counters and fmset with no luck, had to restart DBBC, finally fixed the problem. (Imogen)
UT 11:50 - Schedule resumed. Missed scans: 344-1027 to 344-1141, corresponding to ~90 GB (vk)

These comments don't seem to actually appear in the log. Did we just not
look in the right place? Maybe they are just the operator's handwritten
notes entered for the end message. If so, it would be very helpful if
they could be injected into the log file (a la the 'msg' program) since
these comments contain valuable information that didn't otherwise appear.

Thanks,  Ed
</code>
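
The two monitoring suggestions in Ed's email (note the load numbers in a log comment, and watch the /dbbc/pps_delay/ output for jumps) could be sketched roughly as below. This is only an illustration under stated assumptions: the function names ''load_comment'' and ''find_pps_jumps'' are hypothetical, the comment layout simply imitates the "UT hh:mm" notes quoted above, and the units and the jump threshold for the delay samples are placeholders that would need to be chosen from real data. How the delay values are read from the DBBC or the log is not shown, and the comment text would still have to be entered into the log by the operator.

```python
import os
from datetime import datetime, timezone


def load_comment(now=None, loadavg=None):
    """Format the 1/5/15-minute load averages as a one-line comment.

    Hypothetical helper: the text layout is an assumption, not a
    Field System convention.
    """
    if loadavg is None:
        # Same three numbers that "uptime" reports (Unix-like systems only).
        loadavg = os.getloadavg()
    if now is None:
        now = datetime.now(timezone.utc)
    one, five, fifteen = loadavg
    return "UT {:%H:%M} load average: {:.2f} {:.2f} {:.2f}".format(
        now, one, five, fifteen
    )


def find_pps_jumps(samples, threshold):
    """Return (index, previous, current) for each step larger than threshold.

    `samples` is a list of successive pps_delay readings; the units and a
    sensible threshold are assumptions to be set from real data.
    """
    jumps = []
    for i in range(1, len(samples)):
        if abs(samples[i] - samples[i - 1]) > threshold:
            jumps.append((i, samples[i - 1], samples[i]))
    return jumps


if __name__ == "__main__":
    print(load_comment())
    # Made-up readings, only to show the call; not real DBBC data.
    print(find_pps_jumps([40, 41, 40, 140, 141], threshold=50))
```

Comparing successive readings rather than checking against a fixed limit matches the email's advice to "look for jumps": a steady offset is ignored and only a sudden step is reported.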