
NAF2.0 for ILC users

1. Basics

1.1. DESY User Consulting Office (UCO)

{i} If you cannot find the right mailing list in the documentation below for reporting a problem, you may send an e-mail to the DESY UCO:

  • (!) The UCO is the single point of contact for all questions and problems concerning your computer workplace and the central IT services at DESY.

1.2. Password Change

ssh USERNAME@passwd.desy.de

1.3. Getting Access to NAF2.0

1.3.1. Access to the ILC Resources

  • DESY, University of Hamburg or Humboldt University members:

  • If you are a DESY user with a full account, please check your ID with the command "id":

 [flc desktop] id <your_account>
 uid=NNNNN(<your_account>) gid=NNN(flc) groups=1417(flc),5295(af-ilc)
  • If you see "5295(af-ilc)", you should already be able to log in to the ILC workgroup servers.
  • If you do NOT have "5295(af-ilc)", please send an e-mail to naf-ilc-support<at>desy.de.

  • If you have further questions about the "af-ilc" namespace, please send an e-mail to naf-ilc-support<at>desy.de.

1.3.2. Batch System

  • In order to send your jobs to the BIRD cluster, you need permission to access the resource "batch(IT)".
  • You can check this yourself by logging in to https://registry.desy.de/registry/ with your DESY account (only from inside the DESY network). In the column "Current resource access", verify that the item "batch(IT)" is listed.

  • If you don't have it, you may ask an administrator for help. You will find the administrator list by clicking on "Administrators" in the left column of your login screen.

1.4. NAF2.0 ILC Workgroupservers

  • These serve as the access point for internal and external ILC users to the NAF.
  • The DESY-AFS user directories are used as home directories.
  • You can access the ilcsoft installation in CVMFS, the GRID storage element dCache, and the NAF2 scratch storage space DUST.

||Name ||OS ||Group ||Scheduler ||Cores ||
||naf-ilc11.desy.de ||SL6 ||af-ilc ||HTCondor ||20 ||
||naf-ilc12.desy.de ||SL6 ||af-ilc ||HTCondor ||12 ||
||naf-ilc13.desy.de ||SL6 ||af-ilc ||HTCondor ||12 ||
||naf-ilc-el7.desy.de (VM) ||EL7 ||af-ilc ||HTCondor ||2 ||

  • For login, simply ssh to the load-balanced alias naf-ilc.desy.de, e.g.:

ssh -X yourusername@naf-ilc.desy.de

The workgroup servers are rather powerful machines on which you can compile and test your programs without causing any issues. However:

/!\ Please do not copy or move several data files in parallel on these machines! This consumes the complete bandwidth and slows down the machines dramatically.
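If you have to transfer many files, a simple loop that copies them one at a time keeps the bandwidth usage bounded. A minimal sketch (SRC and DST are placeholder directories; this sketch creates temporary ones so it runs anywhere):

```shell
#!/bin/bash
# Hedged sketch: copy data files sequentially instead of in parallel, so the
# transfers do not saturate the workgroup server's bandwidth.
# SRC and DST are placeholders; in real use, point them at your directories.
SRC=${SRC:-$(mktemp -d)}
DST=${DST:-$(mktemp -d)}
touch "$SRC/example.slcio"          # stand-in data file for this sketch
copied=0
for f in "$SRC"/*.slcio; do
  [ -e "$f" ] || continue           # nothing to do if no files match
  cp "$f" "$DST"/                   # one file at a time, never in parallel
  copied=$((copied + 1))
done
echo "copied $copied file(s)"
```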

1.5. Where and how to run programs

  • BIRD: There are more than 8000 CPU cores and plenty of memory.

  • The ILC workgroup servers have only O(10) CPUs. → For CPU-intensive work, use the BIRD system.

  • To execute programs on the BIRD system, you usually wrap them in a special script.
  • This script is submitted to a scheduling system, which processes it as a so-called job.
  • /!\ Please do not run your production jobs on the login machine!

######################################################
# Simple HTC submit file
#
#     my_htc_job.submit
# Then submit it to BIRD with:
#     condor_submit my_htc_job.submit
######################################################

Executable  = $ENV(HOME)/<path>/<to>/<your>/<executable>
Log         = $ENV(HOME)/log_$(Cluster)_$(Process).txt
Output      = $ENV(HOME)/out_$(Cluster)_$(Process).txt
Error       = $ENV(HOME)/error_$(Cluster)_$(Process).txt
Queue 1

{i} More information: https://confluence.desy.de/display/IS/A+short+WalkThrough
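For a first test job, the Executable line in the submit file above can point at a small wrapper script. A hypothetical sketch (the script name and its contents are placeholders, not part of the NAF documentation):

```shell
#!/bin/bash
# Hypothetical job wrapper: the submit file's Executable would point at a
# script like this one. It only reports where and when the job ran.
run_job() {
  echo "job running on host: $(hostname)"
  echo "started at: $(date)"
  # ... your real workload goes here ...
  echo "job finished"
}
run_job
```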

2. Batch system - BIRD cluster

2.1. Overview

The NAF2.0 uses the general-purpose batch system BIRD (Batch Infrastructure Resource at DESY). The BIRD cluster nodes have the same configuration as our workgroup servers. They can access the CVMFS ilcsoft installation, the GRID storage element dCache, and the NAF2 scratch storage space DUST.

If your jobs run on most of the BIRD nodes but have a problem on one specific node, please send an e-mail to bird.service<at>desy.de, stating the BIRD node name and the problem.

2.2. Scheduling jobs

condor_submit myjob.submit
  • There are special codes in the status output of HTCondor's commands: HTCondor Magic Numbers

  • If you have problems with HTCondor jobs, please send an email to bird.service<at>desy.de
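For reference, myjob.submit could look like the following. This is a hedged sketch: the script name and the resource requests are illustrative values, not site recommendations (RequestMemory is in MB; +RequestRuntime is in seconds, as in the interactive example in the next section).

```
######################################################
# Hypothetical submit description, myjob.submit
# (values are illustrative only)
######################################################
Executable      = run_analysis.sh
Arguments       = $(Process)
Log             = log_$(Cluster)_$(Process).txt
Output          = out_$(Cluster)_$(Process).txt
Error           = err_$(Cluster)_$(Process).txt
RequestMemory   = 2048
+RequestRuntime = 10800
Queue 5
```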

2.3. Interactive job

  • If you just want to get a BIRD worker node (WN) to run your task interactively, simply run condor_submit with the -i option:

condor_submit -i
hostname

condor_submit -i interactive.submit
hostname
  • An example interactive.submit:

######################################################
# HTCondor Submit Description File. COMMON TEMPLATE
#     interactive.submit
# Then, to log in to one BIRD WN:
#     condor_submit -i interactive.submit
######################################################
# condor_config_val MaxJobRetirementTime : 3600 * 24 * 7

+MyProject = "af-ilc"
Requirements = OpSysAndVer=="SL6"

+RequestRuntime = 3600 * 3

queue 1
  • Please note that you might wait a long time if the cluster is busy.


{i} We are currently in the process of migrating to the new batch and scheduling software, HTCondor. The old "BIRD/SGE" system will be removed once the migration to HTCondor is complete. For the time being, you may still use it on nafhh-ilc01.

3. Storage Systems

||Storage System ||Size ||Backup ||Read/Write ||Type ||
||AFS user directory ||16 GB ||yes (multiple) ||yes ||Disk ||
||DUST ||1 TB ||no ||yes ||Disk ||
||dCache ||huge ||yes ||read only ||Disk & Tape ||
||CVMFS ||O(GB) ||yes, versioning ||read only ||Disk ||

3.1. AFS

AFS: The AFS cell /afs/desy.de is used to provide each user with a home directory.

To check your current quota usage:

fs lq [--human]

3.2. DUST

Large scratch space: The technology DUST is used to provide fast scratch space. As there is no backup, you should use it for big data that can easily be reproduced.

To access your working space, do

cd /nfs/dust/ilc/user/<yourspace>

Your working space can be accessed from the BIRD cluster nodes, too.
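A small sketch of typical DUST usage: keep large, reproducible output under your DUST space rather than in the small AFS home directory. The directory layout below is hypothetical, and the path falls back to a temporary directory only so the sketch is runnable as-is:

```shell
#!/bin/bash
# Hedged sketch: create an analysis output directory on DUST and check its size.
# On the NAF, DUST_BASE would be /nfs/dust/ilc/user/<yourspace>; the fallback
# to a temporary directory is only so this sketch runs anywhere.
DUST_BASE=${DUST_BASE:-$(mktemp -d)}
OUT_DIR="$DUST_BASE/myanalysis/output"   # hypothetical analysis layout
mkdir -p "$OUT_DIR"
echo "output directory: $OUT_DIR"
du -sh "$OUT_DIR" | awk '{print "current usage: " $1}'
```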

  • For questions about your DUST quota, please send an e-mail to naf-ilc-support<at>desy.de, stating your space name.

  • When your jobs are finished and you have cleaned up, if you want to free some space, please send an e-mail to naf-ilc-support<at>desy.de, stating your space name and the quota you would still like to keep.

Checking DUST quota: DUST quotas can be viewed via the Amfora web interface.

Administration:

  • As a registry namespace administrator, you will see an additional button "GPFS Management".
  • The following DUST related tasks can be managed via Amfora:
    • Managing quotas for user and group filesets
    • Creating new group filesets, currently:

/nfs/dust/ilc/group/ild
/nfs/dust/ilc/group/flctpc
/nfs/dust/ilc/group/flchcal

3.3. dCache

Access to experiment data on dCache: Fast access is provided to the DESY dCache systems where experiment data is hosted.

ILC users may access the data at /pnfs/desy.de/ilc/. This mount is read-only.

Please check the data locality before sending your jobs to access it!

With your GRID certificate, you can check the locality via the SRM door:
srmls -l srm://dcache-se-desy.desy.de:8443/pnfs/desy.de/ilc/path/to/your-data.slcio
  • If the locality is ONLINE, the file is available on disk and your jobs can access it.

  • If the locality is NEARLINE only, the file is not available on disk at the moment.

    • It needs pre-staging from tape to disk before running any jobs.

    • Please contact your group's data managers/administrators.

/!\ All the BIRD worker nodes and WGS can access the data directly with the full path.
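When running over many input files, it can help to pre-check each file's locality and only submit jobs for files already on disk. A hedged sketch: it assumes the srmls -l output contains a locality field such as "locality:ONLINE" or "locality:NEARLINE", as in the example above.

```shell
#!/bin/bash
# Hedged sketch: classify a file's dCache locality from captured srmls output.
# ONLINE (including ONLINE_AND_NEARLINE) means the file is on disk; NEARLINE
# alone means it must be pre-staged from tape first.
locality_of() {
  case "$1" in
    *ONLINE*)   echo "ONLINE" ;;     # on disk, jobs can read it directly
    *NEARLINE*) echo "NEARLINE" ;;   # tape only, needs pre-staging
    *)          echo "UNKNOWN" ;;
  esac
}

# In real use you would capture the srmls output, e.g.:
#   out=$(srmls -l srm://dcache-se-desy.desy.de:8443/pnfs/desy.de/ilc/<file>)
sample="  locality:NEARLINE"
locality_of "$sample"                # prints NEARLINE
```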

3.4. Report Storage Issues

If you experience a problem with the storage you can report it, following these instructions: https://naf-wiki.desy.de/ReporaboutStorageIssues

4. Grid User Interface Tools

  • The GRID UI is available on the SL6 WGS (naf-ilc11/naf-ilc12).
  • Please use a clean bash/zsh environment to initialize your GRID certificate. Do NOT run any initialization script in your profile unless you know it will not affect your GRID UI environment.

  • If you need to access the Grid, please use ILCDirac.

5. ILC/CALICE software

Example for "zsh" and "bash" shell users. You can find out which shell you are using with the command "echo $SHELL":

echo $SHELL
/bin/zsh

echo $SHELL
/bin/bash

5.1. ILCsoft CVMFS installation

[@naf-ilc12] ls /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/v02-00-01/

[@naf-ilc12] ls /cvmfs/ilc.desy.de/sw/ILDConfig/v02-00-01/

You also have the chance to access a nightly build of ilcsoft from the CVMFS server:
[@naf-ilc12] ls /cvmfs/clicdp.cern.ch/iLCSoft/builds/nightly/

5.2. ILC specifics

For NAF2.0 ILC users (the NAF2.0 machines are now 64-bit), please use:

source /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/v02-00-01/init_ilcsoft.sh

Please check out ILDConfig and follow the README.md to run the sim/reco jobs.

git clone https://github.com/iLCSoft/ILDConfig.git
less ILDConfig/StandardConfig/production/README.md

Additional information can be found at http://ilcsoft.desy.de and https://confluence.desy.de/display/ILD+Software+Working+Group

5.3. CALICE specifics

For CALICE, the DESY HCAL group provides the CALICE software.

More information about the CALICE software: https://twiki.cern.ch/twiki/bin/view/CALICE/SoftwareNews

6. Modules on SL6

  • If you are looking for another git version on SL6, for example:
  • You can try module avail git and module load git.

  • You can list further available software with module avail.

  • For quick help about modules, try module help.

[@naf-ilc11] module avail git
--------------------------------------------------------------------- /etc/modulefiles ---------------------------------------------------------------------
git/1.9

[@naf-ilc11] module load git
[@naf-ilc11] git --version
git version 1.9.0

[@naf-ilc11] which git
/opt/git/1.9/bin/git
  • Please see the following links if you want to read more.

DESY IT: Software_Env_with_Modules.

http://modules.sourceforge.net/.

NAF2Start (last edited 2019-04-12 08:21:02 by ShaojunLu)