Performance, Benchmark, Troubleshooting, Tuning

[image: Linux observability tool diagram, from linux.com]
Brendan Gregg on Linux Performance

CPU and memory
  • uptime
  • free -m, cat /proc/meminfo
  • top
  • htop # tends to return a different "top" process list than top/pidstat/nmon
  • atop # collect top like process info over time (when run as cron)
  • pidstat 1 # like top, run every 1 sec, but loggable. from sysstat rpm
  • mpstat -P ALL 1 # cpu utilization stat every 1 second (if no number defined, give avg of past X sec, not too useful)
  • procinfo # procinfo.rpm: http://www.kodkast.com/linux-package-installation-steps?pkg=procinfo
  • slabtop # realtime kernel slab cache utilization # think of slab as block of kernel memory holding cache of identical objects # instead of kmalloc, a block of memory is allocated at once for efficiency and reduced fragmentation
  • kernel, system calls
  • strace -tttT -p `pgrep java` 2>&1 | head -100 # see system calls, exit after getting 100 trace lines. # eg is it reading 0 or 1 bytes at a time and thus spinning the CPU? # note that strace can slow the target ~100x
  • perf top # see perf_events
  • Network
  • sar -n DEV 1 # check network utilization. reported in kBytes/s.
  • sar -n EDEV 1 # rxfram reports frame alignment problems
  • sar -n TCP,ETCP 1 # check TCP stats, eg retransmit. active = outbound, passive = inbound
  • iptraf (ubuntu) iptraf-ng (rhel7) # TUI network IO by port, protocol, live trace [easy]
  • iftop -i eth2 # provide live stats for inter-host traffic
  • nicstat 1 # read/write KB/s, repeat every 1 second
  • netstat
  • ethtool eth0 # speed, duplex, but no collision info
  • mii-tool -vv eth0
  • ibstat # from ofed
  • nfsstat
  • ntop # run server, then http://localhost:3000 or https://localhost:3001
  • ntop -i "eth0,eth1,lo" # run server listening on multiple interfaces (needs the multi-threaded version) # lots of network stats, but mtu info > 1500 is incorrect for v. 5.0 (2012)
  • tcpdump # capture network packets; advanced (see Network Tracing section below)
  • Disk I/O
  • iostat -xz 1 # -x = extended stats; -z = omit zero / idle entries
  • iostat -xn 1 # NFS stats only
  • iotop # iostat in top-like fashion, shows top processes using disk. from the iotop rpm
  • swapon -s
  • hdparm
  • fio # Flexible IO tester (fio rpm)
  • Multipurpose
    vmstat 1 5
    procs   -----------memory---------- ---swap-- -----io----   -system--  ------cpu-----
     r  b   swpd   free   buff  cache     si   so    bi    bo    in     cs us sy id wa st
    15  0      0 7207116  31284 1328320    0    0     0     0    24     12  0  0 99  0  0
    12  0      0 7207084  31284 1328320    0    0     0     0 88618 152257 44  1 55  0  0
    12  0      0 8030280  31284 1328224    0    0     0    12 88673 152364 44  1 55  0  0
    13  0      0 7210160  31284 1328224    0    0     0     0 88686 152315 43  1 55  0  0
    12  0      0 7210284  31284 1328224    0    0     0     0 88683 152378 44  1 55  0  0
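A quick way to read the "r" column above: if the run queue exceeds the CPU count, threads are waiting for a CPU. A minimal sketch, assuming the Linux vmstat column layout shown ("check_runq" is a made-up helper name):

```shell
# Flag vmstat samples where the run queue ("r", column 1) exceeds the
# CPU count, ie threads are waiting for a CPU. Assumes the Linux
# vmstat column layout shown above; non-numeric header lines are skipped.
ncpu=$(getconf _NPROCESSORS_ONLN)
check_runq() {
    awk -v n="$ncpu" '$1 ~ /^[0-9]+$/ && $1 > n { print "saturated:", $0 }'
}
if command -v vmstat >/dev/null; then vmstat 1 2 | check_runq; fi
```

In the sample above, r=15 on a machine with fewer than 15 CPUs would be flagged.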
    
    Advanced tools, eg to perform full software stack analysis




    Determining memory utilization of a process

    The goal is to determine how much memory to request for an HPC batch job, eg what value to give "qsub -l m_mem_free".
    
    Ideally there would be a memory-measurement equivalent of "time -p CMD",
    but AFAIK there isn't.
    So these will help find a decent number for the memory request:
    
    - ps aux -q PID, look at the VSZ (in kB) column of the running process.
    - cat /proc/PID/status, and look for VmPeak and VmSize (value should be similar to ps aux above).
    - run top/htop, find the process, and see how much memory is reported under VIRT.
    - qacct -j JOBID, if accounting is enabled.  
         	The "mem" entry is in "GBytes CPU seconds" :(, but dividing it by ru_wallclock gives some idea.
    	The "maxvmem" is the largest amount of memory the job used.
    - perf_events doesn't seem to offer anything useful for this :(
    - TCMalloc ?
    - Valgrind ?
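To get a number without watching top by hand, one option is to poll VmPeak while the job runs. A rough Linux-only sketch (reads /proc; "peak_of" and "parse_vmpeak" are made-up helper names):

```shell
# Run a command and report the highest VmPeak (peak virtual size, kB)
# seen in /proc/PID/status while it ran - a starting point for
# "qsub -l m_mem_free". Linux-only; "peak_of" is a made-up name.
parse_vmpeak() { awk '/^VmPeak:/ { print $2 }'; }

peak_of() {
    "$@" &                                  # launch the job in the background
    pid=$!
    peak=0
    while kill -0 "$pid" 2>/dev/null; do
        cur=$(cat "/proc/$pid/status" 2>/dev/null | parse_vmpeak)
        if [ -n "$cur" ] && [ "$cur" -gt "$peak" ]; then peak=$cur; fi
        sleep 0.2                           # polling can miss a short-lived spike
    done
    wait "$pid" 2>/dev/null || true
    echo "peak VmPeak: ${peak} kB"
}

peak_of sleep 1
```

Since it samples every 0.2 s, a very short allocation spike can slip through; for a serious number, add headroom on top of what it reports.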
    
    

    Performance Measurement Commands


    These are pretty platform agnostic

    System Built-In tools

    ps -e -o pid,vsz,ruser,comm= | sort -n -k 2 
    	# show all processes, sorted by virtual memory size (column 2)
    ps -efl | sort -n -k 10
    	# show all processes with lots of details, sorted by SZ (column 10)
    
    ps fields:
    rss        RSS      resident set size, the non-swapped physical memory that a task has used (in kiloBytes).
                        (alias rssize, rsz).
    size       SZ       approximate amount of swap space that would be required if the process were to dirty all
                        writable pages and then be swapped out. This number is very rough!
    sz         SZ       size in physical pages of the core image of the process. This includes text, data, and
                        stack space. Device mappings are currently excluded; this is subject to change. See vsz
                        and rss.
    vsz        VSZ      virtual memory size of the process in KiB (1024-byte units). Device mappings are
                        currently excluded; this is subject to change. (alias vsize).
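One thing these fields make easy: summing VSZ across all processes of one name, eg to gauge the total virtual footprint of a service. A small sketch ("sum_kb" is a made-up helper name; remember VSZ double-counts shared pages, so RSS is closer to physical use):

```shell
# Sum column 1 (kB) from stdin; used here to total the VSZ of every
# process with a given name. VSZ overstates real memory use because
# shared pages are counted once per process.
sum_kb() { awk '{ s += $1 } END { print s + 0 }'; }

ps -C java -o vsz= | sum_kb      # total VSZ in kB of all "java" processes
```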
    
    netstat -ta     show current internet services/connections
            -a  : show (a)ll  (include listening port process)
            -n  : ip (n)umber only (no dns lookup)
            -r  : (r)outing table   (change with route cmd)
            -i  : show stat for diff nic (i)nterfaces
            -k ce0 : lots of interface-specific info; a ce NIC will have duplex stat. (Solaris)
    
    netstat -tulpn	# list LISTEN ports and owning processes 
    
    netstat -p	: print the ip-to-mac address table known to the host (Solaris; on Linux -p shows the owning PID/program)
    netstat -k 	: print lots of kernel stats; among them hme0 is Sun's built-in 
    		  happy meal ethernet nic (see sunsolve infodoc 17416 for explanation of these 
    		  undocumented stats, good for troubleshooting network latency; compare against cisco stats.)
    netstat -s	: show high level packet send/receive/fragment info
    vmstat -a   : all 
           -n   : 
           -p   : process owning port
    
    iostat -xn 30	: check disk activity; anything more than 5% busy with avg resp time > 30 ms is bad.
    
    nfsstat
    
    mpstat	10	: processor stats, repeat every 10 seconds
    		: In Solaris, it reports context switch, interrupt, mutex spin, xcal, etc
    		: see http://sunsite.uakom.sk/sunworldonline/swol-08-1998/swol-08-perf.html
    
    cpustat		: find out what cpu is doing...
    
    truss -c -p PID		: count system calls and user time for a process (Solaris; on Linux use strace -c)
    
    lockstat sleep 5	: gather kernel lock stats during the sleep period (5 sec)
    			: solaris, run as root
    protocol
    sysmon
    trapstat, thread list, kmastat, kmausers
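The counters from "netstat -s" above can be turned into a TCP retransmission percentage. A sketch assuming the Linux wording of the counter lines ("retrans_pct" is a made-up helper name):

```shell
# Compute TCP retransmits as a percentage of segments sent, from
# "netstat -s" output. Assumes the Linux counter wording; the kernel's
# own (misspelled) "segments retransmited" is matched by the prefix.
retrans_pct() {
    awk '/segments sent out/   { sent = $1 }
         /segments retransmit/ { re   = $1 }
         END { if (sent > 0) printf "%.2f%%\n", 100 * re / sent }'
}
if command -v netstat >/dev/null; then netstat -s | retrans_pct; fi
```

Sustained values much above a fraction of a percent usually mean packet loss worth chasing.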
    
    
    GUDS
    
    Guds is a script to gather performance stats for Solaris.
    Sample usage is
    ./guds_2.4.5
    ./guds_2.4.5 -qX -H3 -s65040465
    
    -qX is for quiet mode
    -H3 is for running it for 3 hours
    -sNNNNN is the sun case number (info embedded in the dirs created by guds to
    store the files).
    
    It collects lots of info in /var/tmp/CASEID/guds-DATE-TIME/...
    
    Analyzing the data may need a lot of know-how.
    Having a baseline from when things are good to compare against
    the problem period helps.
    
    
    
    
    
    date; mkfile 1000m test; date			 	# create a 1 GB file (filled with 0s); mkfile is Solaris
    date; dd if=/dev/urandom of=test bs=1024 count=1000000 	# same size, but filled with random data.
    
    NIC tuning
    Update /etc/sysctl.conf with the following min values for buffer size:
    1GbE connected systems
    net.core.wmem_max = 262144
    net.core.wmem_default = 262144
    net.core.rmem_max = 262144
    net.core.rmem_default = 262144
    
    10GbE connected systems
    net.core.wmem_max = 16777216
    net.core.wmem_default = 524287
    net.core.rmem_max = 16777216
    net.core.rmem_default = 524287
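A quick check that a host actually meets the minimums above ("check" and "meets_min" are made-up helper names for this sketch):

```shell
# Verify live net.core buffer sizes against the minimums listed above.
# "meets_min" / "check" are made-up helper names.
meets_min() { [ "$1" -ge "$2" ]; }

check() {
    cur=$(sysctl -n "$1" 2>/dev/null || echo 0)
    if meets_min "$cur" "$2"; then
        echo "PASS $1=$cur"
    else
        echo "FAIL $1=$cur (below $2)"
    fi
}

check net.core.rmem_max 262144
check net.core.wmem_max 262144
# after editing /etc/sysctl.conf, apply without a reboot (as root):  sysctl -p
```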
    
    NFS Performance Tuning
  • Use Jumbo Frames (MTU of 9000) if running a specialized application (eg cluster, RAC). Read up on Path MTU Discovery and ensure MSS is also set correctly.
  • For NFS, use TCP instead of UDP, and specify a larger rsize and wsize of 32K instead of the default 8K. Modern Linux actually negotiates the payload size with the NFS server by default, so it is best not to set rsize/wsize any more; Linux allows them to be up to 1MB. Isilon OneFS 7.2 can handle 128K and 512K for r/wsize. NetApp still recommends hard-coding something small like 32K or 64K.
  • The NFS noac option means no attribute cache, which dramatically increases getattr calls. Not recommended unless the application needs it.
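Since modern clients negotiate rsize/wsize, it is worth checking what was actually agreed rather than guessing. A Linux-only sketch that pulls the values out of /proc/mounts ("rw_sizes" is a made-up helper name; "nfsstat -m" shows similar info):

```shell
# Print "mountpoint rsize wsize" for each NFS mount found in
# mount-table lines on stdin. Linux /proc/mounts format assumed.
rw_sizes() {
    awk '$3 ~ /^nfs/ {
        r = w = "?"
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++) {
            if (opt[i] ~ /^rsize=/) { sub("rsize=", "", opt[i]); r = opt[i] }
            if (opt[i] ~ /^wsize=/) { sub("wsize=", "", opt[i]); w = opt[i] }
        }
        print $2, r, w
    }'
}
if [ -r /proc/mounts ]; then rw_sizes < /proc/mounts; fi
```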

    SAR - System Activity Reporter

    SAR is a simple tool to collect performance stats, shipped with most OSes. It typically runs as a cron job collecting iostat/vmstat-style data into a nicely accessible directory structure. Setup and usage are very similar on HP-UX, Sun, AIX, and now even Linux. Very lightweight, so it can run on all machines and keep logs for when trouble arises.
    Also check out ksar and sar2rrd ( http://www.trickytools.com/php/sar2rrd.php )
    SARReporter is moneyware.

    SAR in Linux

    
    yum install sysstat
    It installs /etc/init.d/sysstat, which runs sadc to collect the stats.
    Data is saved in /var/log/sa/saNN, where NN is the calendar day for which the stats were collected.  
    Data is kept for 1 week by default.
    sadc -S SNMP  is needed to collect history info for IP, TCP, UDP, ICMP.
    sadc -S IP6   is needed for the v6 version of the above.
    
    
    sar         # shows most common stats
    sar -r      # reports on memory and swap utilization  (98% util for 4 days on end is okay)
    sar -q      # shows load avg, run queue length
    sar -A      # shows all recorded activities
    
    # show report recorded in the named file with start time of 10pm, end time of mid-nite, 
    # at interval of 1800sec (30min)  (default is 10min interval)
    sar -f /var/log/sa/sa22 -s 22:00:00 -e 23:59:00 -i 1800 
    
    
    Live Network Reporting
    
    sar -n KEYWORD 1		# live network report every 1 sec
    
    sar -n ALL 1 5			# report all network stats (1-sec interval, 5 samples)
    
    sar -n DEV 1			# check network dev utilization.  reported in kBytes/s. 
    sar -n EDEV 1			# dev errors, eg rxfram reports frame alignment errors
    sar -n NFS,NFSD 1		# NFS client and server stats
    sar -n SOCK 1			# sockets in use, incl. IP fragments in use (from mismatched MTU?)
    
    sar -n IP 1				# IPv4 traffic stats (datagrams received/sent)
    sar -n TCP,ETCP 1  		# check TCP stats, and errors eg retransmit.  active = outbound, passive = inbound
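The rxkB/s / txkB/s columns from "sar -n DEV" are easier to judge as a percentage of link speed. A small helper ("link_util" is a made-up name; treats kB as 1000 bytes, close enough for a sanity check):

```shell
# Convert a sar -n DEV throughput figure (kB/s) into percent
# utilization of a link of the given speed (Mbit/s).
# kB/s * 8 = kbit/s;  Mbit * 1000 = link capacity in kbit/s.
link_util() {
    awk -v kbs="$1" -v mbit="$2" \
        'BEGIN { printf "%.1f%%\n", 100 * kbs * 8 / (mbit * 1000) }'
}

link_util 125000 1000     # 125000 kB/s on 1GbE  -> 100.0%
link_util 12500  10000    # 12500 kB/s on 10GbE  -> 1.0%
```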
    
    

    SAR in HP-UX

    
    Basic SAR Setup (from HP-UX sys admin handbook and tooltips, p503):
    
    sar -o /tmp/sar.data 60 300 		# run sar every 60 sec for 300 count, 
    	-o store info in file (bin)
    
    sar -u -f /var/adm/sa/saXX		# read data from file (Solaris, XX = date number)
    sar -u -f /tmp/sar.data			# read data from file (HP-UX)
    	-u display cpu info (similar to iostat and vmstat)
    	-b buffer cache activity, imp for oracle
    	-d disk activity
    	-q avg queue length (if run queue > num of cpu, will have to wait).
    	-w swap info
    
    Setup SAR data collection for HP-UX (should also work for other platform):
    
    http://www.sarcheck.com/sarhowto.htm	(Actually SarCheck.com, but cost money!)
    
    mkdir  /var/adm/sa, 
    then setup root crontab:
    
    #collect sar data  	# every 20 min 8-5, hourly outside normal work 
    0 * * * * /usr/lbin/sa/sa1
    20,40 8-17 * * 1-5 /usr/lbin/sa/sa1
    #reduce the sar data	# generate pre-formatted report focused on business hrs
    5 18 * * * /usr/lbin/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A
    
    # sample for SF + Minsk work hours (10 hours diff)
    0 * * * * /usr/lbin/sa/sa1
    15,30,45 0-8,10-19,23 * * 1-5 /usr/lbin/sa/sa1
    05 21 * * * /usr/lbin/sa/sa2 -i 3600 -A
    
    # sa1 is data collection to /var/adm/sa/saXX
    # sa2 produces a condensed version of the report to /var/adm/sa/sarXX (sar vs sa)
    
    # filenames are reused every month.
    # use sar -A -f /var/adm/sa/saXX to get more detail report than std summary.
    

    SAR in Solaris

    Solaris starts sadc in /etc/rc2.d/S21perf , a daemon to collect sar info.
    	the script actually runs /usr/lib/sa/sadc /var/adm/sa/sa`date +%d`
    
    

    SAR in AIX

    
    AIX has preset entries in the crontab for 'adm'.  Check to ensure the scripts exist.
    sar logs are stored in /var/adm/sa
    
    

    perf_events

    perf_events aka "perf" is a sophisticated observation tool for linux that does in-kernel counting (as opposed to top's sampling at regular intervals, which tends to miss very short-lived processes).
    perf is more friendly to sysadmins; DTrace is targeted more toward developers. But perf still has lots of gotchas to get it fully working (broken stacks, missing symbol tables, etc). See slide 40 of http://www.brendangregg.com/blog/2015-02-27/linux-profiling-at-netflix.html
    Installing perf_events: typically the "perf" package on RHEL (yum install perf), or the linux-tools packages on Ubuntu/Debian.

    Sample performance troubleshooting session with perf (pretty much run everything as root)
    ref: http://www.brendangregg.com/perf.html
    perf list    			# list events
    perf list | grep ext4		# look for fs related event
    
    perf stat CMD 			# count events from the execution of CMD; perf stat collects data until CMD completes 
    perf stat uptime		# run the command uptime and show general cpu usage stat
    perf stat -p PID 		# hit ^C to see CPU usage stat of a running process collected while perf ran
    perf stat -a sleep 5		# system wide CPU usage, collect till CMD "sleep 5" completes
    perf stat -e 'ext4:*' ls	# run the command ls and capture ext4 related events
    perf stat -e 'ext4:*' -a sleep 5     # collect ext4 events of all process, for 5 seconds, then exit and print output to screen
    
    
    perf record -F 99 -a -g -- sleep 10    	# capture stack at freq of 99 Hz for 10 seconds
    					# result captured to file perf.data
    perf record -F 99 -p PID		# profile a running process, until ^C
    
    
    perf report   			# view perf record data stored in perf.data (interactive TUI)
    perf report --sort comm,dso
    perf report -n --stdio		# output report to console
    
    perf annotate --stdio		# annotated instructions w/ %, need debuginfo or will feel like reading assembly dump!
    
    perf script  			  	# print out all events as text
    
    # generate a "flamegraph"  
    perf script > out.perf02  	
    cat out.perf02 | ./stackcollapse-perf.pl | ./flamegraph.pl > out.svg 	
    
    A flame graph visualizes the sampled stack traces: the x-axis is the sample population (sorted, not time), so the wider a frame, the more often that function call was on-CPU. The tool is avail from http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html, which will generate an image map on top of the SVG, allowing for interactive drill down!
    [image: Brendan Gregg flame graph SVG]
    perf one liners
    From http://www.brendangregg.com/blog/2015-02-27/linux-profiling-at-netflix.html
    
    # Sample CPU stack traces for the specified PID, at 99 Hertz, for 10 seconds:
    perf record -F 99 -p PID -g -- sleep 10
    
    # Sample CPU stack traces for the entire system, at 99 Hertz, for 10 seconds:
    perf record -F 99 -ag -- sleep 10
    
    # Sample CPU stack traces, once every 10,000 Level 1 data cache misses, for 5 s:
    perf record -e L1-dcache-load-misses -c 10000 -ag -- sleep 5
    
    # Sample CPU stack traces, once every 100 last level cache misses, for 5 seconds:
    perf record -e LLC-load-misses -c 100 -ag -- sleep 5 
    
    # Sample on-CPU kernel instructions, for 5 seconds:
    perf record -e cycles:k -a -- sleep 5 
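One derived number worth knowing from "perf stat" is IPC (instructions per cycle); low IPC suggests stalls, eg a memory-bound workload. A sketch using perf's CSV output mode (-x,), assuming the usual value,unit,event field order ("ipc" is a made-up helper name):

```shell
# Compute instructions-per-cycle from "perf stat -x," CSV output.
# Assumed field layout: value,unit,event-name,...
ipc() {
    awk -F, '$3 == "cycles"       { c = $1 }
             $3 == "instructions" { i = $1 }
             END { if (c > 0) printf "IPC %.2f\n", i / c }'
}
perf stat -x, -e cycles,instructions -- sleep 1 2>&1 | ipc
```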
    
    

    SE Toolkit

    Virtual Adrian Performance Monitor (SE) Toolkit for Solaris 6 to 10.
    Download: SunFreeware SourceForge
    
    setup env:
    
    export PATH=$PATH:/opt/RICHPse/bin
    export SEPATH=/opt/RICHPse/examples:/opt/RICHPse/toptool
    
    interactive tools:
    
    se zoom.se		# gui, summary status for all components.  Main Window. 
    se live_test.se		# text version of zoom.se
    se multimeter.se	# gui, cpu, cache, vm and locks meter
    
    se toptool.se		# gui, just like top
    se xload.se		# gui, just like xload, show hostname :)
    se infotool.se		# gui, menu to lot of sys info (cpu, net, disk, etc)
    se xit			# gui  wrap on text disk stat dump (xiostat.se)
    
    se -DWIDE pea.se 10	# text, dump top like info to stdout every 10 sec
    se disks.se		# text, dump lot of disk usage info
    
    se webtune.se		# display current, min and max values for perf params
    
    se virtual_adrian.se &	# text, dump warning to stdout if perf problem found 
    			# run cli in background, non permanent, only output to
    			# login screen; process end, all cleared.
    
    -------------------------
    
    # install:
    # pkgrm RICHPse
    # gunzip RICHPse.tar.gz
    # tar xf RICHPse.tar
    # pkgadd -d . RICHPse
    # edit /opt/RICHPse/etc/se_defines, enable "disk nfs"
    
    # alt, can just copy to network drive, and set PATH and SEPATH
    # at least for the interactive tools above
    
    # always run monitor:
    /opt/RICHPse/etc/init.d/vader start     # init.d script to start vader
    se /opt/RICHPse/examples/vader.se       # the "Virtual Adrian Daemon", 
                                            # start on host to be monitored
    
    se /opt/RICHPse/examples/darth.se -h remotehost # gui, start on client.
    	# This gui is the front end of the bg monitor
    
    
    
    #!/bin/sh
    
    # setoolkit-install.sh
    # quick script to setup  and start se toolkit
    
    cd /mnt/sa/share/software/SEtoolkit
    
    pkgadd -d . RICHPse.331
    
    
    (cd /opt/RICHPse/etc; tar cf - *.d) | (cd /etc ; tar xvf - )
    
    # /etc/init.d/mon_cm start
    /etc/init.d/monlog start
    /etc/init.d/percol start
    /etc/init.d/va_monitor start
    /etc/init.d/vader start
    
    

    Solaris Built-In tools

    netstat -ta     show current internet services/connections
            -a  : show (a)ll  (include listening port process)
            -n  : ip (n)umber only (no dns lookup)
            -r  : (r)outing table   (change with route cmd)
            -i  : show stat for diff nic (i)nterfaces
            -k ce0 : lots of interface-specific info; a ce NIC will have duplex stat.
    
    netstat -p	: print the ip-to-mac address table known to the host
    netstat -k 	: print lots of kernel stats; among them hme0 is Sun's built-in 
    		  happy meal ethernet nic (see sunsolve infodoc 17416 for explanation of these 
    		  undocumented stats, good for troubleshooting network latency; compare against cisco stats.)
    netstat -s	: show high level packet send/receive/fragment info
    
    
    
    vmstat -a   : all 
           -n   : 
           -p   : process owning port
    
    iostat -xn 30	: check for disk activity, anything more than 5% busy and avg resp time > 30 ms is bad.
    
    nfsstat
    
    mpstat	10	: processor stats, repeat every 10 seconds
    		: In Solaris, it reports context switch, interrupt, mutex spin, xcal, etc
    		: see http://sunsite.uakom.sk/sunworldonline/swol-08-1998/swol-08-perf.html
    
    cpustat		: find out what cpu is doing...
    
    lockstat sleep 5	: gather kernel lock stats during the sleep period (5 sec)
    			: solaris, run as root
    
    truss -c -p PID		: count system calls and user time for a process (Solaris).  ??
    
    top
    protocol
    sysmon
    
    trapstat, thread list, kmastat, kmausers
    
    

    ganglia

    Ganglia is a good cluster stat collection tool. It does need an agent installed, plus an Apache + PHP server to record the stats and serve out graphs. It claims to be very thin and efficient, thus not robbing performance from an HPC cluster.
    http://ganglia.sourceforge.net/

    Network Tracing

    traceroute DESTINATION-HOST

    tcpdump

    tcpdump is the de-facto standard network tracing command, available on just about every unix platform. It is powerful, but not exactly easy to use.
    
    tcpdump parameters
    -n: numeric output, do not resolve hostnames
    -e: print link-level (ethernet) headers
    -i: interface
    -s 16000		: set capture snap length to 16k 
    -w [FILE]		: write raw packets to file (captures more info than redirecting output)
    			  the file can be opened by wireshark
    host IP-or-NAME		: capture info only related to the specified host
    
    operators accepted:
    &&	= and
    ||	= or
    !	= not
    
    eg cmd of tcpdump [expression]  :
    
    tcpdump host 10.0.71.165			# bi-directional traffic
    tcpdump src  10.0.71.165 -w myfile.tcpdump	# -w write output to file
    tcpdump 'dst net 128.3'
    tcpdump 'src or dst port ftp-data'   
    tcpdump 'ether host 0:d0:b7:a9:c9:5a'
    tcpdump icmp -w myfile.tcpdump			# only icmp traffic, all hosts
    
    tcpdump -v -v host 10.0.71.165			# -v -v, increase verbosity.  
    						# usable in screen output for low amount of traffic
    
    tcpdump host 10.0.71.165 -i eth2 -s 64000 -w myfile.tcpdump	
    		# -i eth2, only traffic flowing thru specified nic
    		# -s 64000, max packet capture size.  64k is probably bigger than needed, but works well
    		# -w write output to file
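Once a capture file exists (eg the myfile.tcpdump written above), a quick "top talkers" summary is often the first question. A sketch assuming modern Linux tcpdump output, where decoded lines read "TIME IP SRC > DST: ..." ("top_talkers" is a made-up helper name):

```shell
# Rank source addresses by packet count in decoded tcpdump output.
# Assumes lines of the form "TIME IP SRC > DST: ..." (Linux tcpdump
# with -n).
top_talkers() {
    awk '$2 == "IP" { print $3 }' | sort | uniq -c | sort -rn | head
}
tcpdump -nr myfile.tcpdump 2>/dev/null | top_talkers
```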
    
    
    
    

    Sample trace output

    showmount -e 192.168.209.30 # VIP
    tcpdump -n host 172.24.51.182  # misconfigured NAT
    18:49:41.964873 eth0 < 172.24.51.182 > tin-linux.zambeel.com: icmp: 172.24.51.182 udp port sunrpc unreachable [tos 0xc0] 
    18:56:24.677264 eth0 < 172.24.51.182 > 10.0.15.11: icmp: 172.24.51.182 udp port sunrpc unreachable [tos 0xc0] 
    18:56:24.679401 eth0 < 172.24.51.182 > 10.0.15.11: icmp: 172.24.51.182 udp port sunrpc unreachable [tos 0xc0] 
    timestamp     src-if ?   source ip     destination prtl  err message
    
    tcpdump -n port sunrpc
    18:54:31.055821 eth0 > 10.0.15.11.1388 > 192.168.209.30.sunrpc: udp 56
                  src-if ? source  ip.port ? dest        ip.port  : protocol + port
    
    
       [z-00D0B7A873CE] # tcpdump -e port sunrpc
    18:15:55.628675 eth2 < 0:e0:52:d:7e:18 0:0:0:0:0:1 ip 74: 10.0.15.11.2499 > 172.24.51.182.sunrpc: S 4260207884:4260207884(0) win 32120  (DF)
    time            if   ? src mac         dst-mac(host)      src ip.port            dest ip.port    TCP SYN and other protocol info
    18:15:55.628696 eth2 > 0:0:0:0:0:0 0:2:e3:0:3b:9d ip 54: 172.24.51.182.sunrpc > 10.0.15.11.2499: R 0:0(0) ack 4260207885 win 0
    time            if   ? src mac         dst-mac(host)      src ip.port            dest ip.port    TCP SYN and other protocol info
    
    Here is an example of messed up translation.
    Note that source & dest mac-address is rewritten on each router hop.
    
    
       [z-00D0B7A871DF] # tcpdump -n | egrep '10\.0\.15\.11|192\.168'
    19:02:43.964206 eth2 > 172.24.51.12.telnet >   10.0.15.11.2411:   P 2646085534:2646085754(220) ack 2623622447 win 32120 {nop,nop,timestamp 2624922 80719743} (DF)
    19:02:43.982115 eth2 < 10.0.15.11.2411     > 172.24.51.12.telnet: . 1:1(0) ack 220 win 31856 {nop,nop,timestamp 80720053 2624922} (DF)
    19:02:45.277592 eth2 B 172.24.51.1.route   > 172.24.51.255.route: rip-resp 25: {192.168.13.0/255.255.255.0}(2) {192.168.14.0/255.255.255.0}(2) {192.168.15.0/255.255.255.0}(2) {192.168.16.0/255.255.255.0}(2) {192.168.17.0/255.255.255.0}(2)[|rip]
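When chasing latency in a trace like the one above, the gaps between packets matter more than the packets themselves. A sketch that converts the HH:MM:SS.usec timestamps to seconds and prints the delta from the previous line ("pkt_delta" is a made-up name; it assumes the trace does not cross midnight):

```shell
# Prefix each decoded packet line with the time gap (seconds) since
# the previous packet, so the biggest stalls can be sorted out.
# Assumes an HH:MM:SS.usec timestamp in field 1.
pkt_delta() {
    awk '{
        split($1, t, "[:.]")
        s = t[1] * 3600 + t[2] * 60 + t[3] + t[4] / 1000000
        if (NR > 1) printf "%.6f %s\n", s - prev, $0
        prev = s
    }'
}
tcpdump -nr myfile.tcpdump 2>/dev/null | pkt_delta | sort -rn | head   # biggest gaps first
```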
    
    
    

    snoop

    snoop is the default network tracer tool installed on solaris. Its default use is much easier than tcpdump's and gives output that is more verbose, ie easier to read.
    snoop host [IP]			# traffic with a given host (as src or dst)
    snoop -r port 25		# all traffic in port 25 (smtp), 
    				# do not resolve ip to dns names
    -s 	= snaplen (default is the whole packet)
    	= 80 for ip header only, 120 for nfs header only
    
    -V	= layer info
    -v	= more verbose than -V, lot of info.
    
    
    from cli :
    
    Usage:  snoop
            [ -a ]                  # Listen to packets on audio
            [ -d device ]           # settable to le?, ie?, bf?, tr?
            [ -s snaplen ]          # Truncate packets
            [ -c count ]            # Quit after count packets
            [ -P ]                  # Turn OFF promiscuous mode
            [ -D ]                  # Report dropped packets
            [ -S ]                  # Report packet size
            [ -i file ]             # Read previously captured packets
            [ -o file ]             # Capture packets in file
            [ -n file ]             # Load addr-to-name table from file
            [ -N ]                  # Create addr-to-name table
            [ -t  r|a|d ]           # Time: Relative, Absolute or Delta
            [ -v ]                  # Verbose packet display
            [ -V ]                  # Show all summary lines
            [ -p first[,last] ]     # Select packet(s) to display
            [ -x offset[,length] ]  # Hex dump from offset for length
            [ -C ]                  # Print packet filter code
    
    

    Sample snoop

    
    Capture traffic on NIC hme0 specific to a host, capture up 8K of the packet, 
    and dump result to an output file:
    snoop -d hme0 -s 8192 -o /tmp/snoop.out host 10.215.55.211
    
    Read input file back.  May wish to use ethereal to read this file for easier access.
    snoop -i /tmp/snoop.out		
    
    
    snoop -s 120 port 25 host 211.196.53.194
    
    titaniumleg.com  mail server traffic monitor
    snoop -r -D -P -s 1500 -c 100000 -o /export/tmp/smtp01.20030122.snoop port 25
    
    snoop -n /dev/null  -D -P -s 1500 -c 100000 -o /export/tmp/smtp01.20030122.snoop port 25
    snoop -D -s 9000 -c 100000 -o jumpstartclient.snoop host jumpstartclient
    -r = do not resolve hostnames  # not in sol 7 snoop
    -D = display number of dropped packets
    -P = non-promiscuous mode capture   (don't use when troubleshooting jumpstart problems).
    -s snaplen
    -c count: number of packets to capture
    -o output file
    
    
    
    ###
    ### more explanations TBA
    ###
    
    

    Ethereal

    Ethereal (renamed Wireshark in July 2006) is a much easier tool to use than tcpdump (or snoop). However, the GUI tool needs to be installed on the machine you run it on. It is typically easiest to run tcpdump to capture to a file, then open the file with the ethereal GUI running on Linux or Windows.
    ethereal  / wireshark (GUI)
    tethereal / tshark    (CLI)
    
    most flags work for both, but not the capture filters!
    
    
    
    snoop-like behaviour (mostly for ethereal):
    -l	: scroll capture 
    -S 	: update as capture is in progress.
    -k 	: start capture immediately  (disable interaction?)
    -f	: specify a capture Filter
    
    -i [IF] : specify interface, eg eth0, hme0
    -n 	: no dns resolution, use ip Number
    
    -V 	: more verbose output, captured data displayed in tree mode instead of 1 line per packet.
    
    -f 	: capture filter expression  (tcpdump notation needed), eg:
    
    	 	tcp port 23 and host 10.0.0.5
    	    	src net 10.0.15.0/24
    	    	dst net 10.0.15.0 mask 255.255.255.0
    	   	[src|dst] host HOSTNAME
    	  	ether [src|dst] host 00:E0:2B:DE:0E:00
    	   	[tcp|udp] [src|dst] port PORTNUM
    
    		eg of multiple host:
    		host 10.215.20.152 || host 10.215.2.21 || host 10.215.19.73
    
    tshark -l -S -k -i eth0 -f "host 12.34.65.123" 		# human readable live capture against a remote host using console
    
    
    ------------------------------------------------------------
    
    ethereal view filter expression 
    [ work in GUI filter box when viewing, 
    NOT as capture filter (which is tcpdump format ]
    
    operators:
               eq, ==    Equal
               ne, !=    Not equal
               gt, >     Greater than
               lt, <     Less Than
               ge, >=    Greater than or Equal to
               le, <=    Less than or Equal to
    
               and, &&   Logical AND
               or, ||    Logical OR
               not, !    Logical NOT
    
    boolean: true (1) or false (0)
    
    some commonly used filter fields:
    
               eth.src == aa-aa-aa-aa-aa-aa
               ip.dst eq www.mit.edu
               ip.src == 192.168.1.1
               ip.addr == 129.111.0.0/16
               eth.src == aa-aa-aa-aa-aa-aa
               eth.src[0:3] == 00:00:83			# filter by vendor by use of slicing
               tcp.port == 80 and ip.src == 192.168.2.1
    		   ip.addr matches both src and dest; these multiply-occurring fields are a bit confusing for packet filtering.
    
    for a generic filter dealing with a specific host, but not necessarily filtering by tcp/udp/icmp:
    ip.dst
    ip.src
    ip.addr
    
    udp
    udp.port
    udp.dstport
    udp.srcport
    
    tcp
    tcp.port
    tcp.dstport
    tcp.srcport
    tcp.seq
    
    icmp
    
    
    bootp.dhcp==true		: frame is dhcp
    bootp.hw.addr
    
    smb.cmd==(unsigned 8 bit int)	: smb protocol command number
    smb.cmd == 0x06  		: cmd is smb unlink
    smb.status != 0x0000	: error code, 4 bytes (aka status), lots of items.
    smb.errcls != 0x0		: error class, 1 byte representing the categories
                  0x0       = Success
                  0x1       = DOS Error
                  0x2       = Server Error
                  0x3 	= Hardware Error
                  0x4	= not an smb cmd
    			Note: netBench fail code 32 may be in DOS or Hrd.
    smb.pid
    smb.mid		(multiplex id)
    smb.uid		(user id, maybe per process)
    nfs.*
    nfs.fh.version != 3		= not sure what this is; it is not the nfs protocol version!
    rpc.programversion != 3		= all packets where the rpc program version is not 3 (ie not NFSv3).
    
    lots of higher level protocol stuff is available, including vlan on switches, etc.
    see the man page on ethereal or tethereal (very long!)
    
    
    In the GUI version, the filter can be just a protocol type, eg: smb.
    That means the smb protocol is present.  A protocol in the filter w/o any comparison operator matches packets where that field is present in the packet.  
    eg: smb.errcls matches packets that contain an smb error class.
    
    
    
    
    Network trace capture with tcpdump or snoop, save to file for viewing with ethereal
    
    tcpdump -i [interface] -s 1500 -w [some-file]
    tcpdump -s 8192 -w netuse.tcpdump 'host 10.0.71.232 or host 10.0.71.15'
    snoop -d hme0  -o /tmp/snoop.out host 10.215.55.211
    
    editcap can be used to trim captured file, or convert between formats
    (tcpdump, ethereal, snoop, ms netmon, etc).
    
    
    Good read on ethereal: http://www.ns.aus.com/ethereal/user-guide/ch03capfilt.html

    Network Troubleshooting

    ping6
    fe80::					# ipv6 link local, autogenerated (like IPv4 169.254...) can't go thru router, but usable in same subnet
    
    fe80::ec4:7aff:fe75:c9b0		# bofh    ipv6 link local 
              0cc4:7a75:c9b0		# bofh    mac
    fe80::221:ccff:fed3:3d41		# cueball ipv6 link local
              0021:ccd3:3d41		# cueball mac 
    
    ping6 fe80::ec4:7aff:fe75:c9b0%2	# works (same host or another host on same subnet).  %2 is interface id as per "ip a"
    
    https://askubuntu.com/questions/546405/ping-adress-ipv6
    
    nmap
    nmap: network scanner
    nmapfe: w/ gui front end, supposed to need gtk, but worked anyway.
    
    nmap -sT -O -PI -PT 172.27.31.0/24	# scan whole class C vlan 31, with os identification.  long output.
    
    sudo nmap -p1-62000 somehost		# scan port 1 to 62000 against machine named somehost
    sudo nmap -sU somehost -p8255		# check UDP port 8255 to see if it is open
    
    
    
    nmap -6 ::1				# scan localhost
    nmap -6 fe80::ec4:7aff:fe75:c9b0	# bofh ipv6 link local , can't go thru router, but should be usable in same "subnet"
    
    nmap -Pn -p8255    -6 fe80::ec4:7aff:fe75:c9b0 	 	# scan works when on same host.  
    nmap -Pn -p1-10000 -6 fe80::ec4:7aff:fe75:c9b0%2 	# from remote, works after specifying which interface to use (%2, per ip a)
    	
    
    
    netcat
    
    
    nc -v -l -n 8666			# create a "netcat server", listening on TCP port 8666
    					# -v verbose
    					# -l listen
    					# -n no DNS lookup
    					# Once client disconnect, nc exit.
    
    nc -v -l -6 -n 8666			# create a "netcat server", listening on ipv6 :::8666 (tcp)
    
    nc SERVER 8666				# connect to server on port 8666
    					# if can't connect, will just exit 1 and return to shell
    					# if connect okay, whatever typed in there will be echo on server process
    
    
    nc ::1 8666				# ipv6 localhost connect work
    nc fe80::ec4:7aff:fe75:c9b0%2 8666	# work (locally or local subnet) after specifying which interface to use (%2, id is per "ip a")
    
    
    dd if=/dev/zero bs=100M count=1 | nc -v -v -n NcServer 8666	# simple connectivity and network performance test 
    					# like a crude version of ttcp w/o installing new sw
    
    
    echo "hi" | nc -4u somehost 8255	# send string "hi" via netcat on IPv4 UDP port 8255 to a remote machine named somehost
    
    
    
    echo "GET / HTTP/1.1" | nc www.google.com 80; echo $?	# can test ability to connect, but won't see output (no Host header, no CRLF line endings)
    
    telnet www.google.com 80 		# simple network connectivity test using the telnet tool
    GET / HTTP/1.1				
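A slightly better-formed request than the above (HTTP wants CRLF line endings, and HTTP/1.1 requires a Host header); a sketch:

```shell
printf 'GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n' \
  | nc www.google.com 80 | head -1
```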
    
    scamper
    scamper is a tool to analyze internet topology and performance.
    In a simpler form, it combines traceroute and mtu discovery.
    
    http://www.caida.org/tools/measurement/scamper/
    
    wget http://www.caida.org/tools/measurement/scamper/code/scamper-cvs-20161113.tar.gz
    tar xfz scamper-cvs-20161113.tar.gz
    ./configure --prefix=$HOME/prog/	# scamper needs root to run; a home-dir prefix is useful mainly when NFS hosted
    make					# will take a while...
    make install
    -or-
    ./scamper/scamper -c "trace -M" -i 10.11.12.13
    	# perform a trace with MTU discovery against specified IP 
      # cannot use DNS hostname, scamper just exits without a word
    	# root priv req
    
    
    other


    Intrusion Detection

    tripwire

    A popular Host-Based IDS. The best place to get it is the OS vendor package; if not available, go to SourceForge. FC5 currently doesn't have a port in yum (as of 2006-09), it is in orphan status. An older binary will work with compat-glibc.
    
    generate site-key, host-key:
    twadmin --generate-keys --site-keyfile ./site.key
    twadmin --generate-keys --local-keyfile ./$HOSTNAME-local.key
    
    
    compile config and policy file from text to binary format:
    
    twadmin --create-cfgfile --cfgfile ./tw.cfg --site-keyfile ./site.key  /floppy/twcfg.txt
    twadmin --create-polfile --cfgfile ./tw.cfg --site-keyfile ./site.key  /floppy/twpol.txt
    
    
    tripwire -m i   	# or --init, to create initial DB of host config.
    
    run tw periodically and monitor db changes; check that all binaries and the db itself have not been changed.
    
    tripwire -m c		# or --check
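To run the check periodically, a hypothetical crontab entry (binary and report paths vary by install):

```
# nightly integrity check at 03:00, report mailed to root
0 3 * * * /usr/sbin/tripwire --check 2>&1 | mail -s "tripwire check" root
```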
    
    
    twprint --print-report --twrfile  $TRIPWIRE-REPORT/host.date.twr
    	# generate a human readable report from result of --check
    
    
    
    Securing tripwire:
    cd $TRIPWIRE-BIN		# Tripwire binaries, eg /usr/local/tripwire/bin, /usr/sbin
    chmod 0500 siggen tripwire twadmin twprint
    md5sum * > tripwire-bin-md5sum.txt	
    cp tripwire-bin-md5sum.txt /floppy	# eject floppy when done!
    
    
    cd $TRIPWIRE-CF		# Tripwire config files, eg /usr/local/tripwire/etc, /var/lib/tripwire
    chmod 0600 tw.cfg* tw.pol*
    
    mv twcfg.txt* twpol.txt* /floppy	
    	# move text config and policy file offline, eject floppy when done!
    
    
    cd $TRIPWIRE-DB		# tripwire DB, eg /usr/local/tripwire/var/db
    
    md5sum * > db-md5sum.txt
    cp db-md5sum.txt /floppy		# eject floppy when done!
    
    chmod -R u=rwX,go-rwx $TRIPWIRE		# eg /usr/local/tripwire
    
    
    
    updating twpol.txt:
    
    /home  -> $(Dynamic) ;
    
    There may be refs specific to a given OS/Distro that need to be updated accordingly.
    eg /var/lost+found may not exist if it is not a dedicated partition.
    /etc/mail/statistics is probably no longer used, etc
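As a sketch, such entries sit inside a twpol.txt rule block; $(Dynamic) is a property mask predefined in the stock policy, and the rulename/severity values here are made up:

```
(
  rulename = "User homes",
  severity = 50
)
{
  /home  -> $(Dynamic) ;
}
```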
    
    
    
    
    Linux Gazette "Intrusion Detection with Tripwire": a good guide for an overview and installation.

    Linux Journal "How to set up Tripwire": a bit more extensive than the above (and makes the reading longer).

    http://www.robertb.id.au/tutorial/tripwire/ Tripwire on FC4

    AIDE

    A newer host-based IDS (Advanced Intrusion Detection Environment), a free Tripwire replacement. Better supported in FC5.
    
    
    http://security.linux.com/article.pl?sid=05/01/19/2238249&tid=129&tid=49&tid=47&tid=35
    
    
    
    

    snort

    A very popular Network-Based IDS.
    
    


    Network Performance & Benchmark

    Jumbo Frame

    Jumbo frame is setting ethernet to use a larger frame so that the data payload vs packet overhead ratio is improved. The default frame uses MTU=1500; jumbo frame sets MTU=9000.
    A larger MTU is more efficient and can therefore yield better performance. eg an NFS packet at 8k fits in a single frame at MTU 9000, instead of 6 frames at MTU 1500.
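Back-of-the-envelope math on the efficiency claim, assuming 40 bytes of IP+TCP headers inside each frame and 38 bytes of per-frame ethernet overhead on the wire (both are simplifying assumptions):

```shell
for mtu in 1500 9000; do
  payload=$((mtu - 40))    # TCP payload carried per frame
  wire=$((mtu + 38))       # bytes on the wire, incl. ethernet overhead
  echo "MTU $mtu: $payload/$wire = $(( 100 * payload / wire ))% payload"
done
# MTU 1500 -> 94% payload; MTU 9000 -> 99% payload
```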

    Caveats in adopting jumbo frame:
    1. Fragmentation should be avoided pretty much universally. A dropped fragment means the whole IP packet isn't received, and it has to be resent after a timeout. This may mean a large packet, with its many fragments, is sent along the wire multiple times.
    2. Devices using different MTUs may stop talking to one another, due to dropped packets, often silently.
    3. Enable MTU=9000 on the switch/router first, then make changes to the hosts.
    4. Frame fragmentation is possible, but for (most?/all?) switches, it seems to happen only at layer 3, not layer 2.
      Thus, only when a packet needs to be ROUTED would a large frame be split into multiple packets.
      At layer 2, a large frame is simply dropped when the given switch port's MTU is exceeded.
    5. Probably easiest to have a performance subnet where all devices use MTU=9000; they can then communicate with each other efficiently, and still work with MTU=1500 devices outside the subnet, as those packets would be fragmented by the router.
    6. The IP layer has a flag "DF", Don't Fragment.
    7. A router will fragment packets at Layer 3 if the destination MTU is smaller than the original sender's.
      The sender must not set DF (don't fragment) for the router to carry out this work (eg, ping -M dont will get a response).
      The recipient would respond, but it sends the reply as multiple smaller packets (so for the echo-reply the router doesn't have to re-fragment; the recipient gets multiple fragments and assembles them into one complete packet).
    8. A router will send ICMP to the sender on MTU mismatch, if the sender's DF flag is set.
      If DF isn't set, the router should perform fragmentation, but if somehow it isn't doing its job, then network connectivity fails silently. This is perhaps the biggest pain point of jumbo frame adoption.
      One case where fragmentation doesn't happen: the switch has a port enabled for MTU 9000, but the host is still at MTU 1500; the "routing switch" may think no fragmentation is needed, resulting in silent failures. AFAIK, there is no "autonegotiation" for MTU size. (there is tcp_mtu_probing, but that's between two hosts at layer 3, not between a switch and a host nic at layer 2).
    9. Path MTU Discovery is a critical protocol for hosts with differing MTU sizes to configure the correct TCP MSS (Max Segment Size), so that routers are not overloaded with fragmentation work. See the Cisco PMTUD protocol writeup. Note the problem section: ICMP being blocked due to security concerns causes a lot of grief in getting a correctly working jumbo frame environment.

    Baby Giant Frame: may be seen in MPLS networks, where the MTU is slightly larger than 1500 so that MPLS can add its protocol data on top of ethernet's 1500-byte MTU without resorting to fragmentation.


    RHEL6 config
    
    # temporarily change MTU on the fly; network still unresponsive for about a minute
    sudo ip link set  dev eth0 mtu 9000
         ip link show dev eth0
    
    # permanently set jumbo frame 
    /etc/sysconfig/network-scripts/ifcfg-eth1   add
    	MTU=9000
    sudo service network reload		# can do this when ssh in to host, unresponsive for about a min
    
    
    
    # tcp_mtu_probing controls TCP Packetization-Layer Path MTU Discovery. 
    # traffic with a host with a smaller MTU would use smaller frames, 
    # but it doesn't seem to work reliably, as it depends on ICMP getting through the network?
    #   0 Disabled (default, since Linux 2.6.17)
    #   1 Disabled by default, enabled when an ICMP black hole detected
    #   2 Always enabled, use initial MSS of tcp_base_mss.
    # ESnet recommend setting to 1 ( http://fasterdata.es.net/host-tuning/linux/ )
    
    /etc/sysctl.conf  add
    	net.ipv4.tcp_mtu_probing=2
    
    Tmp change:
    cat       /proc/sys/net/ipv4/tcp_mtu_probing
    echo 2 >  /proc/sys/net/ipv4/tcp_mtu_probing
    
    Jumbo test
    
    tcpdump -i eth2 'icmp'
    	# capture all ICMP traffic
    	# -i eth2   	capture only for named interface
    	# -v        	verbose output 
    	# -w filename	output file to write to
    
    
    ping -c 3 -s 8970 -M do 10.11.12.201
    	# -s packet size.  IP+ICMP headers add 28 bytes, so for MTU 9000 the max payload is 9000-28=8972
    	# -M do    set Don't Fragment flag:  on MTU mismatch the packet is dropped, would see ICMP error msg
    	# -M dont  do not set Don't Fragment: on MTU mismatch the packet is fragmented into smaller packets
    	# -M want  do PMTU discovery, fragment locally when large: MTU mismatch fails silently
    	# -c 3     send only count (3) number of packets
      # if -M do -s 8800 works, then jumbo frame is working correctly on the interface with MTU=9000
      # however, when testing communication with a host running MTU=1500,
      #   -M dont failing does not necessarily mean things won't work:
      #   PMTUD (Path MTU Discovery) only works for TCP and UDP, and lots of ICMP traffic is dropped by many networks
    
    tracepath 10.11.12.201 
    	# uses UDP to traceroute, no root priv required
    	# discover MTU size between two hosts
    
    iptraf
      # TUI, statistical breakdown, by packet size, on a given interface
    
    
    sudo scamper -c "trace -M" -i 10.11.12.13
    	# perform a trace with MTU discovery against specified IP (cannot use DNS name of host)
    
    

    Benchmark tools

    ttcp
    ttcp
    speed performance test for tcp & udp
    mostly: download a java program, which can be placed in the user's home dir.
    no root priv required
    
    receiving computer:
    java ttcp -r
    java ttcp -r -l 4096 -n 100     # 4096 bytes buffer, 100 of them.
    java ttcp -r -l 32768 -n 4096
    
    Sending computer:
    java ttcp -t 10.215.2.124
    
    
    args: (try these in receiving computer)
    -l N 		= buffer size, 			def 8192, try 32768
    -n N  		= num of buffer to xfer, 	def 2048, try  4096  ==> gives 128 MB xfer.
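Sanity check of the 128 MB figure above:

```shell
echo "$(( 32768 * 4096 / 1024 / 1024 )) MB"   # 32 KB buffers x 4096 buffers = 128 MB
```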
    
    the java version doesn't seem to support these:
    -u		= udp test
    -b N		= change system buffer size.
    -v		= verbose, more stat
    -d 		= dbg
    
    ----
    
    various ports available.
    linux RH comes with a package,
    but it seems rather old, with no central org support.
    
    http://www.netcordia.com/network-services.html
    
    iperf3
    iperf3 is ESnet's rewrite of the original iperf. It performs memory-to-memory network performance tests. There are also some obscure memory-to-disk and disk-to-memory tests after the network test...
    Network perf measurement:
    
    yum install iperf3
    apt-get install iperf3
    
    iperf3 -s			# start server, def use TCP, port 5201
    iperf3 -s -p 80			# start server, listen on port 80.  remain LISTEN after client disconnect
    
    iperf3 -c SERVER -t 30 -i 1	# start as client, connect to SERVER
    				# run test for 30 sec, report progress every 1 sec
    
    iperf3 -c SVR -t 10 -i 1 -P 2		# -P 2 = 2 parallel streams
    iperf3 -c SVR -t 10 -i 1 -P 2 -w 32M 	# -w 32M = use 32M TCP buffer
    
    
    iperf3 is single threaded; for 40G and above networks it may become CPU bound. Refer to (ESnet) iperf3 at 40 Gbps and above for info on running multiple streams on multiple cores.
    Using -Z for zero-copy mode will also reduce cpu load.
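A sketch of that multi-process approach (SVR, the port and core numbers are placeholders; -A pins a process to a CPU core):

```shell
# server: one iperf3 process per port, each pinned to its own core
iperf3 -s -p 5201 -A 0 &
iperf3 -s -p 5202 -A 1 &

# client: one zero-copy stream per server process, pinned likewise
iperf3 -c SVR -p 5201 -A 0 -t 30 -Z &
iperf3 -c SVR -p 5202 -A 1 -t 30 -Z &
wait
```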
    Installation
    BASEDIR=/global/scratch/tin/
    SWDIR=$BASEDIR/sw
    cd $BASEDIR/pub-gh
    git clone https://github.com/esnet/iperf.git
    cd iperf
    ./configure --prefix=${SWDIR}
    make
    make install
    
    export PATH=${SWDIR}/bin:$PATH
    export MANPATH=${SWDIR}/share/man:$MANPATH
    
    


    iozone
    
    wget http://www.iozone.org/src/current/iozone3_434.tar
    tar xf iozone3_434.tar
    cd iozone3_434/src/current
    make linux
    
    ./iozone -a
    ./iozone -a -s 1000 -O
    ./iozone -A -b result.xls 
    
    or
    
    THREAD=2
    DIR=2012.1127.1445
    mkdir $DIR
    time -p ../iozone -i 0 -c -e -w -r 1024k -s 16g -t $THREAD -+n -b $DIR/result.xls
    	# -i 0      run test 0 (write/rewrite) only
    	# -c        include close() in timing
    	# -e        include flush (fsync) in timing
    	# -w        keep temporary files after the run
    	# -r 1024k  record (transfer) size
    	# -s 16g    file size per thread
    	# -t N      throughput mode with N threads
    	# -+n       no retest
    	# -b file   write results in Excel format
    
    
    using qsub script in cluster (mostly to generate random storage traffic):
    
        qsub -b y -p 1024 -l exclusive=true  -l h_rt=600 -q admin.q@$TARGETHOST  $BINDIR/iozone -i 0 -c -e -w -r 1024k -s 4g -t $THREAD -+n -b $DIR/result.xls && \
          echo "   ... submitted iozone on  $TARGETHOST "
    
    
    


    [Doc URL]
    tiny.cc/TOOL
    https://tin6150.github.io/psg/tool.html
    http://psg.ask-margo.com/tool.html
    (cc) Tin Ho. See main page for copyright info.
