In that sequence, to be completed within 3 seconds.
BIOS
Do you prefer UEFI? Really??
Anyway, racadm, ipmitool, and some other things are probably common to BIOS and UEFI ;-)
F2  = Setup
F10 = Display Boot Menu
F12 = Force Network Boot

BMC/IPMI was said to default to sharing eth0 with the host OS when the service RJ45 port isn't connected. However, my systems weren't set up that way. To make IPMI share eth0 with the host OS, set it with:
    ipmitool raw 0x2e 0xcc 0x5e 0x2b 0 0x0C 0x01 0x02
Note that IPMI already has a default admin/admin account as user 2. If setting up a new admin account, try renaming the existing one first:
    ipmitool user set name 2 oper
    ipmitool user list 1
F1  Setup
F2  Diagnostics
F12 Select Boot Device    # Need legacy mode to do PXE boot
Enter (or ThinkVantage button)
ctrl-m = Marvell BIOS setup (SATA, RAID config)
ctrl-s = MAC address config
During POST, keys:
tab  Display BIOS POST message
del  run setup (BIOS)
F2   run setup (BIOS)
F11  boot menu
F12  Network boot

RAID controller (AVAGO MegaRAID SAS-MFI BIOS v5.50.03.0, July 2015)
^P   pause (disk scanning?)
^V   skip
^H   WebBIOS
^Y   preboot CLI
del or F2  run setup (AMI BIOS)
F12        boot from ...
del     run setup (AMI BIOS)
alt+F2  EzFlash
F8      BBS POPUP    # ie boot device menu
^B      PXE prompt for Intel I350 on-board NIC
F2  = System Setup
F10 = Lifecycle Controller (BIOS setup)
F11 = Boot Manager
F12 = PXE boot
^A  = Avago RAID controller
HT - Hyperthreading/Logical Processor
/opt/dell/srvadmin/sbin/racadm get BIOS.ProcSettings.LogicalProc            # query HT status
/opt/dell/srvadmin/sbin/racadm set BIOS.ProcSettings.LogicalProc Disabled   # disable HT
/opt/dell/srvadmin/sbin/racadm jobqueue create BIOS.Setup.1-1               # create job to change BIOS setting
/opt/dell/srvadmin/sbin/racadm serveraction powercycle                      # power cycle to effect change
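The set / jobqueue / powercycle sequence above repeats for every BIOS attribute, so it can be wrapped in a small helper. This is only a sketch: the function names and the RACADM and DRY_RUN variables are made up for illustration and are not racadm features. It defaults to dry-run so nothing gets power cycled by accident.

```shell
#!/bin/sh
# Sketch only: RACADM and DRY_RUN are illustrative shell variables, not racadm options.
RACADM="${RACADM:-/opt/dell/srvadmin/sbin/racadm}"

run() {
    # Print the command when DRY_RUN=1 (the default), otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_bios_attr() {
    # $1 = attribute (eg BIOS.ProcSettings.LogicalProc), $2 = value (eg Disabled)
    run "$RACADM" set "$1" "$2" &&
    run "$RACADM" jobqueue create BIOS.Setup.1-1 &&
    run "$RACADM" serveraction powercycle
}
```

Review the printed commands first with `set_bios_attr BIOS.ProcSettings.LogicalProc Disabled`, then rerun with `DRY_RUN=0` to actually apply the change.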
SNC - Sub NUMA Cluster on Skylake
Leave SNC disabled (default).
See Fig 2 of: http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2017/08/01/bios-characterization-for-hpc-with-intel-skylake-processor
Skylake... 14th gen PowerEdge... "introduce a clustering mode called Sub NUMA clustering (SNC). On CPU models that support SNC, enabling SNC is akin to splitting the single socket into two NUMA domains, each with half the physical cores and half the memory of the socket."

Query Sub NUMA Cluster (SNC) modes:
- numactl -H
- /opt/dell/srvadmin/sbin/racadm get BIOS.ProcSettings.SubNumaCluster
- lstopo

- SNC=Disabled (default): a two-socket system shows up as two NUMA domains.
- SNC=Enabled: each socket is split into NUMA domains, so a two-socket system has FOUR NUMA domains. Applications that do not optimize for memory locality would be worse off. Thus, not recommended unless benchmarking and tuning have shown a benefit from this setting.

SNC disabled is the baseline. If SNC is enabled:
- STREAM, WRF, Fluent, which are highly optimized for memory locality, have a slight gain of 1-2%.
- HPL was ~1% worse off.

- /opt/dell/srvadmin/sbin/racadm get BIOS.MemSettings.NodeInterleave
  - Disabled (default) = system supports NUMA (asymmetric) memory config
  - Enabled = memory interleave is supported IF a symmetric memory config is installed. (SNC will not work when memory is configured this way?)

See the racadmSetBios.sh script that sets this and many of the values below.
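A quick way to guess the SNC state from the OS side is to compare the NUMA node count from `numactl -H` with the socket count, per the two-vs-four domain rule above. A minimal sketch; the function names are made up, and it assumes the first line of `numactl -H` looks like "available: N nodes (...)". It only covers the Skylake case described here, other NUMA-related BIOS settings will confuse it.

```shell
#!/bin/sh
numa_node_count() {
    # stdin: `numactl -H` output; prints N from the "available: N nodes (...)" line
    awk '/^available:/ { print $2; exit }'
}

snc_state() {
    # $1 = number of populated sockets; stdin: `numactl -H` output
    nodes=$(numa_node_count)
    if [ "$nodes" = "$1" ]; then
        echo Disabled            # one NUMA domain per socket
    elif [ "$nodes" = "$(( $1 * 2 ))" ]; then
        echo Enabled             # each socket split in two
    else
        echo unknown
    fi
}
```

Usage on a two-socket node: `numactl -H | snc_state 2`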
NUMA on Epyc Rome 7xx2
BIOS.ProcSettings.NumaNodesPerSocket   # defaults to 4, and numactl -H will show a node with 0 RAM :/
BIOS.ProcSettings.ProcVirtualization
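To spot the nodes with 0 RAM without eyeballing the output, filter the "node N size: X MB" lines of `numactl -H`. A small sketch; the function name is made up and the line format is assumed to be the usual numactl one.

```shell
#!/bin/sh
numa_empty_nodes() {
    # stdin: `numactl -H` output; prints the id of every node whose size is 0 MB
    awk '$1 == "node" && $3 == "size:" && $4 == 0 { print $2 }'
}
```

Usage: `numactl -H | numa_empty_nodes`. Empty output means every node has memory attached.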
Memory Operating Mode
racadm help BIOS.MemSettings.MemOpMode
    OptimizerMode       - Optimizer Mode (default)
    SingleRankSpareMode - Single Rank Spare Mode
    MirrorMode          - Mirror Mode
    FaultResilientMode  - Fault Resilient Mode
OppSrefEn=Disabled (default)
DAPC & GFLOPS/watt
The DAPC profile consumes less power than the Performance profile, yet is said to produce essentially the same crunching power. [my actual HPL test lost 7% performance]
See Fig 3 + 4 of: http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2017/08/01/bios-characterization-for-hpc-with-intel-skylake-processor
Skylake... 14th gen PowerEdge...
- In idle state, Performance Profile consumes ~28% more power than DAPC.
- Peak power consumption in DAPC Profile is ~16% less than in Performance Profile.
- STREAM, WRF have essentially the same performance in either Performance or DAPC mode.
- HPL is 1.05% better in Performance than DAPC mode.
- Fluent is ~1.01% better.

/opt/dell/srvadmin/sbin/racadm get BIOS.SysProfileSettings.SysProfile
    SysProfile=PerfOptimized (default)
/opt/dell/srvadmin/sbin/racadm set BIOS.SysProfileSettings.SysProfile PerfPerWattOptimizedOs

Using PerfPerWattOptimizedDapc:
- TBA
Using PerfPerWattOptimizedOs:
- Reduces HPL performance by ~7% on Skylake 6130 2.1 GHz 32 cores 96G on Dell C6420
- Exposes HPL clock calculation bug in a random manner (because it messes with the CPU clock?)
Using LowLatencyOptimizedProfile:
- This was an undocumented profile in 12th Gen PowerEdge
- said to make a world of difference for latency, but draws significant power and keeps all fans blowing at full speed all the time

racadm help BIOS.SysProfileSettings.SysProfile   # list available modes (and dependencies):
- PerfPerWattOptimizedDapc - Performance Per Watt (DAPC)
- PerfPerWattOptimizedOs - Performance Per Watt (OS)
- PerfOptimized - Performance
- PerfWorkStationOptimized - Workstation Performance
- Custom - Custom
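When comparing profiles with your own HPL runs, the two numbers worth computing are the percent change against the baseline profile and GFLOPS per watt. A minimal sketch with made-up function names; the inputs are whatever GFLOPS and wattage you measured.

```shell
#!/bin/sh
pct_change() {
    # $1 = baseline value, $2 = new value; prints percent change with 2 decimals
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (b - a) / a * 100 }'
}

gflops_per_watt() {
    # $1 = measured GFLOPS, $2 = measured watts; prints the ratio with 3 decimals
    awk -v g="$1" -v w="$2" 'BEGIN { printf "%.3f\n", g / w }'
}
```

Example with hypothetical numbers (not measurements): `pct_change 100 93` prints -7.00, ie a 7% loss against the baseline.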
Other BIOS param
When CPU doesn't perform, check these params (racadm):
set BIOS.ProcSettings.LogicalProc Disabled    # non default, but HPC doesn't want HT on
set BIOS.MemSettings.MemOpMode OptimizerMode  # default, should not need to change this for HPC
set BIOS.MemSettings.NodeInterleave Enabled   # non default, not compatible with SubNumaCluster, not typically recommended
set BIOS.ProcSettings.SubNumaCluster Enabled  # non default, only useful if app can better utilize localized mem
set BIOS.SysProfileSettings.SysProfile PerfOptimized             # default
set BIOS.SysProfileSettings.SysProfile PerfPerWattOptimizedDapc  # said to save energy
set BIOS.SysProfileSettings.SysProfile PerfPerWattOptimizedOs    # may save energy, seems to introduce clock/timing bug
set BIOS.ProcSettings.ControlledTurbo Enabled # allow for external control of when to engage turbo? Def: Disabled.
get BIOS.ProcSettings.ProcTurboMode           # def Enabled, changeable only in Custom SysProfile
get BIOS.SysInformation.SystemBiosVersion     # BIOS version

set BIOS.SysProfileSettings.SysProfile Custom           # need custom mode to set the next two options:
set BIOS.SysProfileSettings.ProcCStates Autonomous      # def: Disabled. alt: Enabled
                                                        # to allow proc to operate in all avail power states
set BIOS.SysProfileSettings.UncoreFrequency DynamicUFS  # def: MaxUFS
    # https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/543513

# Perhaps also check:
get BIOS.ProcSettings.UpiPrefetch              # def: Enabled
get BIOS.MemSettings.OppSrefEn                 # def: Disabled
get BIOS.SysProfileSettings.MemFrequency       # def: MaxPerf. Could choose diff speed such as 2666 MHz, 2400, 1866.
get BIOS.SysProfileSettings.ProcC1E            # def: Disabled, but enabled when switched to Custom SysProfile !!
                                               # Enabled = processor is allowed to switch to minimum performance state when idle.
get BIOS.SysProfileSettings.EnergyPerformanceBias   # def: MaxPower (ie Performance).
                                               # alt: BalancedPerformance, BalancedEfficiency, LowPower

Note: Loss of a redundant power supply may cause the system to clock down significantly, eg turbostat may report 800 MHz on a 2.1 GHz CascadeLake 6230. Checking the following may help to see whether not enough power is being supplied to the system:

racadm get System.ChassisInfo
racadm get System.Power
racadm get System.Power.RedundancyPolicy
racadm get System.PowerHistorical
racadm get System.ServerPwr
racadm get iDRAC.Info
racadm get iDRAC.WebServer
racadm get iDRAC.VNCServer
omreport may have data, if can get it to work
dmidecode -t chassis
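For a first check of whether a node has clocked down, count the cores reporting a low frequency. The sketch below reads the "cpu MHz" lines of /proc/cpuinfo instead of turbostat (a swap-in for convenience; those values are only approximate on recent kernels), and both the function name and the threshold are illustrative.

```shell
#!/bin/sh
slow_cores() {
    # stdin: /proc/cpuinfo text; $1 = threshold in MHz
    # prints how many logical CPUs report a "cpu MHz" value below the threshold
    awk -F: -v min="$1" '/^cpu MHz/ { if ($2 + 0 < min) n++ } END { print n + 0 }'
}
```

Usage: `slow_cores 1000 < /proc/cpuinfo`. Idle cores legitimately clock down, so run it while the node is under load; a nonzero count on a busy node is a hint to go look at the power supply info above.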
Undocumented HPL parameter
# set env var for HPL to properly use AVX
export HPL_HOST_ARCH=3   # AVX2, eg Haswell
export HPL_HOST_ARCH=9   # AVX512, eg Skylake
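On a mixed cluster the value can be picked from the CPU flags instead of being hard-coded. A sketch only: the 3 and 9 values are the ones noted above (undocumented, so treat them as unverified), the function name is made up, and it assumes the usual "flags" line of /proc/cpuinfo with avx2 / avx512f entries.

```shell
#!/bin/sh
hpl_host_arch() {
    # stdin: /proc/cpuinfo text
    # prints 9 if avx512f is present, 3 if only avx2 is, nothing otherwise
    awk '/^flags/ {
        line = " " $0 " "                     # pad so every flag is space-delimited
        if (index(line, " avx512f ")) { print 9; exit }
        if (index(line, " avx2 "))    { print 3; exit }
        exit
    }'
}
```

Usage: `arch=$(hpl_host_arch < /proc/cpuinfo); [ -n "$arch" ] && export HPL_HOST_ARCH=$arch`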
KNL BIOS settings
KNL specific cpu/memory config:
racadm help BIOS.MemSettings...
    MemThrottlingMode
        Cltt (def)
        Oltt
    SystemMemoryModel
        All2All
        SNC-2
        SNC-4
        Hemisphere
        Quadrant (def)   # 2x2
    ProcEmbMemMode       # KNL Memory Mode. affects "free -h", "numactl -H"
        Cache (def)
        Memory           # ie flat mode, memory seen in "free -h" and malloc-able.
        Hybrid

BIOS.ProcSettings
    DynamicCoreAllocation=Disabled -- This field enables or disables the OS capability to put logical processors in the idling state in order to reduce power consumption.
    ProcConfigTdp=Nominal -- This field allows reconfiguration of TDP (Thermal Design Power) to lower levels.

AMD can't call it Hyperthreading; they call it SMT. Disabling it requires going to some obscure menu in the BIOS and accepting some waiver stuff before the threads per core can be set (at least on Asus).
Retrieve Serial stored in BIOS
racadm get System.ChassisInfo    # service tag of chassis, eg Dell C6420 chassis
racadm get System...             # service tag of blade/sled
ipmitool fru | grep Serial       # list all fru and filter for serial of sled/blade
ipmitool raw 0x30 0xc8 0x01 0x00 0x0b 0x00 0x00 0x00 | xxd -r   # get Dell C6320 chassis service tag, vintage 2017
More info, eg fetching Dell C6220 II chassis serial, see: ipmi
racadm techsupreport collect                            # start a support collection
racadm jobqueue view
racadm techsupreport export -f tsr_report.zip           # didn't work via singularity img (cuz idrac version diff?)
racadm supportassist exportlastcollection -f tsr.zip    # via sl7-tools vnfs
racadm racreset    # restart rac, like power cycling the service processor; not the wipe-config-and-restore-to-factory-default kind of reset, which has a separate cmd
ipmi
Dell maintains their own version of ipmi that handles complex situations that RHEL7 ipmitool didn't. eg:
./ipmitool delloem lan set shared with lom1
racadm set iDRAC.NIC.Selection 2
with the selections as follows:
1 - Dedicated
2 - LOM1
3 - LOM2
4 - LOM3
5 - LOM4
Ref
Intel E5 v3 (Haswell) BIOS param tuning for Fabric (Omnipath, Infiniband).
- List of many racadm settable params (vs syscfg)
- v5.0.1 of above
- C6420 Spec and Config: BIOS/Memory settings on page 40
- Dell 14th Gen server (skylake) BIOS settings
- Dell skylake Performance Benchmark results (HPL, Stream, SpecInt, Ansys, etc)
- Dell Epyc Rome Benchmark results
- Dell HPC results
SuperMicro (bios) update manager
SUM = Supermicro Update Manager, not the checksum command from the OS!
untar gz, no real need to install
./sum -h                                            # not the checksum command in linux default path...
sum -i 10.10.... -u ADMIN -p ADMIN -c GetBIOSInfo   # use IPMI interface to get info
sum -c GetBIOSInfo                                  # finds firmware version info, etc. run on local machine
sum -c UpdateBios --file BIOS.rom                   # this updates the bios, does not change config
sum -c GetCurrentBiosCfg --file smBiosCf.txt        # write BIOS settings (HT on or off, etc) to file
sum -c GetCurrentBiosCfg > smBiosCf.txt             # should be same, but a couple of places have * (default) next to diff entries. also has SM(c) header, should avoid
vi smBiosCf.txt                                     # make desired changes to BIOS; use hex code for values. eg turn off Hyperthreading
                                                    # remove the first two lines of cfg that had the SM copyright info
                                                    # in KNL, file was simple text file
                                                    # in skylake, file was xml (UEFI bios?)
sum -c ChangeBiosCfg --file smBiosCf.txt --reboot   # update bios config, rebooting host
                                                    # --reboot would actually be a soft reboot telling OS to shutdown (via ACPI?)
sum -h -c ChangeBiosCfg                             # get help on how to make changes to bios
sum -c LoadDefaultBiosCfg
sum -c GetDmiInfo

Select GetCurrentBiosCfg entries
sum -c GetCurrentBiosCfg | egrep --color 'Hyper-Threading|Turbo|CPU\ C\ State=|Cluster\ Mode=|Memory\ Mode='
[Advanced|Boot Feature]
Quiet Boot=01                              // Please enter the value in 2 hexadecimal digits. Default value is <<<01>>>
[Advanced|Processor Configuration]
Intel(R) Hyper-Threading Technology=01     // 01 (Disable), *00 (Enable)
[Advanced|Processor Configuration|Advanced Power Management Configuration|CPU P State Control]
Energy efficient P-state=01                // 00 (Disable), *01 (Enable)
Turbo Mode=01                              // 00 (Disable), *01 (Enable)
[Advanced|Processor Configuration|Advanced Power Management Configuration|CPU C State Control]
CPU C State=01                             // 00 (Disable), *01 (Enable)
[Advanced|Chipset Configuration|North Bridge|Uncore Configuration]
Cluster Mode=00                            // *00 (All2All), 01 (SNC-2), 02 (SNC-4), 03 (Hemisphere), 04 (Quadrant), 05 (Auto)
Memory Mode=01                             // *01 (Flat), 00 (Cache), 02 (Hybrid), 03 (Auto)
MCDRAM Cache Size=02                       // 01 (25% of MCDRAM size), *02 (50% of MCDRAM size)
    Memory Mode = "Flat" or Memory Mode = "Cache" or Memory Mode = "Hybrid" and Memory Mode = "Cache" or Memory Mode = "Hybrid" or Memory Mode = "Auto" and Memory Mode = "Flat" or Memory Mode = "Hybrid" or Memory Mode = "Auto"
Treat MCDRAM as Hot-Pluggable Memory=00    // *00 (no), 01 (yes)
    Memory Mode = "Flat" or Memory Mode = "Cache" or Memory Mode = "Hybrid" and Memory Mode = "Flat" or Memory Mode = "Hybrid" or Memory Mode = "Auto"
OPIO Parallel Training=01                  // *01 (Enable), 00 (Disable)
OPIO Parallel Training Channel Count=08    // Please enter the value in 2 hexadecimal digits. Default value is <<<08>>>
    OPIO Parallel Training = "Enable"
MCDRAM Repair=01                           // 00 (Disable), *01 (Enable)
MCDRAM Diagnostics=01                      // 00 (Disable), *01 (Enable)
    MCDRAM Repair = "Disable"
MCDRAM Data in NVRAM=01                    // 00 (Disable), *01 (Store), 02 (Use), 03 (UseStore)
EDC Demand Scrub=01                        // *01 (Enable), 00 (Disable)
EDC Patrol Scrub=01                        // 00 (Disable), *01 (Enable)
[Advanced|Serial Port Console Redirection|COM1 Console Redirection Settings]
Terminal Type=01                           // 00 (VT100), *01 (VT100+), 02 (VT-UTF8), 03 (ANSI)
Bits per second=07                         // 03 (9600), 04 (19200), 05 (38400), 06 (57600), *07 (115200)
Data Bits=08                               // 07 (7), *08 (8)
Parity=01                                  // *01 (None), 02 (Even), 03 (Odd), 04 (Mark), 05 (Space)
Stop Bits=01                               // *01 (1), 03 (2)
Flow Control=00                            // *00 (None), 01 (Hardware RTS/CTS)
Legacy OS Redirection Resolution=01        // 00 (80x24), *01 (80x25)
Putty KeyPad=01                            // *01 (VT100), 02 (LINUX), 04 (XTERMR6), 08 (SCO), 10 (ESCN), 20 (VT400)
Redirection After BIOS POST=00             // *00 (Always Enable), 01 (BootLoader)
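Reading hex codes against the option list gets tedious, so the text-format dump can be decoded into "setting: label" pairs. A sketch with a made-up function name, assuming one `name=value // code (label), ...` entry per line as in the KNL-style text file above; it will not work on the XML flavor seen on Skylake. The leading * only marks the default and is ignored.

```shell
#!/bin/sh
sum_decode() {
    # stdin: lines such as 'Turbo Mode=01   // 00 (Disable), *01 (Enable)'
    # stdout: 'name: label' for every line whose current value appears in its option list
    awk -F' *// *' 'NF == 2 && index($1, "=") {
        eq   = index($1, "=")
        name = substr($1, 1, eq - 1)
        val  = substr($1, eq + 1)
        gsub(/[ \t]+$/, "", val)
        n = split($2, opts, /, */)
        for (i = 1; i <= n; i++) {
            o = opts[i]
            sub(/^\*/, "", o)                         # "*" marks the default value
            if (substr(o, 1, length(val) + 2) == val " (") {
                label = substr(o, length(val) + 3)
                sub(/\).*$/, "", label)
                print name ": " label
            }
        }
    }'
}
```

Usage: `sum -c GetCurrentBiosCfg | sum_decode`. Free-form entries (the "Please enter the value in 2 hexadecimal digits" kind) have no option list and are skipped.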
SMC IPMI
Info from serverfault
Supermicro has an IPMICFG util that allows for custom commands like below. Get tool from: ftp Google Drive ver 1.31.1
To change vlan tagging:
./IPMICFG-Linux.x86_64 -vlan off
To reset the service processor:
run ipmicfg -fde to reset all settings back to factory default, then ipmicfg -r to do a BMC cold reset. After the reset, run ipmicfg -m to see what the IP is, then try pinging it.
To change which interface is used for IPMI:
sudo ipmiutil lan -e                      # check config
sudo ipmiutil smcoem lanport dedicated    # set the IPMI LAN port mode (dedicated here)
Instead of getting the util, can apparently just use raw commands via ipmitool. Worked in 2020 on a server w/ Cascade Lake processor.
It is slightly unnerving not knowing what it is really doing and if it would work on a server of a different vintage... hopefully it can't do lasting damage.
It didn't look like SUM was able to change the IPMI interface settings.
To get LAN mode:
    ipmitool raw 0x30 0x70 0x0c 0
output (also the value to use for [B] when setting):
    0x00 = Dedicated
    0x01 = Onboard / Shared
    0x02 = Failover

General form: ipmitool raw 0x30 0x70 0x0c [A] [B]
    [A] = 0 is get
    [A] = 1 is set

To set LAN mode dedicated:
    ipmitool raw 0x30 0x70 0x0c 1 0
To set LAN mode onboard/shared:
    ipmitool raw 0x30 0x70 0x0c 1 1
To set LAN mode failover:
    ipmitool raw 0x30 0x70 0x0c 1 2    # this seems to be the default; this setting worked as "shared"

Some say a hard reboot is needed, others say it works right away without a reboot. Apparently the default depends on whether there is a live RJ45 connected to the dedicated IPMI port when power was connected to the computer. So some changes may need a power disconnect to take effect?
Additional SMC IPMI raw commands are in their support FAQ 15868. It is from 2003, and both X8 and X9, AMI or ATEN BIOS, seem to take the same set of commands.
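For scripting across many nodes, the byte returned by the "get" form can be turned into a name using the table above. A minimal sketch; the function name is made up, and it assumes ipmitool prints the byte as two hex digits, possibly with leading whitespace.

```shell
#!/bin/sh
smc_lan_mode_name() {
    # $1 = byte printed by: ipmitool raw 0x30 0x70 0x0c 0   (eg " 01")
    case "$(printf '%s' "$1" | tr -d ' \n')" in
        00) echo dedicated ;;
        01) echo shared ;;
        02) echo failover ;;
        *)  echo unknown; return 1 ;;
    esac
}
```

Usage: `smc_lan_mode_name "$(ipmitool raw 0x30 0x70 0x0c 0)"`. Same caveat as every raw command here: it only tells you what the BMC claims, on boards that take this command at all.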
Use this carefully. This was for an ESC 4000 (G4?) 2U GPU node, to reset the BMC so it can be programmed via cli ipmitool. Changing that thing in the BIOS is weird.
As with all raw commands, use at your own risk! :-D
Asus in general just uses channel 8. But that one machine needed this raw ipmi command to get things reset/fixed:
ipmitool raw 0x32 0x71 0x00 0x01 0x01             # set ipmi/bmc to use shared nic
ipmitool lan set 8 cipher_privs XaaaaaaaaaaaXXX   # reset cipher suite allowed for channel 8
For IPMI and the OS to share a NIC, the NIC needs to support NC-SI. Otherwise, the BMC will need a dedicated connection for IPMI access. eg: Intel I350 has this support.
https://www.gigabyte.com/us/Enterprise/Accessory/CLN4314-rev-10
Asus IPMI raw
As with all raw commands, use at your own risk! :-D
# toggling on/off among the diff settings/interfaces should revive ipmi over lan

# For Asus ESC4000 G4 series for Intel Skylake/Cascadelake
# BMC likely ASpeed ast2500
ipmitool raw 0x32 0x71 0x00 0x01 0x01   # set ipmi/bmc to use shared nic

# For Asus ASMB7-iKVM (P9D-MV)
ipmitool raw 0x32 0x71 0x00 0x01 0x03   # activate ipmi/bmc on lan1 (shared with host/os nic)
ipmitool raw 0x32 0x71 0x00 0x01 0x00   # de-activate ipmi/bmc on lan1 (shared with host/os nic)
ipmitool raw 0x32 0x71 0x00 0x00 0x03   # activate ipmi/bmc on dm_lan1 (dedicated management port)
ipmitool raw 0x32 0x71 0x00 0x00 0x00   # de-activate ipmi/bmc on dm_lan1 (dedicated management port)
ref: https://www.thomas-krenn.com/de/wiki/IPMI_Netzwerkports_von_ASUS_Mainboards_konfigurieren

# Asus ESC4000 E10 is for AMD Epyc Naples/Rome 7xx2 series,
# BMC is ASMB10-iKVM, which uses the newer ASpeed ast2600
Asus Bios Tool
Compared to SuperMicro SUM, it’s more like a centralized management utility, not as compact, but has a lot of functions. Refer to the “BIOS setting” section.
The bloody zip file is a .ova VirtualBox image :-\
.cab file, but once get into bios, have option that read the .cab file and update bios. Windows/DOS NOT needed :)
??tool to manipulate BIOS using cli tool... TBA
Dell Support Assist
singularity exec -B /var/run /global/scratch/tin/singularity-repo/dirac1_dell_idracadm.img \
    /opt/dell/srvadmin/sbin/racadm supportassist collect
Job ID = JID_926261291256
RAC1154: The requested operation is initiated. Run the RACADM jobqueue sub-command, using the job id to check the status of the requested operation.

racadm jobqueue view
-------------------------JOB QUEUE------------------------
[Job ID=JID_926261291256]
Job Name=SupportAssist Collection
Status=Completed
Start Time=[Not Applicable]
Expiration Time=[Not Applicable]
Message=[SRV088: The SupportAssist Collection Operation is completed successfully.]
Percent Complete=
----------------------------------------------------------

# this didn't work, not sure where file went:
/global/scratch/tin/singularity-repo/dirac1_dell_idracadm.img /opt/dell/srvadmin/sbin/racadm supportassist exportlastcollection -f tsr.zip

# ftp server??!!
racadm supportassist exportlastcollection -l ftp://192.168.10.24/share -u myuser -p mypass
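Before exporting, it helps to confirm the collection job actually finished. The sketch below pulls the Status field of one job out of `racadm jobqueue view` output; the function name is made up, and it assumes the one-field-per-line layout of the output captured above.

```shell
#!/bin/sh
job_status() {
    # stdin: `racadm jobqueue view` output; $1 = job id (eg JID_926261291256)
    # prints the Status= value of that job, nothing if the job is not listed
    awk -v id="$1" '
        { sub(/^[ \t]+/, "") }                         # tolerate indented output
        index($0, "[Job ID=" id "]") { found = 1; next }
        found && /^\[Job ID=/        { exit }          # reached the next job
        found && /^Status=/          { sub(/^Status=/, ""); print; exit }
    '
}
```

Usage: `racadm jobqueue view | job_status JID_926261291256`, and only run the export once it prints Completed.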
IBM/Lenovo DSA
Lenovo Dynamic System Analysis (DSA).
Collect system state/log for support to review. Portable version for rhel7 worked for vNFS n0096.s2
Download ( https://support.lenovo.com/us/en/downloads/ds121924 ) the self-extracting ELF .bin and run the program right away. It doesn't seem to leave any binary behind when finished. When finished, it leaves the collected log in /var/log/Lenovo_Support. Probably the same thing that is invoked by the "bios" collect tool, which saves results to usb.
Preliminary okay to give sudo to intern to run... keep a stable binary version in a non-writable place...
wget https://download.lenovo.com/servers/mig/2017/05/17/6836/lnvgy_utl_dsa_dsala7d-10.3_portable_rhel7_x86-64.bi
sudo ./lnvgy_utl_dsa_dsala7d-10.3_portable_rhel7_x86-64.bin
Copyright info about this work
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Pocket Sys Admin Survival Guide: for content that I wrote, (CC)
some rights reserved.
2005,2012 Tin Ho [ tin6150 (at) gmail.com ]
Some contents are "cached" here for easy reference. Sources include man pages, vendor documents, online references, discussion groups, etc. Copyright of those are obviously those of the vendor and original authors. I am merely caching them here for quick reference and avoid broken URL problems.
Where is PSG hosted these days?
http://tiny.cc/tin6150/ New home in 2011.06.
http://tin6150.s3-website-us-west-1.amazonaws.com/psg.html (coming soon)
ftp://sn.is-a-geek.com/psg/psg.html My home "server". Up sporadically.
http://www.fiu.edu/~tho01/psg/psg.html (no longer updated as of 2007-05)