HP seems to have set up a package repository for Ubuntu 12.04, which is an improvement since I last checked a few years ago. To use the repo, add the following line to /etc/apt/sources.list:
    deb http://downloads.linux.hp.com/downloads/ManagementComponentPack/ubuntu precise current/non-free
Run “sudo apt-get update”.
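If you set up several hosts, the repository line can be added from a script instead of by hand. The following is only a sketch: the helper name add_repo_line and the file name hp-mcp.list in the usage comment are assumptions, not something HP documents. It appends the line only when it is not already present, so running it twice is harmless.

```shell
#!/bin/sh
# Repository line taken from the text above.
HP_REPO_LINE="deb http://downloads.linux.hp.com/downloads/ManagementComponentPack/ubuntu precise current/non-free"

# add_repo_line FILE (hypothetical helper)
# Append the HP repository line to FILE unless it is already there.
add_repo_line() {
    list_file=$1
    touch "$list_file"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF "$HP_REPO_LINE" "$list_file"; then
        printf '%s\n' "$HP_REPO_LINE" >> "$list_file"
    fi
}

# Example (as root; the file name is an assumption):
#   add_repo_line /etc/apt/sources.list.d/hp-mcp.list && apt-get update
```

Using a separate file under /etc/apt/sources.list.d/ keeps the main sources.list untouched, which makes the change easy to revert.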
You can install a number of software packages from the repository:
- hpsmh: HP System Management Homepage
- hp-smh-template: HP System Management Homepage Templates
- cpqacuxe: HP Array Configuration Utility, web-based
- hp-snmp-agents: Insight Management SNMP Agents for HP ProLiant Systems
- hponcfg: RILOE II/iLO online configuration utility
- hp-health: HP System Health Application and Command line Utility Package
- hpacucli: HP Command Line Array Configuration Utility
- ams: Agentless Monitoring Service for HP ProLiant Gen8 Systems
I installed the iLO configuration utility, System Health App and Array Configuration command line utility.
    root@host:/etc/apt# apt-get install hponcfg hp-health hpacucli
I couldn’t find a working GPG key, so apt-get warns that the packages cannot be authenticated. Answer y at the prompt, or force the package installation.
Useful Commands
You can blink the UID light with the hpuid command.
The hpasmcli is a tool to show and set various system parameters.
    hpasmcli> show powermeter

    Power Meter #1
        Power Reading  : 224
    hpasmcli> show powersupply
    Power supply #1
        Present  : Yes
        Redundant: No
        Condition: Ok
        Hotplug  : Supported
    Power supply #2
        Present  : Yes
        Redundant: No
        Condition: FAILED
        Hotplug  : Supported
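The same output can be checked from a script. As a rough sketch, the function below reads "show powersupply" output on standard input and prints each power supply whose Condition is anything other than "Ok". The function name failed_power_supplies is hypothetical, and the parsing assumes the layout shown above; the usage comment relies on hpasmcli's -s option for running a command non-interactively.

```shell
#!/bin/sh
# failed_power_supplies (hypothetical helper)
# Read "show powersupply" output on stdin and print one line per power
# supply whose Condition is not "Ok", e.g. "Power supply #2 (FAILED)".
failed_power_supplies() {
    awk '
        # Remember the most recent "Power supply #N" heading.
        /^[ \t]*Power supply #/ {
            name = $0
            sub(/^[ \t]+/, "", name)
        }
        # Inspect the Condition line that belongs to that heading.
        /Condition[ \t]*:/ {
            split($0, kv, ":")
            cond = kv[2]
            gsub(/[ \t]/, "", cond)
            if (cond != "Ok") print name " (" cond ")"
        }
    '
}

# Example:
#   hpasmcli -s "show powersupply" | failed_power_supplies
```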
A command called “hplog” can be used to view the log:
    root@host:~# hplog -v

    ID   Severity       Initial Time      Update Time       Count
    -------------------------------------------------------------
    1026 Repaired       13:44  04/10/2012 13:46  04/10/2012 0001
    LOG: System Power Supplies Not Redundant

    1027 Repaired       13:46  04/10/2012 13:48  04/10/2012 0001
    LOG: System Power Supply: General Failure (Power Supply 2)

    1028 Repaired       13:46  04/10/2012 13:48  04/10/2012 0001
    LOG: System Power Supplies Not Redundant

    1029 Repaired       13:48  04/10/2012 13:49  04/10/2012 0001
    LOG: System Power Supply: General Failure (Power Supply 2)

    1030 Repaired       13:48  04/10/2012 13:49  04/10/2012 0001
    LOG: System Power Supplies Not Redundant
And show system health information (fans, power supplies, temperatures):
    root@host:~# hplog -f
    ID  TYPE        LOCATION        STATUS  REDUNDANT  FAN SPEED
     1  Var. Speed  I/O Zone        Normal  Yes        Medium ( 45)
     2  Var. Speed  I/O Zone        Normal  Yes        Medium ( 45)
     3  Var. Speed  Processor Zone  Normal  Yes        Medium ( 41)
     4  Var. Speed  Processor Zone  Normal  Yes        Low    ( 36)
     5  Var. Speed  Processor Zone  Normal  Yes        Low    ( 36)
     6  Var. Speed  Processor Zone  Normal  Yes        Low    ( 36)

    root@host:~# hplog -p
    ID  TYPE        LOCATION         STATUS  REDUNDANT
     1  Standard    Pwr. Supply Bay  Normal  No
     2  Standard    Pwr. Supply Bay  Failed  No

    root@host:~# hplog -t
    ID  TYPE          LOCATION         STATUS  CURRENT    THRESHOLD
     1  Basic Sensor  I/O Zone         Normal  105F/ 41C  158F/ 70C
     2  Basic Sensor  Ambient          Normal   68F/ 20C  102F/ 39C
     3  Basic Sensor  CPU (1)          Normal   86F/ 30C  260F/127C
     4  Basic Sensor  CPU (1)          Normal   86F/ 30C  260F/127C
     5  Basic Sensor  Pwr. Supply Bay  Normal  111F/ 44C  170F/ 77C
     6  Basic Sensor  CPU (2)          Normal   86F/ 30C  260F/127C
     7  Basic Sensor  CPU (2)          Normal   86F/ 30C  260F/127C
Array Configuration Utility
The “hpacucli” is a Smart Array configuration tool. Some examples (the prompt is the =>):
    root@host:~# hpacucli
    HP Array Configuration Utility CLI 9.30.15.0
    Detecting Controllers...Done.
    Type "help" for a list of supported commands.
    Type "exit" to close the console.

    => ctrl all show

    Smart Array P400 in Slot 3                (sn: P61630D9SV063I)

    => ctrl slot=3 show

    Smart Array P400 in Slot 3
       Bus Interface: PCI
       Slot: 3
       Serial Number: P61630D9SV063I
       Cache Serial Number: PA2270H9VV23RN
       RAID 6 (ADG) Status: Enabled
       Controller Status: OK
       Hardware Revision: D
       Firmware Version: 7.22
       Rebuild Priority: Medium
       Expand Priority: Medium
       Surface Scan Delay: 3 secs
       Surface Scan Mode: Idle
       Wait for Cache Room: Disabled
       Surface Analysis Inconsistency Notification: Disabled
       Post Prompt Timeout: 15 secs
       Cache Board Present: True
       Cache Status: OK
       Cache Ratio: 25% Read / 75% Write
       Drive Write Cache: Enabled
       Total Cache Size: 512 MB
       Total Cache Memory Available: 464 MB
       No-Battery Write Cache: Enabled
       Cache Backup Power Source: Batteries
       Battery/Capacitor Count: 1
       Battery/Capacitor Status: OK
       SATA NCQ Supported: True

    => ctrl slot=3 array all show

    Smart Array P400 in Slot 3

       array A (SAS, Unused Space: 0 MB)

    => ctrl slot=3 array A show

    Smart Array P400 in Slot 3

       Array: A
          Interface Type: SAS
          Unused Space: 0 MB
          Status: OK
          Array Type: Data

    => ctrl slot=3 physicaldrive all show

    Smart Array P400 in Slot 3

       array A

          physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
          physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)
          physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 146 GB, OK)
The utility also understands commands directly from the command line:
    root@host:~# hpacucli ctrl slot=3 show status

    Smart Array P400 in Slot 3
       Controller Status: OK
       Cache Status: OK
       Battery/Capacitor Status: OK
E-mail Alerts
To get e-mail out of the system, I installed Postfix.
    root@host:~# apt-get install postfix mailutils
Select “Internet Site”. After installation, do a reconfiguration:
    root@host:~# dpkg-reconfigure postfix
Select “Internet Site” again. Give your username as the recipient for root and postmaster.
Use the default destination list. No forcing of synchronous updates.
Next question is about where to accept mail from. The default is localhost only, which is good for my purposes, because this is not a proper mail server.
For the rest of the questions I just use defaults. For additional security, you can edit the /etc/postfix/main.cf and change the line
    inet_interfaces = all
to:
    inet_interfaces = loopback-only
Restart Postfix. To forward all important mail from the system to yourself, edit /etc/aliases:
    postmaster: root
    root: your@email.here
Then run this command to rebuild the alias database:

    root@host:~# newaliases
Now you will get all root mail. I also like to change root’s full name, which shows up as the sender of the e-mail. This way I can see which host’s root is sending me mail.

    chfn -f "Hostname Root" root
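The /etc/aliases edit described above can also be scripted, which helps when the same setup is repeated on several hosts. This is a minimal sketch, not part of the original setup: the helper name set_alias is hypothetical, and it assumes simple one-line "name: destination" entries.

```shell
#!/bin/sh
# set_alias FILE NAME DEST (hypothetical helper)
# Replace (or add) the entry "NAME: DEST" in an aliases-style file.
# NAME is used inside a regular expression, so keep it to plain names
# such as "root" or "postmaster".
set_alias() {
    file=$1
    name=$2
    dest=$3
    tmp=$(mktemp) || return 1
    # Copy every line except an existing entry for NAME.
    # grep exits non-zero when no lines remain, which is fine here.
    grep -v "^$name:" "$file" > "$tmp" || true
    printf '%s: %s\n' "$name" "$dest" >> "$tmp"
    # Write back with cat so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (as root):
#   set_alias /etc/aliases root your@email.here && newaliases
```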
Hardware Health Check Script
For Smart Array checking, I wrote this little script and put it in /usr/local/sbin/smart_array_check:
    #!/bin/sh

    SLOT="3"     # Your controller slot number
    ARRAY="A"    # Your array letter
    EMAIL="root" # You can put your e-mail address here

    # This function is called if checks don't pass. Send e-mail.
    Notify() {
        SUBJECT=$1
        {
            echo $SUBJECT
            echo
            echo Check date: $(date +"%F %T%:::z")
            echo
            echo Controller Status:
            hpacucli ctrl slot=$SLOT show
            echo Array Status:
            hpacucli ctrl slot=$SLOT array $ARRAY show
            echo Physical Drives:
            hpacucli ctrl slot=$SLOT physicaldrive all show
            echo Physical Drive Details:
            for DRIVE in $(hpacucli ctrl slot=$SLOT physicaldrive all show | grep physicaldrive | awk '{ print $2 }')
            do
                hpacucli ctrl slot=$SLOT physicaldrive $DRIVE show
            done
        } | mail -s "$SUBJECT" $EMAIL
    }

    # Check that there's a line saying 'Controller Status: OK' etc.

    hpacucli ctrl slot=$SLOT show status \
        | grep -q 'Controller Status: OK' \
        || Notify "Smart Array CONTROLLER FAILURE at $(hostname)"

    hpacucli ctrl slot=$SLOT show status \
        | grep -q 'Cache Status: OK' \
        || Notify "Smart Array CACHE FAILURE at $(hostname)"

    hpacucli ctrl slot=$SLOT show status \
        | grep -q 'Battery/Capacitor Status: OK' \
        || Notify "Smart Array BATTERY FAILURE at $(hostname)"

    hpacucli ctrl slot=$SLOT array $ARRAY show \
        | grep -q 'Status: OK' \
        || Notify "Smart Array ARRAY FAILURE at $(hostname)"

    # This is for testing:
    #Notify "This is a test"
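The three controller checks in the script each run "hpacucli ... show status" separately. One possible consolidation is to run it once and filter for any status line that is not OK. The sketch below is an alternative, not the script's own method: the function name not_ok_statuses is hypothetical, and it assumes the "Name Status: VALUE" layout shown in the earlier output.

```shell
#!/bin/sh
# not_ok_statuses (hypothetical helper)
# Read "show status" output on stdin and print every "... Status: VALUE"
# line whose VALUE is not exactly "OK". Prints nothing when all is well.
not_ok_statuses() {
    awk '
        /Status:/ {
            line = $0
            sub(/^[ \t]+/, "", line)      # strip leading indentation
            sub(/[ \t\r]+$/, "", line)    # strip trailing whitespace
            if (line !~ /Status: OK$/) print line
        }
    '
}

# Example, reusing SLOT and Notify from the script above:
#   PROBLEMS=$(hpacucli ctrl slot=$SLOT show status | not_ok_statuses)
#   [ -z "$PROBLEMS" ] || Notify "Smart Array PROBLEM at $(hostname)"
```

A side effect of this approach is that a status line the script does not know about by name would also trigger a notification if it reports something other than OK.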
For other hardware health checks I wrote this one and put it in /usr/local/sbin/hw_health_check:
    #!/bin/sh

    EMAIL="root" # You can put your e-mail address here

    # This function is called if checks don't pass
    Notify() {
        # Something went wrong. Send e-mail
        SUBJECT=$1
        {
            echo $SUBJECT
            echo
            echo Check date: $(date +"%F %T%:::z")
            echo
            echo '== Power Supply Status =='
            echo
            hplog -p
            echo '== System Fan Status =='
            echo
            hplog -f
            echo '== Temperatures =='
            echo
            hplog -t
            echo '== HP System Log =='
            hplog -v
        } | mail -s "$SUBJECT" $EMAIL
    }

    # Power supply check with hplog -p:
    #
    # ID  TYPE        LOCATION         STATUS  REDUNDANT
    #  1  Standard    Pwr. Supply Bay  Normal  No
    #  2  Standard    Pwr. Supply Bay  Normal  No
    #
    # A failed power supply looks like this:
    #
    # ID  TYPE        LOCATION         STATUS  REDUNDANT
    #  1  Standard    Pwr. Supply Bay  Normal  No
    #  2  Standard    Pwr. Supply Bay  Failed  No
    #
    # A removed power supply looks like this:
    #
    # ID  TYPE        LOCATION         STATUS  REDUNDANT
    #  1  Standard    Pwr. Supply Bay  Normal  No
    #  2  Standard    Pwr. Supply Bay  Absent  No
    #
    # The total number of power supplies should equal the number of
    # power supplies with the status "Normal"

    TOTAL_PSU_COUNT=$(hplog -p | tail -n +2 | head -n -1 | wc -l)
    OK_PSU_COUNT=$(hplog -p | tail -n +2 | head -n -1 | grep -c Normal)

    if [ "$TOTAL_PSU_COUNT" != "$OK_PSU_COUNT" ]
    then
        Notify "SYSTEM POWER SUPPLY PROBLEM at $(hostname)"
    fi

    # Fan check with hplog -f:
    #
    # ID  TYPE        LOCATION        STATUS  REDUNDANT  FAN SPEED
    #  1  Var. Speed  I/O Zone        Normal  Yes        Medium ( 45)
    #  2  Var. Speed  I/O Zone        Normal  Yes        Medium ( 45)
    #  3  Var. Speed  Processor Zone  Normal  Yes        Medium ( 41)
    #  4  Var. Speed  Processor Zone  Normal  Yes        Low    ( 36)
    #  5  Var. Speed  Processor Zone  Normal  Yes        Low    ( 36)
    #  6  Var. Speed  Processor Zone  Normal  Yes        Low    ( 36)
    #
    # The total number of fans should equal the number of
    # fans with the status "Normal"

    TOTAL_FAN_COUNT=$(hplog -f | tail -n +2 | head -n -1 | wc -l)
    OK_FAN_COUNT=$(hplog -f | tail -n +2 | head -n -1 | grep -c Normal)

    if [ "$TOTAL_FAN_COUNT" != "$OK_FAN_COUNT" ]
    then
        Notify "SYSTEM FAN PROBLEM at $(hostname)"
    fi

    # Temperature check with hplog -t:
    #
    # ID  TYPE          LOCATION         STATUS  CURRENT    THRESHOLD
    #  1  Basic Sensor  I/O Zone         Normal  105F/ 41C  158F/ 70C
    #  2  Basic Sensor  Ambient          Normal   69F/ 21C  102F/ 39C
    #  3  Basic Sensor  CPU (1)          Normal   86F/ 30C  260F/127C
    #  4  Basic Sensor  CPU (1)          Normal   86F/ 30C  260F/127C
    #  5  Basic Sensor  Pwr. Supply Bay  Normal  111F/ 44C  170F/ 77C
    #  6  Basic Sensor  CPU (2)          Normal   86F/ 30C  260F/127C
    #  7  Basic Sensor  CPU (2)          Normal   86F/ 30C  260F/127C
    #
    # The total number of temperature readings should equal the number of
    # temperature readings with the status "Normal"

    TOTAL_TEMP_COUNT=$(hplog -t | tail -n +2 | head -n -1 | wc -l)
    OK_TEMP_COUNT=$(hplog -t | tail -n +2 | head -n -1 | grep -c Normal)

    if [ "$TOTAL_TEMP_COUNT" != "$OK_TEMP_COUNT" ]
    then
        Notify "SYSTEM TEMPERATURE PROBLEM at $(hostname)"
    fi

    # This is for testing:
    #Notify "This is a test"
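The power supply, fan and temperature checks above all follow the same pattern, so they could share one helper. As a rough sketch (the function name count_not_normal is hypothetical), the function below reads an hplog table on standard input and counts the data rows that do not contain "Normal". It recognises data rows by their leading numeric ID, which is an assumption based on the sample output, so the header and blank lines need no tail/head trimming.

```shell
#!/bin/sh
# count_not_normal (hypothetical helper)
# Read "hplog -p", "hplog -f" or "hplog -t" output on stdin and print the
# number of data rows whose text does not contain "Normal".
count_not_normal() {
    awk '
        # Data rows start with a numeric ID; headers and blank lines do not.
        /^[ \t]*[0-9]+[ \t]/ && !/Normal/ { bad++ }
        END { print bad + 0 }
    '
}

# Example, reusing Notify from the script above:
#   if [ "$(hplog -p | count_not_normal)" -gt 0 ]
#   then
#       Notify "SYSTEM POWER SUPPLY PROBLEM at $(hostname)"
#   fi
```

Like the original comparison, this reports no problem when hplog prints nothing at all, so it does not detect a failing hplog command.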
Add both scripts to crontab with “crontab -e”:
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    0 0,6,18,12 * * * smart_array_check
    0 0,6,18,12 * * * hw_health_check
That will run the checks four times a day and e-mail every time there is a failure.
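Because the checks e-mail on every failing run, a fault that persists for days produces four mails a day. If that gets noisy, one option is to mail only when the status changes. The sketch below is an assumption layered on top of the scripts, not part of them: the helper name status_changed and the state file path in the usage comment are hypothetical.

```shell
#!/bin/sh
# status_changed STATE_FILE CURRENT (hypothetical helper)
# Succeed (exit status 0) only when CURRENT differs from the status stored
# in STATE_FILE. The new status is recorded either way. On the very first
# run there is no stored status, so any value counts as a change.
status_changed() {
    state_file=$1
    current=$2
    previous=""
    [ -f "$state_file" ] && previous=$(cat "$state_file")
    printf '%s\n' "$current" > "$state_file"
    [ "$current" != "$previous" ]
}

# Example inside a check script (the state file path is an assumption):
#   if status_changed /var/tmp/psu_status "FAILED"
#   then
#       Notify "SYSTEM POWER SUPPLY PROBLEM at $(hostname)"
#   fi
```

The trade-off is that a single lost mail is never repeated, so this suits hosts where the mail path is known to be reliable.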
I really wish that hp-health actually installed on Ubuntu 12.04 on my DL140 G3. Sadly it doesn’t; I just keep getting:
    Setting up hp-health (9.4.0.1.7-5.) ...
    Trying to identify the Product Name...
    ERROR: This server is NOT supported!
    Error: No supported management controller found
    invoke-rc.d: initscript hp-health, action "start" failed.
    dpkg: error processing hp-health (--configure):
    subprocess installed post-installation script returned error exit status 1
Any pointers as to why?
Look at the hp-health init script and read its comments; you will find the answer there. Is your server PSP-compatible? I think support for it is incomplete.
IMHO, HP servers come with many different iLO models (I know of iLO100, iLO, iLO2, iLO3 and iLO4 across the various server series), and they do not all have the same features.
IMHO, the DL100 series usually gets a “lite” version of the management functions.
HP usually drops support for older server series in new versions of its utilities, the PSP, etc. (see, for example, the release notes for PSP 8.50).
You can try installing HP-OpenIPMI and ipmitool to get some information through IPMI (Google has the answers on how to do it), but it is not the same.
I have the same problem with a DL120 G5 (yes, I bought an iLO100c for the remote screen), but it does not give me everything I wanted.
I am saddened :(
curl http://downloads.linux.hp.com/SDR/hpPublicKey1024.pub | apt-key add -
curl http://downloads.linux.hp.com/SDR/hpPublicKey2048.pub | apt-key add -
https://downloads.linux.hp.com/SDR/keys.html lists their keys, but *sigh*
1. the instructions tell you to download the keys over unencrypted HTTP and blindly trust them *headdesk*
2. the SSL configuration on downloads.linux.hp.com fails the Qualys SSL test because it’s EXPLOITABLE BECAUSE OF A VULNERABILITY TO CVE-2014-0224
3. also they don’t serve the required intermediate certs so curl/wget barf
4. the first of the three keys (“hpPublicKey1024.pub”) is expired, but is still listed as one of the keys to be installed
5. even after you install all those keys, apt-get complains that the packages cannot be authenticated
Hey, this was exactly what I was looking for 🙂 Thank you a lot! This script worked perfectly on our HP ProLiant DL585 G2 server with Debian 8.0.
To monitor RAID controller I can suggest this script: https://networklessons.com/linux/send-e-mail-when-raid-fails-on-hp-proliant-running-linux/