Technical Support Team Lead

This role has been designed as ‘Onsite’ with an expectation that you will primarily work from an HPE partner/customer office.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

Job Description:

HPE Operations is our innovative IT services organization. It provides the expertise to advise, integrate, and accelerate our customers’ outcomes from their digital transformation. Our teams collaborate to transform insight into innovation. In today’s fast-paced, hybrid IT world, being at business speed means overcoming IT complexity to match the speed of actions to the speed of opportunities. Deploy the right technology to respond quickly to market possibilities. Join us and redefine what’s next for you.

Technical skills

Strong hands-on experience with ManageEngine OpManager (OPM) and Applications Manager (APM), including configuration and management of agent-based and agentless monitoring in Central–Probe, distributed, and high-availability (HA) architectures.
Hands-on experience in monitoring Windows and Linux servers, performing performance analysis of CPU, memory, disk, filesystem, and processes, along with log analysis and service monitoring.
In-depth knowledge of SNMP, ICMP, WMI, SSH, and API-based monitoring, with hands-on experience in monitoring and troubleshooting network devices (routers, switches, firewalls), interface performance (bandwidth utilization, latency, packet loss, availability), and a strong understanding of NetFlow, QoS, and traffic analysis fundamentals.
Experience in monitoring web servers (Apache, IIS, Tomcat, etc.), JVM-based applications, databases (Oracle, MS SQL, MySQL, PostgreSQL), URLs, APIs, and synthetic transactions, with strong knowledge of response time analysis, thread monitoring, and memory leak identification.
Strong experience in alerting, thresholds, and automation, including configuration of thresholds, alert profiles, escalation rules, and maintenance windows, along with noise reduction, false alert suppression, alert correlation, and root cause identification.
Ability to analyse performance trends and capacity utilization across servers, storage, applications, and backups, combined with strong knowledge of credential management, role-based access control, and secure communication mechanisms including SSL/TLS and certificate management.
Strong knowledge of ITIL processes (Incident, Problem, and Change Management), with experience working in 24×7 production environments and proven expertise in documentation and Root Cause Analysis (RCA) preparation.
Disaster Recovery (DR): Knowledge of implementing and maintaining DR plans and executing recovery procedures.
Operating Systems: Experience with systems administration for both Unix/Linux and Windows.
Scripting and Automation (Added Advantage)

Soft skills

Ability to monitor, analyse, and optimize the ITSM infrastructure implemented at DC/DR, including storage utilization, backup performance, and overall capacity planning, to ensure optimal system performance and availability.
Documentation: Experience creating and maintaining technical documentation, Standard Operating Procedures (SOPs), RCAs, and reports.
Problem-Solving
Communication
Teamwork
Attention to Detail

What you’ll do:

**1. Monitor Infrastructure Administration**

Monitor and manage IT infrastructure and application performance using ManageEngine OpManager (OPM) and Applications Manager (APM) to ensure high availability and SLA adherence.
Perform L2-level troubleshooting for alerts related to network, server, and application performance, availability, and capacity issues.
Analyse logs, metrics, and alerts to identify root causes and provide timely resolution or escalation to L3 teams.
Configure and fine-tune monitoring thresholds, alert profiles, escalation policies, and maintenance windows to reduce noise and false positives.
Coordinate with Network, Server, Database, and Application teams during incidents, changes, and planned maintenance activities.
Maintain accurate documentation, incident records, and Root Cause Analysis (RCA) reports while ensuring adherence to ITIL best practices and operational standards in a 24×7 production environment.

**2. Performance and Capac...