Introduction
Management
Monitoring
Security
Analytics
Conclusion
Enterprises must support hundreds or even thousands of applications to meet growing business
demands. This growth has driven up the cost of acquiring and managing servers and storage. Clouds
enable customers to consolidate servers, storage, and database workloads onto a shared hardware
and software infrastructure, but the resultant hybrid IT environments raise challenges for operations
management.
Hybrid IT Operations Management must support several goals including the following.
Increasing Quality of Service: IT organizations are not only trying to drive down costs, they are
also looking at solutions that will simultaneously improve quality of service in terms of
performance, availability and security. Cloud consumers inherently benefit from the high
availability characteristics built into the Cloud.
Enabling Faster Deployment: Building the Cloud infrastructure using standard building block
components (for example, servers, CPUs, storage, and network), configurations, and tools,
enables a streamlined, automated, and simplified deployment process.
Providing Resource Elasticity: The ability to grow and shrink the capacity of a given resource, both
in terms of storage size and compute power, allows applications the flexibility to meet the dynamic
nature of business workloads by scaling vertically and horizontally.
Rapid Provisioning: Services in a Cloud can be rapidly provisioned, often by way of a self-service
infrastructure, providing agility in application deployment. This reduces overall time in deploying
production applications, development platforms, or creating test bed configurations.
There are generally two different operations management perspectives with Cloud. We need to
understand both perspectives to employ appropriate Cloud services to achieve the business and
operational goals.
Lifecycle Management
Mass deployment of Oracle database and WebLogic server instances to the Cloud. In order to support the
scale of Cloud, techniques such as gold image cloning and provisioning via templates/profiles need to be
employed.
Migration of Databases from on-premises to Cloud without changes, which is commonly known as Lift and
Shift.
Troubleshoot an application instance by cloning the production environment to a Compute instance in the
Cloud
Service Management
Back up an on-premises database to the Cloud (for example, Backup to Oracle Cloud, Recover from
Oracle Cloud Backup, and Provision from Cloud backup)
Real-time metering of charges incurred on Cloud resources and subsequent Chargeback to the respective
organizations.
Log Monitoring
Monitor application server log files to detect abnormalities. Harvesting useful information from millions of
log records requires machine learning, pattern recognition, and the ability to cluster and correlate messages.
Performance Monitoring
Periodically check the system from the end user perspective and identify performance issues before the
user encounters them
Troubleshoot WebLogic server performance issues by drilling down the stack. In a Hybrid IT environment
the issue could be either on the provider side or on the enterprise side and may be in any of the tiers (web,
application, database, or system)
System Monitoring
Monitor the CPU and memory usage on servers and forecast the future resource requirements
Oracle Cloud Resources These are the applications and services in the Oracle Public Cloud. Various
service types should be taken into account. For example, SaaS services, applications deployed on PaaS,
and applications deployed on IaaS should all be considered.
Private Cloud Resources Internal private Cloud (including Oracle Cloud Machine) resources need to be
managed as well
Traditional IT There are other local data center resources that are not Cloud-enabled but are integrated
with the Cloud infrastructure.
An integrated approach to monitoring is required for effective operations management and troubleshooting in a
Hybrid IT environment. Four broad areas of operations management are outlined in the conceptual view. These areas
may overlap in some cases, but they provide a reasonable logical boundary for identifying the capabilities required for
operations management.
Management Lifecycle and administrative activities to set up, configure, and run the Cloud services,
hardware and software systems
Monitoring Surveillance of Cloud and non-Cloud resources to ensure proper day to day operations and
escalate any anomalies.
Analytics Consolidate, assimilate, and analyze operational information to harvest meaningful IT and
business insights
Security Monitor, identify and mitigate security threats and manage security information
Figure 2 provides more details into each of the four areas in the conceptual view.
Management
Lifecycle Management
Lifecycle Management is a comprehensive solution that helps administrators automate the processes required to
manage the lifecycle of Cloud services and other resources. It eliminates manual and time-consuming tasks related
to discovery, initial provisioning, patching, orchestration etc.
In a Hybrid IT scenario, the responsibilities are divided between the enterprise and the Cloud provider and the
degree of responsibility depends on the service type. For example, in SaaS, the provider manages the complete
stack including server, OS, platform, and application, whereas in IaaS, the provider manages only the server but the
Figure 4 shows the provisioning and patching lifecycle in a Cloud scenario. Consumers typically start with a trial
subscription and, once satisfied, subscribe to the service, then deploy and configure it. During the lifecycle of the
service, various components of the service may need to be patched. Sometimes this is the responsibility of the
provider and sometimes that of the consumer. In the case of PaaS, the provider is responsible for patching the
platform (such as a database or application server). If the consumer has deployed the platform on IaaS, however, it is
the responsibility of the consumer to patch it. Cloud scale may require a Fail in Place approach, in which patches are
applied to the templates or golden images and new service instances are created from those images rather than fixing
the running instances. Regardless of who applies the patches, due diligence requires analyzing, testing, and approving
them before they are applied, and verifying them afterward.
Cloud orchestration describes the arrangement and coordination of automated tasks, ultimately resulting in a
consolidated process or workflow. In a Hybrid IT environment, most lifecycle activities require coordinating
resources in the Cloud and on-premises as a single unit of work. Provisioning and management tasks need to be
executed in proper sequence due to the dependencies between them. Orchestration capabilities enable this
coordination among resource activities.
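The sequencing requirement can be illustrated with a dependency graph. The following is a minimal sketch, assuming hypothetical task names that are not taken from any Oracle product. It uses the standard Python graphlib module to derive an order in which every task runs only after the tasks it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical lifecycle tasks, each mapped to the tasks it depends on.
# The names are illustrative assumptions, not product terminology.
DEPENDENCIES = {
    "provision_cloud_vm": set(),
    "open_onprem_firewall": set(),
    "install_database": {"provision_cloud_vm"},
    "load_test_data": {"install_database", "open_onprem_firewall"},
    "deploy_application": {"install_database"},
}


def execution_order(dependencies):
    """Return a task order in which each task follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(DEPENDENCIES), start=1):
        print(step, task)
```

A real orchestration engine would also handle retries, rollback, and parallel execution of independent tasks. The sketch covers only the ordering constraint.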
Discovery involves automatically identifying hosts and their software deployments, adding them as manageable
targets, and monitoring and managing their health. You can discover hosts and targets automatically or manually.
Migration and cloning involve moving instances or data from on premises to the Cloud (and vice versa) and making
copies of golden images across the Cloud/Enterprise boundary. This capability is very useful when development and
testing happen in the Cloud, as data and configuration need to be replicated across environments.
Hybrid management refers to integrating the Cloud and on-premises management infrastructure using integration
components such as management agents, hybrid agents, gateways, and proxies. Having coordinated components
allows for single pane of glass management.
Quality Management
Quality management includes application testing, infrastructure testing, and test data management in a Cloud
environment. Test Management, Functional Testing, and Load Testing capabilities ensure the quality of applications.
Infrastructure testing involves realistic, production-scale testing of the application and database infrastructure using
real, production workloads and validating Infrastructure changes. Test Data Management and Data Masking
provide efficient, automated, and secure test system creation capabilities with out-of-the-box templates. Cloud
provides the agility necessary to rapidly build and tear down test environments. In a Hybrid IT environment, two
aspects of quality management are important. The first one is to leverage the Cloud services for running tests. The
second aspect is testing the workloads deployed on the Cloud services.
Service Management
Efficient operations management requires managing the services, service level agreements, and the economics
around them. Service Definition provides the foundation for managing and monitoring the many infrastructure
components of a Service as a single logical entity, which facilitates business-oriented management. Another key
capability is the ability to back up data from on-premises to Cloud and vice versa, and recover or provision from the
backup. The business establishes performance and availability criteria, and the key business activities that a Service
needs to support in order for it to be considered working properly. These criteria form the foundation of a service
level agreement (SLA). SLA management includes defining the SLA, monitoring the metrics defined in the SLA, and
ensuring that the SLA is met. Reporting capabilities allow the definition of custom reports which can be saved in the
Configuration Management
Configuration management involves collecting and managing configuration information for all managed targets
across the enterprise and in the Cloud. Collected configuration information is periodically sent to the Management
Repository. Management tools should enable you to view, save, track, compare, search, and customize collected
configuration information for all known managed targets. Additionally, a visual layout of a target's relationships with
other Cloud and on-premises targets would be a useful capability that allows you to determine a system's structure
by viewing the members of a system and their interrelationships.
Change Management
Change Management is the ability to manage change by creating a baseline, comparing instances with the baseline,
and promoting changes from the baseline to the target instance.
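The baseline workflow can be sketched in a few lines. This is an illustrative example only, assuming that configurations are flat key/value dictionaries. The setting names are hypothetical, and real management tools track far richer configuration models.

```python
def diff_against_baseline(baseline, instance):
    """Compare an instance configuration with a baseline.

    Returns a dict mapping each differing key to a (baseline, instance)
    pair. A key missing on one side is reported as None on that side.
    """
    changes = {}
    for key in sorted(set(baseline) | set(instance)):
        expected = baseline.get(key)
        actual = instance.get(key)
        if expected != actual:
            changes[key] = (expected, actual)
    return changes


def promote_baseline(baseline, instance):
    """Return a copy of the instance with every baseline value applied.

    Keys that exist only on the instance are kept unchanged.
    """
    promoted = dict(instance)
    promoted.update(baseline)
    return promoted


if __name__ == "__main__":
    # Hypothetical settings, for illustration only.
    baseline = {"heap_mb": 2048, "tls": "1.2", "pool_size": 50}
    instance = {"heap_mb": 1024, "tls": "1.2", "debug": True}
    print(diff_against_baseline(baseline, instance))
    print(promote_baseline(baseline, instance))
```

After promotion, the only remaining differences are keys that exist on the instance but not in the baseline, which a reviewer can then accept or remove.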
Compliance Management
Compliance, ensuring that all resources meet the gold standard configuration, is another important aspect of
management. Compliance with organizational standards or best practices ensures efficiency, maintenance, and
ease of operation. Compliance frameworks allow reporting and managing industry and regulatory compliance
standards. Any compliance violation must be appropriately reported or escalated based on the severity.
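As a rough illustration of checking targets against a gold standard and routing violations by severity, consider the sketch below. The rule names, severities, and routing policy are assumptions made for this example and do not come from a specific compliance framework.

```python
# Hypothetical gold-standard rules: setting -> (required value, severity).
GOLD_STANDARD = {
    "audit_logging": (True, "critical"),
    "tls_min_version": ("1.2", "critical"),
    "ntp_enabled": (True, "warning"),
}


def find_violations(target_name, config, rules=GOLD_STANDARD):
    """Return one record per setting that does not match the gold standard."""
    violations = []
    for setting, (required, severity) in rules.items():
        actual = config.get(setting)
        if actual != required:
            violations.append({
                "target": target_name,
                "setting": setting,
                "required": required,
                "actual": actual,
                "severity": severity,
            })
    return violations


def route(violation):
    """Assumed policy: escalate critical violations, report the rest."""
    return "escalate" if violation["severity"] == "critical" else "report"


if __name__ == "__main__":
    config = {"audit_logging": False, "tls_min_version": "1.2", "ntp_enabled": False}
    for violation in find_violations("db01", config):
        print(route(violation), violation["setting"])
```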
Monitoring
Performance Monitoring
Development and operations teams need adequate information to find and fix application issues fast. They need to
identify key resource and software bottlenecks before they impact the end user and understand actual, not
assumed, application flow by following complex software requests across all application components. In a hybrid IT
environment where DevOps is more prevalent, developers need broader access to data that is traditionally
unavailable to them.
Synthetic monitoring is website monitoring that is done using web browser emulation or scripted recordings of web
transactions. Behavioral scripts (or paths) are created to simulate an action or path that a customer or end-user
Troubleshooting issues requires deep transaction visibility into components distributed across Cloud and
on-premises environments. Root cause analysis requires drilling down into transaction traces to identify the root cause.
Another important aspect of performance monitoring is to monitor the performance of applications from the
perspective of the user. Slow page loads, high page response times and failed transactions frustrate the application
users. Monitor and track page performance and get deep visibility into all web pages and transactions to be able to
troubleshoot end user performance issues.
System Monitoring
System Monitoring refers to monitoring heterogeneous infrastructure components on premises and in the cloud.
Another key objective is to forecast capacity requirements. Agents are deployed to monitor infrastructure
components.
Always-on monitoring is the ability to continuously monitor critical target status, even during downtime of the
management instance, and to generate alerts and notifications to the administrator. Capacity planning requires
monitoring resource usage and analyzing the capacity requirements and trends. The objective of infrastructure
monitoring is to monitor your entire IT infrastructure, on premises or in the cloud, from a single platform using an
agent-based architecture.
Log Monitoring
Log files collect a wealth of enterprise information that can provide great business and operational insight if properly
harvested and processed. In a Hybrid IT environment, log files from the Cloud and on premises need to be
consolidated. The objective of log monitoring is to monitor, aggregate, index, analyze, search and explore, and
correlate all log data from applications and infrastructure (both on-premises and Cloud) in real time.
The first step in log monitoring is to parse the log files. Parsing is the ability to read various log files and pick the
information needed for analytics. Prebuilt parsers can be used to parse known target types. Generic parsers can be
used when a purpose-built parser is not available. Meaningful business insight requires you to aggregate the logs
from various sources based on different criteria (e.g. topic or timestamp) and present a unified view. Another useful
capability is topology aware search and exploration of targets and log files with which log files can be effectively
searched and useful information can be harvested.
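The parse-then-aggregate flow described above might look like the following sketch. The log line format (an ISO timestamp, an upper-case level, then free text) and the source names are assumptions for illustration. Real log sources need their own purpose-built parsers.

```python
import re
from datetime import datetime

# Assumed line format: "<ISO timestamp> <LEVEL> <message>".
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_line(source, line):
    """Parse one log line into a record, or return None if it does not fit."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    try:
        timestamp = datetime.fromisoformat(match["timestamp"])
    except ValueError:
        return None
    return {
        "source": source,
        "timestamp": timestamp,
        "level": match["level"],
        "message": match["message"],
    }


def unified_view(logs_by_source):
    """Merge parsed records from all sources, ordered by timestamp."""
    records = []
    for source, lines in logs_by_source.items():
        for line in lines:
            record = parse_line(source, line)
            if record is not None:
                records.append(record)
    return sorted(records, key=lambda record: record["timestamp"])


if __name__ == "__main__":
    logs = {
        "cloud": ["2016-01-12T10:15:02 ERROR Connection refused"],
        "on_prem": ["2016-01-12T10:14:59 INFO Backup started"],
    }
    for record in unified_view(logs):
        print(record["timestamp"], record["source"], record["level"], record["message"])
```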
Clustering uses algorithms to dynamically group relevant information together. This is a very useful capability to
isolate and troubleshoot issues.
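To give a feel for how grouping helps isolate issues, the sketch below uses a deliberately simple stand-in for a clustering algorithm: it masks numeric tokens (counts, durations, IP addresses) in each message and groups messages that share the resulting template. Production log analytics would typically use more sophisticated similarity measures. The sample messages are invented.

```python
import re
from collections import defaultdict

# Matches standalone numbers and dotted numeric tokens such as IP addresses.
_VARIABLE = re.compile(r"\b\d+(?:\.\d+)*\b")


def template_of(message):
    """Replace variable numeric tokens with a placeholder."""
    return _VARIABLE.sub("<*>", message)


def cluster_messages(messages):
    """Group messages by their masked template."""
    clusters = defaultdict(list)
    for message in messages:
        clusters[template_of(message)].append(message)
    return dict(clusters)


if __name__ == "__main__":
    sample = [
        "Connection to 10.0.0.5 timed out after 30 s",
        "Connection to 10.0.0.9 timed out after 45 s",
        "User 42 logged in",
    ]
    for template, members in cluster_messages(sample).items():
        print(len(members), template)
```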
Figure 6 - Security
Security threats can be proactively detected based on pattern recognition and machine learning. When security
threats are detected, you need the ability to investigate using security data analysis. Next steps are to identify the
source and scope of the breach, remediate any problem, capture and preserve security information, and harden the
enterprise from future attacks.
Security Operations Center (SOC) efficiency can be optimized by expediting threat hunting and investigation.
Intuitive SOC analyst workflows ensure seamless navigation from high-level dashboard views to user or threat
specific activity details. Similarly, analysts can leverage dynamic object linking functionality to pivot into user, asset,
or threat intelligence views for on-demand investigative context.
Multi-dimensional behavior baselining is about presenting multiple user behavioral attributes for simultaneous
learning/clustering. Examples of these rich multi-factor baselines are logins, location, time of day, and browser.
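A very reduced illustration of a multi-attribute baseline follows. It is a frequency-style sketch, not a machine learning model: it records the values seen per user for each attribute and reports which attributes of a new login fall outside that history. The attribute names and sample events are assumptions.

```python
from collections import defaultdict

# Attributes tracked per user; chosen to mirror the examples in the text.
ATTRIBUTES = ("location", "time_of_day", "browser")


def build_baseline(events):
    """Record, per user, every value observed for each attribute."""
    baseline = defaultdict(lambda: {name: set() for name in ATTRIBUTES})
    for event in events:
        for name in ATTRIBUTES:
            baseline[event["user"]][name].add(event[name])
    return baseline


def unusual_attributes(baseline, event):
    """Return the attributes of a login never seen before for that user."""
    seen = baseline.get(event["user"])
    if seen is None:
        # Unknown user: every attribute is outside the baseline.
        return list(ATTRIBUTES)
    return [name for name in ATTRIBUTES if event[name] not in seen[name]]


if __name__ == "__main__":
    history = [
        {"user": "alice", "location": "London", "time_of_day": "morning", "browser": "Firefox"},
    ]
    login = {"user": "alice", "location": "Lagos", "time_of_day": "morning", "browser": "Firefox"}
    print(unusual_attributes(build_baseline(history), login))
```

Evaluating several attributes together lets an investigation distinguish a login that deviates on one dimension from one that deviates on all of them.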
The higher possibility of vulnerability and increased security threat in a Hybrid IT environment makes it important to
be aware of users and user sessions in order to investigate issues faster. User session awareness requires
capturing IPs, correlating facts from various sources (such as DHCP, VPN, IDM, host, other logs) and enabling
machine learning of activity by identity and by actual user levels.
A cyber kill chain refers to the sequence of steps a hacker would take to execute a cyber attack. Every organization
must have controls in place to identify the kill chain patterns and mitigate any potential security threats. Discovering
and visualizing the kill chain is a key capability that allows them to do it.
Topology awareness enables detecting multi-tier attacks and lateral movement in applications. The ability to quickly
and reliably detect lateral movement in the network is one of the most important emerging skills in information
security today. Lateral movement refers to the various techniques attackers use to progressively spread through a
network as they search for key assets and data.
A powerful analytics engine that combines rules based correlation and machine learning driven anomaly detection is
required to handle the broadest range of threat patterns.
Analytics
Management, Monitoring, and Security all need analytical capabilities in order to be useful.
Analytical capabilities are not built separately but rather built into each of these areas. Because these capabilities
are common across the domains, they are covered together in this section.
Modern Hybrid IT operations management solutions include these capabilities to add value and to differentiate from
the traditional management tools. These capabilities may not only help run the SOC more efficiently but in many
cases may provide significant business advantage over the competition. Figure 7 shows the key analytical
capabilities.
Figure 7 - Analytics
Machine learning helps analyze data to identify patterns and generate predictive models to produce reliable,
repeatable decisions and results. Machine learning is based on an algorithm or set of algorithms that enable a
computer to recognize patterns in a data set and interpret those patterns in actionable ways. Machine learning
techniques shift the burden from the user asking the right questions to these applications finding the right context-
sensitive answers that the user needs to know.
Anomaly detection is about identifying outliers, the data elements that do not conform to an expected pattern. This
helps in detecting issues before they become major incidents.
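One common way to flag such outliers is a z-score test, shown here as a minimal sketch. The threshold of 2.5 standard deviations and the sample response times are assumptions. Production systems usually use more robust, seasonality-aware methods.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.5):
    """Return (index, value) pairs lying more than `threshold`
    population standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - mu) / sigma > threshold
    ]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds.
    response_ms = [50, 52, 49, 51, 50, 48, 51, 50, 49, 95]
    print(find_outliers(response_ms))
```

Note that in a small sample a single extreme value inflates the standard deviation itself, so the threshold has to be chosen with the sample size in mind.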
Predictive modeling, or forecasting, makes predictions about the future based on past and present data and
analysis of trends. This is especially useful in areas such as capacity planning. In a Hybrid IT environment, service
usage must be closely monitored to better plan future capacity requirements. You need to capture and analyze
resource usage (e.g. Host, Middleware, Database) to ensure that services do not run into bottlenecks.
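For capacity planning, the simplest form of trend-based forecasting is a least-squares line fitted to equally spaced usage samples. The sketch below assumes that a linear trend is adequate, which is often not true for seasonal workloads. The sample values are invented.

```python
def fit_trend(samples):
    """Fit y = intercept + slope * x by least squares, with x = 0, 1, 2, ..."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def forecast(samples, periods_ahead):
    """Extrapolate the fitted line `periods_ahead` steps past the last sample."""
    slope, intercept = fit_trend(samples)
    return intercept + slope * (len(samples) - 1 + periods_ahead)


if __name__ == "__main__":
    # Hypothetical weekly storage usage in percent of capacity.
    usage = [40, 45, 50, 55, 60]
    print(forecast(usage, periods_ahead=3))
```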
Performance analytics focuses on analyzing performance and availability data. You need to identify whether
performance issues are widespread or localized, and understand patterns of performance overhead and their causes,
such as garbage collection, heap issues, and connection leaks.
Correlation is another important capability that is used in multiple domains. It involves identifying relationships
between data from various sources to make intelligent decisions.
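As one concrete instance of identifying a relationship between two data sources, the sketch below computes the Pearson correlation coefficient between two equally sampled metric series. The choice of Pearson correlation and the metric names are assumptions. Correlation in operations tools also covers event and log correlation, which this example does not address.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_var = sum((x - x_mean) ** 2 for x in xs)
    y_var = sum((y - y_mean) ** 2 for y in ys)
    if x_var == 0 or y_var == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(x_var * y_var)


if __name__ == "__main__":
    # Hypothetical samples: active sessions and database CPU percent.
    sessions = [100, 150, 200, 250]
    db_cpu = [20, 31, 39, 52]
    print(round(pearson(sessions, db_cpu), 3))
```

A coefficient close to 1 or -1 suggests that the two metrics move together, which is a hint for further investigation rather than proof of cause.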
The Data Explorer capability allows you to easily search, browse, compare, and contrast systems data and correlate it
with business metrics to gain insight into your applications and systems. Use a simple drag-and-drop mechanism to
build advanced analyses such as aggregation, trending, correlation, categorical, overlay, seasonality, forecasting,
and clustering.
Purpose-built and customizable dashboards are needed by management and monitoring consoles for a unified view of
operational information. Vital information is presented using easily readable graphic representations. The idea is to
present appropriate data in the appropriate format based on the role of the user. Executive dashboards present high
This view shows both the on-premises hybrid management solution managing on-premises and Cloud resources,
and Cloud based operational management solution that monitors heterogeneous resources.
The key logical components of the on-premises hybrid management architecture are described below. These
components are shown in dark red boxes and interactions are shown with red lines.
The Management Agent is an integral component that enables the conversion of an unmanaged host to a managed
host. The Management Agent monitors the targets running on that managed host. Management agents are usually
extended using Plug-ins to support various target types.
Management Service is an application that orchestrates with the Management Agents to discover targets, monitor
and manage them, and store the collected information in a repository for future reference and analysis. It also
renders the user interface for the management console.
Hybrid Cloud Gateway Agents are the Management Agents that provide a communication channel between the
Cloud virtual hosts and the management service deployed in the private network.
Hybrid Cloud Gateway Proxy is the proxy process deployed as part of every Hybrid Cloud Agent deployment that
enables the Hybrid Cloud Gateway Agent to communicate with the Hybrid Cloud Agent. The proxy process is always
initiated by the Hybrid Cloud Gateway Agent.
Hybrid Cloud Agents are the Management Agents deployed on the Cloud virtual hosts that enable the management
service deployed in the private network to monitor and manage Cloud targets.
Hybrid Cloud Agents use the Hybrid Cloud Gateway Proxy on the Oracle Cloud and the Hybrid Cloud Gateway
Agents on the on-premises side to communicate with the on-premises management service. The Hybrid Cloud
Gateway Proxy receives requests from the Hybrid Cloud Agent and streams the requests to the Hybrid Cloud
Gateway Agent. The Hybrid Cloud Gateway Agent forwards the requests to the on-premises management service,
and streams the responses it receives from the on-premises management service back to the Hybrid Cloud Gateway
Proxy. The Hybrid Cloud Gateway Proxy then sends the responses back to the Hybrid Cloud Agent.
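The request path described above can be modeled as a chain of relays. The classes below are a conceptual sketch only: the class and method names are invented for illustration and do not correspond to any Oracle Enterprise Manager API. The sketch also does not model how the proxy process is started by the Hybrid Cloud Gateway Agent.

```python
class ManagementService:
    """Stands in for the on-premises management service."""

    def handle(self, request, trace):
        trace.append("management service")
        return f"ack:{request}"


class GatewayAgent:
    """Forwards requests to the management service and returns its response."""

    def __init__(self, service):
        self.service = service

    def forward(self, request, trace):
        trace.append("gateway agent")
        return self.service.handle(request, trace)


class GatewayProxy:
    """Receives requests on the Cloud side and streams them to the gateway agent."""

    def __init__(self, gateway_agent):
        self.gateway_agent = gateway_agent

    def stream(self, request, trace):
        trace.append("gateway proxy")
        return self.gateway_agent.forward(request, trace)


class HybridCloudAgent:
    """Runs on the Cloud virtual host and originates requests."""

    def __init__(self, proxy):
        self.proxy = proxy

    def send(self, request):
        trace = ["hybrid cloud agent"]
        response = self.proxy.stream(request, trace)
        return response, trace


if __name__ == "__main__":
    agent = HybridCloudAgent(GatewayProxy(GatewayAgent(ManagementService())))
    response, hops = agent.send("metric-upload")
    print(response, " -> ".join(hops))
```

The response travels back along the same chain in reverse, which the nested return values mirror.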
The key components of the Cloud based operations management architecture are the Cloud Agent, the Data
Collector, and the Gateway. These components are shown as red boxes and interactions are shown as grey arrows
in the logical diagram.
The Gateway buffers analytic data and sends it to the Management Cloud via a proxy. All communication with the
Management Cloud happens through the Gateway. The Data Collector extracts target, configuration, metric, and
incident data from the repository and sends the data to the Gateway. The Cloud Agent collects logs and
data from the targets and sends them to the Gateway. The targets may be on premises, in Oracle public Cloud, or in
a third-party public Cloud.
Figure 9 maps Oracle on-premises management offerings to the capabilities view. Oracle Enterprise Manager, with
the newly added Hybrid Cloud management features, provides a number of the management and monitoring capabilities
discussed earlier. Enterprises and Cloud service providers can use Oracle Enterprise Manager to build and operate
their Cloud services. The functionality provided by Enterprise Manager spans the entire Cloud lifecycle and allows
you to set up and manage any type of Cloud service.
Oracle Management Cloud services offer several modern Cloud based operational management capabilities as
mapped in Figure 10. Oracle Management Cloud includes several independent but related Cloud services that
provide modern Cloud-based management capabilities.
Figure 11 shows the mapping of Oracle management offerings to the logical view described earlier. Again, the two
key offerings, Oracle Enterprise Manager and Oracle Management Cloud are shown in the mapping. Although there
are some overlapping features between these, they are uniquely positioned to support the two operations
management perspectives that were discussed earlier in this whitepaper.
The Cloud services and products mapped in the product mapping view are described below.
Oracle Enterprise Manager streamlines and automates complex management tasks across the complete cloud lifecycle.
On-premises administrators can monitor and manage public cloud services, and vice versa. Oracle Cloud services are
managed by the same Oracle Enterprise Manager tools that customers use on-premises to monitor, provision, and maintain
Oracle Databases, Engineered Systems, Oracle Applications, Oracle Middleware, and a variety of third-party systems. This
additionally eliminates the costly consequences of purchasing and learning numerous new tools to manage enterprise
hybrid clouds.
Enterprise Manager Cloud Control now provides you with a single pane of glass for monitoring and managing on-premises,
Oracle Cloud, and Oracle Cloud Machine deployments, all from the same management console. By deploying
Management Agents onto the Oracle Cloud virtual hosts serving your Oracle Cloud services, you are able to manage
Oracle Cloud targets just as you would any other targets. The communication between Management Agents and
on-premises Oracle management service instances is secure from external interference. Support is provided for managing
Oracle Database and Fusion Middleware PaaS targets, as well as JVMD support for monitoring JVMs on your Oracle Cloud
virtual hosts. Oracle Enterprise Manager Cloud Management includes the following key features:
Configuration management including Search and Inventory, comparison between on-premise and cloud
instances, configuration history, and compliance
One-off patching of Oracle Cloud database instances, Database and Java PaaS instances monitoring
Applications are at the core of businesses today. Poor application performance can impact brand perception in
the marketplace and the bottom line. With Oracle Application Performance Monitoring Cloud Service, you are alerted to
issues that impact end users and have the information to solve application problems faster.
Machine learning
Dashboards
APM integration
Purpose-built dashboards
Extensible monitoring
Dashboards
Dynamic runtime Cloud asset discovery eliminates manual target reconciliation for recurring evaluations
Conclusion
IT Operations Management (ITOM) has always been a critical area for IT organizations to deliver business value.
Adoption of Cloud and the advent of new concepts like DevOps, Cloud-native architecture, containerization, and
automation have a paramount impact on traditional ITOM. This impact is coming from two different perspectives.
The first is being able to manage Cloud and on-premises resources alike. Most organizations are considering
migrating to the Cloud if they are not already doing so. ITOM needs to be extended to support this shift by managing
and monitoring the resources whether they are on premises or in the Cloud, ideally through a single pane of glass.
The second perspective is the modern operational capabilities offered by SaaS management solutions. Oracle is
offering value-added SaaS solutions to support the traditional ITOM functions such as Application Performance
Monitoring and infrastructure monitoring. These services use sophisticated algorithms and machine learning to
provide a level of insight that was not commonly available before.
This whitepaper discussed how the traditional ITOM needs to evolve into Hybrid IT Operations Management
(HITOM) by understanding and supporting the Hybrid IT use cases and the capabilities required to manage and
monitor in a Hybrid IT environment. Oracle offers on-premises and Cloud management solutions to help you design
and implement a strategic Hybrid IT operations management platform.
CONNECT WITH US
blogs.oracle.com/oracle
facebook.com/oracle
twitter.com/oracle
oracle.com

Copyright 2016, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the
contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other
warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or
fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are
formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any
means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and
are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are
trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 0116