
Resources, Services, and Operations Support Systems

The TMF has done some very insightful things in the last decade. One of the places where I split from
the TMF is the boundary point between operations support systems (OSS) and network technology. My
view has been that OSS/BSS should be dealing with functional entities, high-level service-related intent
models, and never dig into the implementation of functions. The TMF has a more OSS-centric model,
though it may be moving slowly to at least acknowledge the possibility that functional-based OSS/BSS
thinking is acceptable. That would make it easier to reflect transformations in infrastructure into the
service domain without breaking the OSS, and that’s a priority for many operators.

Light Reading did a piece on Telstra’s work in this space. Telstra is Australia’s dominant network
operator, and was (and arguably is) enmeshed in the complicated, politicized, and (many say) doubtful
National Broadband Network (NBN) initiative there. NBN has effectively removed the under-network
from operators and planted it in a separate “national business”, and so Telstra is forced to consider
“overlay” services more seriously than most big network operators. For some time, they’ve grappled
with issues that every operator will face, arising from the decline in profits from connection services.

Right now, practically every operator in the world is struggling to create a new network model that
optimizes capex through hosting and white-box devices, and opex through service lifecycle automation.
At the same time, nearly all are trying to make their operations support and business support systems
(OSS/BSS) more agile and responsive. That’s something like trying to thread a needle while break
dancing. Linking these goals, meaning making the OSS/BSS handle the new infrastructure, continues a
resource-to-service tie that’s been a thorn in the side of operators for a decade or more.

The article says that Telstra is hoping to use the TMF’s Open API definitions to build an OSS that’s more
agile and that creates reusable services to limit the number of specialized implementations of the same
thing that Telstra is currently finding in its operations systems. The barrier cited is that there’s not yet
broad industry support for those APIs, which means that Telstra can’t expect to find OSS elements that
support them and can be integrated. That’s an issue, but to my mind there’s another issue that I’ll get
to later.
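
For those who haven’t worked with the TMF’s Open API program, it helps to picture the kind of interaction involved. Here’s a minimal sketch of a service-order request in the Open-API REST style; the host, path, and payload fields are my own illustrative stand-ins, not the normative TMF specification.

```python
# Illustrative only: the host, path, and payload fields below are stand-ins
# for a TMF-Open-API-style service ordering call, not the normative spec.
import json
import urllib.request

order = {
    "description": "Add managed firewall to existing VPN service",
    "orderItem": [
        {"action": "add", "service": {"name": "managed-firewall"}}
    ],
}

req = urllib.request.Request(
    "https://oss.example.com/serviceOrdering/v4/serviceOrder",  # hypothetical endpoint
    data=json.dumps(order).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    print("Order accepted:", json.loads(resp.read())["id"])
```

The point of APIs like this is that the OSS component on the other side of the call could come from any vendor, which is exactly what makes sellers nervous.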

The no-support issue shouldn’t be surprising for two reasons. First, the TMF is first and foremost a body
to support operations software vendors and provide them a forum for customer engagement. This isn’t
to say that it’s not doing some useful work; I’ve said many times that the NGOSS Contract work the
forum did a decade ago was the definitive, seminal, introduction to event-driven service automation.
The problem is that a lot of the useful work, including NGOSS Contract by the way, doesn’t seem to get
off the launch pad when it comes to actual deployment in the real world.

OSS/BSS vendors are like network equipment vendors; they want all the money for themselves. An
open set of APIs to support the integration of operations software features would at least encourage
buyers to roll their own OSSs from piece parts. It would lower barriers to market entry by letting new
players introduce pieces of the total software package, and let buyers do best-of-breed
implementations. Good for buyers, bad for sellers, in short, and in almost every market the sellers
spend the most on industry groups, trade shows, and other forums.

If Telstra is hoping for a quick industry revolution in acceptance of the TMF’s Open API stuff, I think
they’ll be disappointed, but I don’t think that’s the biggest problem that Telstra faces. For that, let me
quote the Light Reading article briefly as it quotes Telstra’s Johanne Mayer: “We started with one
technology and added another and then NFV came along and helped us add more spaghetti lines. It was
supposed to make things easier and cheaper because it would build on white boxes, but in terms of
management it is more spaghetti lines and that makes it hard to deliver anything fast.” The goal is to
remove resource-level information from OSS systems and emplace it in service domains, and then to
assure that all the service domains map their visions of a given function/feature to a single
implementation (or small set thereof). It’s not easy to do that.

The problem here is one I’ve blogged about many times, and reflected in my ExperiaSphere work (see
THIS tutorial slide set). Operations systems (in the kind of view Telstra wants) are disconnected from
the resources by recognizing that there’s a “service domain” where functions are modeled and
composed into services, and a “resource domain” where the functions are decomposed into resource
commitments. If this explicit separation is maintained, then it’s possible to build services without
making the results brittle in terms of how the services are implemented, and also to provide for
whatever level of multiplicity of implementation of a given function might suit the operator involved. If
there is no separation here, then you can’t remove resource-level information from OSSs because it
would intertwine with service composition.
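
To make that separation concrete, here’s a minimal sketch (with class and field names of my own invention) of the black-box idea: an element decomposes either into child elements or into resource commitments, and service-domain elements are never allowed to carry resource detail directly.

```python
# Hypothetical sketch of service/resource domain separation; class and
# field names are illustrative, not drawn from any standard.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ResourceCommitment:
    """A resource-domain leaf: an actual commitment of infrastructure."""
    resource_type: str          # e.g. "vm", "white-box-port"
    parameters: dict = field(default_factory=dict)

@dataclass
class ModelElement:
    """A black box that decomposes into child elements or resources."""
    name: str
    domain: str                                  # "service" or "resource"
    children: List["ModelElement"] = field(default_factory=list)
    commitments: List[ResourceCommitment] = field(default_factory=list)

    def decompose(self) -> List[ResourceCommitment]:
        # Enforce the separation: service-domain elements never hold
        # resource detail directly, they only reference other elements.
        assert not (self.domain == "service" and self.commitments)
        found = list(self.commitments)
        for child in self.children:
            found.extend(child.decompose())
        return found
```

The assertion is the whole point: if the service domain can never hold resource commitments, then changing the implementation of a function can never break the service model above it.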

It’s fairly easy to convey the notion of service and resource domains, and the mechanism for
composition in each domain, if you presume a model-based approach that recognizes the idea that any
given “element” of a service is a black box that might decompose into resources (resource domain), or
might decompose into other elements (which might be in either the service or resource domain). My
presumption has always been that there are “service models” that describe functional composition, and
resource models that describe resource commitments, and that the juncture of the two is always
“visible”, meaning that service-model “bottoms” must mate with resource-model “tops” in some agile
process (that’s also described in the referenced presentation, but more detail is available HERE, where
you can also find a description of service automation, another Telstra goal).

Service/resource separation and compositional modeling are critical if you intend to expose service-
building through a customer portal. If the modeling follows the approach I’ve described, then lifecycle
management tasks are integrated with the models, which means that support is composed in parallel
with composition of the service features. Moreover, the tie-in between the service and resource layers
creates a set of model elements that present service-abstracted properties to service-builders, and
permits any kind of implementation that meets those properties to be linked.
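
One way to see what “lifecycle management composed in parallel with the service” means is to attach a small state/event table to each model element, in the spirit of NGOSS Contract. The sketch below is my own illustration of that spirit, not a TMF artifact, and all the names are hypothetical.

```python
# Illustrative, NGOSS-Contract-inspired sketch: each model element carries a
# state/event table that maps (state, event) to a lifecycle process. All of
# the names here are hypothetical.

def deploy(element):
    print(f"deploying {element}")

def notify_fault(element):
    print(f"fault on {element}, escalating")

def tear_down(element):
    print(f"tearing down {element}")

LIFECYCLE_TABLE = {
    ("ordered", "activate"): ("deploying", deploy),
    ("active", "fault"): ("degraded", notify_fault),
    ("active", "cancel"): ("retired", tear_down),
}

class ElementLifecycle:
    def __init__(self, name):
        self.name, self.state = name, "ordered"

    def handle(self, event):
        # The model itself, not external glue code, decides which process runs.
        next_state, process = LIFECYCLE_TABLE[(self.state, event)]
        process(self.name)
        self.state = next_state

fw = ElementLifecycle("firewall-instance-1")
fw.handle("activate")  # moves the element from "ordered" to "deploying"
```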

That’s an essential piece of any goal to reduce the integration burden, which Telstra says is now 80% of
the cost of features and that they’d like to see reduced to 20%. Without some master target for feature
implementations to meet, it’s impossible to present a consistent view of features to the service layer
when implementations are different, unless all those implementations were harmonized somehow.
That means that some resource details could not be separated from OSS concerns, as Telstra wants.

The frustrating part of all of this is that nothing Telstra wants, and nothing about why it isn’t currently
available in a uniform way, should come as any surprise. These issues predate NFV, as the timing of the
TMF’s NGOSS Contract work proves. The TMF also grappled with them in their own Service Delivery
Framework (SDF) work, another project that did good stuff that somehow didn’t move the market to
implement it, and that again predated NFV.

Back in March of 2014, I did a final report to the NFV ISG on NFV objectives as I saw them, opening with
these four goals:

To utilize cloud principles and infrastructure to the greatest extent possible, but at the same time to
ensure that functions could be hosted on everything from the cloud through a chip, using common
practices.

To support the use of any software component that provided network functionality or any other
useful functionality, without change, as a VNF, providing only that the rights to do this were legally
available.

To provide a unified model for describing the deployment and management of both VNF-based
infrastructure and legacy infrastructure.

To create all of this in an open framework to encourage multiple implementations and open source
components.

These seem to me to be the things Telstra is looking for, and they’re half-a-decade old this month. What
happened? Did the ISG reject these goals, or did they just not get realized as so many good TMF
projects have gone unrealized? The point to me is that there is a systemic problem here, not with our
ability to define models that meet our long-term objectives in service lifecycle automation, but with our
ability to recognize when we’ve done that, and implement them.

The challenge the TMF faces with its Open API Program in extracting resource specificity from services is
similar to the one that the NFV ISG faced in integrating NFV with operations systems. You have two
different communities here—the OSS/BSS stuff with its own vendors and internal constituencies, and
the NFV, SDN, and “network technology” stuff with its own standards groups and CTO people. We’ve
seen both groups try to advance without paying enough attention to the other, and we’re seeing both
fail.

There are some players out there who have taken steps in the right direction. Cloudify has done some
good things in orchestration, and Apstra has made some strides in resource-layer modeling. Ubicity has
a good TOSCA modeling framework that could be applied fairly easily to this problem, and there are
startups that are starting to do work in cloud orchestration and even “NetOps” or network operations
orchestration. All of these initiatives, I submit, are at risk to the same problems that the TMF’s Open API
program faces—sellers drive our market and sellers aren’t interested in open frameworks.

Buyers aren’t much better. The ETSI work on service lifecycle automation should have started with the
discussion of modeling, service and resource separation, and event-driven, model-guided, operations
processes. That was the state of the art, but that’s not what was done, and service providers
themselves drove it. Same for ECOMP. If we’re not getting convergence on the things network operator
buyers of operations automation technology need, it’s because the operators themselves are not
wresting effective control of the process. They’ve tried with standards and with open-source initiatives,
and they still miss the key points of modeling that you can’t miss if you want to meet your service
lifecycle automation goals.

The Light Reading article ends with Telstra’s Mayer expressing frustration with the lack of those
function-to-implementation connectors discussed above. “There is no such standard as firewall-as-a-
service and I am looking for a forum where we can agree on these things and speak the same language.”
Well, we’ve had four that I’m aware of (the TMF, the IPsphere Forum, the NFV ISG, and the ETSI ZTA
initiative), and somehow the desired outcome didn’t happen, despite the fact that there were explicit
examples created of just what Telstra wants. I think Telstra and other operators need to look into what
did happen in all those forums, and make sure it doesn’t happen again.
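
To put Mayer’s “firewall-as-a-service” point in concrete terms, what’s missing is an agreed, implementation-neutral definition of the function that any implementation can register against. A hedged sketch of such a function-to-implementation connector, with names I’ve made up purely for illustration:

```python
# Hypothetical sketch of a function-to-implementation connector for a
# "firewall-as-a-service" abstraction; nothing here is a published standard.
from abc import ABC, abstractmethod

class FirewallService(ABC):
    """The agreed, implementation-neutral face of the function."""
    @abstractmethod
    def deploy(self, site_id: str, rules: list) -> str: ...
    @abstractmethod
    def status(self, instance_id: str) -> str: ...

class VnfFirewall(FirewallService):
    def deploy(self, site_id, rules):
        return f"vnf-fw@{site_id}"          # would spin up a hosted VNF
    def status(self, instance_id):
        return "active"

class ApplianceFirewall(FirewallService):
    def deploy(self, site_id, rules):
        return f"box-fw@{site_id}"          # would configure a physical device
    def status(self, instance_id):
        return "active"

# The service layer binds to the abstraction, never to an implementation.
REGISTRY = {"hosted": VnfFirewall(), "appliance": ApplianceFirewall()}
instance = REGISTRY["hosted"].deploy("sydney-03", rules=["deny all inbound"])
```

With a connector like this, the 80% integration burden shrinks because every new implementation has one target to meet rather than one per service that uses it.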

Without an open, effective, way of creating the function-to-implementation linkage, there is no chance
that resource independence or no-integration service-building is going to happen, and the TMF alone
doesn’t provide that. My recommendation is that those who like the points I’ve made here use the
ExperiaSphere project material I’ve cited and the other four presentations to either assess the
commercial tools (some of which I’ve mentioned here) or to frame their own vision. All the concepts are
open for all to use without licensing, permission, or even attribution (you can’t use the presentation
material except for your own reference, or use the ExperiaSphere term, without my consent). At the
least, this will offer a concrete vision you can align with your requirements. That may be the most
important step in achieving what Telstra, and other operators, want to achieve.

Finding the Right Path to Virtual Devices

One of the early points of interest for NFV was “virtual CPE”, meaning the use of cloud hosting of
features that would normally be included in a device at the customer edge of services. I’ve blogged a
number of times on the question of whether this was a sensible approach, concluding that it isn’t. The
real world may agree, because most “vCPE” as it’s known is really not hosted in the cloud at all. Instead
it involves the placement of features in an agile edge device. Is this a viable approach, and if so, how
important might it be?

Agile or “universal” CPE (uCPE) is really a white-box appliance that’s designed (at least in theory) to be
deployed and managed using NFV features. Virtual network functions (VNFs) are loaded into the uCPE
as needed, and in theory (again) you could supplement uCPE features with cloud-hosted features. One
benefit of the uCPE concept is that features could be moved between the uCPE and the cloud, in fact.

What we have here is two possible justifications for the uCPE/vCPE concept. One is that we should
consider this a white-box approach to service edge devices, and the other that we’d consider it an
adjunct to carrier-cloud-hosted NFV. If either of these approaches present enough value, we could
expect the uCPE/vCPE concept to fly, and if neither does, we’ll need to fix some problems to get the
whole notion off the ground.

White-box appliances are obviously a concept with merit as far as lowering costs is concerned.
However, they depend on someone creating features to stuff in them, and on the pricing of the features
and the uCPE being no greater than, and hopefully at least 20% less than, the price of traditional fixed
appliances. According to operators I’ve talked with, that goal hasn’t been easy to achieve.

The biggest problems operators cite for the white-box model are 1) the high cost that feature vendors
want for licensing the features to be loaded into the uCPE, and 2) the difficulties in onboarding the
features. It’s likely the two issues are at least somewhat related; if feature vendors have to customize
features for a uCPE model, they want a return. If in fact there are no “uCPE models” per se, meaning
that there are no architecture or embedded operating system standards for uCPE, then the problem is
magnified significantly.

You could argue that the NFV approach is a way out of at least the second of these two problems, and
thus might impact the first as well. Logical, but it doesn’t seem to be true, because both licensing costs
and onboarding difficulties are cited for VNFs deployed in the cloud as well. Thus, I think we have to
look in a different direction. In fact, two directions.

First, I think we need a reference architecture for uCPE, a set of platform APIs that would be available to
any piece of software on any uCPE device regardless of implementation or vendor. Something like this
has been done with Linux and with the Java Virtual Machine. Suppose we said that all uCPE had to offer
an embedded Linux or JVM implementation? Better yet, suppose we adopted the Linux Foundation’s
DANOS? Then a single set of APIs would make any feature compatible with any piece of uCPE, and we
have at least that problem solved. There are also other open-device operating systems emerging, and in
theory one of them would serve, as long as it was open-source. Big Switch announced an open-source
network operating system recently, and that might be an alternative to DANOS.
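
In practice, a “reference architecture” boils down to a small, fixed set of platform calls that every feature can assume regardless of the box underneath. The sketch below shows the shape such a contract might take; it is purely illustrative and is not the DANOS API or any other published interface.

```python
# Purely illustrative uCPE platform-API sketch; this is NOT DANOS or any
# published interface, just the shape such a contract might take.
from typing import Protocol, Iterable

class UcpePlatform(Protocol):
    def attach_port(self, logical_name: str) -> int: ...
    def send_packet(self, port: int, frame: bytes) -> None: ...
    def receive_packets(self, port: int) -> Iterable[bytes]: ...
    def read_config(self, key: str) -> str: ...
    def report_health(self, status: str) -> None: ...

def run_feature(platform: UcpePlatform) -> None:
    """A feature written only against the platform contract is portable to
    any uCPE (or cloud host) that implements the same contract."""
    wan = platform.attach_port("wan0")
    lan = platform.attach_port("lan0")
    for frame in platform.receive_packets(wan):
        platform.send_packet(lan, frame)   # trivial pass-through example
```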

The second thing we need is an early focus on open-source features to be added to the uCPE. I’ve
always believed that NFV’s success depended on getting open-source VNFs to force commercial VNF
providers to set rational prices based on benefits they can offer. No real effort to do that has been
made, to the detriment of the marketplace.

These steps are necessary conditions, IMHO, but not sufficient conditions. The big problem with uCPE is
the relatively narrow range of customers where the concept is really viable. Home devices are simply
too cheap to target, which means only business sites would be likely candidates for adopting the
technology. Then you have the question of whether agile features are valuable in the first place. Most
enterprise customers tell me that they believe their sites would require a single static feature set, and a
straw poll I took in 2018 said that the same feature set (firewall, SD-WAN, encryption) was suitable for
almost 90% of sites. We’ll have to see if a value proposition emerges here.

Let’s move on, then, to our second uCPE possibility. The notion of uCPE as a kind of outpost of the
NFV carrier cloud has also presented issues. Obviously, it’s more complicated to populate a uCPE device
with features if you have to follow the ETSI NFV model of orchestration and management, and so having
uCPE be considered a part of NFV is logical only if you actually gain something from that approach.
What, other than harmonious management where features might move between uCPE and cloud, could
we present as a benefit? Not much.

Operators tell me that they have concerns over the VNF licensing fees, just as they have for the white-
box model. Some are also telling me that the notion of chaining VNFs together in the cloud to create a
virtual device is too expensive in hosting resources and too complex operationally to be economical.
Onboarding VNFs is too complex, again as it is for white-box solutions. They also say their experience is
that enterprises don’t change the VNF mixture that often, which means it would be more cost-effective
to simply combine the most common VNF configurations into a single machine image.

The solution to these problems seems straightforward. First, you need that common framework for
hosting and to encourage open-source VNFs, the same steps as with white-box uCPE. Second, you need
to abandon the notion of service chains of VNFs in favor of packaging the most common combinations
as machine images. One operator told me that just doing the latter improves the resource efficiency
and opex efficiency by 50% or more.
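
The “one machine image” point can be pictured as a build-time packaging step: the common feature combination is composed once into a single deployable image rather than being orchestrated as a chain of separately hosted VNFs. The names and the build step below are hypothetical.

```python
# Hypothetical sketch: compose the most common VNF combination into one
# deployable image at build time instead of chaining separate instances.
COMMON_BUNDLE = ["firewall", "sd-wan", "encryption"]   # the ~90% case

def build_bundle_image(features):
    """Stand-in for an image build (e.g. a container or VM build pipeline)."""
    image_name = "vcpe-" + "-".join(features)
    print(f"building single image {image_name} with: {', '.join(features)}")
    return image_name

def deploy_site(site_id, image_name):
    # One image, one deployment, one thing to manage per site, versus
    # N chained VNFs, N hosting points, and N management touchpoints.
    print(f"deploying {image_name} to {site_id}")

image = build_bundle_image(COMMON_BUNDLE)
deploy_site("branch-042", image)
```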

The common thread here is pretty clear. More work needs to be done to standardize the platform for
hosting vCPE, both on-prem (in uCPE) and in the cloud. If that isn’t done, then it’s likely that neither
framework for vCPE will be economically viable on a broad enough scale to justify all the work being put
into it. Second, the best source for VNFs is open-source, where there are ample business models out
there in sale of support for operators to mimic. In addition, commercial software providers would be
more likely to be aggressive in VNF pricing if they knew they had a free competitor.

It would have been easy to adopt both these recommendations, and the “one-machine-image” one as
well, right from the start, and I know the points were raised because I was one of those who raised them. Now, the
problem is that a lot of the VNF partnerships created don’t fit these points, and operators would have to
frame their offerings differently in order to adopt them today. The biggest problem, I think, would be
for the NFV community to accept the changes in strategy given the time spent on other approaches.

It would be smart to do that, though, because the DANOS efforts alone would seem to be directing the
market toward a white-box approach. If that’s the case, then the APIs available in DANOS should be
accepted as the standard to be used even for cloud-hosted VNFs, which would make VNFs portable
between white boxes and the cloud. It would also standardize them to the point where onboarding
would be much easier.

To make this all work, we’d need to augment DANOS APIs to include the linkages needed to deploy and
manage the elements, and we’d also need to consider how to get DANOS APIs to work in VMs and
containers. A middleware tool would do the job, and so the Linux Foundation should look at that as part
of their project. With the combination of DANOS available for devices, VMs, and containers, operators
would have the basis for portable data-plane functions hosted in the cloud or uCPE.
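
The “middleware tool” idea is essentially a shim: the same platform contract the white box exposes gets re-implemented on top of whatever a VM or container host provides. Here’s a hedged sketch that reuses the illustrative contract from the uCPE discussion above; mapping ports onto local sockets and configuration onto environment variables is just a stand-in to keep the sketch self-contained, not a statement about how DANOS works.

```python
# Hypothetical middleware shim: present the same platform contract inside a
# container by mapping it onto ordinary host facilities. Nothing here is a
# real DANOS interface; it is just the shape a shim could take.
import os
import socket
from typing import Iterable

class ContainerPlatformShim:
    def __init__(self):
        self._ports = {}

    def attach_port(self, logical_name: str) -> int:
        # A real shim might open a veth or tap device; a local UDP socket
        # stands in here so the sketch stays self-contained.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("127.0.0.1", 0))
        port_id = len(self._ports)
        self._ports[port_id] = sock
        return port_id

    def send_packet(self, port: int, frame: bytes) -> None:
        self._ports[port].sendto(frame, ("127.0.0.1", 9999))

    def receive_packets(self, port: int) -> Iterable[bytes]:
        while True:
            data, _ = self._ports[port].recvfrom(65535)
            yield data

    def read_config(self, key: str) -> str:
        # Container deployments commonly inject configuration via environment.
        return os.environ.get(key.upper(), "")

    def report_health(self, status: str) -> None:
        print(f"health: {status}")
```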

The icing on the cake would be to provide P4 flow-language support on DANOS in all these
configurations. P4 could be used to create specialized switch/router features anywhere, and could also
(perhaps with some enhancements) be used to build things like firewalls and VPN on-ramps. Given that
the ONF at least is promoting P4 on DANOS and that AT&T originated DANOS (as dNOS), getting broad
operator support for this approach should be possible.

Vendors with competitive network operating systems (like Big Switch) would need something like the
points I’ve cited here just to bootstrap themselves into credibility. There are already enough options in
the space to confuse prospective users, and none of them so far have really taken aim at the major
value propositions that would justify them. If we had a bit of a feature-value war among these vendors,
it would help elevate the discussion overall, even if it did magnify near-term confusion over selection.

If all this is technically possible and if we could get the framework into place, who would buy into it?
Some operators like AT&T likely would, but another strong possibility is the US Government. The feds
are always looking for ways to get more for less, and they’re supporters of open-source and standard
frameworks with interchangeable parts. Even if operators might drag their feet, government support
could create direct deployments and at the same time push operators to support the same architecture.

I firmly believe that this is the right way to do virtual devices, including vCPE. Despite the fact that
things didn’t get off to a smooth start, I think the approach could still be promoted and help drive vCPE,
uCPE, and even NFV forward.

Turning “Hype Cities” into “Smart Cities”

Smart cities are an idea that generates lots of excitement, perhaps in part because everyone has their
own view of what the term means. I’ve been surprised to find that many of the technologists I’ve talked
with see a smart city as one where practically everything is measured or viewed by a sensor open on the
Internet. This contrasts with the view of most of the experts I know, who think that vision has zero
chance of success. Everyone does agree that we have to step beyond the usual overpromotion; this is
about “smart” cities not “hype” cities, after all.

I don’t want to beat a dead horse on the open-Internet approach, but it is important to understand
what’s behind it. This camp sees IoT as a kind of successor to the Internet. We had open connectivity
and hosting with the Internet. It spawned a whole industry of what we’d call “over-the-top”
applications and players, in no small part because the open connectivity eliminated what would
otherwise have been a big barrier to market entry. The same, say the open-Interneters, could happen
with IoT. Simple.

The contrary view is pretty simple too. The investment needed to create the initial “open Internet” had
already been made by telcos and cable companies. With IoT, the sensors are not out there to exploit,
they’d have to be deployed and connected at considerable cost. Who would invest to create an open
sensor community, the contrarians ask? Then there’s the issue of security and privacy. How would you
ensure that sensors weren’t hacked, and that people didn’t abuse them by (for example)
tracking/stalking others?

You can see arguments for both sides here, but suppose we could figure out a way of uniting the
positions. Could a smart-city architecture provide financial sensibility, security, and privacy and at the
same time create an open community whose attempts to build their own value would end up building
value for all? It might be possible.

The first thing we’d need is a solution to the “first telephone” problem, the old saw that says that
nobody will buy the first telephone because they’d have nobody to call. A smart city needs smarts, and
needs them fast, or there’s no credibility to participation. That’s one reason why the problem of getting
an ROI on “open sensors” is so critical.

There’s a possible solution. We have literally billions of sensors out there today. Most proposed smart
cities are already made up of buildings and homes that have some smarts. Most of those have private
sensor and controller devices installed, accessible remotely via the Internet. However, most of the
information those sensors collect is truly private to the facility owner, tenants, or both. What’s
needed, first and foremost, in any smart city strategy is cooperation from current sensor owners. That
means identifying information that could be shared without risking privacy, and identifying an
architecture for collecting and sharing it.

Suppose you have one of those fancy Internet home thermostats. You set a temperature using your
phone and the heat or air conditioning works to match what you want. Obviously, you don’t want
anonymous third parties setting your heat and air, but there are some things you might be willing to
accept.

Example: Suppose your power/gas company would give you a rate adjustment if they had the right to
change your thermostat setting during an emergency. Many users would accept the idea, as long as
they could agree on just how much they’d get and how much of a change the utility was allowed to
make. Example: Suppose your heating is set at 72 degrees, but your thermostat reading says the
temperature in the home has increased from 72 to 85 in just ten minutes. Could there be a fire or a
major malfunction? Would you be willing to allow that condition to be reported to you, or to a security
company? Example: Would you allow the video feed from your video doorbell or security cameras to be
made available to public safety personnel or security contractors under controlled conditions? Example:
Suppose that every video doorbell and Internet thermostat in a given area suddenly dropped contact.
Could that be an indication of a power problem?
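
The second of those examples is easy to make concrete: a simple rate-of-change rule on readings the thermostat already takes, where what gets shared is the alert rather than the raw data. The thresholds and field names below are illustrative assumptions.

```python
# Illustrative anomaly rule for the thermostat example: a rapid rise in
# indoor temperature triggers a shareable alert rather than a raw data feed.
# Threshold and window values are assumptions for illustration.
from collections import deque

WINDOW_MINUTES = 10
RISE_THRESHOLD_F = 10  # e.g. 72 -> 85 inside ten minutes would trip this

readings = deque(maxlen=WINDOW_MINUTES)  # one reading per minute

def check_reading(temp_f):
    readings.append(temp_f)
    if len(readings) == readings.maxlen and readings[-1] - readings[0] >= RISE_THRESHOLD_F:
        return {"alert": "rapid-temperature-rise",
                "delta_f": readings[-1] - readings[0],
                "window_min": WINDOW_MINUTES}
    return None

for temp in [72, 73, 75, 78, 80, 82, 83, 84, 85, 85]:
    alert = check_reading(temp)
    if alert:
        print("share with the owner or a security company:", alert)
```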

The point here is that the right way to start a smart city initiative is to identify things that IoT would be
able to do based on what’s already deployed in facilities. While an individual city could define a
specification for how a home/building security or facility control system would expose this information,
a better approach would be for a standards body to define both a collection mechanism that home
systems or their vendors could elect to install, and a distribution system to control how that information
can be shared. Cities could then adopt the standards, provide incentives to expose the information,
even require some classes of facilities to share information.

The obvious next step in this process would be to create a “trusted agent”. Smart thermostats and
security systems often store information in the cloud and work in a kind of three-way relationship
between the device, the device vendor, and the device owner. We can envision smart-city IoT starting
with a series of services, represented by APIs and hosted by a trusted entity, that would “publish” sensor
information in a variety of forms.
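
Here’s a hedged sketch of what such a trusted-agent service might look like: owners register consent for specific derived views, and the agent publishes only those views, never the raw sensor feed. The class, policy, and view names are my own illustration, not an existing standard or product.

```python
# Illustrative trusted-agent sketch: the agent publishes only the views of
# sensor data each owner has consented to. Names and view classes are
# hypothetical, not an existing standard or product.
from typing import Callable, Dict

class TrustedAgent:
    def __init__(self):
        self.consents: Dict[str, set] = {}       # owner -> allowed view names
        self.views: Dict[str, Callable] = {}     # view name -> derivation

    def register_view(self, name: str, derive: Callable[[dict], dict]):
        self.views[name] = derive

    def grant(self, owner_id: str, view_name: str):
        self.consents.setdefault(owner_id, set()).add(view_name)

    def publish(self, owner_id: str, raw_reading: dict) -> list:
        """Return only the derived views this owner has agreed to share."""
        allowed = self.consents.get(owner_id, set())
        return [self.views[v](raw_reading) for v in allowed if v in self.views]

agent = TrustedAgent()
agent.register_view("outage-heartbeat", lambda r: {"online": r["online"]})
agent.register_view("fire-risk", lambda r: {"alert": r["temp_f"] > 100})
agent.grant("home-123", "outage-heartbeat")

print(agent.publish("home-123", {"online": True, "temp_f": 74}))
# -> [{'online': True}]   (fire-risk is not shared; no consent was given)
```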

The obvious question is who will provide these services. This is what network operators should be
looking at in the IoT space, not promoting a host of 5G-attached sensors. The latter space would almost
surely require the operators to invest in the sensors themselves, which of course kills the revenue
associated with the deployment. It would also surely result in legal/regulatory action to open the
sensors to all, which would make the whole deployment a big money pit. The former would be an
opportunity for operators to get into a high-level service space before the OTTs did.

All of the major public cloud providers (Amazon, Google, IBM, Microsoft, and Oracle) could reasonably
decide to get into this business too. IoT-related services would not only provide an early lead in the
realistic IoT deployment model, it would buff up the providers’ credentials in the enterprise hybrid cloud
space. Apple, who is clearly trying to find a revenue secret that doesn’t boil down to “Yuppies buy more
iPhones”, could get smart and make a bet in the space. IBM might be the most obvious player to move
here, given that it’s been a player (or at least prospective player) in the smart cities space for some time.

The big lessons to be learned here are first that we’re not going to get to smart cities because cities or
others bite the bullet and pay big bucks to deploy a bunch of open sensors, and second that the path to
a smart city is by expanding from a smart-building base. Once those points are accepted, I think it’s
fairly easy to plot a rational path forward, which I’ve tried to do here. If they’re not accepted, then we
have IoT problems to face down the road.
