Recently I have been thinking about some of the common things I have seen organisations do when beginning their cloud journey with Microsoft Azure. These may not be a problem initially, but once organisations start realising the benefits of the cloud, they find that some of the decisions and practices from those early stages now cause problems when they try to move to the next level.

Let’s go through some of these in the context of a fictitious company called Acme.

 


 

Personal Account as the Azure Account Owner

I see this one a lot and it’s one of the ones I find most frustrating. What usually happens is that, when first starting with Azure, one of the developers will set up an Azure subscription (probably as a trial) with their own email address. The Azure resources then all go into this subscription, and before you know it dev, test and production are all running in Azure subscriptions owned by joe@acme.com.

Why is this a problem?

There are a couple of issues here, but the key one is: what are you going to do if Joe leaves? joe@acme.com is really Joe’s account, and even though these may be Acme’s resources, if Joe fell ill or left on bad terms, access to these resources could be restricted. What are you going to do in that situation?

The 2nd problem is that the Azure AD tenant associated with this account owner is going to belong to Joe. If you decide to do more with this tenant, you don’t have any admins on it, even if others in the team are co-admins on the Azure subscriptions.

The 3rd problem is that even if you simply took over Joe’s email address, the chances are that his additional options for password reset and other security settings would all still be set up for Joe.

What can I do about it?

If you are in this situation, there are two approaches to consider.

  1. Start a new Azure setup using an email address based on a distribution list for your organisation. You would then use Azure support to move the subscriptions from the old account owner to your new, proper account owner.
  2. Add others in your organisation to the Azure AD tenant that Joe created and make them global admins. This gives others in your services team more control over the resources, but probably doesn’t solve the problem of the account owner being Joe.

If you are using the AD tenant to secure any applications, you will also have to consider the impact on application access and on RBAC for resources.

 


 

Working as a team on one person’s MSDN

This scenario is similar to the above, but the problem here is that the initial Azure account was set up using Fred@Acme.com’s MSDN account. This gives Fred some free Azure credits every month. These credits are intended to help Fred learn about Azure, but a new project is starting and, because Fred already has an Azure subscription, a decision is made to just put stuff in there. After a few days the blocker of needing Azure set up is gone, and while the team waits for the billing to go through Acme’s procurement processes they choose to just continue developing in Fred’s MSDN account. Also, since Fred gets something like $100 per month, this gives the team plenty of resource, so they don’t need to worry about the bill.

Why is this a problem?

This has the same problem as the Personal Account as the Azure Account Owner scenario above, and it is one of the most common routes into that situation. A second problem is that you aren’t really using the MSDN credits for what they were intended for.

A third problem is that Fred probably entered his personal credit card when setting up this account, so if Fred isn’t careful and the spending cap is removed he might get an unexpected bill.

What can I do about it?

The solution to this situation is the same as for the Personal Account as the Azure Account Owner scenario.

 


 

No Support Plan

In this scenario Acme has been using Azure for 6 months and has developed some good solutions. These have been transitioned into production and all is well. The partner they were working with reached the end of the engagement a few weeks ago and has moved on. Unfortunately, today one of Acme’s Azure resources is not working as expected. Following a period of troubleshooting, the team believes it is a problem with Azure. They are getting some heat from users, so they decide to raise a support ticket with Microsoft. The Acme support team uses the Azure portal to raise the ticket but then realises they can only raise billing support tickets. They then remember that they never got around to setting up an Azure support plan.

Why is this a problem?

Hopefully the above description speaks for itself, but it’s often the case that it’s only when things are going wrong that an organisation remembers the support plan hasn’t been sorted out yet and the paperwork is sitting with someone waiting for sign-off.

What can I do about it?

I think as a rule every customer who is going live with an Azure project should set up a support plan before they go live. They wouldn’t imagine going without one for other software.

Support plans start from very cheap, so I’d suggest that if you don’t have one you take a look at the options immediately.

 


 

Developer Candy Box

This challenge is a real problem for architects. In Azure you have a subscription which your developers can use to create a solution. On premise it is likely they will have to liaise with IT Pros to create any new resources or servers, so each time a new resource is required this prompts a discussion about why the resource is needed. In the cloud, however, there are a lot of resources that a developer can just spin up and use without anyone really picking up on it. It is like Stealth Architecture within the IT team.

Why is this a problem?

This can cause a number of problems such as:

  1. Two different teams may use different resources to achieve the same task, e.g. one team hosts a website on VMs whereas another hosts on App Service Web Apps.
  2. When the project is finished you find that a project you thought was only using Azure Web App hosting and Service Bus Relay has somehow evolved to also use DocDB, SQL Azure DB, Redis Cache and a Logic App. Where did this come from?
  3. An architectural decision not to use a certain resource may have been broken by a developer. For example, the architect may have decided it’s best for the organisation to host in Web Apps rather than VMs by default because this lowers maintenance costs. The developer may have been unaware of this decision and broken the rule.

What can I do about it?

In my opinion the root cause of this problem is really down to culture rather than technology. The cloud has empowered developers a lot, but architects need to change the way they work, get out of their ivory towers, and become part of the team. Being actively involved in a project means the developers will also engage with you, and you will be much closer to the communications in the project teams, so you can see these requirements coming up and see the challenges that lead to these new resources being needed. A good architect should see these things coming and have a solution or adapted architecture ready for the development team just in time.

It’s also important to work with the developers and coach them to understand what counts as an architecturally significant change, so they know to come and engage with you when they see a need for a new type of resource.

If you are already in the position where the candy box has been raided the problem you probably have is that you don’t know which type of resource was needed and why. Why did the dev team choose SQL DB over Table Storage which on the face of it seems a better solution for requirement X?

In this position you need to get a handle on this project ASAP. You need to catalogue the resources you have and find out where they are used and by what. I’d also recommend building a new test environment from scratch, so you can go through the process, work out what is genuinely used, and see whether you have the information documented to set up an environment. Make sure it’s not just in people’s heads.
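As a rough illustration of what that catalogue might look like, here is a minimal Python sketch. It assumes you have already exported a list of resources by some means; the `INVENTORY` data, its field names and the `catalogue` function are all hypothetical and only there to show the idea of grouping resources by type and flagging the ones nobody can explain.

```python
from collections import defaultdict

# Hypothetical inventory export. The names and fields are assumptions for
# illustration; in practice the list would come from your own resource export.
INVENTORY = [
    {"name": "acme-web-prod", "type": "Web App", "used_by": "Orders site"},
    {"name": "acme-sql-prod", "type": "SQL Database", "used_by": "Orders site"},
    {"name": "acme-redis-test", "type": "Redis Cache", "used_by": None},
    {"name": "acme-logic-01", "type": "Logic App", "used_by": None},
]


def catalogue(resources):
    """Group resource names by type and list those with no known consumer."""
    by_type = defaultdict(list)
    unexplained = []
    for res in resources:
        by_type[res["type"]].append(res["name"])
        if not res.get("used_by"):
            unexplained.append(res["name"])
    return dict(by_type), unexplained
```

The `unexplained` list is the useful part: each entry is a question to take back to the development team before you decide whether the resource stays.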

Once you have got things to a position where the environment is well understood, you need to agree some constraints: boundaries within which the developers know they are empowered to act on their own, and outside of which you need to be involved as an architect.

 


 

Using Live ID’s

When you first start with Azure (unless you already have Office 365 set up) you will set up your first Azure subscription with a Microsoft Live ID. I mentioned earlier that if you do this I believe you should use a distribution list email address shared across your service owner team. Once you have your Azure subscriptions set up, the next step is to grant access, either as co-administrators or to specific resources using role-based access control. There are a couple of scenarios where you might add a Live ID:

  • A member of your company who registers their work email address as a Live ID
  • A member of your company (or a partner) who registers their Hotmail/Gmail or other email address as their Azure account

Why is this a problem?

There are a number of problems here:

  1. When Joe leaves the organisation I cannot just shut off his account to deny him access to our Azure resources, because his joe@acme.com email is registered as a Live ID. Even though we turn off his on-premise Active Directory account, his Live ID will still be active and is outside of our control. I would have to go through all of the subscriptions and resources and remove his access to them. I’d basically be removing his permission to each resource rather than no longer allowing him to authenticate.
  2. I actually saw a situation where bigboy1234NN@hotmail.com (I can’t remember the exact email address but you get the point) was added as a co-administrator on the subscription. The problem here is who the hell is bigboy? It turned out that bigboy was one of the guys who couldn’t get his organisational account working, so the dev team had just set up his Hotmail account because they could get that working. As people came and went, no one knew who bigboy was any more.
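One way to find out how exposed you are is to review who has access and flag anything that isn’t an organisational account. The Python sketch below is only an illustration under assumptions: the `ASSIGNMENTS` list, its `sign_in`, `kind` and `role` fields, and the `ORG_DOMAINS` set are hypothetical stand-ins for whatever access export you can produce. Note that the domain alone is not enough, because joe@acme.com registered as a Live ID still looks like a company address.

```python
# Assumed organisational domains; replace these with your own.
ORG_DOMAINS = {"acme.com", "acme.onmicrosoft.com"}

# Hypothetical access export; the field names are assumptions for illustration.
ASSIGNMENTS = [
    {"sign_in": "mary@acme.com", "kind": "work", "role": "Contributor"},
    {"sign_in": "joe@acme.com", "kind": "live", "role": "Co-administrator"},
    {"sign_in": "bigboy1234NN@hotmail.com", "kind": "live", "role": "Co-administrator"},
]


def accounts_to_review(assignments, org_domains=ORG_DOMAINS):
    """Return sign-in names that are not organisational (work) accounts.

    An entry is flagged if it is a personal account, even on a company
    email address, or if its domain is not one of the organisation's.
    """
    flagged = []
    for entry in assignments:
        domain = entry["sign_in"].rsplit("@", 1)[-1].lower()
        if entry["kind"] != "work" or domain not in org_domains:
            flagged.append(entry["sign_in"])
    return flagged
```

Calling `accounts_to_review(ASSIGNMENTS)` on the sample data flags both Joe’s Live ID and the Hotmail account, which are exactly the two cases described above.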

What can we do about it?

The simple thing here is that Live ID is great for working as an individual if you’re learning about Azure, and this simplicity allows companies to get up and running quickly. As soon as possible, though, you need to get your team using Organisational/Work IDs to log into Azure. Ideally you also want to use Azure AD Connect to synchronise the accounts with your on-premise Active Directory, including 2-way password sync, so that the team use the same credentials to log in as they do on premise. This will give you a solid leavers story: when one of the team leaves, their access to Azure is revoked.

If you’re working with partners then you can potentially add them to your Azure AD as users from another directory, which trusts their Azure AD to look after the user.

In the short term, if you have problems with access for your team or for partners, you are probably better off giving them a *.onmicrosoft.com account from your Azure AD rather than adding their Live ID.

 


 

No one responsible for Cost and Efficiency

When a company is starting to use Azure it is easy to get wrapped up in the features you are delivering and to ignore or forget about running costs. Before you know it you are in production with your first application and have a couple of test environments running, but no one has really asked the question “Are we spending our money efficiently?”

If you are a cloud user now, take a minute and think about your organisation. Ask yourself these 2 questions:

  1. Who is responsible for making sure that the money you have spent on cloud usage is being used effectively?
  2. If you have someone who is responsible for cloud spend, how well do they understand what resources you have and what they are used for?

Why is this a problem?

The most obvious example of this is one company I met who had been successfully delivering projects on Azure and were spending in the region of $1200 per month. With a little bit of analysis we were able to find a number of cost optimisations, which were implemented within a couple of weeks, and we got the next month’s bill down to under $400 with no impact on service. We could have saved more, but we would have needed to make some changes to the applications to be even more efficient.

Another example I once came across was an offshore team who were having some challenges with the performance of a resource running on an Azure VM. Rather than really fixing the problem, they just scaled it right up so the performance problem didn’t impact the application they were testing. Suddenly, out of nowhere, the customer had a big spike in their bill and didn’t know where it had come from. The root cause of the problem is the old Spiderman quote, “with great power comes great responsibility”. In this case the development team had been given the power to manage their own resources, but they had not thought about the fact that they were now responsible for those resources and that scaling up a lot would have a big downstream impact when the month-end bill was produced.

What can I do about it?

Fortunately, this one has a number of steps which can be easily implemented.

The first and easiest step is to make someone responsible for cloud spend and to allocate them an appropriate amount of time on a regular basis to make sure that cloud resources are being used efficiently across all projects.

The 2nd easy thing to do if you are an Azure Enterprise customer is to add the Power BI content pack for the Azure billing data so you can regularly get some good analysis of your spend.

The 3rd thing to do is to coach each team that they have this responsibility which comes along with empowerment and that if they think they will have a significant change to spend (up or down ideally) on the resources they use then they should come and talk to the cloud optimization person just so everyone is aware.

The 4th thing to do is to begin an optimization backlog where you can put tasks which have been identified to help you spend money better. Maybe you identify that some money can be saved by changing the size of a Redis cache in a test environment. You might not want to do this now, but you don’t want to forget that this task needs to be done at some point.

The 5th thing is to do some thinking about the non-obvious costs. As an example, it’s easy to say that resource x costs more than resource y because the Azure bill says so, but you also need to consider other things, like the amount of human time spent supporting a component and the costs associated with changing a component from one scale or feature to another.

6th is all about automation. One of the biggest cost saving areas is around turning off what you don’t use and deallocation of resources which aren’t currently needed. Pushing automation is one of those things that can bring the biggest savings.
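To make the idea concrete, here is a minimal Python sketch of the decision an automation job might make. It is only an illustration under assumptions: the working hours, the `environment` and `running` fields and both function names are hypothetical, and the sketch deliberately stops at deciding what to switch off rather than calling any Azure API.

```python
from datetime import datetime

# Assumed working hours (local time) for non-production environments.
WORK_START, WORK_END = 8, 18


def should_be_running(environment, now):
    """Production always runs; everything else only on weekdays in working hours."""
    if environment == "production":
        return True
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WORK_START <= now.hour < WORK_END


def to_deallocate(vms, now):
    """Return names of running VMs that are not needed at the given time."""
    return [
        vm["name"]
        for vm in vms
        if vm["running"] and not should_be_running(vm["environment"], now)
    ]
```

A scheduled job could run this every hour and pass the resulting names to whatever tooling you use to stop and deallocate machines. The policy itself (which environments, which hours) is the bit worth agreeing with each team first.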

Finally, the person responsible for spend should also be given time to look at the cost implications of resources. They need to keep up with pricing changes and new services as they come out. They might identify that a component which was developed as a worker role 2 years ago may now be a lot easier to maintain and cheaper to run if it is moved to a web job.

 


 

The Everything Including the Kitchen Sink Subscription

This is one of my least favourite scenarios. Imagine Acme has developed a couple of Azure projects and has resources for development, test and production. The problem is however that they have put everything in a single Azure subscription. Or perhaps they have put everything for production in one subscription and everything for Dev/System Test/UAT in a 2nd subscription.

Why is this a problem?

The problem here is that the team often forget which resource is used for what. Sometimes you may be fortunate and a good naming convention is used to identify resources but often this is not the case.
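If you do adopt a naming convention, it is easy to check resource names against it. The Python sketch below assumes a hypothetical convention of `acme-<project>-<env>-<kind>`; the pattern, the environment names and the `classify` function are my own illustration, so adjust them to whatever convention your team actually agrees.

```python
import re

# Hypothetical convention: acme-<project>-<env>-<kind>, e.g. acme-orders-uat-web.
NAME_PATTERN = re.compile(
    r"^acme-(?P<project>[a-z0-9]+)-(?P<env>dev|test|uat|prod)-(?P<kind>[a-z0-9]+)$"
)


def classify(names):
    """Split resource names into per-environment lists and non-conforming names."""
    by_env = {}
    unknown = []
    for name in names:
        match = NAME_PATTERN.match(name)
        if match:
            by_env.setdefault(match.group("env"), []).append(name)
        else:
            unknown.append(name)
    return by_env, unknown
```

Anything that lands in `unknown` is a resource whose purpose you cannot tell from its name, which is a good starting list for the clean-up.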

From a logical perspective you look at the subscription and it’s just a mess. You can’t tell what is used for what, and sometimes resources are even shared, so maybe a website is sometimes used for System Test and sometimes for UAT depending on which way the wind is blowing.

This is a typical environments management problem but just implemented on cloud. The problem is that the poor separation and unstructured approach means that the resources are like a house of cards where no one really knows what uses what and if anyone touches the wrong thing at the wrong time then it all falls down.

What can I do about it?

The reason I dislike this scenario so much is that it is often one of the most difficult to do something about. I think the only real option is to bite the bullet, accept that it’s a bit of a mess, and rebuild your environments in a structured and logical way. Unfortunately this sometimes means taking resource from other tasks to fix the problem, so it is often left to fester until things get really bad. While the project team may still be delivering, you have to think about the long-term health of the IT department’s ability to live with this solution. Once the project team has moved on and the support team need to make a small change, what position will they be left in to work with an environment like this?

This scenario is also an indicator that the solution is inflexible and difficult to change. If you would find it difficult to clean up a messy environment, the solution is probably lacking in documentation and in automation too.

 


 

Conclusion

So I’ve been sat in the airport in Brussels for a couple of hours and got bored, so I put down some thoughts on this topic. I hope people find this useful, but I’d be keen to hear about the experiences of other people and what common problems they come across. It’s really all about trying to help customers make solid and successful first steps into the cloud, in a way where short-term gain isn’t going to mean a world of pain in the long term.