Posts by drtooraj

I love systems

Coded Growth

After two months of endless work, on a sunny spring morning PG achieved a big milestone. All one thousand sensors were successfully connected to the IoT hub and were transmitting force vectors at a rate of 1000 per second, equating to 24 MB/s or 35GB per day. Had it been any team other than a highly motivated cofounding one, the day would have been spent celebrating and opening champagne bottles, but to John and Megan success was defined only as living in the skies. So right after a passionate hug, they began thinking about the next steps. Now they had to start analyzing the massive amount of data and build a model that could be used to specify the position of the floating car in the sky. They could then use this model to predict the object’s location based on the real-time data they received from the sensors. Finally, they needed to build simulations that went beyond just John’s Tesla and demonstrated how an entire city could float in the sky.

Megan started looking at the options available on Azure for each of the problems at hand. First, for building the model, Azure ML Studio seemed to be a simple tool, but its limit of only 10GB of training data took it off the list. A solution that could work for them was to provision several data science virtual machines (DSVMs) and use distributed computing to build and optimize the model on them.

The second piece was required to predict the location of the floating objects using the model. For that they decided to go with a premium, scalable PaaS API service that performed 1000 computations a second using the 1000-dimensional force vectors received from the sensors and the model built in step 1.

The last piece was the trickiest one. It required a high performance computing (HPC) cluster of VMs with multiple NVIDIA GPUs designed for compute-intensive, graphics-intensive, and visualization workloads.

This required two things. First, they had to raise money, since the amount of computational resources needed was way beyond the $150/month BizSpark quota they were granted by Microsoft. Second, on the technical side, they had to expand the team and start setting up a network connecting the VMs they were going to provision. Raising money was very easy since investors were lined up to own a tiny piece of the first flying city. John was able to raise $10M in just two weeks. With that money they started hiring and added Nora, a seasoned data science director, to head the analytics department. They also brought in Thomas as the director of the computations department, which was in charge of building the simulation. Ali’s team was split into two departments: internal, managed by a new manager called Jim, kept owning the tools they used for their ITAC, while a new department called external, managed by another new manager called Layla, was in charge of building the APIs. PG’s IT was now a 6-department, 30-person team and looked like this:



It was time for execution. Megan set up an executive meeting and opened it with this statement: “Let’s begin coding our departments.” Except for Ali, who started smiling, the directors and managers rolled their eyes and looked at each other to make sure they had heard Megan correctly. Nora was the first to break the silence. “Sorry Megan, did I hear you correctly? You said coding our departments!?” Megan nodded. Ali jumped in. “Let me explain. At PG we define our IT department using code, and so we call it ITAC. In an ITAC all the resources, hardware and software, are coded, stored in a code repository, and deployed using dedicated CI/CD pipelines. Does this make sense?” Layla was the next to react. “But how would I manage my department? Anybody can go ahead and change the code, which means my servers are going to change, and that means just chaos! I am not sure if …” Megan interrupted her and said: “I love your sense of responsibility, Layla. Now let me tell you how we have been running PG just like that. Each department owns a project in VSTS that contains all of its resources as code. Only your team, your supervisor Ali, and his supervisor, which is me, have access to that project. You have complete power to assign rights and permissions within your team and your project, to define who can update the code and who can approve releasing resources to different environments. As simple as that!” Jim was really intrigued and said, “This is fabulous: we own what we code and we code what we own, right?” Megan said, “Absolutely. Now I am tasking Ali with redesigning the ITAC structure. Ali, once you are done with it, let’s meet with this group again to review, plan, and execute, or shall I say more accurately, review, plan, and code.”


A Nimble Organization

Things were moving great at PG, especially for John. He had been able to specify the optimal placement of the 1000 sensors he wanted to place on his Tesla to collect information on the force field the car observed while floating in the air. With Megan’s help he had used a batch computing service in Azure that let him run several heavy computational algorithms to find this optimal solution. He had also brought in interns who were helping him with both the theoretical physics work and coding the computational algorithms. His team, called R&D, had placed all of the assets they built inside the Org department, which Megan owned. The number of tools and algorithms the R&D team was developing was growing very fast, and therefore Megan decided to move all of these items into a new department called R&D.

John and Megan were having lunch at their usual place near the office when Megan suggested moving the R&D resources, including the BalancingData IoT hub and the algorithms, to a new department owned by John, so that he would have full control over updating them without requiring Megan to approve the release request for every new deployment (since Megan was the de facto PM and owner of the Org department, she had to approve all releases going to the Prod environment). John did not seem happy with what he had heard and said: “Is this really necessary? I don’t really wanna lose the time it takes to move these assets. You know, Megan, we are really short on time and I’d rather focus on the important stuff than spend a week or so moving things around. Can we do this some other time when we have fewer things to worry about?” Megan knew that time would never come in a busy startup, so she said, “John, who told you it would take a week or so?” John said, “Nobody said that, but with several services and pipelines I am pretty sure it would take at least a week to move them.” Megan said, “Unlike most times, you are dead wrong this time. Moving the items won’t even take an hour!” Before John could jump in with his disbelieving face, Megan said, “Look, we are following ITAC. With ITAC, all we need to do to move resources is click one button!”

In fact, moving resources from one department to another is a golden feature of the CloudOrg application. Using CloudOrg, one can pick a department, select the resource type (employees, applications, …), and move those resources to another, or a new, department.
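To make the idea of such a move concrete, here is a minimal sketch in Python. It is not how CloudOrg is implemented; the in-memory dictionary, the function name `move_resources`, and the sample data are all assumptions made for illustration only.

```python
def move_resources(org, source, target, resource_type):
    """Move every resource of one type from one department to another.

    `org` is a hypothetical in-memory model:
    department name -> resource type -> list of resource names.
    The target department is created if it does not exist yet.
    """
    moved = org.get(source, {}).pop(resource_type, [])
    org.setdefault(target, {}).setdefault(resource_type, []).extend(moved)
    return moved


# Illustrative data only: the Org department hands its applications to R&D.
org = {
    "Org": {
        "applications": ["BalancingData"],
        "employees": ["Megan"],
    }
}
moved = move_resources(org, "Org", "R&D", "applications")
```

In a real ITAC the move would also have to touch the repository and the release definition; the sketch only shows the bookkeeping side of it.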



Ali managed to build a web dashboard to manage the structure of the organization and link each department to a PowerBI cost breakdown report, and also to the corresponding resources in VSTS and Azure.

This portal provides complete visibility of the entire organization in terms of:

  1. An org-chart presenting all departments and their managers.
  2. Consumption cost for each department.
  3. IT resources owned by each department, categorized by environment.
  4. Latest release dates for each department.
  5. Any problems or warnings requiring attention.

Although for a small startup it is not hard to go to the cloud provider’s portal and a few other places to find this information, as soon as a few departments are added it becomes necessary to have a single place to view all of it at once.





Pseudo Gravity’s First Department

Megan was very excited that she had been able to build the first resource in their organization, the IoT hub. At the same time she wanted to follow a pattern that could easily be extended as new departments were added to their organization. As mentioned earlier, she sought to build three tools that could help them simplify building new resources. She also wanted to make sure she spent her time on tools that directly helped build the floating car prototype.

The next morning she met John to show the progress she’d made in setting up the hub, which made him really pleased. He also demoed the progress he’d made in setting up 1000 wireless sensors on his Tesla and was ready to connect them to the IoT hub. They looked at different ways to connect the sensors to the hub for a few hours and made several decisions. Just before they left to grab a bite for lunch, Megan explained to John what she had thought about extensibility. John liked the idea but, like Megan, wanted both of them to stay focused on building the prototype, so they decided to add a friend of Megan’s called Ali to the team. Ali, a USC graduate, had been Megan’s coworker at Microsoft, had strong DevOps skills, and was an Azure certified architect. Megan called Ali and asked if he could join them for lunch. Ali was able to meet them, and it did not take long to convince him to join PG to lead building the internal tools they needed. They decided to establish a department called “internal” and made Ali the director of software development in charge of it.

The next day Ali and Megan met in the afternoon at a nearby coffee shop and brainstormed about building the internal tools. After a few hours, they came up with the following rules to apply to all departments:

Governance Rules

These rules specify who can do what and are enforced by building specific groups with strict permissions in VSTS.

ITAC #1: There will be one VSTS project per department. For example, an Org project for the root IT department and an internal project for the internal department.

ITAC #2: Create a separate repo for each department/asset combination.

ITAC #3: For each new department create three groups in Azure AD to manage development and testing, project management, and owning the resources. Groups are named using the pattern [Department].[Role]. For example, the developers in the internal department will be part of the Internal.Contributor group.

ITAC #4: Apply these relationships among the groups:

  1. [ParentDept].[Role] is a member of [ChildDept].[Role]. This ensures that the parent department has all the permissions given to its children. For example, Org.Owner is a member of Internal.Owner, giving Megan ownership of all resources in the internal department.
  2. [Department].Owner is a member of all the other groups within the same department, to make sure the owner has all the permissions that other members of the department have. For example, Ali, as Internal.Owner, would be allowed to write code (inherited from Internal.Contributor) and modify the release definition (inherited from Internal.PM).
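The two relationships above can be sketched in a few lines of Python to see which permissions a group ends up inheriting. This is only an illustration under assumptions: the role names Contributor, PM, and Owner, the dotted [Department].[Role] naming, and the helper names `build_membership` and `effective_groups` are mine, and real Azure AD or VSTS group nesting is not being called here.

```python
from collections import defaultdict
from typing import Optional

ROLES = ("Contributor", "PM", "Owner")


def group_name(department: str, role: str) -> str:
    """Name a group with the [Department].[Role] pattern."""
    return f"{department}.{role}"


def build_membership(parent_of: dict[str, Optional[str]]) -> dict[str, set[str]]:
    """Return a map: group -> groups it is a direct member of."""
    member_of: dict[str, set[str]] = defaultdict(set)
    for dept, parent in parent_of.items():
        # Relationship 2: the owner group joins the other groups of its department.
        for role in ROLES:
            if role != "Owner":
                member_of[group_name(dept, "Owner")].add(group_name(dept, role))
        # Relationship 1: each parent group joins the child group with the same role.
        if parent is not None:
            for role in ROLES:
                member_of[group_name(parent, role)].add(group_name(dept, role))
    return member_of


def effective_groups(group: str, member_of: dict[str, set[str]]) -> set[str]:
    """Follow nested memberships to find every group whose rights are inherited."""
    seen: set[str] = set()
    stack = [group]
    while stack:
        current = stack.pop()
        for nxt in member_of.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Org is the root department, Internal is its child.
membership = build_membership({"Org": None, "Internal": "Org"})
org_owner_rights = effective_groups("Org.Owner", membership)
```

With this data, Org.Owner ends up inheriting from every group in both departments, while Internal.Contributor inherits from nothing, which matches the intent of the rule.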

ITAC #5: A person from the [Department].PM team has to manually start a deployment to DEV (no CD for IAC). A member of [Department].Contributor has to pre-approve a QA deployment (to make sure she is fine with replacing the existing structure that might be under test). Both [Department].Contributor and [Department].PM have to pre-approve a PROD deployment (to make sure the IAC has been vetted by QA and the owner is fine with pushing to production, which requires coordination with other departments ahead of deployment).
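The approval gates in this rule can be read as a small lookup table. The Python sketch below is one possible reading, not the way VSTS evaluates approvals: the names `REQUIRED_APPROVERS`, `may_start_dev`, and `missing_approvals` are hypothetical, and I am assuming that an Owner counts as both a Contributor and a PM, since the owner group is a member of the other two.

```python
# Roles that must pre-approve a deployment to each environment.
REQUIRED_APPROVERS = {
    "DEV": set(),                    # started manually by a PM, no pre-approval
    "QA": {"Contributor"},
    "PROD": {"Contributor", "PM"},
}


def expand_roles(roles):
    """Assumption: an Owner inherits the Contributor and PM roles."""
    expanded = set(roles)
    if "Owner" in expanded:
        expanded |= {"Contributor", "PM"}
    return expanded


def may_start_dev(starter_roles):
    """Only someone holding the PM role may kick off a DEV deployment."""
    return "PM" in expand_roles(starter_roles)


def missing_approvals(environment, approvals):
    """Return the roles that still have to approve.

    `approvals` is a list of role sets, one per person who has approved.
    """
    granted = set()
    for roles in approvals:
        granted |= expand_roles(roles)
    return REQUIRED_APPROVERS[environment] - granted


still_needed = missing_approvals("PROD", [{"Contributor"}])
```

Here `still_needed` contains only the PM role, so the PROD release stays blocked until a PM (or an owner) signs off.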

ITAC #6: The following rights are defined for members of the three groups. The rights are applied by adding the Azure group to the corresponding VSTS group: [Department].Contributor to the VSTS contributor group and [Department].PM to the VSTS project admin group.

  • Contributor: write access to the repo, creating test runs, co-pre-approving QA and PROD deployments.
  • PM: managing work items, creating build and release definitions, co-pre-approving PROD deployments.
  • Owner: everything the other groups can do, plus permission to add or remove PMs.

The figure below shows the VSTS groups and permissions:



ITAC Rules

These rules define how to name, build, and release infrastructure resources and applications.

ITAC #7: The IAC file follows this naming convention: [Department].[Solution].Arm. For example, the infrastructure for CloudOrg is called Internal.CloudOrg.Arm.

ITAC #8: IAC provides the same topology for all environments while allowing for variations in the size of resources. For example, a web app could be using a basic edition of a database in DEV and a standard edition in PROD. To allow this, they decided to have a separate parameters file for each environment (similar to having separate configuration files for applications). Parameters files are named [ResourceName].parameters.[ENV].json. For example, the web app parameters file in DEV is called WebSiteSQLDatabase.parameters.DEV.json.
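As a rough illustration of the two naming conventions and the one-file-per-environment idea, here is a small Python sketch. The helper names and the `databaseEdition` parameter are hypothetical; only the file name patterns come from the rules above, and the JSON wrapper follows the usual shape of an ARM parameters file.

```python
import json

ENVIRONMENTS = ("DEV", "QA", "PROD")


def iac_name(department: str, solution: str) -> str:
    """ITAC #7: [Department].[Solution].Arm"""
    return f"{department}.{solution}.Arm"


def parameters_file(resource_name: str, env: str) -> str:
    """ITAC #8: [ResourceName].parameters.[ENV].json"""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    return f"{resource_name}.parameters.{env}.json"


def parameters_document(values: dict) -> str:
    """Render an ARM-style parameters file from plain name/value pairs."""
    document = {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }
    return json.dumps(document, indent=2)


# Same topology, different sizes: "databaseEdition" is a made-up parameter name.
sizes = {"DEV": "Basic", "PROD": "Standard"}
files = {
    parameters_file("WebSiteSQLDatabase", env): parameters_document({"databaseEdition": edition})
    for env, edition in sizes.items()
}
```

The template itself stays identical across environments; only the parameters file picked by the release changes.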

ITAC #9: Both AAC and IAC can reside in the same solution.

ITAC #10: Infrastructure and applications have separate release definitions since they are released at different frequencies (infra is deployed much less frequently than the application). These definitions follow the naming convention [Solution]-[Type], where the type is either ARM (for infra) or APP (for application). Release definitions, and also build definitions, are exported and added to the solution under the CICD Definitions folder.

ITAC #11: The resource group name is built by combining the organization abbreviation, the VSTS project (named after the department), the solution name, and the environment. For example: PG-$(System.TeamProject)-BalancingData-$(Release.EnvironmentName)
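To see what the release variables expand to, the naming rules for release definitions and resource groups can be mimicked in Python. This is a sketch with hypothetical function names; in the real pipeline the values come from $(System.TeamProject) and $(Release.EnvironmentName) rather than from function arguments.

```python
RELEASE_TYPES = ("ARM", "APP")


def release_definition_name(solution: str, release_type: str) -> str:
    """ITAC #10: [Solution]-[Type], where the type is ARM or APP."""
    if release_type not in RELEASE_TYPES:
        raise ValueError(f"unknown release type: {release_type}")
    return f"{solution}-{release_type}"


def resource_group_name(org: str, project: str, solution: str, env: str) -> str:
    """ITAC #11: organization, VSTS project, solution, and environment joined by dashes."""
    return "-".join((org, project, solution, env))


# What the pattern would resolve to for the Org project in each environment.
names = [resource_group_name("PG", "Org", "BalancingData", env) for env in ("DEV", "QA", "PROD")]
```

Because the project name is part of the resource group name, a solution released from a different project lands in a differently named resource group without any change to the template.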

ITAC #12: All resources are tagged for generating consumption billing breakdown reports. The tag is called dept and is set to the department’s name by adding a parameter to the IAC and providing its value inside the release definition. This simplifies transitioning a resource to another department, which only requires moving the repository without any changes to the code or release definition.
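The billing breakdown that the dept tag enables boils down to a group-by. The Python sketch below assumes a hypothetical list of resource records with a `tags` dictionary and a `cost` field; the resource names and amounts are made up, and no Azure billing API is being called.

```python
from collections import defaultdict


def cost_by_department(resources):
    """Sum consumption cost per value of the `dept` tag.

    Resources without the tag are reported under "untagged" so they are not lost.
    """
    totals = defaultdict(float)
    for resource in resources:
        dept = resource.get("tags", {}).get("dept", "untagged")
        totals[dept] += resource["cost"]
    return dict(totals)


# Illustrative records only.
resources = [
    {"name": "balancing-iot-hub", "tags": {"dept": "Org"}, "cost": 120.0},
    {"name": "cloudorg-web", "tags": {"dept": "Internal"}, "cost": 30.0},
    {"name": "cloudorg-sql", "tags": {"dept": "Internal"}, "cost": 45.0},
    {"name": "scratch-storage", "tags": {}, "cost": 5.0},
]
breakdown = cost_by_department(resources)
```

Moving a resource to another department then shows up in the report as nothing more than a different tag value on the next release.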

The following figures show the solution structure, IAC, and parameters, and also the corresponding release definition.



Ali started applying these rules in VSTS and promised to have a version of the app they could use to visualize their IT organization in the cloud. Megan suggested calling this app CloudOrg since it could be used to visualize their entire organization. Ali liked the name and drew some wireframes to show what he had in mind for CloudOrg. They spent some time discussing various options. Around 8PM Megan felt a bit tired and said, “It has been a long day for me and I am about to leave. How about you?” Ali said, “Well, it is too early for me to stop working!” He then rolled up his sleeves and began coding CloudOrg. Megan giggled and said, “Hasta mañana.”





Startup ITAC

PG needed to build the car balancing model as soon as possible in order to demo it to investors who were impatiently waiting to see a real-world example. John and Megan had picked Megan’s car, a Tesla X, for the demo. Therefore, the first IT assets Megan decided to build were a set of PaaS IoT services that could be used to collect data from the wireless sensors attached to the Tesla.

At this stage of development, since no infrastructure is involved, one could easily go to the cloud portal and provision the required resources with a few clicks, but our goal is to build ITAC from the ground up, which means that we are going to do everything according to the golden standard we have defined: everything as code. The only exception to the golden rule is setting up the subscription itself, which has to be done manually. For BizSpark specifically, the person who applied for it receives an email with instructions to set up the subscription and, once done, becomes the global admin. The rest of the employees are added as regular users as needed.

Subscription Management

Megan went ahead and set up the subscription and became the global admin. She then did the following:

  1. Registered PG’s own domain, she already bought from
  2. Added herself as and made her both the global admin and the subscription owner.
  3. Added John as as a user.


Having the users set in Azure. She began building the ITAC for the PG. She did it by creating a free account in VSTS called Since she was the global admin of the Azure subscription, the VSTS account was automatically connected to the Azure subscription.

She then added the first project to the VSTS account. She called it ITAC, and it would continue to hold the entire definition of PG’s IT in the years to come.

Under ITAC she built a repository called Org to contain the highest level of assets, belonging to the CTO. To control access to this repo she also added the following groups and added herself, as the only user, to all of them.

  • OrgOwner: Has full access to all resources within the organization.
  • OrgDEV: Can update the ARM templates.
  • OrgQA: Can approve a release to QA and co-approve a release to Prod.

So far, VSTS looked like this:


The next step is to set up the release pipeline, which normally includes three environments: DEV, QA, and Prod. Whenever the ARM templates are updated, a DEV release is automatically triggered. Deploying to QA requires approval from OrgQA, to make sure they are ready to test, and deploying to Prod requires approval from both OrgQA and OrgOwner, which at this point means Megan would do all of these herself by playing every role.

Once Megan finished the above task, she began to actually add the ARM templates for the IoT hub that collected the data from the sensors. Megan did some reverse engineering here. She first went to the Azure portal and configured an Azure IoT hub there, then copied and pasted the generated template and parameters JSON files into a new cloud ARM project she added in Visual Studio. She decided to follow these conventions to organize ARM projects and release definitions:

  1. The solution is called Org.
  2. Each resource group will have a corresponding project in Org. For example, all the resources built for collecting and processing balancing data from sensors will be inside a project called BalancingData.
  3. There will be a separate release task for each resource group.
  4. If a resource group contains multiple resources, each resource will be added in a separate template file. A master template is created that is used in the release task and references the other resources via a URL to their VSTS repo location. (To be tested once she added more resources since at this point she only has a single resource, the IoT hub).
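Convention 4 describes what ARM calls linked templates: a master template whose resources are nested deployments, each pointing at another template by URL. The Python sketch below generates such a master template under assumptions: the function name `master_template` and the example URL are hypothetical, and in practice the URL has to be reachable by Azure Resource Manager at deployment time, which a private repository location may not be without extra setup.

```python
import json


def master_template(linked_template_uris: dict) -> dict:
    """Build a master ARM template with one nested deployment per linked template.

    `linked_template_uris` maps a deployment name to the URL of its template file.
    """
    resources = []
    for name, uri in linked_template_uris.items():
        resources.append({
            "type": "Microsoft.Resources/deployments",
            "apiVersion": "2021-04-01",
            "name": name,
            "properties": {
                "mode": "Incremental",
                "templateLink": {"uri": uri, "contentVersion": "1.0.0.0"},
            },
        })
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": resources,
    }


# Hypothetical location of the single template Megan has so far.
template = master_template({
    "IoTHub": "https://example.invalid/Org/BalancingData/IoTHub.json",
})
rendered = json.dumps(template, indent=2)
```

Adding a second resource to the resource group then means adding one more entry to the dictionary rather than editing the release task.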

The figure below shows what Megan had achieved so far:


Megan felt very content that she had built the first piece of what would soon be extended to define their entire IT. But before going to John to break the great news, she thought of adding three more things:

  1. Thinking about how to extend what she had accomplished for Org to the new departments they would add in the future, she decided to research how to automate building all the necessary pieces for new projects, including provisioning the repository, adding all the necessary groups, and creating the release pipeline, since all of them needed to follow the same pattern as Org.
  2. She thought of creating a web portal that provided a hierarchical view of their entire organization. What she had in mind was an org-chart tree where she could start at the top, the org, drill down into other departments and sub-departments, and view their allocated resources and the associated groups, which basically presented their ITAC. She wanted to build this by extracting metadata from both Azure and VSTS.
  3. In order to have a single point of management, she thought of creating a dashboard per department in Azure that provided resource consumption costs, status (working, alerts, or potential failures), and the manager of that specific department.
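The org-chart tree from the second idea is a plain recursive structure. Here is a minimal Python sketch of it, assuming a hypothetical `Department` record filled in by hand; the portal Megan has in mind would populate it from Azure and VSTS metadata instead, and the sample departments are only illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Department:
    name: str
    manager: str
    resources: list[str] = field(default_factory=list)
    children: list["Department"] = field(default_factory=list)


def find(root: Department, name: str) -> Optional[Department]:
    """Drill down from the top of the tree to the department with this name."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def render(root: Department, depth: int = 0) -> list[str]:
    """Return one indented line per department, parents before children."""
    lines = [f"{'  ' * depth}{root.name} ({root.manager}): {len(root.resources)} resource(s)"]
    for child in root.children:
        lines.extend(render(child, depth + 1))
    return lines


# Illustrative data: the root department with one sub-department.
org = Department(
    "Org",
    "Megan",
    resources=["BalancingData IoT hub"],
    children=[Department("Internal", "Ali")],
)
chart = render(org)
```

Printing `chart` line by line gives the drill-down view, and `find` is the lookup a portal page for a single department would start from.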





ITAC at PseudoGravity

In order to see the evolution of ITAC, I have decided to use a fictitious story about a startup called PseudoGravity. These days Azure and AWS are very generous when it comes to supporting promising startups, and both grant a free multi-year subscription to such startups, so it makes absolute sense for an early-stage startup to build its IT in the cloud rather than burning personal funds to buy physical hardware.

Let’s begin by telling the exciting story of PseudoGravity. John was one of those people who used to lock himself in the lab for days and nights, using pounds of chalk to write loads of mathematical equations about gravitational waves on the huge blackboard. For a PhD student in the physics department at MIT, this was not considered abnormal by any means. However, what made John unique was the experiment he did on that cold early morning at his lab in Cambridge. That night it snowed pretty heavily, turning the entire city white and shutting down the T red line that John used to take home. When John finally got out of his lab around 6AM to figure out how to get home without any public transportation, he could not believe what he saw: there was no snow around his lab for a radius of 50 feet. All the snow was floating in the air instead. He almost fainted when he suddenly figured out what had happened. He had discovered how to generate pseudogravity waves in his lab, which had kept the snow in the air. Over the next few months he felt really overwhelmed by the sheer amount of interest and intrusion he was exposed to from all around the world.

Finally, after a year, when things had gone a bit quiet and he had successfully defended his dissertation on anti-gravitational waves, on a beautiful day in early August he met his best undergrad friend Megan at the Philz Coffee in San Francisco. She had started working at Microsoft Research after graduating from Stanford. After the first sip of his coffee, John started the conversation by talking about his grand vision: “I want to build a city floating in the sky that could save humanity from natural disasters like earthquakes and flooding forever.” Megan smiled, held John’s hand firmly, and said, “Let’s do it.” The week after, John and Megan started working at their startup called PseudoGravity, or PG for short. John, the CEO, was in charge of building a prototype that could hold a car in the sky, while Megan, the CTO, was in charge of programming and IT. They needed to perform a massive amount of data analysis to build an accurate mathematical model used to keep the objects balanced in the sky. They also needed a lot of wireless sensors attached to the floating objects to control their position in the air. The stream of data collected from the sensors was connected to the big data server via an IoT solution. Finally, they needed to design a central system, which they called “the brain”, to calculate the position of objects to avoid collisions. Given the amount of hardware needed to run all the components, Megan decided to build their IT in the cloud. A few weeks after applying to Azure BizSpark, Azure’s startup program, PG was granted a 5-year free subscription on Azure.

Such an inspiring story already tempts me to leave ITAC completely aside and just focus on finishing the PG story. However, I am going to work on both at the same time. We will see how ITAC evolves to define PG’s IT department from a two-person startup to an organization with more than 10,000 employees and 20 departments all over the world. As the startup grows and requires more IT assets to support its growth, I expect to see ITAC evolve smoothly in parallel. Also, as we move along, I will try to build tools that can be used to build the organization in the cloud and to transition it to the next step as it evolves.

Although not as exciting and grandiose a vision as John’s, my vision is also a long shot, since I am trying to build a theory with tangible assets to systematically design, build, and run an IT department of any size in the cloud.

Cloud-Organization aka IT-as-Code

Imagine 💡

  • Imagine you could copy your entire IT organization onto a memory stick.
  • Imagine you could publish your entire IT department (hardware and software), from scratch, with just a single click and in less than an hour.
  • Imagine you would lose zero information when a senior IT manager leaves.
  • Imagine you knew exactly how much each department costs at any specific moment.
  • Imagine you could move all assets (hardware and software) from one department to another with a single click and in a few seconds.
  • Imagine you could communicate what a department head’s responsibility and ownership are by giving him ownership of some specific code in Git.

Don’t Imagine Anymore 🙂

In what you will read and see in the Cloud Org blog series, everything imagined above comes true. I am proposing a methodology for building and running IT organizations that will forever change how they are run.

In order to make it more interesting to read and tangible to grasp I combine the theory with a story of a fictional startup that has a vision to build a city in the sky. I show how my theory expands as this startup expands into a global enterprise.

And guess what? This is not just a theory! As I move along I will introduce tools I have built that realize what I preach. In fact, I implement my fictional startup in Azure as we go along.

My Vision

Coming from a systems and engineering background, I have always been looking at how to systematically run the IT departments of the largest enterprises in the world. Public clouds have made it possible to systematically run an IT department of any size by allocating resources via code and defining governance in an exact and tangible manner. These two adjectives are of ultimate importance. I am going to briefly talk about each in this blog, but before doing so I am going to state my vision for this series of cloud-organization blog posts.

If in the past an IT department’s assets, the software and the hardware, were treated as different types of resources and managed separately, in the cloud world they have converged into the same type of resource. These days allocating a set of servers or a cluster of databases is no different from developing a set of enterprise applications. We can use Infrastructure-as-Code (IAC) to define the infrastructure, which also includes all the security rules and policies, likewise defined as code, aka Security-as-Code (SAC). IAC, SAC, and applications (I call the latter Application-as-Code or AAC from now on, to be consistent and avoid confusion) are all kept in a code repository (GitHub, for example), built and tested using the continuous integration (CI) pipeline, and deployed to various environments (DEV, QA, PROD) using the continuous deployment (CD) pipeline. The pipeline itself is stored as YAML or JSON code, so-called Pipeline-as-Code (PAC). A modern IT organization treats all of its assets as code. This new way of conceiving IT is revolutionary and makes administration tasks like business continuity and disaster recovery (BCDR) as simple as deploying the latest version of an application to production.

I can now specify my vision:

  1. To code the entire IT department (I call this IT as Code or ITAC) and
  2. To specify hierarchies of IT staff each in charge of architecting, developing, maintaining, and releasing a specific level of the ITAC.

Based on this vision, the CTO role is specifically defined as setting up the governance, that is, setting who is in charge of what portion of ITAC. This definition is exact since all responsibilities of a given head of unit are coded (I call this Unit-as-Code or UAC), and it is tangible since, when a UAC is released, it produces hardware and software assets that are managed and owned by the unit. Management is also precisely and consistently defined based on best IT processes like agile or continuous software delivery. This way we can say the head of each unit is in fact the delivery manager of his or her UAC.

How the entire ITAC is split among various managers, and how efforts among managers are coordinated, is what I will try to think and write about in this series of blog posts.