Roadmap 2

Goals:

Milestone: node pool sizing, node affinity, cloud burn, and operations

Milestone: development productivity

Repository Organization

Application platform services
Database configuration (schemas, users)
Terraform
- for AWS (CDN, IAM, SQS)
- for DO (k8s)

Node Affinity

Select an approach and develop rules for assigning pods to appropriate node pools

Microservices -> Microservices node pool
1. APIs
2. Consumers
3. Cron jobs
Cloud Services -> Cloud services node pool
1. RabbitMQ
2. Ambassador
3. Airflow
4. CKAN (?)
Monitoring Services -> Monitoring node pool
1. ELK Stack
2. Prometheus
3. Grafana
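
The mapping above could be turned into a generated affinity stanza. The following is a minimal sketch, not a chosen approach: the pool names, the workload keys, and the label key `doks.digitalocean.com/node-pool` are assumptions to verify against the actual cluster, and CKAN is left out because its placement is still an open question.

```python
# Assumed node label key; confirm with `kubectl get nodes --show-labels`.
POOL_LABEL = "doks.digitalocean.com/node-pool"

# Workload type -> node pool, following the list above.
# Pool names and workload keys are hypothetical.
POOL_FOR_WORKLOAD = {
    "api": "microservices",
    "consumer": "microservices",
    "cronjob": "microservices",
    "rabbitmq": "cloud-services",
    "ambassador": "cloud-services",
    "airflow": "cloud-services",
    "elk": "monitoring",
    "prometheus": "monitoring",
    "grafana": "monitoring",
}


def node_affinity(workload: str) -> dict:
    """Build the `affinity` value of a pod spec that pins a workload to its pool."""
    pool = POOL_FOR_WORKLOAD[workload]  # KeyError for unmapped workloads
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": POOL_LABEL, "operator": "In", "values": [pool]}
                        ]
                    }
                ]
            }
        }
    }
```

A hard requirement (`requiredDuringSchedulingIgnoredDuringExecution`) is used here so that a pod stays pending rather than landing on the wrong pool; a preferred rule would be the softer alternative.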

Efficient Operation

Identify automated processes for managing resources (like indices) and cleanup
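
As one illustration of index cleanup, the sketch below selects indices that have aged out of a retention window. It assumes daily indices whose names end in a `YYYY.MM.DD` date (for example `logstash-2024.01.31`) and a hypothetical 14-day retention; neither is confirmed for this cluster. Elasticsearch's built-in index lifecycle management may make custom code like this unnecessary.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, today, keep_days=14):
    """Return index names whose trailing YYYY.MM.DD date is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            day = datetime.strptime(name[-10:], "%Y.%m.%d").date()
        except ValueError:
            continue  # no date suffix: never selected for automatic deletion
        if day < cutoff:
            expired.append(name)
    return expired
```

The function only selects candidates; the actual deletion (and any dry-run step) would be a separate, deliberate action.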

Access

Develop an approach to easily provide access to resources for developer productivity.

Types of access
1. DO Control Panel (dev)
2. k8s Control Panel (dev)
3. User specific databases
4. Development database access
5. IAM (S3 buckets)
6. RabbitMQ development queues
7. ELK stack
  1. Dev and test: available to all developers who are part of the GitHub organization
  2. Production: only certain developers, when required
8. Grafana -> Solved via GitHub
  1. All 3 environments
9. Etc
Access Request / Granting Process
1. Elevated Access team for people who are granted access to restricted resources
2. Issue tracking

Developer Productivity

Environments

Develop an improved approach to provisioning development environments with multiple microservices.

Grafana dashboards

Documentation on how to use the observability tools (Grafana, ELK stack)

Service kits

Have the pipeline check the k8s YAML in the PR workflow for validity and disallowed settings

No imagePullPolicy: Always
Require node affinity
Make sure kustomize runs successfully
Lint for template manifests that have not been populated with deployment values
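
Three of these checks can be sketched as a function over an already-parsed manifest. This is an illustration under assumptions, not the pipeline's implementation: it expects Deployment-style manifests with the pod spec at `spec.template.spec` (a CronJob nests it deeper), and it assumes unpopulated templates contain `${NAME}`-style placeholders. Running `kustomize build` beforehand would cover the remaining check.

```python
import re

# Assumed placeholder style for unpopulated templates, e.g. ${IMAGE_TAG}.
PLACEHOLDER = re.compile(r"\$\{[^}]+\}")


def _strings(node):
    """Yield every string value nested anywhere in a parsed manifest."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _strings(item)


def lint_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means the manifest passes."""
    problems = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})

    containers = (pod_spec.get("containers") or []) + (pod_spec.get("initContainers") or [])
    for container in containers:
        if container.get("imagePullPolicy") == "Always":
            problems.append(f"container {container.get('name')}: imagePullPolicy is Always")

    if "nodeAffinity" not in (pod_spec.get("affinity") or {}):
        problems.append("pod spec has no nodeAffinity")

    for value in _strings(manifest):
        if PLACEHOLDER.search(value):
            problems.append(f"unpopulated placeholder in {value!r}")

    return problems
```

In a PR workflow, a non-empty result would fail the job and the messages would be printed for the author.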

https://argo-cd.readthedocs.io/en/stable/

Cloud Burn

Observability - resources are expensive

Should we filter logs before they are ingested into the ELK stack?
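
If filtering is adopted, the rule itself can be quite small. The sketch below expresses one possible rule as a predicate; the field names (`level`, `path`), the dropped levels, and the endpoint prefixes are all hypothetical. In practice the same logic would more likely live in the shipping layer, for example a Logstash `drop` filter or a Filebeat `drop_event` processor, rather than in application code.

```python
DROP_LEVELS = {"debug", "trace"}               # assumed level names
DROP_PATH_PREFIXES = ("/health", "/metrics")   # assumed noisy endpoints


def should_ship(event: dict) -> bool:
    """Return True if a log event should be forwarded to the ELK stack."""
    level = str(event.get("level", "")).lower()
    if level in DROP_LEVELS:
        return False
    path = str(event.get("path", ""))
    return not path.startswith(DROP_PATH_PREFIXES)
```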

Operating Cloud Services

Improve Airflow DAG deployment strategy

https://airflow.apache.org/docs/helm-chart/stable/manage-dags-files.html

Set up a reverse proxy for the RabbitMQ admin panel and expose it via a mapping in Ambassador

Microservices Auth

Auth0 - paid

Okta - paid

Keycloak - open source

https://dev.to/tillsanders/how-to-deploy-a-free-auth0-alternative-to-digitalocean-in-5-minutes-2ili
Source a Keycloak expert to help with the correct setup
https://www.keycloak.org/getting-started/getting-started-kube

Application Level Monitoring

Can we capture metrics for application-level activities, such as the number of tokens transferred, new captures ingested, website access, and active organizations in the admin panel?
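
One way this might look is a set of in-process counters exposed in the Prometheus text format for scraping. The sketch below uses only the standard library to show the idea; the class and metric names are hypothetical, and a real service would more likely use an official Prometheus client library for its language.

```python
from collections import Counter


class AppMetrics:
    """Minimal in-memory counters rendered in Prometheus text exposition format."""

    def __init__(self):
        self._counts = Counter()

    def inc(self, name: str, amount: int = 1) -> None:
        """Increase the named counter; counters only ever go up."""
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self._counts[name] += amount

    def render(self) -> str:
        """Return all counters as text, sorted by metric name."""
        lines = []
        for name in sorted(self._counts):
            lines.append(f"# TYPE {name} counter")
            lines.append(f"{name} {self._counts[name]}")
        return "\n".join(lines) + "\n"
```

A service would call, for instance, `metrics.inc("tokens_transferred_total", 5)` where the activity happens and serve `metrics.render()` from a `/metrics` endpoint.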

Terraform

Protect credentials in encrypted files
Separate customizations per environment for each module
Collect all Terraform into one directory
Create utility scripts

Notes:

Node Affinity

Node Pools

We need bigger nodes for monitoring; identify available funds now

DevOps/Developer Productivity process and leader(s)

Provisioning resources
Initial automation setup
Knowledge base tools / ticket
Feature diff when deploying services into test / production
End to end testing
Partial staging - admin, map, wallet
Notify the GitHub Action when a deployment fails on the k8s side
Airflow engineers - what do they need

Organize/Standardize Terraform

ELK stack for audit logging (persistence and naming scheme, logging approach)

Additional Services

CKAN


Last updated 2 years ago
