Roadmap 2
Goals:
Milestone: node pool sizing, node affinity, cloud burn, and operations
Milestone: development productivity
Repository Organization
Application platform services
Database configuration (schemas, users)
Terraform
for AWS (CDN, IAM, SQS)
for DO (k8s)
Node Affinity
Select an approach and develop rules for assigning pods to appropriate node pools
Microservices -> Microservices node pool
APIs
Consumers
Cron jobs
Cloud Services -> Cloud services node pool
RabbitMQ
Ambassador
Airflow
CKAN (?)
Monitoring Services -> Monitoring node pool
ELK Stack
Prometheus
Grafana
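The mapping above could be encoded once and reused when generating manifests. The following is a minimal sketch, not a chosen approach: the pool names are hypothetical, and it assumes nodes carry the `doks.digitalocean.com/node-pool` label that DigitalOcean Kubernetes applies to pool members (worth verifying on the actual cluster).

```python
# Hypothetical pool names; replace with the real node pool names.
POOL_FOR_CATEGORY = {
    "microservice": "microservices-pool",    # APIs, consumers, cron jobs
    "cloud-service": "cloud-services-pool",  # RabbitMQ, Ambassador, Airflow
    "monitoring": "monitoring-pool",         # ELK stack, Prometheus, Grafana
}

# Assumption: this node label identifies the pool a node belongs to.
POOL_LABEL = "doks.digitalocean.com/node-pool"


def node_affinity_for(category: str) -> dict:
    """Return a pod-spec 'affinity' fragment that pins a pod to its pool."""
    pool = POOL_FOR_CATEGORY[category]  # raises KeyError for unknown categories
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": POOL_LABEL, "operator": "In", "values": [pool]}
                        ]
                    }
                ]
            }
        }
    }
```

The returned dictionary can be merged into a workload's pod spec under the `affinity` key. A hard requirement is used here so that a misassigned pod fails to schedule rather than landing silently on the wrong pool; a preferred rule would be the softer alternative.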
Efficient Operation
Identify automated processes for managing resources (like indices) and cleanup
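As one illustration of index cleanup, the sketch below picks out indices that have aged past a retention window. It assumes index names end in a `-YYYY.MM.DD` date suffix, which is an assumption about the naming scheme and not something confirmed here; Elasticsearch's built-in index lifecycle management may cover the same need natively.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_indices(names: Iterable[str], today: date, keep_days: int) -> List[str]:
    """Return the date-suffixed index names older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        suffix = name.rsplit("-", 1)[-1]
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # not date-suffixed (e.g. a system index); leave it alone
        if day < cutoff:
            expired.append(name)
    return expired
```

The function only selects candidates; deleting them would be a separate, explicitly reviewed step.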
Access
Develop an approach to easily provide access to resources for developer productivity.
Types of access
DO Control Panel (dev)
k8s Control Panel (dev)
User specific databases
Development database access
IAM (S3 buckets)
RabbitMQ development queues
ELK stack
Dev, test: available to all developers who are part of the GitHub organization
Production: only certain developers, when required
Grafana -> Solved via GitHub
All 3 environments
Etc
Access Request / Granting Process
Elevated Access team for people who are granted access to restricted resources
Issue tracking
Developer Productivity
Environments
Develop an improved approach to provisioning development environments with multiple microservices.
Grafana dashboards
Documentation on how to use the observability tools (Grafana, ELK stack)
Service kits
Have the pipeline check k8s YAML in the PR workflow for validity and disallowed settings
No imagePullPolicy=Always
Require node affinity
Make sure kustomize runs successfully
Lint for template manifests that have not been populated with deployment values
https://argo-cd.readthedocs.io/en/stable/
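The manifest checks listed above could start as a small script run in the PR workflow. This is a sketch under assumptions: it works on manifests already parsed into dictionaries (YAML loading is left to the pipeline), it only understands Deployment-like and CronJob shapes, and the `${NAME}` / `{{ name }}` placeholder syntax is a guess at what an unpopulated template looks like.

```python
import re
from typing import List

# Assumption: unpopulated templates contain ${NAME} or {{ name }} placeholders.
PLACEHOLDER = re.compile(r"\$\{[^}]+\}|\{\{[^}]+\}\}")


def pod_spec(manifest: dict) -> dict:
    """Find the pod spec in a Deployment-like or CronJob manifest."""
    spec = manifest.get("spec", {})
    if manifest.get("kind") == "CronJob":
        spec = spec.get("jobTemplate", {}).get("spec", {})
    return spec.get("template", {}).get("spec", {})


def lint_manifest(manifest: dict, raw_text: str = "") -> List[str]:
    """Return a list of human-readable problems; empty means the checks pass."""
    problems = []
    spec = pod_spec(manifest)
    for container in spec.get("containers", []):
        if container.get("imagePullPolicy") == "Always":
            problems.append(f"container {container.get('name')}: imagePullPolicy=Always")
    if "nodeAffinity" not in spec.get("affinity", {}):
        problems.append("missing nodeAffinity")
    if PLACEHOLDER.search(raw_text):
        problems.append("unpopulated template placeholder")
    return problems
```

A non-empty result would fail the PR check. Note that Kubernetes defaults the pull policy to Always when it is omitted and the image tag is `latest`, so a stricter version might also require an explicit policy; running kustomize itself would remain a separate pipeline step.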
Cloud Burn
Observability - resources are expensive
Should we filter logs as they are ingested into the ELK stack?
Operating Cloud Services
Improve Airflow DAG deployment strategy
Set up the RabbitMQ admin panel behind a reverse proxy and expose it via a Mapping in Ambassador
Microservices Auth
Auth0 - paid
Okta - paid
Keycloak - open source
Source a Keycloak expert to help with the correct setup
Application Level Monitoring
Can we capture metrics for application-level activities, such as number of tokens transferred, new captures ingested, website access, and active organizations in the admin panel?
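One way to approach this is for each service to count business events and expose them in the Prometheus text format for scraping. The sketch below is a dependency-free illustration of the idea only; the metric and label names are hypothetical, and a real service would more likely use an existing Prometheus client library than hand-roll this.

```python
from collections import defaultdict
from typing import Dict, Tuple

LabelSet = Tuple[Tuple[str, str], ...]


class Counters:
    """In-memory counters rendered as Prometheus-style text lines."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, LabelSet], float] = defaultdict(float)

    def inc(self, name: str, amount: float = 1, **labels: str) -> None:
        if amount < 0:
            raise ValueError("counters only go up")
        self._values[(name, tuple(sorted(labels.items())))] += amount

    def render(self) -> str:
        lines = []
        for (name, labels), value in sorted(self._values.items()):
            label_text = ",".join(f'{key}="{val}"' for key, val in labels)
            suffix = f"{{{label_text}}}" if label_text else ""
            lines.append(f"{name}{suffix} {value:g}")
        return "\n".join(lines) + "\n"
```

A service would call something like `counters.inc("captures_ingested_total")` at the point where the event happens and return `counters.render()` from a metrics endpoint, after which the numbers become available to Grafana dashboards.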
Terraform
Protect credentials in encrypted files
Separate customizations per environment for each module
Collect all Terraform into one directory
Create utility scripts
Notes:
Node Affinity
Node Pools
We need bigger nodes for monitoring; identify available funds now
DevOps/Developer Productivity process and leader(s)
Provisioning resources
Initial automation setup
Knowledge base tools / ticket
Feature diff when deploying services into test / production
End to end testing
Partial staging - admin, map, wallet
Notify the GitHub Action when a deployment fails on the k8s side
Airflow engineers: what do they need?
Organize/Standardize Terraform
ELK stack for audit logging (persistence and naming scheme, logging approach)
Additional Services
CKAN