Building a Custom Ansible Module in Node.js That Talks to a GraphQL API

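Our team maintains a centralized configuration center that exposes its services through a GraphQL API. It is the single source of truth for every microservice's feature flags, A/B test parameters, and dynamic settings. One thorny problem has persisted, though: how do we sync this configuration reliably and idempotently to legacy applications that run on virtual machines rather than containers, and to infrastructure components that need file-based configuration?

The existing process relies on a pile of fragile Bash scripts and curl commands triggered from the CI/CD pipeline. It is not idempotent, its retry logic is complicated, and it tracks almost no state. Every run performs a full update whether or not anything changed, which is inefficient and makes change auditing very difficult.

What we need is a declarative approach. On the operations side, Ansible is our standard tool, and we want to declare configuration state in an Ansible playbook the same way we manage any other system service. For example, we would like to be able to write: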

- name: Sync feature flag from GraphQL Config Center
  hosts: all
  tasks:
    - name: Ensure 'new-checkout-flow' feature flag is enabled
      graphql_config:
        api_endpoint: "https://config.internal/graphql"
        api_token: "{{ vault_api_token }}"
        name: "feature.checkout.new-flow"
        value: "true"
        type: "boolean"
        state: "present"

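The heart of this idea is the graphql_config module. Neither Ansible itself nor the community offers a ready-made module that speaks to our GraphQL API. Ansible modules are usually written in Python, but our backend stack is Node.js, and our team's GraphQL experience and tooling in the Node.js ecosystem are far more mature than in Python. So we made a bold but reasonable decision: build our own Ansible module in Node.js.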

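Step 1: Understanding how Ansible custom modules run

At its core, an Ansible module is simple: it is an executable file that the Ansible controller transfers to the target node and runs there. The workflow is:

1. Ansible packs the parameters defined in the task into a file. For a non-Python module, this file is JSON only when the module source contains the marker string WANT_JSON; otherwise Ansible writes the parameters as key=value pairs.
2. Ansible runs the module script on the target node and passes the path of the arguments file as a command-line argument.
3. The module script reads the arguments file and runs its core logic.
4. The module script must print a JSON string to standard output (stdout) to tell Ansible the result.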
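
The argument-reading step can be sketched as follows. This is a minimal illustration, not the final module code: the helper name loadModuleArgs is hypothetical, and it assumes the arguments file is JSON and that keys Ansible injects for its own use carry the `_ansible_` prefix (for example `_ansible_check_mode`). The demo writes a stand-in arguments file to a temporary directory to simulate what Ansible would pass.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Read the JSON arguments file (Ansible passes its path as argv[2]) and
// separate the task's own parameters from Ansible's internal ones.
function loadModuleArgs(argsPath) {
  const raw = JSON.parse(fs.readFileSync(argsPath, 'utf8'));
  const params = {};
  const internal = {};
  for (const [key, val] of Object.entries(raw)) {
    // Assumption: keys prefixed with `_ansible_` are injected by Ansible itself.
    if (key.startsWith('_ansible_')) {
      internal[key] = val;
    } else {
      params[key] = val;
    }
  }
  return { params, internal, checkMode: internal._ansible_check_mode === true };
}

// Demo: simulate the file Ansible would write for a task.
const demoPath = path.join(os.tmpdir(), `ansible-args-demo-${process.pid}.json`);
fs.writeFileSync(demoPath, JSON.stringify({
  name: 'feature.checkout.new-flow',
  value: 'true',
  state: 'present',
  _ansible_check_mode: false,
}));
const demo = loadModuleArgs(demoPath);
fs.unlinkSync(demoPath);
console.log(demo.params);
```

Keeping the internal keys apart means the module never echoes them back by accident, and the checkMode flag gives it a place to hook in dry-run behavior later.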

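The key return fields are:

changed: a boolean that says whether the module changed anything. This is the core of idempotency.
failed: a boolean that says whether the task failed.
Any other custom data that should be returned to the playbook.

Below is a minimal skeleton of a Node.js Ansible module. It reads the arguments file, echoes the parameters back, and reports that nothing changed.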

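#!/usr/bin/env node
// file: library/graphql_config.js
// WANT_JSON  (marker string: tells Ansible to write the arguments file as JSON)
// The shebang above lets Ansible pick node as the interpreter for this script.

const fs = require('fs');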

function exit(result) {
  console.log(JSON.stringify(result, null, 2));
  process.exit(0);
}

function fail(message) {
  console.log(JSON.stringify({ failed: true, msg: message }, null, 2));
  process.exit(1);
}

async function main() {
  const argsPath = process.argv[2];
  if (!argsPath) {
    fail("Ansible arguments file not provided.");
    return;
  }

  let params;
  try {
    const argsContent = fs.readFileSync(argsPath, 'utf8');
    params = JSON.parse(argsContent);
  } catch (err) {
    fail(`Failed to read or parse arguments file: ${err.message}`);
    return;
  }
  
  // The core logic will go here
  // For now, just exit successfully without change
  exit({
    changed: false,
    params_received: params
  });
}

main();

This script goes in the library/ subdirectory next to the playbook, where Ansible discovers it automatically.
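
Before wiring the module into a playbook, it can be handy to exercise it the same way Ansible does, without Ansible in the loop. The sketch below is one possible way to do that; the helper name runModuleLocally is hypothetical, and it assumes the module prints exactly one JSON document on stdout.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { spawnSync } = require('child_process');

// Run a module script the way Ansible would: write the parameters to a JSON
// file, pass that file's path as the first argument, and parse stdout as JSON.
function runModuleLocally(modulePath, params) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ansible-module-'));
  const argsPath = path.join(dir, 'args.json');
  fs.writeFileSync(argsPath, JSON.stringify(params));
  try {
    const proc = spawnSync(process.execPath, [modulePath, argsPath], { encoding: 'utf8' });
    return { rc: proc.status, result: JSON.parse(proc.stdout) };
  } finally {
    // Clean up the temporary arguments file whether or not the run succeeded
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Example call against the skeleton above (path relative to the project root):
// const out = runModuleLocally('library/graphql_config.js', { name: 'demo' });
// console.log(out.rc, out.result.changed);
```

A non-zero rc together with failed: true in the parsed result corresponds to the failure path of the skeleton.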

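Step 2: Building a testable mock GraphQL server

Before developing the module, we need a stable, controllable GraphQL API environment to test against. Pointing it straight at the production configuration center is too risky. So we use express and express-graphql to quickly stand up an in-memory mock service.

The mock service needs to simulate create, read, update, and delete operations on configuration entries.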

// file: mock-server/server.js

const express = require('express');
const { graphqlHTTP } = require('express-graphql');
const { buildSchema } = require('graphql');

// In-memory database
const configDB = new Map();
configDB.set('feature.legacy.enabled', { name: 'feature.legacy.enabled', value: 'true', type: 'boolean' });
configDB.set('service.payment.timeout', { name: 'service.payment.timeout', value: '5000', type: 'integer' });

// GraphQL schema definition
const schema = buildSchema(`
  type Config {
    name: String!
    value: String!
    type: String!
  }

  type Query {
    getConfig(name: String!): Config
  }

  type Mutation {
    createConfig(name: String!, value: String!, type: String!): Config
    updateConfig(name: String!, value: String!): Config
    deleteConfig(name: String!): Boolean
  }
`);

// Resolver implementations
const root = {
  getConfig: ({ name }) => {
    console.log(`[Query] getConfig for: ${name}`);
    return configDB.get(name) || null;
  },
  createConfig: ({ name, value, type }) => {
    if (configDB.has(name)) {
      throw new Error(`Config '${name}' already exists.`);
    }
    console.log(`[Mutation] createConfig: ${name}=${value}`);
    const newConfig = { name, value, type };
    configDB.set(name, newConfig);
    return newConfig;
  },
  updateConfig: ({ name, value }) => {
    if (!configDB.has(name)) {
      throw new Error(`Config '${name}' not found.`);
    }
    console.log(`[Mutation] updateConfig: ${name}=${value}`);
    const existingConfig = configDB.get(name);
    existingConfig.value = value;
    configDB.set(name, existingConfig);
    return existingConfig;
  },
  deleteConfig: ({ name }) => {
    if (!configDB.has(name)) {
      // For deletes, a missing target also counts as success
      console.log(`[Mutation] deleteConfig: ${name} (not found, idempotent success)`);
      return true;
    }
    console.log(`[Mutation] deleteConfig: ${name}`);
    configDB.delete(name);
    return true;
  }
};

const app = express();
app.use('/graphql', graphqlHTTP({
  schema: schema,
  rootValue: root,
  graphiql: true,
}));

const PORT = 4000;
app.listen(PORT, () => {
  console.log(`Running a GraphQL API server at http://localhost:${PORT}/graphql`);
});

We now have a working mock API and can start on the module's core logic.
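
To make the wire format concrete, here is a rough sketch of how a single GraphQL operation can be sent to such a server as an HTTP POST with a JSON body. The names postGraphQL and fakeFetch are hypothetical, and the sketch assumes a Node.js version with a global fetch (18 or later). The fetch implementation is injectable, so the demo uses a stand-in instead of the real mock server.

```javascript
// Send one GraphQL operation as an HTTP POST with a JSON body.
// `fetchImpl` is injectable so the function can be exercised without a server.
async function postGraphQL(endpoint, query, variables = {}, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, variables }),
  });
  const payload = await res.json();
  // GraphQL servers may report errors in the response body, so check it explicitly
  if (payload.errors && payload.errors.length > 0) {
    throw new Error(payload.errors.map((e) => e.message).join('; '));
  }
  return payload.data;
}

// Stand-in for the mock server: answers every request with a fixed entry
// whose name is taken from the request variables.
const fakeFetch = async (_url, options) => {
  const { variables } = JSON.parse(options.body);
  return {
    json: async () => ({
      data: { getConfig: { name: variables.name, value: '5000', type: 'integer' } },
    }),
  };
};

const GET_CONFIG = 'query ($name: String!) { getConfig(name: $name) { name value type } }';
const pending = postGraphQL(
  'http://localhost:4000/graphql',
  GET_CONFIG,
  { name: 'service.payment.timeout' },
  fakeFetch,
);
pending.then((data) => console.log(data.getConfig));
```

With the mock server running, dropping the fakeFetch argument would send the same request to the real endpoint.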

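Step 3: Implementing the module's core idempotent logic

This is the heart of the project. We need to integrate a GraphQL client, perform different operations depending on the state parameter (present or absent), and compute the changed status accurately. graphql-request is a lightweight client with a small API, which suits this scenario well.

We extend the module code in library/graphql_config.js as follows: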

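#!/usr/bin/env node
// file: library/graphql_config.js
// WANT_JSON  (marker string: tells Ansible to write the arguments file as JSON)
// The shebang above lets Ansible pick node as the interpreter for this script.

const fs = require('fs');
const { GraphQLClient, gql } = require('graphql-request');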

// --- Helper Functions ---
function exit(result) {
  console.log(JSON.stringify(result, null, 2));
  process.exit(0);
}

function fail(message) {
  console.log(JSON.stringify({ failed: true, msg: message }, null, 2));
  process.exit(1);
}

// --- GraphQL Queries and Mutations ---
const GET_CONFIG_QUERY = gql`
  query GetConfig($name: String!) {
    getConfig(name: $name) {
      name
      value
      type
    }
  }
`;

const CREATE_CONFIG_MUTATION = gql`
  mutation CreateConfig($name: String!, $value: String!, $type: String!) {
    createConfig(name: $name, value: $value, type: $type) {
      name
    }
  }
`;

const UPDATE_CONFIG_MUTATION = gql`
  mutation UpdateConfig($name: String!, $value: String!) {
    updateConfig(name: $name, value: $value) {
      name
    }
  }
`;

const DELETE_CONFIG_MUTATION = gql`
  mutation DeleteConfig($name: String!) {
    deleteConfig(name: $name)
  }
`;

// --- Main Logic ---
async function main() {
  const argsPath = process.argv[2];
  if (!argsPath) {
    fail("Ansible arguments file not provided.");
    return;
  }

  let params;
  try {
    const argsContent = fs.readFileSync(argsPath, 'utf8');
    params = JSON.parse(argsContent);
  } catch (err) {
    fail(`Failed to read or parse arguments file: ${err.message}`);
    return;
  }

  const { api_endpoint, api_token, name, value, type, state = 'present' } = params;

  if (!api_endpoint || !name) {
    fail("`api_endpoint` and `name` are required parameters.");
    return;
  }

  const client = new GraphQLClient(api_endpoint, {
    headers: api_token ? { 'Authorization': `Bearer ${api_token}` } : {},
  });

  try {
    // 1. Fetch the current state
    const data = await client.request(GET_CONFIG_QUERY, { name });
    const currentConfig = data.getConfig;

    if (state === 'present') {
      if (!value) {
        fail("`value` is required when state is 'present'.");
        return;
      }

      // 2a. Handle state: present
      if (!currentConfig) {
        // The entry does not exist, so create it
        await client.request(CREATE_CONFIG_MUTATION, { name, value, type });
        exit({ changed: true, name, value });
      } else if (currentConfig.value !== value) {
        // The entry exists with a different value, so update it
        await client.request(UPDATE_CONFIG_MUTATION, { name, value });
        exit({ changed: true, name, value, old_value: currentConfig.value });
      } else {
        // The entry exists with the same value, so there is nothing to do
        exit({ changed: false, name, value });
      }
    } else if (state === 'absent') {
      // 2b. Handle state: absent
      if (currentConfig) {
        // The entry exists, so delete it
        await client.request(DELETE_CONFIG_MUTATION, { name });
        exit({ changed: true, name });
      } else {
        // The entry does not exist, so there is nothing to do
        exit({ changed: false, name });
      }
    } else {
      fail(`Invalid state '${state}'. Must be 'present' or 'absent'.`);
    }

  } catch (error) {
    // Handle every GraphQL API error in one place
    const errorMessage = error.response ? JSON.stringify(error.response.errors) : error.message;
    fail(`GraphQL request failed: ${errorMessage}`);
  }
}

main();

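This code implements the core idempotency logic:

Check-then-act: before every operation, the module queries the current state of the configuration entry with GET_CONFIG_QUERY.
state: present:
- If the entry does not exist, create it. changed is true.
- If the entry exists but its value differs, update it. changed is true.
- If the entry exists and its value matches, do nothing. changed is false.
state: absent:
- If the entry exists, delete it. changed is true.
- If the entry does not exist, do nothing. changed is false.
Error handling: any failed GraphQL API request is caught and returned to Ansible as failed: true, which fails the task and, by default, stops the play for that host.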
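
The decision table above can also be expressed as a pure function, separate from any network call. The sketch below is only an illustration of that idea; the name planAction and the shape of its return value are hypothetical and not part of the module shown earlier.

```javascript
// Decide what to do from the current remote state and the desired state.
// Pure function: no network calls, so every branch can be checked in isolation.
function planAction(current, desired) {
  const { state = 'present', value } = desired;
  if (state === 'absent') {
    return current
      ? { action: 'delete', changed: true }
      : { action: 'none', changed: false };
  }
  if (state !== 'present') {
    throw new Error(`Invalid state '${state}'. Must be 'present' or 'absent'.`);
  }
  if (!current) return { action: 'create', changed: true };
  if (current.value !== value) return { action: 'update', changed: true };
  return { action: 'none', changed: false };
}

const existing = { name: 'service.payment.timeout', value: '5000', type: 'integer' };
console.log(planAction(null, { state: 'present', value: 'enabled' })); // create
console.log(planAction(existing, { state: 'present', value: '10000' })); // update
console.log(planAction(existing, { state: 'present', value: '5000' })); // none
console.log(planAction(existing, { state: 'absent' })); // delete
console.log(planAction(null, { state: 'absent' })); // none
```

Separating the decision from the mutation also leaves room for a dry run: the module could report the planned changed value without sending the mutation.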

Step 4: Integration and testing

Now we can put all the pieces together for an end-to-end test.

The directory layout is:

ansible-graphql-project/
├── library/
│   └── graphql_config.js
├── mock-server/
│   └── server.js
├── node_modules/       # (graphql-request, etc.)
├── package.json
└── sync-config.yml     # Our playbook

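package.json needs graphql-request as a dependency and express, express-graphql, and graphql as development dependencies. Keep in mind that Ansible copies the module to a temporary directory before running it, so graphql-request must be resolvable from there, for example through the NODE_PATH environment variable.

Our sync-config.yml playbook looks like this: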

- name: Manage Configurations via GraphQL API
  hosts: localhost
  connection: local
  gather_facts: no

  tasks:
    - name: 1. Create a new config key if it does not exist
      graphql_config:
        api_endpoint: "http://localhost:4000/graphql"
        name: "feature.new.dashboard"
        value: "enabled"
        type: "string"
        state: "present"
      register: result
    - debug: var=result

    - name: 2. Run the same task again to verify idempotency (should not change)
      graphql_config:
        api_endpoint: "http://localhost:4000/graphql"
        name: "feature.new.dashboard"
        value: "enabled"
        type: "string"
        state: "present"
      register: result
    - debug: var=result

    - name: 3. Update the value of an existing config key
      graphql_config:
        api_endpoint: "http://localhost:4000/graphql"
        name: "service.payment.timeout"
        value: "10000"
        type: "integer"
        state: "present"
      register: result
    - debug: var=result

    - name: 4. Remove a config key
      graphql_config:
        api_endpoint: "http://localhost:4000/graphql"
        name: "feature.legacy.enabled"
        state: "absent"
      register: result
    - debug: var=result

    - name: 5. Run remove again to verify idempotency (should not change)
      graphql_config:
        api_endpoint: "http://localhost:4000/graphql"
        name: "feature.legacy.enabled"
        state: "absent"
      register: result
    - debug: var=result

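Execution steps:

1. Install the Node.js dependencies: npm install.
2. In one terminal, start the mock service: node mock-server/server.js.
3. In another terminal, run the playbook: ansible-playbook sync-config.yml.

The output shows the changed status of each task. The first create, update, and delete report changed as true. Repeating the same create or delete task reports changed as false, which is the behavior we expect from an idempotent module.

The whole flow can be described more clearly with a Mermaid diagram: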

sequenceDiagram
    participant Ansible Controller
    participant Node.js Module (on Target)
    participant GraphQL API Server

    Ansible Controller->>Node.js Module (on Target): Execute with params (name, value, state='present')
    Node.js Module (on Target)->>GraphQL API Server: Query: getConfig(name)
    GraphQL API Server-->>Node.js Module (on Target): Response: { getConfig: null }
    Node.js Module (on Target)->>GraphQL API Server: Mutation: createConfig(name, value, type)
    GraphQL API Server-->>Node.js Module (on Target): Response: { createConfig: ... }
    Node.js Module (on Target)-->>Ansible Controller: Return JSON: { "changed": true }

    %% Second Run for Idempotency Check %%
    Ansible Controller->>Node.js Module (on Target): Execute with same params
    Node.js Module (on Target)->>GraphQL API Server: Query: getConfig(name)
    GraphQL API Server-->>Node.js Module (on Target): Response: { getConfig: { value: 'same_value' } }
    Note right of Node.js Module (on Target): Value matches, do nothing.
    Node.js Module (on Target)-->>Ansible Controller: Return JSON: { "changed": false }

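Limitations and future directions

Although this custom module solves our core pain point, there is room for improvement before it is ready for real production use.

Performance: the current implementation is check-then-act, so every configuration entry costs at least one network request. If a playbook manages dozens of entries, running them serially will be slow. The module could be extended to accept a list of entries, batch the GraphQL requests inside the module to fetch the state of all entries at once, and then compare and update in bulk. This needs support from the GraphQL schema.
Schema coupling: the module's GraphQL queries and mutations are hard-coded. Pointing it at another GraphQL API with a different schema means editing the module source. A more general design would let users supply query and mutation templates as playbook parameters, which would make the module more adaptable.
Richer type handling: value is currently treated as a string. In real scenarios we may need to handle JSON objects, arrays, or other complex types. The module needs more robust serialization and comparison logic for these cases so that formatting differences do not lead to wrong change detection.
Unit tests: testing currently depends on running the Ansible playbook end to end. A production-grade module should have its own unit test suite. With a framework such as Jest, we can mock the fs module to fake the Ansible argument input and use nock or msw to simulate GraphQL API responses, which lets us test every logic branch of the module without starting a real service or Ansible.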
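
As a rough illustration of the batching idea from the performance item, the read side can be combined into one request with GraphQL field aliases, which works against the getConfig field of the mock schema as it is. Batching the writes would still need schema support. The helper names buildBatchQuery and indexBatchResult and the alias scheme (c0, c1, ...) are hypothetical.

```javascript
// Build one query that reads several entries at once by aliasing getConfig.
function buildBatchQuery(names) {
  if (names.length === 0) throw new Error('names must not be empty');
  const varDefs = names.map((_, i) => `$n${i}: String!`).join(', ');
  const fields = names
    .map((_, i) => `  c${i}: getConfig(name: $n${i}) { name value type }`)
    .join('\n');
  const variables = {};
  names.forEach((name, i) => { variables[`n${i}`] = name; });
  return { query: `query BatchGetConfig(${varDefs}) {\n${fields}\n}`, variables };
}

// Map the aliased response back to a Map of name -> config (or null if missing).
function indexBatchResult(names, data) {
  const byName = new Map();
  names.forEach((name, i) => byName.set(name, data[`c${i}`] || null));
  return byName;
}

const names = ['feature.new.dashboard', 'service.payment.timeout'];
const batch = buildBatchQuery(names);
console.log(batch.query);

// Simulated response data for the two aliases above
const indexed = indexBatchResult(names, {
  c0: null,
  c1: { name: 'service.payment.timeout', value: '5000', type: 'integer' },
});
console.log(indexed.get('service.payment.timeout'));
```

The resulting map can then be compared entry by entry against the desired values, so only the entries that actually differ trigger a mutation.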
