Last updated on

@imhonglu/json-schema: Improving Type Inference

When working with JSON Schema in TypeScript, the accuracy of type inference significantly impacts development experience and maintainability. While implementing the @imhonglu/json-schema library in a recent project, I discovered that type inference wasn’t working as precisely as expected.

Several notable issues emerged:

  • Limited support for array-form type keyword → Type inference breaks when specifying composite types like ["string", "number", "null"]
  • Unnecessary validation issues → Repeated validation checks even for already trusted data
  • Unable to set dynamic default values → Cannot specify runtime-determined values like UUID or current timestamp as defaults

To address these issues, I focused on improving the overall type system and enhancing type inference accuracy.

Improvement Goals

The main objectives for this improvement were:

  • Enhance Schema Type Structure
    • Improve consistency by reorganizing type structures between const, enum, and type keywords
  • Support Array-form type Keyword
    • Ensure accurate type inference by properly handling composite types (e.g., ["string", "number"])
  • Class-based Schema Validation and Optimization
    • Enable safe runtime object type checking through instanceof operator support
  • Dynamic Default Value Support
    • Allow function usage in default keyword for dynamic value setting at runtime

Problems

Limited Support for Array-form type Keyword

The previous implementation couldn’t handle array-form type keywords properly, leading to broken type inference and allowing invalid keywords.

While single types worked fine:

const stringSchema = new Schema({
  type: "string",
  maxLength: 10,    // ✅ Valid keyword for string type
  minLength: 1,     // ✅ Valid keyword for string type
  default: "hello", // ✅ Correctly inferred as string type
});

Issues arose with composite types:

const unionSchema = new Schema({
  type: ["string", "number", "null"],
  
  // ❌ Type inference failure: incorrectly allows object type keywords
  maxProperties: 1,     // object-only keyword
  required: ["field"],  // object-only keyword
  
  // ❌ Type inference failure: allows invalid default value type
  default: {}           // object type shouldn't be allowed
});

Unnecessary Data Validation Issue

All validation checks were being repeatedly performed even on internal APIs and pre-validated data, causing unnecessary performance overhead.

This excessive validation was particularly problematic when handling large datasets, as it could significantly increase computational costs and slow down overall processing.

// Person schema definition
const PersonSchema = new Schema({
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "number" },
    email: { 
      type: "string",
      format: "email"
    }
  },
  required: ["name", "email"]
});

// ❌ Issue: Validation runs every time even for trusted data sources
const data = fetch(...).then((res) => PersonSchema.parse(res.text()));

Unable to Set Dynamic Default Values

The JSON Schema specification only allows static values for the default keyword. However, real services often need runtime-determined values like current timestamp, UUID, or environment variables.

The previous version had no way to set such dynamic default values:

const schema = new Schema({
  type: "string",
  format: "date-time",
  // ❌ Issue: Can't specify `default` as a function
  default: () => new Date().toISOString(),
});

This limitation made it difficult to set appropriate defaults for:

  • Document schemas requiring automatic creation timestamps
  • Entity schemas needing unique identifiers (e.g., UUID)
  • Configuration values that should reflect user’s current timezone
  • Cases requiring dynamic defaults based on environment variables or external configurations

Solution Process

The first challenge was to improve the type structure.

While the previous approach used Discriminated Union pattern for type inference, it couldn’t properly handle array-form type keywords.

To solve this, we completely redesigned the type system and introduced SchemaInput and InferSchema<T> types to ensure accurate inference even for composite types.

Improving Schema Type Structure

The first step was to clearly define the relationships between const, enum, and type keywords.

These keywords serve different purposes and can become ambiguous when used together:

  • const: Defines a single fixed value
    • Example: { const: "ADMIN" }
    • Most restrictive, allows only specific value
  • enum: Defines a set of allowed values
    • Example: { enum: ["ADMIN", "USER", "GUEST"] }
    • Allows one value from multiple choices
  • type: Defines basic data type
    • Example: { type: "string" }
    • Provides more flexible typing

However, when these keywords are used together, they can conflict and make schema meaning unclear.

const schema = new Schema({
  type: "string",    // Specifies string type
  const: "hello",    // Fixed to 'hello' value
  enum: ["hello"],   // Allows selection from ['hello']
  // ❌ Keywords conflict, const alone would suffice
});

Analyzing TypeScript Type System

To better understand this issue, we experimented using Generic:

interface ConstSchema {
  const: string;
}

interface EnumSchema {
  enum: string[];
}

interface TypeSchema {
  type: string | string[];
}

type SchemaInput = ConstSchema | EnumSchema | TypeSchema;

function createSchema<const T extends SchemaInput>(schema: T) {
  return schema;
}

While these structures should be mutually exclusive, they could still be used simultaneously in a single schema.

When extending through Generic, mutual exclusivity between properties isn’t guaranteed.
Thus, no errors occur even when const, enum, and type coexist.

For example, this code works without errors:

const schema = createSchema({
  const: "hello",
  type: "string",
  enum: ["hello", "world"],
  // ❌ No TypeScript error occurs
});

To solve this, we designed a type structure where using one keyword automatically excludes others:

interface ConstSchema {
  const: string;
  enum?: never;
  type?: never;
}

interface EnumSchema {
  enum: string[];
  const?: never;
  type?: never;
}

interface TypeSchema {
  type: string | string[];
  const?: never;
  enum?: never;
}

// ... omitted

Now, combinations like this will trigger TypeScript errors:

const schema = createSchema({
  const: "hello",
  type: "string", // ❌ TypeScript error
  enum: ["hello", "world"], // ❌ TypeScript error
});

While JSON Schema specification allows using const, enum, and type together,

We intentionally restricted these combinations for:

  1. Type Safety Guarantee
    • Simultaneous use of const, enum, and type can lead to unclear meaning and unexpected behavior
  2. Schema Clarity
    • Single value (const), one from many (enum), and data type specification (type) are distinct concepts
    • Encourages schema writing with clear intentions

Now, attempting to use const, enum, and type together will trigger type errors, preventing incorrect combinations.

Supporting Composite Types for type Keyword

Next, we needed to implement a precise type system to handle composite types correctly.

First, we created a utility type to map given Tuple types:

// Utility for returning matching types
export type Match<T, Matcher> = T extends readonly [infer First, ...infer Rest]
  ? First extends keyof Matcher
    ? Matcher[First] | Match<Rest, Matcher>
    : never
  : T extends keyof Matcher
    ? Matcher[T]
    : never;

Defining Interface for Type Mapping

To utilize the Match<T, Matcher> defined earlier, we create an interface storing type matching information:

// Type mapping interface
export interface InferSchemaMap<T> {
  number: StructuralValidation.Numeric;
  integer: StructuralValidation.Numeric;
  // ... omitted
}

Implementing Core Utility for Schema Type Inference

Next, we implement a type that infers the entire structure based on input schema (SchemaInput):

  1. Handle const, enum, and type keywords separately
  2. Infer default keyword using BasicMetaData
  3. Use Match<T, Matcher> for array-form type keywords
  4. Finally, process Schema instance to return accurate type

BasicMetaData type includes metadata defined in JSON Schema like title, description, default, etc.

export type InferSchema<T> = Omit<T, keyof BasicMetaData> extends {
  // Infer `ConstSchema`
  const: infer U;
}
  ? T & BasicMetaData<U>
  // Infer `EnumSchema`
  : T extends { enum: infer U }
    ? T & BasicMetaData<ArrayElement<U>>
    // Infer `TypeSchema`
    : T extends { type: infer U }
      ? Omit<T, Exclude<keyof TypeSchema, "type">> &
          Match<U, InferSchemaMap<T>> &
          // Infer excluding `default` keyword type from `BasicMetaData`
          Omit<BasicMetaData<InferSchemaType<T>>, "default"> & {
            // Allow function type for `default` keyword to support dynamic defaults
            default?:
              | InferSchemaType<T>
              | Fn.Callable<{ return: InferSchemaType<T> }>;
          }
      // Infer `Schema` Instance
      : T extends Schema<infer U>
        ? T
        : SchemaInput;

Class-based Schema Validation and Optimization

Finally, we designed the createSchemaClass function to enable dynamic creation and extension of Schema classes while maintaining schema validation functionality, leveraging one of the advantages of class-based design: the instanceof operator.

Specifically, we used Proxy to naturally inherit Schema functionality, maximizing extensibility while utilizing existing Schema instances.

We implemented createSchemaClass function in these steps:

  1. Define function type using SchemaInput and InferSchema<T> types
  2. Create Schema instance based on given schemaDefinition
  3. Implement SchemaBasedClass by creating anonymous class inheriting Schema functionality
  4. Finally, use Proxy to naturally combine Schema functionality with SchemaBasedClass

This approach maintains powerful validation features of existing Schema while enabling flexible extension:

// 1. Define function type using `SchemaInput` and `InferSchema<T>` types
export function createSchemaClass<const T extends SchemaInput>(
  schemaDefinition: InferSchema<T>,
) {
  // 2. Create `Schema` instance based on given `schemaDefinition`
  const schemaContext = new Schema(schemaDefinition);

  // 3. Implement `SchemaBasedClass` by creating anonymous class inheriting `Schema` functionality
  const SchemaBasedClass = class {
    static [SchemaSymbol] = schemaInstance[SchemaSymbol];
    data: InferSchemaType<T>;
    // ... can define additional functionality
  };

  // 4. Use `Proxy` to combine `Schema` instance functionality with `SchemaBasedClass`
  return new Proxy(SchemaBasedClass, {
    get(target, prop) {
      // First look for property in `SchemaBasedClass`
      return prop in target
        ? target[prop as keyof typeof target]
        // If not found, look in `schemaContext`
        : prop in schemaContext
          ? schemaContext[prop as keyof typeof schemaContext]
          // If not found, return `undefined`
          : undefined;
    },
  }) as typeof SchemaBasedClass & typeof schemaInstance;
}

This Proxy pattern provides these benefits:

  • Users can naturally use functionality without being aware of Proxy’s existence
  • Maintains existing Schema functionality while leveraging class-based design’s instanceof operator

Summary of Improvements

This enhancement has improved various aspects including type safety, performance optimization, and extensibility. The table below summarizes the original issues, solutions, and improvements:

ImprovementOriginal IssueEffect
Clarified const, enum, type keyword relationshipsUnclear types due to mixed usageEnhanced type safety, prevented unpredictable errors through mutually exclusive relationships
Array-form type keyword supportType inference breaks with composite types like ["string", "number"]Guaranteed accurate type inference by analyzing array elements individually
Optimized unnecessary validationValidation runs on every trusted dataRemoved unnecessary validation for performance optimization, improved large data processing
Dynamic default value supportCould only set static values for default, couldn’t set dynamic values like UUID or current timeImproved by allowing functions in default keyword to set values at runtime
Enhanced class-based schema validation and extensibilityLimited extensibility and no instanceof check supportEnhanced class-based extensibility with instanceof support and Proxy usage, enabling safe runtime object type checking

Conclusion and Future Plans

Through these improvements, we’ve solved JSON Schema’s composite type inference issues, reduced unnecessary validation, enabled dynamic default values, and implemented various other enhancements.

However, we won’t stop here. We plan to continue researching and improving for more precise type inference and performance optimization. We’ll keep testing in real projects to provide a more intuitive and stable JSON Schema-based type system for practical use.

I hope this article helps you better understand JSON Schema and TypeScript type usage.

We’ll actively incorporate your feedback for further improvements.

Thank you for reading.