@imhonglu/json-schema: Improving Type Inference
When working with JSON Schema in TypeScript, the accuracy of type inference significantly impacts development experience and maintainability. While implementing the @imhonglu/json-schema library in a recent project, I discovered that type inference wasn’t working as precisely as expected.
Several notable issues emerged:
- Limited support for array-form
type
keyword → Type inference breaks when specifying composite types like["string", "number", "null"]
- Unnecessary validation issues → Repeated validation checks even for already trusted data
- Unable to set dynamic default values → Cannot specify runtime-determined values like UUID or current timestamp as defaults
To address these issues, I focused on improving the overall type system and enhancing type inference accuracy.
Improvement Goals
The main objectives for this improvement were:
- Enhance Schema Type Structure
- Improve consistency by reorganizing type structures between
const
,enum
, andtype
keywords
- Improve consistency by reorganizing type structures between
- Support Array-form
type
Keyword- Ensure accurate type inference by properly handling composite types (e.g.,
["string", "number"]
)
- Ensure accurate type inference by properly handling composite types (e.g.,
- Class-based Schema Validation and Optimization
- Enable safe runtime object type checking through
instanceof
operator support
- Enable safe runtime object type checking through
- Dynamic Default Value Support
- Allow function usage in
default
keyword for dynamic value setting at runtime
- Allow function usage in
Problems
Limited Support for Array-form type
Keyword
The previous implementation couldn’t handle array-form type
keywords properly, leading to broken type inference and allowing invalid keywords.
While single types worked fine:
const stringSchema = new Schema({
type: "string",
maxLength: 10, // ✅ Valid keyword for string type
minLength: 1, // ✅ Valid keyword for string type
default: "hello", // ✅ Correctly inferred as string type
});
Issues arose with composite types:
const unionSchema = new Schema({
type: ["string", "number", "null"],
// ❌ Type inference failure: incorrectly allows object type keywords
maxProperties: 1, // object-only keyword
required: ["field"], // object-only keyword
// ❌ Type inference failure: allows invalid default value type
default: {} // object type shouldn't be allowed
});
Unnecessary Data Validation Issue
All validation checks were being repeatedly performed even on internal APIs and pre-validated data, causing unnecessary performance overhead.
This excessive validation was particularly problematic when handling large datasets, as it could significantly increase computational costs and slow down overall processing.
// Person schema definition
const PersonSchema = new Schema({
type: "object",
properties: {
name: { type: "string" },
age: { type: "number" },
email: {
type: "string",
format: "email"
}
},
required: ["name", "email"]
});
// ❌ Issue: Validation runs every time even for trusted data sources
const data = fetch(...).then((res) => PersonSchema.parse(res.text()));
Unable to Set Dynamic Default Values
The JSON Schema specification only allows static values for the default
keyword. However, real services often need runtime-determined values like current timestamp, UUID, or environment variables.
The previous version had no way to set such dynamic default values:
const schema = new Schema({
type: "string",
format: "date-time",
// ❌ Issue: Can't specify `default` as a function
default: () => new Date().toISOString(),
});
This limitation made it difficult to set appropriate defaults for:
- Document schemas requiring automatic creation timestamps
- Entity schemas needing unique identifiers (e.g.,
UUID
) - Configuration values that should reflect user’s current timezone
- Cases requiring dynamic defaults based on environment variables or external configurations
Solution Process
The first challenge was to improve the type structure.
While the previous approach used Discriminated Union pattern for type inference, it couldn’t properly handle array-form type
keywords.
To solve this, we completely redesigned the type system and introduced SchemaInput
and InferSchema<T>
types to ensure accurate inference even for composite types.
Improving Schema Type Structure
The first step was to clearly define the relationships between const
, enum
, and type
keywords.
These keywords serve different purposes and can become ambiguous when used together:
const
: Defines a single fixed value- Example:
{ const: "ADMIN" }
- Most restrictive, allows only specific value
- Example:
enum
: Defines a set of allowed values- Example:
{ enum: ["ADMIN", "USER", "GUEST"] }
- Allows one value from multiple choices
- Example:
type
: Defines basic data type- Example:
{ type: "string" }
- Provides more flexible typing
- Example:
However, when these keywords are used together, they can conflict and make schema meaning unclear.
const schema = new Schema({
type: "string", // Specifies string type
const: "hello", // Fixed to 'hello' value
enum: ["hello"], // Allows selection from ['hello']
// ❌ Keywords conflict, const alone would suffice
});
Analyzing TypeScript Type System
To better understand this issue, we experimented using Generic
:
interface ConstSchema {
const: string;
}
interface EnumSchema {
enum: string[];
}
interface TypeSchema {
type: string | string[];
}
type SchemaInput = ConstSchema | EnumSchema | TypeSchema;
function createSchema<const T extends SchemaInput>(schema: T) {
return schema;
}
While these structures should be mutually exclusive, they could still be used simultaneously in a single schema.
When extending through
Generic
, mutual exclusivity between properties isn’t guaranteed.
Thus, no errors occur even whenconst
,enum
, andtype
coexist.
For example, this code works without errors:
const schema = createSchema({
const: "hello",
type: "string",
enum: ["hello", "world"],
// ❌ No TypeScript error occurs
});
To solve this, we designed a type structure where using one keyword automatically excludes others:
interface ConstSchema {
const: string;
enum?: never;
type?: never;
}
interface EnumSchema {
enum: string[];
const?: never;
type?: never;
}
interface TypeSchema {
type: string | string[];
const?: never;
enum?: never;
}
// ... omitted
Now, combinations like this will trigger TypeScript errors:
const schema = createSchema({
const: "hello",
type: "string", // ❌ TypeScript error
enum: ["hello", "world"], // ❌ TypeScript error
});
While JSON Schema specification allows using const
, enum
, and type
together,
We intentionally restricted these combinations for:
- Type Safety Guarantee
- Simultaneous use of
const
,enum
, andtype
can lead to unclear meaning and unexpected behavior
- Simultaneous use of
- Schema Clarity
- Single value (
const
), one from many (enum
), and data type specification (type
) are distinct concepts - Encourages schema writing with clear intentions
- Single value (
Now, attempting to use const
, enum
, and type
together will trigger type errors, preventing incorrect combinations.
Supporting Composite Types for type
Keyword
Next, we needed to implement a precise type system to handle composite types correctly.
First, we created a utility type to map given Tuple types:
// Utility for returning matching types
export type Match<T, Matcher> = T extends readonly [infer First, ...infer Rest]
? First extends keyof Matcher
? Matcher[First] | Match<Rest, Matcher>
: never
: T extends keyof Matcher
? Matcher[T]
: never;
Defining Interface for Type Mapping
To utilize the Match<T, Matcher>
defined earlier, we create an interface storing type matching information:
// Type mapping interface
export interface InferSchemaMap<T> {
number: StructuralValidation.Numeric;
integer: StructuralValidation.Numeric;
// ... omitted
}
Implementing Core Utility for Schema Type Inference
Next, we implement a type that infers the entire structure based on input schema (SchemaInput
):
- Handle
const
,enum
, andtype
keywords separately - Infer
default
keyword usingBasicMetaData
- Use
Match<T, Matcher>
for array-formtype
keywords - Finally, process
Schema
instance to return accurate type
BasicMetaData type includes metadata defined in JSON Schema like title, description, default, etc.
export type InferSchema<T> = Omit<T, keyof BasicMetaData> extends {
// Infer `ConstSchema`
const: infer U;
}
? T & BasicMetaData<U>
// Infer `EnumSchema`
: T extends { enum: infer U }
? T & BasicMetaData<ArrayElement<U>>
// Infer `TypeSchema`
: T extends { type: infer U }
? Omit<T, Exclude<keyof TypeSchema, "type">> &
Match<U, InferSchemaMap<T>> &
// Infer excluding `default` keyword type from `BasicMetaData`
Omit<BasicMetaData<InferSchemaType<T>>, "default"> & {
// Allow function type for `default` keyword to support dynamic defaults
default?:
| InferSchemaType<T>
| Fn.Callable<{ return: InferSchemaType<T> }>;
}
// Infer `Schema` Instance
: T extends Schema<infer U>
? T
: SchemaInput;
Class-based Schema Validation and Optimization
Finally, we designed the createSchemaClass
function to enable dynamic creation and extension of Schema
classes while maintaining schema validation functionality, leveraging one of the advantages of class-based design: the instanceof
operator.
Specifically, we used Proxy
to naturally inherit Schema
functionality, maximizing extensibility while utilizing existing Schema
instances.
We implemented createSchemaClass
function in these steps:
- Define function type using
SchemaInput
andInferSchema<T>
types - Create
Schema
instance based on givenschemaDefinition
- Implement
SchemaBasedClass
by creating anonymous class inheritingSchema
functionality - Finally, use
Proxy
to naturally combineSchema
functionality withSchemaBasedClass
This approach maintains powerful validation features of existing Schema
while enabling flexible extension:
// 1. Define function type using `SchemaInput` and `InferSchema<T>` types
export function createSchemaClass<const T extends SchemaInput>(
schemaDefinition: InferSchema<T>,
) {
// 2. Create `Schema` instance based on given `schemaDefinition`
const schemaContext = new Schema(schemaDefinition);
// 3. Implement `SchemaBasedClass` by creating anonymous class inheriting `Schema` functionality
const SchemaBasedClass = class {
static [SchemaSymbol] = schemaInstance[SchemaSymbol];
data: InferSchemaType<T>;
// ... can define additional functionality
};
// 4. Use `Proxy` to combine `Schema` instance functionality with `SchemaBasedClass`
return new Proxy(SchemaBasedClass, {
get(target, prop) {
// First look for property in `SchemaBasedClass`
return prop in target
? target[prop as keyof typeof target]
// If not found, look in `schemaContext`
: prop in schemaContext
? schemaContext[prop as keyof typeof schemaContext]
// If not found, return `undefined`
: undefined;
},
}) as typeof SchemaBasedClass & typeof schemaInstance;
}
This Proxy pattern provides these benefits:
- Users can naturally use functionality without being aware of Proxy’s existence
- Maintains existing
Schema
functionality while leveraging class-based design’sinstanceof
operator
Summary of Improvements
This enhancement has improved various aspects including type safety, performance optimization, and extensibility. The table below summarizes the original issues, solutions, and improvements:
Improvement | Original Issue | Effect |
---|---|---|
Clarified const , enum , type keyword relationships | Unclear types due to mixed usage | Enhanced type safety, prevented unpredictable errors through mutually exclusive relationships |
Array-form type keyword support | Type inference breaks with composite types like ["string", "number"] | Guaranteed accurate type inference by analyzing array elements individually |
Optimized unnecessary validation | Validation runs on every trusted data | Removed unnecessary validation for performance optimization, improved large data processing |
Dynamic default value support | Could only set static values for default , couldn’t set dynamic values like UUID or current time | Improved by allowing functions in default keyword to set values at runtime |
Enhanced class-based schema validation and extensibility | Limited extensibility and no instanceof check support | Enhanced class-based extensibility with instanceof support and Proxy usage, enabling safe runtime object type checking |
Conclusion and Future Plans
Through these improvements, we’ve solved JSON Schema’s composite type inference issues, reduced unnecessary validation, enabled dynamic default values, and implemented various other enhancements.
However, we won’t stop here. We plan to continue researching and improving for more precise type inference and performance optimization. We’ll keep testing in real projects to provide a more intuitive and stable JSON Schema-based type system for practical use.
I hope this article helps you better understand JSON Schema and TypeScript type usage.
We’ll actively incorporate your feedback for further improvements.
Thank you for reading.