Advanced Type Annotations

April 10, 2024

By Stefan Gussner

Migrating a large JavaScript codebase to TypeScript is a time-intensive process. Even in the age of AI agents, a big-bang rewrite of a large codebase is hard to pull off. For projects that want a gradual migration, the TypeScript compiler can be enabled for JS files on a per-file basis. I struggled to get the syntax right for advanced type annotations, so here is my write-up on how I used them in a small library.

Basics of JSDoc Type Annotations

JSDoc type annotations can be used by the TypeScript compiler for full type checking. To enable type checking on a file-by-file basis, add a // @ts-check comment at the top of your JavaScript files. Here is a simple example of a function with type annotations:

// @ts-check

/**
 * @param {number} a
 * @param {number} b
 * @returns {number}
 */
function add(a, b) {
  return a + b;
}
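
Two more basic forms come up constantly: optional parameters and typed variables. The following is a minimal sketch; repeatGreeting and names are made-up identifiers used only for illustration.

```javascript
// @ts-check

/**
 * @param {string} name
 * @param {number} [times=1] The brackets mark the parameter as optional
 * @returns {string}
 */
function repeatGreeting(name, times = 1) {
  // Build `times` greetings and join them with a space.
  return Array.from({ length: times }, () => `Hello, ${name}!`).join(" ");
}

/** @type {string[]} */
const names = ["Ada", "Linus"];

console.log(names.map((name) => repeatGreeting(name)).join(" "));

// repeatGreeting(42) would be flagged by the checker:
// a number is not assignable to a parameter of type string.
```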

That's simple enough. However, things get tricky once you want more advanced types like generics. And how does exporting and importing types from other files work?

Importing Types from other Files

To import types from other files, you can use an import('...') type expression inside JSDoc comments. Here's an example:

/**
 * @param {import('../db/models/ocrjobs').OcrJob} job
 */
function processJob(job) {
  // do something with the job
}
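
Repeating the import(...) expression in every annotation gets noisy. One option is to alias the imported type once per file with a @typedef and use the short name afterwards. This is a sketch that assumes the module path from the example above; isFinished is a hypothetical helper.

```javascript
// Alias the imported type once, then refer to it by its short name.
/** @typedef {import('../db/models/ocrjobs').OcrJob} OcrJob */

/**
 * Hypothetical helper, for illustration only.
 * @param {OcrJob} job
 * @returns {boolean}
 */
function isFinished(job) {
  return job.status === "finished";
}
```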

Exporting Types

You don't need to do anything special to make your types available to other files. Define them with @typedef in a JSDoc comment, and other files can reference them through an import('...') type expression. Here's an example:

/**
 * @typedef {{
 *   id: string,
 *   status: 'pending' | 'processing' | 'finished' | 'failed',
 *   retrycount: number,
 *   started?: Date,
 *   finished?: Date,
 *   error?: string,
 *   parsingsteps?: { name: string, maxcount?: number, type: string, skip?: number }[],
 *   schema: any,
 *   result?: any,
 *   timezone?: string,
 *   returnprobabilities?: boolean,
 *   codeData?: { type: string, decoded: string, raw: string, position: number[][] }[]
 * }} OcrJob
 */
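
Once the typedef exists, it can annotate variables and parameters like any built-in type. The sketch below uses a trimmed-down copy of OcrJob so that it is self-contained; canRetry and the sample values are hypothetical.

```javascript
// @ts-check

/**
 * Trimmed-down copy of the OcrJob typedef, for illustration only.
 * @typedef {{
 *   id: string,
 *   status: 'pending' | 'processing' | 'finished' | 'failed',
 *   retrycount: number,
 *   error?: string
 * }} OcrJob
 */

/** @type {OcrJob} */
const job = { id: "job-1", status: "failed", retrycount: 2, error: "timeout" };

/**
 * Hypothetical helper: decides whether a failed job may be retried.
 * @param {OcrJob} job
 * @param {number} maxRetries
 * @returns {boolean}
 */
function canRetry(job, maxRetries) {
  return job.status === "failed" && job.retrycount < maxRetries;
}

console.log(canRetry(job, 3)); // prints true

// A typo such as status: "done" would be flagged, because "done"
// is not part of the status union.
```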

Zod

Of course, you can define types based on Zod schemas as well by applying z.infer to the schema inside a @typedef.

const { z } = require("zod");

const createUserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  name: z.string().min(2).max(100),
  age: z.number().min(0).optional(),
});

/**
 * @typedef {z.infer<typeof createUserSchema>} User
 */

/** @type {User} */
const exampleUser = {
  id: "550e8400-e29b-41d4-a716-446655440000",
  email: "example@example.com",
  name: "John Doe",
  age: 30,
};

Advanced Generics

Sometimes, you want to define dynamic data types. For example, in the ttj-client library, in the TTJClient#inferDocumentBySchema function, the return value depends on the returnprobabilities parameter. If the returnprobabilities parameter is set to true, we return an array of the form { value: any, probability: number }[] for every property of the schema. If it is set to false, we return the value directly (which has type 'any').
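
To make the two result shapes concrete, here is a sketch with made-up data. The schema, the field names, the values, and the probabilities are all invented for illustration, and mostLikely is a hypothetical helper that is not part of ttj-client.

```javascript
// Hypothetical schema: every leaf is a string describing the field.
const schema = {
  invoiceNumber: "The invoice number",
  items: [{ name: "Name of the line item" }],
};

// Shape with returnprobabilities = false: plain values, every key optional.
const plainResult = {
  invoiceNumber: "INV-001",
  items: [{ name: "Widget" }],
};

// Shape with returnprobabilities = true: each leaf becomes a list of candidates.
const probabilityResult = {
  invoiceNumber: [
    { value: "INV-001", probability: 0.9 },
    { value: "INV-007", probability: 0.1 },
  ],
  items: [{ name: [{ value: "Widget", probability: 0.8 }] }],
};

/**
 * Hypothetical helper: picks the candidate with the highest probability.
 * Expects a non-empty list.
 * @param {{ value: any, probability: number }[]} candidates
 * @returns {any}
 */
function mostLikely(candidates) {
  return candidates.reduce((best, c) => (c.probability > best.probability ? c : best)).value;
}

console.log(mostLikely(probabilityResult.invoiceNumber)); // prints INV-001
```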

Because we support nested objects and arrays, we first need to define a recursive @callback function type for both cases.

First, we deal with the case where returnprobabilities is set to false:

/**
 * @template T
 * @callback TTJParsingFunction
 * @param {T} schema
 * @returns {{ [K in keyof T]?: T[K] extends {} ? ReturnType<TTJParsingFunction<T[K]>> : T[K] }}
 * */

This is essentially a recursive Partial<T> type. We first declare a generic type T using the @template annotation. Then we define a @callback function type TTJParsingFunction that takes a T. The return type is an object with the same keys as T, but every key is optional. If the value of a key is an object, we apply the TTJParsingFunction type to it recursively.
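
As a sketch of how the type behaves, the snippet below repeats the typedef and applies it to a made-up schema type. InvoiceSchema and the sample value are hypothetical; the point is that a partially filled result should be accepted because every key is optional at every level.

```javascript
// @ts-check

/**
 * @template T
 * @callback TTJParsingFunction
 * @param {T} schema
 * @returns {{ [K in keyof T]?: T[K] extends {} ? ReturnType<TTJParsingFunction<T[K]>> : T[K] }}
 */

/**
 * Hypothetical schema type, used only for this illustration.
 * @typedef {{ invoice: { number: string, total: string }, customer: string }} InvoiceSchema
 */

// "total" and "customer" are missing, which is fine: all keys are optional.
/** @type {ReturnType<TTJParsingFunction<InvoiceSchema>>} */
const partial = { invoice: { number: "INV-001" } };

// Optional chaining is needed because invoice itself may be undefined.
console.log(partial.invoice?.number);
```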

Next up, we deal with the case where returnprobabilities is set to true:

/**
 * @template T
 * @callback ReturnProbabilitiesFunction
 * @param {T} schema
 * @returns {T extends string ? { value: any, probability: number }[] : T extends (infer U)[] ? ReturnType<ReturnProbabilitiesFunction<U>>[] : T extends {} ? { [K in keyof T]?: ReturnType<ReturnProbabilitiesFunction<T[K]>> } : { value: any, probability: number }[]}
 * */

Here, we also declare a generic type T using the @template annotation and define a @callback function type ReturnProbabilitiesFunction that takes a T. Because every leaf property of our schema is defined as a string, we first check whether T is a string with a conditional type (the extends keyword plus a ternary expression) and, if so, return the { value: any, probability: number }[] type. Otherwise, we check if T is an array. The infer keyword declares a new type variable for the type of the array elements, which we pass to the ReturnProbabilitiesFunction type recursively. If T is an object, similar to the TTJParsingFunction type, we return an object with the same keys as T, but every key is optional. Anything else falls back to { value: any, probability: number }[].

Finally, we define the inferDocumentBySchema function:

/**
 *
 * @template S
 * @template {boolean} R
 * @param {Buffer|Uint8Array|string} data A PDF, PNG, or JPEG file as a buffer, Uint8Array, or data URL
 * @param {'application/pdf'|'image/png'|'image/jpeg'} mimetype The mimetype of the data
 * @param {S} schema
 * @param {(TextParsingStep|ImageParsingStep)[]=} parsingsteps
 * @param {R=} returnprobabilities
 * @returns {Promise<{
 *      results: R extends true ? ReturnType<ReturnProbabilitiesFunction<S>> : ReturnType<TTJParsingFunction<S>>,
 *      ...
 * }>}
 * */
async inferDocumentBySchema(data, mimetype, schema, parsingsteps, returnprobabilities) {
    //...
}

Here, we define two generic types S and R. S is the schema type, and R is a boolean that determines if we return probabilities or not. To check if returnprobabilities is set to true, we use the R extends true expression and return the ReturnType<ReturnProbabilitiesFunction<S>> type. Otherwise, we return the ReturnType<TTJParsingFunction<S>> type.
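
The same pattern can be tried out in isolation. The following is a simplified stand-in, not the library's implementation: readField and its return shapes are hypothetical, and the cast to any is an assumption about how to satisfy the checker, which cannot verify a conditional return type inside the function body.

```javascript
// @ts-check

/**
 * Simplified stand-in: the return type depends on a boolean flag.
 * @template {boolean} R
 * @param {string} value
 * @param {R} withProbability
 * @returns {R extends true ? { value: string, probability: number }[] : string}
 */
function readField(value, withProbability) {
  const result = withProbability ? [{ value, probability: 1 }] : value;
  // Cast, because the body cannot be checked against the conditional type.
  return /** @type {any} */ (result);
}

// R is inferred as false, so the result is typed as string.
const plain = readField("INV-001", false);

// R is inferred as true, so the result is typed as
// { value: string, probability: number }[].
const scored = readField("INV-001", true);

console.log(plain.toUpperCase(), scored[0].probability);
```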

Complex object types

In cases like the parsingsteps parameter, we need to differentiate between two lists of large language models that can handle different types of data. For example, gemini-pro-vision can handle image data while gpt-3.5-turbo cannot. Therefore, if a ParsingStep has type: "raw" or type: "padded", it can't have name: "vertex/gemini-1.0-pro-vision-001". To solve this, we can introduce a union type:

/**
 * @typedef {'openai/gpt-3.5-turbo'|'openai/gpt-4'|'azure/gpt-35-turbo'|'vertex/text-bison@001'|'ollama/mixtral'|'ollama/llama2'|'ollama/llama2:13b'|'ollama/gemma'} SupportedLanguageModel
 * @typedef {SupportedLanguageModel | 'vertex/gemini-1.0-pro-vision-001'} SupportedVisionModel
 */

/**
 * @typedef {{
 *   type: 'raw' | 'padded',
 *   name: SupportedLanguageModel,
 *   maxcount?: number
 * }} TextParsingStep
 */

/**
 * @typedef {{
 *   type: 'image',
 *   name: SupportedVisionModel,
 *   maxcount?: number
 * }} ImageParsingStep
 */

/**
 * ...
 * @param {(TextParsingStep|ImageParsingStep)[]=} parsingsteps
 * ...
 */

This is called a discriminated union type. If the type property is set to raw or padded, the name property can only be of type SupportedLanguageModel. If the type property is set to image, the name property can only be of type SupportedVisionModel. In this case, SupportedVisionModel is a superset of SupportedLanguageModel, and therefore, we effectively just allow more options for the name property if the type is set to image.
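
The payoff shows up when code branches on the type property: the checker narrows the union in each branch. The sketch below uses shortened model lists to stay compact, and describeStep is a hypothetical helper.

```javascript
// @ts-check

/**
 * Shortened model lists, for illustration only.
 * @typedef {'openai/gpt-4' | 'ollama/llama2'} SupportedLanguageModel
 * @typedef {SupportedLanguageModel | 'vertex/gemini-1.0-pro-vision-001'} SupportedVisionModel
 * @typedef {{ type: 'raw' | 'padded', name: SupportedLanguageModel, maxcount?: number }} TextParsingStep
 * @typedef {{ type: 'image', name: SupportedVisionModel, maxcount?: number }} ImageParsingStep
 */

/**
 * Hypothetical helper that narrows on the `type` discriminant.
 * @param {TextParsingStep | ImageParsingStep} step
 * @returns {string}
 */
function describeStep(step) {
  if (step.type === "image") {
    // Here step is an ImageParsingStep, so name may be the vision model.
    return `image step using ${step.name}`;
  }
  // Here step is a TextParsingStep.
  return `${step.type} text step using ${step.name}`;
}

/** @type {(TextParsingStep | ImageParsingStep)[]} */
const steps = [
  { type: "raw", name: "openai/gpt-4" },
  { type: "image", name: "vertex/gemini-1.0-pro-vision-001" },
  // { type: "raw", name: "vertex/gemini-1.0-pro-vision-001" } would be rejected.
];

console.log(steps.map((step) => describeStep(step)));
```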

Enabling JSDoc Type checking in VSCode

VSCode has built-in support for JSDoc type annotations with type checking and autocompletion.

Add a jsconfig.json file to your project root with the following content:

{
  "compilerOptions": {
    "allowJs": true,
    "checkJs": false // true to check all JS files (then you have to ignore type errors on build), or false to only check files with // @ts-check
  },
  "exclude": ["node_modules"]
}

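With checkJs set to false, only files that opt in with // @ts-check are checked, so a type error in one of those files can break the build. If you set checkJs to true, all JS files will be checked, and you will have to ignore type errors from files that are not migrated yet when you build.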
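
If you do go with checkJs set to true, one way to keep a not-yet-migrated file quiet is a // @ts-nocheck comment at the top of that file. The function below is a hypothetical legacy example; without the opt-out, I would expect the checker to report the marked line.

```javascript
// @ts-nocheck
// Opts this file out of type checking, even when checkJs is true.

// Hypothetical legacy function, for illustration only.
function legacyTotal(prices) {
  let total = 0;
  for (const price of prices) total += price;
  // A string is assigned to a variable that was inferred as number.
  total = total.toFixed(2);
  return total;
}

console.log(legacyTotal([1.5, 2])); // prints 3.50
```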

Add more entries to exclude if you have other folders you don't want to check, since checking them can slow down your editor.