XDL Native Machine Learning Reference (Linfa)
Version: 1.0 | Date: November 2025 | Status: Complete ✅ | Feature flag: ml
Overview
XDL includes native Rust machine learning capabilities powered by Linfa, a Rust machine learning framework inspired by scikit-learn. These functions provide:
- Pure Rust: No Python dependencies required
- Performance: Native speed with zero-copy data handling
- Memory Safety: Rust’s guarantees apply to ML operations
- Integration: Direct use of XDL arrays
Enabling Native ML
Native ML functions require the ml feature flag:
# Build with ML support
cargo build --features ml
# Or in Cargo.toml
[dependencies]
xdl-stdlib = { version = "0.1", features = ["ml"] }
Function Reference
K-Means Clustering
ML_KMEANS_FIT(X, n_features, n_clusters, [max_iter], [tolerance])
Train a K-Means clustering model.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| X | array | required | Input data (flattened) |
| n_features | integer | required | Number of features per sample |
| n_clusters | integer | required | Number of clusters |
| max_iter | integer | 100 | Maximum iterations |
| tolerance | float | 1e-4 | Convergence threshold |
Returns: Model ID (string)
Example:
; Create sample data (100 samples, 2 features)
X = RANDOMN(seed, 200) ; 100 * 2 = 200 values
; Fit K-Means with 3 clusters
model = ML_KMEANS_FIT(X, 2, 3)
ML_KMEANS_PREDICT(model_id, X, n_features)
Predict cluster labels for new data.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| model_id | string | Fitted model ID |
| X | array | Input data (flattened) |
| n_features | integer | Number of features |
Returns: Array of cluster labels (0 to k-1)
Example:
labels = ML_KMEANS_PREDICT(model, X, 2)
PRINT, 'Cluster assignments:', labels
ML_KMEANS_CENTROIDS(model_id)
Get cluster centroids.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| model_id | string | Fitted model ID |
Returns: Flattened array of centroids
Example:
centroids = ML_KMEANS_CENTROIDS(model)
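The centroids are returned back to back, one block per cluster, so (assuming they use the same row-major layout described under Data Format Notes below) centroid j's value for feature f is at index j*n_features + f. A minimal sketch for the 3-cluster, 2-feature model above:
; Print each centroid, assuming 3 clusters of 2 features in row-major order
FOR j = 0, 2 DO PRINT, 'Centroid', j, ':', centroids[j*2], centroids[j*2 + 1]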
Linear Regression
ML_LINEAR_FIT(X, y, n_features)
Train a linear regression model.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| X | array | Feature matrix (flattened) |
| y | array | Target values |
| n_features | integer | Number of features |
Returns: Model ID (string)
Example:
; Features: [x1, x2] for 100 samples
X = RANDOMN(seed, 200)
; Target: y = 2*x1 + 3*x2 + noise
y = FLTARR(100)
FOR i = 0, 99 DO y[i] = 2*X[i*2] + 3*X[i*2+1] + RANDOMN(seed2)
model = ML_LINEAR_FIT(X, y, 2)
ML_LINEAR_PREDICT(model_id, X, n_features)
Predict with linear regression model.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| model_id | string | Fitted model ID |
| X | array | Feature matrix (flattened) |
| n_features | integer | Number of features |
Returns: Array of predictions
Example:
predictions = ML_LINEAR_PREDICT(model, X_test, 2)
ML_LINEAR_COEFFICIENTS(model_id)
Get regression coefficients.
Returns: Array of coefficients (one per feature)
Example:
coeffs = ML_LINEAR_COEFFICIENTS(model)
PRINT, 'Coefficients:', coeffs
ML_LINEAR_INTERCEPT(model_id)
Get regression intercept.
Returns: Scalar intercept value
Example:
intercept = ML_LINEAR_INTERCEPT(model)
PRINT, 'Intercept:', intercept
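Together, the coefficients and intercept fully determine the fitted model: y = Σ coeffs[f] * x[f] + intercept. A sketch reconstructing one prediction by hand for a 2-feature model, assuming the flattened row-major input layout:
; Reconstruct the prediction for sample i by hand (2 features)
i = 0
y_hat = coeffs[0]*X[i*2] + coeffs[1]*X[i*2+1] + intercept
preds = ML_LINEAR_PREDICT(model, X, 2)
PRINT, 'Manual prediction:', y_hat
PRINT, 'Model prediction: ', preds[0]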
Logistic Regression
ML_LOGISTIC_FIT(X, y, n_features)
Train a logistic regression classifier.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| X | array | Feature matrix (flattened) |
| y | array | Binary labels (0 or 1) |
| n_features | integer | Number of features |
Returns: Model ID (string)
Example:
; Binary classification
model = ML_LOGISTIC_FIT(X_train, y_train, 4)
ML_LOGISTIC_PREDICT(model_id, X, n_features)
Predict class labels.
Returns: Array of predicted labels (0 or 1)
Example:
predictions = ML_LOGISTIC_PREDICT(model, X_test, 4)
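A typical end-to-end flow pairs the classifier with ML_TRAIN_TEST_SPLIT and ML_ACCURACY (both documented below). A sketch, assuming 4 features and binary labels in y:
; Split, train, predict, and score a binary classifier
split = ML_TRAIN_TEST_SPLIT(X, y, 4, 0.25)
model = ML_LOGISTIC_FIT(split[0], split[2], 4)   ; train on X_train, y_train
predictions = ML_LOGISTIC_PREDICT(model, split[1], 4)
PRINT, 'Accuracy:', ML_ACCURACY(split[3], predictions)
ML_DROP_MODEL, model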
Principal Component Analysis (PCA)
ML_PCA_FIT(X, n_features, n_components)
Fit a PCA model for dimensionality reduction.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| X | array | Input data (flattened) |
| n_features | integer | Number of original features |
| n_components | integer | Number of components to keep |
Returns: Model ID (string)
Example:
; Reduce 10 features to 2 components
model = ML_PCA_FIT(X, 10, 2)
ML_PCA_TRANSFORM(model_id, X, n_features)
Transform data to reduced dimensions.
Returns: Transformed data (flattened)
Example:
X_reduced = ML_PCA_TRANSFORM(model, X, 10)
ML_PCA_COMPONENTS(model_id)
Get principal components.
Returns: Flattened components matrix
Example:
components = ML_PCA_COMPONENTS(model)
ML_PCA_VARIANCE(model_id)
Get explained variance ratio.
Returns: Array of variance ratios, one per retained component. The sum is the fraction of total variance explained; it reaches 1.0 only when all components are kept.
Example:
variance = ML_PCA_VARIANCE(model)
PRINT, 'Explained variance:', variance
PRINT, 'Total:', TOTAL(variance)
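A common use of the ratios is choosing the smallest number of components that reaches a target coverage. A sketch, assuming the model was fitted with enough components to inspect:
; Find the smallest k whose leading components explain >= 95% of the variance
cumulative = 0.0
FOR k = 0, N_ELEMENTS(variance) - 1 DO BEGIN
  cumulative = cumulative + variance[k]
  IF cumulative GE 0.95 THEN BEGIN
    PRINT, 'Keep', k + 1, 'components (', cumulative * 100, '% explained)'
    BREAK
  ENDIF
ENDFOR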
Model Evaluation
ML_TRAIN_TEST_SPLIT(X, y, n_features, test_ratio)
Split data into training and test sets.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| X | array | Feature matrix (flattened) |
| y | array | Target values |
| n_features | integer | Number of features |
| test_ratio | float | Fraction for test set (0-1) |
Returns: Array of 4 elements: [X_train, X_test, y_train, y_test]
Example:
split = ML_TRAIN_TEST_SPLIT(X, y, 4, 0.2)
X_train = split[0]
X_test = split[1]
y_train = split[2]
y_test = split[3]
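With test_ratio = 0.2, roughly 20% of the samples should land in the test set. A quick sanity check, assuming the returned pieces are ordinary XDL arrays:
; Verify the 80/20 split proportions
PRINT, 'Training samples:', N_ELEMENTS(y_train)
PRINT, 'Test samples:', N_ELEMENTS(y_test)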
ML_ACCURACY(y_true, y_pred)
Calculate classification accuracy.
Returns: Accuracy score (0 to 1)
Example:
acc = ML_ACCURACY(y_test, predictions)
PRINT, 'Accuracy:', acc * 100, '%'
ML_MSE(y_true, y_pred)
Calculate mean squared error.
Returns: MSE value
Example:
mse = ML_MSE(y_test, predictions)
PRINT, 'MSE:', mse
ML_R2_SCORE(y_true, y_pred)
Calculate R² coefficient of determination.
Returns: R² score (-∞ to 1, higher is better)
Example:
r2 = ML_R2_SCORE(y_test, predictions)
PRINT, 'R²:', r2
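Both metrics follow their standard definitions and can be reproduced directly on XDL arrays. A sketch, assuming TOTAL and MEAN behave as in IDL:
; MSE: mean of squared residuals
mse_manual = TOTAL((y_test - predictions)^2) / N_ELEMENTS(y_test)
; R²: 1 - residual sum of squares / total sum of squares
ss_res = TOTAL((y_test - predictions)^2)
ss_tot = TOTAL((y_test - MEAN(y_test))^2)
r2_manual = 1.0 - ss_res / ss_tot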
Memory Management
ML_DROP_MODEL(model_id)
Remove model from memory.
Example:
ML_DROP_MODEL, model
Complete Examples
K-Means Clustering Example
; Generate synthetic data: 3 clusters
n_samples = 300
n_features = 2
; Cluster 1: center (0, 0)
X1 = RANDOMN(seed1, 200) * 0.5
; Cluster 2: center (3, 3)
X2 = RANDOMN(seed2, 200) * 0.5 + 3.0
; Cluster 3: center (0, 3)
X3 = RANDOMN(seed3, 200) * 0.5
FOR i = 0, 99 DO X3[i*2+1] = X3[i*2+1] + 3.0 ; shift y-coordinates (odd indices) up by 3
; Combine data
X = [X1, X2, X3]
; Fit K-Means
model = ML_KMEANS_FIT(X, n_features, 3, 100, 1e-4)
; Predict clusters
labels = ML_KMEANS_PREDICT(model, X, n_features)
; Get centroids
centroids = ML_KMEANS_CENTROIDS(model)
PRINT, 'Cluster centroids:', centroids
; Clean up
ML_DROP_MODEL, model
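To see how the samples were assigned, count the members of each cluster; WHERE returns the matching indices and, via its second argument, their count:
; Count samples assigned to each cluster
FOR k = 0, 2 DO BEGIN
  idx = WHERE(labels EQ k, count)
  PRINT, 'Cluster', k, ':', count, 'samples'
ENDFOR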
Linear Regression Example
; Generate data: y = 2*x + 1 + noise
n_samples = 100
X = FINDGEN(n_samples) / 10.0
y = 2.0 * X + 1.0 + RANDOMN(seed, n_samples) * 0.5
; Split data
split = ML_TRAIN_TEST_SPLIT(X, y, 1, 0.2)
X_train = split[0]
X_test = split[1]
y_train = split[2]
y_test = split[3]
; Fit model
model = ML_LINEAR_FIT(X_train, y_train, 1)
; Get coefficients
coef = ML_LINEAR_COEFFICIENTS(model)
intercept = ML_LINEAR_INTERCEPT(model)
PRINT, 'Learned: y =', coef[0], '* x +', intercept
; Predict and evaluate
y_pred = ML_LINEAR_PREDICT(model, X_test, 1)
mse = ML_MSE(y_test, y_pred)
r2 = ML_R2_SCORE(y_test, y_pred)
PRINT, 'MSE:', mse
PRINT, 'R²:', r2
; Clean up
ML_DROP_MODEL, model
PCA Dimensionality Reduction
; High-dimensional data (10 features)
n_samples = 200
n_features = 10
X = RANDOMN(seed, n_samples * n_features)
; Reduce to 2 components
model = ML_PCA_FIT(X, n_features, 2)
; Transform data
X_reduced = ML_PCA_TRANSFORM(model, X, n_features)
; Check explained variance
variance = ML_PCA_VARIANCE(model)
PRINT, 'Component 1 explains:', variance[0] * 100, '% variance'
PRINT, 'Component 2 explains:', variance[1] * 100, '% variance'
PRINT, 'Total explained:', TOTAL(variance) * 100, '%'
; Clean up
ML_DROP_MODEL, model
Data Format Notes
Flattened Arrays
All Linfa ML functions expect data in row-major flattened format:
For 3 samples with 2 features each:
Sample 0: [x0, y0]
Sample 1: [x1, y1]
Sample 2: [x2, y2]
Flattened: [x0, y0, x1, y1, x2, y2]
Converting Multi-dimensional Arrays
; If you have a 2D array, flatten it:
X_flat = REFORM(X_2d, N_ELEMENTS(X_2d))
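If the features live in separate 1D arrays instead, the flattened layout can be built sample by sample. A sketch, assuming two feature vectors x1 and x2 of equal length:
; Interleave two feature vectors into row-major flattened form
n = N_ELEMENTS(x1)
X = FLTARR(n * 2)
FOR i = 0, n - 1 DO BEGIN
  X[i*2] = x1[i]
  X[i*2 + 1] = x2[i]
ENDFOR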
Comparison with XDLML Functions
XDL has two ML systems:
| Feature | XDLML_* Functions | ML_* Functions (Linfa) |
|---|---|---|
| Implementation | Custom XDL | Linfa (Rust) |
| Speed | Good | Excellent |
| Features | 50+ functions | Core algorithms |
| Neural Networks | Yes | No |
| SVM | Yes | No |
| K-Means | Yes | Yes |
| Linear Regression | Yes | Yes |
| PCA | Yes | Yes |
Use XDLML_* for neural networks and other advanced features; use ML_* for the fastest native performance on the core algorithms.