
Skry: Hybrid LLM Static Analysis for Sui Move

· 24 min read

tl;dr: A hybrid static analysis + LLM security tool for Sui Move, focused on access control, governance, and centralization issues. Skry uses static analysis to narrow candidates, applies targeted LLM classification, and then runs interprocedural, cross-module taint propagation so that final detection remains a deterministic static-analysis step. This avoids most LLM hallucinations and reaches bugs pure static analysis can't. Proof-of-concept source code is available.

The blog post contains the following sections:

  1. Static analysis + LLM: description of the approach
  2. Skry: Design & implementation: analyzer pipeline, key features, and what makes it different
  3. Evaluation: findings on real-world contracts, detection accuracy, reproduced audit findings
  4. Conclusion: current use cases and future work

The current project state is a proof-of-concept. It does find issues in real-world projects with a low false-positive ratio, but needs more work to make its analysis more precise and capable.

Static analysis + LLM

LLMs are already used in smart-contract security. Typical usage is straightforward: provide source code as input and ask the model to identify potential vulnerabilities.

Some real issues have been found this way, but practical experimentation exposes several limitations:

  1. Noise: without strict scoping, the model reasons about irrelevant code and paths.
  2. Cost: large contexts and RAG-style setups significantly increase inference cost.
  3. Non-determinism: results are fuzzy and difficult to reproduce.
  4. High false-positive rate: models tend to over-report issues without semantic grounding.

Using LLMs alone is not effective for systematic vulnerability detection. The issue is not the model itself, but the lack of structure, constraints, and analysis scope. As a result, approaches that combine deterministic methods with LLMs have been proposed. In static analysis + LLM systems, existing tools integrate LLMs into the analysis pipeline either to extend detection logic [1] [2] or to reduce the false-positive rate [3].

In these approaches, static analysis performs the core reasoning and defines the analysis scope, while LLMs are used only for properties that cannot be reliably inferred statically, such as semantic intent or project-specific logic. The goal is not to replace static analysis, but to extend it where classic techniques rely on approximations or heuristics.
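The division of labor described above can be sketched in a few lines. This is a hypothetical illustration, not Skry's API: `narrow_candidates`, `classify_semantics`, and `detect` are invented names, and the LLM call is stubbed out.

```python
# Minimal sketch of the hybrid split: static analysis scopes, the LLM labels,
# and deterministic logic decides. All names here are hypothetical.

def narrow_candidates(functions):
    """Static pre-filter: keep only functions that touch privileged state."""
    return [f for f in functions if f["writes_shared_state"]]

def classify_semantics(candidate):
    """Placeholder for a constrained LLM call; here a trivial stub."""
    # A real implementation would send a focused prompt and parse JSON output.
    return {"is_admin_gated": "admin" in candidate["name"]}

def detect(functions):
    """Deterministic detection: LLM output only supplies semantic labels."""
    findings = []
    for f in narrow_candidates(functions):
        label = classify_semantics(f)
        if not label["is_admin_gated"]:
            findings.append((f["name"], "missing-authorization"))
    return findings

funcs = [
    {"name": "admin_withdraw", "writes_shared_state": True},
    {"name": "update_config", "writes_shared_state": True},
    {"name": "view_balance", "writes_shared_state": False},
]
print(detect(funcs))  # only update_config is flagged
```

The key property is that a hallucinated label can at worst mislabel one candidate inside an already-narrowed scope; it cannot invent a finding outside the statically defined candidate set.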

Skry: Design & implementation

Skry is a static program analyzer that uses LLMs to reduce reliance on heuristics and overapproximations when dealing with bugs that are difficult to express using classic static analysis alone. LLMs are used only for data classification and limited semantic reasoning about smart contract constructs. Bug detection and soundness remain the responsibility of the static analyzer.

Internally, the analyzer collects information about the contract as Datalog-style facts. These facts are later queried using a small eDSL based on Hy macros. This design allows both structural and semantic information to be reused across analyses and rules.
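A Datalog-style fact store can be approximated by tuples grouped per predicate, with wildcard queries standing in for Datalog variables. This sketch is an assumption about the shape of the idea, not Skry's actual `facts.py`:

```python
# Hypothetical fact store: facts are (predicate, args) tuples, queries are
# pattern matches where None acts as a Datalog-style variable.
from collections import defaultdict

class FactBase:
    def __init__(self):
        self._by_pred = defaultdict(set)

    def add(self, predicate, *args):
        self._by_pred[predicate].add(args)

    def query(self, predicate, *pattern):
        """Yield all argument tuples matching the pattern (None = wildcard)."""
        for args in self._by_pred[predicate]:
            if all(p is None or p == a for p, a in zip(pattern, args)):
                yield args

facts = FactBase()
facts.add("calls", "withdraw", "assert_admin")
facts.add("calls", "withdraw", "split_balance")
facts.add("public-fun", "withdraw")

# All callees of `withdraw`:
print(sorted(callee for _, callee in facts.query("calls", "withdraw", None)))
```

Both structural facts (parsed from source) and semantic facts (produced by LLM classification) can live in the same store, which is what lets rules combine them uniformly.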

The overall pipeline architecture is shown below:

The implementation is based on Python, with Hy used for the eDSL. Choosing Python for a static analyzer is arguably an act of intellectual violence, but it is sufficient for the proof-of-concept version when set up carefully. Python simplifies integration with tree-sitter for parsing and with LLM APIs, and provides good testing and debugging support. It also simplifies future integration with external tooling that may be worth considering, such as probabilistic Datalog engines or SMT solvers.

The analyzer is:

  • source-level and currently supports Sui Move only,
  • interprocedural and cross-module for taint and dataflow analysis,
  • path-insensitive in the current version.

The following sections describe specific components in more detail, including the rule system, fact representation, LLM-based classification, and detection of access control and centralization risks.

Scope and focus

Skry is intentionally focused on a narrow set of security patterns that are difficult to express using classic static analysis:

  • Access control: capability misuse, missing authorization, pause bypass, generic type safety.
  • Centralization: admin drain patterns, missing audit events, immutable configuration, single-step ownership.
  • Structural checks: double initialization, missing transfers, duplicated branches, weak randomness.

Structural issues come for free when building a static analyzer and are included when detected. Access-control and centralization issues depend on semantic properties such as what a capability represents, who owns what, where privilege boundaries are, and what the project intends. Traditional tools usually cannot model this with confidence and either guess based on heuristics or ignore them.

Skry's detection logic is deterministic. LLM-based classification is used only to extract the minimal semantic information in a constrained scope needed to reason about access control patterns where static analysis alone cannot.

Static analysis and code facts

The analyzer implements a classic monotone framework and interprocedural taint analysis, propagating information across Move modules and packages.
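A minimal worklist sketch conveys the flavor of interprocedural propagation in a monotone framework: taint flows along call edges and the tainted set only grows until a fixpoint. This is a drastic simplification of what any real analysis (Skry's included) does over its IR:

```python
# Tiny worklist algorithm: propagate a "tainted" property along call-graph
# edges until a fixpoint. Monotone: the tainted set never shrinks.

def propagate_taint(call_graph, seeds):
    """call_graph maps caller -> list of callees; seeds are initially tainted."""
    tainted, worklist = set(seeds), list(seeds)
    while worklist:
        fn = worklist.pop()
        for callee in call_graph.get(fn, ()):
            if callee not in tainted:   # only re-queue on a state change
                tainted.add(callee)
                worklist.append(callee)
    return tainted

cg = {
    "entry_withdraw": ["split_balance"],
    "split_balance": ["transfer_out"],
    "view_state": [],
}
print(sorted(propagate_taint(cg, {"entry_withdraw"})))
```

Cross-module propagation works the same way once call edges across module and package boundaries are in the graph.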

In addition to classic IRs common in static analyzers, the tool stores extracted information about the source code as Datalog-style facts. This is done for extensibility: users can access these facts directly from the eDSL to create new rules, regardless of whether the information is purely structural or “semantic” data gathered from LLM classification. This approach can also be used to generate project-specific rules or to cover common vulnerability patterns.

The code facts are defined in src/core/facts.py and represent the structural and semantic information. The goal of this separation is flexibility: it enables the user or the model to combine these facts to build custom rules without changing the analyzer itself or using it as a framework.

A typical dump of code facts produced via --dump-facts=<DIR> contains information about each function and struct in the project, including LLM classifications and project categories. This output can be used for debugging and for designing new project-specific rules. It prints the code fragments of all the modules with the relevant facts. Here is a small example:

Rules and structural filters

The main detection logic is implemented in a small macro-based eDSL. Hy is chosen for dual interoperability with Python and its macro capabilities, which allow rules to be expressed in a concise format. This makes rules easier to read and, if needed, easier to generate via an LLM for a specific project.

After collecting structural information, the eDSL is used to directly access these facts, along with utilities to manipulate and combine them.

Currently, the tool supports 45 rules.

Here is one of the available rules:

;; ---------------------------------------------------------------------------
;; centralized-reward-distribution - Admin picks lottery/game winners
;; ---------------------------------------------------------------------------
;; Gaming project where admin-controlled function distributes rewards to
;; admin-chosen recipients. No verifiable on-chain randomness - users must
;; trust the admin to be fair.
;; Impact: Unfair game - legitimate players never win.
(defrule centralized-reward-distribution
  :severity :medium
  :categories [:centralization :fairness]
  :description "Admin-controlled reward distribution - no verifiable winner selection"
  :match (fun :public)
  :filter (and
    (project-category? "gaming" facts ctx)        ;; Gaming/lottery/gambling project
    (checks-sender*? f facts ctx)                 ;; Admin-gated (transitive sender check)
    (transfers-from-shared-object? f facts ctx)   ;; Extracts from shared pool
    (has-param-type? f "address" facts ctx)       ;; Has address param = admin picks recipient
    (not (transfers-from-sender? f facts ctx))    ;; Not user withdrawing own funds
    (not (is-init? f facts ctx))))

Here, the rule file uses helper functions that combine existing code facts to make them more convenient to use. The naming is literal: functions ending with ? return booleans, and * indicates transitive behavior.
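To make the composition concrete, here is a rough Python rendering of the same rule, with the helper predicates stubbed as lookups into per-function fact dicts. The field names and sample data are illustrative assumptions, not Skry's internal representation:

```python
# Python rendering of the Hy rule's filter: a conjunction of boolean
# predicates over structural and semantic facts. Names are illustrative.

def centralized_reward_distribution(f, project):
    return (
        project["category"] == "gaming"       # gaming/lottery/gambling project
        and f["checks_sender_transitively"]   # admin-gated (transitive check)
        and f["transfers_from_shared"]        # extracts from a shared pool
        and "address" in f["param_types"]     # admin picks the recipient
        and not f["transfers_from_sender"]    # not a user withdrawing own funds
        and not f["is_init"]
    )

goal_shot = {
    "checks_sender_transitively": True,
    "transfers_from_shared": True,
    "param_types": ["address", "u64"],
    "transfers_from_sender": False,
    "is_init": False,
}
print(centralized_reward_distribution(goal_shot, {"category": "gaming"}))  # True
```

The Hy version buys conciseness and macro-time checks over this plain-Python shape, but the evaluation semantics are the same: a deterministic conjunction over facts.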

LLM classification

LLM classification is used in three cases:

  1. Project-wide feature classification: for example, the presence of a global pause, versioning, and project categories. This information is used to adjust specific rules.
  2. Data classification: for each struct, the tool determines whether it contains sensitive data, configuration parameters, or protocol invariants, and whether it is intended to be owned by a privileged user.
  3. Rule double-checking: in the :classify section for a subset of rules, to reduce the false-positive rate by handling subtle Move patterns or intentional design decisions.

The generated prompts are based on Jinja2 templates and are available in src/prompts/.
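Rendering a classification prompt from a template might look like the sketch below. To keep it dependency-free, `string.Template` stands in for Jinja2, and the wording and field list are invented, not taken from src/prompts/:

```python
# Hypothetical prompt rendering for data-sensitivity classification.
# string.Template is used in place of Jinja2 to avoid a dependency;
# the template text is an assumption, not Skry's real prompt.
from string import Template

TEMPLATE = Template(
    "Classify each field of struct $struct_name as one of: "
    "economic, availability, none. Respond with JSON only.\n"
    "Fields: $fields"
)

prompt = TEMPLATE.substitute(
    struct_name="LockerCap",
    fields="liquidation_ratio, price_with_discount_ratio, inactivation_delay",
)
print(prompt)
```

Constraining the prompt to one struct, a fixed label set, and JSON-only output is what keeps the model's task narrow and its answers parseable.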

Here is the real-world example prompt generated from the data sensitivity classification template:


LLM output:

[
  {
    "field": "LockerCap::liquidation_ratio",
    "reason": "economic",
    "confidence": 0.85
  },
  {
    "field": "LockerCap::price_with_discount_ratio",
    "reason": "economic",
    "confidence": 0.90
  },
  {
    "field": "LockerCap::inactivation_delay",
    "reason": "availability",
    "confidence": 0.75
  }
]

This output is correctly converted into the following semantic facts:

  • liquidation_ratio → economic (manipulate = unfair liquidations)
  • price_with_discount_ratio → economic (manipulate = steal funds via discounts)
  • inactivation_delay → availability (manipulate = trap lockers or let them escape early)

The model also correctly ignored counters, timestamps, and internal data structures, thanks to the reduced scope and a concrete prompt.
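Turning the JSON classification above into semantic facts is mechanical; one plausible sketch, with an assumed confidence threshold and fact shape, is:

```python
# Convert LLM classification output into semantic facts, dropping
# low-confidence entries. The 0.7 threshold and the fact tuple shape
# ("sensitive-field", name, reason) are assumptions for illustration.
import json

raw = """[
  {"field": "LockerCap::liquidation_ratio", "reason": "economic", "confidence": 0.85},
  {"field": "LockerCap::price_with_discount_ratio", "reason": "economic", "confidence": 0.90},
  {"field": "LockerCap::inactivation_delay", "reason": "availability", "confidence": 0.75}
]"""

MIN_CONFIDENCE = 0.7  # discard classifications the model itself is unsure about

semantic_facts = [
    ("sensitive-field", entry["field"], entry["reason"])
    for entry in json.loads(raw)
    if entry["confidence"] >= MIN_CONFIDENCE
]
print(semantic_facts)
```

Because the facts land in the same store as structural facts, rules can mix "this field is economic" with "this function writes that field" without special cases.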

Access control and centralization risks detection

In Sui Move, access control is expressed through ownership of capability objects. A capability is a first-class object that grants permission to perform a restricted operation. Functions require specific capability types as parameters, and only callers that own the corresponding object can invoke those operations. Capabilities are typically created during initialization and transferred explicitly, making access control decisions explicit and traceable in the code.

The tool implements a simple IR based on code facts that tracks capabilities in a graph. A Mermaid dump of this IR (accessible through --dump-cap-graph=<DIR>) looks like this:

Skry models this access control structure using a capability graph derived from code facts. The graph captures which addresses own which capabilities, which functions require those capabilities, and which objects are mutated as a result. This representation makes privilege boundaries, capability hierarchies, and sensitive state transitions explicit and analyzable.
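A stripped-down version of such a graph needs only two edge kinds: which capabilities a function requires and which objects it mutates. The class below is a sketch under those assumptions, not Skry's IR:

```python
# Minimal capability-graph sketch: requires-edges (function -> capability
# types) and mutates-edges (function -> objects). Names are illustrative.
from collections import defaultdict

class CapGraph:
    def __init__(self):
        self.requires = defaultdict(set)  # function -> required capability types
        self.mutates = defaultdict(set)   # function -> mutated objects

    def add_function(self, fun, caps, mutated):
        self.requires[fun] |= set(caps)
        self.mutates[fun] |= set(mutated)

    def unguarded_mutators(self, obj):
        """Functions that mutate `obj` without requiring any capability."""
        return sorted(f for f, objs in self.mutates.items()
                      if obj in objs and not self.requires[f])

g = CapGraph()
g.add_function("withdraw_treasury", {"PoolAdminCap"}, {"Treasury"})
g.add_function("donate", set(), {"Treasury"})
print(g.unguarded_mutators("Treasury"))  # ['donate']
```

Queries like `unguarded_mutators` are exactly the privilege-boundary questions the access-control rules need answered.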

In addition, the Move ownership model distinguishes between shared and owned objects. This distinction is critical for access control analysis, as shared objects enable global access patterns, while owned objects enforce per-address authority.

Evaluation

Evaluation Setup

Data

We evaluated the tool on a set of real-world Sui contracts with source code available on GitHub.

The following criteria were used to select projects:

  • smart contracts only (no libraries),
  • only Sui Move; Aptos and other Move-based ecosystems were excluded,
  • production-quality projects: no forks, tutorials, or hackathon artifacts,
  • primarily targeting Move 2024, with a small number of older projects included.

In total, we identified 94 projects matching these criteria.

In addition, audit reports from prior security assessments were used to reproduce historical findings.

Approach

The evaluation approach was to reproduce critical access control issues on historical code from large Sui projects and to test non-critical rules on a collected set of contracts.

To evaluate the tool on large, production-ready projects, we reintroduced findings previously reported in audits by top firms. This was done by applying targeted mutations to production code that matched those findings and verifying that the tool detected them.

The evaluation focuses only on manually verified, non-exploitable issues. These findings are used to demonstrate the tool’s detection capabilities. All cases were reviewed manually to confirm that they either reflect intentional design decisions that warrant manual validation or originate from historical source code. No new exploitable vulnerabilities are claimed.

Models and the cost of evaluation

For evaluation purposes, two models were used:

  • DeepSeek — chosen for its inexpensive API and precise adherence to prompt instructions.
  • Opus 4.5 — provides the best accuracy and demonstrates strong "knowledge" of Move; available via Claude Code and callable from the CLI, making it cheaper than API-based usage.

In addition to API-based execution, the tool supports multiple modes, including a manual mode for prompt debugging and integration with Claude Code.

The typical number of prompts and overall cost depend on the project. Larger codebases and a higher number of potentially sensitive functions result in more prompts. In practice, small projects (around three Move files) require approximately seven prompts, while large production projects require around 30–40 queries.

The tool also uses caching: all LLM prompts and responses are cached and reused in subsequent executions unless a fresh run is explicitly forced.
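The caching idea can be sketched as keying responses by a hash of the model name and prompt, reusing the stored response unless a fresh run is forced. The cache location and file format here are assumptions:

```python
# Sketch of prompt/response caching: content-addressed by a hash of
# (model, prompt). The .llm_cache directory and JSON format are assumptions.
import hashlib, json, pathlib

CACHE_DIR = pathlib.Path(".llm_cache")

def cache_key(model, prompt):
    return hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()

def cached_call(model, prompt, call_fn, force=False):
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / f"{cache_key(model, prompt)}.json"
    if path.exists() and not force:       # reuse the cached response
        return json.loads(path.read_text())
    response = call_fn(model, prompt)     # a real LLM API call goes here
    path.write_text(json.dumps(response))
    return response

calls = []
def fake_api(model, prompt):
    calls.append(prompt)
    return {"label": "economic"}

cached_call("deepseek", "classify fee_rate", fake_api)
cached_call("deepseek", "classify fee_rate", fake_api)  # served from cache
print(len(calls))  # the fake API was invoked only once
```

Content-addressed caching also makes reruns deterministic: identical prompts always replay identical answers, which helps when iterating on rules rather than prompts.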

Critical access control issues

These rules detect high-impact access control and capability-handling issues that are typically exploitable and unacceptable in production code.

These rules include:

  • unprotected-pause – Access control issue where a user can manipulate the global lock mechanism (critical).
  • sensitive-internal-public-exposure – Internal helper exposed as public.
  • generic-type-mismatch – Generic type parameter used without validation.
  • arbitrary-recipient-drain – Transfer to a user-controlled address without authorization.
  • missing-authorization – Entry function reaches a dangerous sink without an authorization check.
  • user-asset-write-without-ownership – Write to user assets without ownership proof.
  • missing-destroy-guard – Capability destruction without authorization.
  • capability-takeover – Capability can be acquired by an unauthorized address.
  • capability-leak-via-store – Capability stored in a shared object field.
  • phantom-type-mismatch – Capability guard uses a different phantom type than the target.
  • test-only-missing – Public function returns a privileged capability without #[test_only].
  • duplicated_branch_condition – Same branch condition appears multiple times.

These issues have critical severity. A valid finding in production code is likely exploitable, and therefore such issues are not expected to appear in audited projects.

To validate the tool, previously reported audit findings were reintroduced, and the analyzer was tested to ensure it detects them. The evaluated protocols are production-grade, large, and have mature, audited codebases. The critical issues listed below were present in earlier audits and are not deployed in live systems; the purpose here is tool evaluation.

Reproduced findings include:

Finding ID     | Project  | Analyzer’s warning
STG-03         | Navi     | sensitive-internal-public-exposure
POOL-01        | Navi     | arbitrary-recipient-drain
OS-NVI-ADV-00  | Navi     | generic-type-mismatch
AMA-1          | Balanced | sensitive-internal-public-exposure

This validation approach reintroduces known security issues from audits, runs the analyzer on the modified code, and checks that the expected warnings are produced. This approach is used because unaudited source code for large production projects is generally no longer available.

To validate these warnings, the following mutations were applied.

Pause and version usage anomalies

A pause or global lock is a common pattern in smart contracts and is typically used for security purposes. The absence of a pause mechanism may indicate a centralization issue, a missing check, or a valid design decision. In most cases, such issues have medium severity and should be mentioned in audits. Only in rare cases does a missing pause check lead to severe or unexpected behavior.

In the current implementation, there are three related rules. The pause-check-missing and version-check-missing rules detect anomalies in version and global pause usage. If a public function omits a check that is consistently applied in similar functions, it is reported.

While an unprotected version check in Move upgradable contracts is considered a critical vulnerability, missing pause checks are often intentional design decisions or non-exploitable bugs.

An example of such a warning is shown below:

[HIGH][pause-check-missing][sources/treasury.move:419:5] in function 'multisig_treasury::treasury::create_simple_proposal'

The treasury has an is_frozen state that is checked in create_emergency_proposal, but create_proposal and create_simple_proposal never check it. As a result, this is highlighted by the rule as a pause check anomaly. In this specific case, it may be a deliberate design choice if the freeze mechanism is intended to block only the emergency fast-track.
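One way to phrase this anomaly heuristic: if checking the pause flag is the prevailing convention among a struct's public functions, report the exceptions. The majority threshold below and the extra function names are assumptions for illustration:

```python
# Sketch of the pause-check anomaly heuristic: flag functions that skip a
# check their peers consistently perform. The strict-majority threshold
# and the sample function list are illustrative assumptions.

def pause_anomalies(functions):
    checkers = [f for f in functions if f["checks_pause"]]
    # Only report when the pause check is the prevailing convention.
    if len(checkers) * 2 <= len(functions):
        return []
    return sorted(f["name"] for f in functions if not f["checks_pause"])

treasury_funs = [
    {"name": "create_emergency_proposal", "checks_pause": True},
    {"name": "execute_proposal", "checks_pause": True},
    {"name": "cancel_proposal", "checks_pause": True},
    {"name": "create_proposal", "checks_pause": False},
    {"name": "create_simple_proposal", "checks_pause": False},
]
print(pause_anomalies(treasury_funs))
```

A convention-based heuristic like this deliberately stays silent when no convention exists, which is why a project with no pause checks at all produces no warnings from this rule.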

Centralization risks

Centralization risks correspond to intentional design decisions or missing features where privileged user(s) have excessive control over critical project logic.

Below are some example rules and real-world findings.

single-step-ownership

The rule highlights single-step ownership transfers.

In the source code, this appears as the following pattern:

public fun admin_transfer(ac: AdminCap, recipient: address) {
    transfer::transfer(ac, recipient);
}

The issue with this pattern is that if the current admin provides an incorrect address (e.g., due to a typo, phishing attack, or copy-paste error), the admin capability is irrecoverably lost, with no mechanism to cancel or correct the transfer.

Here are example findings highlighting similar patterns:

[HIGH][single-step-ownership][liquid_staking/sources/ownership.move:33:5] in function 'liquid_staking::ownership::transfer_owner'
[HIGH][single-step-ownership][liquid_staking/sources/ownership.move:47:5] in function 'liquid_staking::ownership::transfer_operator'

Both functions transfer_operator and transfer_owner implement the single-step ownership pattern. These can be improved by using a two-step transfer pattern: the current owner calls, for example, offer(new_addr) to set a pending recipient, and the new owner then calls claim() to accept. This ensures the recipient address is valid and controlled, and allows cancellation in case of a mistake.

centralized-reward-distribution

Detects admin-controlled reward distribution in projects classified as "gaming", where there is no verifiable winner selection.

An example warning for a lottery project:

[MEDIUM][centralized-reward-distribution][sources/core.move:147:5] in function 'rtmtree::longshot_jackpot::goal_shot'

An admin-gated function decides who receives rewards and when. The player address is passed as a parameter, but execution is controlled by the admin. A malicious admin can refuse to distribute rewards to legitimate winners.

admin-bypasses-pause

Detects a possible centralization risk when an admin can bypass the global lock mechanism.

Here is an example warning:

[INFO][admin-bypasses-pause][lending_core/sources/pool.move:228:5] in function 'lending_core::pool::withdraw_treasury'

withdraw_treasury requires the PoolAdminCap capability. A pause mechanism exists via the Storage.paused field, which is checked through when_not_paused() in user-facing lending functions. There is no pause check here: withdraw_treasury does not take Storage as a parameter and therefore cannot check the pause state. This creates a centralization risk: an admin can drain the treasury even when the protocol is paused.

The severity is informational, reflecting the fact that this may be an intentional design decision, while still requiring additional attention.

missing-admin-event

Generates an informative warning when critical protocol changes may require emitting an event. This is similar to Slither’s missing event detector, but relies on LLM classification.

Some example informational warnings that require manual validation:

[INFO][missing-admin-event][sources/patience.move:160:5] in function 'patience::patience::withdraw_fee' - extracts value. Emit event with: amount
[INFO][missing-admin-event][sources/patience.move:166:5] in function 'patience::patience::admin_transfer' - transfers to user-controlled address. Emit event with: recipient
[INFO][missing-admin-event][contracts/core/sources/prize_pool.move:161:1] in function 'anglerfish::prize_pool::claim_protocol_fee' - extracts value. Emit event with: amount
[INFO][missing-admin-event][contracts/core/sources/prize_pool.move:172:1] in function 'anglerfish::prize_pool::claim_treasury_reserve' - extracts value. Emit event with: amount

Protocol configuration parameters and invariants

In Sui Move, all data is expressed in structs. Configuration parameters may be either:

  • mutable configuration parameters — values expected to change during the contract’s lifecycle. Examples: fee_rate, loyalty_address.
  • immutable protocol invariants — parameters set once during initialization. Examples: protocol_fee_bps, decimals (e.g., to manipulate specific tokens).

Some detectors rely on LLM-based classification and highlight anomalies that should be checked manually:

missing-mutable-config-setter

The rule relies on LLM classification for struct fields and static analysis that propagates data. If the model detects that a struct field represents a configuration value that is expected to be mutable, the analysis checks whether any setters exist for it.

Some example real-world findings:

[MEDIUM][missing-mutable-config-setter][core/sources/satlayer_pool.move:55:1] field 'satlayer_core::satlayer_pool::Vault.min_deposit_amount'

The Vault struct has admin setters for similar configuration fields:

  • staking_cap → set_staking_cap
  • withdrawal_cooldown → update_withdrawal_time
  • is_paused → toggle_vault_pause
  • caps_enabled → set_caps_enabled
  • min_deposit_amount → no setter

This makes the finding valid: min_deposit_amount is correctly classified as a configuration field and may require a setter, unless this is an intentional design decision.

[MEDIUM][missing-mutable-config-setter][sources/curve.move:42:5] field 'pumpfun::curve::Configurator.swap_fee'

The admin has setters for all other Configurator fields, but not swap_fee. This looks like an oversight correctly classified and highlighted by the analyzer. If the swap fee needs adjustment, the admin has no way to change it.
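The check behind these findings combines an LLM-derived field label with statically collected write sites. A sketch, using the Vault field and setter names from the finding above (the fact shapes are assumptions):

```python
# Sketch of missing-mutable-config-setter: report fields the LLM labeled as
# mutable configuration that no non-init function ever writes.

def missing_setters(mutable_config_fields, field_writers):
    """field_writers maps field -> set of functions that write it."""
    return sorted(
        f for f in mutable_config_fields
        if not any(fn != "init" for fn in field_writers.get(f, ()))
    )

# LLM classification: which Vault fields are mutable configuration.
mutable_cfg = {"staking_cap", "withdrawal_cooldown", "min_deposit_amount"}
# Static analysis: which functions write each field.
writers = {
    "staking_cap": {"init", "set_staking_cap"},
    "withdrawal_cooldown": {"init", "update_withdrawal_time"},
    "min_deposit_amount": {"init"},  # only written during initialization
}
print(missing_setters(mutable_cfg, writers))  # ['min_deposit_amount']
```

Inverting the condition (a field labeled as an immutable invariant that *is* written outside init) yields the companion mutable-protocol-invariant check.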

mutable-protocol-invariant

The rule is similar to missing-mutable-config-setter, but detects the opposite pattern: if the LLM classifies a field as an immutable protocol invariant set once during initialization, it should never be changed.

Some examples of real-world findings include:

[HIGH][mutable-protocol-invariant][sources/lootbox.move:118:5] in function 'suigar::lootbox::edit_lootbox' writes to invariant 'suigar::lootbox::LootBox.reward_amounts'
[HIGH][mutable-protocol-invariant][sources/lootbox.move:118:5] in function 'suigar::lootbox::edit_lootbox' writes to invariant 'suigar::lootbox::LootBox.reward_probabilities'

This means that the LLM classified two fields of LootBox as protocol invariants that should be immutable, but they are modified in edit_lootbox. If the contract were to separate purchase and reveal into two transactions, an admin could change reward distributions between these steps, causing users to receive different payouts than expected at purchase time. While this may be an intentional design decision or a centralization risk, it warrants medium severity.

Evaluation Results

The problem with precision evaluation is that non-exploitable findings often resemble intentional design decisions, even if these don't follow security best practices. As a result, it is impossible to provide an exact number for non-critical warnings.

For small projects with 3–4 modules, the tool typically generates 0 to 5 warnings, including some unclear cases that require manual review, some true positives, and some false positives. Overall, it is not noisy. For the largest production codebases, the tool generates around 40 warnings, including a number of valid concerns that deserve discussion during a security assessment.

Key insights on false positives:

  • Many false positives are related to limitations in the current (proof-of-concept) analyzer's implementation; a more mature codebase would significantly reduce the rate.
  • LLM misclassification does occur, but can be mitigated by providing more focused context and requesting explicit reasoning steps, followed by prompt refinement. While 100% accuracy or guarantees are not possible, precision can be improved at the cost of higher inference overhead.

Challenges

The key challenges we encountered include:

  • It is impossible to distinguish design decisions from valid findings without direct contact with project owners, which is especially relevant for non-critical findings.
  • In many cases, there is no source code available to reproduce previous audit findings. Projects are often renamed, removed, or do not publish the audited code, making historical verification difficult.
  • Most evaluated projects are mature and well audited, so access control issues are relatively rare.
  • Cost is a minor concern. Having Claude Code installed allowed us to run the tool on around 100 contracts, occasionally hitting usage limits.

Conclusion

The tool is still proof-of-concept. The approach has been tested and it works. The analyzer produces valid warnings that belong in a security assessment for Sui Move smart contracts. However, the analysis is not yet comprehensive or systematic and requires further work.

Skry is a static analyzer at its core – this is where most engineering effort belongs. The current version shows feasibility, not precision. Improving accuracy requires improving the analysis engine itself. Move is a relatively large language, and supporting more real-world patterns will require additional engineering effort.

Current use cases

Despite its proof-of-concept status, the analysis is relatively cheap and already usable in practice:

  • Bug finding before deployment: can be used as an additional check before audit and deployment.
  • Audit assistance: the primary use case. The tool highlights issues for manual validation and can surface potential audit findings.
  • Centralization risk validation: can be used to assess trust assumptions and administrative control when evaluating a project.

Future work

The core of the tool is the static analysis engine. Improving it will improve both static precision and LLM-based classification by enabling more specific and constrained prompts.

Some important areas for future work include:

  • Advanced Sui Move pattern support: for example, OTW-based access control, address-based access control patterns (more common in Aptos, but sometimes present in Sui in the wild), and more comprehensive tainted data propagation covering all language constructs.
  • IR improvement: the current IR is relatively simplistic and expresses only the information required for access control detection.
  • Path sensitivity: full symbolic execution for Move is complex but feasible; introducing simple sink-protection facts and dominance relations in the IR is a reasonable starting point.
  • Object lifecycle tracking: as an extension of dataflow analysis to cover additional patterns.
  • Execution optimization: routines involving fact processing and interprocedural, cross-module analysis can be optimized to improve performance.
  • Dependency management: understanding external dependencies is important; the implementation may require integration with the build system to obtain actual dependency sources.

While improving the analysis engine will make the analyzer more accurate and effective, new rule categories beyond access control and governance can also be introduced:

  • Variable misuse: using variables of the correct type in an incorrect context (e.g., incorrect argument ordering or checking properties of the wrong variable, as in this example). Papers such as this one cover this topic in detail. A proper implementation would require IR improvements to reduce the number of LLM requests.
  • Oracle patterns: while less widespread compared to access control issues, oracle API misuse still appears in audit findings and is worth covering.
  • Flash loan and slippage issues: although Sui smart contract design protects against many flash loan attacks, some patterns still appear in audits and could be addressed.
  • Cross-project pattern extraction: while invariant checking is typically handled by a different class of tools, the existing eDSL could be leveraged to generate new rules using LLMs, based on patterns extracted from existing projects via a RAG-style approach combined with code mutation.

References

  1. Xia et al – AuditGPT: Auditing Smart Contracts with ChatGPT
  2. Sun et al – GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis
  3. Li et al – The Hitchhiker’s Guide to Program Analysis, Part II: Deep Thoughts by LLMs