User Tools

Site Tools


yescrypt

yescrypt

Designer

  • Alexander Peslyak

Download Submission

Strengths summary

  • Builds upon scrypt, computes classic scrypt and native yescrypt hashes
  • High flexibility and large arsenal of defenses
    • One password hashing scheme for many possible use cases that deals with many kinds of attacks
  • Scalable from kilobytes (RAM) to terabytes (RAM + ROM) and beyond
    • while providing adequate (or better) security vs. alternatives for same defender's {time, memory} cost
  • Scalable to arbitrary SIMD vector width and instruction-level parallelism
  • Optional TMTO resistance
    • When enabled, additionally makes attacks' area-time a few times higher
  • Optional bcrypt-like GPU unfriendliness (especially important at low memory usage settings)
  • Optional multiplication latency hardening (efficient at least on common x86 and ARM CPUs)
  • Running time optimally tunable separately from memory usage and parallelism
  • Client-side computation of almost final yescrypt hashes (server relief)
    • in a way allowing for a straightforward extension of SCRAM (RFC 5802)
  • Hash upgrades to higher cost settings without knowledge of passwords
  • Cryptographic security is based on that of SHA-256, HMAC, and PBKDF2
    • The rest of processing may be considered non-cryptographic
    • Known unfortunate peculiarities of HMAC and PBKDF2 are fully avoided

Strengths in detail

Builds upon scrypt

  • Directly builds upon the well-studied scrypt
  • Can compute classic scrypt hashes (for compatibility or when scrypt's properties suit a given use case well), sharing much of the code with yescrypt native hash support

SMix improvements

Read-only lookup table (ROM)
  • Optional ROM allows for scalability from kilobytes (RAM) 1) to terabytes (RAM + ROM) and beyond
    • while providing adequate (or better) security vs. alternatives for same defender's {time, memory} cost
  • Can be arbitrarily large regardless of running time
    • This differs from scrypt, where memory usage for a given running time is limited, so scrypt's memory usage might not be high enough when high request rate capacity is desired (scrypt's recommended minimum of 16+ MB may be hard to achieve at thousands of requests per second)
  • Optional time-memory trade-off (TMTO) resistance
    • Strongly recommended for most use cases, enabled with the YESCRYPT_RW flag
    • Can be disabled (by not setting the flag) when TMTO is expected to be needed by a defender (e.g., for multi-gigabyte key derivation that might need to be repeated later on a small mobile device)
  • TMTO resistance (when enabled) additionally increases attacker's area-time cost by a factor of 2x to 4x relative to scrypt
  • TMTO resistance (when enabled) additionally results in the highest normalized area-time cost for attacks being reached at 4/3*N combined iterations of SMix's two loops, vs. scrypt's 2*N - or at 2/3 of scrypt's SMix iteration count
    • Allows for usage of a higher N and thus for higher area-time cost to attacker
Tunable running time
  • Running time tunable separately from memory usage and parallelism
    • Provides an up to 2x improvement in area-time cost over the workaround (normally suggested for scrypt) of using scrypt's p to increase running time at same memory usage
Thread-level parallelism
  • Thread-level parallelism may optionally be made deliberately inflexible (not allowing for efficient sequential computation in less memory), enabled along with the YESCRYPT_RW flag

BlockMix improvements

The revised BlockMix, which uses mostly yescrypt's pwxform (parallel wide transformation) instead of scrypt's Salsa20/8, is enabled along with the YESCRYPT_RW flag.

  • Scalable to arbitrary SIMD vector width and instruction-level parallelism
    • To be configurable via compile-time settings, or chosen at runtime from common presets
    • Current defaults are for 4x 128-bit SIMD lanes (512 bits)
    • scrypt's p could be brought down to instruction level, but it would be sub-optimal
  • bcrypt-like GPU unfriendliness (especially important at low memory usage settings)
    • by including bcrypt-like rapid random reads, typically from defender CPU's L1 data cache
  • Multiplication latency hardening (efficient at least on common x86 and ARM CPUs)
    • (Integer) multiplication takes about as much time on common CPUs as it does in ASIC, limiting possible attack speedups (for same memory usage) even with much faster memory
    • This differs from scrypt's Salsa20/8, which is optimally computed in a lot fewer clock cycles in ASIC than on CPU
    • 32×32 to 64-bit: [I]MUL on x86[-64], [V]PMULUDQ with SSE2/AVX/AVX2; UMLAL [or VMLAL] on ARM [with NEON]

Catena-like features

  • Client-side computation of almost final yescrypt hashes (server relief)
    • in a way allowing for a straightforward extension of SCRAM (RFC 5802)
  • Hash upgrades to higher cost settings without knowledge of passwords
    • supporting over 60% area-time efficiency, as opposed to Catena's over 33%

Cryptographic security

  • Cryptographic security (collision resistance, preimage and second preimage resistance) is based on that of SHA-256, HMAC, and PBKDF2
  • The rest of processing, while crucial for increasing the cost of password cracking attacks, may be considered non-cryptographic
    • There are entropy bypasses to final PBKDF2 step for both password and salt inputs
    • For comparison, scrypt has such entropy bypass for the password input only
  • Known unfortunate peculiarities of HMAC and PBKDF2 are fully avoided in the way these primitives are used

Drawbacks

  • Just one major drawback: complexity!
    • scrypt was already pretty complicated, and the full yescrypt is more so
    • The complexity is arguably justified by yescrypt's flexibility and its arsenal of defenses: it's one password hashing scheme for many possible use cases that deals with many kinds of attacks
    • The alternative would have been defining multiple simpler schemes, but their total complexity would probably be even greater
    • Cut-down yet compatible implementation with partial functionality is planned
  • While pwxform's default 4x 128-bit SIMD parallelism fits SIMD-enabled builds running on modern CPUs well, it is excessive for older CPUs and non-SIMD builds, making such slower builds of yescrypt weaker than similar builds of bcrypt in terms of GPU attack resistance (when yescrypt's memory usage is low)
    • Using much lower parallelism settings when appropriate should address this, but the risk of people (inadvertently?) using a high SIMD- and instruction-level parallelism mode in a build not suitable for that will remain
  • Mostly susceptible to cache-timing side-channels, with only a small initial portion of computation being cache-timing resistant

Pseudocode

This pseudocode illustrates yescrypt running with p=1 (no thread-level parallelism), g=0 (no hash upgrades performed yet), flags=YESCRYPT_RW (full native mode), and no ROM.

# ** Functions/symbols **
# ||                           Concatenate two strings
# HMAC(h, k, v)                HMAC with hash function h and key k over value v
# PBKDF2(prf, p, s, c, dklen)  PBKDF2 key derivation function
# substr(s, start, length)     Substring from start (zero-based) of length bytes
# le32dec(), le32enc()         32-bit little-endian decoding/encoding
# SIMD_[un]shuffle()           Salsa20 SIMD (un)shuffling of 32-bit words
# Integerify(B, r)             Parse B_{2r-1} as a little-endian integer
# p2floor(x)                   Largest power of 2 not greater than x
 
# ** Inputs **
string   password
string   salt
integer  t_cost
integer  m_cost
integer  outlen
 
# ** Algorithm **
N = 8 << m_cost
r = 8
 
# ** Settings hard-coded/assumed below in this pseudocode **
# p = 1
# g = 0
# flags = YESCRYPT_RW
# no ROM
 
# If m_cost is 16 MB per thread or more, pre-hash using 1/64th of m_cost first,
# to mitigate garbage collector attacks.  yescrypt_prehash() is almost the same
# as this function, but its personalization HMAC key is "yescrypt-prehash"
# rather than "yescrypt", it skips builtin SCRAM finalization, and it will not
# invoke another yescrypt_prehash().
if (N / p >= 0x100 && N / p * r >= 0x20000)
    password = yescrypt_prehash(password, salt, t_cost, m_cost / 64, 32)
 
password = HMAC(SHA-256, "yescrypt", password)
B = PBKDF2(HMAC-SHA-256, password, salt, 1, 128 * r)
password = substr(B, 0, 32)
 
# SMix1 invoked to initialize pwxform S-boxes
X = SIMD_shuffle(le32dec(B))
for i = 0 to Sbytes/128 - 1
    S[i] = X
    X = BlockMix_{Salsa20/8, 1}(X)
 
# SMix1 invoked "for real"
for i = 0 to N - 1
    V[i] = X
    if (i > 1)
        j = Wrap(Integerify(X, r), i)
        X = X xor V[j]
    X = BlockMix_pwxform{Salsa20/2, S, r}(X)
 
# SMix2
if (t_cost = 0)
    Nloop = (N + 2) / 3
else if (t_cost = 1)
    Nloop = (N * 2 + 2) / 3
else
    Nloop = N * (t - 1)
for i = 0 to Nloop - 1
    j = Integerify(X, r) mod N
    X = X xor V[j]
    V[j] = X
    X = BlockMix_pwxform{Salsa20/2, S, r}(X)
B = le32enc(SIMD_unshuffle(X))
 
out = PBKDF2(HMAC-SHA-256, password, B, 1, outlen)
 
# Builtin SCRAM (RFC 5802) support
clen = min(outlen, 32)
dk = PBKDF2(HMAC-SHA-256, password, B, 1, 32)
dk = SHA-256(HMAC(SHA-256, dk, "Client Key"))
out = substr(dk, 0, clen) || substr(out, clen, outlen - clen)
 
return out
 
# ** Helper functions **
 
# Wrap x to the range 0 to i-1
Wrap(x, i)
    n = p2floor(i)
    return (x mod n) + (i - n)

The BlockMix_{Salsa20/8, r}(X) function is exactly the same as in scrypt (and is invoked with hard-coded r=1 here, regardless of yescrypt's r). The BlockMix_pwxform{Salsa20/2, S, r}(X) function is yescrypt's own, and it uses yescrypt's own pwxform in place of most uses of Salsa20/2. Please refer to the yescrypt specification document and the deliberately mostly not optimized reference implementation (yescrypt-ref.c) for how these functions, as well as the SIMD (un)shuffling, are specified.

Presentation

1) Also possible due to yescrypt's bcrypt-like GPU unfriendliness in BlockMix, enabled along with YESCRYPT_RW
yescrypt.txt · Last modified: 2016/03/15 02:06 by solardiz