CryptoHack was asked to make some challenges for CSAW 2021 and Bits was our submission for the qualifiers, written by Robin and Jack. For those who qualified for the finals, you’ll have the chance to solve a few more CryptoHack challenges, but for now, we wanted to go through Bits, explain some potential solutions and some cover a few interesting things we learnt when building the challenge itself.

Let’s kick off with the puzzle itself, which was given to CSAW players as an interactive challenge with the following source code

Challenge

I wrote this oracle in rust so that it can’t sue companies over java stuff

Source

use std::io::BufRead;
use getrandom::getrandom;
use rug::{
    rand::{RandGen,RandState},
    Integer
};
use sha2::{Sha256,Digest};
use aes::{Aes256,Aes256Ctr,NewBlockCipher,cipher::{FromBlockCipher,StreamCipher}};
use generic_array::GenericArray;

// Secret sauce
// N = p*q; p ≡ q ≡ 3 (mod 4); p, q prime
use hardcore::{dlog, N, G, ORDER, FLAG};

struct SystemRandom;
impl RandGen for SystemRandom {
    fn gen(&mut self) -> u32 {
        let mut buf: [u8; 4] = [0; 4];
        let _ = getrandom(&mut buf).unwrap();
        ((buf[0] as u32) << 24) | ((buf[1] as u32) << 16) | ((buf[2] as u32) << 8) | (buf[3] as u32)
    }
}

fn encrypt_flag(shared: Integer) {
    let mut hasher = Sha256::new();
    hasher.update(shared.to_string());
    let key = hasher.finalize();
    let mut cipher = Aes256Ctr::from_block_cipher(
        Aes256::new_from_slice(&key.as_slice()).unwrap(),
        &GenericArray::clone_from_slice(&[0; 16])
        );
    let mut flag = FLAG.clone();
    cipher.apply_keystream(&mut flag);
    println!("FLAG = {}", flag.iter().map(|c| format!("{:02x}", c)).collect::<String>());
}

fn main() {
    println!("+++++++++++++++++++++++++++++++++++++++++++++++\n\
              + I hear there's a mythical oracle at Delphi. +\n\
              +++++++++++++++++++++++++++++++++++++++++++++++\n");
    let mut sysrng = SystemRandom;
    let mut rnd = RandState::new_custom(&mut sysrng);
    let d = Integer::from(&*ORDER).random_below(&mut rnd);
    let publ = Integer::from(&*G).pow_mod(&d, &*N).unwrap();
    let nbits = ORDER.significant_bits();
    let alice = Integer::from(&*G).pow_mod(&Integer::from(&*ORDER).random_below(&mut rnd), &*N).unwrap();
    println!("N = {}\nG = {}\npubl = {}\nalice = {}\nnbits = {}",
        *N,
        *G,
        publ,
        alice,
        nbits);
    encrypt_flag(alice.pow_mod(&d, &N).unwrap());
    for line in std::io::stdin().lock().lines() {
        let input = line.unwrap().parse::<Integer>().unwrap();
        match dlog(input.clone()) {
            None => println!("-1"),
            Some(x) => {
                assert!(G.clone().pow_mod(&x, &*N).unwrap() == input % &*N);
                assert!(x < *ORDER);
                assert!(x >= 0);
                println!("{}", x.get_bit(nbits - 123) as i32)
            }
        }
    }
}

Overview of the challenge

Upon inspection of the source code, we see that we can obtain an encrypted version of the flag, encrypted with the shared secret of a Diffie-Hellman key exchange. What’s special about this key exchange, is that instead of the usual $\mathbb{F}_p^\star$, we’re working in the group $(\mathbb{Z}/n\mathbb{Z})^\star$ with

\[\begin{align} n &= pq\newline p &\equiv 3 \pmod 4\newline q &\equiv 3 \pmod 4\newline p, q &\text{ prime}, \end{align}\]

which implies among other things that there is no element $g$ that generates the entire group.

Furthermore, the challenge provides us with an oracle that reveals the $123^{\textrm{rd}}$ MSB to us. While later we will cover exactly how the group is backdoored such that this oracle can even exist — and how this backdoor can be abused to solved the challenge — we will cover a proper solution, that does not depend on the practical existence of this oracle. This solution serves as a proof by construction of the bit-hardness of the $123^{\textrm{rd}}$ MSB for the discrete log in the group $(\mathbb{Z}/n\mathbb{Z})^\star$, something that can be easily extended to most other bits.

Challenge plumbing

Let’s start by implementing the communication we will need with the server, and some other utility functions:

import os; os.environ["PWNLIB_NOTERM"] = "1" # for tqdm this time
from pwn import *
import tqdm
from json import loads, dumps
from Crypto.Cipher import AES
from Crypto.PublicKey import RSA
from Crypto.Util.Padding import unpad
import hashlib
from sage.all import GF, discrete_log, crt, sqrt, ZZ


if args.LOCAL:
    io = process(["./target/release/hardcore"])
else:
    io = remote(args.HOST, args.PORT)

io.recvuntil(b"N = ")
n = int(io.recvline())
g = int(io.recvline().split(b" = ")[1])
pub = int(io.recvline().split(b" = ")[1])
alice = int(io.recvline().split(b" = ")[1])
nbits = int(io.recvline().split(b" = ")[1])
FLAG = bytes.fromhex(io.recvline().split(b" = ")[1].strip().decode())

# Convert so that LSB is at index 0
index = nbits - 123

# Once we obtain the private key, we can decrypt the flag
def dec(d):
    shared = pow(alice, d, n)
    key = hashlib.sha256(str(shared).encode()).digest()
    cipher = AES.new(key, AES.MODE_CTR, nonce=bytes(12))
    try:
        return cipher.decrypt(FLAG).decode()
    except Exception as e:
        return str(e)

# Query the oracle for point P, by default fail hard 
# if it's not a point generated by `g`
def has(P, fail=True):
    io.sendline(str(P).encode())
    res = int(io.recvline())
    if fail:
        assert res in [0,1]
    return {1: True, 0: False, -1: None}[res]

Recovering low bits

Given the public information $g$, $g^d \mod n$ and the challenge oracle, we will first proceed to recover all less significant bits than the one the oracle can reveal to us. When the number of more significant bits would be lower than it is in this challenge (or when you’re willing to spend plenty of CPU time on this challenge), this would be sufficient to mount an attack with e.g. the baby-step-giant-step algorithm to retrieve the rest of the private key $d$ in $\tilde{\mathcal{O}}(\sqrt{2^\ell})$, where $\ell$ represents the number of more significant bits.

One of the key observations is that given $g^d$ we can easily compute $g^{d + a}$ and $g^{d - a}$, without any knowledge of $d$. In particular this allows us to set a bit we know is unset, or clear it when we know it is set.

def setbit(P, i):
    return (P * pow(g, 1 << i, n)) % n

def clearbit(P, i):
    # requires modern enough python 3 for the negative 
    # exponent, use modular inverse otherwise
    return (P * pow(g, -(1 << i), n)) % n

Then, noting how (binary) addition works with carry, we see that adding a single bit at position $i$, will change a bit sequence of the form 011..1 (with the least significant $1$ at position $i$), into 100...0 without changing any other bits. So, when we arrange our input $g^{d’}$ into the oracle such that position $123$ contains a $0$, and every bit from there until the position $\alpha$ we’re looking at contains a $1$ (we can easily do this inductively by first determining the bit and setting it when needed), we know that the bit at $\alpha$ contains a one if and only if the oracle responds with $1$ for the input $g^{d’ + 2^\alpha}$.

# Quick and dirty global variable
d = 0
def set(i):
    global d
    d |= (1 << i)

def right_bits(P, i):
    if has(P):
        set(i)
        P = clearbit(P, i)
    for j in tqdm.trange(i - 1, -1, -1):
        if has(setbit(P, j)):
            set(j)
        else:
            P = setbit(P, j)
    return P

Let’s factor

At this point, we’d like to be able to somehow slide the more significant bits into the view the oracle offers us. If we could take a square root mod $n$, and distinguish whether it is the principal square root, i.e. the one that corresponds to $g^{\frac{d}{2}}$, this would be achievable. Unfortunately, taking a square root mod $n$ turns out to be as hard as factoring $n$ when $n$ is not a prime power.

Therefore, we will first apply our newfound power to find a good amount of least significant bits to factor $n$, before we continue down that road.

By Carmichael’s theorem, we know that $g^{\lambda(n)} \equiv 1 \pmod n$ and thus we can see that $g^{n} \equiv g^{\lambda(n)} g^{n - \lambda(n)} \equiv g^{n - \lambda(n)} \pmod n$. We know that $n = pq$, so

\[\lambda(n) = \mathrm{lcm}(p - 1, q - 1) = \frac{n - p - q + 1}{2},\]

under the minor assumption that $\gcd(p - 1, q - 1) = 2$. When we now assume that $p$ and $q$ are balanced enough — and otherwise factoring $n$ would be easier with the elliptic curve method — it follows that $n - \lambda(n) < 2^{1024-123}$ and can entirely be revealed with our right_bits method discussed above. From there, we can factor $n$ by the usual techniques, since we have 2 independent equations in 2 variables.

from Crypto.PublicKey import RSA

right_bits(pow(g, n, n), index)
order = n - d
assert pow(g, order, n) == 1

# Ugly hack so I don't have to implement the factorization myself :)
p = RSA.construct((n, 0x10001, pow(0x10001, -1, order))).p
assert n % p == 0 and n != p
q = n // p

# Reset d, don't reuse the stuff from `n - order`
d = 0

A first solution

Now that we’ve successfully factored $n$, we are actually already able to find the secret $d$ without explicitely recovering the high bits as we planned to initially. However, this is only due to the backdoor in the discrete log problem that had to be introduced to allow the construction of this bit oracle (recall that this entire challenge is essentially a proof of the bit-hardness of the discrete log, so it would not make sense to have an oracle without some weakness introduced into the group).

We constructed a backdoor-DLP group, in the following way: we choose two primes $p$ and $q$ such that $p-1$ and $q - 1$ are both $B$-smooth, for some appropriate bound $B$. Knowing the factorisation of $n = pq$, and additionally the factorization of $p-1$ and $q - 1$, we can calculate a discrete log in time $\tilde{\mathcal{O}}\left(\sqrt{B}\frac{\log p}{\log B}\right)$.

To see this, observe that we can calculate the discrete log mod $p$ and $q$ individually with the Pohlig-Hellman algorithm, which runs in time $\tilde{\mathcal{O}}(\sqrt{B})$ per prime factor, and then combine these two results with the chinese remainder theorem.

Introducing this weakness to the DLP additionally opens up $n$ to Pollard’s $p - 1$ factoring algorithm. This generally allows us to factor in time $\tilde{\mathcal{O}}(B \log^2(n))$, if $p - 1$ is powersmooth.

To avoid the unintended solution (🧀) of $p-1$ factoring without the lower bits known, we introduced an extra countermeasure into our construction of the backdoor by making $p - 1 = 2p_0^{16}p_1$ (and similarly for $q - 1$). This means that either of two things need to happen in order to successfully factor with a pollard $p - 1$ variant (within reasonable time):

  • Either a player needs to guess this fact, enumerating primes up to $B = 2^{30}$, but taking powers up to $B^{16}$ or higher;
  • or applying the variant where $B’!$ is used for $B’ = 16B$ as an exponent, leading to a seriously higher running time.

Given that we have at least a quadratic advantage in our discrete logarithm compared to factoring, and including some extra safety measures, we deemed this approach safe enough for a CTF challenge! Another potential extra mitigation could consist of adding some factor that is not $B$-smooth to each of $p - 1$ and $q - 1$ that is not used by the subgroup generated by $g$. That gives us a hidden subgroup generated by $g$ without the ability to factor $n$ with Pollard’s $p - 1$ method anymore, but this order is necessarily significantly smaller than $n$, which could potentially break the factoring approach outlined earlier, and is in our opinion less satisfying to have than a maximal subgroup of size $\frac{\varphi(n)}{2}$.

Given that the trapdoor has now been found by our solution-in-progress, we can apply the Pohlig-Hellman approach ourselves, and solve the challenge. Unfortunately, we have to rely on a technicality of the challenge implementation, so it is not quite satisfying yet.

from sage.all import GF, discrete_log, crt, sqrt, ZZ
dl_p = discrete_log(GF(p)(pub), GF(p)(g))
dl_q = discrete_log(GF(q)(pub), GF(q)(g))
private = int(crt([dl_p, dl_q], [p - 1, q - 1]))
print(dec(private))

Recovering high bits

Returning to the full solution that doesn’t depend on implementation details, we again start looking at finding principal square roots of $g^d$.

We want to find the square root of $g^d$ that corresponds to $g^{\frac{d}{2}}$ (this also still requires setting the LSB to $0$ to ensure $g^d$ is actually a quadratic residue). To get started, we first make sure we can find all modular square roots of $g^d$ and afterwards, we will use our established abilities to verify which of these is the principal square root. Once we have identified that, it’s only a matter of “shifting” the current $d$ to the right, and repeating these steps until all high bits have been found.

To find square roots $\mod pq$, we can find the square roots $\mod p$ and $\mod q$ individually, and combine them pairwise with the chinese remainder theorem. Because $(\mathbb{Z}/n\mathbb{Z})^*$ is not a cyclic group, and $g$ only generates half of the elements in the group, only 2 out of the 4 possible square roots will have a discrete log. To identify which of those two corresponds to the shift of $d$, we can use our old approach to identify if everything to the right of the oracle bit is still set to $1$, which will only be preserved for the square root where a shift happens.

def left_bits(P, idx):
    for i in tqdm.trange(idx + 1, nbits):
        P = setbit(clearbit(P, 0), idx) # Clear LSB -> make square; make the current bit part of the 1 sled before shifting
        for ss in [crt([ZZ(x), ZZ(y)], [p, q]) for x in sqrt(GF(p)(P), all=True) for y in sqrt(GF(q)(P), all=True)]:
            candidate_bit = has(ss, fail=False)
            if candidate_bit is None: continue
            if candidate_bit:
                Q = clearbit(ss, idx)
            else:
                Q = ss
            if has(setbit(Q, 0)): # see if it flows all the way over the 1s
                P = Q
                if candidate_bit:
                    set(i)
                break
        else:
            raise RuntimeError("Could not find a good square root")

If we had less missing bits, an alternative approach to recovering the high bits — not depending on the factorization of $n$ — could be constructed from a variant of Shanks’ Baby-Step Giant-Step algorithm, in time $\tilde{\mathcal{O}}\left(2^{\mathsf{index}/2}\right)$.

Putting it all together

With all prerequisites out of the way, it’s simply a matter of applying it to find the full solution.

d = 0
P = right_bits(pub, index)
left_bits(P, index)
print(dec(d))

One potential problem

One thing this writeup neglected so far is the possibility for right_bits to go wrong. When $d’ \ge |g|$, we would see a reduction mod $|g|$, and get invalid results. This doesn’t happen in this case because it becomes unlikely that this is triggered the farther to the right the oracle lies. Should this case happen nonetheless — and we can detect this by noticing that the discrete log is incorrect — this would imply that the most significant bit of $d$ has to be $1$, and as such we can set it to $0$, repeat our algorithm, and set the bit back to $1$ in our final result. This modification is then guaranteed not to have this wrap around problem.

Backdoor and preventing easy factorization

As discussed above, to build an oracle which could solve the discrete log problem in a reasonable amount of time we needed to weaken the problem by carefully picking the primes $p,q$. By allowing $(p-1)$ and $(q-1)$ to have many small factors, we could solve the discrete log $g^x \pmod p$ and $g^x \pmod q$ using Pohlig-Hellman and Baby-Step-Giant-Step, then combine the results using the Chinese remainder theorem. As mentioned in the main text, these small factors weakened the challenge by allowing Pollard’s $(p-1)$ factoring algorithm to factor $n$ without using $\lambda(n)$ at all. We attempted to protect against this by allowing $p = 2 \cdot p_0^{16} \cdot p_1$. The repeated factors complicated the factoring algorithm but speed up the discrete log as we can reuse our baby steps.

Ultimately, we chose to make the challenge in rust to enjoy the moderate speed up when compared with python, however the total time save (roughly a factor of two) was much smaller than we initally anticipated. We believe this ultimately comes down to the rug crate for rust and gmpy2 in python3 both calling GMP in C. Using python int rather than mpz slows down python by another factor of five or so.

The secret oracle we designed is given below.

#[macro_use]
extern crate lazy_static;

use std::collections::HashMap;
use std::sync::Mutex;
use rug::{Integer,Complete,ops::Pow};

pub static FLAG: &[u8;49] = b"flag{https://www.youtube.com/watch?v=uhTCeZasCmc}";

static P0: i64 = 785685301;
static P1: i64 = 633462701;
static GP: i64 = 2;
static Q0: i64 = 794309437;
static Q1: i64 = 942797321;
static GQ: i64 = 2;

lazy_static!(
    static ref PPOW: Integer = Integer::from(P0).pow(16u32);
    static ref QPOW: Integer = Integer::from(Q0).pow(16u32);
    static ref P: Integer = 2 * (&*PPOW * Integer::from(P1)) + 1;
    static ref Q: Integer = 2 * (&*QPOW * Integer::from(Q1)) + 1;
    pub static ref N: Integer = Integer::from(&*P * &*Q);
    pub static ref ORDER: Integer = Integer::from(&*P - 1).lcm(&Integer::from(&*Q - 1));
    pub static ref G: Integer = crt(&Integer::from(GP), &Integer::from(GQ), &*P, &*Q).unwrap();
);

fn crt(a: &Integer, b: &Integer, m1: &Integer, m2: &Integer) -> Option<Integer> {
    let common = Integer::from(m1.gcd_ref(m2));
    let m = Integer::from(m1.lcm_ref(m2));
    if b < a {
        crt(b, a, m2, m1)
    } else if Integer::from(b - a) % &common != 0 {
        None
    } else {
        let q = Integer::from(b - a) / &common;
        Some((a + q * m1 * Integer::from(m1/&common).invert(&Integer::from(m2/&common)).ok()?) % m)
    }
}

type BSKey = (Integer, Integer);
type BSVal = HashMap<Integer, Integer>;
type BSCache = HashMap<BSKey, BSVal>;
lazy_static!(
    static ref BS_CACHE: Mutex<BSCache> = Mutex::<BSCache>::new(BSCache::new());
);
pub fn baby_step(p: Integer, ell: Integer, gamma: Integer) {
    let key = (p.clone(), ell.clone());
    if !BS_CACHE.lock().unwrap().contains_key(&key) {
        let s: Integer = ell.sqrt() + 1;
        let mut bs = BSVal::with_capacity(s.to_usize_wrapping());
        let mut g = Integer::from(1);
        let mut m = Integer::from(0);
        while m <= s {
            bs.insert(g.clone(), m.clone());
            g = g * &gamma;
            g = g % &p;
            m += 1;
        }
        BS_CACHE.lock().unwrap().insert(key, bs);
    }
}

pub fn giant_step(p: &Integer, ell: &Integer, g: &Integer, h: &Integer) -> Option<Integer> {
    let s = Integer::from(ell.sqrt_ref()) + 1;
    let step = g.clone().pow_mod(&s, p).unwrap().invert(p).unwrap();
    let mut m = 0;
    let mut hh = h.clone();
    let bs = &BS_CACHE.lock().unwrap()[&(p.clone(), ell.clone())];

    while m <= s {
        if bs.contains_key(&hh) {
            return Some((bs[&hh].clone() + m*s) % ell);
        }
        hh = hh * &step;
        hh = hh % p;
        m += 1;
    }

    None
}

fn dlog_prime_power(target: &Integer, p: &Integer, pi: &Integer, ei: u32) -> Option<Integer> {
    let ni = Integer::from(pi).pow(ei);
    let inject = Integer::from(p - 1)/&ni;
    let gi = Integer::from(&*G).pow_mod(&inject, p).unwrap();
    let hi = Integer::from(target).pow_mod(&inject, p).unwrap();

    let mut xi = Integer::from(0);
    let mut hk_exp = Integer::from(p - 1)/pi;
    let gamma = Integer::from(&gi).pow_mod(&hk_exp, p).unwrap();

    baby_step(p.clone(), pi.clone(), gamma.clone());

    for k in 0..ei {
        let gk = Integer::from(&gi).pow_mod(&xi, p).unwrap().invert(p).unwrap();
        let hk = Integer::from(&gk * &hi).pow_mod(&hk_exp, p).unwrap();
        let dk = giant_step(p, pi, &gamma, &hk)?;
        // assert_eq!(dk, rho(p, pi, &gamma, &hk));
        xi += &dk * Integer::from(pi).pow(k);
        if k != ei - 1 { hk_exp = hk_exp / pi; }
    }

    Some(xi)
}

pub fn dlog(target: Integer) -> Option<Integer> {
    let modp = crt(
        &crt(
            &dlog_prime_power(&target, &*P, &Integer::from(2), 1)?,
            &dlog_prime_power(&target, &*P, &Integer::from(P0), 16)?,
            &Integer::from(2),
            &*PPOW
        )?,
        &dlog_prime_power(&target, &*P, &Integer::from(P1), 1)?,
        &Integer::from(2 * &*PPOW),
        &Integer::from(P1)
    )?;
    let modq = crt(
        &crt(
            &dlog_prime_power(&target, &*Q, &Integer::from(2), 1)?,
            &dlog_prime_power(&target, &*Q, &Integer::from(Q0), 16)?,
            &Integer::from(2),
            &*QPOW
        )?,
        &dlog_prime_power(&target, &*Q, &Integer::from(Q1), 1)?,
        &Integer::from(2 * &*QPOW),
        &Integer::from(Q1)
    )?;
    Some(crt(&modp, &modq, &(&*P - Integer::from(1)), &(&*Q - Integer::from(1)))?)
}

Originally, we were planning to use Pollard’s Rho algorithm instead of BSGS:

fn f(tup : (Integer, Integer, Integer), g : &Integer, target : &Integer, p : &Integer, ell : &Integer) -> (Integer, Integer, Integer) {
    let (x, a, b) = tup;
    match Integer::from(&x % 3).to_i64().unwrap() {
        0 => {
            ((&x * &x).complete() % p, (2 * a) % ell, (2 * b) % ell)
        },
        1 => {
            ((target * x) % p, a, (b + 1))
        },
        2 => {
            ((g * x) % p, (a + 1), b)
        },
        _ => {unreachable!();}
    }
}

pub fn rho(p : &Integer, ell : &Integer, g : &Integer, target : &Integer) -> Integer {
    if g == target { return Integer::from(1); }
    let mut a = (Integer::from(1), Integer::from(0), Integer::from(0));
    let mut b = (Integer::from(1), Integer::from(0), Integer::from(0));
    loop {
        a = f(a, g, target, p, ell);
        b = f(b, g, target, p, ell);
        b = f(b, g, target, p, ell);
        if a.0 == b.0 {
            break;
        }
    }
    ((a.1 - b.1) * (b.2 - a.2).invert(ell).unwrap() % ell + ell) % ell
}

but we found BSGS didnt use too much memory and the additional speed up by saving the baby steps for the repeated prime factors of $p$ and $q$ meant ultimately BSGS was a better pick for this challenge.

For those interested, a python implementation of essentially what is given above is included for comparison:

from math import ceil, sqrt, gcd, lcm
import random
import time
from gmpy2 import mpz

def bsgs(g, h, p, upper_bound=None):
    if upper_bound:
        m = ceil(sqrt(upper_bound))
    else:
        m = ceil(sqrt(p-1))

    if not hasattr(bsgs, 'baby_steps'):
        bsgs.baby_steps = dict()
        gi = mpz(1)
        for i in range(m):
            bsgs.baby_steps[gi] = i
            gi = (gi * g) % p

    c = pow(g, m * (p - 2), p)
    hi = h
    # giant steps
    for j in range(m):
        if hi in bsgs.baby_steps:
            return j * m + bsgs.baby_steps[hi]
        hi = (hi * c) % p
    # No solution
    return None

def crt(xs, ns_fac, n):
    x = 0
    ns = [p**e for p,e in ns_fac]
    common = gcd(*ns)
    ns = [n // common for n in ns]

    for xi, ni in zip(xs, ns):
        yi = n // ni
        zi = pow(yi, -1, ni)
        x += xi * yi * zi
    return x % n

def pohlig_hellman(g,h,p,n,n_factors):
    dlogs = []
    for pi, ei in n_factors:
        # Set up for each step
        ni = pi**ei
        gi = pow(g, n // ni, p)
        hi = pow(h, n // ni, p)

        # Groups of prime-power order
        xi = 0
        hk_exp = ni // pi
        gamma = pow(gi, hk_exp, p)

        for k in range(ei):
            # Create hk in <γ>
            gk = pow(gi, -xi, p)
            hk = pow(gk*hi, hk_exp, p)
            # make call to rust
            dk = bsgs(gamma, hk, p, upper_bound=pi)
            # increment the secret
            xi += dk*(pi**k)
            # Reduce the exponent
            hk_exp = hk_exp // pi
        
        del bsgs.baby_steps
        dlogs.append(xi)
    return crt(dlogs, n_factors, n)

def dlog_backdoor(g,h,N,p,q):
    np_factors = [(2, 1), (785685301, 16), (633462701, 1)]
    np = p-1

    nq_factors = [(2, 1), (794309437, 16), (942797321, 1)]
    nq = q-1

    xp = pohlig_hellman(g,h,p,np,np_factors)
    assert pow(g,xp,p) == pow(h,1,p)

    xq = pohlig_hellman(g,h,q,nq,nq_factors)
    assert pow(g,xq,q) == pow(h,1,q)

    x = crt([xp, xq], [(np, 1), (nq, 1)], np*nq)
    return x % order

p = mpz(26713395582018967511973684657814004241261156269415358729692119332394978760010789226380713422950849602617267772456438810738143011486768190080495256375003)
q = mpz(47346065295850807479811692397225726348630781686943994678601678975909956314423885777086052944991365707991632035242429229693774362516043822438274496319123)
np = p-1
nq = q-1

N = p*q
order = lcm(np,nq)

g = mpz(2)
x = mpz(random.randint(2,order))
h = pow(g,x,N)

print(f'x = {x}')

t = time.time()
x_guess = dlog_backdoor(g,h,N,p,q)

print(f'x_guess = {x_guess}')
print(f'Time taken: {time.time() - t}')
print(f'Solution found: {x == x_guess}')
print(f'Solution found: {pow(g,x_guess,N) == h}')

One alternative we briefly explored was using even larger prime factors and performing a precomputation step with GNFS (as implemented by cado-nfs) to allow fast on-the-spot computation of the discrete logarithms. While testing, this had a few drawbacks. First off, the precomputation time doesn’t depend on the subgroup size, but on the size of the modulus, which would allow us to alleviate the smooth $p - 1$ weakness, but at the cost of a very heavy precomputation if we wanted to keep $n$ hard enough to factor. Secondly, even for smaller primes $p$, we found that the overhead of calling into cado to perform the online discrete logarithm had a large overhead.

Overall, we are happy with our chosen backdoor and feel that it was the best solution in the inherent trade-off between the speed of the oracle — since we don’t want a solution to take hours to perform either — the precomputation cost before the CTF and the room for unintended factorization of $n$. We are unaware of any team having succeeded in factoring $n$ without recovering the group order, so it does appear that our approach was successful enough to withstand what some smart CTF players tried to do to it.