Accelerate base64 encode using AVX2

Sort of an exercise.

The four LUTs for converting numeric values to ASCII chars are ugly. Besides, since both _mm256_shuffle_epi8 and _mm256_blendv_epi8 have low throughput (1 CPI and 2 CPI, respectively, on most Intel platforms), they can become a bottleneck.
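For reference, the mapping that any such lookup has to reproduce is small: each 6-bit value falls into one of a few ranges, and each range has its own offset into ASCII. The following is a scalar sketch of that mapping only, not the vectorized code from this post; the function name `b64_char` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: map a 6-bit value (0..63) to its base64 ASCII
   character by range, which is what a lookup table encodes. */
static char b64_char(uint8_t v)
{
    if (v < 26) return (char)('A' + v);          /* 0..25  -> 'A'..'Z' */
    if (v < 52) return (char)('a' + (v - 26));   /* 26..51 -> 'a'..'z' */
    if (v < 62) return (char)('0' + (v - 52));   /* 52..61 -> '0'..'9' */
    return v == 62 ? '+' : '/';                  /* 62 -> '+', 63 -> '/' */
}
```

A vectorized version has to make the same range decision for 32 bytes at once, which is presumably where the shuffle and blend instructions come in.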

Note that:

  • It assumes [input, input + 30) is valid for reading.
  • You need to handle smaller blocks (smaller than 24 bytes) using another algorithm (e.g., the “classical” one).
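For the leftover bytes, something like the following scalar encoder would do. It is a minimal sketch of the "classical" table-driven approach, assuming the standard alphabet with '=' padding; `base64_encode_scalar` and `B64_ALPHABET` are names chosen here for illustration, not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical fallback: encode `len` bytes from `in` into `out`, with '='
   padding. `out` needs room for 4 * ((len + 2) / 3) characters; no
   terminating NUL is written. Returns the number of characters written. */
static size_t base64_encode_scalar(const uint8_t *in, size_t len, char *out)
{
    char *p = out;

    /* Full 3-byte groups become 4 characters each. */
    while (len >= 3) {
        uint32_t v = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
        *p++ = B64_ALPHABET[(v >> 18) & 63];
        *p++ = B64_ALPHABET[(v >> 12) & 63];
        *p++ = B64_ALPHABET[(v >> 6) & 63];
        *p++ = B64_ALPHABET[v & 63];
        in += 3;
        len -= 3;
    }

    /* One or two trailing bytes are zero-extended and padded with '='. */
    if (len == 1) {
        uint32_t v = (uint32_t)in[0] << 16;
        *p++ = B64_ALPHABET[(v >> 18) & 63];
        *p++ = B64_ALPHABET[(v >> 12) & 63];
        *p++ = '=';
        *p++ = '=';
    } else if (len == 2) {
        uint32_t v = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8);
        *p++ = B64_ALPHABET[(v >> 18) & 63];
        *p++ = B64_ALPHABET[(v >> 12) & 63];
        *p++ = B64_ALPHABET[(v >> 6) & 63];
        *p++ = '=';
    }

    return (size_t)(p - out);
}
```

The caller would run the AVX2 loop only while enough readable bytes remain, then pass the rest of the input to this function.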
