Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
59 commits
Select commit Hold shift + click to select a range
245ed1d
Add support for kTexture storage rank.
csullivan Oct 27, 2020
9c09213
Add scaffolding for texture_flatten pass.
csullivan Nov 2, 2020
d76878c
Add scaffolding for texture allocation.
csullivan Nov 2, 2020
e7c276b
Implement 2d texture flattening to builtin tir.text2d_alloca.
csullivan Nov 3, 2020
bf321c9
Lower BufferStore/Load to builtin texture store/load.
csullivan Nov 4, 2020
8afb611
Add vectorizable attribure to texture load and store.
csullivan Nov 4, 2020
17ca755
Support auto-vectorization on the innermost (RGBA) axis.
csullivan Nov 5, 2020
560baa9
Add read/write_imagef opencl codegen for builtin texture load/store.
csullivan Nov 9, 2020
14806f5
Add TextureType support.
csullivan Nov 12, 2020
0a321d1
Add InferTextureAccess pass to deduce __read_only
csullivan Nov 13, 2020
b96daaf
Implement texture allocation as external function in TIR lowering.
csullivan Mar 19, 2021
83e9af3
Remove commented lines.
csullivan Nov 18, 2020
fd0d23a
Add nd->2d texture flattening.
csullivan Nov 20, 2020
5645112
Bug fixes in opencl codegen (row<>col, access quals.)
csullivan Dec 3, 2020
34549c7
Improve texture codegen by explicitly allocating local vector
csullivan Dec 9, 2020
623f2eb
Remove automatic vectorization
csullivan Dec 8, 2020
5f9ebd1
Improve SSA local use when storing texture read to scalar buffer.
csullivan Dec 15, 2020
d8fbcfd
Define texture flattening convention such
csullivan Dec 10, 2020
b81620f
Add tir lowering and opencl codegen support for float16 textures.
csullivan Mar 19, 2021
557e07d
Disable SSA when texture load is immediately casted.
csullivan Feb 4, 2021
19469c6
Allow RGBA extent to be of length 1.
csullivan Mar 1, 2021
8b3b3de
Add pass to forward externally allocated textures
csullivan Mar 19, 2021
d7d3195
Add buffer_common.h to house buffer offset simplification routines.
csullivan Aug 17, 2021
6be6e60
More refactor and clean up in texture lowering.
csullivan Mar 6, 2021
48cf179
Add IsTextureType to tir and allow buffer
csullivan Mar 7, 2021
91f00ee
Bug fix in texture access qualifier inference pass
csullivan Mar 9, 2021
0ac0875
Step toward handling external texture buffer forwarding
csullivan Mar 9, 2021
50d4ee5
[Part 2/3] Support texture:weight lowering convention for externally …
csullivan Mar 17, 2021
fba2d3f
Bug fix in texture access qualifier inference pass
csullivan Mar 9, 2021
2fc4238
Tighten constraint on external buffer forwarding --
csullivan Mar 12, 2021
384dad0
Use texture lowering convention from texture runtime util.
csullivan Mar 19, 2021
3c1b122
Use updated texture lowering utilities
csullivan Mar 19, 2021
137da36
Use inherited visitor overloads in texture flattener.
csullivan Mar 15, 2021
5b6787e
Add check in codegen for float/half until
csullivan Mar 17, 2021
c839505
Rename tir texture builtins
csullivan Mar 17, 2021
8041cc9
Remove codegen and tir runtime dependence on for TVMBackendAlloc/Free…
csullivan Mar 23, 2021
684e513
Dispatch texture allocas via target specialized tir.tvm_call_packed
csullivan Apr 29, 2021
069ec77
Remove kTexture scope and use kGlobal with texture tag.
csullivan May 4, 2021
86bb2e6
Remove TextureType.
csullivan May 11, 2021
3e17295
Remove TextureType from OpenCL codegen.
csullivan May 19, 2021
b3cdc52
Remove TextureType from TIR lowering.
csullivan May 19, 2021
17b8808
Remove dependency on MergeMulMod.
csullivan Jun 19, 2021
c758eb9
Revert "Add buffer_common.h to house buffer offset simplification rou…
csullivan Jun 19, 2021
a794abd
Prune include list
csullivan Jun 19, 2021
11fe640
Add more documentation to texture flattening.
csullivan Jun 19, 2021
89d832f
Add TextureFlatten transform to refactored tvm lower API.
csullivan Jun 19, 2021
349eb51
Apply clang formatting.
csullivan Jun 19, 2021
da3e146
Blacken python APIs.
csullivan Jun 19, 2021
b43e0e7
Apply cpplint changes.
csullivan Jun 21, 2021
6138726
Attempt to extract storage scope from pointer scope.
csullivan Jul 19, 2021
bd084c0
Remove ExternalBufferForwarding (cache_read cancellation) for now.
csullivan Jul 21, 2021
d2f8bda
Apply MyPy.
csullivan Jul 21, 2021
b0ea283
Clang format
csullivan Jul 21, 2021
1102731
Only visit RealizeBuffer body for texture storage.
csullivan Jul 22, 2021
46ae5ab
Fix bad merge.
csullivan Jul 23, 2021
9fa362d
Utilize OpenCL preprocessor to switch between
csullivan Jul 26, 2021
213492c
Add texture codegen test example.
csullivan Aug 5, 2021
7d76707
Refactor tests to use pytest parameterization.
csullivan Aug 17, 2021
2903e53
Respond to CRs.
csullivan Aug 18, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions include/tvm/tir/builtin.h
Original file line number Diff line number Diff line change
Expand Up @@ -600,6 +600,20 @@ TVM_DLL const Op& vectorcombine();
* \brief atomic add instruction, corresponding e.g. to atomicAdd in CUDA
*/
TVM_DLL const Op& atomic_add();
/*!
* \brief Create a texture 2d memory allocation
*/
TVM_DLL const Op& texture2d_alloca();

/*!
* \brief Store to texture 2d memory
*/
TVM_DLL const Op& texture2d_store();

/*!
* \brief Load from texture 2d memory
*/
TVM_DLL const Op& texture2d_load();

/*! \brief The kind of structure field info used in intrinsic */
enum TVMStructFieldKind : int {
Expand Down
9 changes: 9 additions & 0 deletions include/tvm/tir/transform.h
Original file line number Diff line number Diff line change
Expand Up @@ -437,6 +437,15 @@ TVM_DLL Pass LowerMatchBuffer();
*/
TVM_DLL Pass FlattenBuffer();

/*
* \brief Flatten the multi-dimensional read/write
* to two dimensional texture Load/Store and realize
* texture buffer allocations.
*
* \return The Pass
*/
TVM_DLL Pass TextureFlatten();

/*!
* A pass to merge multiple TIR-level dynamic shared memory allocations into one
*/
Expand Down
15 changes: 15 additions & 0 deletions python/tvm/tir/transform/transform.py
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,21 @@ def StorageFlatten(cache_line_size, create_bound_attribute: bool = False):
return _ffi_api.StorageFlatten(cache_line_size, create_bound_attribute) # type: ignore


def TextureFlatten():
"""Flatten the multi-dimensional read/write to 2D.


Parameters
----------

Returns
-------
fpass : tvm.transform.Pass
The result pass
"""
return _ffi_api.TextureFlatten() # type: ignore


def InjectCopyIntrin(pragma_key: str, fintrin):
"""Inject virtual thread loops.

Expand Down
1 change: 1 addition & 0 deletions src/driver/driver_api.cc
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,7 @@ Array<tvm::transform::Pass> CreatePassList(bool disable_loop_partition) {

// PHASE 1
pass_list.push_back(tir::transform::InjectPrefetch());
pass_list.push_back(tir::transform::TextureFlatten());
pass_list.push_back(tir::transform::StorageFlatten(64, instrument_bound_checkers));
pass_list.push_back(tir::transform::LowerInitBlock());
pass_list.push_back(tir::transform::PlanAndUpdateBufferAllocationLocation());
Expand Down
10 changes: 8 additions & 2 deletions src/target/source/codegen_c.cc
Original file line number Diff line number Diff line change
Expand Up @@ -106,8 +106,8 @@ void CodeGenC::AddFunction(const PrimFunc& f) {
}
}

if (no_alias && restrict_keyword_.length() != 0) {
stream << ' ' << restrict_keyword_;
if (no_alias) {
PrintRestrict(v, stream);
}
} else {
PrintType(GetType(v), stream);
Expand Down Expand Up @@ -1018,6 +1018,12 @@ void CodeGenC::PrintVecElemLoadExpr(DataType t, int i, const std::string& value,
return;
}

void CodeGenC::PrintRestrict(const Var& v, std::ostream& os) {
if (restrict_keyword_.length() != 0) {
os << ' ' << restrict_keyword_;
}
}

static bool CheckOutermostBracketMatch(const std::string& s) {
if (!s.empty() && s.front() == '(' && s.back() == ')') {
size_t len = s.size();
Expand Down
2 changes: 2 additions & 0 deletions src/target/source/codegen_c.h
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,8 @@ class CodeGenC : public ExprFunctor<void(const PrimExpr&, std::ostream&)>,
virtual std::string CastFromTo(std::string value, DataType from, DataType target);
// Get load of single element with expression
virtual void PrintVecElemLoadExpr(DataType t, int i, const std::string& value, std::ostream& os);
// Print restrict keyword for a given Var if applicable
virtual void PrintRestrict(const Var& v, std::ostream& os);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does restrict mean here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the no-aliasing qualifier for opencl kernels. A compiler hint that pointer aliasing does not occur among the restricted pointer arguments.


protected:
// Print reference to struct location
Expand Down
211 changes: 209 additions & 2 deletions src/target/source/codegen_opencl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -27,18 +27,63 @@
#include <vector>

#include "../../runtime/opencl/opencl_module.h"
#include "../../runtime/texture.h"
#include "../../runtime/thread_storage_scope.h"
#include "../build_common.h"

namespace tvm {
namespace codegen {

CodeGenOpenCL::CodeGenOpenCL() { restrict_keyword_ = "restrict"; }
class InferTextureAccess : public StmtExprVisitor {
public:
static constexpr const uint8_t kReadAccess = 1;
static constexpr const uint8_t kWriteAccess = 2;

InferTextureAccess() {}
std::unordered_map<const VarNode*, std::string> Infer(const Stmt& n) {
StmtExprVisitor::VisitStmt(n);
std::unordered_map<const VarNode*, std::string> storage_scope_qualifiers;
for (auto& texture : var_access_map_) {
if (texture.second == kReadAccess) {
storage_scope_qualifiers.insert({texture.first, "texture_read"});
} else if (texture.second == kWriteAccess) {
storage_scope_qualifiers.insert({texture.first, "texture_write"});
} else if (texture.second == (kReadAccess | kWriteAccess)) {
storage_scope_qualifiers.insert({texture.first, ""});
}
}
return storage_scope_qualifiers;
}
void VisitExpr_(const CallNode* op) {
if (op->op.same_as(builtin::texture2d_load())) {
var_access_map_[op->args[0].as<VarNode>()] |= kReadAccess;
} else if (op->op.same_as(builtin::texture2d_store())) {
var_access_map_[op->args[0].as<VarNode>()] |= kWriteAccess;
} else {
StmtExprVisitor::VisitExpr_(op);
}
StmtExprVisitor::VisitExpr_(op);
}

private:
std::unordered_map<const VarNode*, uint8_t> var_access_map_;
};

CodeGenOpenCL::CodeGenOpenCL() {
// Set OpenCL specific restrict keyword
restrict_keyword_ = "restrict";
}

void CodeGenOpenCL::InitFuncState(const PrimFunc& f) {
CodeGenC::InitFuncState(f);
this->SetTextureScope(InferTextureAccess().Infer(f->body));
for (Var arg : f->params) {
if (arg.dtype().is_handle()) {
auto ptr_type = arg->type_annotation.as<PointerTypeNode>();
if (ptr_type && runtime::IsTextureStorage(std::string(ptr_type->storage_scope))) {
// Storage scope qualifiers for textures are inferred
// and set prior to function codegen.
continue;
} else if (arg.dtype().is_handle()) {
alloc_storage_scope_[arg.get()] = "global";
}
}
Expand Down Expand Up @@ -75,6 +120,40 @@ std::string CodeGenOpenCL::Finish() {
decl_stream << "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable\n"
"#pragma OPENCL EXTENSION cl_khr_global_int32_extended_atomics : enable\n\n";
}

// Enable OpenCL 1.2 sampler-less texture reads, but utilize
// provided sampler in OpenCL 2.0.
if (enable_compliant_texture_reads_) {
// TODO(csullivan, lunderberg): Extend device attribute querying to support remote devices
// generically through the device API such that a target can be created from a specific device's
// attributes and utilized during codegen. Potential generlization of #8127 (c02cafb) for remote
// devices.
//
// E.g. Only provide an image sampler when the local or remote device supports OpenCL 2.0,
// see below for context.
//
// For backwards compatibility with OpenCL 1.2, sampler-less read_image calls are used.
// By default in sampler-less read_image calls OpenCL defaults to
// sampler_ = "CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST";
// See section 6.12.14.3 Built-in Image Sampler-less Read Functions in the OpenCL 1.2
// specification. For OpenCL 2.0 it can be preferable to use,
// sampler_ = "CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST";
// For now we rely on OpenCL preprocessor directives to utilize the correct behavior
// depending on the OpenCL version detected at OpenCL compile time.
decl_stream << "#ifdef __OPENCL_VERSION__\n"
<< "#if __OPENCL_VERSION__ == CL_VERSION_2_0\n"
<< "#define READ_IMAGEH(image, sampler, coord) "
<< "read_imageh(image, sampler, coord)\n"
<< "#define READ_IMAGEF(image, sampler, coord) "
<< "read_imagef(image, sampler, coord)\n"
<< "#else\n"
<< "#define READ_IMAGEH(image, sampler, coord) "
<< "read_imageh(image, coord)\n"
<< "#define READ_IMAGEF(image, sampler, coord) "
<< "read_imagef(image, coord)\n"
<< "#endif\n"
<< "#endif\n\n";
}
return CodeGenC::Finish();
}

Expand Down Expand Up @@ -162,6 +241,23 @@ void CodeGenOpenCL::PrintType(DataType t, std::ostream& os) { // NOLINT(*)
LOG(FATAL) << "Cannot convert type " << t << " to OpenCL type";
}

void CodeGenOpenCL::PrintType(const Type& type, std::ostream& os) { // NOLINT(*)
if (auto* ptr = type.as<PrimTypeNode>()) {
return PrintType(ptr->dtype, os);
} else if (auto* ptr = type.as<PointerTypeNode>()) {
if (runtime::IsTextureStorage(std::string(ptr->storage_scope))) {
os << "image2d_t";
} else {
PrintType(ptr->element_type, os);
os << '*';
}
} else if (IsVoidType(type)) {
os << "void";
} else {
LOG(FATAL) << "Type " << type << " does not have a corresponding C Type";
}
}

void CodeGenOpenCL::PrintVecAddr(const VarNode* buffer, DataType t, PrimExpr base,
std::ostream& os) { // NOLINT(*)
if (!HandleTypeMatch(buffer, t.element_of())) {
Expand Down Expand Up @@ -210,6 +306,19 @@ void CodeGenOpenCL::PrintStorageScope(const std::string& scope, std::ostream& os
os << "__global ";
} else if (scope == "shared") {
os << "__local ";
} else if (scope == "texture_read") {
os << "__read_only ";
} else if (scope == "texture_write") {
os << "__write_only ";
}
}

void CodeGenOpenCL::PrintRestrict(const Var& v, std::ostream& os) {
// Apply restrict qualifer for non-texture types only
if (auto* ptr = v->type_annotation.as<PointerTypeNode>()) {
if (!runtime::IsTextureStorage(std::string(ptr->storage_scope))) {
os << ' ' << restrict_keyword_;
}
}
}

Expand All @@ -229,6 +338,39 @@ std::string CodeGenOpenCL::CastFromTo(std::string value, DataType from, DataType
return os.str();
}

void CodeGenOpenCL::VisitStmt_(const StoreNode* op) {
if (auto call = op->value.as<CallNode>()) {
if (call->op.same_as(builtin::texture2d_load())) {
need_texture_ssa_ = false;
// If storing a texture load into a buffer, don't use an
// intermediate local unless the buffer allocation is a
// single element selected from the texture read.
auto it = allocation_size_.find(op->buffer_var.get());
if (it != allocation_size_.end() && it->second == 1) {
need_texture_ssa_ = true;
}
}
}
CodeGenC::VisitStmt_(op);
need_texture_ssa_ = true;
}

void CodeGenOpenCL::VisitExpr_(const CastNode* op, std::ostream& os) {
if (auto call = op->value.as<CallNode>()) {
if (call->op.same_as(builtin::texture2d_load())) {
need_texture_ssa_ = false;
}
}
CodeGenC::VisitExpr_(op, os);
need_texture_ssa_ = true;
}

void CodeGenOpenCL::VisitStmt_(const AllocateNode* op) {
allocation_size_.insert(
{op->buffer_var.get(), op->constant_allocation_size() * op->dtype.lanes()});
CodeGenC::VisitStmt_(op);
}

void CodeGenOpenCL::VisitExpr_(const CallNode* op, std::ostream& os) {
if (op->op.same_as(builtin::address_of())) {
// Overload tvm_address_of to add storage scope (e.g. __global).
Expand All @@ -243,6 +385,64 @@ void CodeGenOpenCL::VisitExpr_(const CallNode* op, std::ostream& os) {
os << " *)" << this->GetVarID(load->buffer_var.get()) << " + ";
this->PrintExpr(load->index, os);
os << ')';
} else if (op->op.same_as(builtin::texture2d_store())) {
auto* ptr_type = op->args[0].as<VarNode>()->type_annotation.as<PointerTypeNode>();
ICHECK(ptr_type != nullptr) << "Texture Var's must be of PointerType";
ICHECK(runtime::IsTextureStorage(std::string(ptr_type->storage_scope)))
<< "builtin::texture2d_store() only supports storing to texture buffers";
DataType buffer_type = ptr_type->element_type.as<PrimTypeNode>()->dtype;
if (buffer_type.is_float16()) {
os << "write_imageh(";
} else if (buffer_type.is_float()) {
os << "write_imagef(";
} else {
LOG(FATAL) << "Unsupported type: " << buffer_type
<< ", currently only float and half are supported for image2d OpenCL codegen.";
}
this->PrintExpr(op->args[0], os);
os << ", ";
os << "(int2)(";
this->PrintExpr(op->args[1], os);
os << ", ";
this->PrintExpr(op->args[2], os);
os << "), ";
this->PrintExpr(op->args[3], os);
os << ")";
} else if (op->op.same_as(builtin::texture2d_load())) {
enable_compliant_texture_reads_ = true;
std::stringstream ss;
if (op->dtype.is_float16()) {
ss << "READ_IMAGEH(";
} else if (op->dtype.is_float()) {
ss << "READ_IMAGEF(";
} else {
LOG(FATAL) << "Unsupported type: " << op->dtype
<< ", currently only float and half are supported for image2d OpenCL codegen.";
}
this->PrintExpr(op->args[0], ss);
ss << ", ";
ss << "CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST, ";
ss << "((int2)(";
this->PrintExpr(op->args[1], ss);
ss << ", ";
this->PrintExpr(op->args[2], ss);
ss << ")))";

// Only use local SSA if texture is not already being stored
if (need_texture_ssa_) {
std::string rhs = SSAGetID(ss.str(), op->dtype.with_lanes(4));
if (op->args.back().as<RampNode>()) {
os << rhs;
} else {
os << "((";
this->PrintType(op->dtype.with_lanes(1), os);
os << "*)&" << rhs << ")[";
this->PrintExpr(op->args.back(), os);
os << "]";
}
} else {
os << ss.str();
}
} else if (op->op.same_as(builtin_call_extern_)) {
auto func = Downcast<StringImm>(op->args[0]);
// Enable atomics extension if used.
Expand Down Expand Up @@ -280,6 +480,13 @@ void CodeGenOpenCL::VisitExpr_(const FloatImmNode* op, std::ostream& os) { // N
}
}

void CodeGenOpenCL::SetTextureScope(
const std::unordered_map<const VarNode*, std::string>& scope) { // NOLINT(*)
for (auto& texture : scope) {
alloc_storage_scope_.insert(texture);
}
}

runtime::Module BuildOpenCL(IRModule mod, Target target) {
using tvm::runtime::Registry;
bool output_ssa = false;
Expand Down
Loading