Skip to content

Conversation

@masahi
Copy link
Member

@masahi masahi commented Oct 24, 2023

JSONRuntimeBase class the cuBLAS BYOC is based on has data_entry_ member variable which is populated with input and output DLTensor from TVMArgs at
https://github.com/apache/tvm/blob/unity/src/runtime/contrib/json/json_runtime.h#L157-L179

When cuBLAS BYOC is used by Disco with ThreadedSession, each gpu will populate and end up overwriting the same data_entry_ variable with DLTensors on different devices, since data_entry_ is not thread local. This results in a runtime crash since, for example, the device 0 may try to call cuBLAS on DLTensors in data_entry_ which might live on other devices.

This PR fixes this issue and now cuBLAS BYOC can be used from Disco with ThreadedSession.

@vinx13 @junrushao

@Lunderberg
Copy link
Contributor

@tvm-bot re-run

@github-actions
Copy link
Contributor

Failed to re-run CI in https://github.com/apache/tvm/actions/runs/6641806894

Details
Traceback (most recent call last):
  File "ci/scripts/github/github_tvmbot.py", line 594, in comment_failure
    raise item
  File "ci/scripts/github/github_tvmbot.py", line 700, in run
    pr.rerun_jenkins_ci()
  File "ci/scripts/github/github_tvmbot.py", line 553, in rerun_jenkins_ci
    post(url, auth=("tvm-bot", TVM_BOT_JENKINS_TOKEN))
  File "/home/runner/work/tvm/tvm/ci/scripts/jenkins/git_utils.py", line 53, in post
    with request.urlopen(req, data) as response:
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Server Error

with response


  
  <!DOCTYPE html><html><head resURL="/static/bb039fcf" data-rooturl="" data-resurl="/static/bb039fcf" data-extensions-available="true" data-unit-test="false" data-imagesurl="/static/bb039fcf/images" data-crumb-header="Jenkins-Crumb" data-crumb-value="bdefc61d5b55b7d7b361186ebb037e029250f0071e3ce4598365d1ecde2713222746d5dee2f140fc342ff970525d002f793682c2cbebd13e7a2767863e821687">
    
    

    <title>Jenkins [Jenkins]</title><link rel="stylesheet" href="/static/bb039fcf/jsbundles/styles.css" type="text/css"><link rel="stylesheet" href="/static/bb039fcf/css/responsive-grid.css" type="text/css"><link rel="shortcut icon" href="/static/bb039fcf/favicon.ico" type="image/vnd.microsoft.icon"><script src="/static/bb039fcf/scripts/prototype.js" type="text/javascript"></script><script src="/static/bb039fcf/scripts/behavior.js" type="text/javascript"></script><script src='/adjuncts/bb039fcf/org/kohsuke/stapler/bind.js' type='text/javascript'></script><script src="/static/bb039fcf/scripts/yui/yahoo/yahoo-min.js"></script><script src="/static/bb039fcf/scripts/yui/dom/dom-min.js"></script><script src="/static/bb039fcf/scripts/yui/event/event-min.js"></script><script src="/static/bb039fcf/scripts/yui/animation/animation-min.js"></script><script src="/static/bb039fcf/scripts/yui/dragdrop/dragdrop-min.js"></script><script src="/static/bb039fcf/scripts/yui/container/container-min.js"></script><script src="/static/bb039fcf/scripts/yui/connection/connection-min.js"></script><script src="/static/bb039fcf/scripts/yui/datasource/datasource-min.js"></script><script src="/static/bb039fcf/scripts/yui/autocomplete/autocomplete-min.js"></script><script src="/static/bb039fcf/scripts/yui/menu/menu-min.js"></script><script src="/static/bb039fcf/scripts/yui/element/element-min.js"></script><script src="/static/bb039fcf/scripts/yui/button/button-min.js"></script><script src="/static/bb039fcf/scripts/yui/storage/storage-min.js"></script><script src="/static/bb039fcf/scripts/hudson-behavior.js" type="text/javascript"></script><script src="/static/bb039fcf/scripts/sortable.js" type="text/javascript"></script><link rel="stylesheet" href="/static/bb039fcf/scripts/yui/container/assets/container.css" type="text/css"><link rel="stylesheet" href="/static/bb039fcf/scripts/yui/container/assets/skins/sam/container.css" type="text/css"><link rel="stylesheet" href="/static/bb039fcf/scripts/yui/menu/assets/skins/sam/menu.css" type="text/css"><link rel="search" href="/opensearch.xml" type="application/opensearchdescription+xml" title="Jenkins"><meta name="ROBOTS" content="INDEX,NOFOLLOW"><meta name="viewport" content="width=device-width, initial-scale=1"><script src="/static/bb039fcf/jsbundles/vendors.js" type="text/javascript"></script><script src="/static/bb039fcf/jsbundles/page-init.js" type="text/javascript"></script><script src="/static/bb039fcf/jsbundles/sortable-drag-drop.js" type="text/javascript"></script></head><body data-model-type="hudson.model.Hudson" id="jenkins" class="yui-skin-sam one-column jenkins-2.361.2" data-version="2.361.2"><a href="#skip2content" class="skiplink">Skip to content</a><header id="page-header" class="page-header"><div class="page-header__brand"><div class="logo"><a id="jenkins-home-link" href="/"><img src="/static/bb039fcf/images/svgs/logo.svg" alt="[Jenkins]" id="jenkins-head-icon"><img src="/static/bb039fcf/images/title.svg" alt="Jenkins" width="139" id="jenkins-name-icon" height="34"></a></div><a href="/" class="page-header__brand-link"><img src="/static/bb039fcf/images/svgs/logo.svg" alt="[Jenkins]" class="page-header__brand-image"><span class="page-header__brand-name">Jenkins</span></a></div><div class="searchbox hidden-xs"><form role="search" method="get" name="search" action="/search/" style="position:relative;" class="no-json"><div id="search-box-sizer"></div><div id="searchform"><input role="searchbox" name="q" placeholder="Search" id="search-box" class="main-search__input"><span class="main-search__icon-leading"><svg class="" class="" aria-hidden="true" xmlns="http://www.w3.org/2000/svg" class="" viewBox="0 0 512 512"><title></title><path d="M221.09 64a157.09 157.09 0 10157.09 157.09A157.1 157.1 0 00221.09 64z" fill="none" stroke="currentColor" stroke-miterlimit="10" stroke-width="32"/><path fill="none" stroke="currentColor" stroke-linecap="round" stroke-miterlimit="10" stroke-width="32" d="M338.29 338.29L448 448"/></svg></span><a href="https://www.jenkins.io/redirect/search-box" class="main-search__icon-trailing"><svg class="" class="" aria-hidden="true" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M256 40a216 216 0 10216 216A216 216 0 00256 40z" fill="none" stroke="currentColor" stroke-miterlimit="10" stroke-width="38"/><path d="M200 202.29s.84-17.5 19.57-32.57C230.68 160.77 244 158.18 256 158c10.93-.14 20.69 1.67 26.53 4.45 10 4.76 29.47 16.38 29.47 41.09 0 26-17 37.81-36.37 50.8S251 281.43 251 296" fill="none" stroke="currentColor" stroke-linecap="round" stroke-miterlimit="10" stroke-width="38"/><circle cx="250" cy="360" r="25" fill="currentColor"/></svg></a><div id="search-box-completion" data-search-url="/search/"></div><script src='/adjuncts/bb039fcf/jenkins/views/JenkinsHeader/search-box.js' type='text/javascript'></script></div></form></div><div class="login page-header__hyperlinks"><div id="visible-am-insertion" class="page-header__am-wrapper"></div><div id="visible-sec-am-insertion" class="page-header__am-wrapper"></div><a href="/securityRealm/commenceLogin?from=%2Fjob%2Ftvm-arm%2Fjob%2FPR-15977%2FbuildWithParameters"><b>log in</b></a></div></header><script src="/static/bb039fcf/jsbundles/keyboard-shortcuts.js" type="text/javascript"></script><div id="breadcrumbBar"><script src='/adjuncts/bb039fcf/lib/layout/breadcrumbs.js' type='text/javascript'></script><div class="top-sticker noedge"><div class="top-sticker-inner"><div class="jenkins-breadcrumbs"><ul id="breadcrumbs"><li class="item"><a href="/" class="model-link">Dashboard</a></li><li href="/" class="children"></li></ul><div id="breadcrumb-menu-target"></div></div></div></div></div><div id="page-body" class="clear"><div id="main-panel"><a name="skip2content"></a><h1 style="text-align: center"><img src="/static/bb039fcf/images/rage.svg" width="154" height="179"><span style="font-size:50px"> Oops!</span></h1><div id="error-description"><h2 style="text-align: center">A problem occurred while processing the request.</h2><p style="text-align: center">Logging ID=12d4dfbc-7b4b-4b5b-a66f-c86490e2ec3b</div></div></div><footer class="page-footer"><div class="container-fluid"><div class="page-footer__flex-row"><div class="page-footer__footer-id-placeholder" id="footer"></div><div class="page-footer__links rest_api hidden-xs"><a href="api/">REST API</a></div><div class="page-footer__links page-footer__links--white jenkins_ver"><a rel="noopener noreferrer" href="https://www.jenkins.io/" target="_blank">Jenkins 2.361.2</a></div></div></div></footer></body></html>

@Lunderberg
Copy link
Contributor

Current CI failures were present in unity head, but should be resolved after PR#15941. (See this comment for details.)

@masahi masahi force-pushed the cublas-byoc-thread-safe branch from e3f114c to ccea726 Compare October 25, 2023 19:26
@vinx13 vinx13 merged commit 04c6863 into apache:unity Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants