don't block in nvme_bd_cmd
8629 nvme: rework command abortion
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Jason King <jason.king@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>

          --- old/usr/src/uts/common/io/nvme/nvme.c
          +++ new/usr/src/uts/common/io/nvme/nvme.c
[ 48 lines elided ]
  49   49   * available interrupt vector, with the queue length usually much smaller than
  50   50   * the maximum of 65536. If the hardware doesn't provide enough queues, fewer
  51   51   * interrupt vectors will be used.
  52   52   *
  53   53   * Additionally the hardware provides a single special admin queue pair that can
  54   54   * hold up to 4096 admin commands.
  55   55   *
  56   56   * From the hardware perspective both queues of a queue pair are independent,
  57   57   * but they share some driver state: the command array (holding pointers to
  58   58   * commands currently being processed by the hardware) and the active command
  59      - * counter. Access to the submission side of a queue pair and the shared state
  60      - * is protected by nq_mutex. The completion side of a queue pair does not need
  61      - * that protection apart from its access to the shared state; it is called only
  62      - * in the interrupt handler which does not run concurrently for the same
  63      - * interrupt vector.
       59 + * counter. Access to a queue pair and the shared state is protected by
       60 + * nq_mutex.
  64   61   *
  65   62   * When a command is submitted to a queue pair the active command counter is
  66   63   * incremented and a pointer to the command is stored in the command array. The
  67   64   * array index is used as command identifier (CID) in the submission queue
  68   65   * entry. Some commands may take a very long time to complete, and if the queue
  69   66   * wraps around in that time a submission may find the next array slot to still
  70   67   * be used by a long-running command. In this case the array is sequentially
  71   68   * searched for the next free slot. The length of the command array is the same
  72   69   * as the configured queue length. Queue overrun is prevented by the semaphore,
  73   70   * so a command submission may block if the queue is full.
[ 68 lines elided ]
 142  139   * status fields and log information or fault the device, depending on the
 143  140   * severity of the asynchronous event. The asynchronous event request is then
 144  141   * reused and posted to the admin queue again.
 145  142   *
 146  143   * On command completion the command status is checked for errors. In case of
 147  144   * errors indicating a driver bug the driver panics. Almost all other error
 148  145   * status values just cause EIO to be returned.
 149  146   *
 150  147   * Command timeouts are currently detected for all admin commands except
 151  148   * asynchronous event requests. If a command times out and the hardware appears
 152      - * to be healthy the driver attempts to abort the command. If this fails the
       149 + * to be healthy, the driver attempts to abort the command. The original
       150 + * command timeout is also applied to the abort command. If the abort times
       151 + * out too, the
 153  151   * driver assumes the device to be dead, fences it off, and calls FMA to retire
 154      - * it. In general admin commands are issued at attach time only. No timeout
 155      - * handling of normal I/O commands is presently done.
      152 + * it. In all other cases the aborted command should return immediately with a
      153 + * status indicating it was aborted, and the driver will wait indefinitely for
      154 + * that to happen. No timeout handling of normal I/O commands is presently done.
 156  155   *
 157      - * In some cases it may be possible that the ABORT command times out, too. In
 158      - * that case the device is also declared dead and fenced off.
       156 + * Any command that times out due to the controller dropping dead will be put
       157 + * on the nvme_lost_cmds list if it references DMA memory. This prevents the
       158 + * DMA memory from being reused by the system and later being written to by a
       159 + * "dead" NVMe controller.
 159  160   *
 160  161   *
      162 + * Locking:
      163 + *
      164 + * Each queue pair has its own nq_mutex, which must be held when accessing the
      165 + * associated queue registers or the shared state of the queue pair. Callers of
      166 + * nvme_unqueue_cmd() must make sure that nq_mutex is held, while
      167 + * nvme_submit_{admin,io}_cmd() and nvme_retrieve_cmd() take care of this
      168 + * themselves.
      169 + *
      170 + * Each command also has its own nc_mutex, which is associated with the
      171 + * condition variable nc_cv. It is only used on admin commands which are run
      172 + * synchronously. In that case it must be held across calls to
      173 + * nvme_submit_{admin,io}_cmd() and nvme_wait_cmd(), which is taken care of by
      174 + * nvme_admin_cmd(). It must also be held whenever the completion state of the
       175 + * command is changed or while an admin command timeout is handled.
      176 + *
      177 + * If both nc_mutex and nq_mutex must be held, nc_mutex must be acquired first.
      178 + * More than one nc_mutex may only be held when aborting commands. In this case,
      179 + * the nc_mutex of the command to be aborted must be held across the call to
      180 + * nvme_abort_cmd() to prevent the command from completing while the abort is in
      181 + * progress.
      182 + *
      183 + * Each minor node has its own nm_mutex, which protects the open count nm_ocnt
      184 + * and exclusive-open flag nm_oexcl.
      185 + *
      186 + *
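The lock ordering in the section above (nc_mutex before nq_mutex) can be sketched with plain pthread mutexes standing in for the kernel mutexes. This is an illustrative model only, not driver code; `sk_cmd_t`, `sk_qpair_t`, and `sk_timeout_unqueue` are invented names:

```c
#include <assert.h>
#include <pthread.h>

/* Per-command state, protected by its own mutex (models nc_mutex). */
typedef struct {
	pthread_mutex_t	nc_mutex;
} sk_cmd_t;

/* Per-queue-pair state, protected by its own mutex (models nq_mutex). */
typedef struct {
	pthread_mutex_t	nq_mutex;
	int		nq_active_cmds;
} sk_qpair_t;

/*
 * Unqueue a timed-out command while holding both locks: the command
 * mutex must be acquired first, then the queue mutex, matching the
 * ordering rule stated above.
 */
void
sk_timeout_unqueue(sk_cmd_t *cmd, sk_qpair_t *qp)
{
	pthread_mutex_lock(&cmd->nc_mutex);	/* 1st: command lock */
	pthread_mutex_lock(&qp->nq_mutex);	/* 2nd: queue lock */
	qp->nq_active_cmds--;
	pthread_mutex_unlock(&qp->nq_mutex);
	pthread_mutex_unlock(&cmd->nc_mutex);
}
```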
 161  187   * Quiesce / Fast Reboot:
 162  188   *
 163  189   * The driver currently does not support fast reboot. A quiesce(9E) entry point
 164  190   * is still provided which is used to send a shutdown notification to the
 165  191   * device.
 166  192   *
 167  193   *
 168  194   * Driver Configuration:
 169  195   *
 170  196   * The following driver properties can be changed to control some aspects of the
[ 43 lines elided ]
 214  240  #include <sys/param.h>
 215  241  #include <sys/varargs.h>
 216  242  #include <sys/cpuvar.h>
 217  243  #include <sys/disp.h>
 218  244  #include <sys/blkdev.h>
 219  245  #include <sys/atomic.h>
 220  246  #include <sys/archsystm.h>
 221  247  #include <sys/sata/sata_hba.h>
 222  248  #include <sys/stat.h>
 223  249  #include <sys/policy.h>
      250 +#include <sys/list.h>
 224  251  
 225  252  #include <sys/nvme.h>
 226  253  
 227  254  #ifdef __x86
 228  255  #include <sys/x86_archext.h>
 229  256  #endif
 230  257  
 231  258  #include "nvme_reg.h"
 232  259  #include "nvme_var.h"
 233  260  
[ 16 lines elided ]
 250  277  static void nvme_release_interrupts(nvme_t *);
 251  278  static uint_t nvme_intr(caddr_t, caddr_t);
 252  279  
 253  280  static void nvme_shutdown(nvme_t *, int, boolean_t);
 254  281  static boolean_t nvme_reset(nvme_t *, boolean_t);
 255  282  static int nvme_init(nvme_t *);
 256  283  static nvme_cmd_t *nvme_alloc_cmd(nvme_t *, int);
 257  284  static void nvme_free_cmd(nvme_cmd_t *);
 258  285  static nvme_cmd_t *nvme_create_nvm_cmd(nvme_namespace_t *, uint8_t,
 259  286      bd_xfer_t *);
 260      -static int nvme_admin_cmd(nvme_cmd_t *, int);
      287 +static void nvme_admin_cmd(nvme_cmd_t *, int);
 261  288  static void nvme_submit_admin_cmd(nvme_qpair_t *, nvme_cmd_t *);
 262  289  static int nvme_submit_io_cmd(nvme_qpair_t *, nvme_cmd_t *);
 263  290  static void nvme_submit_cmd_common(nvme_qpair_t *, nvme_cmd_t *);
      291 +static nvme_cmd_t *nvme_unqueue_cmd(nvme_t *, nvme_qpair_t *, int);
 264  292  static nvme_cmd_t *nvme_retrieve_cmd(nvme_t *, nvme_qpair_t *);
 265      -static boolean_t nvme_wait_cmd(nvme_cmd_t *, uint_t);
      293 +static void nvme_wait_cmd(nvme_cmd_t *, uint_t);
 266  294  static void nvme_wakeup_cmd(void *);
 267  295  static void nvme_async_event_task(void *);
 268  296  
 269  297  static int nvme_check_unknown_cmd_status(nvme_cmd_t *);
 270  298  static int nvme_check_vendor_cmd_status(nvme_cmd_t *);
 271  299  static int nvme_check_integrity_cmd_status(nvme_cmd_t *);
 272  300  static int nvme_check_specific_cmd_status(nvme_cmd_t *);
 273  301  static int nvme_check_generic_cmd_status(nvme_cmd_t *);
 274  302  static inline int nvme_check_cmd_status(nvme_cmd_t *);
 275  303  
 276      -static void nvme_abort_cmd(nvme_cmd_t *);
      304 +static int nvme_abort_cmd(nvme_cmd_t *, uint_t);
 277  305  static void nvme_async_event(nvme_t *);
 278  306  static int nvme_format_nvm(nvme_t *, uint32_t, uint8_t, boolean_t, uint8_t,
 279  307      boolean_t, uint8_t);
 280  308  static int nvme_get_logpage(nvme_t *, void **, size_t *, uint8_t, ...);
 281      -static void *nvme_identify(nvme_t *, uint32_t);
 282      -static boolean_t nvme_set_features(nvme_t *, uint32_t, uint8_t, uint32_t,
      309 +static int nvme_identify(nvme_t *, uint32_t, void **);
      310 +static int nvme_set_features(nvme_t *, uint32_t, uint8_t, uint32_t,
 283  311      uint32_t *);
 284      -static boolean_t nvme_get_features(nvme_t *, uint32_t, uint8_t, uint32_t *,
      312 +static int nvme_get_features(nvme_t *, uint32_t, uint8_t, uint32_t *,
 285  313      void **, size_t *);
 286      -static boolean_t nvme_write_cache_set(nvme_t *, boolean_t);
 287      -static int nvme_set_nqueues(nvme_t *, uint16_t);
      314 +static int nvme_write_cache_set(nvme_t *, boolean_t);
      315 +static int nvme_set_nqueues(nvme_t *, uint16_t *);
 288  316  
 289  317  static void nvme_free_dma(nvme_dma_t *);
 290  318  static int nvme_zalloc_dma(nvme_t *, size_t, uint_t, ddi_dma_attr_t *,
 291  319      nvme_dma_t **);
 292  320  static int nvme_zalloc_queue_dma(nvme_t *, uint32_t, uint16_t, uint_t,
 293  321      nvme_dma_t **);
 294  322  static void nvme_free_qpair(nvme_qpair_t *);
 295  323  static int nvme_alloc_qpair(nvme_t *, uint32_t, nvme_qpair_t **, int);
 296  324  static int nvme_create_io_qpair(nvme_t *, nvme_qpair_t *, uint16_t);
 297  325  
[ 156 lines elided ]
 454  482  static bd_ops_t nvme_bd_ops = {
 455  483          .o_version      = BD_OPS_VERSION_0,
 456  484          .o_drive_info   = nvme_bd_driveinfo,
 457  485          .o_media_info   = nvme_bd_mediainfo,
 458  486          .o_devid_init   = nvme_bd_devid,
 459  487          .o_sync_cache   = nvme_bd_sync,
 460  488          .o_read         = nvme_bd_read,
 461  489          .o_write        = nvme_bd_write,
 462  490  };
 463  491  
      492 +/*
       493 + * This list holds commands that have timed out and couldn't be aborted.
       494 + * As we don't know what the hardware may still do with the DMA memory, we
       495 + * can't free these commands, so we keep them forever on this list, where
       496 + * we can easily look at them with mdb.
      497 + */
      498 +static struct list nvme_lost_cmds;
      499 +static kmutex_t nvme_lc_mutex;
      500 +
 464  501  int
 465  502  _init(void)
 466  503  {
 467  504          int error;
 468  505  
 469  506          error = ddi_soft_state_init(&nvme_state, sizeof (nvme_t), 1);
 470  507          if (error != DDI_SUCCESS)
 471  508                  return (error);
 472  509  
 473  510          nvme_cmd_cache = kmem_cache_create("nvme_cmd_cache",
 474  511              sizeof (nvme_cmd_t), 64, NULL, NULL, NULL, NULL, NULL, 0);
 475  512  
      513 +        mutex_init(&nvme_lc_mutex, NULL, MUTEX_DRIVER, NULL);
      514 +        list_create(&nvme_lost_cmds, sizeof (nvme_cmd_t),
      515 +            offsetof(nvme_cmd_t, nc_list));
      516 +
 476  517          bd_mod_init(&nvme_dev_ops);
 477  518  
 478  519          error = mod_install(&nvme_modlinkage);
 479  520          if (error != DDI_SUCCESS) {
 480  521                  ddi_soft_state_fini(&nvme_state);
      522 +                mutex_destroy(&nvme_lc_mutex);
      523 +                list_destroy(&nvme_lost_cmds);
 481  524                  bd_mod_fini(&nvme_dev_ops);
 482  525          }
 483  526  
 484  527          return (error);
 485  528  }
 486  529  
 487  530  int
 488  531  _fini(void)
 489  532  {
 490  533          int error;
 491  534  
      535 +        if (!list_is_empty(&nvme_lost_cmds))
      536 +                return (DDI_FAILURE);
      537 +
 492  538          error = mod_remove(&nvme_modlinkage);
 493  539          if (error == DDI_SUCCESS) {
 494  540                  ddi_soft_state_fini(&nvme_state);
 495  541                  kmem_cache_destroy(nvme_cmd_cache);
      542 +                mutex_destroy(&nvme_lc_mutex);
      543 +                list_destroy(&nvme_lost_cmds);
 496  544                  bd_mod_fini(&nvme_dev_ops);
 497  545          }
 498  546  
 499  547          return (error);
 500  548  }
 501  549  
 502  550  int
 503  551  _info(struct modinfo *modinfop)
 504  552  {
 505  553          return (mod_info(&nvme_modlinkage, modinfop));
[ 289 lines elided ]
 795  843          mutex_init(&cmd->nc_mutex, NULL, MUTEX_DRIVER,
 796  844              DDI_INTR_PRI(nvme->n_intr_pri));
 797  845          cv_init(&cmd->nc_cv, NULL, CV_DRIVER, NULL);
 798  846  
 799  847          return (cmd);
 800  848  }
 801  849  
 802  850  static void
 803  851  nvme_free_cmd(nvme_cmd_t *cmd)
 804  852  {
      853 +        /* Don't free commands on the lost commands list. */
      854 +        if (list_link_active(&cmd->nc_list))
      855 +                return;
      856 +
 805  857          if (cmd->nc_dma) {
 806  858                  if (cmd->nc_dma->nd_cached)
 807  859                          kmem_cache_free(cmd->nc_nvme->n_prp_cache,
 808  860                              cmd->nc_dma);
 809  861                  else
 810  862                          nvme_free_dma(cmd->nc_dma);
 811  863                  cmd->nc_dma = NULL;
 812  864          }
 813  865  
 814  866          cv_destroy(&cmd->nc_cv);
[ 46 lines elided ]
 861  913              sizeof (nvme_sqe_t), DDI_DMA_SYNC_FORDEV);
 862  914          qp->nq_next_cmd = (qp->nq_next_cmd + 1) % qp->nq_nentry;
 863  915  
 864  916          tail.b.sqtdbl_sqt = qp->nq_sqtail = (qp->nq_sqtail + 1) % qp->nq_nentry;
 865  917          nvme_put32(cmd->nc_nvme, qp->nq_sqtdbl, tail.r);
 866  918  
 867  919          mutex_exit(&qp->nq_mutex);
 868  920  }
 869  921  
 870  922  static nvme_cmd_t *
      923 +nvme_unqueue_cmd(nvme_t *nvme, nvme_qpair_t *qp, int cid)
      924 +{
      925 +        nvme_cmd_t *cmd;
      926 +
      927 +        ASSERT(mutex_owned(&qp->nq_mutex));
      928 +        ASSERT3S(cid, <, qp->nq_nentry);
      929 +
      930 +        cmd = qp->nq_cmd[cid];
      931 +        qp->nq_cmd[cid] = NULL;
      932 +        ASSERT3U(qp->nq_active_cmds, >, 0);
      933 +        qp->nq_active_cmds--;
      934 +        sema_v(&qp->nq_sema);
      935 +
      936 +        ASSERT3P(cmd, !=, NULL);
      937 +        ASSERT3P(cmd->nc_nvme, ==, nvme);
      938 +        ASSERT3S(cmd->nc_sqe.sqe_cid, ==, cid);
      939 +
      940 +        return (cmd);
      941 +}
      942 +
      943 +static nvme_cmd_t *
 871  944  nvme_retrieve_cmd(nvme_t *nvme, nvme_qpair_t *qp)
 872  945  {
 873  946          nvme_reg_cqhdbl_t head = { 0 };
 874  947  
 875  948          nvme_cqe_t *cqe;
 876  949          nvme_cmd_t *cmd;
 877  950  
 878  951          (void) ddi_dma_sync(qp->nq_cqdma->nd_dmah, 0,
 879  952              sizeof (nvme_cqe_t) * qp->nq_nentry, DDI_DMA_SYNC_FORKERNEL);
 880  953  
 881  954          mutex_enter(&qp->nq_mutex);
 882  955          cqe = &qp->nq_cq[qp->nq_cqhead];
 883  956  
 884  957          /* Check phase tag of CQE. Hardware inverts it for new entries. */
 885  958          if (cqe->cqe_sf.sf_p == qp->nq_phase) {
 886  959                  mutex_exit(&qp->nq_mutex);
 887  960                  return (NULL);
 888  961          }
 889  962  
 890  963          ASSERT(nvme->n_ioq[cqe->cqe_sqid] == qp);
 891      -        ASSERT(cqe->cqe_cid < qp->nq_nentry);
 892  964  
 893      -        cmd = qp->nq_cmd[cqe->cqe_cid];
 894      -        qp->nq_cmd[cqe->cqe_cid] = NULL;
 895      -        qp->nq_active_cmds--;
      965 +        cmd = nvme_unqueue_cmd(nvme, qp, cqe->cqe_cid);
 896  966  
 897      -        ASSERT(cmd != NULL);
 898      -        ASSERT(cmd->nc_nvme == nvme);
 899  967          ASSERT(cmd->nc_sqid == cqe->cqe_sqid);
 900      -        ASSERT(cmd->nc_sqe.sqe_cid == cqe->cqe_cid);
 901  968          bcopy(cqe, &cmd->nc_cqe, sizeof (nvme_cqe_t));
 902  969  
 903  970          qp->nq_sqhead = cqe->cqe_sqhd;
 904  971  
 905  972          head.b.cqhdbl_cqh = qp->nq_cqhead = (qp->nq_cqhead + 1) % qp->nq_nentry;
 906  973  
 907  974          /* Toggle phase on wrap-around. */
 908  975          if (qp->nq_cqhead == 0)
 909  976                  qp->nq_phase = qp->nq_phase ? 0 : 1;
 910  977  
 911  978          nvme_put32(cmd->nc_nvme, qp->nq_cqhdbl, head.r);
 912  979          mutex_exit(&qp->nq_mutex);
 913      -        sema_v(&qp->nq_sema);
 914  980  
 915  981          return (cmd);
 916  982  }
 917  983  
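The phase-tag check and wrap-around toggle in `nvme_retrieve_cmd()` above can be modeled in isolation. This is an illustrative sketch only, not driver code; `sk_cq_t` and `sk_cq_poll` are invented names, and a bare phase-bit array stands in for the completion queue entries:

```c
#include <assert.h>
#include <stdint.h>

#define SK_QLEN 4

/*
 * Model of the completion queue: the hardware inverts the phase bit of
 * each new entry it writes, and the driver toggles the phase it treats
 * as "stale" whenever the head index wraps around.
 */
typedef struct {
	uint8_t	phase_bits[SK_QLEN];	/* phase bit of each CQ entry */
	int	head;			/* consumer head index */
	uint8_t	phase;			/* phase considered stale */
} sk_cq_t;

/* Returns 1 if a new entry was consumed, 0 if the queue is empty. */
int
sk_cq_poll(sk_cq_t *cq)
{
	/* An entry whose phase matches the expected stale phase is old. */
	if (cq->phase_bits[cq->head] == cq->phase)
		return (0);

	cq->head = (cq->head + 1) % SK_QLEN;
	if (cq->head == 0)		/* toggle phase on wrap-around */
		cq->phase = !cq->phase;
	return (1);
}
```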
 918  984  static int
 919  985  nvme_check_unknown_cmd_status(nvme_cmd_t *cmd)
 920  986  {
 921  987          nvme_cqe_t *cqe = &cmd->nc_cqe;
 922  988  
 923  989          dev_err(cmd->nc_nvme->n_dip, CE_WARN,
↓ open down ↓ 264 lines elided ↑ open up ↑
1188 1254          default:
1189 1255                  return (nvme_check_unknown_cmd_status(cmd));
1190 1256          }
1191 1257  }
1192 1258  
1193 1259  static inline int
1194 1260  nvme_check_cmd_status(nvme_cmd_t *cmd)
1195 1261  {
1196 1262          nvme_cqe_t *cqe = &cmd->nc_cqe;
1197 1263  
1198      -        /* take a shortcut if everything is alright */
     1264 +        /*
     1265 +         * Take a shortcut if the controller is dead, or if
     1266 +         * command status indicates no error.
     1267 +         */
     1268 +        if (cmd->nc_nvme->n_dead)
     1269 +                return (EIO);
     1270 +
1199 1271          if (cqe->cqe_sf.sf_sct == NVME_CQE_SCT_GENERIC &&
1200 1272              cqe->cqe_sf.sf_sc == NVME_CQE_SC_GEN_SUCCESS)
1201 1273                  return (0);
1202 1274  
1203 1275          if (cqe->cqe_sf.sf_sct == NVME_CQE_SCT_GENERIC)
1204 1276                  return (nvme_check_generic_cmd_status(cmd));
1205 1277          else if (cqe->cqe_sf.sf_sct == NVME_CQE_SCT_SPECIFIC)
1206 1278                  return (nvme_check_specific_cmd_status(cmd));
1207 1279          else if (cqe->cqe_sf.sf_sct == NVME_CQE_SCT_INTEGRITY)
1208 1280                  return (nvme_check_integrity_cmd_status(cmd));
1209 1281          else if (cqe->cqe_sf.sf_sct == NVME_CQE_SCT_VENDOR)
1210 1282                  return (nvme_check_vendor_cmd_status(cmd));
1211 1283  
1212 1284          return (nvme_check_unknown_cmd_status(cmd));
1213 1285  }
1214 1286  
1215      -/*
1216      - * nvme_abort_cmd_cb -- replaces nc_callback of aborted commands
1217      - *
1218      - * This functions takes care of cleaning up aborted commands. The command
1219      - * status is checked to catch any fatal errors.
1220      - */
1221      -static void
1222      -nvme_abort_cmd_cb(void *arg)
     1287 +static int
     1288 +nvme_abort_cmd(nvme_cmd_t *abort_cmd, uint_t sec)
1223 1289  {
1224      -        nvme_cmd_t *cmd = arg;
1225      -
1226      -        /*
1227      -         * Grab the command mutex. Once we have it we hold the last reference
1228      -         * to the command and can safely free it.
1229      -         */
1230      -        mutex_enter(&cmd->nc_mutex);
1231      -        (void) nvme_check_cmd_status(cmd);
1232      -        mutex_exit(&cmd->nc_mutex);
1233      -
1234      -        nvme_free_cmd(cmd);
1235      -}
1236      -
1237      -static void
1238      -nvme_abort_cmd(nvme_cmd_t *abort_cmd)
1239      -{
1240 1290          nvme_t *nvme = abort_cmd->nc_nvme;
1241 1291          nvme_cmd_t *cmd = nvme_alloc_cmd(nvme, KM_SLEEP);
1242 1292          nvme_abort_cmd_t ac = { 0 };
     1293 +        int ret = 0;
1243 1294  
1244 1295          sema_p(&nvme->n_abort_sema);
1245 1296  
1246 1297          ac.b.ac_cid = abort_cmd->nc_sqe.sqe_cid;
1247 1298          ac.b.ac_sqid = abort_cmd->nc_sqid;
1248 1299  
1249      -        /*
1250      -         * Drop the mutex of the aborted command. From this point on
1251      -         * we must assume that the abort callback has freed the command.
1252      -         */
1253      -        mutex_exit(&abort_cmd->nc_mutex);
1254      -
1255 1300          cmd->nc_sqid = 0;
1256 1301          cmd->nc_sqe.sqe_opc = NVME_OPC_ABORT;
1257 1302          cmd->nc_callback = nvme_wakeup_cmd;
1258 1303          cmd->nc_sqe.sqe_cdw10 = ac.r;
1259 1304  
1260 1305          /*
1261 1306           * Send the ABORT to the hardware. The ABORT command will return _after_
1262      -         * the aborted command has completed (aborted or otherwise).
     1307 +         * the aborted command has completed (aborted or otherwise), but since
      1308 +         * we still hold the aborted command's mutex, its callback hasn't been
     1309 +         * processed yet.
1263 1310           */
1264      -        if (nvme_admin_cmd(cmd, nvme_admin_cmd_timeout) != DDI_SUCCESS) {
1265      -                sema_v(&nvme->n_abort_sema);
1266      -                dev_err(nvme->n_dip, CE_WARN,
1267      -                    "!nvme_admin_cmd failed for ABORT");
1268      -                atomic_inc_32(&nvme->n_abort_failed);
1269      -                return;
1270      -        }
     1311 +        nvme_admin_cmd(cmd, sec);
1271 1312          sema_v(&nvme->n_abort_sema);
1272 1313  
1273      -        if (nvme_check_cmd_status(cmd)) {
     1314 +        if ((ret = nvme_check_cmd_status(cmd)) != 0) {
1274 1315                  dev_err(nvme->n_dip, CE_WARN,
1275 1316                      "!ABORT failed with sct = %x, sc = %x",
1276 1317                      cmd->nc_cqe.cqe_sf.sf_sct, cmd->nc_cqe.cqe_sf.sf_sc);
1277 1318                  atomic_inc_32(&nvme->n_abort_failed);
1278 1319          } else {
1279      -                atomic_inc_32(&nvme->n_cmd_aborted);
     1320 +                dev_err(nvme->n_dip, CE_WARN,
     1321 +                    "!ABORT of command %d/%d %ssuccessful",
     1322 +                    abort_cmd->nc_sqe.sqe_cid, abort_cmd->nc_sqid,
     1323 +                    cmd->nc_cqe.cqe_dw0 & 1 ? "un" : "");
     1324 +                if ((cmd->nc_cqe.cqe_dw0 & 1) == 0)
     1325 +                        atomic_inc_32(&nvme->n_cmd_aborted);
1280 1326          }
1281 1327  
1282 1328          nvme_free_cmd(cmd);
     1329 +        return (ret);
1283 1330  }
1284 1331  
1285 1332  /*
1286 1333   * nvme_wait_cmd -- wait for command completion or timeout
1287 1334   *
1288      - * Returns B_TRUE if the command completed normally.
1289      - *
1290      - * Returns B_FALSE if the command timed out and an abort was attempted. The
1291      - * command mutex will be dropped and the command must be considered freed. The
1292      - * freeing of the command is normally done by the abort command callback.
1293      - *
1294 1335   * In case of a serious error or a timeout of the abort command the hardware
1295 1336   * will be declared dead and FMA will be notified.
1296 1337   */
1297      -static boolean_t
     1338 +static void
1298 1339  nvme_wait_cmd(nvme_cmd_t *cmd, uint_t sec)
1299 1340  {
1300 1341          clock_t timeout = ddi_get_lbolt() + drv_usectohz(sec * MICROSEC);
1301 1342          nvme_t *nvme = cmd->nc_nvme;
1302 1343          nvme_reg_csts_t csts;
     1344 +        nvme_qpair_t *qp;
1303 1345  
1304 1346          ASSERT(mutex_owned(&cmd->nc_mutex));
1305 1347  
1306 1348          while (!cmd->nc_completed) {
1307 1349                  if (cv_timedwait(&cmd->nc_cv, &cmd->nc_mutex, timeout) == -1)
1308 1350                          break;
1309 1351          }
1310 1352  
1311 1353          if (cmd->nc_completed)
1312      -                return (B_TRUE);
     1354 +                return;
1313 1355  
1314 1356          /*
1315      -         * The command timed out. Change the callback to the cleanup function.
1316      -         */
1317      -        cmd->nc_callback = nvme_abort_cmd_cb;
1318      -
1319      -        /*
     1357 +         * The command timed out.
     1358 +         *
1320 1359           * Check controller for fatal status, any errors associated with the
1321 1360           * register or DMA handle, or for a double timeout (abort command timed
1322 1361           * out). If necessary log a warning and call FMA.
1323 1362           */
1324 1363          csts.r = nvme_get32(nvme, NVME_REG_CSTS);
1325      -        dev_err(nvme->n_dip, CE_WARN, "!command timeout, "
1326      -            "OPC = %x, CFS = %d", cmd->nc_sqe.sqe_opc, csts.b.csts_cfs);
     1364 +        dev_err(nvme->n_dip, CE_WARN, "!command %d/%d timeout, "
     1365 +            "OPC = %x, CFS = %d", cmd->nc_sqe.sqe_cid, cmd->nc_sqid,
     1366 +            cmd->nc_sqe.sqe_opc, csts.b.csts_cfs);
1327 1367          atomic_inc_32(&nvme->n_cmd_timeout);
1328 1368  
1329 1369          if (csts.b.csts_cfs ||
1330 1370              nvme_check_regs_hdl(nvme) ||
1331 1371              nvme_check_dma_hdl(cmd->nc_dma) ||
1332 1372              cmd->nc_sqe.sqe_opc == NVME_OPC_ABORT) {
1333 1373                  ddi_fm_service_impact(nvme->n_dip, DDI_SERVICE_LOST);
1334 1374                  nvme->n_dead = B_TRUE;
1335      -                mutex_exit(&cmd->nc_mutex);
1336      -        } else {
     1375 +        } else if (nvme_abort_cmd(cmd, sec) == 0) {
1337 1376                  /*
1338      -                 * Try to abort the command. The command mutex is released by
1339      -                 * nvme_abort_cmd().
1340      -                 * If the abort succeeds it will have freed the aborted command.
1341      -                 * If the abort fails for other reasons we must assume that the
1342      -                 * command may complete at any time, and the callback will free
1343      -                 * it for us.
     1377 +                 * If the abort succeeded the command should complete
     1378 +                 * immediately with an appropriate status.
1344 1379                   */
1345      -                nvme_abort_cmd(cmd);
     1380 +                while (!cmd->nc_completed)
     1381 +                        cv_wait(&cmd->nc_cv, &cmd->nc_mutex);
     1382 +
     1383 +                return;
1346 1384          }
1347 1385  
1348      -        return (B_FALSE);
     1386 +        qp = nvme->n_ioq[cmd->nc_sqid];
     1387 +
     1388 +        mutex_enter(&qp->nq_mutex);
     1389 +        (void) nvme_unqueue_cmd(nvme, qp, cmd->nc_sqe.sqe_cid);
     1390 +        mutex_exit(&qp->nq_mutex);
     1391 +
     1392 +        /*
     1393 +         * As we don't know what the presumed dead hardware might still do with
     1394 +         * the DMA memory, we'll put the command on the lost commands list if it
     1395 +         * has any DMA memory.
     1396 +         */
     1397 +        if (cmd->nc_dma != NULL) {
     1398 +                mutex_enter(&nvme_lc_mutex);
     1399 +                list_insert_head(&nvme_lost_cmds, cmd);
     1400 +                mutex_exit(&nvme_lc_mutex);
     1401 +        }
1349 1402  }
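The disposition logic of `nvme_wait_cmd()` above reduces to a three-way decision, which can be sketched on its own. This is an illustrative model only, not driver code; `sk_disposition_t` and `sk_handle_timeout` are invented names:

```c
#include <assert.h>

/*
 * Model of what happens to a timed-out command: a healthy controller
 * gets an ABORT and the driver waits for the command to complete;
 * otherwise the command is unqueued by hand and, if it references DMA
 * memory, parked on the lost-commands list forever.
 */
typedef enum {
	SK_WAIT_ABORTED,	/* abort sent OK; wait for completion */
	SK_CMD_LOST,		/* controller dead; park on lost list */
	SK_CMD_UNQUEUED		/* controller dead; no DMA, just unqueue */
} sk_disposition_t;

sk_disposition_t
sk_handle_timeout(int controller_fatal, int abort_ok, int has_dma)
{
	if (!controller_fatal && abort_ok)
		return (SK_WAIT_ABORTED);

	/* Controller presumed dead: the command must be unqueued by hand. */
	return (has_dma ? SK_CMD_LOST : SK_CMD_UNQUEUED);
}
```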
1350 1403  
1351 1404  static void
1352 1405  nvme_wakeup_cmd(void *arg)
1353 1406  {
1354 1407          nvme_cmd_t *cmd = arg;
1355 1408  
1356 1409          mutex_enter(&cmd->nc_mutex);
1357      -        /*
1358      -         * There is a slight chance that this command completed shortly after
1359      -         * the timeout was hit in nvme_wait_cmd() but before the callback was
1360      -         * changed. Catch that case here and clean up accordingly.
1361      -         */
1362      -        if (cmd->nc_callback == nvme_abort_cmd_cb) {
1363      -                mutex_exit(&cmd->nc_mutex);
1364      -                nvme_abort_cmd_cb(cmd);
1365      -                return;
1366      -        }
1367      -
1368 1410          cmd->nc_completed = B_TRUE;
1369 1411          cv_signal(&cmd->nc_cv);
1370 1412          mutex_exit(&cmd->nc_mutex);
1371 1413  }
1372 1414  
1373 1415  static void
1374 1416  nvme_async_event_task(void *arg)
1375 1417  {
1376 1418          nvme_cmd_t *cmd = arg;
1377 1419          nvme_t *nvme = cmd->nc_nvme;
[ 5 lines elided ]
1383 1425          /*
1384 1426           * Check for errors associated with the async request itself. The only
1385 1427           * command-specific error is "async event limit exceeded", which
1386 1428           * indicates a programming error in the driver and causes a panic in
1387 1429           * nvme_check_cmd_status().
1388 1430           *
1389 1431           * Other possible errors are various scenarios where the async request
1390 1432           * was aborted, or internal errors in the device. Internal errors are
1391 1433           * reported to FMA, the command aborts need no special handling here.
1392 1434           */
1393      -        if (nvme_check_cmd_status(cmd)) {
     1435 +        if (nvme_check_cmd_status(cmd) != 0) {
1394 1436                  dev_err(cmd->nc_nvme->n_dip, CE_WARN,
1395 1437                      "!async event request returned failure, sct = %x, "
1396 1438                      "sc = %x, dnr = %d, m = %d", cmd->nc_cqe.cqe_sf.sf_sct,
1397 1439                      cmd->nc_cqe.cqe_sf.sf_sc, cmd->nc_cqe.cqe_sf.sf_dnr,
1398 1440                      cmd->nc_cqe.cqe_sf.sf_m);
1399 1441  
1400 1442                  if (cmd->nc_cqe.cqe_sf.sf_sct == NVME_CQE_SCT_GENERIC &&
1401 1443                      cmd->nc_cqe.cqe_sf.sf_sc == NVME_CQE_SC_GEN_INTERNAL_ERR) {
1402 1444                          cmd->nc_nvme->n_dead = B_TRUE;
1403 1445                          ddi_fm_service_impact(cmd->nc_nvme->n_dip,
[ 111 lines elided ]
1515 1557                  break;
1516 1558          }
1517 1559  
1518 1560          if (error_log)
1519 1561                  kmem_free(error_log, logsize);
1520 1562  
1521 1563          if (health_log)
1522 1564                  kmem_free(health_log, logsize);
1523 1565  }
1524 1566  
1525      -static int
     1567 +static void
1526 1568  nvme_admin_cmd(nvme_cmd_t *cmd, int sec)
1527 1569  {
1528 1570          mutex_enter(&cmd->nc_mutex);
1529 1571          nvme_submit_admin_cmd(cmd->nc_nvme->n_adminq, cmd);
1530      -
1531      -        if (nvme_wait_cmd(cmd, sec) == B_FALSE) {
1532      -                /*
1533      -                 * The command timed out. An abort command was posted that
1534      -                 * will take care of the cleanup.
1535      -                 */
1536      -                return (DDI_FAILURE);
1537      -        }
     1572 +        nvme_wait_cmd(cmd, sec);
1538 1573          mutex_exit(&cmd->nc_mutex);
1539      -
1540      -        return (DDI_SUCCESS);
1541 1574  }
1542 1575  
1543 1576  static void
1544 1577  nvme_async_event(nvme_t *nvme)
1545 1578  {
1546 1579          nvme_cmd_t *cmd = nvme_alloc_cmd(nvme, KM_SLEEP);
1547 1580  
1548 1581          cmd->nc_sqid = 0;
1549 1582          cmd->nc_sqe.sqe_opc = NVME_OPC_ASYNC_EVENT;
1550 1583          cmd->nc_callback = nvme_async_event_task;
↓ open down ↓ 21 lines elided ↑ open up ↑
1572 1605          cmd->nc_sqe.sqe_opc = NVME_OPC_NVM_FORMAT;
1573 1606          cmd->nc_sqe.sqe_cdw10 = format_nvm.r;
1574 1607  
1575 1608          /*
1576 1609           * Some devices like Samsung SM951 don't allow formatting of all
1577 1610           * namespaces in one command. Handle that gracefully.
1578 1611           */
1579 1612          if (nsid == (uint32_t)-1)
1580 1613                  cmd->nc_dontpanic = B_TRUE;
1581 1614  
1582      -        if ((ret = nvme_admin_cmd(cmd, nvme_format_cmd_timeout))
1583      -            != DDI_SUCCESS) {
1584      -                dev_err(nvme->n_dip, CE_WARN,
1585      -                    "!nvme_admin_cmd failed for FORMAT NVM");
1586      -                return (EIO);
1587      -        }
     1615 +        nvme_admin_cmd(cmd, nvme_format_cmd_timeout);
1588 1616  
1589 1617          if ((ret = nvme_check_cmd_status(cmd)) != 0) {
1590 1618                  dev_err(nvme->n_dip, CE_WARN,
1591 1619                      "!FORMAT failed with sct = %x, sc = %x",
1592 1620                      cmd->nc_cqe.cqe_sf.sf_sct, cmd->nc_cqe.cqe_sf.sf_sc);
1593 1621          }
1594 1622  
1595 1623          nvme_free_cmd(cmd);
1596 1624          return (ret);
1597 1625  }
1598 1626  
1599 1627  static int
1600 1628  nvme_get_logpage(nvme_t *nvme, void **buf, size_t *bufsize, uint8_t logpage,
1601 1629      ...)
1602 1630  {
1603 1631          nvme_cmd_t *cmd = nvme_alloc_cmd(nvme, KM_SLEEP);
1604 1632          nvme_getlogpage_t getlogpage = { 0 };
1605 1633          va_list ap;
1606      -        int ret = DDI_FAILURE;
     1634 +        int ret;
1607 1635  
1608 1636          va_start(ap, logpage);
1609 1637  
1610 1638          cmd->nc_sqid = 0;
1611 1639          cmd->nc_callback = nvme_wakeup_cmd;
1612 1640          cmd->nc_sqe.sqe_opc = NVME_OPC_GET_LOG_PAGE;
1613 1641  
1614 1642          getlogpage.b.lp_lid = logpage;
1615 1643  
1616 1644          switch (logpage) {
↓ open down ↓ 14 lines elided ↑ open up ↑
1631 1659  
1632 1660          case NVME_LOGPAGE_FWSLOT:
1633 1661                  cmd->nc_sqe.sqe_nsid = (uint32_t)-1;
1634 1662                  *bufsize = sizeof (nvme_fwslot_log_t);
1635 1663                  break;
1636 1664  
1637 1665          default:
1638 1666                  dev_err(nvme->n_dip, CE_WARN, "!unknown log page requested: %d",
1639 1667                      logpage);
1640 1668                  atomic_inc_32(&nvme->n_unknown_logpage);
     1669 +                ret = EINVAL;
     1670 +                va_end(ap);
1641 1670                  goto fail;
1642 1671          }
1643 1672  
1644 1673          va_end(ap);
1645 1674  
1646 1675          getlogpage.b.lp_numd = *bufsize / sizeof (uint32_t) - 1;
1647 1676  
1648 1677          cmd->nc_sqe.sqe_cdw10 = getlogpage.r;
1649 1678  
1650 1679          if (nvme_zalloc_dma(nvme, getlogpage.b.lp_numd * sizeof (uint32_t),
1651 1680              DDI_DMA_READ, &nvme->n_prp_dma_attr, &cmd->nc_dma) != DDI_SUCCESS) {
1652 1681                  dev_err(nvme->n_dip, CE_WARN,
1653 1682                      "!nvme_zalloc_dma failed for GET LOG PAGE");
     1683 +                ret = ENOMEM;
1654 1684                  goto fail;
1655 1685          }
1656 1686  
1657 1687          if (cmd->nc_dma->nd_ncookie > 2) {
1658 1688                  dev_err(nvme->n_dip, CE_WARN,
1659 1689                      "!too many DMA cookies for GET LOG PAGE");
1660 1690                  atomic_inc_32(&nvme->n_too_many_cookies);
     1691 +                ret = ENOMEM;
1661 1692                  goto fail;
1662 1693          }
1663 1694  
1664 1695          cmd->nc_sqe.sqe_dptr.d_prp[0] = cmd->nc_dma->nd_cookie.dmac_laddress;
1665 1696          if (cmd->nc_dma->nd_ncookie > 1) {
1666 1697                  ddi_dma_nextcookie(cmd->nc_dma->nd_dmah,
1667 1698                      &cmd->nc_dma->nd_cookie);
1668 1699                  cmd->nc_sqe.sqe_dptr.d_prp[1] =
1669 1700                      cmd->nc_dma->nd_cookie.dmac_laddress;
1670 1701          }
1671 1702  
1672      -        if (nvme_admin_cmd(cmd, nvme_admin_cmd_timeout) != DDI_SUCCESS) {
1673      -                dev_err(nvme->n_dip, CE_WARN,
1674      -                    "!nvme_admin_cmd failed for GET LOG PAGE");
1675      -                return (ret);
1676      -        }
     1703 +        nvme_admin_cmd(cmd, nvme_admin_cmd_timeout);
1677 1704  
1678      -        if (nvme_check_cmd_status(cmd)) {
     1705 +        if ((ret = nvme_check_cmd_status(cmd)) != 0) {
1679 1706                  dev_err(nvme->n_dip, CE_WARN,
1680 1707                      "!GET LOG PAGE failed with sct = %x, sc = %x",
1681 1708                      cmd->nc_cqe.cqe_sf.sf_sct, cmd->nc_cqe.cqe_sf.sf_sc);
1682 1709                  goto fail;
1683 1710          }
1684 1711  
1685 1712          *buf = kmem_alloc(*bufsize, KM_SLEEP);
1686 1713          bcopy(cmd->nc_dma->nd_memp, *buf, *bufsize);
1687 1714  
1688      -        ret = DDI_SUCCESS;
1689      -
1690 1715  fail:
1691 1716          nvme_free_cmd(cmd);
1692 1717  
1693 1718          return (ret);
1694 1719  }
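The lp_numd value built above is the GET LOG PAGE "Number of Dwords" field, which the NVMe spec defines as zero's based: a value of n requests n + 1 dwords. A sketch of that derivation from the buffer size (helper name is illustrative, not driver code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * NUMD is a zero's based dword count, so a log buffer of bufsize bytes
 * maps to bufsize / sizeof (uint32_t) - 1, exactly as nvme_get_logpage()
 * computes it. bufsize is assumed to be a nonzero multiple of four.
 */
static uint32_t
logpage_numd(size_t bufsize)
{
	return ((uint32_t)(bufsize / sizeof (uint32_t)) - 1);
}
```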
1695 1720  
1696      -static void *
1697      -nvme_identify(nvme_t *nvme, uint32_t nsid)
     1721 +static int
     1722 +nvme_identify(nvme_t *nvme, uint32_t nsid, void **buf)
1698 1723  {
1699 1724          nvme_cmd_t *cmd = nvme_alloc_cmd(nvme, KM_SLEEP);
1700      -        void *buf = NULL;
     1725 +        int ret;
1701 1726  
      1727 +        if (buf == NULL) {
      1728 +                nvme_free_cmd(cmd);
      1729 +                return (EINVAL);
      1730 +        }
      1731 +
1702 1730          cmd->nc_sqid = 0;
1703 1731          cmd->nc_callback = nvme_wakeup_cmd;
1704 1732          cmd->nc_sqe.sqe_opc = NVME_OPC_IDENTIFY;
1705 1733          cmd->nc_sqe.sqe_nsid = nsid;
1706 1734          cmd->nc_sqe.sqe_cdw10 = nsid ? NVME_IDENTIFY_NSID : NVME_IDENTIFY_CTRL;
1707 1735  
1708 1736          if (nvme_zalloc_dma(nvme, NVME_IDENTIFY_BUFSIZE, DDI_DMA_READ,
1709 1737              &nvme->n_prp_dma_attr, &cmd->nc_dma) != DDI_SUCCESS) {
1710 1738                  dev_err(nvme->n_dip, CE_WARN,
1711 1739                      "!nvme_zalloc_dma failed for IDENTIFY");
     1740 +                ret = ENOMEM;
1712 1741                  goto fail;
1713 1742          }
1714 1743  
1715 1744          if (cmd->nc_dma->nd_ncookie > 2) {
1716 1745                  dev_err(nvme->n_dip, CE_WARN,
1717 1746                      "!too many DMA cookies for IDENTIFY");
1718 1747                  atomic_inc_32(&nvme->n_too_many_cookies);
     1748 +                ret = ENOMEM;
1719 1749                  goto fail;
1720 1750          }
1721 1751  
1722 1752          cmd->nc_sqe.sqe_dptr.d_prp[0] = cmd->nc_dma->nd_cookie.dmac_laddress;
1723 1753          if (cmd->nc_dma->nd_ncookie > 1) {
1724 1754                  ddi_dma_nextcookie(cmd->nc_dma->nd_dmah,
1725 1755                      &cmd->nc_dma->nd_cookie);
1726 1756                  cmd->nc_sqe.sqe_dptr.d_prp[1] =
1727 1757                      cmd->nc_dma->nd_cookie.dmac_laddress;
1728 1758          }
1729 1759  
1730      -        if (nvme_admin_cmd(cmd, nvme_admin_cmd_timeout) != DDI_SUCCESS) {
1731      -                dev_err(nvme->n_dip, CE_WARN,
1732      -                    "!nvme_admin_cmd failed for IDENTIFY");
1733      -                return (NULL);
1734      -        }
     1760 +        nvme_admin_cmd(cmd, nvme_admin_cmd_timeout);
1735 1761  
1736      -        if (nvme_check_cmd_status(cmd)) {
     1762 +        if ((ret = nvme_check_cmd_status(cmd)) != 0) {
1737 1763                  dev_err(nvme->n_dip, CE_WARN,
1738 1764                      "!IDENTIFY failed with sct = %x, sc = %x",
1739 1765                      cmd->nc_cqe.cqe_sf.sf_sct, cmd->nc_cqe.cqe_sf.sf_sc);
1740 1766                  goto fail;
1741 1767          }
1742 1768  
1743      -        buf = kmem_alloc(NVME_IDENTIFY_BUFSIZE, KM_SLEEP);
1744      -        bcopy(cmd->nc_dma->nd_memp, buf, NVME_IDENTIFY_BUFSIZE);
     1769 +        *buf = kmem_alloc(NVME_IDENTIFY_BUFSIZE, KM_SLEEP);
     1770 +        bcopy(cmd->nc_dma->nd_memp, *buf, NVME_IDENTIFY_BUFSIZE);
1745 1771  
1746 1772  fail:
1747 1773          nvme_free_cmd(cmd);
1748 1774  
1749      -        return (buf);
     1775 +        return (ret);
1750 1776  }
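Both nvme_identify() and nvme_get_logpage() reject buffers with more than two DMA cookies: a transfer described by at most two cookies fits directly into the two PRP entries of the submission queue entry, while anything longer would need a PRP list. A hedged sketch of that mapping (types and names are illustrative, not driver API):

```c
#include <assert.h>
#include <stdint.h>

typedef struct prp_pair {
	uint64_t prp1;		/* first PRP entry (d_prp[0]) */
	uint64_t prp2;		/* second PRP entry (d_prp[1]), or 0 */
} prp_pair_t;

/*
 * Map up to two DMA cookie addresses onto the two inline PRP entries.
 * Returns 0 on success, -1 if the buffer would need a PRP list, which
 * this driver avoids by bounding the DMA attributes instead.
 */
static int
prp_from_cookies(const uint64_t *cookies, unsigned ncookies, prp_pair_t *p)
{
	if (ncookies == 0 || ncookies > 2)
		return (-1);
	p->prp1 = cookies[0];
	p->prp2 = (ncookies > 1) ? cookies[1] : 0;
	return (0);
}
```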
1751 1777  
1752      -static boolean_t
     1778 +static int
1753 1779  nvme_set_features(nvme_t *nvme, uint32_t nsid, uint8_t feature, uint32_t val,
1754 1780      uint32_t *res)
1755 1781  {
1756 1782          _NOTE(ARGUNUSED(nsid));
1757 1783          nvme_cmd_t *cmd = nvme_alloc_cmd(nvme, KM_SLEEP);
1758      -        boolean_t ret = B_FALSE;
     1784 +        int ret = EINVAL;
1759 1785  
1760 1786          ASSERT(res != NULL);
1761 1787  
1762 1788          cmd->nc_sqid = 0;
1763 1789          cmd->nc_callback = nvme_wakeup_cmd;
1764 1790          cmd->nc_sqe.sqe_opc = NVME_OPC_SET_FEATURES;
1765 1791          cmd->nc_sqe.sqe_cdw10 = feature;
1766 1792          cmd->nc_sqe.sqe_cdw11 = val;
1767 1793  
1768 1794          switch (feature) {
↓ open down ↓ 2 lines elided ↑ open up ↑
1771 1797                          goto fail;
1772 1798                  break;
1773 1799  
1774 1800          case NVME_FEAT_NQUEUES:
1775 1801                  break;
1776 1802  
1777 1803          default:
1778 1804                  goto fail;
1779 1805          }
1780 1806  
1781      -        if (nvme_admin_cmd(cmd, nvme_admin_cmd_timeout) != DDI_SUCCESS) {
1782      -                dev_err(nvme->n_dip, CE_WARN,
1783      -                    "!nvme_admin_cmd failed for SET FEATURES");
1784      -                return (ret);
1785      -        }
     1807 +        nvme_admin_cmd(cmd, nvme_admin_cmd_timeout);
1786 1808  
1787      -        if (nvme_check_cmd_status(cmd)) {
     1809 +        if ((ret = nvme_check_cmd_status(cmd)) != 0) {
1788 1810                  dev_err(nvme->n_dip, CE_WARN,
1789 1811                      "!SET FEATURES %d failed with sct = %x, sc = %x",
1790 1812                      feature, cmd->nc_cqe.cqe_sf.sf_sct,
1791 1813                      cmd->nc_cqe.cqe_sf.sf_sc);
1792 1814                  goto fail;
1793 1815          }
1794 1816  
1795 1817          *res = cmd->nc_cqe.cqe_dw0;
1796      -        ret = B_TRUE;
1797 1818  
1798 1819  fail:
1799 1820          nvme_free_cmd(cmd);
1800 1821          return (ret);
1801 1822  }
1802 1823  
1803      -static boolean_t
     1824 +static int
1804 1825  nvme_get_features(nvme_t *nvme, uint32_t nsid, uint8_t feature, uint32_t *res,
1805 1826      void **buf, size_t *bufsize)
1806 1827  {
1807 1828          nvme_cmd_t *cmd = nvme_alloc_cmd(nvme, KM_SLEEP);
1808      -        boolean_t ret = B_FALSE;
     1829 +        int ret = EINVAL;
1809 1830  
1810 1831          ASSERT(res != NULL);
1811 1832  
1812 1833          if (bufsize != NULL)
1813 1834                  *bufsize = 0;
1814 1835  
1815 1836          cmd->nc_sqid = 0;
1816 1837          cmd->nc_callback = nvme_wakeup_cmd;
1817 1838          cmd->nc_sqe.sqe_opc = NVME_OPC_GET_FEATURES;
1818 1839          cmd->nc_sqe.sqe_cdw10 = feature;
↓ open down ↓ 45 lines elided ↑ open up ↑
1864 1885  
1865 1886          default:
1866 1887                  goto fail;
1867 1888          }
1868 1889  
1869 1890          if (bufsize != NULL && *bufsize != 0) {
1870 1891                  if (nvme_zalloc_dma(nvme, *bufsize, DDI_DMA_READ,
1871 1892                      &nvme->n_prp_dma_attr, &cmd->nc_dma) != DDI_SUCCESS) {
1872 1893                          dev_err(nvme->n_dip, CE_WARN,
1873 1894                              "!nvme_zalloc_dma failed for GET FEATURES");
     1895 +                        ret = ENOMEM;
1874 1896                          goto fail;
1875 1897                  }
1876 1898  
1877 1899                  if (cmd->nc_dma->nd_ncookie > 2) {
1878 1900                          dev_err(nvme->n_dip, CE_WARN,
1879 1901                              "!too many DMA cookies for GET FEATURES");
1880 1902                          atomic_inc_32(&nvme->n_too_many_cookies);
     1903 +                        ret = ENOMEM;
1881 1904                          goto fail;
1882 1905                  }
1883 1906  
1884 1907                  cmd->nc_sqe.sqe_dptr.d_prp[0] =
1885 1908                      cmd->nc_dma->nd_cookie.dmac_laddress;
1886 1909                  if (cmd->nc_dma->nd_ncookie > 1) {
1887 1910                          ddi_dma_nextcookie(cmd->nc_dma->nd_dmah,
1888 1911                              &cmd->nc_dma->nd_cookie);
1889 1912                          cmd->nc_sqe.sqe_dptr.d_prp[1] =
1890 1913                              cmd->nc_dma->nd_cookie.dmac_laddress;
1891 1914                  }
1892 1915          }
1893 1916  
1894      -        if (nvme_admin_cmd(cmd, nvme_admin_cmd_timeout) != DDI_SUCCESS) {
1895      -                dev_err(nvme->n_dip, CE_WARN,
1896      -                    "!nvme_admin_cmd failed for GET FEATURES");
1897      -                return (ret);
1898      -        }
     1917 +        nvme_admin_cmd(cmd, nvme_admin_cmd_timeout);
1899 1918  
1900      -        if (nvme_check_cmd_status(cmd)) {
     1919 +        if ((ret = nvme_check_cmd_status(cmd)) != 0) {
1901 1920                  if (feature == NVME_FEAT_LBA_RANGE &&
1902 1921                      cmd->nc_cqe.cqe_sf.sf_sct == NVME_CQE_SCT_GENERIC &&
1903 1922                      cmd->nc_cqe.cqe_sf.sf_sc == NVME_CQE_SC_GEN_INV_FLD)
1904 1923                          nvme->n_lba_range_supported = B_FALSE;
1905 1924                  else
1906 1925                          dev_err(nvme->n_dip, CE_WARN,
1907 1926                              "!GET FEATURES %d failed with sct = %x, sc = %x",
1908 1927                              feature, cmd->nc_cqe.cqe_sf.sf_sct,
1909 1928                              cmd->nc_cqe.cqe_sf.sf_sc);
1910 1929                  goto fail;
1911 1930          }
1912 1931  
1913 1932          if (bufsize != NULL && *bufsize != 0) {
1914 1933                  ASSERT(buf != NULL);
1915 1934                  *buf = kmem_alloc(*bufsize, KM_SLEEP);
1916 1935                  bcopy(cmd->nc_dma->nd_memp, *buf, *bufsize);
1917 1936          }
1918 1937  
1919 1938          *res = cmd->nc_cqe.cqe_dw0;
1920      -        ret = B_TRUE;
1921 1939  
1922 1940  fail:
1923 1941          nvme_free_cmd(cmd);
1924 1942          return (ret);
1925 1943  }
1926 1944  
1927      -static boolean_t
     1945 +static int
1928 1946  nvme_write_cache_set(nvme_t *nvme, boolean_t enable)
1929 1947  {
1930 1948          nvme_write_cache_t nwc = { 0 };
1931 1949  
1932 1950          if (enable)
1933 1951                  nwc.b.wc_wce = 1;
1934 1952  
1935      -        if (!nvme_set_features(nvme, 0, NVME_FEAT_WRITE_CACHE, nwc.r, &nwc.r))
1936      -                return (B_FALSE);
1937      -
1938      -        return (B_TRUE);
     1953 +        return (nvme_set_features(nvme, 0, NVME_FEAT_WRITE_CACHE, nwc.r,
     1954 +            &nwc.r));
1939 1955  }
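The nwc.b / nwc.r access above is the register-overlay union idiom used throughout the driver: a bitfield view and a raw dword view of the same value, so a field can be set symbolically and the raw value dropped into CDW11. A sketch of the idiom (the layout here is illustrative; the real definitions live in the driver's register headers, and bitfield order is strictly implementation-defined):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for nvme_write_cache_t: bit 0 enables the cache. */
typedef union {
	struct {
		uint32_t wc_wce:1;	/* volatile write cache enable */
		uint32_t wc_rsvd:31;
	} b;
	uint32_t r;			/* raw dword for the SQE */
} mock_write_cache_t;

/* Build the SET FEATURES CDW11 value for enable/disable. */
static uint32_t
write_cache_cdw11(int enable)
{
	mock_write_cache_t nwc = { .r = 0 };

	if (enable)
		nwc.b.wc_wce = 1;
	return (nwc.r);
}
```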
1940 1956  
1941 1957  static int
1942      -nvme_set_nqueues(nvme_t *nvme, uint16_t nqueues)
     1958 +nvme_set_nqueues(nvme_t *nvme, uint16_t *nqueues)
1943 1959  {
1944 1960          nvme_nqueues_t nq = { 0 };
     1961 +        int ret;
1945 1962  
1946      -        nq.b.nq_nsq = nq.b.nq_ncq = nqueues - 1;
     1963 +        nq.b.nq_nsq = nq.b.nq_ncq = *nqueues - 1;
1947 1964  
1948      -        if (!nvme_set_features(nvme, 0, NVME_FEAT_NQUEUES, nq.r, &nq.r)) {
1949      -                return (0);
     1965 +        ret = nvme_set_features(nvme, 0, NVME_FEAT_NQUEUES, nq.r, &nq.r);
     1966 +
     1967 +        if (ret == 0) {
     1968 +                /*
     1969 +                 * Always use the same number of submission and completion
     1970 +                 * queues, and never use more than the requested number of
     1971 +                 * queues.
     1972 +                 */
     1973 +                *nqueues = MIN(*nqueues, MIN(nq.b.nq_nsq, nq.b.nq_ncq) + 1);
1950 1974          }
1951 1975  
1952      -        /*
1953      -         * Always use the same number of submission and completion queues, and
1954      -         * never use more than the requested number of queues.
1955      -         */
1956      -        return (MIN(nqueues, MIN(nq.b.nq_nsq, nq.b.nq_ncq) + 1));
     1976 +        return (ret);
1957 1977  }
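The clamp in nvme_set_nqueues() deserves a closer look: NSQ and NCQ in the SET FEATURES completion are zero's based counts of what the controller granted, and the driver uses the same number of submission and completion queues, never more than it requested. A standalone sketch of that arithmetic (function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define	MIN(a, b)	((a) < (b) ? (a) : (b))

/*
 * granted_nsq/granted_ncq are the zero's based values from the
 * completion dword; +1 converts them to one-based queue counts before
 * clamping against the request.
 */
static uint16_t
clamp_nqueues(uint16_t requested, uint16_t granted_nsq, uint16_t granted_ncq)
{
	return (MIN(requested, MIN(granted_nsq, granted_ncq) + 1));
}
```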
1958 1978  
1959 1979  static int
1960 1980  nvme_create_io_qpair(nvme_t *nvme, nvme_qpair_t *qp, uint16_t idx)
1961 1981  {
1962 1982          nvme_cmd_t *cmd = nvme_alloc_cmd(nvme, KM_SLEEP);
1963 1983          nvme_create_queue_dw10_t dw10 = { 0 };
1964 1984          nvme_create_cq_dw11_t c_dw11 = { 0 };
1965 1985          nvme_create_sq_dw11_t s_dw11 = { 0 };
     1986 +        int ret;
1966 1987  
1967 1988          dw10.b.q_qid = idx;
1968 1989          dw10.b.q_qsize = qp->nq_nentry - 1;
1969 1990  
1970 1991          c_dw11.b.cq_pc = 1;
1971 1992          c_dw11.b.cq_ien = 1;
1972 1993          c_dw11.b.cq_iv = idx % nvme->n_intr_cnt;
1973 1994  
1974 1995          cmd->nc_sqid = 0;
1975 1996          cmd->nc_callback = nvme_wakeup_cmd;
1976 1997          cmd->nc_sqe.sqe_opc = NVME_OPC_CREATE_CQUEUE;
1977 1998          cmd->nc_sqe.sqe_cdw10 = dw10.r;
1978 1999          cmd->nc_sqe.sqe_cdw11 = c_dw11.r;
1979 2000          cmd->nc_sqe.sqe_dptr.d_prp[0] = qp->nq_cqdma->nd_cookie.dmac_laddress;
1980 2001  
1981      -        if (nvme_admin_cmd(cmd, nvme_admin_cmd_timeout) != DDI_SUCCESS) {
1982      -                dev_err(nvme->n_dip, CE_WARN,
1983      -                    "!nvme_admin_cmd failed for CREATE CQUEUE");
1984      -                return (DDI_FAILURE);
1985      -        }
     2002 +        nvme_admin_cmd(cmd, nvme_admin_cmd_timeout);
1986 2003  
1987      -        if (nvme_check_cmd_status(cmd)) {
     2004 +        if ((ret = nvme_check_cmd_status(cmd)) != 0) {
1988 2005                  dev_err(nvme->n_dip, CE_WARN,
1989 2006                      "!CREATE CQUEUE failed with sct = %x, sc = %x",
1990 2007                      cmd->nc_cqe.cqe_sf.sf_sct, cmd->nc_cqe.cqe_sf.sf_sc);
1991      -                nvme_free_cmd(cmd);
1992      -                return (DDI_FAILURE);
     2008 +                goto fail;
1993 2009          }
1994 2010  
1995 2011          nvme_free_cmd(cmd);
1996 2012  
1997 2013          s_dw11.b.sq_pc = 1;
1998 2014          s_dw11.b.sq_cqid = idx;
1999 2015  
2000 2016          cmd = nvme_alloc_cmd(nvme, KM_SLEEP);
2001 2017          cmd->nc_sqid = 0;
2002 2018          cmd->nc_callback = nvme_wakeup_cmd;
2003 2019          cmd->nc_sqe.sqe_opc = NVME_OPC_CREATE_SQUEUE;
2004 2020          cmd->nc_sqe.sqe_cdw10 = dw10.r;
2005 2021          cmd->nc_sqe.sqe_cdw11 = s_dw11.r;
2006 2022          cmd->nc_sqe.sqe_dptr.d_prp[0] = qp->nq_sqdma->nd_cookie.dmac_laddress;
2007 2023  
2008      -        if (nvme_admin_cmd(cmd, nvme_admin_cmd_timeout) != DDI_SUCCESS) {
2009      -                dev_err(nvme->n_dip, CE_WARN,
2010      -                    "!nvme_admin_cmd failed for CREATE SQUEUE");
2011      -                return (DDI_FAILURE);
2012      -        }
     2024 +        nvme_admin_cmd(cmd, nvme_admin_cmd_timeout);
2013 2025  
2014      -        if (nvme_check_cmd_status(cmd)) {
     2026 +        if ((ret = nvme_check_cmd_status(cmd)) != 0) {
2015 2027                  dev_err(nvme->n_dip, CE_WARN,
2016 2028                      "!CREATE SQUEUE failed with sct = %x, sc = %x",
2017 2029                      cmd->nc_cqe.cqe_sf.sf_sct, cmd->nc_cqe.cqe_sf.sf_sc);
2018      -                nvme_free_cmd(cmd);
2019      -                return (DDI_FAILURE);
     2030 +                goto fail;
2020 2031          }
2021 2032  
     2033 +fail:
2022 2034          nvme_free_cmd(cmd);
2023 2035  
2024      -        return (DDI_SUCCESS);
     2036 +        return (ret);
2025 2037  }
2026 2038  
2027 2039  static boolean_t
2028 2040  nvme_reset(nvme_t *nvme, boolean_t quiesce)
2029 2041  {
2030 2042          nvme_reg_csts_t csts;
2031 2043          int i;
2032 2044  
2033 2045          nvme_put32(nvme, NVME_REG_CC, 0);
2034 2046  
↓ open down ↓ 72 lines elided ↑ open up ↑
2107 2119  }
2108 2120  
2109 2121  static int
2110 2122  nvme_init_ns(nvme_t *nvme, int nsid)
2111 2123  {
2112 2124          nvme_namespace_t *ns = &nvme->n_ns[nsid - 1];
2113 2125          nvme_identify_nsid_t *idns;
2114 2126          int last_rp;
2115 2127  
2116 2128          ns->ns_nvme = nvme;
2117      -        idns = nvme_identify(nvme, nsid);
2118 2129  
2119      -        if (idns == NULL) {
     2130 +        if (nvme_identify(nvme, nsid, (void **)&idns) != 0) {
2120 2131                  dev_err(nvme->n_dip, CE_WARN,
2121 2132                      "!failed to identify namespace %d", nsid);
2122 2133                  return (DDI_FAILURE);
2123 2134          }
2124 2135  
2125 2136          ns->ns_idns = idns;
2126 2137          ns->ns_id = nsid;
2127 2138          ns->ns_block_count = idns->id_nsize;
2128 2139          ns->ns_block_size =
2129 2140              1 << idns->id_lbaf[idns->id_flbas.lba_format].lbaf_lbads;
↓ open down ↓ 69 lines elided ↑ open up ↑
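The block-size computation in nvme_init_ns() above decodes the LBA format's LBADS field, which stores the data size as a power of two. A one-line sketch with the common values:

```c
#include <assert.h>
#include <stdint.h>

/*
 * 1 << lbads, as in nvme_init_ns(); e.g. LBADS = 9 is a 512-byte block
 * and LBADS = 12 is a 4096-byte block. Helper name is illustrative.
 */
static uint32_t
lba_block_size(uint8_t lbads)
{
	return ((uint32_t)1 << lbads);
}
```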
2199 2210  nvme_init(nvme_t *nvme)
2200 2211  {
2201 2212          nvme_reg_cc_t cc = { 0 };
2202 2213          nvme_reg_aqa_t aqa = { 0 };
2203 2214          nvme_reg_asq_t asq = { 0 };
2204 2215          nvme_reg_acq_t acq = { 0 };
2205 2216          nvme_reg_cap_t cap;
2206 2217          nvme_reg_vs_t vs;
2207 2218          nvme_reg_csts_t csts;
2208 2219          int i = 0;
2209      -        int nqueues;
     2220 +        uint16_t nqueues;
2210 2221          char model[sizeof (nvme->n_idctl->id_model) + 1];
2211 2222          char *vendor, *product;
2212 2223  
2213 2224          /* Check controller version */
2214 2225          vs.r = nvme_get32(nvme, NVME_REG_VS);
2215 2226          nvme->n_version.v_major = vs.b.vs_mjr;
2216 2227          nvme->n_version.v_minor = vs.b.vs_mnr;
2217 2228          dev_err(nvme->n_dip, CE_CONT, "?NVMe spec version %d.%d",
2218 2229              nvme->n_version.v_major, nvme->n_version.v_minor);
2219 2230  
↓ open down ↓ 144 lines elided ↑ open up ↑
2364 2375          }
2365 2376  
2366 2377          /*
2367 2378           * Post an asynchronous event command to catch errors.
2368 2379           */
2369 2380          nvme_async_event(nvme);
2370 2381  
2371 2382          /*
2372 2383           * Identify Controller
2373 2384           */
2374      -        nvme->n_idctl = nvme_identify(nvme, 0);
2375      -        if (nvme->n_idctl == NULL) {
     2385 +        if (nvme_identify(nvme, 0, (void **)&nvme->n_idctl) != 0) {
2376 2386                  dev_err(nvme->n_dip, CE_WARN,
2377 2387                      "!failed to identify controller");
2378 2388                  goto fail;
2379 2389          }
2380 2390  
2381 2391          /*
2382 2392           * Get Vendor & Product ID
2383 2393           */
2384 2394          bcopy(nvme->n_idctl->id_model, model, sizeof (nvme->n_idctl->id_model));
2385 2395          model[sizeof (nvme->n_idctl->id_model)] = '\0';
↓ open down ↓ 68 lines elided ↑ open up ↑
2454 2464           */
2455 2465          nvme->n_write_cache_present =
2456 2466              nvme->n_idctl->id_vwc.vwc_present == 0 ? B_FALSE : B_TRUE;
2457 2467  
2458 2468          (void) ddi_prop_update_int(DDI_DEV_T_NONE, nvme->n_dip,
2459 2469              "volatile-write-cache-present",
2460 2470              nvme->n_write_cache_present ? 1 : 0);
2461 2471  
2462 2472          if (!nvme->n_write_cache_present) {
2463 2473                  nvme->n_write_cache_enabled = B_FALSE;
2464      -        } else if (!nvme_write_cache_set(nvme, nvme->n_write_cache_enabled)) {
     2474 +        } else if (nvme_write_cache_set(nvme, nvme->n_write_cache_enabled)
     2475 +            != 0) {
2465 2476                  dev_err(nvme->n_dip, CE_WARN,
2466 2477                      "!failed to %sable volatile write cache",
2467 2478                      nvme->n_write_cache_enabled ? "en" : "dis");
2468 2479                  /*
2469 2480                   * Assume the cache is (still) enabled.
2470 2481                   */
2471 2482                  nvme->n_write_cache_enabled = B_TRUE;
2472 2483          }
2473 2484  
2474 2485          (void) ddi_prop_update_int(DDI_DEV_T_NONE, nvme->n_dip,
↓ open down ↓ 51 lines elided ↑ open up ↑
2526 2537                              "!failed to setup MSI/MSI-X interrupts");
2527 2538                          goto fail;
2528 2539                  }
2529 2540          }
2530 2541  
2531 2542          nqueues = nvme->n_intr_cnt;
2532 2543  
2533 2544          /*
2534 2545           * Create I/O queue pairs.
2535 2546           */
2536      -        nvme->n_ioq_count = nvme_set_nqueues(nvme, nqueues);
2537      -        if (nvme->n_ioq_count == 0) {
     2548 +        if (nvme_set_nqueues(nvme, &nqueues) != 0) {
2538 2549                  dev_err(nvme->n_dip, CE_WARN,
2539      -                    "!failed to set number of I/O queues to %d", nqueues);
     2550 +                    "!failed to set number of I/O queues to %d",
     2551 +                    nvme->n_intr_cnt);
2540 2552                  goto fail;
2541 2553          }
2542 2554  
2543 2555          /*
2544 2556           * Reallocate I/O queue array
2545 2557           */
2546 2558          kmem_free(nvme->n_ioq, sizeof (nvme_qpair_t *));
2547 2559          nvme->n_ioq = kmem_zalloc(sizeof (nvme_qpair_t *) *
2548      -            (nvme->n_ioq_count + 1), KM_SLEEP);
     2560 +            (nqueues + 1), KM_SLEEP);
2549 2561          nvme->n_ioq[0] = nvme->n_adminq;
2550 2562  
     2563 +        nvme->n_ioq_count = nqueues;
     2564 +
2551 2565          /*
 2552 2566           * If we got fewer queues than we asked for, we might as well give
2553 2567           * some of the interrupt vectors back to the system.
2554 2568           */
2555      -        if (nvme->n_ioq_count < nqueues) {
     2569 +        if (nvme->n_ioq_count < nvme->n_intr_cnt) {
2556 2570                  nvme_release_interrupts(nvme);
2557 2571  
2558 2572                  if (nvme_setup_interrupts(nvme, nvme->n_intr_type,
2559 2573                      nvme->n_ioq_count) != DDI_SUCCESS) {
2560 2574                          dev_err(nvme->n_dip, CE_WARN,
2561 2575                              "!failed to reduce number of interrupts");
2562 2576                          goto fail;
2563 2577                  }
2564 2578          }
2565 2579  
↓ open down ↓ 6 lines elided ↑ open up ↑
2572 2586              nvme->n_io_queue_len);
2573 2587  
2574 2588          for (i = 1; i != nvme->n_ioq_count + 1; i++) {
2575 2589                  if (nvme_alloc_qpair(nvme, nvme->n_io_queue_len,
2576 2590                      &nvme->n_ioq[i], i) != DDI_SUCCESS) {
2577 2591                          dev_err(nvme->n_dip, CE_WARN,
2578 2592                              "!unable to allocate I/O qpair %d", i);
2579 2593                          goto fail;
2580 2594                  }
2581 2595  
2582      -                if (nvme_create_io_qpair(nvme, nvme->n_ioq[i], i)
2583      -                    != DDI_SUCCESS) {
     2596 +                if (nvme_create_io_qpair(nvme, nvme->n_ioq[i], i) != 0) {
2584 2597                          dev_err(nvme->n_dip, CE_WARN,
2585 2598                              "!unable to create I/O qpair %d", i);
2586 2599                          goto fail;
2587 2600                  }
2588 2601          }
2589 2602  
2590 2603          /*
2591 2604           * Post more asynchronous events commands to reduce event reporting
2592 2605           * latency as suggested by the spec.
2593 2606           */
↓ open down ↓ 13 lines elided ↑ open up ↑
2607 2620          /*LINTED: E_PTR_BAD_CAST_ALIGN*/
2608 2621          nvme_t *nvme = (nvme_t *)arg1;
2609 2622          int inum = (int)(uintptr_t)arg2;
2610 2623          int ccnt = 0;
2611 2624          int qnum;
2612 2625          nvme_cmd_t *cmd;
2613 2626  
2614 2627          if (inum >= nvme->n_intr_cnt)
2615 2628                  return (DDI_INTR_UNCLAIMED);
2616 2629  
     2630 +        if (nvme->n_dead)
     2631 +                return (nvme->n_intr_type == DDI_INTR_TYPE_FIXED ?
     2632 +                    DDI_INTR_UNCLAIMED : DDI_INTR_CLAIMED);
     2633 +
2617 2634          /*
2618 2635           * The interrupt vector a queue uses is calculated as queue_idx %
2619 2636           * intr_cnt in nvme_create_io_qpair(). Iterate through the queue array
2620 2637           * in steps of n_intr_cnt to process all queues using this vector.
2621 2638           */
2622 2639          for (qnum = inum;
2623 2640              qnum < nvme->n_ioq_count + 1 && nvme->n_ioq[qnum] != NULL;
2624 2641              qnum += nvme->n_intr_cnt) {
2625 2642                  while ((cmd = nvme_retrieve_cmd(nvme, nvme->n_ioq[qnum]))) {
2626 2643                          taskq_dispatch_ent((taskq_t *)cmd->nc_nvme->n_cmd_taskq,
↓ open down ↓ 741 lines elided ↑ open up ↑
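The comment in the interrupt handler above describes a two-way mapping: nvme_create_io_qpair() assigns queue idx the vector idx % intr_cnt, so the handler for vector inum walks queues inum, inum + intr_cnt, ... and covers exactly the queues on that vector. A sketch under the simplifying assumption of nqueues consecutively numbered queues (function names are illustrative):

```c
#include <assert.h>

/* Vector assigned to a queue, as in the cq_iv computation. */
static int
queue_vector(int qidx, int intr_cnt)
{
	return (qidx % intr_cnt);
}

/* Count the queues (of nqueues total) serviced by vector inum. */
static int
queues_on_vector(int inum, int intr_cnt, int nqueues)
{
	int qnum, count = 0;

	/* Same stride the handler uses to visit its queues. */
	for (qnum = inum; qnum < nqueues; qnum += intr_cnt)
		count++;
	return (count);
}
```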
3368 3385  
3369 3386          if (otyp != OTYP_CHR)
3370 3387                  return (EINVAL);
3371 3388  
3372 3389          if (nvme == NULL)
3373 3390                  return (ENXIO);
3374 3391  
3375 3392          if (nsid > nvme->n_namespace_count)
3376 3393                  return (ENXIO);
3377 3394  
     3395 +        if (nvme->n_dead)
     3396 +                return (EIO);
     3397 +
3378 3398          nm = nsid == 0 ? &nvme->n_minor : &nvme->n_ns[nsid - 1].ns_minor;
3379 3399  
3380 3400          mutex_enter(&nm->nm_mutex);
3381 3401          if (nm->nm_oexcl) {
3382 3402                  rv = EBUSY;
3383 3403                  goto out;
3384 3404          }
3385 3405  
3386 3406          if (flag & FEXCL) {
3387 3407                  if (nm->nm_ocnt != 0) {
↓ open down ↓ 52 lines elided ↑ open up ↑
3440 3460          _NOTE(ARGUNUSED(cred_p));
3441 3461          int rv = 0;
3442 3462          void *idctl;
3443 3463  
3444 3464          if ((mode & FREAD) == 0)
3445 3465                  return (EPERM);
3446 3466  
3447 3467          if (nioc->n_len < NVME_IDENTIFY_BUFSIZE)
3448 3468                  return (EINVAL);
3449 3469  
3450      -        idctl = nvme_identify(nvme, nsid);
3451      -        if (idctl == NULL)
3452      -                return (EIO);
     3470 +        if ((rv = nvme_identify(nvme, nsid, (void **)&idctl)) != 0)
     3471 +                return (rv);
3453 3472  
3454 3473          if (ddi_copyout(idctl, (void *)nioc->n_buf, NVME_IDENTIFY_BUFSIZE, mode)
3455 3474              != 0)
3456 3475                  rv = EFAULT;
3457 3476  
3458 3477          kmem_free(idctl, NVME_IDENTIFY_BUFSIZE);
3459 3478  
3460 3479          return (rv);
3461 3480  }
3462 3481  
↓ open down ↓ 146 lines elided ↑ open up ↑
3609 3628  
3610 3629                  if (!nvme->n_auto_pst_supported)
3611 3630                          return (EINVAL);
3612 3631  
3613 3632                  break;
3614 3633  
3615 3634          default:
3616 3635                  return (EINVAL);
3617 3636          }
3618 3637  
3619      -        if (nvme_get_features(nvme, nsid, feature, &res, &buf, &bufsize) ==
3620      -            B_FALSE)
3621      -                return (EIO);
     3638 +        rv = nvme_get_features(nvme, nsid, feature, &res, &buf, &bufsize);
     3639 +        if (rv != 0)
     3640 +                return (rv);
3622 3641  
3623 3642          if (nioc->n_len < bufsize) {
3624 3643                  kmem_free(buf, bufsize);
3625 3644                  return (EINVAL);
3626 3645          }
3627 3646  
3628 3647          if (buf && ddi_copyout(buf, (void*)nioc->n_buf, bufsize, mode) != 0)
3629 3648                  rv = EFAULT;
3630 3649  
3631 3650          kmem_free(buf, bufsize);
↓ open down ↓ 196 lines elided ↑ open up ↑
3828 3847          case DDI_MODEL_NONE:
3829 3848  #endif
3830 3849                  if (ddi_copyin((void*)arg, &nioc, sizeof (nvme_ioctl_t), mode)
3831 3850                      != 0)
3832 3851                          return (EFAULT);
3833 3852  #ifdef _MULTI_DATAMODEL
3834 3853                  break;
3835 3854          }
3836 3855  #endif
3837 3856  
     3857 +        if (nvme->n_dead && cmd != NVME_IOC_DETACH)
     3858 +                return (EIO);
     3859 +
3838 3861          if (cmd == NVME_IOC_IDENTIFY_CTRL) {
3839 3862                  /*
3840 3863                   * This makes NVME_IOC_IDENTIFY_CTRL work the same on devctl and
3841 3864                   * attachment point nodes.
3842 3865                   */
3843 3866                  nsid = 0;
3844 3867          } else if (cmd == NVME_IOC_IDENTIFY_NSID && nsid == 0) {
3845 3868                  /*
 3846 3869                   * This makes NVME_IOC_IDENTIFY_NSID work on a devctl node; it
3847 3870                   * will always return identify data for namespace 1.
↓ open down ↓ 36 lines elided ↑ open up ↑