aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/scsi/scsi_lib.c
AgeCommit message (Collapse)AuthorFilesLines
2018-02-04Merge tag 'for-linus-20180204' of git://git.kernel.dk/linux-blockLinus Torvalds1-3/+3
Pull more block updates from Jens Axboe: "Most of this is fixes and not new code/features: - skd fix from Arnd, fixing a build error dependent on sla allocator type. - blk-mq scheduler discard merging fixes, one from me and one from Keith. This fixes a segment miscalculation for blk-mq-sched, where we mistakenly think two segments are physically contigious even though the request isn't carrying real data. Also fixes a bio-to-rq merge case. - Don't re-set a bit on the buffer_head flags, if it's already set. This can cause scalability concerns on bigger machines and workloads. From Kemi Wang. - Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to distuingish between a local (device related) resource starvation and a global one. The latter might happen without IO being in flight, so it has to be handled a bit differently. From Ming" * tag 'for-linus-20180204' of git://git.kernel.dk/linux-block: block: skd: fix incorrect linux/slab_def.h inclusion buffer: Avoid setting buffer bits that are already set blk-mq-sched: Enable merging discard bio into request blk-mq: fix discard merge with scheduler attached blk-mq: introduce BLK_STS_DEV_RESOURCE
2018-02-03Merge tag 'usercopy-v4.16-rc1' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardened usercopy whitelisting from Kees Cook: "Currently, hardened usercopy performs dynamic bounds checking on slab cache objects. This is good, but still leaves a lot of kernel memory available to be copied to/from userspace in the face of bugs. To further restrict what memory is available for copying, this creates a way to whitelist specific areas of a given slab cache object for copying to/from userspace, allowing much finer granularity of access control. Slab caches that are never exposed to userspace can declare no whitelist for their objects, thereby keeping them unavailable to userspace via dynamic copy operations. (Note, an implicit form of whitelisting is the use of constant sizes in usercopy operations and get_user()/put_user(); these bypass all hardened usercopy checks since these sizes cannot change at runtime.) This new check is WARN-by-default, so any mistakes can be found over the next several releases without breaking anyone's system. The series has roughly the following sections: - remove %p and improve reporting with offset - prepare infrastructure and whitelist kmalloc - update VFS subsystem with whitelists - update SCSI subsystem with whitelists - update network subsystem with whitelists - update process memory with whitelists - update per-architecture thread_struct with whitelists - update KVM with whitelists and fix ioctl bug - mark all other allocations as not whitelisted - update lkdtm for more sensible test overage" * tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits) lkdtm: Update usercopy tests for whitelisting usercopy: Restrict non-usercopy caches to size 0 kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl kvm: whitelist struct kvm_vcpu_arch arm: Implement thread_struct whitelist for hardened usercopy arm64: Implement thread_struct whitelist for hardened usercopy x86: Implement thread_struct whitelist for hardened usercopy fork: Provide usercopy whitelisting for task_struct fork: Define usercopy region in thread_stack slab caches fork: Define usercopy region in mm_struct slab caches net: Restrict unwhitelisted proto caches to size 0 sctp: Copy struct sctp_sock.autoclose to userspace using put_user() sctp: Define usercopy region in SCTP proto slab cache caif: Define usercopy region in caif proto slab cache ip: Define usercopy region in IP proto slab cache net: Define usercopy region in struct proto slab cache scsi: Define usercopy region in scsi_sense_cache slab cache cifs: Define usercopy region in cifs_request slab cache vxfs: Define usercopy region in vxfs_inode slab cache ufs: Define usercopy region in ufs_inode_cache slab cache ...
2018-01-31Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-17/+33
Pull SCSI updates from James Bottomley: "This is mostly updates of the usual driver suspects: arcmsr, scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas, hisi_sas. We also have a rework of the libsas hotplug handling to make it more robust, a slew of 32 bit time conversions and fixes, and a host of the usual minor updates and style changes. The biggest potential for regressions is the libsas hotplug changes, but so far they seem stable under testing" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (313 commits) scsi: qla2xxx: Fix logo flag for qlt_free_session_done() scsi: arcmsr: avoid do_gettimeofday scsi: core: Add VENDOR_SPECIFIC sense code definitions scsi: qedi: Drop cqe response during connection recovery scsi: fas216: fix sense buffer initialization scsi: ibmvfc: Remove unneeded semicolons scsi: hisi_sas: fix a bug in hisi_sas_dev_gone() scsi: hisi_sas: directly attached disk LED feature for v2 hw scsi: hisi_sas: devicetree: bindings: add LED feature for v2 hw scsi: megaraid_sas: NVMe passthrough command support scsi: megaraid: use ktime_get_real for firmware time scsi: fnic: use 64-bit timestamps scsi: qedf: Fix error return code in __qedf_probe() scsi: devinfo: fix format of the device list scsi: qla2xxx: Update driver version to 10.00.00.05-k scsi: qla2xxx: Add XCB counters to debugfs scsi: qla2xxx: Fix queue ID for async abort with Multiqueue scsi: qla2xxx: Fix warning for code intentation in __qla24xx_handle_gpdb_event() scsi: qla2xxx: Fix warning during port_name debug print scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout() ...
2018-01-30blk-mq: introduce BLK_STS_DEV_RESOURCEMing Lei1-3/+3
This status is returned from driver to block layer if device related resource is unavailable, but driver can guarantee that IO dispatch will be triggered in future when the resource is available. Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is 3 ms because both scsi-mq and nvmefc are using that magic value. If a driver can make sure there is in-flight IO, it is safe to return BLK_STS_DEV_RESOURCE because: 1) If all in-flight IOs complete before examining SCHED_RESTART in blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue is run immediately in this case by blk_mq_dispatch_rq_list(); 2) if there is any in-flight IO after/when examining SCHED_RESTART in blk_mq_dispatch_rq_list(): - if SCHED_RESTART isn't set, queue is run immediately as handled in 1) - otherwise, this request will be dispatched after any in-flight IO is completed via blk_mq_sched_restart() 3) if SCHED_RESTART is set concurently in context because of BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two cases and make sure IO hang can be avoided. One invariant is that queue will be rerun if SCHED_RESTART is set. Suggested-by: Jens Axboe <axboe@kernel.dk> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-15scsi: Define usercopy region in scsi_sense_cache slab cacheDavid Windsor1-4/+5
SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore contained in the scsi_sense_cache slab cache, need to be copied to/from userspace. cache object allocation: drivers/scsi/scsi_lib.c: scsi_select_sense_cache(...): return ... ? scsi_sense_isadma_cache : scsi_sense_cache scsi_alloc_sense_buffer(...): return kmem_cache_alloc_node(scsi_select_sense_cache(), ...); scsi_init_request(...): ... cmd->sense_buffer = scsi_alloc_sense_buffer(...); ... cmd->req.sense = cmd->sense_buffer example usage trace: block/scsi_ioctl.c: (inline from sg_io) blk_complete_sghdr_rq(...): struct scsi_request *req = scsi_req(rq); ... copy_to_user(..., req->sense, len) scsi_cmd_ioctl(...): sg_io(...); In support of usercopy hardening, this patch defines a region in the scsi_sense_cache slab cache in which userspace copy operations are allowed. This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. Signed-off-by: David Windsor <dave@nullcore.net> [kees: adjust commit log, provide usage trace] Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-10scsi: core: Change third __scsi_queue_insert() argument from int to boolBart Van Assche1-4/+4
This patch does not change any functionality but makes the SCSI core source code slightly easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-07scsi: core: Unexport scsi_initialize_rq()Bart Van Assche1-2/+1
Commit 651a01364994 ("scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough") removed the only call to scsi_initialize_rq() from outside the SCSI core. Hence unexport scsi_initialize_rq(). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-07scsi: core: Ensure that the SCSI error handler gets woken upBart Van Assche1-11/+28
If scsi_eh_scmd_add() is called concurrently with scsi_host_queue_ready() while shost->host_blocked > 0 then it can happen that neither function wakes up the SCSI error handler. Fix this by making every function that decreases the host_busy counter wake up the error handler if necessary and by protecting the host_failed checks with the SCSI host lock. Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> References: https://marc.info/?l=linux-kernel&m=150461610630736 Fixes: commit 746650160866 ("scsi: convert host_busy to atomic_t") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: Stuart Hayes <stuart.w.hayes@gmail.com> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-07scsi: core: run queue if SCSI device queue isn't ready and queue is idleMing Lei1-0/+2
Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq"), we run queue after 3ms if queue is idle and SCSI device queue isn't ready, which is done in handling BLK_STS_RESOURCE. After commit 0df21c86bdbf is introduced, queue won't be run any more under this situation. IO hang is observed when timeout happened, and this patch fixes the IO hang issue by running queue after delay in scsi_dev_queue_ready, just like non-mq. This issue can be triggered by the following script[1]. There is another issue which can be covered by running idle queue: when .get_budget() is called on request coming from hctx->dispatch_list, if one request just completes during .get_budget(), we can't depend on SCSI's restart to make progress any more. This patch fixes the race too. With this patch, we basically recover to previous behaviour (before commit 0df21c86bdbf) of handling idle queue when running out of resource. [1] script for test/verify SCSI timeout rmmod scsi_debug modprobe scsi_debug max_queue=1 DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename` DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*` echo "using scsi device $DEVICE" echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth echo "temporary write through" >$DISK_DIR/cache_type echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts echo none > /sys/block/$DEVICE/queue/scheduler dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 & sleep 5 echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts wait echo "SUCCESS" Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq") Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-11-21scsi: use dma_get_cache_alignment() as minimum DMA alignmentHuacai Chen1-4/+6
In non-coherent DMA mode, kernel uses cache flushing operations to maintain I/O coherency, so scsi's block queue should be aligned to the value returned by dma_get_cache_alignment(). Otherwise, If a DMA buffer and a kernel structure share a same cache line, and if the kernel structure has dirty data, cache_invalidate (no writeback) will cause data corruption. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhc@lemote.com> [hch: rebased and updated the comment and changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-11-14Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-1/+8
Pull SCSI updates from James Bottomley: "This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas, megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor updates. There's no major behaviour change or additions to the core in all of this, so the potential for regressions should be small (biggest potential being in the scsi error handler changes)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits) scsi: lpfc: Fix hard lock up NMI in els timeout handling. scsi: mpt3sas: remove a stray KERN_INFO scsi: mpt3sas: cleanup _scsih_pcie_enumeration_event() scsi: aacraid: use timespec64 instead of timeval scsi: scsi_transport_fc: add 64GBIT and 128GBIT port speed definitions scsi: qla2xxx: Suppress a kernel complaint in qla_init_base_qpair() scsi: mpt3sas: fix dma_addr_t casts scsi: be2iscsi: Use kasprintf scsi: storvsc: Avoid excessive host scan on controller change scsi: lpfc: fix kzalloc-simple.cocci warnings scsi: mpt3sas: Update mpt3sas driver version. scsi: mpt3sas: Fix sparse warnings scsi: mpt3sas: Fix nvme drives checking for tlr. scsi: mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log info scsi: mpt3sas: Add-Task-management-debug-info-for-NVMe-drives. scsi: mpt3sas: scan and add nvme device after controller reset scsi: mpt3sas: Set NVMe device queue depth as 128 scsi: mpt3sas: Handle NVMe PCIe device related events generated from firmware. scsi: mpt3sas: API's to remove nvme drive from sml scsi: mpt3sas: API 's to support NVMe drive addition to SML ...
2017-11-14Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-blockLinus Torvalds1-30/+69
Pull core block layer updates from Jens Axboe: "This is the main pull request for block storage for 4.15-rc1. Nothing out of the ordinary in here, and no API changes or anything like that. Just various new features for drivers, core changes, etc. In particular, this pull request contains: - A patch series from Bart, closing the whole on blk/scsi-mq queue quescing. - A series from Christoph, building towards hidden gendisks (for multipath) and ability to move bio chains around. - NVMe - Support for native multipath for NVMe (Christoph). - Userspace notifications for AENs (Keith). - Command side-effects support (Keith). - SGL support (Chaitanya Kulkarni) - FC fixes and improvements (James Smart) - Lots of fixes and tweaks (Various) - bcache - New maintainer (Michael Lyle) - Writeback control improvements (Michael) - Various fixes (Coly, Elena, Eric, Liang, et al) - lightnvm updates, mostly centered around the pblk interface (Javier, Hans, and Rakesh). - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph) - Writeback series that fix the much discussed hundreds of millions of sync-all units. This goes all the way, as discussed previously (me). - Fix for missing wakeup on writeback timer adjustments (Yafang Shao). - Fix laptop mode on blk-mq (me). - {mq,name} tupple lookup for IO schedulers, allowing us to have alias names. This means you can use 'deadline' on both !mq and on mq (where it's called mq-deadline). (me). - blktrace race fix, oopsing on sg load (me). - blk-mq optimizations (me). - Obscure waitqueue race fix for kyber (Omar). - NBD fixes (Josef). - Disable writeback throttling by default on bfq, like we do on cfq (Luca Miccio). - Series from Ming that enable us to treat flush requests on blk-mq like any other request. This is a really nice cleanup. - Series from Ming that improves merging on blk-mq with schedulers, getting us closer to flipping the switch on scsi-mq again. - BFQ updates (Paolo). - blk-mq atomic flags memory ordering fixes (Peter Z). - Loop cgroup support (Shaohua). - Lots of minor fixes from lots of different folks, both for core and driver code" * 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits) nvme: fix visibility of "uuid" ns attribute blk-mq: fixup some comment typos and lengths ide: ide-atapi: fix compile error with defining macro DEBUG blk-mq: improve tag waiting setup for non-shared tags brd: remove unused brd_mutex blk-mq: only run the hardware queue if IO is pending block: avoid null pointer dereference on null disk fs: guard_bio_eod() needs to consider partitions xtensa/simdisk: fix compile error nvme: expose subsys attribute to sysfs nvme: create 'slaves' and 'holders' entries for hidden controllers block: create 'slaves' and 'holders' entries for hidden gendisks nvme: also expose the namespace identification sysfs files for mpath nodes nvme: implement multipath access to nvme subsystems nvme: track shared namespaces nvme: introduce a nvme_ns_ids structure nvme: track subsystems block, nvme: Introduce blk_mq_req_flags_t block, scsi: Make SCSI quiesce and resume work reliably block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag ...
2017-11-10block, scsi: Make SCSI quiesce and resume work reliablyBart Van Assche1-12/+30
The contexts from which a SCSI device can be quiesced or resumed are: * Writing into /sys/class/scsi_device/*/device/state. * SCSI parallel (SPI) domain validation. * The SCSI device power management methods. See also scsi_bus_pm_ops. It is essential during suspend and resume that neither the filesystem state nor the filesystem metadata in RAM changes. This is why while the hibernation image is being written or restored that SCSI devices are quiesced. The SCSI core quiesces devices through scsi_device_quiesce() and scsi_device_resume(). In the SDEV_QUIESCE state execution of non-preempt requests is deferred. This is realized by returning BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI devices. Avoid that a full queue prevents power management requests to be submitted by deferring allocation of non-preempt requests for devices in the quiesced state. This patch has been tested by running the following commands and by verifying that after each resume the fio job was still running: for ((i=0; i<10; i++)); do ( cd /sys/block/md0/md && while true; do [ "$(<sync_action)" = "idle" ] && echo check > sync_action sleep 1 done ) & pids=($!) for d in /sys/class/block/sd*[a-z]; do bdev=${d#/sys/class/block/} hcil=$(readlink "$d/device") hcil=${hcil#../../../} echo 4 > "$d/queue/nr_requests" echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth" fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \ --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16 \ --iodepth_batch=1 --thread --loops=$((2**31)) & pids+=($!) done sleep 1 echo "$(date) Hibernating ..." >>hibernate-test-log.txt systemctl hibernate sleep 10 kill "${pids[@]}" echo idle > /sys/block/md0/md/sync_action wait echo "$(date) Done." >>hibernate-test-log.txt done Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Martin Steigerwald <martin@lichtvoll.de> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10ide, scsi: Tell the block layer at request allocation time about preempt ↵Bart Van Assche1-3/+3
requests Convert blk_get_request(q, op, __GFP_RECLAIM) into blk_get_request_flags(q, op, BLK_MQ_PREEMPT). This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Martin Steigerwald <martin@lichtvoll.de> Acked-by: David S. Miller <davem@davemloft.net> [ for IDE ] Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-07Revert "scsi: make 'state' device attribute pollable"Linus Torvalds1-3/+0
This reverts commit 8a97712e5314aefe16b3ffb4583a34deaa49de04. This commit added a call to sysfs_notify() from within scsi_device_set_state(), which in turn turns out to make libata very unhappy, because ata_eh_detach_dev() does spin_lock_irqsave(ap->lock, flags); .. if (ata_scsi_offline_dev(dev)) { dev->flags |= ATA_DFLAG_DETACHED; ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG; } and ata_scsi_offline_dev() then does that scsi_device_set_state() to set it offline. So now we called sysfs_notify() from within a spinlocked region, which really doesn't work. The 0day robot reported this as: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 because sysfs_notify() ends up calling kernfs_find_and_get_ns() which then does mutex_lock(&kernfs_mutex).. The pollability of the device state isn't critical, so revert this all for now, and maybe we'll do it differently in the future. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-04blk-mq: don't handle failure in .get_budgetMing Lei1-8/+3
It is enough to just check if we can get the budget via .get_budget(). And we don't need to deal with device state change in .get_budget(). For SCSI, one issue to be fixed is that we have to call scsi_mq_uninit_cmd() to free allocated ressources if SCSI device fails to handle the request. And it isn't enough to simply call blk_mq_end_request() to do that if this request is marked as RQF_DONTPREP. Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq) Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-04SCSI: don't get target/host busy_count in scsi_mq_get_budget()Ming Lei1-16/+13
It is very expensive to atomic_inc/atomic_dec the host wide counter of host->busy_count, and it should have been avoided via blk-mq's mechanism of getting driver tag, which uses the more efficient way of sbitmap queue. Also we don't check atomic_read(&sdev->device_busy) in scsi_mq_get_budget() and don't run queue if the counter becomes zero, so IO hang may be caused if all requests are completed just before the current SCSI device is added to shost->starved_list. Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq) Reported-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-01scsi: implement .get_budget and .put_budget for blk-mqMing Lei1-23/+52
We need to tell blk-mq to reserve resources before queuing one request, so implement these two callbacks. Then blk-mq can avoid to dequeue request too early, and IO merging can be improved a lot. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-01scsi: allow passing in null rq to scsi_prep_state_check()Ming Lei1-2/+2
In the following patch, we will implement scsi_get_budget() which need to call scsi_prep_state_check() when rq isn't dequeued yet. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-23scsi: Suppress a kernel warning in case the prep function returns BLKPREP_DEFERBart Van Assche1-7/+1
The legacy block layer handles requests as follows: - If the prep function returns BLKPREP_OK, let blk_peek_request() return the pointer to that request. - If the prep function returns BLKPREP_DEFER, keep the RQF_STARTED flag and retry calling the prep function later. - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, end the request. In none of these cases it is correct to clear the SCMD_INITIALIZED flag from inside scsi_prep_fn(). Since scsi_prep_fn() already guarantees that scsi_init_command() will be called once even if scsi_prep_fn() is called multiple times, remove the code that clears SCMD_INITIALIZED from scsi_prep_fn(). The scsi-mq code handles requests as follows: - If scsi_mq_prep_fn() returns BLKPREP_OK, set the RQF_DONTPREP flag and submit the request to the SCSI LLD. - If scsi_mq_prep_fn() returns BLKPREP_DEFER, call blk_mq_delay_run_hw_queue() and return BLK_STS_RESOURCE. - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, call scsi_mq_uninit_cmd() and let the blk-mq core end the request. In none of these cases scsi_mq_prep_fn() should clear the SCMD_INITIALIZED flag. Hence remove the code from scsi_mq_prep_fn() function that clears that flag. This patch avoids that the following warning is triggered when using the legacy block layer: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 4198 at drivers/scsi/scsi_lib.c:654 scsi_end_request+0x1de/0x220 CPU: 1 PID: 4198 Comm: mkfs.f2fs Not tainted 4.14.0-rc5+ #1 task: ffff91c147a4b800 task.stack: ffffb282c37b8000 RIP: 0010:scsi_end_request+0x1de/0x220 Call Trace: <IRQ> scsi_io_completion+0x204/0x5e0 scsi_finish_command+0xce/0xe0 scsi_softirq_done+0x126/0x130 blk_done_softirq+0x6e/0x80 __do_softirq+0xcf/0x2a8 irq_exit+0xab/0xb0 do_IRQ+0x7b/0xc0 common_interrupt+0x90/0x90 </IRQ> RIP: 0010:_raw_spin_unlock_irqrestore+0x9/0x10 __test_set_page_writeback+0xc7/0x2c0 __block_write_full_page+0x158/0x3b0 block_write_full_page+0xc4/0xd0 blkdev_writepage+0x13/0x20 __writepage+0x12/0x40 write_cache_pages+0x204/0x500 generic_writepages+0x48/0x70 blkdev_writepages+0x9/0x10 do_writepages+0x34/0xc0 __filemap_fdatawrite_range+0x6c/0x90 file_write_and_wait_range+0x31/0x90 blkdev_fsync+0x16/0x40 vfs_fsync_range+0x44/0xa0 do_fsync+0x38/0x60 SyS_fsync+0xb/0x10 entry_SYSCALL_64_fastpath+0x13/0x94 ---[ end trace 86e8ef85a4a6c1d1 ]--- Fixes: commit 64104f703212 ("scsi: Call scsi_initialize_rq() for filesystem requests") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-18scsi: scsi_error: Handle power-on reset unit attentionHannes Reinecke1-0/+4
As per SAM there is a status precedence, with any sense code 29/XX taking second place just after an ACA ACTIVE status. Additionally, each target might prefer to not queue any unit attention conditions, but just report one. Due to the above, this will be that one with the highest precedence. This results in the sense code 29/XX effectively overwriting any other unit attention. Hence we should report the power-on reset to userland so that it can take appropriate action. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-16scsi: sd_zbc: Fix comments and indentationDamien Le Moal1-1/+4
Fix comments style (use kernel-doc style) and content to clarify some functions. Also fix some functions signature indentation and remove a useless blank line in sd_zbc_read_zones(). No functional change is introduced by this patch. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-31scsi: scsi-mq: Always unprepare before requeuing a requestBart Van Assche1-2/+8
One of the two scsi-mq functions that requeue a request unprepares a request before requeueing (scsi_io_completion()) but the other function not (__scsi_queue_insert()). Make sure that a request is unprepared before requeuing it. Fixes: commit d285203cf647 ("scsi: add support for a blk-mq based I/O path.") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-31scsi: Improve requeuing behaviorBart Van Assche1-2/+13
Requests are unprepared and reprepared when being requeued. Avoid that requeuing resets .jiffies_at_alloc and .retries by initializing these two member variables from inside scsi_initialize_rq() and by preserving both member variables when preparing a request. This patch affects the requeuing behavior of both the legacy scsi and the scsi-mq code paths. Reported-by: Brian King <brking@linux.vnet.ibm.com> References: https://lkml.org/lkml/2017/8/18/923 ("Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-31scsi: Call scsi_initialize_rq() for filesystem requestsBart Van Assche1-4/+22
If a pass-through request is submitted then blk_get_request() initializes that request by calling scsi_initialize_rq(). Also call this function for filesystem requests. Introduce CMD_INITIALIZED to keep track of whether or not a request has already been initialized. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-29scsi: Rework handling of scsi_device.vpd_pg8[03]Bart Van Assche1-8/+8
Introduce struct scsi_vpd for the VPD page length, data and the RCU head that will be used to free the VPD data. Use kfree_rcu() instead of kfree() to free VPD data. Move the VPD buffer pointer check inside the RCU read lock in the sysfs code. Only annotate pointers that are shared across threads with __rcu. Use rcu_dereference() when dereferencing an RCU pointer. This patch suppresses about twenty sparse complaints about the vpd_pg8[03] pointers. This patch also fixes a race condition, namely that updating of the VPD pointers and length variables in struct scsi_device was not atomic with reference to the code reading these variables. See also "Does the update code tolerate concurrent accesses?" in Documentation/RCU/checklist.txt. Fixes: commit 09e2b0b14690 ("scsi: rescan VPD attributes") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Shane Seymour <shane.seymour@hpe.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Shane Seymour <shane.seymour@hpe.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-25scsi: Fix the kerneldoc for scsi_initialize_rq()Jonathan Corbet1-0/+1
The kerneldoc comment for scsi_initialize_rq() neglected to document the "rq" parameter, leading to this docs build warning: ./drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq' Document the parameter and make the build slightly quieter. [mkp: used wording suggested by Bart] Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-25scsi: fix comment in scsi_device_set_state()Hannes Reinecke1-1/+1
The function returns '0' if successful; with the original comment the function doesn't have a way to indicate success ... Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart van Assche <bvanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-25scsi: Use blk_mq_rq_to_pdu() to convert a request to a SCSI command pointerBart Van Assche1-9/+9
Since commit e9c787e65c0c ("scsi: allocate scsi_cmnd structures as part of struct request") struct request and struct scsi_cmnd are adjacent. This means that there is now an alternative to reading req->special to convert a pointer to a prepared request into a SCSI command pointer, namely by using blk_mq_rq_to_pdu(). Make this change where appropriate. Although this patch does not change any functionality, it slightly improves performance and slightly improves readability. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-25scsi: Document which queue type a function is intended forBart Van Assche1-11/+12
Rename several functions to make it easy to see which queue type a function is intended for. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24scsi: make 'state' device attribute pollableHannes Reinecke1-0/+3
While the 'state' attribute can (and will) change occasionally, calling 'poll()' or 'select()' on it fails as sysfs is never notified that the state has changed. With this patch calling 'poll()' or 'select()' will work properly. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24scsi: scsi_lib: rework scsi_internal_device_unblock_nowait()Hannes Reinecke1-6/+11
Rework scsi_internal_device_unblock_nowait() into using a switch statement. No functional changes. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-06Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-115/+191
Pull SCSI updates from James Bottomley: "This is mostly updates of the usual suspects: lpfc, qla2xxx, bnx2fc, qedf, hpsa, hisi_sas, smartpqi, cxlflash, aacraid, csiostor along with a host of minor and miscellaneous changes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (276 commits) qla2xxx: Fix NVMe entry_type for iocb packet on BE system scsi: qla2xxx: avoid unused-function warning scsi: snic: fix a couple of spelling mistakes/typos scsi: qla2xxx: fix a bunch of typos and spelling mistakes scsi: lpfc: don't double count abort errors scsi: lpfc: spin_lock_irq() is not nestable scsi: hisi_sas: optimise DMA slot memory scsi: ibmvfc: constify dev_pm_ops structures. scsi: ibmvscsi: constify dev_pm_ops structures. scsi: cxlflash: Update debug prints in reset handlers scsi: cxlflash: Update send_tmf() parameters scsi: cxlflash: Avoid double free of character device scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state scsi: ses: do not add a device to an enclosure if enclosure_add_links() fails. scsi: ufs: flush eh_work when eh_work scheduled. scsi: qla2xxx: Protect access to qpair members with qpair->qp_lock scsi: sun_esp: fix device reference leaks scsi: fnic: changing queue command to return result DID_IMM_RETRY when rport is init scsi: fnic: correct speed display and add support for 25,40 and 100G scsi: fnic: added timestamp reporting in fnic debug stats ...
2017-06-20block: Change argument type of scsi_req_init()Bart Van Assche1-1/+3
Since scsi_req_init() works on a struct scsi_request, change the argument type into struct scsi_request *. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20block: Make most scsi_req_init() calls implicitBart Van Assche1-1/+14
Instead of explicitly calling scsi_req_init() after blk_get_request(), call that function from inside blk_get_request(). Add an .initialize_rq_fn() callback function to the block drivers that need it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn() because it is too small to keep it as a separate function. Keep the scsi_req_init() call in ide_prep_sense() because it follows a blk_rq_init() call. References: commit 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18blk-mq: use the introduced blk_mq_unquiesce_queue()Ming Lei1-2/+2
blk_mq_unquiesce_queue() is used for unquiescing the queue explicitly, so replace blk_mq_start_stopped_hw_queues() with it. For the scsi part, this patch takes Bart's suggestion to switch to block quiesce/unquiesce API completely. Cc: linux-nvme@lists.infradead.org Cc: linux-scsi@vger.kernel.org Cc: dm-devel@redhat.com Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-12scsi: Make scsi_mq_prep_fn() call scsi_init_command()Bart Van Assche1-16/+9
This patch reduces code duplication. There are two functional changes in this patch: - It causes scsi_mq_prep_fn() to clear driver-private command data, just like the already upstream commit 1bad6c4a57ef ("scsi: zero per-cmd private driver data for each MQ I/O"). - The initialization of .prot_sdb is moved from scsi_mq_prep_fn() into scsi_init_request(). [mkp: applied by hand] Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: Introduce scsi_mq_sgl_size()Bart Van Assche1-9/+10
This patch does not change any functionality but makes the next patch easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: Only add commands to the device command list if required by the LLDBart Van Assche1-20/+32
Just like for the scsi-mq code path, in the single queue SCSI code path only add commands to the per-device command list if required by the SCSI LLD. This patch will make it easier to merge the single-queue and multiqueue command initialization code. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: Make __scsi_remove_device go straight from BLOCKED to DELBart Van Assche1-1/+1
If a device is blocked, make __scsi_remove_device() cause it to transition to the DEL state. This means that all the commands issued in .shutdown() will error in the mid-layer, thus making the removal proceed without being stopped. This patch is a slightly modified version of a patch from James Bottomley. This patch avoids that the following lockup occurs: Call Trace: schedule+0x35/0x80 schedule_timeout+0x237/0x2d0 io_schedule_timeout+0xa6/0x110 wait_for_completion_io+0xa3/0x110 blk_execute_rq+0xdf/0x120 scsi_execute+0xce/0x150 [scsi_mod] scsi_execute_req_flags+0x8f/0xf0 [scsi_mod] sd_sync_cache+0xa9/0x190 [sd_mod] sd_shutdown+0x6a/0x100 [sd_mod] sd_remove+0x64/0xc0 [sd_mod] __device_release_driver+0x8d/0x120 device_release_driver+0x1e/0x30 bus_remove_device+0xf9/0x170 device_del+0x127/0x240 __scsi_remove_device+0xc1/0xd0 [scsi_mod] scsi_forget_host+0x57/0x60 [scsi_mod] scsi_remove_host+0x72/0x110 [scsi_mod] srp_remove_work+0x8b/0x200 [ib_srp] Reported-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: Introduce scsi_start_queue()Bart Van Assche1-10/+15
This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: Protect SCSI device state changes with a mutexBart Van Assche1-6/+21
Serializing SCSI device state changes avoids that two state changes can occur concurrently, e.g. the state changes in scsi_target_block() and __scsi_remove_device(). This serialization is essential to make patch "Make __scsi_remove_device go straight from BLOCKED to DEL" work reliably. Enable this mechanism for all scsi_target_*block() callers but not for the scsi_internal_device_unblock() calls from the mpt3sas driver because that driver can call scsi_internal_device_unblock() from atomic context. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: Create two versions of scsi_internal_device_unblock()Bart Van Assche1-14/+32
This will make it easier to serialize SCSI device state changes through a mutex. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: Split scsi_internal_device_block()Bart Van Assche1-26/+47
Instead of passing a "wait" argument to scsi_internal_device_block(), split this function into a function that waits and a function that doesn't wait. This will make it easier to serialize SCSI device state changes through a mutex. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-12scsi: Avoid that scsi_exit_rq() triggers a use-after-freeBart Van Assche1-18/+29
Dereferencing shost from scsi_exit_rq() is not safe because the SCSI host may already have been freed when scsi_exit_rq() is called. Increasing the shost reference count in scsi_init_rq() and dropping that reference in scsi_exit_rq() is nontrivial since scsi_host_dev_release() may sleep and since scsi_exit_rq() may be called from interrupt context. Since scsi_exit_rq() only needs a single bit from shost, copy that bit into struct scsi_cmnd. Reported-by: Scott Bauer <scott.bauer@intel.com> Fixes: e9c787e65c0c ("scsi: allocate scsi_cmnd structures as part of struct request") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Scott Bauer <scott.bauer@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-09blk-mq: switch ->queue_rq return value to blk_status_tChristoph Hellwig1-15/+15
Use the same values for use for request completion errors as the return value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause a requeue, and all the others are completed as-is. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09block: introduce new block status code typeChristoph Hellwig1-34/+17
Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later. For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it. blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-01block: Introduce queue flag QUEUE_FLAG_SCSI_PASSTHROUGHBart Van Assche1-0/+2
From the context where a SCSI command is submitted it is not always possible to figure out whether or not the queue the command is submitted to has struct scsi_request as the first member of its private data. Hence introduce the flag QUEUE_FLAG_SCSI_PASSTHROUGH. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Don Brace <don.brace@microsemi.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-18scsi: zero per-cmd private driver data for each MQ I/OLong Li1-1/+1
In lower layer driver's (LLD) scsi_host_template, the driver may optionally ask SCSI to allocate its private driver memory for each command, by specifying cmd_size. This memory is allocated at the end of scsi_cmnd by SCSI. Later when SCSI queues a command, the LLD can use scsi_cmd_priv to get to its private data. Some LLD, e.g. hv_storvsc, doesn't clear its private data before use. In this case, the LLD may get to stale or uninitialized data in its private driver memory. This may result in unexpected driver and hardware behavior. Fix this problem by also zeroing the private driver memory before passing them to LLD. Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Reviewed-by: KY Srinivasan <kys@microsoft.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> CC: <stable@vger.kernel.org> # 4.11+ Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-08scsi: scsi_lib: Add #include <scsi/scsi_transport.h>Bart Van Assche1-0/+1
This patch avoids that when building with W=1 the compiler complains that __scsi_init_queue() has not been declared. See also commit d48777a633d6 ("scsi: remove __scsi_alloc_queue"). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Privacy Policy