Computer Vision

How to scan payment cards using Vision framework in iOS

Learn how to process credit and debit cards using iPhone and iPad cameras

When shopping online, every second counts. Once the user decides to make a purchase, completing the payment should be as frictionless as possible. Amazon’s one-click payment system was a game changer at its inception.

One important step in completing a purchase is adding the payment details. In some cases the user might already have their card details saved with the merchant or the platform. But there are other cases where the user is a new customer or one who does not save their card details. For those cases it is important to reduce the friction the user might face while entering their payment details. Otherwise the merchant risks losing that sale.

To make the purchase process more convenient for the user, the merchant can offer a credit and debit card scanner.

In this post we will learn how to extract the payment card number using Apple’s Vision framework.

I assume you have intermediate experience with iOS development in Swift. You should be familiar with Grand Central Dispatch too.

I have used Swift 5.2.4 and Xcode 11.5 whilst writing this post.

How to extract payment card number using Vision framework

In this section we’ll start off with an existing app. The app contains two screens. The first screen shows a label and a button. The label will display the extraction results. The button will launch another screen on tap. The new screen will display the back camera feed. In this screen we’ll process the payment card and extract its number.

For this post we’ll focus only on processing the live feed from the back camera. I have already set up the camera so we can start processing the live camera feed. Knowledge of how to access and manage the camera is not needed for this post.

The steps we’ll take:

  1. Download the starter project
  2. Detecting a payment card
  3. Tracking the payment card
  4. Extracting the payment card number

Note: I assume the app will always run on an iOS device and not on a simulator. This project will not work on simulators, as iOS simulators have no access to a camera.

1. Download the starter project

Let’s start by retrieving the starter project. For this we’ll make use of the terminal. Open the terminal app and execute the following commands:

cd $HOME
curl https://github.com/anuragajwani/payment_card_scanner/archive/starter.zip -o payment_card_extractor.zip -L -s
unzip -q payment_card_extractor.zip
cd payment_card_scanner-starter/PaymentCardScanner

Let’s open the project. Execute the following command:

open -a Xcode PaymentCardScanner.xcodeproj

We’ll only be working on the PaymentCardExtractionViewController.swift file in this tutorial. This is where we’ll be processing the images from the back camera and extracting the payment card number. Open the file within Xcode or execute the following command:

open -a Xcode PaymentCardScanner/PaymentCardExtractionViewController.swift

2. Detecting a payment card

In this section we’ll analyse the frames of the live camera feed. In every frame we’ll first need to check whether there is a rectangle with proportions similar to a payment card. We’ll also need to assess whether it contains text, such as the 16-digit card number.

To check these attributes of a payment card we’ll be making use of rectangle and text detection from the Vision framework. To use the framework functionality add the following line right under import AVFoundation:

import Vision

Next we’ll create a function to detect payment cards. Add the following function:

private func detectPaymentCard(frame: CVImageBuffer) -> VNRectangleObservation? {
}

The function will take an image and return to us information on the location of the payment card if detected, otherwise it will return nil.

Let’s add the functionality to detect payment cards. First we’ll need to check for any rectangles with the same proportions as a payment card. We’ll be making use of the Vision framework’s rectangle detection request, VNDetectRectanglesRequest. Add the following lines to the detectPaymentCard function:

let rectangleDetectionRequest = VNDetectRectanglesRequest()
// Vision expects aspect ratios as shorter side over longer side, in the range 0.0 to 1.0
let paymentCardAspectRatio: Float = 53.98/85.60
rectangleDetectionRequest.minimumAspectRatio = paymentCardAspectRatio * 0.95
rectangleDetectionRequest.maximumAspectRatio = paymentCardAspectRatio * 1.10

Above we create an instance of VNDetectRectanglesRequest and configure it to find rectangles with the aspect ratio of a payment card. Payment cards are 85.60 mm wide and 53.98 mm high. Vision defines the aspect ratio as the shorter side divided by the longer side, so we use 53.98/85.60 and accept anything from 5% below to 10% above that value.
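To make the tolerance band concrete, here is a minimal sketch of the same check in plain Swift. The names aspectRatio and isCardShaped, and the tolerance parameters, are hypothetical and not part of the starter project; the sketch only assumes the shorter-over-longer convention that Vision documents for its aspect ratio properties.

```swift
// Illustrative helpers (not part of the project): they check whether a
// rectangle's proportions fall inside a band around a payment card's proportions.
let cardWidthMM = 85.60
let cardHeightMM = 53.98

/// Aspect ratio expressed as shorter side over longer side, always in 0...1.
func aspectRatio(width: Double, height: Double) -> Double {
    let shorter = min(width, height)
    let longer = max(width, height)
    return longer > 0 ? shorter / longer : 0
}

/// True when the rectangle is within 5% below and 10% above the card ratio
/// (the same margins the detection request uses).
func isCardShaped(width: Double, height: Double,
                  lowerTolerance: Double = 0.05,
                  upperTolerance: Double = 0.10) -> Bool {
    let target = aspectRatio(width: cardWidthMM, height: cardHeightMM)
    let ratio = aspectRatio(width: width, height: height)
    return ratio >= target * (1 - lowerTolerance)
        && ratio <= target * (1 + upperTolerance)
}
```

With these margins the accepted ratios run from roughly 0.60 to 0.69, so a square or a 2:1 rectangle is rejected while a card held in either orientation passes.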

Next we’re just going to check that the rectangle has some text. At this point we’re only using the text detection to filter out false positives from the rectangle detection. A credit card usually contains at least some text other than the 16-digit card number. To detect text we’ll be using VNDetectTextRectanglesRequest. Add the following line to the detectPaymentCard function:

let textDetectionRequest = VNDetectTextRectanglesRequest()

Next let’s perform the detections on the image. For this we’ll make use of VNSequenceRequestHandler. The sequence request handler is best suited to analysing sequences of images, such as video frames or, as in this case, the camera feed. Add the following property to the PaymentCardExtractionViewController class:

private let requestHandler = VNSequenceRequestHandler()

We keep this as an instance property because detectPaymentCard only processes one image at a time. Keeping the handler on the instance means that detectPaymentCard reuses the same sequence handler for every frame.

Next let’s perform the rectangle and text detection request. At the end of detectPaymentCard function add the following line:

try? self.requestHandler.perform([rectangleDetectionRequest, textDetectionRequest], on: frame)

Let’s process the results. Add the following lines at the end of detectPaymentCard:

guard let rectangle = (rectangleDetectionRequest.results as? [VNRectangleObservation])?.first,
    let text = (textDetectionRequest.results as? [VNTextObservation])?.first,
    rectangle.boundingBox.contains(text.boundingBox) else {
        // no credit card rectangle detected
        return nil
}

Above we’re optimistically taking the first rectangle observed and checking that it contains the first text detected. We’ll return nil if no rectangle containing text was detected.
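Taking only the first rectangle and the first text box is the simplest possible filter. As a hedged alternative (not what the tutorial code does), the sketch below looks for the first rectangle that contains any of the detected text boxes. It works on plain CGRect values, standing in for the boundingBox of each observation, and firstRectangle(in:containingAnyOf:) is a hypothetical helper name.

```swift
import Foundation

/// Returns the first rectangle that fully contains at least one text box.
/// Both inputs are expected in the same coordinate space, for example the
/// normalized bounding boxes reported by Vision observations.
func firstRectangle(in rectangles: [CGRect],
                    containingAnyOf textBoxes: [CGRect]) -> CGRect? {
    return rectangles.first { rectangle in
        textBoxes.contains { textBox in rectangle.contains(textBox) }
    }
}
```

This costs one extra pass over the text boxes per rectangle, which should be negligible for the handful of observations a single frame produces.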

Lastly let’s return the observed rectangle with text to the caller of the function. At the end of detectPaymentCard add the following line:

return rectangle

Before consuming our payment card detector function, let’s create a function to draw a rectangle around the payment card. Add the following function to PaymentCardExtractionViewController:

private func createRectangleDrawing(_ rectangleObservation: VNRectangleObservation) -> CAShapeLayer {
    // flip the y axis: Vision's origin is bottom-left, the layer's is top-left
    let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -self.previewLayer.frame.height)
    // scale the normalized coordinates up to the preview layer's size
    let scale = CGAffineTransform.identity.scaledBy(x: self.previewLayer.frame.width, y: self.previewLayer.frame.height)
    let rectangleOnScreen = rectangleObservation.boundingBox.applying(scale).applying(transform)
    let boundingBoxPath = CGPath(rect: rectangleOnScreen, transform: nil)
    let shapeLayer = CAShapeLayer()
    shapeLayer.path = boundingBoxPath
    shapeLayer.fillColor = UIColor.clear.cgColor
    shapeLayer.strokeColor = UIColor.green.cgColor
    shapeLayer.lineWidth = 5
    shapeLayer.borderWidth = 5
    return shapeLayer
}

Above we’re simply transforming the coordinates from the image to screen coordinates. Then we’re creating a rectangle drawing for those coordinates and returning that to the caller.
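The two transforms boil down to a scale followed by a vertical flip. The sketch below spells out that arithmetic on plain values; screenRect(forNormalized:in:) is a hypothetical helper, and it assumes, as the tutorial code does, that the normalized rectangle maps directly onto the full size of the preview layer.

```swift
import Foundation

/// Converts a normalized rectangle (origin bottom-left, values in 0...1)
/// into layer coordinates (origin top-left, values in points).
func screenRect(forNormalized normalized: CGRect, in layerSize: CGSize) -> CGRect {
    return CGRect(x: normalized.minX * layerSize.width,
                  // flip: the top edge in layer space is (1 - maxY) from the top
                  y: (1 - normalized.maxY) * layerSize.height,
                  width: normalized.width * layerSize.width,
                  height: normalized.height * layerSize.height)
}
```

For example, a box covering the normalized region x 0.25 to 0.75 and y 0.25 to 0.5 on a 200 by 400 point layer ends up at x 50, y 200 with a size of 100 by 100 points.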

Finally let’s tie this all together. Let’s create a property on PaymentCardExtractionViewController to hold a reference to our drawing. Add the following line:

private var rectangleDrawing: CAShapeLayer?

Next add the following lines at the end of captureOutput function:

DispatchQueue.main.async {
    self.rectangleDrawing?.removeFromSuperlayer()
    if let paymentCardRectangle = self.detectPaymentCard(frame: frame) {
        self.rectangleDrawing = self.createRectangleDrawing(paymentCardRectangle)
        self.view.layer.addSublayer(self.rectangleDrawing!)
    }
}

Run the app and see it in action!

You’ll notice that the rectangle flashes on screen. This is because our payment card detector is able to detect a rectangle with text on some frames and not on others. We know that once a rectangle has had some text detected on it, it will probably have text on the next frame too. So how can we know that the rectangle is the same across a sequence of images?

3. Tracking the payment card

In this section we’ll be looking at how to track a payment card through frames. Luckily for us Vision framework offers object tracking. We can tell Vision to track a rectangle through the images and Vision will do its magic for us!

Let’s start by creating a function that will track an observed rectangle in an image. Add the following function to PaymentCardExtractionViewController:

private func trackPaymentCard(for observation: VNRectangleObservation, in frame: CVImageBuffer) -> VNRectangleObservation? {
}

To track the payment card smoothly through the images we need the original payment card rectangle observed and the new frame. If we’re able to track the rectangle in this new frame we’ll return the new position of such rectangle. Otherwise we’ll return nil.

Let’s create a rectangle tracking request. We’ll use VNTrackRectangleRequest for this task. Add the following lines of code to trackPaymentCard:

let request = VNTrackRectangleRequest(rectangleObservation: observation)
request.trackingLevel = .fast

Here we simply pass the observed rectangle to create a tracking request. We have also set the tracking level to fast, which can be error prone. Accurate tracking of the payment card is more processing intensive and slower.

Next let’s perform the tracking operation. Add the following line of code to the end of trackPaymentCard:

try? self.requestHandler.perform([request], on: frame)

The above is the same operation we executed in detectPaymentCard, with one difference: the request is a rectangle tracking request.

Next let’s process the request results. Add the following lines of code:

guard let trackedRectangle = (request.results as? [VNRectangleObservation])?.first else {
    return nil
}
return trackedRectangle

Similarly to what we did on detectPaymentCard we process the results. If the tracking request results have no observed tracked rectangle we will return nil. Otherwise we return the observed tracked rectangle.

Next let’s get rectangle tracking working in our app. First we need to store a reference to the first rectangle with text detected by detectPaymentCard. Add the following property to PaymentCardExtractionViewController:

private var paymentCardRectangleObservation: VNRectangleObservation?

Next let’s create a function that will handle the scenario where we have an observed payment card. Add the following function:

private func handleObservedPaymentCard(_ observation: VNRectangleObservation, in frame: CVImageBuffer) {
    if let trackedPaymentCardRectangle = self.trackPaymentCard(for: observation, in: frame) {
        DispatchQueue.main.async {
            self.rectangleDrawing = self.createRectangleDrawing(trackedPaymentCardRectangle)
            self.view.layer.addSublayer(self.rectangleDrawing!)
        }
    } else {
        self.paymentCardRectangleObservation = nil
    }
}

Next change the captureOutput to:

func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    guard let frame = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        debugPrint("unable to get image from sample buffer")
        return
    }
    DispatchQueue.main.async {
        self.rectangleDrawing?.removeFromSuperlayer() // removes old rectangle drawings
    }
    if let paymentCardRectangleObservation = self.paymentCardRectangleObservation {
        self.handleObservedPaymentCard(paymentCardRectangleObservation, in: frame)
    } else if let paymentCardRectangleObservation = self.detectPaymentCard(frame: frame) {
        self.paymentCardRectangleObservation = paymentCardRectangleObservation
    }
}

The code above first tracks a detected payment card if there is one. If the tracking is successful then we draw a rectangle around the card. If we aren’t able to track a detected payment card then we remove the reference to that payment card, as it’s probably no longer within the camera view.

Finally, if no payment card is currently being observed, we try to detect a new one.
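The per-frame decision in captureOutput is effectively a two-state machine: searching for a card or tracking one. The sketch below models just that flow with closures in place of the Vision calls, so it can be read and run without a camera. ScanState and nextState are hypothetical names, and Observation stands in for VNRectangleObservation.

```swift
/// Illustrative model of the per-frame decision made in captureOutput.
enum ScanState<Observation> {
    case searching
    case tracking(Observation)
}

/// Computes the state for the next frame.
/// - detect: stands in for detectPaymentCard(frame:)
/// - track: stands in for trackPaymentCard(for:in:)
func nextState<Observation>(from state: ScanState<Observation>,
                            detect: () -> Observation?,
                            track: (Observation) -> Observation?) -> ScanState<Observation> {
    switch state {
    case .searching:
        // No card yet: run detection and start tracking if one is found.
        if let found = detect() { return .tracking(found) }
        return .searching
    case .tracking(let observation):
        // Card known: keep the stored observation while tracking succeeds,
        // otherwise drop it and go back to searching.
        if track(observation) != nil { return .tracking(observation) }
        return .searching
    }
}
```

Like the tutorial code, this keeps the originally detected observation for as long as tracking succeeds; detection only runs again after tracking fails.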

Run the app. The app should now be smoother at tracking the payment card!

4. Extracting the payment card number

In this section we’ll extract the payment card number. For this task we’ll be making use of VNRecognizeTextRequest, another tool provided by the Vision framework, which recognises the text in an image.

First let’s declare a function to carry out the digit extraction task. This function will take the camera feed image and the tracked rectangle, and return the extracted payment card number, or nil if none could be extracted.

private func extractPaymentCardNumber(frame: CVImageBuffer, rectangle: VNRectangleObservation) -> String? {
}

Let’s crop the rectangle from the camera feed image. Add the following lines of code:

let cardPositionInImage = VNImageRectForNormalizedRect(rectangle.boundingBox, CVPixelBufferGetWidth(frame), CVPixelBufferGetHeight(frame))
let ciImage = CIImage(cvImageBuffer: frame)
let croppedImage = ciImage.cropped(to: cardPositionInImage)

First we find the card position in the image and then we crop that from the image.

Next let’s create an instance of VNRecognizeTextRequest. Add the following lines of code to extractPaymentCardNumber:
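For intuition, the projection from normalized to image coordinates is, as far as I understand it, a plain scale by the image dimensions. The sketch below is an illustrative re-implementation of that arithmetic, not Vision’s own code, and imageRect(forNormalized:imageWidth:imageHeight:) is a hypothetical name. No vertical flip is needed here because Core Image, like Vision, places the origin at the bottom-left.

```swift
import Foundation

/// Scales a normalized rectangle (values in 0...1) up to pixel coordinates.
/// The origin stays at the bottom-left, matching CIImage's coordinate space.
func imageRect(forNormalized normalized: CGRect,
               imageWidth: Int,
               imageHeight: Int) -> CGRect {
    return CGRect(x: normalized.minX * CGFloat(imageWidth),
                  y: normalized.minY * CGFloat(imageHeight),
                  width: normalized.width * CGFloat(imageWidth),
                  height: normalized.height * CGFloat(imageHeight))
}
```

A card occupying the normalized region x 0.25, y 0.5, width 0.5, height 0.25 in a 1920 by 1080 frame maps to a crop rectangle at x 480, y 540 with a size of 960 by 270 pixels.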

let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate
request.usesLanguageCorrection = false

Above we have configured the request to be accurate and deactivated language correction, as numbers don’t require this feature.

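Since we are now working with a single cropped image rather than the camera sequence, we’ll run the text recognition request on a single-image request handler, VNImageRequestHandler. Let’s create the request handler instance and perform the text extraction operation. Add the following lines of code to extractPaymentCardNumber: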

let stillImageRequestHandler = VNImageRequestHandler(ciImage: croppedImage, options: [:])
try? stillImageRequestHandler.perform([request])

Next let’s process the results. Add the following lines of code:

guard let texts = request.results as? [VNRecognizedTextObservation], texts.count > 0 else {
    // no text detected
    return nil
}

Next let’s extract the 16-digit payment card number. This might be one single string or 4 strings of 4 digits each. Alternatively the text results might not contain either of those. Add the following lines of code to extractPaymentCardNumber:

let digitsRecognized = texts
    .flatMap({ $0.topCandidates(10).map({ $0.string }) })
    // remove all spaces, including those between groups such as "1234 5678 9012 3456"
    .map({ $0.replacingOccurrences(of: " ", with: "") })
    // keep only the strings made up entirely of digits
    .filter({ CharacterSet.decimalDigits.isSuperset(of: CharacterSet(charactersIn: $0)) })
let _16digits = digitsRecognized.first(where: { $0.count == 16 })
let has16Digits = _16digits != nil
let _4digits = digitsRecognized.filter({ $0.count == 4 })
let has4sections4digits = _4digits.count == 4
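To see how this filtering behaves on realistic input, here is a minimal standalone sketch that takes plain strings instead of Vision observations. cardNumberCandidate(from:) is a hypothetical helper, and it assumes the four groups, when split, arrive in reading order.

```swift
/// Reduces recognised strings to a single 16-digit candidate, if there is one.
func cardNumberCandidate(from recognizedStrings: [String]) -> String? {
    let digitStrings = recognizedStrings
        // drop all whitespace, including spaces between digit groups
        .map { text in text.filter { !$0.isWhitespace } }
        // keep non-empty strings made up only of the ASCII digits 0-9
        .filter { text in
            !text.isEmpty && text.allSatisfy { $0.isASCII && $0.isNumber }
        }
    // Case 1: the whole number was recognised as one string.
    if let whole = digitStrings.first(where: { $0.count == 16 }) {
        return whole
    }
    // Case 2: exactly four groups of four digits were recognised separately.
    let groups = digitStrings.filter { $0.count == 4 }
    return groups.count == 4 ? groups.joined() : nil
}
```

Strings such as the cardholder name or an expiry date like 12/25 are discarded by the digit filter, so only the number itself can reach the checksum step.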

If we have a 16-digit payment card number, the next step is to verify that the extracted digits are correct, as the text extraction can get them wrong. Fortunately the last digit of a payment card number is a check digit. This means that the last digit is calculated by applying a mathematical formula to the other digits. Add the following function to PaymentCardExtractionViewController:

private func checkDigits(_ digits: String) -> Bool {
    guard digits.count == 16, CharacterSet.decimalDigits.isSuperset(of: CharacterSet(charactersIn: digits)) else {
        return false
    }
    var digits = digits
    // the last digit is the check digit
    let checksum = digits.removeLast()
    let sum = digits.reversed()
        .enumerated()
        .map({ (index, element) -> Int in
            if (index % 2) == 0 {
                // double every second digit and add up the digits of the result
                let doubled = Int(String(element))!*2
                return doubled > 9
                    ? Int(String(String(doubled).first!))! + Int(String(String(doubled).last!))!
                    : doubled
            } else {
                return Int(String(element))!
            }
        })
        .reduce(0, { (res, next) in res + next })
    let checkDigitCalc = (sum * 9) % 10
    return Int(String(checksum))! == checkDigitCalc
}

The function above applies the Luhn formula. Explaining this formula in detail is out of scope for this post. Simply note that the formula computes the check digit from the first 15 digits and compares the result with the extracted last digit. If the check digit matches, we have a valid payment card number, so we return true, otherwise false.

Next let’s finish our extractPaymentCardNumber function. We’ll now be calling the checkDigits function with the extracted 16-digit string. At the end of extractPaymentCardNumber add the following lines of code:
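For readers who want to try the checksum in isolation, here is a compact sketch of the equivalent whole-number form of the Luhn check: starting from the rightmost digit, double every second digit, subtract 9 from any result above 9, add everything up, and accept the number if the total is a multiple of 10. luhnIsValid is a hypothetical standalone function, not the one used in the project.

```swift
/// Returns true when `number` consists only of ASCII digits and passes
/// the Luhn check, with the last digit acting as the check digit.
func luhnIsValid(_ number: String) -> Bool {
    // Convert to digit values; reject anything that is not an ASCII digit.
    var digits: [Int] = []
    for character in number {
        guard character.isASCII, let value = character.wholeNumberValue else {
            return false
        }
        digits.append(value)
    }
    guard digits.count >= 2 else { return false }

    var sum = 0
    // Walk from the rightmost digit; odd offsets are the "every second" digits.
    for (offset, digit) in digits.reversed().enumerated() {
        if offset % 2 == 1 {
            let doubled = digit * 2
            sum += doubled > 9 ? doubled - 9 : doubled
        } else {
            sum += digit
        }
    }
    return sum % 10 == 0
}
```

With the widely used test number 4111 1111 1111 1111 the total comes to 30, so the check passes; changing the last digit to 2 gives 31 and the check fails.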

let digits = _16digits ?? _4digits.joined()
let digitsIsValid = (has16Digits || has4sections4digits) && self.checkDigits(digits)
return digitsIsValid ? digits : nil

Here we’re simply checking first that there are 16 digits, either as one string or as 4 groups of 4. If there are, we then check whether the digits are valid. If those conditions are met we’ll return the digits extracted, otherwise nil.

Finally we’re ready to consume our extractPaymentCardNumber. Let’s go back to handleObservedPaymentCard and call our extract function when a card is tracked. Change the handleObservedPaymentCard function to the following:

private func handleObservedPaymentCard(_ observation: VNRectangleObservation, in frame: CVImageBuffer) {
    if let trackedPaymentCardRectangle = self.trackPaymentCard(for: observation, in: frame) {
        DispatchQueue.main.async {
            self.rectangleDrawing?.removeFromSuperlayer()
            self.rectangleDrawing = self.createRectangleDrawing(trackedPaymentCardRectangle)
            self.view.layer.addSublayer(self.rectangleDrawing!)
        }
        DispatchQueue.global(qos: .userInitiated).async {
            // crop using the rectangle tracked in this frame so it matches the card's current position
            if let extractedNumber = self.extractPaymentCardNumber(frame: frame, rectangle: trackedPaymentCardRectangle) {
                DispatchQueue.main.async {
                    self.resultsHandler(extractedNumber)
                }
            }
        }
    } else {
        self.paymentCardRectangleObservation = nil
    }
}

Above we simply added the following to handleObservedPaymentCard:

DispatchQueue.global(qos: .userInitiated).async {
    // crop using the rectangle tracked in this frame so it matches the card's current position
    if let extractedNumber = self.extractPaymentCardNumber(frame: frame, rectangle: trackedPaymentCardRectangle) {
        DispatchQueue.main.async {
            self.resultsHandler(extractedNumber)
        }
    }
}

We’re basically calling our extractPaymentCardNumber and handling the results. Note we do this on another thread, as payment card number extraction can be slow and computationally intensive. We’re then calling the resultsHandler of PaymentCardExtractionViewController on the main thread with the extracted payment card number, if any. It’s always good practice to call back on the main thread, as the presenter may perform UI operations in the callback (which is the case here).

And that’s it! Run the app and see it in action! 🎉

Summary

In this post we have learnt:

  • how to detect rectangles of specific dimensions
  • how to detect text
  • how to track rectangles
  • how to extract text
  • how to check payment card numbers are valid

Final notes

You can find the completed project in the link below:

The Vision framework is a great tool for getting started with computer vision problems on iOS. However, the tool seems very slow and inaccurate for this task. This solution struggles a lot to extract the payment card number.

One thing that could improve performance is reducing the area of the image to scan and guiding the user to place the card in that area.

Alternative solutions such as Card.io make use of OpenCV instead of the Vision framework. Card.io is not only more effective but also a much faster solution.
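As a rough sketch of that idea, the helper below computes a centred, card-shaped region in normalized coordinates. Vision’s image-based requests expose a regionOfInterest property that takes a normalized rectangle, so a region like this could be assigned there and also drawn on screen as a guide. centeredCardRegion and its default of 80% of the image width are assumptions for illustration, not part of the project.

```swift
import Foundation

/// Returns a centred, card-shaped region in normalized coordinates (0...1),
/// taking up `widthFraction` of the image width.
func centeredCardRegion(imageWidth: Double,
                        imageHeight: Double,
                        widthFraction: Double = 0.8) -> CGRect {
    let cardAspect = 85.60 / 53.98  // card width over card height
    let regionWidth = widthFraction
    // Height in pixels that keeps the card's proportions, then normalized
    // by the image height and capped at the full image.
    let heightInPixels = widthFraction * imageWidth / cardAspect
    let regionHeight = min(heightInPixels / imageHeight, 1)
    return CGRect(x: (1 - regionWidth) / 2,
                  y: (1 - regionHeight) / 2,
                  width: regionWidth,
                  height: regionHeight)
}
```

For a 1000 by 2000 pixel portrait frame this gives a region about 0.8 wide and 0.25 high, centred in the image, which is roughly an eighth of the pixels of the full frame.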

Stay tuned for more posts on iOS development! Follow me on Twitter or Medium!

Senior iOS Engineer @ Onfido. Writing weekly blogs on iOS and programming. Follow me to stay tuned!
