There is no murder scene left after the death of a small PIC microcontroller, but detective work identified how the microcontroller died and who and what was responsible. The case was eventually closed with a fix to prevent another occurrence. The subject of the investigation died while working in an alarm circuit.
The Alarm
The alarm was built using a PIC12F508 microcontroller to sense the status of a normally closed (NC) loop, i.e. a loop that is shorted to ground when all sensors are secure. The PIC12F508 is a small 8-pin microcontroller, made by Microchip, that works well for this application. The pinout is shown below.
When a door is opened, a sensor in the loop opens, causing a break in the NC loop. When the microcontroller detects this, it goes into alarm mode. It first determines whether the alarm is armed; if so, it turns on relays to enable a siren and a light. The relays are part of an external relay board that is driven by the microcontroller board. The microcontroller also provides a delay for exit and the timing to turn off the alarm after a set period.
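The alarm behavior described above can be modeled as a small state machine. What follows is only a host-side Python sketch for illustration, not the firmware that ran on the PIC12F508. The state names, the `step` function, and the 30 second exit delay and 300 second alarm timeout are all assumptions; the original values are not given.

```python
from enum import Enum

# Hypothetical timing values; the article does not give the real ones.
EXIT_DELAY_S = 30      # time allowed to leave after arming
ALARM_TIMEOUT_S = 300  # siren and light switch off after this long


class State(Enum):
    DISARMED = "disarmed"
    EXIT_DELAY = "exit_delay"
    ARMED = "armed"
    ALARM = "alarm"


def step(state, timer_s, armed_switch, loop_closed, dt_s=1):
    """Advance the model by dt_s seconds.

    Returns (new_state, new_timer_s, relays_on).
    loop_closed is True while the NC loop is intact (shorted to ground).
    """
    if not armed_switch:
        # Disarming always wins and switches the relays off.
        return State.DISARMED, 0, False

    if state is State.DISARMED:
        # Arm switch just turned on: start the exit delay.
        return State.EXIT_DELAY, 0, False

    if state is State.EXIT_DELAY:
        # The loop is ignored while the exit delay runs.
        timer_s += dt_s
        if timer_s >= EXIT_DELAY_S:
            return State.ARMED, 0, False
        return State.EXIT_DELAY, timer_s, False

    if state is State.ARMED:
        if not loop_closed:
            # A sensor opened the loop: turn on the siren and light relays.
            return State.ALARM, 0, True
        return State.ARMED, 0, False

    # Remaining case: state is State.ALARM.
    timer_s += dt_s
    if timer_s >= ALARM_TIMEOUT_S:
        # Timed out: relays off, go back to watching the loop.
        return State.ARMED, 0, False
    return State.ALARM, timer_s, True
```

In this sketch a loop that is still open when the timeout expires would trigger the alarm again on the next step. Whether the real firmware behaves that way is not stated.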
The relay board with two relays for a light and siren. The microcontroller board connects to the header.
A 12V wall wart supplies power to the siren and light and is regulated down to 5Vdc for powering the microcontroller and relay boards.
Both boards and the power supply are mounted in a metal enclosure with metal screws and nuts and nylon standoffs supporting the boards.
The Failure
The alarm had been operating well for a few years. It was tested a few times a year, and its status was monitored by the red power LED and green input LEDs on the relay board. At one point I noticed that the power LED was dim, not at its normal brilliance. Applying pressure to the microcontroller and relay boards returned the power LED to normal operation, so I thought that the problem was a bad connection between the relay board header and the microcontroller board header socket. But as time went on, the power LED dimmed again with increasing frequency. Eventually the alarm function was found not to work properly either.
At this point the microcontroller and relay board were removed from the metal enclosure and taken to the bench for analysis. A new microcontroller was programmed and installed. The male header on the relay board and the header socket on the microcontroller board were inspected for bad solder joints and other anomalies. None were found. After the inspection and the installation of the new microcontroller, the circuit returned to normal operation. In addition, the power LED operated at normal brightness no matter how much the connection between the two boards was stressed.
The fact that I could no longer make the power LED dim did not make sense to me until I closely inspected the top side of the relay board. I discovered that the plastic body of the green LED closest to the mounting hole was damaged. Apparently the metal nut of the mounting screw had damaged the side of the LED facing the mounting hole and would make electrical contact with the cathode of the LED.
The yellow arrow shows the area of damage to the green LED. The board was remounted with a nylon screw and nut to prevent recurrence of the failure.
The mounting screw is connected to the metal enclosure which is at AC power ground. Later measurements showed about 50 Vac and varying DC voltage on the screw. The point of contact was to the LED cathode which is connected to 5Vdc power through the relay coil. This interacted with the power supply causing the power LED to dim and likely damaged the microcontroller. The boards were remounted in the metal enclosure with a nylon screw and nut substituted for the offending metal screw and nut. The alarm has worked well since.
The schematic of one channel of the relay board with the mounting screw contact point indicated. Vdd and Vcc both refer to the positive side of the power supply.
What Happened to the Damaged Chip?
There are a number of ways that an integrated circuit (IC) may be damaged. In this case I thought the damage may be due to electrical overstress or latch up.
Electrical overstress is basically applying too much voltage or current to the device. This may be through its power connections, an input or an output. It usually causes damage in the one area of the device where the overstress occurred. Infrequently, secondary damage may occur in other areas as well.
Latch up is when a CMOS IC is turned into an SCR (silicon controlled rectifier), causing very low resistance and excessive current flow in the power circuitry. Latch up is often induced by forcing an input or output a diode drop above Vcc or a diode drop below Vss. Those conditions will forward bias the diodes that are part of the protection network, as well as the parasitic diodes that are a byproduct of the IC structure. Under normal operation, these diodes are reverse biased and do not affect the operation of the device. Latch up will cause wide areas of damage on the device and may also damage the power supply connections inside the chip.
When performing failure analysis on an IC, the first step after verification of the electrical failure and visual inspection is to simply test the IV (current and voltage) characteristics of each pin of the device with respect to Vdd and Vss. This simple test will often pinpoint which pins are damaged and give you some insight into whether the device latched up or was overstressed. I used the Atlas DCA Pro 75 semiconductor analyzer to measure the diode characteristics of each pin to Vss and Vdd and also between Vss and Vdd. In the forward direction you expect a forward biased diode, and the reverse direction may give some leakage current.
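The trigger condition for latch up can be written as a simple check on the pin voltage. This is a minimal sketch that assumes a silicon diode drop of about 0.6 V and a 5 V supply. The function name, return strings and default values are illustrative and are not taken from the PIC12F508 datasheet.

```python
DIODE_DROP_V = 0.6  # assumed typical silicon diode forward drop


def clamp_diode_state(v_pin, v_dd=5.0, v_ss=0.0, v_f=DIODE_DROP_V):
    """Report which protection or parasitic diode, if any, is forward biased.

    Returns "upper" if the pin is more than a diode drop above Vdd,
    "lower" if it is more than a diode drop below Vss, else "none".
    """
    if v_pin > v_dd + v_f:
        return "upper"
    if v_pin < v_ss - v_f:
        return "lower"
    return "none"
```

A pin sitting anywhere between the rails returns "none", which is the normal reverse biased condition. Either of the other two results is a condition that can inject current into the substrate and start latch up.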
Testing showed that all pins looked normal except for pin 6, the pin that was connected to the NC loop. This pin showed 35 ohms of resistance to Vss where a diode would normally be seen. Below are the measurements on the failed device and a good device.
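The pin-by-pin comparison can be sketched as a small classifier. This is an illustration of the idea, not the output format of the analyzer. The reading dictionaries and function names are hypothetical, and the only value taken from the investigation is the 35 ohm reading on pin 6; the diode forward voltages are placeholders, not measured data.

```python
# Placeholder forward voltage for a healthy pin; NOT a measured value.
PLACEHOLDER_VF = 0.6


def classify_pin(reading):
    """Classify one pin-to-Vss reading.

    reading is a dict such as {"kind": "diode", "vf": 0.6}
    or {"kind": "resistor", "ohms": 35.0}.
    """
    if reading["kind"] == "diode":
        return "normal"
    if reading["kind"] == "resistor":
        return "damaged"
    return "unknown"


def damaged_pins(readings):
    """Return the sorted pin numbers that do not look like a diode."""
    return sorted(pin for pin, r in readings.items()
                  if classify_pin(r) != "normal")


# I/O pins 2 through 7 measured to Vss (pin 1 is Vdd, pin 8 is Vss).
readings_to_vss = {pin: {"kind": "diode", "vf": PLACEHOLDER_VF}
                   for pin in range(2, 8)}
# The one reading reported in the article: 35 ohms where a diode belongs.
readings_to_vss[6] = {"kind": "resistor", "ohms": 35.0}

print(damaged_pins(readings_to_vss))
```

The same check would be repeated for each pin to Vdd and for Vdd to Vss. A resistive reading across the supply pins would point toward latch up rather than a single overstressed pin.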
The measurements point to damage at pin 6 only. The device likely did not go into latch up, since the power supply pins measured normal. Let's take a look at the equivalent circuit of pin 6; the attached NC loop circuit is also included.
On this device pin 6, shown on the right side of the circuit, may be configured as an input or an output. The diodes are protection diodes for the circuitry used when the pin is configured as an input. The n channel and p channel MOSFETs are large transistors used as output drivers when the pin is configured as an output. These output drivers also look like diodes when measuring the IV characteristics. Note that there is additional circuitry associated with the pin that is not shown here.
A good guess would be that the diode connected to Vss_IC, the n channel MOSFET, or both are damaged. How do we find out specifically where the damage is? We will open up the chip and expose the die!
Exposing the Die
This device is a plastic dual inline package (DIP) as shown in a prior figure. The plastic is actually a specially formulated epoxy with proper flow characteristics, thermal expansion parameters and flame retardants to enable high volume manufacturing, protection of the die and conformance to safety regulations. There are a number of ways to decapsulate (i.e. decap) a device to expose the die. There are mechanical methods that involve heating the device while it is under compressive stress, causing the package to break away from the die. The advantage of this method is that almost anyone can do it with some practice, but the likelihood of damaging the die is high and the connections to the lead frame are not maintained.
For failure analysis of a single failing die, decapsulation is performed by a trained technician using a combination of milling away the epoxy just above the die and then removing the remaining epoxy with aggressive heated acids. All this is performed in a properly vented fume hood. The connections to the lead frame are maintained by careful milling and acid application. Unlike some amateur chip detectives like Ken Shirriff, I don't have the equipment and materials for this so I sent the device to a professional lab for decapsulation. I also asked the lab to provide a few photos of the exposed die showing the damaged area(s) since I also don't have a reflected light microscope.
Image of the 12F508 die; the numbers denote the pin numbers.
Above is the image of the entire 12F508 die. The bright squares are aluminum metal bond pads that are connected to each pin. The bond pads are bright because the top protective layer of oxide is removed from them so that electrical connection to them may be made. The dark circle on each bond pad is a gold ball that has formed a bond to the pad. Gold wire emanates from the ball and is connected to the corresponding pin on the device. This is how the die is connected to the outside world.
The wide yellowish traces are aluminum metal that supplies power to areas of the die. I have labeled the two major metal traces: Vcc and Vss. Note that the die has two distinct and separate metal layers, with Vcc on the top metal layer and Vss on the bottom metal layer. The other areas of the die consist mostly of transistors and their interconnections of much narrower metal traces. At the magnifications used for this investigation, individual transistors are not readily resolved.
Notice that there is a black blob near the pin 6 bond pad. This blob is residual epoxy that was not removed during decap. This is typically due to the area dissipating excessive heat for a period of time, which changes the epoxy so that it does not react to the decap acid like the rest of the epoxy encapsulant. This is the area of damage that caused the device to fail. The next figure is a higher magnification view of this area.
The red arrow points to the area that experienced overheating. In addition to the residual epoxy, there is a narrow bright area at the end of the arrow. This is an indication of excessive heat as well. It is likely a disruption of the top layer of oxide, and there is also melted metallization in this area.
This metal trace is Vss, which also correlates with the electrical characterization of the device, specifically pin 6.
Because of the low magnification and the presence of the top layer of oxide, we cannot exactly determine the structures on either side of the bond pad. But we can look at the circuitry at pin 5, which is the same as at pin 6 since they are both I/O (Input/Output) pins. The structures that are outlined by the aqua ovals are the output driver MOSFETs, with the n channel MOSFET under the Vss metal and the p channel MOSFET under the Vcc metal. By zooming in on the photo, one can see that the structure looks like a typical output driver. In addition, pins 2, 3, 5, 6, and 7 are all I/O pins and have the same structure. Pin 4 is an input only pin and does not have the structures corresponding to output drivers.
So the evidence points to a damaged output driver n channel MOSFET. The figure below shows a simplified cross section of an n channel MOSFET. The drain junction is the junction that was damaged, likely in multiple places. The 35 ohm resistance that was measured during electrical characterization is predominantly the resistance of the p substrate between the drain junction and the p+ substrate contact. Likely the junction was damaged by excessive voltage that caused a defect somewhere along the junction. Much of the current flowing through the junction would then crowd through this defect, heating that area and causing more damage.
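To get a feel for what a 35 ohm path to Vss means, Ohm's law gives the current and power for a given voltage across it. This is a rough sketch only. The voltage actually present across the junction during the fault is not known, so the 5 V used in the usage line is an assumption chosen because it is the board's supply voltage, and the function name is hypothetical.

```python
MEASURED_OHMS = 35.0  # pin 6 to Vss, from the electrical characterization


def short_circuit_estimate(volts, ohms=MEASURED_OHMS):
    """Return (amps, watts) for a given voltage across the damaged path."""
    amps = volts / ohms
    watts = volts * amps
    return amps, watts


# Assumed case: the full 5 V supply appears across the damaged junction.
amps, watts = short_circuit_estimate(5.0)
print(f"{amps * 1000:.0f} mA, {watts:.2f} W")
```

Under that assumption the result is roughly 0.14 A and 0.7 W. If most of that power is dissipated in a small defect rather than spread over the whole junction, local heating of the kind seen in the die photos is plausible.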
Simplified Cross Section of the N Channel MOSFET, Junction Damage at Red
More About Failure
We have concluded that the failure is a damaged output n channel MOSFET at pin 6. Physical damage and electrical characteristics correlate and point to that conclusion. It would be fun to further deprocess the die to examine the exact location of the failure site(s) and the extent of damage, but I don't have the facilities to do that. We can still make some generalizations about the failure by examining the die photos in detail.
The residual epoxy and the damaged oxide and metallization at the failure site indicate that significant heat was generated in the area for a moderate duration of time.
The size of the damaged area points to multiple damage sites in the output n channel MOSFET.
The damage was likely caused by moderate current flowing through the junction. High current would have caused damage to the Vss metal outside of the pin 6 area.
Other areas of the die look normal. Widespread areas of damage would indicate that the die went into latch up. Latch up could also damage the Vcc and Vss metallization. The failure was caused instead by electrical overstress.
The failure likely occurred when the NC loop was open and the power to the boards was being compromised by the shorting of the LED cathode to the metal mounting screw and nut. Repeated instances of manipulating the boards to return the power LED to normal operation caused further damage.
Since remounting the boards using a nylon screw and nut, the alarm and power LED have been operating in a normal manner.
Nothing was found to suggest that the reliability of the microcontroller was poor or led to the failure. The failure was induced by external overstress that exceeded the operating conditions specified for this device.
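The reasoning used to separate latch up from electrical overstress can be summarized as a short decision function. This is a simplified sketch of the rules of thumb used in this investigation, not a general failure analysis procedure. The function and parameter names are hypothetical.

```python
def likely_failure_mode(supply_pins_normal, damaged_pin_count,
                        widespread_die_damage):
    """Suggest a failure mode from electrical and visual evidence.

    supply_pins_normal: True if the Vdd to Vss IV characteristic looks normal.
    damaged_pin_count: number of signal pins with an abnormal IV characteristic.
    widespread_die_damage: True if damage is seen over wide areas of the die.
    """
    if widespread_die_damage or not supply_pins_normal:
        # Wide damage or a damaged supply path is the latch up signature.
        return "latch up"
    if damaged_pin_count >= 1:
        # Localized damage with healthy supply pins.
        return "electrical overstress"
    return "no damage found"


# Evidence from this case: supply pins normal, one bad pin, localized damage.
print(likely_failure_mode(True, 1, False))
```

With the evidence from this case the function returns "electrical overstress", matching the conclusion reached above.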